Python regular expression built from a variable fails when the variable contains certain characters

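I am working with a data frame that contains a number of drugs, and I want to extract the dosages from full sentences taken from the product description. Each active substance (DCI) has a dosage, and the substances are supplied as a list. The dosage of each DCI normally comes right after its name in description.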

I am using:

teste=[]
for x in listofdci:
    teste2 = [f"{x}{y}" for x,y in re.findall(rf"(?:{x})\s*(\d+(?:[.,]\d+)*)\s*(g|mg|)",description)]
    teste.extend(teste2)

It works well except when the variable contains "(", ")" or "+", for example:

listofdci = [' Acid. L(+)-lacticum D4']
description = ' Acid. L(+)-lacticum D4 250 mg'
#error: nothing to repeat

#

listofdci = ['Zinkoxid', '(+/–)-α-Bisabolol', 'Lebertran (Typ A)', 'Retinol (Vitamin A)', 'Colecalciferol (Vitamin D3)']
description = 'Zinkoxid 13 g, (+/–)-α-Bisabolol 0,026 g (eingesetzt als Dragosantol-Zubereitung), Lebertran (Typ A) 5,2 g, Retinol (Vitamin A) 24,5 mg (entspr. 41 600 I.E. Retinolpalmitat [enth. Butylhydroxyanisol, Butylhydroxytoluol]), Colecalciferol (Vitamin D3) 10,4 mg (entspr. 10 400 I.E. mittelkettige Triglyceride [enth. all-rac-α-Tocopherol])'
#error: nothing to repeat
#Here it collects the first dosage -> ['13g'] and then raises the error

#

listofdci = [' Efeublätter-Trockenextrakt']
description = ' Efeublätter-Trockenextrakt (5-7,5:1) 65 mg - Auszugsmittel: Ethanol 30% (m/m)'
#[]
#here it outputs an empty list

Ideally, I would like:

listofdci = [' Acid. L(+)-lacticum D4']
description = ' Acid. L(+)-lacticum D4 250 mg'
#['250mg']

#

listofdci = ['Zinkoxid', '(+/–)-α-Bisabolol', 'Lebertran (Typ A)', 'Retinol (Vitamin A)', 'Colecalciferol (Vitamin D3)']
description = 'Zinkoxid 13 g, (+/–)-α-Bisabolol 0,026 g (eingesetzt als Dragosantol-Zubereitung), Lebertran (Typ A) 5,2 g, Retinol (Vitamin A) 24,5 mg (entspr. 41 600 I.E. Retinolpalmitat [enth. Butylhydroxyanisol, Butylhydroxytoluol]), Colecalciferol (Vitamin D3) 10,4 mg (entspr. 10 400 I.E. mittelkettige Triglyceride [enth. all-rac-α-Tocopherol])'
#['13g','0,026','5,2g','24,5','10,4']

#

listofdci = [' Efeublätter-Trockenextrakt']
description = ' Efeublätter-Trockenextrakt (5-7,5:1) 65 mg - Auszugsmittel: Ethanol 30% (m/m)'
#[65mg]

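I don't know how to get around this particular problem, other than perhaps removing every "(", ")" or "+" from the dataset. Also, because these characters can appear anywhere in the string, I don't think I can use a character set such as '[]' to identify them.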

Answer from a uj5u.com user:

If there can be an optional substring in parentheses between the keyword and the number, you can use

teste=[]
for x in listofdci:
    test2 = [f"{x}{y}" for x,y in re.findall(rf"{re.escape(x)}(?:\s*\([^()]*\))?\s*(\d+(?:[.,]\d+)*)\s*(m?g\b|)", description)]
    if test2:
        teste.extend(test2)

See the Python demo.
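To see why the escaping matters, here is a minimal sketch. It assumes the drug name contains a literal "(+)", which is what the "nothing to repeat" error suggests; the variable names are only illustrative. Without re.escape, the "+" is parsed as a quantifier with nothing in front of it, so the pattern does not even compile.

```python
import re

# Assumed sample name: the "(+)" is taken to be literal text in the drug name.
name = "Acid. L(+)-lacticum D4"
text = "Acid. L(+)-lacticum D4 250 mg"

# Unescaped: "(+)" opens a group and then meets a quantifier with nothing to repeat.
try:
    re.compile(rf"(?:{name})\s*(\d+)")
    error_message = None
except re.error as exc:
    error_message = str(exc)

# Escaped: every metacharacter in the name is matched literally.
escaped = re.escape(name)
match = re.search(rf"{escaped}\s*(\d+)", text)

print(error_message)
print(match.group(1))
```

The first print shows the compile error message and the second shows the number captured once the name is escaped.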

Details:

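{re.escape(x)} - the keyword, escaped so that it is matched literally
(?:\s*\([^()]*\))? - an optional sequence of zero or more whitespace characters, a "(", zero or more characters other than "(" and ")", and then a ")"
\s* - zero or more whitespace characters
(\d+(?:[.,]\d+)*) - one or more digits, then zero or more sequences of "." or "," followed by one or more digits
\s* - zero or more whitespace characters
(m?g\b|) - "g" or "mg" as a whole word, or an empty string.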
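Putting the pieces above together, the following is a self-contained sketch of the same pattern wrapped in a function. The name extract_doses is hypothetical and not part of the original answer, and the sample strings assume the names contain literal "+" signs and parentheses.

```python
import re

def extract_doses(names, description):
    """Hypothetical helper: collect 'number+unit' strings that follow each name."""
    doses = []
    for name in names:
        pattern = (
            rf"{re.escape(name)}"        # the name, matched literally
            r"(?:\s*\([^()]*\))?"        # optional "(...)" right after the name
            r"\s*(\d+(?:[.,]\d+)*)"      # the number, e.g. 13 or 0,026
            r"\s*(m?g\b|)"               # unit "g" or "mg", or nothing
        )
        doses.extend(f"{num}{unit}" for num, unit in re.findall(pattern, description))
    return doses

names = ['Zinkoxid', '(+/–)-α-Bisabolol', 'Lebertran (Typ A)']
description = 'Zinkoxid 13 g, (+/–)-α-Bisabolol 0,026 g (eingesetzt als Dragosantol-Zubereitung), Lebertran (Typ A) 5,2 g'
print(extract_doses(names, description))  # ['13g', '0,026g', '5,2g']
```

Building the pattern from several short raw strings keeps each part next to its explanation. Only the name goes through re.escape, because the rest of the pattern is meant to be interpreted as regex syntax.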
