I have the following dataframe in Python:
Text = provide written informed consent healthy male or female age between 31 to 59 years fluent in german language
I need to find the word "age" and take the 5 words before and after it. Target word = age. The output I want:
result = healthy male or female age between 31 to 59 years
My code:
Text = "provide written informed consent healthy male or female age between 31 to 59 years fluent in german language"
import re
# Try to capture up to 3 words on either side of "age"
r1 = re.search(r"(?:[a-zA-Z'-]+[^a-zA-Z'-]+){0,3} age (?:[^a-zA-Z'-]+[a-zA-Z'-]+){0,3}", Text)
r1.group()
My result is:
age 16 years old
My data also contains words such as "management" or "agency", which should be ignored.
Thanks
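A uj5u.com user replied:
One approach that doesn't use regular expressions is to split the text into words and look up the position of "age" in the list of words.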
Text = "provide written informed consent healthy male or female age between 31 to 59 years fluent in german language"
Text = Text.split()
result = Text[Text.index("age") - 4:Text.index("age") + 5]
print(result) # ['healthy', 'male', 'or', 'female', 'age', 'between', '31', 'to', '59']
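A caveat on the slice above: `list.index` raises `ValueError` when "age" is not in the list, and if "age" is one of the first few words the negative start index wraps around, so the slice comes back empty or wrong. Below is a minimal sketch that guards against both cases. The helper name `words_around` and its defaults are my own assumptions, not part of the answer. It takes 5 words after "age" so that "years" is included, as in the output the question asks for.

```python
def words_around(text, target="age", before=4, after=5):
    """Return the words surrounding the first exact occurrence of `target`.

    Hypothetical helper: returns an empty list if `target` is not a word in `text`.
    """
    words = text.split()
    if target not in words:
        return []
    i = words.index(target)
    start = max(i - before, 0)  # clamp so a negative index does not wrap around
    return words[start:i + after + 1]  # slicing past the end of a list is safe


text = "provide written informed consent healthy male or female age between 31 to 59 years fluent in german language"
print(" ".join(words_around(text)))  # healthy male or female age between 31 to 59 years
```

Because the comparison is on whole words, "management" or "agency" are never treated as "age".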
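A uj5u.com user replied:
Here is another solution, which finds every occurrence if the given token (e.g. "age") appears more than once. It also still returns a match when there are fewer than 5 tokens before or after the token because the string doesn't contain that many. It isn't clear from your question what should happen in that case, and the other answers exclude it, so I thought I'd provide a solution that covers it just in case.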
# one age
text1 = "provide written informed consent healthy male or female age between 31 to 59 years fluent in german language"
# two age
text2 = "provide written informed consent another age appeared in healthy male or female age between 31 to 59 years fluent in german language"
# age at beginning
text3 = "age at beginning provide written informed consent healthy male or female age between 31 to 59 years fluent in german language"
# age at end
text4 = "provide written informed consent healthy male or female age between 31 to 59 years fluent in german language age at the end"
def find_token(text, sep, search_token, l_from_token, r_from_token):
    """
    Find all occurrences of a token in a string with specified amount of tokens before and after token.
    :param text: text to search for given token
    :param sep: separator
    :param search_token: token to search for
    :param l_from_token: how many tokens left from token to include in result
    :param r_from_token: how many tokens right from token to include in result
    :return: all occurrences of token in text with specified amount of tokens before and after token
    """
    tokens = text.split(sep)
    matches = []
    for i, token in enumerate(tokens):
        # Only exact matches count, so "management" or "agency" are skipped
        if token != search_token:
            continue
        # Clamp the window to the bounds of the token list
        start = max(i - l_from_token, 0)
        end = min(i + r_from_token + 1, len(tokens))
        matches.append(sep.join(tokens[start:end]))
    return matches
search_for = "age"
separator = " "
left_from_token = 4
right_from_token = 5
print(find_token(text1, separator, search_for, left_from_token, right_from_token))
print(find_token(text2, separator, search_for, left_from_token, right_from_token))
print(find_token(text3, separator, search_for, left_from_token, right_from_token))
print(find_token(text4, separator, search_for, left_from_token, right_from_token))
Expected output:
['healthy male or female age between 31 to 59 years']
['written informed consent another age appeared in healthy male or', 'healthy male or female age between 31 to 59 years']
['age at beginning provide written informed', 'healthy male or female age between 31 to 59 years']
['healthy male or female age between 31 to 59 years', 'fluent in german language age at the end']
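Since the question started from a regular expression, here is a sketch of a regex variant as well. It uses `\b` word boundaries around "age" so that words like "management" or "agency" don't match, and it allows up to 4 whitespace-separated tokens before and up to 5 after. The pattern and the name `find_age_context` are my own assumptions, not taken from the question or the answers above.

```python
import re

# Up to 4 tokens before "age", the whole word "age", then up to 5 tokens after it.
AGE_CONTEXT = re.compile(r"(?:\S+\s+){0,4}\bage\b(?:\s+\S+){0,5}")


def find_age_context(text):
    """Return every non-overlapping match of "age" together with its surrounding tokens."""
    # The pattern has no capturing groups, so findall returns the full matches.
    return AGE_CONTEXT.findall(text)


text1 = "provide written informed consent healthy male or female age between 31 to 59 years fluent in german language"
print(find_age_context(text1))  # ['healthy male or female age between 31 to 59 years']
print(find_age_context("management of the agency"))  # []
```

One difference from the token-based function: `re.findall` returns non-overlapping matches, so when two occurrences of "age" sit close together, the second match cannot reuse words already consumed by the first.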
