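I want to find a substring such as (??? ??? ????? ??? ???? ???) in a paragraph, but the paragraph lines and the substring lines are not exactly identical. So if more than two words of a paragraph line match, that line should be treated as a matching line.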
fullstringlist = " ???? ?? ??? ??? ??? ????? ?? ??- ???? ?? ?? ??????? ??? ????? ?? ??- ??? ??? ??? ????? ???? ??? "
test_list = fullstringlist.split("-")
print("The original list is : " + str(test_list))
subs_list = ['???? ??? ??? ??? ????? ?? ??', '??? ??? ????? ??? ???? ???']
res = []
for sub in test_list:
    flag = 0
    for ele in subs_list:
        # checking for non existence of
        # any string
        if ele not in sub:
            flag = 1
            break
    if flag == 0:
        res.append(sub)
# printing result
print("The extracted values : " + str(res))
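Answer from a uj5u.com user:
You can implement this with a threshold variable, set to half the number of words in each substring, plus one.
Example:
???? ??? ??? ??? ????? ?? ?? contains 7 words, so its threshold is 5 words (round(7 / 2) + 1). If we find 5 or more matching words, we treat it as a matching substring.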
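The threshold arithmetic can be checked on its own. The sketch below is only an illustration: the helper name `threshold_for` is hypothetical and not part of the answer. Note that Python 3's built-in `round` rounds exact halves to the nearest even integer, so a 5-word substring gets `round(2.5) + 1 = 3`, not 4.

```python
def threshold_for(n_words):
    # Half the word count, rounded by Python's built-in round(), plus one.
    # round() sends exact halves to the nearest even integer (2.5 -> 2, 3.5 -> 4).
    return round(n_words / 2) + 1


for n in (4, 5, 6, 7):
    print(n, "words -> threshold", threshold_for(n))
```

If that rounding is not what you want for odd word counts, `math.ceil(n_words / 2) + 1` is one alternative.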
fullstringlist = " ???? ?? ??? ??? ??? ????? ?? ??- ???? ?? ?? ??????? ??? ????? ?? ??- ??? ??? ??? ????? ???? ??? "
subs_list = ['???? ??? ??? ??? ????? ?? ??', '??? ??? ????? ??? ???? ???', '????? ???? ??? ????? ??? ?????????? ']

def find_matches(full_str, sub_list):
    matches = []
    for sub in sub_list:  # renamed from `str` to avoid shadowing the built-in
        words = sub.split()
        # threshold: half the number of words, plus one
        threshold = round(len(words) / 2) + 1
        n_words = 0
        for word in words:
            # count every word of the substring that occurs in the full text
            if word in full_str:
                n_words += 1
        if n_words >= threshold:
            matches.append(sub)
    return matches

print(find_matches(fullstringlist, subs_list))
Output:
['???? ??? ??? ??? ????? ?? ??', '??? ??? ????? ??? ???? ???']
Note: you can change how the threshold is calculated to suit your needs.
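As a variation on the approach above, here is a minimal sketch that returns the matching paragraph lines rather than the substrings, which is closer to what the question asks for. It is not the answerer's code: the names `matching_segments`, `default_threshold` and `threshold_fn` are hypothetical, the sample data is ASCII text standing in for the original strings, and words are compared as whole words per segment instead of with a substring search over the whole paragraph.

```python
def default_threshold(n_words):
    # Same rule as the answer: half the word count (rounded) plus one.
    return round(n_words / 2) + 1


def matching_segments(paragraph, subs, sep="-", threshold_fn=default_threshold):
    """Return the segments of `paragraph` that share enough words with any entry of `subs`."""
    matched = []
    for segment in paragraph.split(sep):
        segment_words = set(segment.split())
        for sub in subs:
            words = sub.split()
            if not words:
                continue  # ignore empty substrings
            # Count the substring's words that appear as whole words in this segment.
            hits = sum(1 for word in words if word in segment_words)
            if hits >= threshold_fn(len(words)):
                matched.append(segment.strip())
                break  # one matching substring is enough for this segment
    return matched


# Hypothetical ASCII data standing in for the original text.
paragraph = "the quick brown fox jumps over dogs - a slow green turtle - red blue green yellow"
subs = ["the quick red fox jumps over cats", "blue green yellow purple"]
print(matching_segments(paragraph, subs))
# ['the quick brown fox jumps over dogs', 'red blue green yellow']
```

Passing a different `threshold_fn`, for example `lambda n: 2`, is one way to express the "two or more matching words" rule from the question.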
