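I'm trying to define a score as the intersection over the union of the words for every pair of the given lists. I know that union and intersection only work on set-type containers. I've been trying to set this up correctly but can't get it right. Can anyone help?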
corpus = [
    ["i", "did", "not", "like", "the", "service"],
    ["the", "service", "was", "ok"],
    ["i", "was", "ignored", "when", "i", "asked", "for", "service"],
]
tags = ["a", "b", "c"]
dct_keys = {
    "a": 1,
    "b": 2,
    "c": 3,
}
corpus_tags = dict(zip(tags,corpus))
from itertools import combinations
my_keys = list(combinations(tags, 2))
goal_dct = {}
for i in range(len(my_keys)):
    goal_dct[my_keys[i]] = {
        "id_alpha": dct_keys[my_keys[i][0]],
        "id_beta": dct_keys[my_keys[i][1]],
        "score": len(set1 & set3) / len(set1 | set3),  # THIS IS WHAT I WAS TRYING TO ACHIEVE HERE
    }
print(goal_dct)
This is what I'm trying to define as the score, using these sets as an example:
set1 = {"i","did","not","like","the","service"}
set2 = {"the","service","was","ok"}
set3 = {"i","was","ignored","when","i","asked","for","service"}
(len(set1&set3))/(len(set1|set3))
Answer from a uj5u.com user:
This doesn't do what you think it does:
(len(set1)&len(set3))/(len(set1)|len(set3))
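len returns an int. You can use the & and | operators on integers, but they perform bitwise operations, which is not what you're looking for. Instead, apply these operators to the sets themselves and then take the len of the resulting sets: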
len(set1 & set3)/len(set1 | set3)
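To see the difference concretely, here is a small sketch that evaluates both expressions on set1 and set3 from the question. The variable names bitwise and jaccard are only illustrative; intersection over union is often called the Jaccard similarity.

```python
set1 = {"i", "did", "not", "like", "the", "service"}
set3 = {"i", "was", "ignored", "when", "i", "asked", "for", "service"}

# Bitwise AND / OR applied to the two lengths (6 and 7), not to the sets.
bitwise = (len(set1) & len(set3)) / (len(set1) | len(set3))

# Set intersection / union first, then the lengths of the results.
jaccard = len(set1 & set3) / len(set1 | set3)

print(bitwise)  # 6 & 7 is 6 and 6 | 7 is 7, so this is 6/7
print(jaccard)  # 2 shared words out of 11 distinct words, so this is 2/11
```

The first value has nothing to do with which words the sets share; it depends only on the binary representation of the two sizes.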
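So a function that produces the score for any two lists of strings (sentences) looks like this:
def score(s1: list[str], s2: list[str]) -> float:
    set1, set2 = set(s1), set(s2)
    return len(set1 & set2) / len(set1 | set2)
You can use it to build the scores for all combinations in corpus: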
from itertools import combinations
from string import ascii_lowercase
corpus = [
    ["i", "did", "not", "like", "the", "service"],
    ["the", "service", "was", "ok"],
    ["i", "was", "ignored", "when", "i", "asked", "for", "service"],
]
tagged_corpus = dict(zip(ascii_lowercase, corpus))
def score(s1: list[str], s2: list[str]) -> float:
    set1, set2 = set(s1), set(s2)
    return len(set1 & set2) / len(set1 | set2)
goal = {
    (a, b): score(tagged_corpus[a], tagged_corpus[b])
    for a, b in combinations(tagged_corpus, 2)
}
print(goal)
# {('a', 'b'): 0.25,
#  ('a', 'c'): 0.18181818181818182,
#  ('b', 'c'): 0.2222222222222222}
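If you need the nested layout from the question (id_alpha, id_beta and a score per pair), the same function can be dropped into a dict comprehension. This is a sketch that reuses the question's tags, dct_keys and corpus_tags names, with the score key spelled "score"; type hints are omitted only to keep it short.

```python
from itertools import combinations

corpus = [
    ["i", "did", "not", "like", "the", "service"],
    ["the", "service", "was", "ok"],
    ["i", "was", "ignored", "when", "i", "asked", "for", "service"],
]
tags = ["a", "b", "c"]
dct_keys = {"a": 1, "b": 2, "c": 3}
corpus_tags = dict(zip(tags, corpus))


def score(s1, s2):
    # Convert both word lists to sets, then divide intersection size by union size.
    set1, set2 = set(s1), set(s2)
    return len(set1 & set2) / len(set1 | set2)


# One entry per unordered pair of tags, in the order combinations() yields them.
goal_dct = {
    (a, b): {
        "id_alpha": dct_keys[a],
        "id_beta": dct_keys[b],
        "score": score(corpus_tags[a], corpus_tags[b]),
    }
    for a, b in combinations(tags, 2)
}

print(goal_dct[("a", "b")])  # {'id_alpha': 1, 'id_beta': 2, 'score': 0.25}
```

Iterating over combinations(tags, 2) directly gives you the pair as (a, b), so there is no need to index into my_keys by position.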
Answer from a uj5u.com user:
Make sets from your lists.
set1 = set(some_list)
set2 = set(other_list)
common_items = set1.intersection(set2)
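Carrying that through to the score from the question, a minimal sketch could look like the following. The function name jaccard and the choice to return 0.0 for two empty lists are assumptions made for illustration, not something the question specifies.

```python
def jaccard(some_list, other_list):
    set1 = set(some_list)
    # The method forms accept any iterable, so other_list needs no explicit set() call.
    common_items = set1.intersection(other_list)
    all_items = set1.union(other_list)
    # Assumption: two empty inputs score 0.0 instead of raising ZeroDivisionError.
    if not all_items:
        return 0.0
    return len(common_items) / len(all_items)


print(jaccard(["the", "service", "was", "ok"],
              ["i", "did", "not", "like", "the", "service"]))  # 0.25
```

The .intersection() and .union() methods give the same results as & and | here; the operators require both operands to be sets, while the methods are more lenient about the argument type.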
