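I have a dictionary whose keys are itemsets and whose values are counts. I want to count how many times each itemset appears in a dataframe (as an exact match). The dataframe has ~10k rows.
First, the dictionary of itemsets (dict_of_items):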
{'apple','banana','pear'}: 0,
{'banana', 'orange', 'squash'}: 0
Second, the dataframe of itemsets (df):
Index | basket
1 | ['apple','banana','pear']
2 | ['banana']
3 | ['banana', 'orange','squash']
4 | ['apple','banana','pear']
...
Desired output (where the dictionary values are the actual counts):
{'apple','banana','pear'}: 2,
{'banana', 'orange', 'squash'}: 1
I have already tried .iterrows(), but the values stay at 0. For example:
for item in dict_of_items:
    if item in df['basket']:
        dict_of_items[item] = 1
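Answer from a uj5u.com user:
Problems with the posted attempt:
- A dictionary cannot have sets as keys, because sets are not hashable (use frozenset).
- if item in df['basket']: does not work, because the basket column contains lists and item is a set.
Code: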
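The two problems above can be reproduced in isolation. The following is a minimal sketch, not part of the original answer; the variable names (error_message, counts, same_key, found_by_value, found_by_label) are made up for illustration. It also shows a further detail worth knowing: the in operator on a pandas Series tests the index labels, not the stored values, so the membership test in the question could not have matched a basket even with a hashable key.

```python
import pandas as pd

# Problem 1: a plain set is unhashable, so using it as a dict key fails.
try:
    bad = {{'apple', 'banana', 'pear'}: 0}
except TypeError as exc:
    error_message = str(exc)  # mentions "unhashable type"

# A frozenset is hashable, and equality ignores element order.
counts = {frozenset(['apple', 'banana', 'pear']): 0}
same_key = frozenset(['pear', 'apple', 'banana'])

# Problem 2: `in` on a Series looks at the index labels (0, 1 here),
# not at the lists stored in the column.
s = pd.Series([['banana'], ['apple', 'banana', 'pear']])
found_by_value = frozenset(['banana']) in s  # compared against the index
found_by_label = 0 in s                      # 0 is an index label

print(error_message, same_key in counts, found_by_value, found_by_label)
```

This is why the approach below converts every basket to a frozenset first and compares whole sets rather than relying on in.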
import pandas as pd
from collections import Counter
# Initialization
dict_of_item = {
    frozenset({'apple','banana','pear'}): 0,
    frozenset({'banana', 'orange', 'squash'}): 0}
data = {'basket': [['apple','banana', 'pear'],
                   ['banana'],
                   ['banana', 'orange','squash'],
                   ['apple','banana', 'pear']]}
df = pd.DataFrame(data)
# Processing
# Convert each basket list to a frozenset, then count how many times
# each distinct frozenset appears in the basket column.
basket_set_count = Counter(df['basket'].apply(frozenset))
# Keep only the keys present in both basket_set_count and dict_of_item,
# taking the count from basket_set_count as the value.
result = {k:basket_set_count[k] for k in set(basket_set_count.keys()) & set(dict_of_item.keys())}
print(result)
# Output (key order may vary):
# {frozenset({'pear', 'banana', 'apple'}): 2,
#  frozenset({'orange', 'squash', 'banana'}): 1}
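One thing to be aware of: because the answer intersects the two key sets, any itemset from the dictionary that never occurs in the dataframe is dropped from the result instead of being reported with a count of 0. If the zero entries should be kept, a possible variant (a sketch, not the original answer's code) is to iterate over the dictionary keys and rely on Counter returning 0 for missing keys. Here the plain list baskets stands in for df['basket'], and the 'kiwi' itemset is a hypothetical entry added only to show the zero case.

```python
from collections import Counter

# Stand-in for df['basket']; with pandas this could be df['basket'].tolist().
baskets = [['apple', 'banana', 'pear'],
           ['banana'],
           ['banana', 'orange', 'squash'],
           ['apple', 'banana', 'pear']]

dict_of_item = {
    frozenset({'apple', 'banana', 'pear'}): 0,
    frozenset({'banana', 'orange', 'squash'}): 0,
    frozenset({'kiwi'}): 0,  # hypothetical itemset that never occurs
}

# Count exact matches: each basket becomes one frozenset.
basket_set_count = Counter(frozenset(basket) for basket in baskets)

# Counter returns 0 for keys it has not seen, so every requested
# itemset is kept in the result, including those with no match.
result = {itemset: basket_set_count[itemset] for itemset in dict_of_item}

for itemset, count in result.items():
    print(sorted(itemset), count)
```

Note that converting a basket to a frozenset discards duplicates, so a basket such as ['banana', 'banana'] would count as a match for {'banana'}.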
