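I'm trying to iterate over the lists stored in a pandas DataFrame column and return, in a new DataFrame, the keywords that also appear in the lists of other rows three or more times.
Here is what the data looks like:

Desired output:

(These keywords are in the output because each one was found in the lists of at least three other rows.)
Minimal reproducible example: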
import pandas as pd
# initialize data of lists.
data = {'url': ["www.bbc.co.uk", "www.cabinzero.com", "www.cntraveller.com", "www.forbes.com", "www.gov.scot", "www.gov.uk", "www.ons.gov.uk"],
'keyword': ["['amber travel list', 'travel amber list', 'amber list countries uk travel', 'travel amber list countries', 'amber list countries travel']", "['amber list countries uk travel', 'travel amber list countries', 'amber travel list', 'travel amber list', 'amber list countries travel']", "['travel amber list', 'amber list countries uk travel', 'amber travel list', 'amber list countries travel', 'travel amber list countries']", "['amber travel list', 'travel amber list countries', 'travel amber list', 'amber list countries travel', 'amber list countries uk travel']", "['amber list countries travel', 'travel amber list countries', 'amber list countries uk travel', 'travel amber list', 'amber travel list']", "['amber list countries travel', 'amber list countries uk travel', 'amber travel list']", "['amber list countries uk travel', 'amber travel list', 'travel amber list countries', 'amber list countries travel']"]}
# Create DataFrame
df = pd.DataFrame(data)
# Print the output.
print(df)
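I have tried dumping the list column into a single list and iterating over it to count occurrences, but I couldn't get it to work, and I'm not sure this is the best approach.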
Answer from a uj5u.com user:
If every keyword is unique within its own list, you can do this:
import ast
from itertools import chain
import numpy as np
# Parse each string such as "['a', 'b']" into a real list; ast.literal_eval is safer than eval.
listed_keywords = df.keyword.apply(ast.literal_eval).values  # array of lists
all_keywords = list(chain.from_iterable(listed_keywords))  # concatenate all the lists into one global list of keywords
unique_keyword, nunique_keyword = np.unique(all_keywords, return_counts=True)  # unique keywords and how often each occurs overall
df_keywords = pd.DataFrame(dict(keyword=unique_keyword, frequency=nunique_keyword))  # a DataFrame that is easy to filter by keyword frequency
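A pandas-only variant of the same idea, as a rough sketch rather than a definitive solution: it parses each string with `ast.literal_eval`, uses `Series.explode` to get one keyword per line, drops repeats within a row (so the uniqueness assumption above is no longer needed), and keeps the keywords found in at least `min_rows` rows. The function name `frequent_keywords`, the `min_rows` parameter, its default of 4 (the row itself plus at least three others, which is only my reading of the question), and the sample data are all assumptions made for illustration.

```python
import ast

import pandas as pd


def frequent_keywords(df, min_rows=4):
    """Return keywords that occur in at least `min_rows` distinct rows of df['keyword'].

    Hypothetical helper: the name and the default threshold are assumptions.
    """
    # Turn each string such as "['a', 'b']" into a real Python list.
    parsed = df["keyword"].apply(ast.literal_eval)
    # One line per (original row, keyword) pair.
    exploded = parsed.explode().reset_index()
    exploded.columns = ["row", "keyword"]
    # Count a keyword only once per row, even if the row's list repeats it.
    exploded = exploded.drop_duplicates()
    # Number of rows in which each keyword appears, most frequent first.
    counts = exploded["keyword"].value_counts()
    # Keep only the keywords that reach the threshold.
    kept = counts[counts >= min_rows]
    return kept.rename_axis("keyword").reset_index(name="frequency")


# Made-up sample data, in the same string-encoded format as the question.
sample = pd.DataFrame({
    "url": ["a.example", "b.example", "c.example", "d.example"],
    "keyword": ["['x', 'y']", "['x', 'y', 'x']", "['x']", "['x', 'z']"],
})
print(frequent_keywords(sample))
```

Calling `frequent_keywords(df)` on the DataFrame from the question should work the same way. Lower `min_rows` if "three times" is meant to include the row itself.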
Hope this answers your question!
