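I am trying to count how many words (strings) in one pandas DataFrame column are found, in any order, in another column.
I have tried the following, which is close, but it does not count occurrences (it only tells me whether a match was found):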
words='|'.join(df['Cluster Name'].unique())
df['frequency']=df['Keyword'].str.contains(words).astype(int)
Minimal reproducible example:
import pandas as pd

data = {'Keyword': ['Nike', 'Nike Socks', 'Nike Stripy Socks', 'Socks Nike', 'Adidas Socks'],
        'Cluster': ['Nike Socks', 'Nike Socks', 'Nike Socks', 'Nike Socks', 'Nike Socks']}
# Create DataFrame
df = pd.DataFrame(data)
Expected output:
             Keyword     Cluster  Frequency
0               Nike  Nike Socks          1
1         Nike Socks  Nike Socks          2
2  Nike Stripy Socks  Nike Socks          2
3         Socks Nike  Nike Socks          2
4       Adidas Socks  Nike Socks          1
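To illustrate why the attempt in the question falls short, here is a minimal sketch. It assumes the `Cluster` column from the example data stands in for the `Cluster Name` column used in the question's snippet. `str.contains` returns a boolean per row, so casting to int can only ever give 0 or 1, and because the pattern is the whole phrase, it only matches when the words appear adjacent and in the same order.

```python
import pandas as pd

df = pd.DataFrame({
    'Keyword': ['Nike', 'Nike Socks', 'Nike Stripy Socks', 'Socks Nike', 'Adidas Socks'],
    'Cluster': ['Nike Socks'] * 5,
})

# Same pattern construction as in the question; here it is just 'Nike Socks'.
words = '|'.join(df['Cluster'].unique())

# str.contains yields True/False per row, so the result is a 0/1 flag, not a count.
contains_flag = df['Keyword'].str.contains(words).astype(int).tolist()
print(contains_flag)
```

Only the row whose keyword contains the exact phrase is flagged, which is why a per-word comparison is needed.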
A uj5u.com user replied:
You can create a custom function that takes a row as input, then apply it to the DataFrame row by row using apply with the argument axis=1:
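def count_keywords(row):
    freq = 0
    # Check each word of the keyword against the cluster string.
    for word in row['Keyword'].split(" "):
        # Note: this is a substring test on the whole Cluster string.
        if word in row['Cluster']:
            freq += 1  # increment, so that matches accumulate
    return freq

df['Frequency'] = df.apply(count_keywords, axis=1)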
Output:
>>> df
             Keyword     Cluster  Frequency
0               Nike  Nike Socks          1
1         Nike Socks  Nike Socks          2
2  Nike Stripy Socks  Nike Socks          2
3         Socks Nike  Nike Socks          2
4       Adidas Socks  Nike Socks          1
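One caveat with this approach: `word in row['Cluster']` is a substring test, so a partial word such as "Sock" would still count against "Nike Socks". The sketch below contrasts the two behaviours; the function names `count_substring` and `count_whole_words` are hypothetical, introduced only for illustration.

```python
def count_substring(keyword, cluster):
    """Count keyword words that occur anywhere inside the cluster string."""
    return sum(word in cluster for word in keyword.split())


def count_whole_words(keyword, cluster):
    """Count keyword words that equal one of the cluster's space-separated words."""
    cluster_words = set(cluster.split())
    return sum(word in cluster_words for word in keyword.split())


# Partial words match under the substring test but not under the whole-word test.
print(count_substring('Sock Nik', 'Nike Socks'))
print(count_whole_words('Sock Nik', 'Nike Socks'))
```

Which behaviour is right depends on the data; for the example in the question both give the same counts.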
A uj5u.com user replied:
My answer is similar to @Derek's, but it also works when the words in the Cluster column are not separated only by spaces:
from re import escape, findall
import pandas as pd

def count_corresponding(row):
    # Split the keyword into words and count every match of each word in Cluster.
    keywords = row.Keyword.split(' ')
    # escape() makes each word match literally instead of as a regex pattern.
    return sum(len(findall(escape(keyword), row.Cluster)) for keyword in keywords)

data = {'Keyword': ['Nike', 'Nike Socks', 'Nike Stripy Socks', 'Socks Nike', 'Adidas Socks'],
        'Cluster': ['Nike Socks', 'Nike Socks', 'Nike Socks', 'Nike Socks', 'Nike Socks']}
df = pd.DataFrame(data)
df['Frequency'] = df.apply(count_corresponding, axis=1)
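Because `findall` counts every match, a word that appears twice in the cluster is counted twice, and a partial word still matches. If whole-word matching is wanted, one possible variant wraps each escaped word in `\b` word boundaries. This is a sketch rather than part of the original answer; `count_regex_matches` is a hypothetical name, and `\b` assumes the words start and end with alphanumeric characters.

```python
import re


def count_regex_matches(keyword, cluster):
    """Count whole-word regex matches of each keyword word inside cluster."""
    total = 0
    for word in keyword.split():
        # re.escape treats the word literally; \b anchors it at word boundaries.
        pattern = r'\b' + re.escape(word) + r'\b'
        total += len(re.findall(pattern, cluster))
    return total


# Works when the cluster words are separated by something other than spaces.
print(count_regex_matches('Nike Socks', 'Nike-Socks'))
# Repeated occurrences in the cluster are each counted.
print(count_regex_matches('Nike', 'Nike,Socks,Nike'))
```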
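A uj5u.com user replied:
We can explode the keywords into one word per row, check whether each word occurs in the cluster, and then sum the results back per original row: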
x = df.assign(Keyword = df.Keyword.str.split(' ')).explode('Keyword')
df['freq'] = x.apply(lambda y : y['Keyword'] in y['Cluster'],axis=1).groupby(level=0).sum()
df
             Keyword     Cluster  freq
0               Nike  Nike Socks     1
1         Nike Socks  Nike Socks     2
2  Nike Stripy Socks  Nike Socks     2
3         Socks Nike  Nike Socks     2
4       Adidas Socks  Nike Socks     1
A uj5u.com user replied:
You have to use apply anyway, so you can use a set intersection directly:
df['Frequency'] = df.apply(lambda x: len(set(x['Keyword'].split()).intersection(x['Cluster'].split())), axis=1)
Output:
             Keyword     Cluster  Frequency
0               Nike  Nike Socks          1
1         Nike Socks  Nike Socks          2
2  Nike Stripy Socks  Nike Socks          2
3         Socks Nike  Nike Socks          2
4       Adidas Socks  Nike Socks          1
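As a side note, apply is not strictly required here: the same set intersection can be written as a plain list comprehension over the two columns, which is a common alternative for row-wise string work. The sketch below assumes the example data from the question. Note also that a set intersection counts distinct shared words, so a word repeated in the keyword is counted once.

```python
import pandas as pd

df = pd.DataFrame({
    'Keyword': ['Nike', 'Nike Socks', 'Nike Stripy Socks', 'Socks Nike', 'Adidas Socks'],
    'Cluster': ['Nike Socks'] * 5,
})

# Pair up the two columns and intersect their word sets row by row.
df['Frequency'] = [
    len(set(keyword.split()) & set(cluster.split()))
    for keyword, cluster in zip(df['Keyword'], df['Cluster'])
]
print(df)

# Duplicates collapse in a set: 'Nike Nike' shares only one distinct word.
repeated = len(set('Nike Nike'.split()) & set('Nike Socks'.split()))
```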
