I have a DataFrame idf, shown below, and another DataFrame df.
idf
Output-
      feature_name  idf_weights
2488  kralendijk    11.221923
3059  night         0
1383  ebebf         0
df
Output-
   message                 Number of Words in each message
0  night kralendijk ebebf  3
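For each word in the messages in df, I want to look up its idf weight in idf, and add a new column to df with the number of words whose weight is greater than 0.
The output would look like this -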
df
Output-
   message                 Number of Words in each message  Number of words with idf_score>0
0  night kralendijk ebebf  3                                1
I tried counting with the code below, but it doesn't work: it returns the total number of words rather than the number of words with idf_weight > 0.
Code-
words_weights = dict(idf[['feature_name', 'idf_weights']].values)
df['> zero'] = df['message'].apply(lambda x: count([words_weights.get(word, 11.221923) for word in x.split()]))
Output-
   message                 Number of Words in each message  Number of words with idf_score>0
0  night kralendijk ebebf  3                                3
Thank you.
A uj5u.com user replied:
Try using a list comprehension:
# set up a dictionary for easy feature->weight indexing
d = idf.set_index('feature_name')['idf_weights'].to_dict()
# {'kralendijk': 11.221923, 'night': 0.0, 'ebebf': 0.0}
df['> zero'] = [sum(d.get(w, 0)>0 for w in x.split()) for x in df['message']]
## OR, slightly faster alternative
# df['> zero'] = [sum(1 for w in x.split() if d.get(w, 0)>0) for x in df['message']]
Output:
   message                 Number of Words in each message  > zero
0  night kralendijk ebebf  3                                1
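For anyone who wants to run this answer end to end, here is a minimal self-contained sketch. The two DataFrames are rebuilt from the values shown in the question, and the helper name count_positive is only illustrative, it does not appear in the original answer.

```python
import pandas as pd

# Sample data rebuilt from the question (index labels copied from the post).
idf = pd.DataFrame(
    {
        "feature_name": ["kralendijk", "night", "ebebf"],
        "idf_weights": [11.221923, 0.0, 0.0],
    },
    index=[2488, 3059, 1383],
)
df = pd.DataFrame(
    {
        "message": ["night kralendijk ebebf"],
        "Number of Words in each message": [3],
    }
)

# Dictionary for fast feature -> weight lookup.
weights = idf.set_index("feature_name")["idf_weights"].to_dict()


def count_positive(message: str) -> int:
    """Count the words in `message` whose idf weight is strictly positive."""
    # Words missing from the lookup default to 0, so they are not counted.
    return sum(weights.get(word, 0) > 0 for word in message.split())


df["> zero"] = df["message"].apply(count_positive)
print(df)
```

This also suggests why the attempt in the question returned 3: the list of looked-up weights has one element per word, so counting its elements gives the word count regardless of the values. In addition, the default of 11.221923 in words_weights.get would treat any word missing from idf as having a positive weight, whereas a default of 0 leaves it out of the count.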
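A uj5u.com user replied:
You can use str.findall. The goal is to build a regex pattern from the feature names whose weight is greater than 0, then find every occurrence of them in each message.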
pattern = fr"({'|'.join(idf.loc[idf['idf_weights'] > 0, 'feature_name'])})"
df['Number of words with idf_score>0'] = df['message'].str.findall(pattern).str.len()
print(df)
# Output
   message                 Number of Words in each message  Number of words with idf_score>0
0  night kralendijk ebebf  3                                1
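One caveat with a plain alternation pattern: it matches substrings, so a feature name contained in a longer word would also be counted, and feature names containing regex metacharacters would be interpreted as regex syntax. The sketch below is a possible variant, not part of the original answer, that adds word boundaries and escapes each name. The second message ("kralendijks night") is made up purely to show the difference.

```python
import re

import pandas as pd

idf = pd.DataFrame(
    {
        "feature_name": ["kralendijk", "night", "ebebf"],
        "idf_weights": [11.221923, 0.0, 0.0],
    }
)
# The second row is hypothetical: "kralendijks" contains a feature name as a substring.
df = pd.DataFrame({"message": ["night kralendijk ebebf", "kralendijks night"]})

# Feature names with a strictly positive weight.
positive = idf.loc[idf["idf_weights"] > 0, "feature_name"]

# \b ... \b restricts matches to whole words; re.escape keeps names literal.
pattern = r"\b(?:" + "|".join(map(re.escape, positive)) + r")\b"

df["Number of words with idf_score>0"] = (
    df["message"].str.findall(pattern).str.len()
)
print(df)
```

With the word boundaries in place, the second message counts 0 rather than 1. If no feature has a positive weight, the alternation is empty and the pattern would match empty strings, so that case probably needs a separate guard.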
When reposting, please credit the source. Link to this article: https://www.uj5u.com/gongcheng/453418.html
