I'm trying to match two columns in two different dataframes using the following:
res = mergedStuff = pd.merge(df1, df2, on=['text'])
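But for some reason it also returns rows that don't match.
Ideally I'd get back two new dataframes, each containing only the rows that match.
An example looks like this: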
df1 = text | feature1 | feature2 | feature3
bananas are great | 0 | 1 | 0
apples are better | 1 | 1 | 0
grapes are okay | 0 | 0 | 1
ice cream for the win | 1 | 0 | 1
df2 = text | feature1 | feature2 | feature3
bananas are great | 0 | 1 | 0
apples are better | 1 | 1 | 0
berries are yummy | 0 | 0 | 1
ice cream for the win | 0 | 1 | 1
Ideally I'd now get back each dataframe, but only with the rows that match on the text column.
Expected result:
df1 = text | feature1 | feature2 | feature3
bananas are great | 0 | 1 | 0
apples are better | 1 | 1 | 0
ice cream for the win | 1 | 0 | 1
df2 = text | feature1 | feature2 | feature3
bananas are great | 0 | 1 | 0
apples are better | 1 | 1 | 0
ice cream for the win | 0 | 1 | 1
A uj5u.com user replied:
You can use a set intersection to find the text values common to both frames, then filter each frame on them:
common = set(df1['text']) & set(df2['text'])
df1 = df1[df1['text'].isin(common)]
df2 = df2[df2['text'].isin(common)]
Then df1 looks like:
text feature1 feature2 feature3
0 bananas are great 0 1 0
1 apples are better 1 1 0
3 ice cream for the win 1 0 1
and df2 looks like:
text feature1 feature2 feature3
0 bananas are great 0 1 0
1 apples are better 1 1 0
3 ice cream for the win 0 1 1
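For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the two sample frames from the question and applies the same filter. The names df1_matched, df2_matched and merged, and the suffixes _df1 and _df2, are my own choices for illustration and are not from the original post.

```python
import pandas as pd

# Sample frames rebuilt from the tables in the question.
df1 = pd.DataFrame({
    "text": ["bananas are great", "apples are better",
             "grapes are okay", "ice cream for the win"],
    "feature1": [0, 1, 0, 1],
    "feature2": [1, 1, 0, 0],
    "feature3": [0, 0, 1, 1],
})
df2 = pd.DataFrame({
    "text": ["bananas are great", "apples are better",
             "berries are yummy", "ice cream for the win"],
    "feature1": [0, 1, 0, 0],
    "feature2": [1, 1, 0, 1],
    "feature3": [0, 0, 1, 1],
})

# Text values that appear in both frames.
common = set(df1["text"]) & set(df2["text"])

# Keep only the rows whose text is shared. New names are used here
# (an assumption for illustration) so the inputs are not overwritten.
df1_matched = df1[df1["text"].isin(common)]
df2_matched = df2[df2["text"].isin(common)]

# For comparison: an inner merge on text gives ONE combined frame with
# suffixed feature columns, not two separate frames.
merged = pd.merge(df1, df2, on="text", suffixes=("_df1", "_df2"))

print(df1_matched)
print(df2_matched)
print(merged)
```

Note that pd.merge defaults to an inner join, so on this sample data the merge in the question should also keep only the three shared texts. What it cannot do is hand back two separate frames, and the feature columns from both sides end up side by side, which may be what looked like unmatched rows. That last part is a guess, since the question does not show the actual merge output.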
Tags: Python python-3.x pandas dataframe
