Check whether two columns containing sentences match and return only the matching rows from each DataFrame

I am trying to match two columns across two different DataFrames using the following:

res = mergedStuff = pd.merge(df1, df2, on=['text'])

But for some reason it also returns rows that do not match.

Ideally, I would get back two new DataFrames, each containing only its own matching rows.

An example looks like this:

df1 = text                  | feature1 | feature2 | feature3
      bananas are great     | 0        | 1        | 0
      apples are better     | 1        | 1        | 0
      grapes are okay       | 0        | 0        | 1
      ice cream for the win | 1        | 0        | 1

df2 = text                  | feature1 | feature2 | feature3
      bananas are great     | 0        | 1        | 0
      apples are better     | 1        | 1        | 0
      berries are yummy     | 0        | 0        | 1
      ice cream for the win | 0        | 1        | 1

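Ideally, I would now get each DataFrame back, but only with the rows whose text column has a match in the other DataFrame.

Expected result: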

df1 = text                  | feature1 | feature2 | feature3
      bananas are great     | 0        | 1        | 0
      apples are better     | 1        | 1        | 0
      ice cream for the win | 1        | 0        | 1

df2 = text                  | feature1 | feature2 | feature3
      bananas are great     | 0        | 1        | 0
      apples are better     | 1        | 1        | 0
      ice cream for the win | 0        | 1        | 1
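
For comparison, here is a minimal sketch of what the pd.merge call from the question returns on the sample data above. The DataFrame construction is an assumption based on the tables shown. With the default how="inner", the result is a single combined frame, and the overlapping feature columns receive the default _x and _y suffixes rather than coming back as two separate frames.

```python
import pandas as pd

# Sample frames rebuilt from the tables in the question (assumed construction).
df1 = pd.DataFrame({
    "text": ["bananas are great", "apples are better",
             "grapes are okay", "ice cream for the win"],
    "feature1": [0, 1, 0, 1],
    "feature2": [1, 1, 0, 0],
    "feature3": [0, 0, 1, 1],
})
df2 = pd.DataFrame({
    "text": ["bananas are great", "apples are better",
             "berries are yummy", "ice cream for the win"],
    "feature1": [0, 1, 0, 0],
    "feature2": [1, 1, 0, 1],
    "feature3": [0, 0, 1, 1],
})

# how="inner" is the default, so only keys present in both frames are kept.
merged = pd.merge(df1, df2, on=["text"])

# One combined frame: df1's features end in _x, df2's features end in _y.
print(merged.columns.tolist())
print(merged["text"].tolist())
```

If rows that look unmatched still appear with real data, it may be worth inspecting the text values themselves (for example duplicates), since an inner merge only pairs rows whose keys are exactly equal.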

Answer from a uj5u.com user:

You can take the set intersection of the two text columns, then filter each DataFrame on the common texts:

common = set(df1['text']) & set(df2['text'])
df1 = df1[df1['text'].isin(common)]
df2 = df2[df2['text'].isin(common)]

df1 then looks like:

                    text  feature1  feature2  feature3 
0      bananas are great         0         1         0
1      apples are better         1         1         0
3  ice cream for the win         1         0         1

And df2 looks like:

                    text  feature1  feature2  feature3
0      bananas are great         0         1         0
1      apples are better         1         1         0
3  ice cream for the win         0         1         1
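
The filtering approach above can be tried end to end with a self-contained sketch. The DataFrame construction and the names df1_matched and df2_matched are assumptions for illustration; the answer itself reassigns df1 and df2 in place.

```python
import pandas as pd

# Sample frames rebuilt from the tables in the question (assumed construction).
df1 = pd.DataFrame({
    "text": ["bananas are great", "apples are better",
             "grapes are okay", "ice cream for the win"],
    "feature1": [0, 1, 0, 1],
    "feature2": [1, 1, 0, 0],
    "feature3": [0, 0, 1, 1],
})
df2 = pd.DataFrame({
    "text": ["bananas are great", "apples are better",
             "berries are yummy", "ice cream for the win"],
    "feature1": [0, 1, 0, 0],
    "feature2": [1, 1, 0, 1],
    "feature3": [0, 0, 1, 1],
})

# Texts that occur in both frames.
common = set(df1["text"]) & set(df2["text"])

# Boolean-mask each frame separately so each keeps its own feature values.
df1_matched = df1[df1["text"].isin(common)]
df2_matched = df2[df2["text"].isin(common)]

print(df1_matched)
print(df2_matched)
```

The filtered frames keep their original index labels (0, 1, 3 here). If a fresh 0..n-1 index is preferred, calling reset_index(drop=True) on each result is one option.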


