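I have two DataFrames: one has a column whose cells are lists of values, and the other holds some individual values.
I want to filter the main DataFrame, keeping a row if any value from the second DataFrame appears in that row's list.
Code: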
import pandas as pd
A = pd.DataFrame({'index':[0,1,2,3,4], 'vals':[[1,2],[5,4],[7,1,26],['-'],[9,8,5]]})
B = pd.DataFrame({'index':[4,7], 'val':[1,8]})
print(A)
print(B)
print(B['val'].isin(A['vals']))  # Does not work: it compares each scalar against a whole list
# Expected result:
result = pd.DataFrame({'index':[0,2,4], 'vals':[[1,2],[7,1,26],[9,8,5]]})
DataFrame A
|index|vals|
|:-|:--|
|0|[1, 2]|
|1|[5, 4]|
|2|[7, 1, 26]|
|3|[-]|
|4|[9, 8, 5]|
DataFrame B
|index|val|
|:-|:--|
|4|1|
|7|8|
Expected result
|index|vals|
|:-|:--|
|0|[1, 2]|
|2|[7, 1, 26]|
|4|[9, 8, 5]|
Reply from a uj5u.com user:
You can explode the vals column and then check which elements appear in B:
>>> A.loc[A['vals'].explode().isin(B['val']).loc[lambda x: x].index]
   index        vals
0      0      [1, 2]
2      2  [7, 1, 26]
4      4   [9, 8, 5]
Details on explode:
>>> A['vals'].explode()
0     1
0     2
1     5
1     4
2     7  # not in B -|
2     1  # in B      | -> keep index 2
2    26  # not in B -|
3     -
4     9
4     8
4     5
Name: vals, dtype: object
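One caveat with the explode approach: if a single row contains more than one value from B, its index label shows up several times in the exploded mask, and selecting with A.loc[...] would then return that row once per match. A minimal sketch that collapses the exploded mask back to one boolean per row, using the same A and B as in the question (the names `hits` and `mask` are introduced here only for illustration):

```python
import pandas as pd

A = pd.DataFrame({'index': [0, 1, 2, 3, 4],
                  'vals': [[1, 2], [5, 4], [7, 1, 26], ['-'], [9, 8, 5]]})
B = pd.DataFrame({'index': [4, 7], 'val': [1, 8]})

# One boolean per exploded element, labelled with the original row label.
hits = A['vals'].explode().isin(B['val'])

# Collapse back to one boolean per original row: True if any element matched.
mask = hits.groupby(level=0).any()

# Boolean indexing keeps each matching row exactly once.
result = A[mask]
print(result)
```

Because the mask has one entry per original row, each row is kept at most once, no matter how many of its elements are found in B.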
Reply from a uj5u.com user:
You can use:
# mask the values based on the intersection between the list in each row and B values
mask = A['vals'].apply(lambda a: len(list(set(a) & set(B['val'])))) > 0
result = A[mask]
print(result)
Output:
   index        vals
0      0      [1, 2]
2      2  [7, 1, 26]
4      4   [9, 8, 5]
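The set-intersection idea can also be written without rebuilding set(B['val']) and an intermediate list for every row. A sketch of one possible variant, assuming the values in B['val'] and in each list are hashable (`wanted` is a name introduced here for illustration, not part of the original answer):

```python
import pandas as pd

A = pd.DataFrame({'index': [0, 1, 2, 3, 4],
                  'vals': [[1, 2], [5, 4], [7, 1, 26], ['-'], [9, 8, 5]]})
B = pd.DataFrame({'index': [4, 7], 'val': [1, 8]})

# Build the lookup set once instead of once per row.
wanted = set(B['val'])

# isdisjoint returns True when the row shares no element with `wanted`,
# so negate it to keep rows that have at least one value in common.
mask = A['vals'].apply(lambda row: not wanted.isdisjoint(row))

result = A[mask]
print(result)
```

This selects the same rows as the answer above; whether it is noticeably faster depends on the size of B and the length of the lists.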
