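I want to drop rows entirely when the value in one column (user) already appears as a list element in another column (friend). What is the best way to do this? Thank you.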
user friend
0 jack [mary, jane, alex]
1 mary [kate, andrew, jensen]
2 alice [marina, catherine, howard]
3 andrew [syp, yuslina, john]
4 catherine [yute, kelvin]
5 john [beyond, holand]
Expected output
user friend
0 jack [mary, jane, alex]
2 alice [marina, catherine, howard]
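A uj5u.com user replied:
Your example looks inconsistent: either john should be kept (if the blacklist is made of all previous friends), or andrew should be removed (if the blacklist is only the previous list of friends).
Here are the different options.
Remove the users that are present in:
any of the friend lists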
S = set().union(*df['friend'])
mask = ~df['user'].isin(S)
# mask values: [True, False, True, False, False, False]
df[mask]
Output:
user friend
0 jack [mary, jane, alex]
2 alice [marina, catherine, howard]
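all previous friend lists
You can first compute the cumulative (expanding) set of friends, then check whether each user is in the set built up to that row: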
S = set()
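# The walrus operator (:=) below requires Python >= 3.8; on older versions use a plain loop.
# Each entry of `sets` holds all friends seen up to and including the current row.
sets = [(S := S.union(set(x))) for x in df['friend']]
mask = [u not in s for u, s in zip(df['user'], sets)]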
# [True, False, True, False, False, False]
out = df[mask]
Output:
user friend
0 jack [mary, jane, alex]
2 alice [marina, catherine, howard]
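For Python versions older than 3.8, the cumulative-set option above can be written with a plain loop. The following is a minimal sketch under that assumption; the variable name `seen` is introduced here for illustration and is not part of the original answer. Like the comprehension, it adds the current row's friends before testing the user, so a user who lists themselves as a friend would also be dropped.

```python
import pandas as pd

df = pd.DataFrame({
    'user': ['jack', 'mary', 'alice', 'andrew', 'catherine', 'john'],
    'friend': [['mary', 'jane', 'alex'],
               ['kate', 'andrew', 'jensen'],
               ['marina', 'catherine', 'howard'],
               ['syp', 'yuslina', 'john'],
               ['yute', 'kelvin'],
               ['beyond', 'holand']],
})

seen = set()  # hypothetical name: friends seen so far, current row included
mask = []
for user, friends in zip(df['user'], df['friend']):
    seen.update(friends)            # grow the cumulative set in place
    mask.append(user not in seen)   # keep the row only if the user is not yet listed

out = df[mask]  # boolean list of the same length as df selects the kept rows
```

With the sample data this keeps the rows for jack and alice only.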
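only the previous friend list
# apply(set) turns each list into a set; shift() pairs each user with the previous row's set.
# fill_value={} gives the first row an empty container, so the first user is always kept.
mask = [u not in s for u, s in zip(df['user'], df['friend'].apply(set).shift(fill_value={}))]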
# [True, False, True, True, True, True]
out = df[mask]
Output:
user friend
0 jack [mary, jane, alex]
2 alice [marina, catherine, howard]
3 andrew [syp, yuslina, john]
4 catherine [yute, kelvin]
5 john [beyond, holand]
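The "previous list only" option can also be sketched without `shift`, by building the shifted list of sets by hand. This is an illustrative alternative, not the original answerer's code; the names `friend_sets` and `previous` are assumptions made for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'user': ['jack', 'mary', 'alice', 'andrew', 'catherine', 'john'],
    'friend': [['mary', 'jane', 'alex'],
               ['kate', 'andrew', 'jensen'],
               ['marina', 'catherine', 'howard'],
               ['syp', 'yuslina', 'john'],
               ['yute', 'kelvin'],
               ['beyond', 'holand']],
})

friend_sets = [set(f) for f in df['friend']]
# Row i is compared against row i-1's friends; the first row gets an empty set.
previous = [set()] + friend_sets[:-1]
mask = [u not in p for u, p in zip(df['user'], previous)]

out = df[mask]
```

Here only mary is dropped, because she is the only user who appears in the friend list of the row directly above her own.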
Input used:
import pandas as pd

d = {'user': ['jack', 'mary', 'alice', 'andrew', 'catherine', 'john'],
     'friend': [['mary', 'jane', 'alex'],
                ['kate', 'andrew', 'jensen'],
                ['marina', 'catherine', 'howard'],
                ['syp', 'yuslina', 'john'],
                ['yute', 'kelvin'],
                ['beyond', 'holand']]}
df = pd.DataFrame(d)
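A uj5u.com user replied:
You can flatten the friend column into a single list with no nesting. For this you can use itertools.chain.from_iterable, and then filter with Series.isin.
(andrew appears in [kate, andrew, jensen], so this solution drops that row as well.)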
import itertools
df = df[~df['user'].isin(list(itertools.chain.from_iterable(df['friend'])))]
Output:
user friend
0 jack [mary, jane, alex]
2 alice [marina, catherine, howard]
