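I need to create a new column that captures the relationships between rows as a delimited list (any delimiter other than a comma works).
DataFrame: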
import numpy as np
import pandas as pd

df1 = pd.DataFrame(np.array([[1000, 'Jerry', 'BR1', 'BR1'],
                             [1001, 'Sal', 'BR2', 'BR1'],
                             [1002, 'Buck', 'BR3', 'BR2'],
                             [1003, 'Perry', 'BR4', 'BR1']]),
                   columns=['ID', 'Name', 'Branch', 'Member of'])
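The final result should be:
ID    Name   Branch  Member of  Members
====  =====  ======  =========  =================
1000  Jerry  BR1     BR1        Jerry, Sal, Perry
1001  Sal    BR2     BR1        Buck
1002  Buck   BR3     BR2        NaN
1003  Perry  BR4     BR1        NaN
I need to build the "Members" column by finding, for each row's "Branch", every row whose "Member of" matches it, then collecting the "Name" values of those rows into a list stored in "Members".
Would np.where be a good starting point?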
np.where(df1['Branch'] == df1['Member of'], ??, np.nan)
Answer from a uj5u.com user:
Use groupby to build the list of members for each branch, then merge it back on "Branch":
s = df1.groupby('Member of')['Name'].apply(list).rename('Members')
df2 = df1.merge(s, left_on='Branch', right_index=True, how='left')
Output:
     ID   Name Branch Member of              Members
0  1000  Jerry    BR1       BR1  [Jerry, Sal, Perry]
1  1001    Sal    BR2       BR1               [Buck]
2  1002   Buck    BR3       BR2                  NaN
3  1003  Perry    BR4       BR1                  NaN
Note: if you want a string instead of a list, use ', '.join in place of list.
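For reference, here is a minimal sketch of the string variant mentioned in the note, assuming the same df1 as in the question. It builds the frame from a plain list of lists instead of np.array (so ID stays an integer) and uses agg where the answer used apply; both are my choices, not part of the original answer.

```python
import pandas as pd

# Same data as the question, built from a list of lists (assumption:
# this keeps ID as an integer column instead of a string).
df1 = pd.DataFrame(
    [[1000, 'Jerry', 'BR1', 'BR1'],
     [1001, 'Sal', 'BR2', 'BR1'],
     [1002, 'Buck', 'BR3', 'BR2'],
     [1003, 'Perry', 'BR4', 'BR1']],
    columns=['ID', 'Name', 'Branch', 'Member of'])

# One comma-separated string of names per "Member of" value.
s = df1.groupby('Member of')['Name'].agg(', '.join).rename('Members')

# Attach each branch's member string to the row that owns that branch;
# branches with no members are left as NaN by the left join.
df2 = df1.merge(s, left_on='Branch', right_index=True, how='left')
print(df2)
```

Any other separator can be swapped in by changing the string that join is called on, for example '; '.join.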
Answer from a uj5u.com user:
Try this:
df1['Members'] = df1['Branch'].apply(lambda b: ', '.join(df1.loc[df1['Member of'] == b, 'Name'])).replace('', np.nan)
Output:
>>> df1
     ID   Name Branch Member of            Members
0  1000  Jerry    BR1       BR1  Jerry, Sal, Perry
1  1001    Sal    BR2       BR1               Buck
2  1002   Buck    BR3       BR2                NaN
3  1003  Perry    BR4       BR1                NaN
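The lambda above filters the whole frame once per row, which is fine for small data but may get slow on larger frames. A possible alternative, sketched here under the assumption that df1 looks like the one in the question, is to group once and then look each branch up with Series.map. The variable name members is only illustrative.

```python
import pandas as pd

# Same data as the question, built from a list of lists (assumption).
df1 = pd.DataFrame(
    [[1000, 'Jerry', 'BR1', 'BR1'],
     [1001, 'Sal', 'BR2', 'BR1'],
     [1002, 'Buck', 'BR3', 'BR2'],
     [1003, 'Perry', 'BR4', 'BR1']],
    columns=['ID', 'Name', 'Branch', 'Member of'])

# Series indexed by "Member of", holding the joined names for each value.
members = df1.groupby('Member of')['Name'].agg(', '.join)

# Look up each row's branch in that Series; branches that never appear
# in "Member of" have no entry and come back as NaN.
df1['Members'] = df1['Branch'].map(members)
print(df1)
```

This should give the same "Members" column as the lambda version, without the extra replace('', np.nan) step, since map already yields NaN for missing keys.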
