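I have a pandas DataFrame, df1. I have another pandas DataFrame, df2, in which I want to replace the elements of the lists in the fruits column with the value from df1's name column whenever that element is found in the corresponding list in df1's duplicates column.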
df1
name duplicates
0 a.apple ['b.apple', 'c.apple']
1 t.orange ['arr.orange', 'pg.orange']
2 ts.grape ['a.grape' , 'test.grape']
3 u.berryCool ['X.berryCool', 'cool.berryCool']
df2
people fruits
0 jack ['b.apple', 'c.apple', 'pp.tomato', 'ao.banana' ]
1 mary ['arr.orange', 'b.apple', 'X.berryCool', 'op.mango']
2 andy ['cool.berryCool' , 'test.grape', 'yu.papaya']
3 lawrence ['jc.orange', 'c.apple']
Expected output
people fruits
0 jack ['a.apple', 'a.apple', 'pp.tomato', 'ao.banana' ]
1 mary ['t.orange', 'a.apple', 'u.berryCool', 'op.mango']
2 andy ['u.berryCool' , 'ts.grape', 'yu.papaya']
3 lawrence ['t.orange' , 'a.apple']
How can I do this efficiently? Any suggestions are appreciated.
Answer:
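To reproduce the question, the two frames can be built as below. This is a sketch that assumes the list columns hold real Python lists; if the data was read from a CSV file, the cells may instead be strings that only look like lists.

```python
import pandas as pd

# Rebuild the question's sample data; list-valued cells live in object columns
df1 = pd.DataFrame({
    'name': ['a.apple', 't.orange', 'ts.grape', 'u.berryCool'],
    'duplicates': [['b.apple', 'c.apple'],
                   ['arr.orange', 'pg.orange'],
                   ['a.grape', 'test.grape'],
                   ['X.berryCool', 'cool.berryCool']],
})
df2 = pd.DataFrame({
    'people': ['jack', 'mary', 'andy', 'lawrence'],
    'fruits': [['b.apple', 'c.apple', 'pp.tomato', 'ao.banana'],
               ['arr.orange', 'b.apple', 'X.berryCool', 'op.mango'],
               ['cool.berryCool', 'test.grape', 'yu.papaya'],
               ['jc.orange', 'c.apple']],
})
print(df1)
print(df2)
```

Note that jc.orange does not appear in any duplicates list of df1, so a lookup-based replacement leaves it unchanged. That is why the answers print jc.orange for lawrence rather than t.orange.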
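Create a dictionary by first flattening the values of the lists in the duplicates column, then map the values with dict.get, which returns the value itself if there is no match: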
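# Build a lookup: every duplicate -> the canonical name in its row
d = {x: a for a, b in zip(df1['name'], df1['duplicates']) for x in b}
# Replace each fruit via the lookup; unmatched fruits are returned unchanged
df2['fruits'] = [[d.get(y, y) for y in x] for x in df2['fruits']]
print(df2)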
people fruits
0 jack [a.apple, a.apple, pp.tomato, ao.banana]
1 mary [t.orange, a.apple, u.berryCool, op.mango]
2 andy [u.berryCool, ts.grape, yu.papaya]
3 lawrence [jc.orange, a.apple]
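The same dict.get idea can be wrapped in a helper that returns a new frame instead of overwriting df2 in place. The function name dedupe_fruits and the small sample rows are hypothetical, chosen only for illustration. One consequence of building a plain dict: if the same duplicate is listed under several names in df1, the last such row wins.

```python
import pandas as pd


def dedupe_fruits(df1: pd.DataFrame, df2: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df2 whose 'fruits' lists use df1's canonical names.

    Hypothetical helper wrapping the dict.get approach from the answer.
    If one duplicate is listed under several names, the last row of df1 wins.
    """
    lookup = {dup: name
              for name, dups in zip(df1['name'], df1['duplicates'])
              for dup in dups}
    out = df2.copy()
    # dict.get(fruit, fruit) falls back to the original value when unmatched
    out['fruits'] = [[lookup.get(f, f) for f in fruits] for fruits in df2['fruits']]
    return out


df1 = pd.DataFrame({'name': ['a.apple', 't.orange'],
                    'duplicates': [['b.apple', 'c.apple'], ['arr.orange']]})
df2 = pd.DataFrame({'people': ['jack', 'lawrence'],
                    'fruits': [['b.apple', 'pp.tomato', 'ao.banana'],
                               ['jc.orange', 'c.apple']]})
result = dedupe_fruits(df1, df2)
print(result['fruits'].tolist())
# [['a.apple', 'pp.tomato', 'ao.banana'], ['jc.orange', 'a.apple']]
```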
Performance on a 4k-row DataFrame (results depend on the data, so it is best to test with your real data):
df2 = pd.concat([df2] * 1000, ignore_index=True)
In [135]: %%timeit
...: MAPPING = df1.explode('duplicates').set_index('duplicates')['name']
...: df2['fruits1'] = (df2.explode('fruits')['fruits'].replace(MAPPING).groupby(level=0).agg(list))
...:
128 ms ± 2.81 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [136]: %%timeit
...: d = {x: a for a, b in zip(df1['name'], df1['duplicates']) for x in b}
...:
...: df2['fruits2'] = [[d.get(y,y) for y in x] for x in df2['fruits']]
...:
5.27 ms ± 245 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
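%%timeit is an IPython magic. Outside IPython, a comparable measurement can be sketched with the standard timeit module. The helper names with_explode and with_dict are hypothetical, and absolute timings will vary with machine and pandas version, so no numbers are claimed here. The sketch first checks that both approaches produce the same lists, since a speed comparison is only meaningful if the results agree.

```python
import timeit

import pandas as pd

df1 = pd.DataFrame({
    'name': ['a.apple', 't.orange', 'ts.grape', 'u.berryCool'],
    'duplicates': [['b.apple', 'c.apple'], ['arr.orange', 'pg.orange'],
                   ['a.grape', 'test.grape'], ['X.berryCool', 'cool.berryCool']],
})
base = pd.DataFrame({
    'people': ['jack', 'mary', 'andy', 'lawrence'],
    'fruits': [['b.apple', 'c.apple', 'pp.tomato', 'ao.banana'],
               ['arr.orange', 'b.apple', 'X.berryCool', 'op.mango'],
               ['cool.berryCool', 'test.grape', 'yu.papaya'],
               ['jc.orange', 'c.apple']],
})
# Same 4k-row frame as in the IPython session above
df2 = pd.concat([base] * 1000, ignore_index=True)


def with_explode():
    mapping = df1.explode('duplicates').set_index('duplicates')['name']
    return (df2.explode('fruits')['fruits'].replace(mapping)
            .groupby(level=0).agg(list))


def with_dict():
    d = {x: a for a, b in zip(df1['name'], df1['duplicates']) for x in b}
    return [[d.get(y, y) for y in x] for x in df2['fruits']]


# Both approaches must agree before their speed is worth comparing
assert with_explode().tolist() == with_dict()

for fn in (with_explode, with_dict):
    # Best of 3 repeats, each running the function 5 times
    seconds = min(timeit.repeat(fn, number=5, repeat=3)) / 5
    print(f'{fn.__name__}: {seconds * 1000:.2f} ms per call')
```

A plausible reason for the gap is that explode, replace and groupby each build intermediate pandas objects over every element, while the list comprehension makes one pass of plain dict lookups.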
Answer:
You can create a mapping dictionary (a Series):
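# Series indexed by each duplicate, holding its canonical name
MAPPING = df1.explode('duplicates').set_index('duplicates')['name']
# One row per fruit, replace via MAPPING, then regroup into lists per original row
df2['fruits'] = (df2.explode('fruits')['fruits'].replace(MAPPING)
                 .groupby(level=0).agg(list))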
print(df2)
# Output
people fruits
0 jack [a.apple, a.apple, pp.tomato, ao.banana]
1 mary [t.orange, a.apple, u.berryCool, op.mango]
2 andy [u.berryCool, ts.grape, yu.papaya]
3 lawrence [jc.orange, a.apple]
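To see what each step of this chain does, here is a small sketch on made-up rows; the row 'nobody' with an empty list is an assumption added to show an edge case. Because the result is regrouped by index label, the approach assumes df2 has a unique index (call reset_index first if it does not). Also, DataFrame.explode turns an empty list into NaN, so such a row comes back as [nan] rather than [], unlike the dict.get list comprehension.

```python
import pandas as pd

df1 = pd.DataFrame({'name': ['a.apple', 't.orange'],
                    'duplicates': [['b.apple', 'c.apple'], ['arr.orange', 'pg.orange']]})
# 'nobody' has an empty list, to show how explode treats it
df2 = pd.DataFrame({'people': ['jack', 'nobody'],
                    'fruits': [['b.apple', 'pp.tomato', 'arr.orange'], []]})

# Step 1: one row per duplicate, indexed by the duplicate -> canonical name
mapping = df1.explode('duplicates').set_index('duplicates')['name']
print(mapping.to_dict())

# Step 2: one row per fruit; the original row label repeats for each element
exploded = df2.explode('fruits')['fruits']
print(exploded)

# Step 3: exact-value replacement, then regroup by the original row label
result = exploded.replace(mapping).groupby(level=0).agg(list)
print(result.tolist())
# The empty list for 'nobody' comes back as a one-element list holding NaN
```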
