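I have a DataFrame df1 (the original table) with many columns.
I have a filtered DataFrame df2 with only 4 columns: date, agent_id, gps1 and gps2.
df1 contains date, agent_id and final_gps, along with other columns.
I want to keep only the rows of df1 that are present in df2, comparing on:
df1.date == df2.date & df1.agent_id == df2.agent_id & (df1.final_gps == df2.gps1 | df1.final_gps == df2.gps2), where all three conditions must hold for the same row of df2.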
df2 sample:
date agent_id gps1 gps2
14-02-2020 12abc (1,2) (7,6)
14-02-2020 12abc (3,4) (7,6)
14-02-2020 33bcd (6,7) (8,9)
20-02-2020 44hgf (1,6) (3,7)
20-02-2020 12abc (3,5) (3,1)
20-02-2020 33bcd (3,4) (3,6)
21-02-2020 12abc (4,5) (5,4)
df1 sample:
date agent_id final_gps ….
10-02-2020 12abc (1,2) …
10-02-2020 33bcd (8,9) …
14-02-2020 12abc (1,2) …
14-02-2020 12abc (7,6) …
14-02-2020 12abc (3,4) …
14-02-2020 33bcd (6,7) …
14-02-2020 33bcd (8,9) …
14-02-2020 33bcd (1,1) …
14-02-2020 33bcd (2,2) …
18-02-2020 12abc (1,2) …
19-02-2020 44hgf (6,7) …
20-02-2020 12abc (3,5) …
20-02-2020 12abc (3,1) …
20-02-2020 44hgf (1,6) …
20-02-2020 44hgf (3,7) …
Desired output:
date agent_id final_gps ….
14-02-2020 12abc (1,2) …
14-02-2020 12abc (7,6) …
14-02-2020 12abc (3,4) …
14-02-2020 33bcd (6,7) …
14-02-2020 33bcd (8,9) …
20-02-2020 12abc (3,5) …
20-02-2020 12abc (3,1) …
20-02-2020 44hgf (1,6) …
20-02-2020 44hgf (3,7) …
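I tried this, but it returns every df1 row whose values appear anywhere in df2. I only want the rows of df1 where that agent_id, on that specific date, matches the specific gps condition.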
df = df1[df1['date'].isin(df2['date']) &
df1['agent_id'].isin(df2['agent_id']) &
(df1['final_gps'].isin(df2['gps1']) | df1['final_gps'].isin(df2['gps2']))]
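A minimal sketch of why the column-wise isin checks are too loose, using hypothetical two-row data (storing the gps values as strings is also an assumption). Each isin compares against an entire df2 column, so a df1 row can pass by taking its date, agent_id and gps from different df2 rows. Building (date, agent_id, gps) tuples from the same df2 row avoids that.

```python
import pandas as pd

# Hypothetical data: one df2 row per (date, agent_id).
df2 = pd.DataFrame({
    'date': ['14-02-2020', '20-02-2020'],
    'agent_id': ['12abc', '33bcd'],
    'gps1': ['(1,2)', '(3,4)'],
    'gps2': ['(7,6)', '(3,6)'],
})
# 12abc on 20-02-2020 at (1,2): no single df2 row has this combination,
# but each individual value appears somewhere in df2.
df1 = pd.DataFrame({
    'date': ['20-02-2020'],
    'agent_id': ['12abc'],
    'final_gps': ['(1,2)'],
})

# Column-wise isin: every test looks at a whole df2 column on its own.
loose = df1[df1['date'].isin(df2['date']) &
            df1['agent_id'].isin(df2['agent_id']) &
            (df1['final_gps'].isin(df2['gps1']) | df1['final_gps'].isin(df2['gps2']))]

# Row-wise keys: date, agent_id and gps always come from the same df2 row.
keys = (set(zip(df2['date'], df2['agent_id'], df2['gps1'])) |
        set(zip(df2['date'], df2['agent_id'], df2['gps2'])))
strict = df1[[k in keys for k in zip(df1['date'], df1['agent_id'], df1['final_gps'])]]

print(len(loose), len(strict))  # 1 0
```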
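Answer:
Use DataFrame.melt to reshape gps1 and gps2 into a single final_gps column first, so the result can be merged with df1 on all 3 shared columns (no need to specify on). Then remove rows that are duplicated across all columns and finally sort: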
df = (df2.melt(id_vars=['date','agent_id'],
value_vars=['gps1','gps2'],
value_name='final_gps')
.drop('variable', axis=1)
.merge(df1)
.drop_duplicates()
.sort_values(by=['date','agent_id'], ignore_index=True))
print (df)
date agent_id final_gps
0 14-02-2020 12abc (1,2)
1 14-02-2020 12abc (3,4)
2 14-02-2020 12abc (7,6)
3 14-02-2020 33bcd (6,7)
4 14-02-2020 33bcd (8,9)
5 20-02-2020 12abc (3,5)
6 20-02-2020 12abc (3,1)
7 20-02-2020 44hgf (1,6)
8 20-02-2020 44hgf (3,7)
Details:
print (df2.melt(id_vars=['date','agent_id'],
value_vars=['gps1','gps2'],
value_name='final_gps'))
date agent_id variable final_gps
0 14-02-2020 12abc gps1 (1,2)
1 14-02-2020 12abc gps1 (3,4)
2 14-02-2020 33bcd gps1 (6,7)
3 20-02-2020 44hgf gps1 (1,6)
4 20-02-2020 12abc gps1 (3,5)
5 20-02-2020 33bcd gps1 (3,4)
6 21-02-2020 12abc gps1 (4,5)
7 14-02-2020 12abc gps2 (7,6)
8 14-02-2020 12abc gps2 (7,6)
9 14-02-2020 33bcd gps2 (8,9)
10 20-02-2020 44hgf gps2 (3,7)
11 20-02-2020 12abc gps2 (3,1)
12 20-02-2020 33bcd gps2 (3,6)
13 21-02-2020 12abc gps2 (5,4)
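If you need to keep df1's original row order, its index and all its other columns, and you do not want rows that are genuinely duplicated in df1 to be collapsed (which drop_duplicates after the merge would do), one possible variant uses the melted keys only to build a boolean mask. This is a sketch on hypothetical data; `speed` is a made-up stand-in for df1's other columns, and the gps values are assumed to be strings.

```python
import pandas as pd

# Hypothetical frames; 'speed' stands in for df1's other columns.
df1 = pd.DataFrame({
    'date': ['14-02-2020', '14-02-2020', '14-02-2020', '20-02-2020'],
    'agent_id': ['12abc', '33bcd', '12abc', '44hgf'],
    'final_gps': ['(7,6)', '(1,1)', '(1,2)', '(3,7)'],
    'speed': [10, 20, 30, 40],
})
df2 = pd.DataFrame({
    'date': ['14-02-2020', '14-02-2020', '20-02-2020'],
    'agent_id': ['12abc', '12abc', '44hgf'],
    'gps1': ['(1,2)', '(3,4)', '(1,6)'],
    'gps2': ['(7,6)', '(7,6)', '(3,7)'],
})

# One row per (date, agent_id, gps) key, whether it came from gps1 or gps2.
keys = (df2.melt(id_vars=['date', 'agent_id'], value_vars=['gps1', 'gps2'],
                 value_name='final_gps')
           .drop(columns='variable')
           .drop_duplicates())

# Left merge on the key columns only. Because keys are unique, this yields
# exactly one row per df1 row, in df1's order; '_merge' marks the matches.
flag = (df1[['date', 'agent_id', 'final_gps']]
        .merge(keys, on=['date', 'agent_id', 'final_gps'],
               how='left', indicator=True)['_merge']
        .eq('both'))

# Positional mask, so df1's own index and columns are untouched.
out = df1[flag.to_numpy()]
print(out)
```

Deduplicating the keys before the merge is what keeps the row count equal to len(df1); without it, a pair such as (14-02-2020, 12abc, (7,6)), which appears twice in the melted df2, would duplicate the matching df1 row.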
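Answer:
You can use several isin calls and chain them with the & operator. Since final_gps can match either gps1 or gps2, combine those two checks with the | operator inside parentheses: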
out = (df1[df1['date'].isin(df2['date']) &
df1['agent_id'].isin(df2['agent_id']) &
(df1['final_gps'].isin(df2['gps1']) | df1['final_gps'].isin(df2['gps2']))]
.reset_index(drop=True))
Output:
date agent_id final_gps ….
0 14-02-2020 12abc (1, 2) …
1 14-02-2020 12abc (7, 6) …
2 14-02-2020 12abc (3, 4) …
3 14-02-2020 33bcd (6, 7) …
4 14-02-2020 33bcd (8, 9) …
5 20-02-2020 12abc (3, 5) …
6 20-02-2020 12abc (3, 1) …
7 20-02-2020 44hgf (1, 6) …
8 20-02-2020 44hgf (3, 7) …
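On this sample the column-wise isin filter happens to produce the desired rows, but it does not require date, agent_id and gps to come from the same df2 row. A sketch that follows the stated condition more literally: merge df1 with df2 on date and agent_id, keep the pairs where final_gps equals gps1 or gps2, then select those df1 rows. It assumes df1 has a default, unnamed index (so reset_index creates a column called `index`) and that the gps values are stored as strings.

```python
import pandas as pd

# Sample data from the question (gps values assumed to be strings).
df1 = pd.DataFrame({
    'date': ['10-02-2020', '10-02-2020', '14-02-2020', '14-02-2020', '14-02-2020',
             '14-02-2020', '14-02-2020', '14-02-2020', '14-02-2020', '18-02-2020',
             '19-02-2020', '20-02-2020', '20-02-2020', '20-02-2020', '20-02-2020'],
    'agent_id': ['12abc', '33bcd', '12abc', '12abc', '12abc', '33bcd', '33bcd', '33bcd',
                 '33bcd', '12abc', '44hgf', '12abc', '12abc', '44hgf', '44hgf'],
    'final_gps': ['(1,2)', '(8,9)', '(1,2)', '(7,6)', '(3,4)', '(6,7)', '(8,9)', '(1,1)',
                  '(2,2)', '(1,2)', '(6,7)', '(3,5)', '(3,1)', '(1,6)', '(3,7)'],
})
df2 = pd.DataFrame({
    'date': ['14-02-2020', '14-02-2020', '14-02-2020', '20-02-2020', '20-02-2020',
             '20-02-2020', '21-02-2020'],
    'agent_id': ['12abc', '12abc', '33bcd', '44hgf', '12abc', '33bcd', '12abc'],
    'gps1': ['(1,2)', '(3,4)', '(6,7)', '(1,6)', '(3,5)', '(3,4)', '(4,5)'],
    'gps2': ['(7,6)', '(7,6)', '(8,9)', '(3,7)', '(3,1)', '(3,6)', '(5,4)'],
})

# Pair every df1 row with each df2 row that has the same date and agent_id;
# reset_index keeps df1's original row label in an 'index' column.
pairs = df1.reset_index().merge(df2, on=['date', 'agent_id'])

# The gps part of the condition, checked within each pair.
hit = pairs['final_gps'].eq(pairs['gps1']) | pairs['final_gps'].eq(pairs['gps2'])

# A df1 row may match several df2 rows; select each one once, in df1 order.
out = df1[df1.index.isin(pairs.loc[hit, 'index'])]
print(out)
```

The intermediate `pairs` frame grows with the number of df2 rows per (date, agent_id), so the melt-based approaches may be preferable when df2 has many rows for the same agent and day.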
