I have a dataframe of actor names:
df1
actor_id actor_name
1 Brad Pitt
2 Nicole Kidman
3 Matthew Goode
4 Uma Thurman
5 Ethan Hawke
And another dataframe of the movies those actors appeared in:
df2
actor_id actor_movie movie_revenue
1 Once Upon a Time in Hollywood 150
2 Moulin Rouge 50
2 The Others 200
3 Stoker 75
4 Kill Bill 125
5 Gattaca 85
I want to merge the two dataframes so that each actor is shown alongside their movie title and movie revenue, so I used the merge function:
df3 = df1.merge(df2, on = 'actor_id', how = 'left')
df3
actor_id actor_name actor_movie movie_revenue
1 Brad Pitt Once Upon a Time in Hollywood 150
2 Nicole Kidman Moulin Rouge 50
2 Nicole Kidman The Others 200
3 Matthew Goode Stoker 75
4 Uma Thurman Kill Bill 125
5 Ethan Hawke Gattaca 85
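But this pulls in every movie, so Nicole Kidman is duplicated. I only want to show one movie per actor. How can I merge the dataframes without "duplicating" my list of actors?
How would I merge in only the movie title that comes first alphabetically?
How would I merge in only the title of the highest-revenue movie?
Thanks!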
A uj5u.com user replied:
One approach is to go ahead with the merge and then filter the result set.
Alphabetically first movie title
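To try the merge-then-filter idea, it helps to rebuild the merged frame first. The following is a minimal sketch, not the poster's actual code: the revenue column is assumed to be named `movie_revenue` (as in the merged output), and Nicole Kidman's two rows use the values shown in the merged output.

```python
import pandas as pd

# Sample data typed in from the question. The column name movie_revenue and
# the Nicole Kidman values follow the merged output (an assumption).
df1 = pd.DataFrame({
    "actor_id": [1, 2, 3, 4, 5],
    "actor_name": ["Brad Pitt", "Nicole Kidman", "Matthew Goode",
                   "Uma Thurman", "Ethan Hawke"],
})
df2 = pd.DataFrame({
    "actor_id": [1, 2, 2, 3, 4, 5],
    "actor_movie": ["Once Upon a Time in Hollywood", "Moulin Rouge",
                    "The Others", "Stoker", "Kill Bill", "Gattaca"],
    "movie_revenue": [150, 50, 200, 75, 125, 85],
})

# A left merge keeps every row of df1 and emits one output row per matching
# row in df2, so an actor with two movies appears twice.
df3 = df1.merge(df2, on="actor_id", how="left")

# Count rows per actor to see which ids the merge repeated.
rows_per_actor = df3.groupby("actor_id").size()
print(rows_per_actor[rows_per_actor > 1])
```

The repeated rows are expected behaviour for a one-to-many merge, which is why the steps that follow reduce the result to one row per actor.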
# sort by name, then movie title, and keep the first row per actor
df3.sort_values(['actor_name', 'actor_movie']).groupby('actor_id', as_index=False).first()
actor_id actor_name actor_movie movie_revenue
0 1 Brad Pitt Once Upon a Time in Hollywood 150
1 2 Nicole Kidman Moulin Rouge 50
2 3 Matthew Goode Stoker 75
3 4 Uma Thurman Kill Bill 125
4 5 Ethan Hawke Gattaca 85
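As an alternative to filtering after the merge, the movie table can be reduced to one row per actor before merging, so the actor list is never duplicated in the first place. This is a sketch under the same assumptions about the sample data (column name `movie_revenue`, Nicole Kidman values taken from the merged output); `first_movie` and `result_alpha` are illustrative names.

```python
import pandas as pd

# Sample data as in the question (assumed column name: movie_revenue).
df1 = pd.DataFrame({
    "actor_id": [1, 2, 3, 4, 5],
    "actor_name": ["Brad Pitt", "Nicole Kidman", "Matthew Goode",
                   "Uma Thurman", "Ethan Hawke"],
})
df2 = pd.DataFrame({
    "actor_id": [1, 2, 2, 3, 4, 5],
    "actor_movie": ["Once Upon a Time in Hollywood", "Moulin Rouge",
                    "The Others", "Stoker", "Kill Bill", "Gattaca"],
    "movie_revenue": [150, 50, 200, 75, 125, 85],
})

# Keep only the alphabetically first title per actor before merging.
first_movie = (
    df2.sort_values(["actor_id", "actor_movie"])
       .drop_duplicates(subset="actor_id", keep="first")
)

# validate="one_to_one" makes pandas raise a MergeError if either side
# still contains duplicate actor_id values.
result_alpha = df1.merge(first_movie, on="actor_id", how="left",
                         validate="one_to_one")
print(result_alpha)
```

Because this is still a left merge, an actor with no rows in `df2` would be kept with missing values in the movie columns.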
Title of the highest-revenue movie
# sort by name, then revenue (descending), and keep the first row per actor
df3.sort_values(['actor_name', 'movie_revenue'], ascending=[True, False]).groupby('actor_id', as_index=False).first()
actor_id actor_name actor_movie movie_revenue
0 1 Brad Pitt Once Upon a Time in Hollywood 150
1 2 Nicole Kidman The Others 200
2 3 Matthew Goode Stoker 75
3 4 Uma Thurman Kill Bill 125
4 5 Ethan Hawke Gattaca 85
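The highest-revenue case can also be handled before the merge. The sketch below uses `groupby(...).idxmax()` to pick each actor's top-revenue row. It is one possible variant rather than the answer's own method, and it relies on the same assumed sample data (column name `movie_revenue`, Nicole Kidman values taken from the merged output).

```python
import pandas as pd

# Sample data as in the question (assumed column name: movie_revenue).
df1 = pd.DataFrame({
    "actor_id": [1, 2, 3, 4, 5],
    "actor_name": ["Brad Pitt", "Nicole Kidman", "Matthew Goode",
                   "Uma Thurman", "Ethan Hawke"],
})
df2 = pd.DataFrame({
    "actor_id": [1, 2, 2, 3, 4, 5],
    "actor_movie": ["Once Upon a Time in Hollywood", "Moulin Rouge",
                    "The Others", "Stoker", "Kill Bill", "Gattaca"],
    "movie_revenue": [150, 50, 200, 75, 125, 85],
})

# For each actor_id, idxmax returns the index label of the row with the
# largest movie_revenue (the first such row if there is a tie).
top_idx = df2.groupby("actor_id")["movie_revenue"].idxmax()
top_movie = df2.loc[top_idx]

# One row per actor on both sides, so the merge cannot duplicate anyone.
result_top = df1.merge(top_movie, on="actor_id", how="left",
                       validate="one_to_one")
print(result_top)
```

Selecting whole rows by index label keeps the title and the revenue from the same movie, which matters once a movie table has more columns or missing values.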
