Given a user table like this:
user query
0 a1 orange
1 a1 strawberry
2 a1 pear
3 a2 orange
4 a2 strawberry
5 a2 lemon
6 a3 orange
7 a3 banana
8 a6 meat
9 a7 beer
10 a8 juice
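I want to group by user and aggregate query into a list, keeping only the first two items when a group has more than two. The expected result is: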
user query
0 a1 [orange, strawberry]
1 a2 [orange, strawberry]
2 a3 [orange, banana]
3 a6 [meat]
4 a7 [beer]
5 a8 [juice]
Using the code below:
import pandas as pd

df_user = pd.DataFrame({'user': {0: 'a1', 1: 'a1', 2: 'a1', 3: 'a2',
                                 4: 'a2', 5: 'a2', 6: 'a3', 7: 'a3',
                                 8: 'a6', 9: 'a7', 10: 'a8'},
                        'query': {0: 'orange', 1: 'strawberry',
                                  2: 'pear', 3: 'orange', 4: 'strawberry',
                                  5: 'lemon', 6: 'orange', 7: 'banana',
                                  8: 'meat', 9: 'beer', 10: 'juice'}})
print(df_user.groupby(['user'], as_index=False).agg(list))
I managed to get:
user query
0 a1 [orange, strawberry, pear]
1 a2 [orange, strawberry, lemon]
2 a3 [orange, banana]
3 a6 [meat]
4 a7 [beer]
5 a8 [juice]
What is a good way to get the expected result?
A uj5u.com user replied:
You can use iloc to slice each group down to at most 2 items:
df_user.groupby(['user'], as_index=False).agg(lambda s: s.iloc[:2].to_list())
Output:
user query
0 a1 [orange, strawberry]
1 a2 [orange, strawberry]
2 a3 [orange, banana]
3 a6 [meat]
4 a7 [beer]
5 a8 [juice]
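If the cutoff might change later, the same slicing idea can be wrapped in a small function. This is only a sketch: the helper name first_n_per_group and its parameters are made up for illustration and are not part of pandas.

```python
import pandas as pd

# Same data as in the question, rebuilt so the snippet runs on its own.
df_user = pd.DataFrame({
    "user": ["a1", "a1", "a1", "a2", "a2", "a2", "a3", "a3", "a6", "a7", "a8"],
    "query": ["orange", "strawberry", "pear", "orange", "strawberry", "lemon",
              "orange", "banana", "meat", "beer", "juice"],
})


def first_n_per_group(df, key, n):
    """Hypothetical helper: for each group, list the first n values of every non-key column."""
    # s.iloc[:n] never raises when a group has fewer than n rows; it just returns what exists.
    return df.groupby(key, as_index=False).agg(lambda s: s.iloc[:n].to_list())


result = first_n_per_group(df_user, "user", 2)
print(result)
```

Because iloc slicing is tolerant of short groups, users with a single query (a6, a7, a8) simply get a one-element list.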
A uj5u.com user replied:
Here is one way:
out = df_user[df_user.groupby('user').cumcount() < 2].groupby('user', as_index=False).agg(list)
Output:
user query
0 a1 [orange, strawberry]
1 a2 [orange, strawberry]
2 a3 [orange, banana]
3 a6 [meat]
4 a7 [beer]
5 a8 [juice]
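To see why this filter works, it may help to look at the intermediate step: cumcount() numbers the rows inside each group starting from 0, so comparing with 2 keeps the first two rows per user. A minimal sketch, rebuilding the question's data (the variable name position is just for illustration):

```python
import pandas as pd

# Same data as in the question, rebuilt so the snippet runs on its own.
df_user = pd.DataFrame({
    "user": ["a1", "a1", "a1", "a2", "a2", "a2", "a3", "a3", "a6", "a7", "a8"],
    "query": ["orange", "strawberry", "pear", "orange", "strawberry", "lemon",
              "orange", "banana", "meat", "beer", "juice"],
})

# Position of each row within its user group, counting from 0.
position = df_user.groupby("user").cumcount()
print(position.tolist())  # [0, 1, 2, 0, 1, 2, 0, 1, 0, 0, 0]

# Keep only the first two rows per user, then collect the queries into lists.
out = df_user[position < 2].groupby("user", as_index=False).agg(list)
print(out)
```

The rows holding pear and lemon are the only ones with position 2, so they are the only ones dropped before aggregation.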
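A uj5u.com user replied:
You can use groupby nth() to select elements from each group by position, if they exist. The snippet below relies on pandas versions before 2.0, where nth() moves user into the index; from 2.0 on it keeps the original row index, so group by the 'user' column there instead of level=0:
new_df = df_user.groupby('user').nth([0, 1]).groupby(level=0).agg(list)
Output: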
>>> new_df
query
user
a1 [orange, strawberry]
a2 [orange, strawberry]
a3 [orange, banana]
a6 [meat]
a7 [beer]
a8 [juice]
Note that if you don't want to type out all of those numbers, list(range(2)) is more dynamic than [0, 1] :)
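A related option that none of the answers mention is head(n) on the groupby, which returns the first n rows of each group with the user column left in place. As far as I know this behaves the same across pandas versions, so it sidesteps the index question entirely. A sketch, with n as an illustrative variable:

```python
import pandas as pd

# Same data as in the question, rebuilt so the snippet runs on its own.
df_user = pd.DataFrame({
    "user": ["a1", "a1", "a1", "a2", "a2", "a2", "a3", "a3", "a6", "a7", "a8"],
    "query": ["orange", "strawberry", "pear", "orange", "strawberry", "lemon",
              "orange", "banana", "meat", "beer", "juice"],
})

n = 2  # number of items to keep per user (illustrative)

# head(n) keeps the first n rows of every group and preserves the original columns.
first_rows = df_user.groupby("user").head(n)

# Group again on the explicit 'user' column and collect the queries into lists.
new_df = first_rows.groupby("user", as_index=False).agg(list)
print(new_df)
```

This one returns user as a regular column rather than as the index, matching the layout of the expected result in the question.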
Please credit the source when reposting. Link to this article: https://www.uj5u.com/caozuo/442815.html
