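I am working with a dataset (10,000 data points) that contains 100 different account numbers along with the transaction amount, transaction date and time, and so on.
From this dataset, I want to create a separate DataFrame for a single account that contains all the transactions that account made over the whole year, sorted by time.
I tried to do this as follows: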
group = df.groupby('account_num')
which gave me
pandas.core.groupby.generic.DataFrameGroupBy
Then, when I try to get the group for a specific account number, for example 51234:
group.get_group('51234')
I get an error:
KeyError: 51234
How can I build a separate DataFrame that contains all the transactions for one account?
(Sorry if this is a very basic question, I am new to this.)
A uj5u.com user replied:
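IIUC, you can get the output in a slightly different way. First make sure your time column (which I assume from your description is a date) is actually a datetime object, then filter your DataFrame for the specific account number. There are many ways to do that; a common one is loc, but here I use query. Then sort by date with sort_values, and finally use groupby on the year part of the date column: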
import pandas as pd

# df is the DataFrame from the question (see the example data at the end)
# Convert your date column to datetime
df['date'] = pd.to_datetime(df['date'])
# Filter and sort
print(
    df.query('account_num == 51234')
      .sort_values(by=['date'], ascending=True))
# Equivalently with loc
print(
    df.loc[df['account_num'] == 51234]
      .sort_values(by=['date'], ascending=True))
account_num date
0 51234 2020-01-01
1 51234 2020-02-01
2 51234 2020-03-01
7 51234 2020-08-01
9 51234 2020-08-01
11 51234 2020-08-01
13 51234 2020-08-01
3 51234 2021-04-01
4 51234 2021-05-01
5 51234 2023-06-01
6 51234 2023-07-01
8 51234 2023-07-01
10 51234 2023-07-01
12 51234 2023-07-01
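To end up with a separate DataFrame rather than a printed view, the filtered and sorted result can be assigned to its own variable. The following is a minimal sketch on made-up data; the helper name `transactions_for` and the variable `account_df` are hypothetical and not part of the original answer.

```python
import pandas as pd

# Small made-up sample; column names follow the example in this thread.
df = pd.DataFrame({
    "account_num": [51234, 13123, 51234, 51234, 13123],
    "date": ["2021-04-01", "2020-01-18", "2020-01-01", "2020-08-01", "2020-01-19"],
})
df["date"] = pd.to_datetime(df["date"], format="%Y-%m-%d")


def transactions_for(frame, account):
    """Return a new, date-sorted DataFrame holding only one account's rows."""
    mask = frame["account_num"] == account  # boolean mask, same idea as loc above
    return (
        frame.loc[mask]
        .sort_values("date")
        .reset_index(drop=True)  # renumber rows 0..n-1 in the new frame
        .copy()                  # independent of the original DataFrame
    )


account_df = transactions_for(df, 51234)
print(account_df)
```

Taking a copy means later changes to `account_df` do not touch `df`, which is usually what you want when each account gets its own frame.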
# Filter, sort, and get yearly count
print(
    df.query('account_num == 51234')
      .sort_values(by=['date'], ascending=True)
      .groupby(df['date'].dt.year).account_num.count())
date
2020 7
2021 2
2023 5
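One caveat about the parsing step: strings such as 02/01/2020 are ambiguous, and the output above reads them month-first (1 February). If your file is actually day-first, passing an explicit `format` removes the guesswork. This is only a sketch on two made-up strings; which convention your data uses is something you need to check yourself.

```python
import pandas as pd

raw = pd.Series(["02/01/2020", "13/01/2020"])

# Explicit day-first format: both strings parse, and 02/01 means 2 January.
day_first = pd.to_datetime(raw, format="%d/%m/%Y")

# Explicit month-first format: 02/01 means 1 February.
# (13/01/2020 cannot be month-first, so only the first string is parsed here.)
month_first = pd.to_datetime(raw.iloc[:1], format="%m/%d/%Y")

print(day_first.dt.strftime("%Y-%m-%d").tolist())
print(month_first.dt.strftime("%Y-%m-%d").tolist())
```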
Based on the following example DataFrame:
{'account_num': {0: 51234,
1: 51234,
2: 51234,
3: 51234,
4: 51234,
5: 51234,
6: 51234,
7: 51234,
8: 51234,
9: 51234,
10: 51234,
11: 51234,
12: 51234,
13: 51234,
14: 512346,
15: 512346,
16: 512346,
17: 512346,
18: 512346,
19: 512346,
20: 512346,
21: 512346,
22: 512346,
23: 13123,
24: 13123,
25: 13123,
26: 13123,
27: 13123,
28: 13123,
29: 13123,
30: 13123,
31: 13123},
'date': {0: '01/01/2020',
1: '02/01/2020',
2: '03/01/2020',
3: '04/01/2021',
4: '05/01/2021',
5: '06/01/2023',
6: '07/01/2023',
7: '08/01/2020',
8: '07/01/2023',
9: '08/01/2020',
10: '07/01/2023',
11: '08/01/2020',
12: '07/01/2023',
13: '08/01/2020',
14: '09/01/2020',
15: '10/01/2020',
16: '11/01/2020',
17: '12/01/2020',
18: '13/01/2020',
19: '14/01/2020',
20: '15/01/2020',
21: '16/01/2020',
22: '17/01/2020',
23: '18/01/2020',
24: '19/01/2020',
25: '20/01/2020',
26: '21/01/2020',
27: '22/01/2020',
28: '23/01/2020',
29: '24/01/2020',
30: '25/01/2020',
31: '26/01/2020'}}
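As for the KeyError in the question, one plausible cause (an assumption, since the question does not show the column dtypes) is that account_num is stored as integers while get_group was called with the string '51234'. A minimal sketch on made-up data:

```python
import pandas as pd

# Made-up sample with an integer account_num column (assumed dtype).
df = pd.DataFrame({
    "account_num": [51234, 51234, 13123],
    "date": pd.to_datetime(["2020-02-01", "2020-01-01", "2020-01-18"]),
})

group = df.groupby("account_num")

# The column holds integers, so the group keys are integers too.
print(df["account_num"].dtype)
print(list(group.groups))

# An integer key matches an integer group key.
account_df = group.get_group(51234).sort_values("date")

# A string key does not match any integer group key and raises KeyError.
try:
    group.get_group("51234")
    string_key_found = True
except KeyError:
    string_key_found = False

print(string_key_found)
```

If the column really does hold strings in your data, the reverse applies: pass '51234' as a string, or convert the column with astype before grouping.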
