我有個問題。我想回答一些問題(見下文)。不幸的是我得到了一個錯誤ValueError: Wrong number of items passed 0, placement implies 1。我如何確定問題?
- 幾天前(從今天開始)最后一次互動是什么時候?
- 客戶下了多少訂單?
- 最快的互動是什么時候?
- 最短的互動時間是什么時候?
- 互動性的平均值是多少?(為此我計算了天數)
From 2021-02-10 to 2021-02-22 = 12
From 2021-02-22 to 2021-03-18 = 24
From 2021-03-18 to 2021-03-22 = 4
From 2021-03-22 to 2021-09-07 = 109
From 2021-09-07 to 2022-01-18 = 193
-------
68,4 (average days)
資料框
customerId fromDate
0 1 2021-02-22
1 1 2021-03-18
2 1 2021-03-22
3 1 2021-02-10
4 1 2021-09-07
5 1 None
6 1 2022-01-18
7 2 2021-05-17
代碼
import pandas as pd
d = {'customerId': [1, 1, 1, 1, 1, 1, 1, 2],
'fromDate': ['2021-02-22', '2021-03-18', '2021-03-22',
'2021-02-10', '2021-09-07', None, '2022-01-18', '2021-05-17']
}
df = pd.DataFrame(data=d)
display(df)
df['fromDate'] = pd.to_datetime(df['fromDate'], errors='coerce').dt.date
#df = df['fromDate'].groupby('customerId', group_keys=False)
df = df.sort_values(['customerId','fromDate'],ascending=False)#.groupby('customerId')
df_new = pd.DataFrame()
df_new['average'] = df.groupby('customerId').mean()
[OUT] AttributeError: 'DataFrameGroupBy' object has no attribute 'groupby'
df_new = pd.DataFrame()
df_new['lastInteractivity'] = pd.to_datetime('today').normalize() - df['fromDate'].max()
[OUT] TypeError: '>=' not supported between instances of 'datetime.date' and 'float'
我想要的是
customerId lastInteractivity howMany shortest Longest Average
2 371 1 None None None
1 125 5 4 193 68,4
# shortes, longest, average is None because customer with the Id 2 had only 1 date
uj5u.com熱心網友回復:
采用:
#converting to datetimes
df['fromDate'] = pd.to_datetime(df['fromDate'], errors='coerce')
#for correct add missing dates is sorting ascending by both columns
df = df.sort_values(['customerId','fromDate'])
#new column per customerId
df['lastInteractivity'] = pd.to_datetime('today').normalize() - df['fromDate']
#added missing dates per customerId, also count removed missing rows with NaNs
df = (df.dropna(subset=['fromDate'])
.set_index('fromDate')
.groupby('customerId')['lastInteractivity']
.apply(lambda x: x.asfreq('d'))
.reset_index())
#count how many missing dates
m = df['lastInteractivity'].notna()
df1 = (df[~m].groupby(['customerId', m.cumsum()])['customerId']
.size()
.add(1)
.reset_index(name='Count'))
print (df1)
customerId lastInteractivity Count
0 1 1 12
1 1 2 24
2 1 3 4
3 1 4 169
4 1 5 133
df1 = df1.groupby('customerId').agg(howMany=('Count','size'),
shortest=('Count','min'),
Longest=('Count','max'),
Average=('Count','mean'))
#get last lastInteractivity and joined df1
df = (df.groupby('customerId')['lastInteractivity']
.last()
.dt.days
.sort_index(ascending=False)
.to_frame()
.join(df1)
.reset_index())
print (df)
customerId lastInteractivity howMany shortest Longest Average
0 2 371 NaN NaN NaN NaN
1 1 125 5.0 4.0 169.0 68.4
轉載請註明出處,本文鏈接:https://www.uj5u.com/shujuku/480594.html
