我有個問題。我想從一個日期計算例如2022-06-01有多少觸摸客戶與customerId == 1最后一次6 months。他有兩次接觸2022-05-25和2022-05-20。但是,我不知道如何將客戶分組并說您擁有的日期取決于count_from_date客戶的接觸次數。我也有一個KeyError.
資料框
customerId fromDate
0 1 2022-06-01
1 1 2022-05-25
2 1 2022-05-25
3 1 2022-05-20
4 1 2021-09-05
5 2 2022-06-02
6 3 2021-03-01
7 3 2021-02-01
import pandas as pd
d = {'customerId': [1, 1, 1, 1, 1, 2, 3, 3],
'fromDate': ["2022-06-01", "2022-05-25", "2022-05-25", "2022-05-20", "2021-09-05",
"2022-06-02", "2021-03-01", "2021-02-01"]
}
df = pd.DataFrame(data=d)
print(df)
df_new = df.groupby(['customerId', 'fromDate'], as_index=False)['fromDate'].count()
df_new['count_from_date'] = df_new['fromDate']
df = df.merge(df_new['count_from_date'], how='inner', left_index=True, right_index=True)
(df.set_index(['fromDate']).sort_index().groupby('customerId').apply(lambda s: s['count_from_date'].rolling('180D').sum())- 1) / df.set_index(['customerId', 'fromDate'])['count_from_date']
[OUT] KeyError: 'count_from_date'
我想要的是
customerId fromDate occur_last_6_months
0 1 2022-06-01 3 # 2022-05-25, 2022-05-20, 2022-05-20 = 3
1 1 2022-05-25 1 # 2022-05-20 = 1
2 1 2022-05-25 1 # 2022-05-20 = 1
3 1 2022-05-20 0 # No in the last 6 months
4 1 2021-09-05 0 # No in the last 6 months
5 2 2022-06-02 0 # No in the last 6 months
6 3 2021-03-01 1 # 2021-02-01 = 1
7 3 2021-02-01 0 # No in the last 6 months
uj5u.com熱心網友回復:
如果可能的話,將重復值(如第二行和第三行計數匹配掩碼中的值)sum僅與 True 值相加:
df["fromDate"] = pd.to_datetime(df["fromDate"], errors="coerce")
df["last_month"] = df["fromDate"] - pd.offsets.DateOffset(months=6)
def f(x):
d1 = x["fromDate"].to_numpy()
d2 = x["last_month"].to_numpy()
x['occur_last_6_months'] = ((d2[:, None]<= d1) & (d1 <= d1[:, None])).sum(axis=1) - 1
return x
df = df.groupby('customerId').apply(f)
print(df)
customerId fromDate last_month occur_last_6_months
0 1 2022-06-01 2021-12-01 3
1 1 2022-05-25 2021-11-25 2
2 1 2022-05-25 2021-11-25 2
3 1 2022-05-20 2021-11-20 0
4 1 2021-09-05 2021-03-05 0
5 2 2022-06-02 2021-12-02 0
6 3 2021-03-01 2020-09-01 1
7 3 2021-02-01 2020-08-01 0
如果需要減去每個重復日期的所有計數,而不是減去1使用:GroupBy.transformsize
df["fromDate"] = pd.to_datetime(df["fromDate"], errors="coerce")
df["last_month"] = df["fromDate"] - pd.offsets.DateOffset(months=6)
def f(x):
d1 = x["fromDate"].to_numpy()
d2 = x["last_month"].to_numpy()
x['occur_last_6_months'] = ((d2[:, None]<= d1) & (d1 <= d1[:, None])).sum(axis=1)
return x
df = df.groupby('customerId').apply(f)
s = df.groupby(['customerId', 'fromDate'])['customerId'].transform('size')
df['occur_last_6_months'] -= s
print(df)
customerId fromDate last_month occur_last_6_months
0 1 2022-06-01 2021-12-01 3
1 1 2022-05-25 2021-11-25 1
2 1 2022-05-25 2021-11-25 1
3 1 2022-05-20 2021-11-20 0
4 1 2021-09-05 2021-03-05 0
5 2 2022-06-02 2021-12-02 0
6 3 2021-03-01 2020-09-01 1
7 3 2021-02-01 2020-08-01 0
轉載請註明出處,本文鏈接:https://www.uj5u.com/yidong/488341.html
上一篇:Python中的多列插值
