I have a dataframe of clients, identified by CLIENT_ID, that looks like this:
| CLIENT_ID | CURRENT_DATE_STATUS | STATUS |
|---|---|---|
| 10002 | 2017-07-21 | STARTED |
| 10002 | 2017-07-21 | STARTED |
| 10002 | 2018-07-01 | CHURNED |
| 10002 | 2018-07-01 | CHURNED |
| 10002 | 2019-01-01 | RESTARTED |
| 11811 | 2019-08-15 | STARTED |
| 11811 | 2019-08-15 | STARTED |
| 11811 | 2019-12-31 | RESTARTED |
| 22101 | 2020-03-11 | STARTED |
| 22101 | 2020-03-11 | STARTED |
| 22101 | 2020-03-11 | STARTED |
| 22101 | 2020-11-01 | CHURNED |
| 22300 | 2018-05-06 | STARTED |
| 22300 | 2018-05-06 | STARTED |
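The dataframe is sorted by CLIENT_ID and CURRENT_DATE_STATUS. How can I create an indicator column holding 1 or 0 that is set to 1:
- if the STATUS has changed from the previous entry to CHURNED or RESTARTED, for each CLIENT_ID.
The resulting dataframe would look like this: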
| CLIENT_ID | CURRENT_DATE_STATUS | STATUS | STOPPED |
|---|---|---|---|
| 10002 | 2017-07-21 | STARTED | 0 |
| 10002 | 2017-07-21 | STARTED | 0 |
| 10002 | 2018-07-01 | CHURNED | 1 |
| 10002 | 2018-07-01 | CHURNED | 0 |
| 10002 | 2019-01-01 | RESTARTED | 1 |
| 11811 | 2019-08-15 | STARTED | 0 |
| 11811 | 2019-08-15 | STARTED | 0 |
| 11811 | 2019-12-31 | RESTARTED | 1 |
| 22101 | 2020-03-11 | STARTED | 0 |
| 22101 | 2020-03-11 | STARTED | 0 |
| 22101 | 2020-03-11 | STARTED | 0 |
| 22101 | 2020-11-01 | CHURNED | 1 |
| 22300 | 2018-05-06 | STARTED | 0 |
| 22300 | 2018-05-06 | STARTED | 0 |
Here is the code that generates the dataframe:
import pandas as pd
data = {'CLIENT_ID':[10002,10002,10002,10002,10002,11811,11811,11811,22101,22101,22101,22101,22300,22300],
'CURRENT_DATE_STATUS':['2017-07-21','2017-07-21','2018-07-01','2018-07-01','2019-07-01','2019-08-15','2019-08-15','2019-12-31','2020-03-11','2020-03-11','2020-03-11','2020-11-01','2018-05-06','2018-05-06'],
'STATUS':['STARTED','STARTED','CHURNED','CHURNED','RESTARTED','STARTED','STARTED','RESTARTED','STARTED','STARTED','STARTED','CHURNED','STARTED','STARTED']}
df = pd.DataFrame(data)
Answer:
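You can test the current value for equality with Series.eq and test the previous value within each group, obtained with DataFrameGroupBy.shift, for inequality with Series.ne. Chain the two conditions with & for bitwise AND, combine the resulting masks with | for bitwise OR, and finally convert the result to integers: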
s = df.groupby('CLIENT_ID')['STATUS'].shift()
m1 = df['STATUS'].eq('RESTARTED') & s.ne('RESTARTED')
m2 = df['STATUS'].eq('CHURNED') & s.ne('CHURNED')
df['STOPPED'] = (m1 | m2).astype(int)
print(df)
CLIENT_ID CURRENT_DATE_STATUS STATUS STOPPED
0 10002 2017-07-21 STARTED 0
1 10002 2017-07-21 STARTED 0
2 10002 2018-07-01 CHURNED 1
3 10002 2018-07-01 CHURNED 0
4 10002 2019-07-01 RESTARTED 1
5 11811 2019-08-15 STARTED 0
6 11811 2019-08-15 STARTED 0
7 11811 2019-12-31 RESTARTED 1
8 22101 2020-03-11 STARTED 0
9 22101 2020-03-11 STARTED 0
10 22101 2020-03-11 STARTED 0
11 22101 2020-11-01 CHURNED 1
12 22300 2018-05-06 STARTED 0
13 22300 2018-05-06 STARTED 0
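To make the masks easier to follow, the intermediate Series can be laid side by side. This is only a sketch: it rebuilds just the five rows of client 10002 from the question so that it runs on its own, and the debug frame with its PREV, m1 and m2 columns is an illustrative name choice, not part of the original answer.

```python
import pandas as pd

# Only client 10002's rows from the question, so the snippet is self-contained.
df = pd.DataFrame({
    'CLIENT_ID': [10002] * 5,
    'STATUS': ['STARTED', 'STARTED', 'CHURNED', 'CHURNED', 'RESTARTED'],
})

# Previous STATUS within each client; the first row of a client has no
# predecessor, so it becomes NaN.
s = df.groupby('CLIENT_ID')['STATUS'].shift()

# Row is RESTARTED (or CHURNED) and the previous row was something else.
m1 = df['STATUS'].eq('RESTARTED') & s.ne('RESTARTED')
m2 = df['STATUS'].eq('CHURNED') & s.ne('CHURNED')

# Hypothetical helper frame for inspection only.
debug = df.assign(PREV=s, m1=m1, m2=m2, STOPPED=(m1 | m2).astype(int))
print(debug)
```

Only the first CHURNED row and the RESTARTED row end up flagged, because the second CHURNED row has CHURNED as its previous value.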
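Another solution is to compare each value with the previous (shifted) value in its group for inequality, then chain with & for bitwise AND against a mask, built with Series.isin, of the rows whose STATUS is in the list: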
m3 = df.groupby('CLIENT_ID')['STATUS'].shift().ne(df['STATUS'])
m4 = df['STATUS'].isin(["CHURNED", "RESTARTED"])
df['STOPPED'] = (m3 & m4).astype(int)
print(df)
CLIENT_ID CURRENT_DATE_STATUS STATUS STOPPED
0 10002 2017-07-21 STARTED 0
1 10002 2017-07-21 STARTED 0
2 10002 2018-07-01 CHURNED 1
3 10002 2018-07-01 CHURNED 0
4 10002 2019-07-01 RESTARTED 1
5 11811 2019-08-15 STARTED 0
6 11811 2019-08-15 STARTED 0
7 11811 2019-12-31 RESTARTED 1
8 22101 2020-03-11 STARTED 0
9 22101 2020-03-11 STARTED 0
10 22101 2020-03-11 STARTED 0
11 22101 2020-11-01 CHURNED 1
12 22300 2018-05-06 STARTED 0
13 22300 2018-05-06 STARTED 0
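The sample data does not show why the shift is done per CLIENT_ID, since every client there begins with STARTED. A small sketch with hypothetical toy data (not taken from the question), in which one client starts with the same STATUS that the previous client ends with, may make the difference visible. The names toy, STOPPED_grouped and STOPPED_plain are assumptions made for this illustration.

```python
import pandas as pd

# Hypothetical toy data: client 11811 starts with the same STATUS
# that client 10002 ends with.
toy = pd.DataFrame({
    'CLIENT_ID': [10002, 10002, 11811, 11811],
    'STATUS': ['CHURNED', 'RESTARTED', 'RESTARTED', 'RESTARTED'],
})

# Previous STATUS within each client (NaN on each client's first row).
prev_grouped = toy.groupby('CLIENT_ID')['STATUS'].shift()
# Previous STATUS ignoring client boundaries.
prev_plain = toy['STATUS'].shift()

is_stop = toy['STATUS'].isin(['CHURNED', 'RESTARTED'])
toy['STOPPED_grouped'] = (is_stop & prev_grouped.ne(toy['STATUS'])).astype(int)
toy['STOPPED_plain'] = (is_stop & prev_plain.ne(toy['STATUS'])).astype(int)
print(toy)
```

With the grouped shift, the first row of client 11811 is compared against NaN and is flagged. With the plain shift it is compared against the last row of client 10002 and is missed. Note that under either solution above, a client whose very first row is CHURNED or RESTARTED gets a 1 on that row; whether that is desired depends on the data.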
