I would like a pandas DataFrame column that
- counts the number of times 'outcome2' is observed in 'value', in 'datetime' order
- starting from the second observation of 'outcome2'
- per 'ID' (or df.index).
import pandas as pd
from io import StringIO
import datetime
txt= """
ID,datetime,value
A,12/10/2022 10:00:00,outcome1
A,12/10/2022 11:15:10,outcome2
A,14/10/2022 15:30:30,outcome1
B,11/10/2022 11:30:22,outcome1
B,15/10/2022 22:44:11,outcome2
B,15/10/2022 23:30:22,outcome3
B,15/10/2022 23:31:11,outcome2
"""
df = pd.read_csv(StringIO(txt),\
parse_dates=[1],\
dayfirst=True)\
.assign(id_index= lambda x_df: x_df\
.groupby('ID', sort=False).ngroup())\
.set_index("id_index")\
.rename_axis(index=None)
df = df.assign(value_test = lambda df: df['value']=='outcome2',\
value_cumsum= lambda df: df.groupby('ID', sort=False)['value_test'].cumsum())
   ID             datetime     value  value_test  value_cumsum
0   A  2022-10-12 10:00:00  outcome1       False             0
0   A  2022-10-12 11:15:10  outcome2        True             1
0   A  2022-10-14 15:30:30  outcome1       False             1
1   B  2022-10-11 11:30:22  outcome1       False             0
1   B  2022-10-15 22:44:11  outcome2        True             1
1   B  2022-10-15 23:30:22  outcome3       False             1
1   B  2022-10-15 23:31:11  outcome2        True             2
I tried to assign a third variable to df using an if statement inside a lambda function. It fails in the same way others have experienced:
df = df.assign(value_test = lambda df: df['value']=='outcome2',\
value_cumsum = lambda df: df.groupby('ID', sort=False)['value_test'].cumsum(),\
outcome2 = lambda df: 0 if df[df[value_cumsum]==1] or df[df[value_cumsum]==0]\
else df[value_cumsum]-1 if df[df[value_cumsum] > 1]
ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
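The error comes from Python's `if` and `or`, which need a single True/False, while `df[...]` is a whole DataFrame (the column name would also have to be quoted as 'value_cumsum'). A minimal sketch of an element-wise alternative, assuming `numpy.where` is acceptable here; the `demo` frame and its values are made up for illustration and are not the question's data:

```python
import numpy as np
import pandas as pd

# Hypothetical illustration data (not the question's data).
demo = pd.DataFrame({
    "ID": ["A", "A", "B", "B", "B"],
    "value": ["outcome2", "outcome1", "outcome2", "outcome2", "outcome2"],
})

demo = demo.assign(
    # Boolean flag: is this row an 'outcome2' observation?
    value_test=lambda d: d["value"].eq("outcome2"),
    # Running count of 'outcome2' within each ID.
    value_cumsum=lambda d: d.groupby("ID", sort=False)["value_test"].cumsum(),
    # Element-wise condition instead of a Python if/else:
    # 0 while the running count is 0 or 1, otherwise count - 1.
    outcome2=lambda d: np.where(d["value_cumsum"] <= 1, 0, d["value_cumsum"] - 1),
)

print(demo)
```

Later keyword arguments of `assign` can refer to columns created earlier in the same call, which is why `value_cumsum` is available to the last lambda.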
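I only need the cumulative sum (running total) of the count of 'outcome2' in 'value', starting from the second observation in each group.
Any suggestions?
Is it possible to do this with a lambda, or without the intermediate steps value_test and value_cumsum?
Desired df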
   ID             datetime     value  outcome2
0   A  2022-10-12 10:00:00  outcome1         0
0   A  2022-10-12 11:15:10  outcome2         0
0   A  2022-10-14 15:30:30  outcome1         0
1   B  2022-10-11 11:30:22  outcome1         0
1   B  2022-10-15 22:44:11  outcome2         0
1   B  2022-10-15 23:30:22  outcome3         0
1   B  2022-10-15 23:31:11  outcome2         1
Answer from a uj5u.com user:
You can use:
df['value_cumsum'] = (df.groupby('ID')['value_test']
.cumsum().sub(1).where(df['value_test'], 0)
)
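Or, if you also want to label the rows where value_test is False (they then carry the running count forward instead of being set to 0):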
df['value_cumsum'] = (df.groupby('ID')['value_test']
.cumsum().sub(1).clip(lower=0)
)
Output:
   ID             datetime     value  value_test  value_cumsum
0   A  2022-10-12 10:00:00  outcome1       False             0
0   A  2022-10-12 11:15:10  outcome2        True             0
0   A  2022-10-14 15:30:30  outcome1       False             0
1   B  2022-10-11 11:30:22  outcome1       False             0
1   B  2022-10-15 22:44:11  outcome2        True             0
1   B  2022-10-15 23:30:22  outcome3       False             0
1   B  2022-10-15 23:31:11  outcome2        True             1
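On the sample data the `where` and `clip` variants give the same result. They only differ when a non-'outcome2' row follows the second 'outcome2' in a group. A small sketch of that difference, using a made-up `demo` frame (hypothetical data, not from the question):

```python
import pandas as pd

# Hypothetical single-group data: a non-matching row comes after the second 'outcome2'.
demo = pd.DataFrame({
    "ID": ["B", "B", "B", "B"],
    "value": ["outcome2", "outcome2", "outcome1", "outcome2"],
})

is_outcome2 = demo["value"].eq("outcome2")
# Running count of 'outcome2' per ID, shifted down by one.
running = is_outcome2.groupby(demo["ID"]).cumsum().sub(1)

# Variant 1: rows that are not 'outcome2' are forced to 0.
masked = running.where(is_outcome2, 0)
# Variant 2: rows that are not 'outcome2' keep the running count; only negatives become 0.
carried = running.clip(lower=0)

print(masked.tolist())   # [0, 1, 0, 2]
print(carried.tolist())  # [0, 1, 1, 2]
```

Which one fits depends on whether the column should read as "count so far" on every row (`clip`) or only on the 'outcome2' rows (`where`).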
Without the intermediate column:
df['value_cumsum'] = (df['value'].eq('outcome2')
.groupby(df['ID'])
.cumsum().sub(1).clip(lower=0)
)
Output:
   ID             datetime     value  value_cumsum
0   A  2022-10-12 10:00:00  outcome1             0
0   A  2022-10-12 11:15:10  outcome2             0
0   A  2022-10-14 15:30:30  outcome1             0
1   B  2022-10-11 11:30:22  outcome1             0
1   B  2022-10-15 22:44:11  outcome2             0
1   B  2022-10-15 23:30:22  outcome3             0
1   B  2022-10-15 23:31:11  outcome2             1
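Regarding the question about a lambda: the same expression can probably be placed inside a single `assign` call. The sketch below uses the question's sample data and names the column 'outcome2' as in the desired df. The `sort_values` step is an assumption added here so that the count follows 'datetime' order within each ID; the sample rows are already in that order, and sorting groups the rows of each ID together.

```python
import pandas as pd
from io import StringIO

txt = """ID,datetime,value
A,12/10/2022 10:00:00,outcome1
A,12/10/2022 11:15:10,outcome2
A,14/10/2022 15:30:30,outcome1
B,11/10/2022 11:30:22,outcome1
B,15/10/2022 22:44:11,outcome2
B,15/10/2022 23:30:22,outcome3
B,15/10/2022 23:31:11,outcome2
"""

df = pd.read_csv(StringIO(txt))
# Explicit day-first format, so parsing does not depend on inference.
df["datetime"] = pd.to_datetime(df["datetime"], format="%d/%m/%Y %H:%M:%S")

df = (
    # Assumption: order chronologically within each ID before counting.
    df.sort_values(["ID", "datetime"], kind="stable")
      .assign(
          # Running count of 'outcome2' per ID, starting at the second observation.
          outcome2=lambda d: d["value"].eq("outcome2")
                              .groupby(d["ID"]).cumsum()
                              .sub(1).clip(lower=0)
      )
)

print(df)
```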
