I want to keep the latest row for each ID, plus any rows that match certain column values. Sample input:
ID Timestamp Survey Outcome
12 11/26/2021 INCOMPLETE Survey
95 11/26/2021 INCOMPLETE Survey
95 11/27/2021 COMPLETE Survey
95 11/28/2021 RANG-But did not connect
12 11/29/2021 COMPLETE Survey
24 11/26/2021 RANG-But did not connect
24 11/27/2021 INCOMPLETE Survey
95 11/28/2021 RANG-But did not connect
24 11/28/2021 INCOMPLETE Survey
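Here ID 12 has two rows, so I keep the latest one (11/29/2021). For ID 95, however, once the survey is complete it should not get any other outcome, such as RANG-But did not connect. So I want to keep the row with the latest timestamp for each ID and, for IDs whose survey was completed but whose later rows show it as incomplete or not connected, also keep the COMPLETE Survey row and every row that comes after it.
So my expected output would be: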
ID Timestamp Survey Outcome
95 11/27/2021 COMPLETE Survey
95 11/28/2021 RANG-But did not connect
12 11/29/2021 COMPLETE Survey
95 11/28/2021 RANG-But did not connect
24 11/28/2021 INCOMPLETE Survey
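To make the question reproducible, the sample input can be built as a DataFrame. This is only a minimal sketch: the question does not show how the data is loaded, so the construction below is an assumption. The variable name `df` matches the one used in the answers.

```python
import pandas as pd

# Sample input from the question, in its original row order.
# Building it by hand is an assumption; the real data source is not shown.
df = pd.DataFrame({
    "ID": [12, 95, 95, 95, 12, 24, 24, 95, 24],
    "Timestamp": [
        "11/26/2021", "11/26/2021", "11/27/2021", "11/28/2021", "11/29/2021",
        "11/26/2021", "11/27/2021", "11/28/2021", "11/28/2021",
    ],
    "Survey Outcome": [
        "INCOMPLETE Survey", "INCOMPLETE Survey", "COMPLETE Survey",
        "RANG-But did not connect", "COMPLETE Survey",
        "RANG-But did not connect", "INCOMPLETE Survey",
        "RANG-But did not connect", "INCOMPLETE Survey",
    ],
})
print(df)
```

The timestamps are left as strings here, since each answer converts them with `pd.to_datetime` itself.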
Answer from a uj5u.com user:
You can use:
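import pandas as pd

df['Timestamp'] = pd.to_datetime(df['Timestamp'])
# sort_values returns a new DataFrame, so assign the result back to df
df = df.sort_values(by=['ID', 'Timestamp']).reset_index(drop=True)
# per ID: keep everything from the first COMPLETE Survey row onward,
# otherwise keep only the row(s) with the latest timestamp
df = df.groupby('ID').apply(lambda x: x.loc[x[x['Survey Outcome'] == 'COMPLETE Survey'].index[0]: ] if
    x['Survey Outcome'].isin(['COMPLETE Survey']).any() else x.loc[x['Timestamp'].idxmax():]).reset_index(drop=True)
print(df)
Output: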
ID Timestamp Survey Outcome
0 12 2021-11-29 COMPLETE Survey
1 24 2021-11-28 INCOMPLETE Survey
2 95 2021-11-27 COMPLETE Survey
3 95 2021-11-28 RANG-But did not connect
4 95 2021-11-28 RANG-But did not connect
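The per-ID rule packed into the lambda may be easier to follow as an explicit loop. The sketch below applies the same rule without `groupby(...).apply`; the helper name `keep_rows` and the hand-built sample frame are hypothetical and not part of the answer.

```python
import pandas as pd

# Sample data from the question (rebuilt here so the sketch is self-contained).
df = pd.DataFrame({
    "ID": [12, 95, 95, 95, 12, 24, 24, 95, 24],
    "Timestamp": [
        "11/26/2021", "11/26/2021", "11/27/2021", "11/28/2021", "11/29/2021",
        "11/26/2021", "11/27/2021", "11/28/2021", "11/28/2021",
    ],
    "Survey Outcome": [
        "INCOMPLETE Survey", "INCOMPLETE Survey", "COMPLETE Survey",
        "RANG-But did not connect", "COMPLETE Survey",
        "RANG-But did not connect", "INCOMPLETE Survey",
        "RANG-But did not connect", "INCOMPLETE Survey",
    ],
})
df["Timestamp"] = pd.to_datetime(df["Timestamp"], format="%m/%d/%Y")
df = df.sort_values(["ID", "Timestamp"]).reset_index(drop=True)


def keep_rows(group):
    """Hypothetical helper: select the rows to keep for one ID."""
    complete = group["Survey Outcome"].eq("COMPLETE Survey")
    if complete.any():
        # idxmax on a boolean Series gives the label of the first True,
        # so this slice starts at the first COMPLETE Survey row.
        return group.loc[complete.idxmax():]
    # No completed survey: keep from the latest timestamp onward.
    return group.loc[group["Timestamp"].idxmax():]


# Apply the rule to each ID and stitch the pieces back together.
result = pd.concat(
    [keep_rows(group) for _, group in df.groupby("ID")],
    ignore_index=True,
)
print(result)
```

Because the frame is sorted by `ID` and `Timestamp` first, a label slice starting at the first matching row covers exactly the later rows of that ID.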
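Answer from a uj5u.com user:
First sort by ID and Timestamp with DataFrame.sort_values, then use GroupBy.cummax to select all rows from COMPLETE Survey onward, and add the last row of each ID that was not already matched (checked with isin), obtained with DataFrame.drop_duplicates: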
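import pandas as pd

df['Timestamp'] = pd.to_datetime(df['Timestamp'])
df = df.sort_values(['ID','Timestamp'])
# True on each COMPLETE Survey row
m = df['Survey Outcome'].eq('COMPLETE Survey')
# per ID, True from the first COMPLETE Survey row onward
df1 = df[m.groupby(df['ID']).cummax()]
# last (latest) row of every ID
df2 = df.drop_duplicates('ID', keep='last')
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
df = pd.concat([df1, df2[~df2['ID'].isin(df1['ID'])]]).sort_index()
print(df)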
ID Timestamp Survey Outcome
2 95 2021-11-27 COMPLETE Survey
3 95 2021-11-28 RANG-But did not connect
4 12 2021-11-29 COMPLETE Survey
7 95 2021-11-28 RANG-But did not connect
8 24 2021-11-28 INCOMPLETE Survey
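The key step in this answer is the grouped cumulative maximum over a boolean mask. The sketch below shows it in isolation on a small series; the names `ids`, `outcomes` and `mask` are made up for illustration.

```python
import pandas as pd

# Two IDs, already sorted by ID and time (values are illustrative).
ids = pd.Series([95, 95, 95, 24, 24])
outcomes = pd.Series([
    "INCOMPLETE Survey",
    "COMPLETE Survey",
    "RANG-But did not connect",
    "RANG-But did not connect",
    "INCOMPLETE Survey",
])

# True only on the COMPLETE Survey rows.
m = outcomes.eq("COMPLETE Survey")

# The cumulative max restarts for each ID, so once a True appears
# every later row of the same ID is True as well.
mask = m.groupby(ids).cummax()
print(mask)
```

For ID 95 the mask turns on at the COMPLETE Survey row and stays on, while ID 24 never completes and stays off, which is why the answer adds the last row of such IDs separately.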
