How to remove all rows for a customer whose dates are not continuous in a pandas DataFrame

date      consumption  customer_id
2018-01-01     12             111
2018-01-02     12             111
*2018-01-03*   14             111   
*2018-01-05*   12             111
2018-01-06     45             111
2018-01-07     34             111 
2018-01-01     23             112 
2018-01-02     23             112
2018-01-03     45             112
2018-01-04     34             112
2018-01-05     23             112
2018-01-06     34             112
2018-01-01     23             113
2018-01-02     34             113
2018-01-03     45             113
2018-01-04     34             113

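The values for customer 111 are not continuous: the row for 2018-01-04 is missing. I therefore want to drop every row for customer 111 from the pandas DataFrame.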

date      consumption  customer_id
2018-01-01     23             112 
2018-01-02     23             112
2018-01-03     45             112
2018-01-04     34             112
2018-01-05     23             112
2018-01-06     34             112
2018-01-01     23             113
2018-01-02     34             113
2018-01-03     45             113
2018-01-04     34             113

This is the result I want. How can I do this in pandas?

Answer from a uj5u.com user:

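You can compute the differences between consecutive dates for each customer and check whether any of them is greater than one day: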

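import pandas as pd

# For each customer, flag whether any step between consecutive dates exceeds one day
drop = (pd.to_datetime(df['date'])
          .groupby(df['customer_id'])
          .apply(lambda s: s.diff().gt('1d').any())
       )

# Keep only the customers that were not flagged
out = df[df['customer_id'].isin(drop[~drop].index)]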
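
To try this approach end to end, here is a minimal, self-contained sketch that rebuilds the sample data from the question. The names `has_gap` and `keep_ids` are my own, and `pd.Timedelta("1D")` stands in for the string `'1d'` only to make the comparison explicit.

```python
import pandas as pd

# Sample data copied from the question; customer 111 has no row for 2018-01-04.
df = pd.DataFrame({
    "date": ["2018-01-01", "2018-01-02", "2018-01-03", "2018-01-05",
             "2018-01-06", "2018-01-07",
             "2018-01-01", "2018-01-02", "2018-01-03", "2018-01-04",
             "2018-01-05", "2018-01-06",
             "2018-01-01", "2018-01-02", "2018-01-03", "2018-01-04"],
    "consumption": [12, 12, 14, 12, 45, 34,
                    23, 23, 45, 34, 23, 34,
                    23, 34, 45, 34],
    "customer_id": [111] * 6 + [112] * 6 + [113] * 4,
})

dates = pd.to_datetime(df["date"])

# True for each customer that has at least one step longer than one day.
has_gap = dates.groupby(df["customer_id"]).apply(
    lambda s: s.diff().gt(pd.Timedelta("1D")).any()
)

# Keep only the customers without such a gap.
keep_ids = has_gap[~has_gap].index
out = df[df["customer_id"].isin(keep_ids)]
print(out)
```

The first row of each customer has no predecessor, so its difference is `NaT`, which never compares as greater than one day and therefore does not trigger the flag.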

Or with groupby.filter:

df['date'] = pd.to_datetime(df['date'])

# Keep a customer's rows only if no step between consecutive dates exceeds one day
out = (df.groupby('customer_id')
         .filter(lambda d: not d['date'].diff().gt('1d').any())
       )

Output:

          date  consumption  customer_id
6   2018-01-01           23          112
7   2018-01-02           23          112
8   2018-01-03           45          112
9   2018-01-04           34          112
10  2018-01-05           23          112
11  2018-01-06           34          112
12  2018-01-01           23          113
13  2018-01-02           34          113
14  2018-01-03           45          113
15  2018-01-04           34          113
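
As a variation that avoids a Python lambda, the same check can probably be written with a grouped `diff()` followed by `isin`. This is not taken from the answer; it is a sketch on a shortened, hypothetical sample, and the names `step` and `bad_ids` are illustrative.

```python
import pandas as pd

# Hypothetical sample: customer 1 skips 2018-01-03, customer 2 has no gap.
df = pd.DataFrame({
    "date": pd.to_datetime(["2018-01-01", "2018-01-02", "2018-01-04",
                            "2018-01-01", "2018-01-02", "2018-01-03"]),
    "consumption": [12, 12, 14, 23, 23, 45],
    "customer_id": [1, 1, 1, 2, 2, 2],
})

# Difference to the previous row of the same customer (NaT for each first row).
step = df.groupby("customer_id")["date"].diff()

# Customers owning at least one row that follows a gap longer than one day.
bad_ids = df.loc[step > pd.Timedelta("1D"), "customer_id"].unique()

# Drop every row belonging to those customers.
out = df[~df["customer_id"].isin(bad_ids)]
print(out)
```

Like the versions above, this assumes the rows of each customer are already sorted by date.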

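If your dates are not necessarily increasing, also check that they never go back in time, by requiring every step to be exactly one day: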

df['date'] = pd.to_datetime(df['date'])

# Keep a customer only if every step after the first row is exactly one day
out = (df.groupby('customer_id')
         .filter(lambda d: d['date'].diff().iloc[1:].eq('1d').all())
       )
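
To see why the stricter check matters, the sketch below compares both checks on hypothetical, unsorted data. Customer 1 jumps back in time, which hides a real gap from the "greater than one day" test; customer 3 is complete but stored out of order. The helper names and the final `sort_values` step are my own additions, not part of the answer.

```python
import pandas as pd

# Hypothetical data:
#   customer 1: 05, 01, 02 (03 and 04 are missing, order goes backwards)
#   customer 2: 01, 02, 03 (complete and ordered)
#   customer 3: 03, 01, 02 (complete but unordered)
df = pd.DataFrame({
    "date": pd.to_datetime(["2018-01-05", "2018-01-01", "2018-01-02",
                            "2018-01-01", "2018-01-02", "2018-01-03",
                            "2018-01-03", "2018-01-01", "2018-01-02"]),
    "customer_id": [1, 1, 1, 2, 2, 2, 3, 3, 3],
})

one_day = pd.Timedelta("1D")

def no_gap_over_one_day(d):
    # Lenient check: only steps larger than one day are rejected.
    return not d["date"].diff().gt(one_day).any()

def every_step_is_one_day(d):
    # Strict check: every step after the first row must be exactly one day.
    return bool(d["date"].diff().iloc[1:].eq(one_day).all())

lenient = df.groupby("customer_id").filter(no_gap_over_one_day)
strict = df.groupby("customer_id").filter(every_step_is_one_day)

# Sorting first lets the strict check accept complete but unordered customers.
sorted_df = df.sort_values(["customer_id", "date"])
strict_sorted = sorted_df.groupby("customer_id").filter(every_step_is_one_day)

print(sorted(lenient["customer_id"].unique()))
print(sorted(strict["customer_id"].unique()))
print(sorted(strict_sorted["customer_id"].unique()))
```

On this data the lenient check keeps all three customers, the strict check keeps only customer 2, and sorting before the strict check keeps customers 2 and 3. Note that the strict check also rejects a customer with duplicate dates, since a step of zero days is not equal to one day.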

Please credit the source when reposting. Link to this article: https://www.uj5u.com/gongcheng/535915.html
