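I have a DataFrame with a datetime index. I need to keep only the dates that are adjacent to at least 2 other days (that is, the dates that belong to a run of at least 3 consecutive days). Please share your solution.
For example: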
Date
2021.11.08 #<-
2021.11.09 #<-
2021.11.10 #<-
2021.11.12
2021.11.13
2021.11.16 #<-
2021.11.17 #<-
2021.11.18 #<-
2021.11.19 #<-
2021.11.22
2021.11.23
#<- marks the rows that should be selected
Reply from a uj5u.com user:
Try groupby:
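import pandas as pd
# convert to datetime
df["Date"] = pd.to_datetime(df["Date"], format="%Y.%m.%d")
# True where a row is exactly 1 day after the previous row (the first row is False)
adjacent = df["Date"].diff().dt.days.eq(1)
# label each run of identical values in `adjacent`
run_id = adjacent.ne(adjacent.shift()).cumsum()
# 3 consecutive days show up as 2 consecutive True values, so require runs of at least 2
mask = adjacent & adjacent.groupby(run_id).transform("size").ge(2)
# also keep the first day of each run, which sits one row before the first True
output = df[mask | mask.shift(-1, fill_value=False)]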
>>> output
Date
0 2021-11-08
1 2021-11-09
2 2021-11-10
5 2021-11-16
6 2021-11-17
7 2021-11-18
8 2021-11-19
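For anyone who wants to run this end to end, here is a minimal self-contained sketch. The sample frame is rebuilt from the dates in the question. The helper name `keep_runs` and its `min_len` parameter are made up for illustration and are not part of pandas; the helper flags the start of each run and measures run sizes, which is one possible way to express the same idea.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({"Date": [
    "2021.11.08", "2021.11.09", "2021.11.10", "2021.11.12", "2021.11.13",
    "2021.11.16", "2021.11.17", "2021.11.18", "2021.11.19",
    "2021.11.22", "2021.11.23",
]})


def keep_runs(dates: pd.Series, min_len: int = 3) -> pd.Series:
    """Hypothetical helper: mask of rows in runs of >= min_len consecutive days."""
    dates = pd.to_datetime(dates, format="%Y.%m.%d")
    # a new run starts wherever the gap to the previous row is not exactly 1 day
    starts = dates.diff().ne(pd.Timedelta(days=1))
    run_id = starts.cumsum()
    # size of the run each row belongs to, compared against the minimum length
    return run_id.groupby(run_id).transform("size").ge(min_len)


kept = df[keep_runs(df["Date"])]
print(kept.index.tolist())  # [0, 1, 2, 5, 6, 7, 8]
```

Passing `min_len=2` would also keep the two-day pairs such as 2021.11.12 and 2021.11.13.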
Reply from a uj5u.com user:
With groupby and filter:
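# assumes df["Date"] is already datetime64 (e.g. converted with pd.to_datetime)
oneday = pd.Timedelta(days=1)
# gap to the previous row; bfill gives the first row the same gap as the second
diff = df.Date.diff().bfill()
# a new group starts wherever the gap is not exactly one day
df.groupby(
    diff.ne(oneday).cumsum()
).filter(lambda d: len(d) > 2)  # keep groups of 3 or more rows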
Date
0 2021-11-08
1 2021-11-09
2 2021-11-10
5 2021-11-16
6 2021-11-17
7 2021-11-18
8 2021-11-19
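The question says the dates are in the index rather than in a column, so here is a rough sketch of the same groupby/filter idea applied to a `DatetimeIndex`. The frame `df_indexed` and its placeholder column `value` are assumptions made only for this example.

```python
import pandas as pd

# Assumed setup: the dates from the question, stored in the index
idx = pd.to_datetime([
    "2021.11.08", "2021.11.09", "2021.11.10", "2021.11.12", "2021.11.13",
    "2021.11.16", "2021.11.17", "2021.11.18", "2021.11.19",
    "2021.11.22", "2021.11.23",
], format="%Y.%m.%d")
df_indexed = pd.DataFrame({"value": range(len(idx))}, index=idx)  # placeholder column

# go through a Series so that .diff() is available regardless of pandas version
gaps = df_indexed.index.to_series().diff()
# a new group starts wherever the gap to the previous date is not exactly one day
run_id = gaps.ne(pd.Timedelta(days=1)).cumsum()
# keep only groups with 3 or more rows
result = df_indexed.groupby(run_id).filter(lambda d: len(d) > 2)
print(result.index.strftime("%Y.%m.%d").tolist())
```

This relies on the index being sorted and free of duplicate dates; sorting first with `sort_index()` would be a reasonable precaution on real data.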
Reply from a uj5u.com user:
Using a boolean mask for slicing:
N=3
# find start of groups
m = ~pd.to_datetime(df['Date']).diff().eq('1d')
# check size and keep if ≥ N
df[m.groupby(m.cumsum()).transform('size').ge(N)]
Output:
Date
0 2021.11.08
1 2021.11.09
2 2021.11.10
5 2021.11.16
6 2021.11.17
7 2021.11.18
8 2021.11.19
Keeping every other element of each run:
N = 3
m = ~pd.to_datetime(df['Date']).diff().eq('1d')
g = m.groupby(m.cumsum())
m1 = g.transform('size').ge(N)
m2 = g.cumcount().mod(2) # odd lines
df[m1&m2]
Output:
Date
1 2021.11.09
6 2021.11.17
8 2021.11.19
Note: if you only want the second element of each run rather than every second element, use eq(1) instead of mod(2).
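As a quick check of the eq(1) variant, here is a small self-contained sketch on the dates from the question. The variable name `second_only` is made up for this example.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({"Date": [
    "2021.11.08", "2021.11.09", "2021.11.10", "2021.11.12", "2021.11.13",
    "2021.11.16", "2021.11.17", "2021.11.18", "2021.11.19",
    "2021.11.22", "2021.11.23",
]})

N = 3
dates = pd.to_datetime(df["Date"], format="%Y.%m.%d")
# True at the first row of every run of consecutive days
m = ~dates.diff().eq(pd.Timedelta(days=1))
g = m.groupby(m.cumsum())
# rows that belong to a run of at least N days
m1 = g.transform("size").ge(N)
# position 1 within each run is the second day of that run
second_only = df[m1 & g.cumcount().eq(1)]
print(second_only.index.tolist())  # [1, 6]
```

Compared with mod(2), row 8 (2021.11.19, the fourth day of its run) is no longer selected.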
