I have a DataFrame:
import pandas as pd
dates = pd.date_range(start='05.01.2021 00:00:00', end='05.01.2021 00:00:20', freq='S')
values = [1000,343,122.34,342.6,76.45,202,264.32,9454.3,1000,1000,0,0,0,0,0,0,232,2323,5562,3545, 123]
df = pd.DataFrame([dates, values], index=['dates', 'values']).T
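I want to get the TimeDelta spanned by the rows where values is not 0, separately for each individual range.
Here that would be 9 seconds and 4 seconds. The TimeDelta should appear in a new column, on the last row of the corresponding time range.
Any hints on how to do this?
Thanks
Edit:
The desired DataFrame should look like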
dates = pd.date_range(start='05.01.2021 00:00:00', end='05.01.2021 00:00:20', freq='S')
values = [1000,343,122.34,342.6,76.45,202,264.32,9454.3,1000,1000,0,0,0,0,0,0,232,2323,5562,3545,123]
delta = [None,None,None,None,None,None,None,None,None,'0 days 00:00:09',None,None,None,None,None,None,None,None,None,None,'0 days 00:00:04']
df = pd.DataFrame([dates, values, delta], index=['dates', 'values', 'timedelta']).T
where the values in the timedelta column have dtype timedelta64[ns].
Answer (from a uj5u.com user):
dates = pd.date_range(start='05.01.2021 00:00:00', end='05.01.2021 00:00:20', freq='S')
values = [1000,343,122.34,342.6,76.45,202,264.32,9454.3,1000,1000,0,0,0,0,0,0,232,2323,5562,3545, 123]
df = pd.DataFrame([dates, values], index=['dates', 'values']).T
df['timedelta'] = df.groupby(
(df['values'] == 0).diff().cumsum().fillna(0)
)['dates'].transform(lambda x: x.iloc[-1] - x.iloc[0])
Output:
dates values timedelta
0 2021-05-01 00:00:00 1000 0 days 00:00:09
1 2021-05-01 00:00:01 343 0 days 00:00:09
2 2021-05-01 00:00:02 122.34 0 days 00:00:09
3 2021-05-01 00:00:03 342.6 0 days 00:00:09
4 2021-05-01 00:00:04 76.45 0 days 00:00:09
5 2021-05-01 00:00:05 202 0 days 00:00:09
6 2021-05-01 00:00:06 264.32 0 days 00:00:09
7 2021-05-01 00:00:07 9454.3 0 days 00:00:09
8 2021-05-01 00:00:08 1000 0 days 00:00:09
9 2021-05-01 00:00:09 1000 0 days 00:00:09
10 2021-05-01 00:00:10 0 0 days 00:00:05
11 2021-05-01 00:00:11 0 0 days 00:00:05
12 2021-05-01 00:00:12 0 0 days 00:00:05
13 2021-05-01 00:00:13 0 0 days 00:00:05
14 2021-05-01 00:00:14 0 0 days 00:00:05
15 2021-05-01 00:00:15 0 0 days 00:00:05
16 2021-05-01 00:00:16 232 0 days 00:00:04
17 2021-05-01 00:00:17 2323 0 days 00:00:04
18 2021-05-01 00:00:18 5562 0 days 00:00:04
19 2021-05-01 00:00:19 3545 0 days 00:00:04
20 2021-05-01 00:00:20 123 0 days 00:00:04
Explanation:
The Series used in the groupby, (df['values'] == 0).diff().cumsum().fillna(0), is
0 0.0
1 0.0
2 0.0
3 0.0
4 0.0
5 0.0
6 0.0
7 0.0
8 0.0
9 0.0
10 1.0
11 1.0
12 1.0
13 1.0
14 1.0
15 1.0
16 2.0
17 2.0
18 2.0
19 2.0
20 2.0
and it identifies groups of consecutive rows in which the values column is either always 0 or always nonzero.
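As a minimal sketch of how this kind of run labelling works, the following builds run IDs for a shortened, made-up list of values. It uses a shift-and-compare in place of diff() on booleans, so it is a variant of the expression above rather than the same code; the names is_zero and run_id are hypothetical.

```python
import pandas as pd

# Shortened, made-up sample: two nonzero runs separated by a run of zeros.
values = pd.Series([1000, 343, 0, 0, 0, 232, 123])

# True where the value is 0.
is_zero = values.eq(0)

# A new run starts wherever is_zero differs from the previous row.
# The first row is compared with a missing value, so it counts as a change;
# subtracting 1 makes the labels start at 0.
run_id = is_zero.ne(is_zero.shift()).cumsum() - 1

print(run_id.tolist())  # [0, 0, 1, 1, 1, 2, 2]
```

Every row in the same run gets the same label, which is what lets groupby treat each run as one group.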
Note that this way the timedelta column contains more than you need. If you want the output to be exactly as you described, you can add
df.loc[~((df['values'] == 0).diff().shift(-1).fillna(True) & (df['values'] != 0)), 'timedelta'] = pd.NaT
Output:
dates values timedelta
0 2021-05-01 00:00:00 1000 NaT
1 2021-05-01 00:00:01 343 NaT
2 2021-05-01 00:00:02 122.34 NaT
3 2021-05-01 00:00:03 342.6 NaT
4 2021-05-01 00:00:04 76.45 NaT
5 2021-05-01 00:00:05 202 NaT
6 2021-05-01 00:00:06 264.32 NaT
7 2021-05-01 00:00:07 9454.3 NaT
8 2021-05-01 00:00:08 1000 NaT
9 2021-05-01 00:00:09 1000 0 days 00:00:09
10 2021-05-01 00:00:10 0 NaT
11 2021-05-01 00:00:11 0 NaT
12 2021-05-01 00:00:12 0 NaT
13 2021-05-01 00:00:13 0 NaT
14 2021-05-01 00:00:14 0 NaT
15 2021-05-01 00:00:15 0 NaT
16 2021-05-01 00:00:16 232 NaT
17 2021-05-01 00:00:17 2323 NaT
18 2021-05-01 00:00:18 5562 NaT
19 2021-05-01 00:00:19 3545 NaT
20 2021-05-01 00:00:20 123 0 days 00:00:04
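For comparison, here is a self-contained sketch that goes straight to the requested result. It is an alternative under stated assumptions, not the code from the answer above: the DataFrame is built from a dict so the columns keep real datetime and numeric dtypes, and the helper names nonzero, run_id, span and is_run_end are hypothetical.

```python
import pandas as pd

dates = pd.date_range(start="2021-05-01 00:00:00", periods=21, freq="s")
values = [1000, 343, 122.34, 342.6, 76.45, 202, 264.32, 9454.3, 1000, 1000,
          0, 0, 0, 0, 0, 0, 232, 2323, 5562, 3545, 123]

# Building from a dict keeps a datetime column and a float column,
# instead of the object columns produced by the transposed constructor.
df = pd.DataFrame({"dates": dates, "values": values})

nonzero = df["values"].ne(0)

# Label each run of consecutive rows that are all zero or all nonzero.
run_id = nonzero.ne(nonzero.shift()).cumsum()

# Duration of each run: last timestamp minus first timestamp in the run.
grouped = df.groupby(run_id)["dates"]
span = grouped.transform("last") - grouped.transform("first")

# Keep the duration only on the last row of each nonzero run; NaT elsewhere.
is_run_end = run_id.ne(run_id.shift(-1))
df["timedelta"] = span.where(nonzero & is_run_end)

print(df[df["timedelta"].notna()])
```

Because the subtraction is done on datetime columns, the resulting timedelta column has a timedelta64 dtype and only rows 9 and 20 carry a value.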
