Suppose I have the following DataFrame:
import pandas as pd
import numpy as np
data = np.random.randint(1, 10, size=(10,2))
df = pd.DataFrame(data, columns=['x1', 'x2'])
df['switch'] = [1,1,0,0,1,1,0,0,1,1]
index_ = pd.date_range('2022-01-17 13:00:00', periods=10, freq='5s')
df.index = index_.rename('Time')
which gives:
                     x1  x2  switch
Time
2022-01-17 13:00:00   2   6       1
2022-01-17 13:00:05   9   8       1
2022-01-17 13:00:10   4   9       0
2022-01-17 13:00:15   5   6       0
2022-01-17 13:00:20   4   9       1
2022-01-17 13:00:25   6   7       1
2022-01-17 13:00:30   4   6       0
2022-01-17 13:00:35   2   3       0
2022-01-17 13:00:40   4   9       1
2022-01-17 13:00:45   5   2       1
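I'm looking for a way to get, for each block where switch is 1, the start time, the end time, and the means of x1 and x2.
So here, for example (the first block):
Start time: 2022-01-17 13:00:00
End time: 2022-01-17 13:00:05
x1 mean: 5.5
x2 mean: 7
I don't know how to detect the changes in the switch column, then compute the means of the values between those changes, and return the times at which the switch column changes.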
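A common idiom for detecting where a column changes is to compare each row with the previous one (via shift) and take a cumulative sum of the change flags, which gives every consecutive run its own id. A minimal sketch on the switch pattern from the question; note this is a general run-labelling technique, not the exact grouper used in the answer, and the names changed, run_id and on_runs are only illustrative:

```python
import pandas as pd

# Only the switch column from the question matters for finding the blocks
switch = pd.Series([1, 1, 0, 0, 1, 1, 0, 0, 1, 1])

# True wherever a row differs from the row above it; the first row is
# compared against NaN (produced by shift), so it always starts a new run
changed = switch.ne(switch.shift())

# Running total of the change flags: each consecutive run gets its own id
run_id = changed.cumsum()

# Ids of the runs where switch is 1
on_runs = run_id[switch.eq(1)].unique().tolist()

print(run_id.tolist())  # [1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
print(on_runs)          # [1, 3, 5]
```

Grouping the switch == 1 rows by run_id then yields one group per block. Unlike a grouper built only from the 0 rows, this variant also labels the runs of 0s, which can help if those blocks are needed later.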
Answer:
You can use groupby + agg with a custom grouper that labels each consecutive block of 1s:
df2 = df.reset_index()
df2['Time'] = pd.to_datetime(df2['Time'])
(df2[df2['switch'].eq(1)] # keep only rows with switch 1
.groupby(df2['switch'].ne(1).cumsum()) # group by consecutive 1s
.agg({'x1': 'mean', 'x2': 'mean', 'Time': ('min', 'max')})
)
Output:
          x1    x2                 Time
        mean  mean                  min                  max
switch
0        7.5   4.0  2022-01-17 13:00:00  2022-01-17 13:00:05
2        3.0   4.5  2022-01-17 13:00:20  2022-01-17 13:00:25
4        6.0   3.5  2022-01-17 13:00:40  2022-01-17 13:00:45
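The group labels 0, 2, 4 in the output come from switch.ne(1).cumsum(): every 0 row increments the running count, so rows inside one block of 1s share a label while separate blocks end up with different labels. A small check of that grouper on the switch pattern from the question (grouper and on_labels are illustrative names):

```python
import pandas as pd

switch = pd.Series([1, 1, 0, 0, 1, 1, 0, 0, 1, 1])

# ne(1) is True on every 0 row, so each 0 bumps the cumulative count by one
grouper = switch.ne(1).cumsum()
print(grouper.tolist())  # [0, 0, 1, 2, 2, 2, 3, 4, 4, 4]

# Keeping only the switch == 1 rows: each block shares one label
on_labels = grouper[switch.eq(1)].tolist()
print(on_labels)  # [0, 0, 2, 2, 4, 4]
```

Because the labels skip numbers (0, 2, 4), the alternative version calls reset_index(drop=True) to replace them with 0, 1, 2.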
Alternative, using named aggregation to get flat column names:
df2 = df.reset_index()
df2['Time'] = pd.to_datetime(df2['Time'])
(df2[df2['switch'].eq(1)]
.groupby(df2['switch'].ne(1).cumsum())
.agg(avg_x1=('x1', 'mean'),
avg_x2=('x2', 'mean'),
start=('Time', 'min'),
end=('Time', 'max'))
.reset_index(drop=True)
)
Output:
   avg_x1  avg_x2                start                  end
0     7.5     4.0  2022-01-17 13:00:00  2022-01-17 13:00:05
1     3.0     4.5  2022-01-17 13:00:20  2022-01-17 13:00:25
2     6.0     3.5  2022-01-17 13:00:40  2022-01-17 13:00:45
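The question builds x1 and x2 with np.random.randint and no seed, so the numbers in the outputs above differ from the table in the question. A minimal sketch that hard-codes the values printed in the question's table, so the result can be checked against the expected 5.5 and 7 for the first block. It follows the alternative approach, except that it filters the grouper with the same mask instead of relying on pandas aligning the full-length grouper to the filtered rows by index:

```python
import pandas as pd

# Values copied from the table printed in the question
df = pd.DataFrame(
    {
        "x1": [2, 9, 4, 5, 4, 6, 4, 2, 4, 5],
        "x2": [6, 8, 9, 6, 9, 7, 6, 3, 9, 2],
        "switch": [1, 1, 0, 0, 1, 1, 0, 0, 1, 1],
    },
    index=pd.date_range("2022-01-17 13:00:00", periods=10, freq="5s", name="Time"),
)

df2 = df.reset_index()                   # turns the Time index into a column
on = df2["switch"].eq(1)                 # rows that belong to a block of 1s
labels = df2["switch"].ne(1).cumsum()    # one label per block of 1s

result = (
    df2[on]
    .groupby(labels[on])                 # grouper filtered with the same mask
    .agg(
        avg_x1=("x1", "mean"),
        avg_x2=("x2", "mean"),
        start=("Time", "min"),
        end=("Time", "max"),
    )
    .reset_index(drop=True)              # relabel blocks as 0, 1, 2
)
print(result)
```

The first row comes out as avg_x1 5.5 and avg_x2 7.0 for 13:00:00 to 13:00:05, matching the expected result in the question.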
Tags: python-3.x pandas
