這是我目前的 df。我想分 3 步轉換資料框。我需要洗掉重復的時間戳,但希望根據“Side”列保留最大值或最小值。請幫忙 :)
我試過df= df[~df.index.duplicated(keep='first')]但是這沒有保留最大值或最小值的選項
索引的型別是日期時間格式,價格是浮點數,邊是整數,資料框有8000 行。
Price Side
2021-12-13 00:00:03.285 51700 4
2021-12-13 00:00:03.315 51675 3
2021-12-13 00:00:03.333 50123 4
2021-12-13 00:00:03.333 50200 3
2021-12-13 00:00:03.333 50225 3
2021-12-13 00:00:03.333 50250 3
2021-12-13 00:00:03.421 50123 4
2021-12-13 00:00:03.421 50117 4
2021-12-13 00:00:03.421 50110 4
2021-12-13 00:00:03.671 50100 3
- 如果時間重復,則邊為“3”保持最高值,如果時間重復且邊為“4”保持最低值。
Desired Output:
Price Side
2021-12-13 00:00:03.285 51700 4
2021-12-13 00:00:03.315 51675 3
2021-12-13 00:00:03.333 50123 4
2021-12-13 00:00:03.333 50250 3
2021-12-13 00:00:03.421 50110 4
2021-12-13 00:00:03.671 50100 3
- 創建具有相應價格的新列“3”和“4”
Desired Output:
Price 3 4
2021-12-13 00:00:03.285 51700 0 51700
2021-12-13 00:00:03.315 51675 51675 0
2021-12-13 00:00:03.333 50123 0 50123
2021-12-13 00:00:03.333 50250 50250 0
2021-12-13 00:00:03.421 50110 0 50110
2021-12-13 00:00:03.671 50100 50100 0
- 用同一列中以前的值填空
Desired Output:
Price 3 4
2021-12-13 00:00:03.285 51700 0 51700
2021-12-13 00:00:03.315 51675 51675 51700
2021-12-13 00:00:03.333 50123 51675 50123
2021-12-13 00:00:03.333 50250 50250 50123
2021-12-13 00:00:03.421 50110 50250 50110
2021-12-13 00:00:03.671 50100 50100 50110
uj5u.com熱心網友回復:
new_df = (df
.groupby([pd.Grouper(level=0), 'Side'])
.apply(lambda x: x['Price'].max() if x['Side'].mode()[0] == 3 else x['Price'].min())
.reset_index()
)
new_df = (
pd.concat([
new_df,
(new_df
.pivot(columns='Side', values=0)
.ffill()
.fillna(0)
)
], axis=1)
.drop('Side', axis=1)
.rename({0: 'Price'}, axis=1)
)
輸出:
>>> df
index Price 3 4
0 2021-12-13 00:00:03.285 51700 0.0 51700.0
1 2021-12-13 00:00:03.315 51675 51675.0 51700.0
2 2021-12-13 00:00:03.333 50250 50250.0 51700.0
3 2021-12-13 00:00:03.333 50123 50250.0 50123.0
4 2021-12-13 00:00:03.421 50110 50250.0 50110.0
5 2021-12-13 00:00:03.671 50100 50100.0 50110.0
精簡版:
new_df = df.groupby([pd.Grouper(level=0), 'Side']).apply(lambda x: x['Price'].max() if x['Side'].mode()[0] == 3 else x['Price'].min()).reset_index()
new_df = pd.concat([new_df, new_df.pivot(columns='Side', values=0).ffill().fillna(0)], axis=1).drop('Side', axis=1).rename({0:'Price'}, axis=1))
uj5u.com熱心網友回復:
這是一種選擇,有點長:
(df.assign(temp = df.Side.map({4:'low', 3:'high'}))
.groupby([pd.Grouper(level=0), 'Side', 'temp'], sort = False)
.Price
.agg(['min', 'max'])
.unstack('Side')
.loc(axis=1)[[('max', 3), ('min', 4)]]
.droplevel(level = 0,axis = 1)
.droplevel(level = 'temp')
.assign(Price=lambda df: df[3].where(df[3].notna(), df[4]))
.ffill()
.fillna(0)
.astype(int)
.rename_axis(columns = None)
)
3 4 Price
2021-12-13 00:00:03.285 0 51700 51700
2021-12-13 00:00:03.315 51675 51700 51675
2021-12-13 00:00:03.333 51675 50123 50123
2021-12-13 00:00:03.333 50250 50123 50250
2021-12-13 00:00:03.421 50250 50110 50110
2021-12-13 00:00:03.671 50100 50110 50100
這假設 中的唯一值Side是 3 和 4。
轉載請註明出處,本文鏈接:https://www.uj5u.com/caozuo/389518.html
標籤:熊猫 数据框 jupyter-笔记本 时间序列 贸易
上一篇:如何按列對資料框進行分組?
