PandasDataframes洗掉重復索引，根據列值首先保留最大值-有解無憂

這是我目前的 df。我想分 3 步轉換資料框。我需要洗掉重復的時間戳，但希望根據“Side”列保留最大值或最小值。請幫忙：）

我試過df= df[~df.index.duplicated(keep='first')]但是這沒有保留最大值或最小值的選項

索引的型別是日期時間格式，價格是浮點數，邊是整數，資料框有8000 行。

                          Price      Side  
2021-12-13 00:00:03.285   51700      4     
2021-12-13 00:00:03.315   51675      3    
2021-12-13 00:00:03.333   50123      4    
2021-12-13 00:00:03.333   50200      3    
2021-12-13 00:00:03.333   50225      3   
2021-12-13 00:00:03.333   50250      3    
2021-12-13 00:00:03.421   50123      4     
2021-12-13 00:00:03.421   50117      4     
2021-12-13 00:00:03.421   50110      4    
2021-12-13 00:00:03.671   50100      3

如果時間重復，則邊為“3”保持最高值，如果時間重復且邊為“4”保持最低值。

Desired Output:
                          Price      Side  
2021-12-13 00:00:03.285   51700      4     
2021-12-13 00:00:03.315   51675      3    
2021-12-13 00:00:03.333   50123      4 
2021-12-13 00:00:03.333   50250      3     
2021-12-13 00:00:03.421   50110      4     
2021-12-13 00:00:03.671   50100      3

創建具有相應價格的新列“3”和“4”

Desired Output:
                          Price      3         4  
2021-12-13 00:00:03.285   51700      0         51700
2021-12-13 00:00:03.315   51675      51675     0  
2021-12-13 00:00:03.333   50123      0         50123
2021-12-13 00:00:03.333   50250      50250     0     
2021-12-13 00:00:03.421   50110      0         50110  
2021-12-13 00:00:03.671   50100      50100     0

用同一列中以前的值填空

Desired Output:
                          Price      3         4  
2021-12-13 00:00:03.285   51700      0         51700  
2021-12-13 00:00:03.315   51675      51675     51700  
2021-12-13 00:00:03.333   50123      51675     50123
2021-12-13 00:00:03.333   50250      50250     50123     
2021-12-13 00:00:03.421   50110      50250     50110  
2021-12-13 00:00:03.671   50100      50100     50110

uj5u.com熱心網友回復：

new_df = (df
    .groupby([pd.Grouper(level=0), 'Side'])
    .apply(lambda x: x['Price'].max() if x['Side'].mode()[0] == 3 else x['Price'].min())
    .reset_index()
)
new_df = (
    pd.concat([
        new_df,
        (new_df
            .pivot(columns='Side', values=0)
            .ffill()
            .fillna(0)
        )
    ], axis=1)
    .drop('Side', axis=1)
    .rename({0: 'Price'}, axis=1)
)

輸出：

>>> df
                    index  Price        3        4
0 2021-12-13 00:00:03.285  51700      0.0  51700.0
1 2021-12-13 00:00:03.315  51675  51675.0  51700.0
2 2021-12-13 00:00:03.333  50250  50250.0  51700.0
3 2021-12-13 00:00:03.333  50123  50250.0  50123.0
4 2021-12-13 00:00:03.421  50110  50250.0  50110.0
5 2021-12-13 00:00:03.671  50100  50100.0  50110.0

精簡版：

new_df = df.groupby([pd.Grouper(level=0), 'Side']).apply(lambda x: x['Price'].max() if x['Side'].mode()[0] == 3 else x['Price'].min()).reset_index()
new_df = pd.concat([new_df, new_df.pivot(columns='Side', values=0).ffill().fillna(0)], axis=1).drop('Side', axis=1).rename({0:'Price'}, axis=1))

uj5u.com熱心網友回復：

這是一種選擇，有點長：

(df.assign(temp = df.Side.map({4:'low', 3:'high'}))
.groupby([pd.Grouper(level=0), 'Side', 'temp'], sort = False)
.Price
.agg(['min', 'max'])
.unstack('Side')
.loc(axis=1)[[('max', 3), ('min', 4)]]
.droplevel(level = 0,axis = 1)
.droplevel(level = 'temp')
.assign(Price=lambda df: df[3].where(df[3].notna(), df[4]))
.ffill()
.fillna(0)
.astype(int)
.rename_axis(columns = None)
)

                             3      4  Price
2021-12-13 00:00:03.285      0  51700  51700
2021-12-13 00:00:03.315  51675  51700  51675
2021-12-13 00:00:03.333  51675  50123  50123
2021-12-13 00:00:03.333  50250  50123  50250
2021-12-13 00:00:03.421  50250  50110  50110
2021-12-13 00:00:03.671  50100  50110  50100

這假設中的唯一值Side是 3 和 4。

轉載請註明出處，本文鏈接：https://www.uj5u.com/caozuo/389518.html

標籤：熊猫数据框 jupyter-笔记本时间序列贸易

上一篇：如何按列對資料框進行分組？

下一篇：Pandas資料幀規范化回傳NaN