Create a new column for each row based on related previous matches

I have a dataset of matches, with the home team's result for each match:

match_date  home    away    home_result   
2021-11-22  team1   team2   Win
2021-11-22  team3   team4   Win 
2021-11-23  team1   team8   Lose
2021-11-23  team6   team7   Win
2021-11-25  team1   team2   Win 
2021-11-25  team3   team8   Lose 
2021-11-25  team1   team5   Lose 
2021-11-25  team6   team5   Win 
2021-11-28  team3   team1   Lose 
2021-11-29  team1   team5   Win 
2021-11-29  team6   team9   Win

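I want to create a new column that holds, for each home team, its results from the home matches before the current one. For example, team1 has no previous matches on 2021-11-22, one previous result on 2021-11-23 (Win), two before its first match on 2021-11-25 (Win, Lose), and four on 2021-11-29 (Win, Lose, Win, Lose). This is the expected column: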

match_date  home    away    home_result   home_team_previous_results
2021-11-22  team1   team2   Win           NaN
2021-11-22  team3   team4   Win           NaN
2021-11-23  team1   team8   Lose          [("Win","2021-11-22")]  
2021-11-23  team6   team7   Win           NaN 
2021-11-25  team1   team2   Win           [("Win","2021-11-22"), ("Lose","2021-11-23")]
2021-11-25  team3   team8   Lose          [("Win","2021-11-22")]
2021-11-25  team1   team5   Lose          [("Win","2021-11-22"), ("Lose","2021-11-23"), ("Win","2021-11-25")]
2021-11-25  team6   team5   Win           [("Win","2021-11-23")]
2021-11-28  team3   team1   Lose          [("Win","2021-11-22"), ("Lose","2021-11-25")]
2021-11-29  team1   team5   Win           [("Win","2021-11-22"), ("Lose","2021-11-23"), ("Win","2021-11-25"), ("Lose","2021-11-25")]
2021-11-29  team6   team9   Win           [("Win","2021-11-23"), ("Win","2021-11-25")]
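
For anyone who wants to reproduce this, the sample data above could be rebuilt roughly as follows. This is only a sketch: the names `rows` and `df`, and keeping `match_date` as plain strings, are assumptions rather than something stated in the question.

```python
import pandas as pd

# Rows copied from the table in the question.
# Assumption: match_date stays a plain string instead of a datetime.
rows = [
    ("2021-11-22", "team1", "team2", "Win"),
    ("2021-11-22", "team3", "team4", "Win"),
    ("2021-11-23", "team1", "team8", "Lose"),
    ("2021-11-23", "team6", "team7", "Win"),
    ("2021-11-25", "team1", "team2", "Win"),
    ("2021-11-25", "team3", "team8", "Lose"),
    ("2021-11-25", "team1", "team5", "Lose"),
    ("2021-11-25", "team6", "team5", "Win"),
    ("2021-11-28", "team3", "team1", "Lose"),
    ("2021-11-29", "team1", "team5", "Win"),
    ("2021-11-29", "team6", "team9", "Win"),
]

df = pd.DataFrame(rows, columns=["match_date", "home", "away", "home_result"])
print(df)
```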

A uj5u.com user replied:

A rough solution:

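import numpy as np
import pandas as pd

# For each home team, row i collects (home_result, match_date) from that team's earlier home rows.
df['home_team_previous_results'] = (
    df.groupby('home')
    .apply(
        lambda x: pd.Series(
            [
                [
                    tuple([row[col] for col in ['home_result', 'match_date']])
                    for _, row in x.iloc[0:i].iterrows()
                ] or np.nan  # an empty list (no earlier home matches) becomes NaN
                for i in range(len(x))
            ],
            index=x.index,
        )
    ).droplevel(0)  # drop the 'home' group level so the result aligns with df's index
)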

Output:

>>> df
    match_date   home   away home_result                         home_team_previous_results
0   2021-11-22  team1  team2         Win                                                NaN
1   2021-11-22  team3  team4         Win                                                NaN
2   2021-11-23  team1  team8        Lose                                [(Win, 2021-11-22)]
3   2021-11-23  team6  team7         Win                                                NaN
4   2021-11-25  team1  team2         Win            [(Win, 2021-11-22), (Lose, 2021-11-23)]
5   2021-11-25  team3  team8        Lose                                [(Win, 2021-11-22)]
6   2021-11-25  team1  team5        Lose  [(Win, 2021-11-22), (Lose, 2021-11-23), (Win, ...
7   2021-11-25  team6  team5         Win                                [(Win, 2021-11-23)]
8   2021-11-28  team3  team1        Lose            [(Win, 2021-11-22), (Lose, 2021-11-25)]
9   2021-11-29  team1  team5         Win  [(Win, 2021-11-22), (Lose, 2021-11-23), (Win, ...
10  2021-11-29  team6  team9         Win             [(Win, 2021-11-23), (Win, 2021-11-25)]

One-liner version:

df['home_team_previous_results'] = df.groupby('home').apply(lambda x: pd.Series([[tuple([row[col] for col in ['home_result', 'match_date']]) for _, row in x.iloc[0:i].iterrows()] or np.nan for i in range(len(x))], index=x.index)).droplevel(0)

A uj5u.com user replied:

Unfortunately, I don't believe pandas supports an efficient solution for this.

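import pandas as pd

assert isinstance(df.index, pd.RangeIndex)  # This solution assumes a RangeIndex

# Start with an empty object column so each cell can hold a list.
df['home_team_previous_results'] = pd.Series(dtype=object)

# Map each home team to the sub-frame of its home matches.
team_frames = dict(list(df.groupby('home')))

for i, row in df.iterrows():
    # Label-based slice: this team's home matches on rows before i.
    previous = team_frames[row['home']].loc[:i-1, ['home_result', 'match_date']]
    records = list(previous.to_records(index=False)) or float('nan')
    df.at[i, 'home_team_previous_results'] = records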
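
If repeatedly re-slicing each team's history becomes a concern on larger data, one possible alternative is a single pass that accumulates results per home team in a plain dictionary. This is a sketch, not something taken from the answers above: the names `history` and `previous_results` are made up for illustration, the sample is a subset of the question's rows, and it assumes the rows are already in chronological order.

```python
from collections import defaultdict

import numpy as np
import pandas as pd

# A subset of the question's rows, enough to show the idea.
df = pd.DataFrame(
    {
        "match_date": ["2021-11-22", "2021-11-22", "2021-11-23", "2021-11-25"],
        "home": ["team1", "team3", "team1", "team1"],
        "away": ["team2", "team4", "team8", "team2"],
        "home_result": ["Win", "Win", "Lose", "Win"],
    }
)

history = defaultdict(list)  # home team -> (result, date) tuples seen so far
previous_results = []

# Assumption: rows are already sorted chronologically.
for home, result, date in zip(df["home"], df["home_result"], df["match_date"]):
    seen = history[home]
    # Copy the list so later appends do not alter earlier rows.
    previous_results.append(list(seen) if seen else np.nan)
    seen.append((result, date))

df["home_team_previous_results"] = pd.Series(
    previous_results, index=df.index, dtype=object
)
print(df)
```

Each row is visited once, so the loop itself is linear in the number of rows, although copying the growing history list still costs time proportional to its length.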


Tags: Python pandas
