How can I iterate over two rows at a time in Python and add a column with a computed value to the second row?

Data:

    Date        column1  column2  column3  column4
    2021-08-20  19       30       11       8
    2021-08-21  15       25       11       4
    2021-08-22  5        10       5        0
    2021-08-23  25       36       16       9
    2021-08-24  6        6        6        0

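I want to iterate over two rows at a time and create a new column that holds yesterday's backlog plus today's backlog. For each pair of rows, I want a value like: df['new_column'] = (df['column2'] - df['column4']) from row 2 + (df['column2'] - df['column4']) from row 1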

This is what I am trying:

from itertools import tee
from itertools import zip_longest as izip
def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    next(b, None)
    return izip(a, b)

for (idx1, row1), (idx2, row2) in pairwise(df.iterrows()):
    print(idx1,row1,"\n\n",idx2,row2,"\n\n")
    df['Backlog_today'][row2] = (df.loc[row2, ['column2']]  - df.loc[row2, ['column4']])
    df['Backlog_yesterday'][row1] = (df.loc[row1, ['column2']]  - df.loc[row1, ['column4']])
    df['new_column'] = df['Backlog_today'] + df['Backlog_yesterday']

How can I fix this?
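For reference, here is a minimal sketch of a loop-based version that stays close to the attempt above. Two things in the attempt stand out: zip_longest pads the final pair with None, so unpacking (idx2, row2) fails on the last step, and df['Backlog_today'][row2] indexes with a whole row instead of its index label. The sketch assumes the first row, which has no previous day, should keep only its own backlog; the result dictionary is a hypothetical helper, not part of the original code.

```python
from itertools import tee

import pandas as pd

df = pd.DataFrame({
    "Date": ["2021-08-20", "2021-08-21", "2021-08-22", "2021-08-23", "2021-08-24"],
    "column1": [19, 15, 5, 25, 6],
    "column2": [30, 25, 10, 36, 6],
    "column3": [11, 11, 5, 16, 6],
    "column4": [8, 4, 0, 9, 0],
})


def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    next(b, None)
    return zip(a, b)  # plain zip stops after the last complete pair


# Assumption: the first row has no "yesterday", so it keeps its own backlog.
first_idx = df.index[0]
result = {first_idx: df.at[first_idx, "column2"] - df.at[first_idx, "column4"]}

for (idx1, row1), (idx2, row2) in pairwise(df.iterrows()):
    backlog_yesterday = row1["column2"] - row1["column4"]
    backlog_today = row2["column2"] - row2["column4"]
    # Collect by index label and assign once after the loop,
    # rather than writing into df while iterating over it.
    result[idx2] = backlog_today + backlog_yesterday

df["new_column"] = pd.Series(result)
print(df)
```

On Python 3.10 or later, itertools.pairwise could replace the hand-written helper. A row-by-row loop like this is mainly useful for understanding the logic; the vectorized answers avoid iterating in Python altogether.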

Answer from a uj5u.com user:

Subtract the columns, then add the shifted series to the result using Series.add and Series.shift:

s = (df['column2']-df['column4'])
df['new_column'] = s.add(s.shift(), fill_value=0)
print (df)
         Date  column1  column2  column3  column4  new_column
0  2021-08-20       19       30       11        8        22.0
1  2021-08-21       15       25       11        4        43.0
2  2021-08-22        5       10        5        0        31.0
3  2021-08-23       25       36       16        9        37.0
4  2021-08-24        6        6        6        0        33.0
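To see why this works, it can help to look at the intermediate series side by side. The following is a small sketch that rebuilds the sample data from the question; the steps frame and its column names (today, yesterday, sum) are made up here purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["2021-08-20", "2021-08-21", "2021-08-22", "2021-08-23", "2021-08-24"],
    "column1": [19, 15, 5, 25, 6],
    "column2": [30, 25, 10, 36, 6],
    "column3": [11, 11, 5, 16, 6],
    "column4": [8, 4, 0, 9, 0],
})

s = df["column2"] - df["column4"]  # backlog of each row ("today")
prev = s.shift()                   # same values moved down one row ("yesterday"), NaN in row 0

# Hypothetical helper frame, only for inspecting the intermediate steps.
steps = pd.DataFrame({
    "today": s,
    "yesterday": prev,
    "sum": s.add(prev, fill_value=0),  # the missing first "yesterday" counts as 0
})
print(steps)
```

Because shift() moves every value down by one position, adding s and its shifted copy pairs each row with the row before it, which is what the two-rows-at-a-time loop was trying to do.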

If the first value should be 0 instead:

s = (df['column2']-df['column4'])
df['new_column'] = s.add(s.shift()).fillna(0)
print (df)
         Date  column1  column2  column3  column4  new_column
0  2021-08-20       19       30       11        8         0.0
1  2021-08-21       15       25       11        4        43.0
2  2021-08-22        5       10        5        0        31.0
3  2021-08-23       25       36       16        9        37.0
4  2021-08-24        6        6        6        0        33.0
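Both variants above produce a float column, because shift() introduces a NaN in the first row. If an integer column is preferred, one possible alternative (not part of the original answer) is to pass fill_value to Series.shift itself, so that no NaN is ever created. This sketch assumes the first row should keep its own backlog, as in the first variant.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["2021-08-20", "2021-08-21", "2021-08-22", "2021-08-23", "2021-08-24"],
    "column1": [19, 15, 5, 25, 6],
    "column2": [30, 25, 10, 36, 6],
    "column3": [11, 11, 5, 16, 6],
    "column4": [8, 4, 0, 9, 0],
})

s = df["column2"] - df["column4"]

# shift(fill_value=0) fills the vacated first slot with 0 instead of NaN,
# so the sum stays an integer series.
df["new_column"] = s + s.shift(fill_value=0)
print(df)
print(df["new_column"].dtype)
```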

Performance on the sample data:

#5k rows
df = pd.concat([df] * 1000, ignore_index=True)


In [141]: %%timeit 
     ...: df.rolling(2)[['column2', 'column4']].sum().agg(lambda x:x[0] - x[1] ,axis=1).fillna(0)
     ...: 
46.8 ms ± 274 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [142]: %%timeit
     ...: s = (df['column2']-df['column4'])
     ...: s.add(s.shift(), fill_value=0)
     ...: 
     ...: 
460 μs ± 4.16 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

In [143]: %%timeit
     ...: s = (df['column2']-df['column4'])
     ...: s.add(s.shift()).fillna(0)
     ...: 
     ...: 
496 μs ± 8.35 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Answer from a uj5u.com user:

Use rolling to compute the sum of today and yesterday for each column, then aggregate to subtract the two columns.

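# .iloc makes the positional access explicit; plain x[0] on a label-indexed row is deprecated in recent pandas
df['new_column'] = df.rolling(2, min_periods=1)[['column2', 'column4']].sum() \
                     .agg(lambda x: x.iloc[0] - x.iloc[1], axis=1).fillna(0)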
print(df)

# Output:
         Date  column1  column2  column3  column4  new_column
0  2021-08-20       19       30       11        8        22.0
1  2021-08-21       15       25       11        4        43.0
2  2021-08-22        5       10        5        0        31.0
3  2021-08-23       25       36       16        9        37.0
4  2021-08-24        6        6        6        0        33.0
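Since the sum of differences equals the difference of sums, the same rolling idea can also be applied to the already subtracted series, which avoids the row-wise lambda. This is a sketch of that variation under the same min_periods=1 assumption as above, not the answerer's own code.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["2021-08-20", "2021-08-21", "2021-08-22", "2021-08-23", "2021-08-24"],
    "column1": [19, 15, 5, 25, 6],
    "column2": [30, 25, 10, 36, 6],
    "column3": [11, 11, 5, 16, 6],
    "column4": [8, 4, 0, 9, 0],
})

s = df["column2"] - df["column4"]  # backlog per row

# Window of 2 rows (today and yesterday); min_periods=1 lets the
# first row, which has no previous day, use its own value alone.
df["new_column"] = s.rolling(2, min_periods=1).sum()
print(df)
```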

