Data:
Date column1 column2 column3 column4
2021-08-20 19 30 11 8
2021-08-21 15 25 11 4
2021-08-22 5 10 5 0
2021-08-23 25 36 16 9
2021-08-24 6 6 6 0
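I want to iterate over the rows two at a time and create a new column that holds yesterday's backlog + today's backlog: for each pair of rows, I want a value like df['new_column'] = (df['column2']-df['column4']) from row 2 + (df['column2']-df['column4']) from row 1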
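Written out, the requested value is the sum of two consecutive daily backlogs. The symbol b_t is introduced here only for illustration; the worked line uses the first two sample rows.

```latex
b_t = \mathrm{column2}_t - \mathrm{column4}_t, \qquad
\mathrm{new\_column}_t = b_t + b_{t-1}

% 2021-08-21, using the 2021-08-20 row as "yesterday":
\mathrm{new\_column} = (25 - 4) + (30 - 8) = 21 + 22 = 43
```

The first row has no previous day, so b_{t-1} is undefined there; treating it as 0 or setting the whole result to 0 are both reasonable choices.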
I am trying this:
from itertools import tee
from itertools import zip_longest as izip

def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    next(b, None)
    return izip(a, b)

for (idx1, row1), (idx2, row2) in pairwise(df.iterrows()):
    print(idx1,row1,"\n\n",idx2,row2,"\n\n")
    df['Backlog_today'][row2] = (df.loc[row2, ['column2']] - df.loc[row2, ['column4']])
    df['Backlog_yesterday'][row1] = (df.loc[row1, ['column2']] - df.loc[row1, ['column4']])

df['new_column'] = df['Backlog_today'] + df['Backlog_yesterday']
How can I correct this?
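For comparison with the vectorized answers, here is a minimal sketch of the pairwise loop with its errors fixed. It rebuilds the sample DataFrame, switches zip_longest to plain zip, and writes by index label with .at instead of indexing with the row Series. Keeping 0 in the first row is an assumption, and the local variable names are illustrative only.

```python
from itertools import tee

import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "Date": ["2021-08-20", "2021-08-21", "2021-08-22", "2021-08-23", "2021-08-24"],
    "column1": [19, 15, 5, 25, 6],
    "column2": [30, 25, 10, 36, 6],
    "column3": [11, 11, 5, 16, 6],
    "column4": [8, 4, 0, 9, 0],
})


def pairwise(iterable):
    "s -> (s0,s1), (s1,s2), (s2, s3), ..."
    a, b = tee(iterable)
    next(b, None)
    # Plain zip stops at the shorter iterator; zip_longest would yield a final
    # (last_row, None) pair that cannot be unpacked into (idx2, row2).
    return zip(a, b)


# Assumption: the first row has no "yesterday", so it keeps 0.
df["new_column"] = 0
for (idx1, row1), (idx2, row2) in pairwise(df.iterrows()):
    backlog_yesterday = row1["column2"] - row1["column4"]
    backlog_today = row2["column2"] - row2["column4"]
    # row1/row2 are Series, not labels, so write by the index label idx2 with .at.
    df.at[idx2, "new_column"] = backlog_today + backlog_yesterday

print(df)
```

Row-by-row iteration like this works but scales poorly; the shift-based answer does the same arithmetic on whole columns at once.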
Reply from a uj5u.com user:
Subtract the columns, then add the shifted Series using Series.add and Series.shift:
s = (df['column2']-df['column4'])
df['new_column'] = s.add(s.shift(), fill_value=0)
print (df)
Date column1 column2 column3 column4 new_column
0 2021-08-20 19 30 11 8 22.0
1 2021-08-21 15 25 11 4 43.0
2 2021-08-22 5 10 5 0 31.0
3 2021-08-23 25 36 16 9 37.0
4 2021-08-24 6 6 6 0 33.0
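To see what the one-liner does step by step, here is a self-contained version that rebuilds the sample DataFrame and names the intermediate shifted Series. The name `yesterday` is introduced only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["2021-08-20", "2021-08-21", "2021-08-22", "2021-08-23", "2021-08-24"],
    "column1": [19, 15, 5, 25, 6],
    "column2": [30, 25, 10, 36, 6],
    "column3": [11, 11, 5, 16, 6],
    "column4": [8, 4, 0, 9, 0],
})

# Today's backlog for each row: 22, 21, 10, 27, 6 for the sample data.
s = df["column2"] - df["column4"]
# shift() moves every value down one row, so each row sees yesterday's backlog;
# the first row has no predecessor and becomes NaN.
yesterday = s.shift()
# fill_value=0 treats that missing value as 0, so the first row keeps its own backlog (22).
df["new_column"] = s.add(yesterday, fill_value=0)
print(df)
```

The result is float64 because shift() introduces a NaN; append .astype(int) if integer output is preferred.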
If the first value should be 0 instead:
s = (df['column2']-df['column4'])
df['new_column'] = s.add(s.shift()).fillna(0)
print (df)
Date column1 column2 column3 column4 new_column
0 2021-08-20 19 30 11 8 0.0
1 2021-08-21 15 25 11 4 43.0
2 2021-08-22 5 10 5 0 31.0
3 2021-08-23 25 36 16 9 37.0
4 2021-08-24 6 6 6 0 33.0
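The two variants differ only in when the missing first value is replaced. A small side-by-side sketch, using the backlog values of the sample rows hard-coded as a Series:

```python
import pandas as pd

# column2 - column4 for the five sample rows.
s = pd.Series([22, 21, 10, 27, 6])

# fill_value=0 replaces the missing shifted value *before* adding: first row = 22 + 0.
with_fill_value = s.add(s.shift(), fill_value=0)
# fillna(0) replaces the NaN *result* after adding: first row = 0.
with_fillna = s.add(s.shift()).fillna(0)

print(pd.DataFrame({"fill_value=0": with_fill_value, "fillna(0)": with_fillna}))
```

Only row 0 differs; every later row has a real "yesterday", so both give 43, 31, 37, 33.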
Performance on the sample data:
#5k rows
df = pd.concat([df] * 1000, ignore_index=True)
In [141]: %%timeit
...: df.rolling(2)[['column2', 'column4']].sum().agg(lambda x:x[0] - x[1] ,axis=1).fillna(0)
...:
46.8 ms ± 274 μs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [142]: %%timeit
...: s = (df['column2']-df['column4'])
...: s.add(s.shift(), fill_value=0)
...:
...:
460 μs ± 4.16 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
In [143]: %%timeit
...: s = (df['column2']-df['column4'])
...: s.add(s.shift()).fillna(0)
...:
...:
496 μs ± 8.35 μs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
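IPython's %%timeit is not available in a plain script. A rough equivalent with the standard-library timeit module might look like the sketch below. Before timing, it checks that the rolling and shift approaches return the same values. The function names are hypothetical, and absolute timings will vary by machine and pandas version.

```python
import timeit

import pandas as pd

base = pd.DataFrame({
    "Date": ["2021-08-20", "2021-08-21", "2021-08-22", "2021-08-23", "2021-08-24"],
    "column1": [19, 15, 5, 25, 6],
    "column2": [30, 25, 10, 36, 6],
    "column3": [11, 11, 5, 16, 6],
    "column4": [8, 4, 0, 9, 0],
})
# 5k rows, as in the benchmark above.
df = pd.concat([base] * 1000, ignore_index=True)


def via_rolling(df):
    sums = df.rolling(2)[["column2", "column4"]].sum()
    # .iloc gives positional access on the label-indexed row Series.
    return sums.agg(lambda x: x.iloc[0] - x.iloc[1], axis=1).fillna(0)


def via_shift(df):
    s = df["column2"] - df["column4"]
    return s.add(s.shift()).fillna(0)


# Both approaches should agree before their timings are compared.
pd.testing.assert_series_equal(via_rolling(df), via_shift(df), check_names=False)

for fn in (via_rolling, via_shift):
    seconds = timeit.timeit(lambda: fn(df), number=10) / 10
    print(f"{fn.__name__}: {seconds * 1e3:.2f} ms per call")
```

The gap comes from the row-wise lambda: agg(..., axis=1) calls Python once per row, while the shift version stays in vectorized column operations.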
Reply from a uj5u.com user:
Use rolling to sum today's and yesterday's values, then aggregate row-wise to subtract the two columns.
df['new_column'] = df.rolling(2, min_periods=1)[['column2', 'column4']].sum() \
    .agg(lambda x: x.iloc[0] - x.iloc[1], axis=1).fillna(0)
print(df)
# Output:
Date column1 column2 column3 column4 new_column
0 2021-08-20 19 30 11 8 22.0
1 2021-08-21 15 25 11 4 43.0
2 2021-08-22 5 10 5 0 31.0
3 2021-08-23 25 36 16 9 37.0
4 2021-08-24 6 6 6 0 33.0
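Because a two-row window sum of differences equals the difference of the window sums, the summed columns can also be subtracted directly, without a row-wise lambda. A sketch of that variant, assuming the same sample data:

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["2021-08-20", "2021-08-21", "2021-08-22", "2021-08-23", "2021-08-24"],
    "column1": [19, 15, 5, 25, 6],
    "column2": [30, 25, 10, 36, 6],
    "column3": [11, 11, 5, 16, 6],
    "column4": [8, 4, 0, 9, 0],
})

# Two-row window sums of each column; min_periods=1 lets the first row use only itself.
sums = df.rolling(2, min_periods=1)[["column2", "column4"]].sum()
# Column-wise subtraction replaces the per-row agg(lambda ..., axis=1) call.
df["new_column"] = sums["column2"] - sums["column4"]
print(df)
```

This produces the same values as the output above (22, 43, 31, 37, 33) and is algebraically the same computation as the shift-and-add answer.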
When reposting, please credit the source. Original link: https://www.uj5u.com/yidong/364168.html
