I need to group by ColA and analyze the ColD values over a 2/3-day window to create "NewCol". An example and more details follow:
ColA ColB ColC ColD
B 2021-10-24 2 NA
B 2021-10-25 4 2
B 2021-10-26 500 496
B 2021-10-27 100 -400
B 2021-10-28 55 -45
B 2021-10-29 600 545
B 2021-10-30 8 -592
B 2021-10-31 4300 4292
B 2021-11-01 200 -4100
H 2021-10-24 600 NA
H 2021-10-25 10000 9400
H 2021-10-26 100 -9900
H 2021-10-27 300 200
H 2021-10-28 2 -292
H 2021-10-29 8 6
H 2021-10-30 600 592
H 2021-10-31 600 0
H 2021-11-01 650 50
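The goal is to create "NewCol", which looks at the ColD values from 2 and 3 days earlier within the same group (ColA). If either of those values is 200 or greater, the row is flagged with 1; if both are below 200, it gets 0. Where neither earlier value is available, the expected output shows NA.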
For example, for group B on 2021-10-31, the ColD values 2 and 3 days earlier are 545 and -45 respectively. Since one of them is greater than 200, NewCol is set to 1. The expected result:
ColA ColB ColC ColD NewCol
B 2021-10-24 2 NA NA
B 2021-10-25 4 2 NA
B 2021-10-26 500 496 NA
B 2021-10-27 100 -400 0
B 2021-10-28 55 -45 1
B 2021-10-29 600 545 1
B 2021-10-30 8 -592 0
B 2021-10-31 4300 4292 1
B 2021-11-01 200 -4100 1
H 2021-10-24 600 NA NA
H 2021-10-25 10000 9400 NA
H 2021-10-26 100 -9900 NA
H 2021-10-27 300 200 1
H 2021-10-28 2 -292 1
H 2021-10-29 8 6 1
H 2021-10-30 600 592 1
H 2021-10-31 600 0 0
H 2021-11-01 650 50 1
Any suggestions are appreciated!
Reply:
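Since all dates are consecutive and sorted within each group, you can first compare the values in ColD against 200 with ge and group the result by ColA with groupby. Then shift this grouped object by 2 rows and by 3 rows. Combine the two shifted results with | to check whether either is True, and cast the result to int. Rows without enough history end up as 0 rather than NA.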
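import pandas as pd

# The question's data (one row per consecutive date within each ColA group)
df = pd.DataFrame({
    'ColA': ['B'] * 9 + ['H'] * 9,
    'ColB': list(pd.date_range('2021-10-24', '2021-11-01')) * 2,
    'ColC': [2, 4, 500, 100, 55, 600, 8, 4300, 200,
             600, 10000, 100, 300, 2, 8, 600, 600, 650],
    'ColD': [None, 2, 496, -400, -45, 545, -592, 4292, -4100,
             None, 9400, -9900, 200, -292, 6, 592, 0, 50],
})
# Flag ColD >= 200 (NaN compares as False) and group the flags by ColA
gr = df['ColD'].ge(200).groupby(df['ColA'])
# Flags from 2 and 3 rows (= days) earlier in the same group; fill_value=False
# keeps the result boolean where a group has no row that far back
df['newCol'] = (gr.shift(2, fill_value=False) | gr.shift(3, fill_value=False)).astype(int)
print(df)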
# ColA ColB ColC ColD newCol
# 0 B 2021-10-24 2 NaN 0
# 1 B 2021-10-25 4 2.0 0
# 2 B 2021-10-26 500 496.0 0
# 3 B 2021-10-27 100 -400.0 0
# 4 B 2021-10-28 55 -45.0 1
# 5 B 2021-10-29 600 545.0 1
# 6 B 2021-10-30 8 -592.0 0
# 7 B 2021-10-31 4300 4292.0 1
# 8 B 2021-11-01 200 -4100.0 1
# 9 H 2021-10-24 600 NaN 0
# 10 H 2021-10-25 10000 9400.0 0
# 11 H 2021-10-26 100 -9900.0 0
# 12 H 2021-10-27 300 200.0 1
# 13 H 2021-10-28 2 -292.0 1
# 14 H 2021-10-29 8 6.0 1
# 15 H 2021-10-30 600 592.0 1
# 16 H 2021-10-31 600 0.0 0
# 17 H 2021-11-01 650 50.0 1
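The reply above fills the first rows of each group with 0, whereas the question's expected output has NA there. Below is a minimal sketch of one way to keep those NA values. It assumes pandas' nullable Int64 dtype is acceptable for NewCol; the names prev2, prev3 and flag are illustrative only. A row is set to NA only when both earlier values are missing, which matches the expected table.

```python
import io

import pandas as pd

# The question's sample data (dates are consecutive within each ColA group)
txt = """ColA,ColB,ColC,ColD
B,2021-10-24,2,NA
B,2021-10-25,4,2
B,2021-10-26,500,496
B,2021-10-27,100,-400
B,2021-10-28,55,-45
B,2021-10-29,600,545
B,2021-10-30,8,-592
B,2021-10-31,4300,4292
B,2021-11-01,200,-4100
H,2021-10-24,600,NA
H,2021-10-25,10000,9400
H,2021-10-26,100,-9900
H,2021-10-27,300,200
H,2021-10-28,2,-292
H,2021-10-29,8,6
H,2021-10-30,600,592
H,2021-10-31,600,0
H,2021-11-01,650,50"""
df = pd.read_csv(io.StringIO(txt), parse_dates=["ColB"])

# ColD values 2 and 3 rows earlier, shifted within each ColA group so that
# values never leak from one group into the next (illustrative names)
g = df.groupby("ColA")["ColD"]
prev2 = g.shift(2)
prev3 = g.shift(3)

# 1 if either earlier value is >= 200, else 0 (NaN compares as False)
flag = (prev2.ge(200) | prev3.ge(200)).astype("Int64")

# Leave NA where neither earlier value exists, as in the expected output
df["NewCol"] = flag.mask(prev2.isna() & prev3.isna())
print(df)
```

Using the nullable Int64 dtype lets the column hold integers and missing values together, instead of falling back to float.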
Reply:
import io

import numpy as np
import pandas as pd

txt = """ColA,ColB,ColC,ColD
B,2021-10-24,2,NA
B,2021-10-25,4,2
B,2021-10-26,500,496
B,2021-10-27,100,-400
B,2021-10-28,55,-45
B,2021-10-29,600,545
B,2021-10-30,8,-592
B,2021-10-31,4300,4292
B,2021-11-01,200,-4100
H,2021-10-24,600,NA
H,2021-10-25,10000,9400
H,2021-10-26,100,-9900
H,2021-10-27,300,200
H,2021-10-28,2,-292
H,2021-10-29,8,6
H,2021-10-30,600,592
H,2021-10-31,600,0
H,2021-11-01,650,50"""
df = pd.read_csv(io.StringIO(txt), sep=',', parse_dates=['ColB'])
# Shift within each ColA group so that values never leak across groups
g = df.groupby('ColA')['ColD']
df['ColD_2'] = g.shift(2)
df['ColD_3'] = g.shift(3)
# 200 or more counts as a hit (NaN compares as False -> 0)
df['ColD_2_check'] = np.where(df['ColD_2'] >= 200, 1, 0)
df['ColD_3_check'] = np.where(df['ColD_3'] >= 200, 1, 0)
df['newCol'] = df['ColD_2_check'] | df['ColD_3_check']
df.drop(['ColD_2', 'ColD_3', 'ColD_2_check', 'ColD_3_check'], inplace=True, axis=1)
print(df)
Output
ColA ColB ColC ColD newCol
0 B 2021-10-24 2 NaN 0
1 B 2021-10-25 4 2.0 0
2 B 2021-10-26 500 496.0 0
3 B 2021-10-27 100 -400.0 0
4 B 2021-10-28 55 -45.0 1
5 B 2021-10-29 600 545.0 1
6 B 2021-10-30 8 -592.0 0
7 B 2021-10-31 4300 4292.0 1
8 B 2021-11-01 200 -4100.0 1
9 H 2021-10-24 600 NaN 0
10 H 2021-10-25 10000 9400.0 0
11 H 2021-10-26 100 -9900.0 0
12 H 2021-10-27 300 200.0 1
13 H 2021-10-28 2 -292.0 1
14 H 2021-10-29 8 6.0 1
15 H 2021-10-30 600 592.0 1
16 H 2021-10-31 600 0.0 0
17 H 2021-11-01 650 50.0 1
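Both replies shift by row position. That matches "2 and 3 days earlier" only because each group has one row per consecutive date. If some dates could be missing, the values can be looked up by calendar date instead.

The sketch below assumes ColB is parsed as a datetime and each (ColA, ColB) pair is unique. For each offset k, it moves every date forward by k days and left-merges ColD back onto the row that is k days later. The column names prev2 and prev3 are illustrative. Rows whose earlier dates are absent get 0, as in the replies.

```python
import io

import pandas as pd

# The question's sample data
txt = """ColA,ColB,ColC,ColD
B,2021-10-24,2,NA
B,2021-10-25,4,2
B,2021-10-26,500,496
B,2021-10-27,100,-400
B,2021-10-28,55,-45
B,2021-10-29,600,545
B,2021-10-30,8,-592
B,2021-10-31,4300,4292
B,2021-11-01,200,-4100
H,2021-10-24,600,NA
H,2021-10-25,10000,9400
H,2021-10-26,100,-9900
H,2021-10-27,300,200
H,2021-10-28,2,-292
H,2021-10-29,8,6
H,2021-10-30,600,592
H,2021-10-31,600,0
H,2021-11-01,650,50"""
df = pd.read_csv(io.StringIO(txt), parse_dates=["ColB"])

lookup = df[["ColA", "ColB", "ColD"]]
for k in (2, 3):
    # A row dated d, moved to d + k days, carries ColD from k days before d + k
    shifted = lookup.assign(ColB=lookup["ColB"] + pd.Timedelta(days=k))
    shifted = shifted.rename(columns={"ColD": f"prev{k}"})
    # Left merge keeps df's row order; dates with no match get NaN
    df = df.merge(shifted, on=["ColA", "ColB"], how="left")

# 1 if either earlier value is >= 200, else 0 (NaN compares as False)
df["NewCol"] = (df["prev2"].ge(200) | df["prev3"].ge(200)).astype(int)
df = df.drop(columns=["prev2", "prev3"])
print(df)
```

On the sample data this gives the same flags as the replies. The difference shows up only when a date is missing: a row-based shift would then silently compare against the wrong day.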
Please credit the source when reposting. Article link: https://www.uj5u.com/net/360929.html
