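Consider the following DataFrame. Please ignore the output column as an input: it holds the expected result, which is the required difference between dates, in days.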
import pandas as pd

data = [
['Group1', 20211129, 'i', 0, 0],
['Group1', 20211202, 'r', 465852069, 3],
['Group1', 20211202, 'r', 465852070, 3],
['Group1', 20211206, 'i', 0, 0],
['Group1', 20211213, 'i', 0, 0],
['Group2', 20211129, 'i', 0, 0],
['Group2', 20211206, 'i', 0, 0],
['Group2', 20211210, 'r', 466486129, 11],
['Group2', 20211213, 'i', 0, 0],
['Group2', 20211227, 'i', 0, 0],
['Group2', 20220103, 'i', 0, 0],
['Group2', 20220104, 'r', 467650236, 22],
['Group2', 20220105, 'r', 467754363, 23]
]
data = pd.DataFrame(data, columns=['group', 'date', 'type', 'rid', 'output'])
data.date = pd.to_datetime(data.date, yearfirst=True, format='%Y%m%d')
data

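For every record of type r, I need to find the farthest record of type i in the upward direction within the same group, without crossing an earlier record of type r (consecutive r rows count as one block). In the example above, for row 1, row 0 is the farthest i. For row 2, row 0 is also the farthest i. For row 7, which is in Group2, row 5 is the farthest i. For row 11, row 8 is the farthest i, because we cannot jump over the r in row 7. For row 12, row 8 is also the farthest i. The final goal is to get the difference between the date of each 'r' row and the date of its farthest 'i' row.
I tried bfill on rid, but without success. I think there should be a simpler way to achieve this.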
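A uj5u.com user replied:
The idea is to build blocks that each end at the last r of a run of consecutive r values, and then, in a custom function, take the date of the first i row in each block: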
# convert to datetimes
data['date'] = pd.to_datetime(data['date'], format='%Y%m%d')
# True on the last row of each run of consecutive r values:
# the row is r and the next row is i (or the end of the frame)
g = data['type'].eq('r') & data['type'].shift(-1, fill_value='i').eq('i')

def f(x):
    # get only i rows
    m = x['type'].eq('i')
    # date of the first i row if one exists, else None; assign to a new column
    x['out'] = next(iter(x.loc[m, 'date']), None)
    return x

# group by column group and by the blocks ending at the last r (reversed cumulative sum);
# group_keys=False keeps the original row index in the result
data = data.groupby(['group', g.iloc[::-1].cumsum().iloc[::-1]], group_keys=False).apply(f)
# finally take the difference in days, and set 0 on rows that are not r
data['out'] = data['date'].sub(data['out']).dt.days.where(data['type'].eq('r'), 0)
print (data)
     group       date type        rid  out
0   Group1 2021-11-29    i          0    0
1   Group1 2021-12-02    r  465852069    3
2   Group1 2021-12-02    r  465852070    3
3   Group1 2021-12-06    i          0    0
4   Group1 2021-12-13    i          0    0
5   Group2 2021-11-29    i          0    0
6   Group2 2021-12-06    i          0    0
7   Group2 2021-12-10    r  466486129   11
8   Group2 2021-12-13    i          0    0
9   Group2 2021-12-27    i          0    0
10  Group2 2022-01-03    i          0    0
11  Group2 2022-01-04    r  467650236   22
12  Group2 2022-01-05    r  467754363   23
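As a possible variation on the answer above (not the answerer's own code), the same block labels can be combined with groupby(...).transform('first') instead of a custom function. This is only a sketch: the names df, rows, is_i, is_r, last_r, block and first_i are introduced here for illustration, and it assumes type only ever contains 'i' or 'r'. It sidesteps groupby.apply, whose treatment of the grouping columns has changed between pandas versions.

```python
import pandas as pd

# Same rows as in the question, without the expected-output column.
rows = [
    ['Group1', 20211129, 'i', 0],
    ['Group1', 20211202, 'r', 465852069],
    ['Group1', 20211202, 'r', 465852070],
    ['Group1', 20211206, 'i', 0],
    ['Group1', 20211213, 'i', 0],
    ['Group2', 20211129, 'i', 0],
    ['Group2', 20211206, 'i', 0],
    ['Group2', 20211210, 'r', 466486129],
    ['Group2', 20211213, 'i', 0],
    ['Group2', 20211227, 'i', 0],
    ['Group2', 20220103, 'i', 0],
    ['Group2', 20220104, 'r', 467650236],
    ['Group2', 20220105, 'r', 467754363],
]
df = pd.DataFrame(rows, columns=['group', 'date', 'type', 'rid'])
df['date'] = pd.to_datetime(df['date'], format='%Y%m%d')

is_i = df['type'].eq('i')
is_r = df['type'].eq('r')

# True on the last row of each run of consecutive 'r' rows.
last_r = is_r & df['type'].shift(-1, fill_value='i').eq('i')

# Reversed cumulative sum: rows share a label with the next "last r" at or below them.
block = last_r.iloc[::-1].cumsum().iloc[::-1]

# Keep dates only on 'i' rows (others become NaT), then broadcast the first
# non-missing date across every row of the same (group, block).
first_i = (
    df['date'].where(is_i)
    .groupby([df['group'], block])
    .transform('first')
)

# Difference in days on 'r' rows, 0 everywhere else.
df['out'] = df['date'].sub(first_i).dt.days.where(is_r, 0)
print(df)
```

On this sample the out column matches the expected values from the question. If a block contained no i row at all, first_i would be NaT there and the corresponding r rows would end up with a missing value rather than a number.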
