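I have a time-series DataFrame of issues, and I want the sum of the differences between CLOSE and SUBMISSION. However, SUBMISSION should only be subtracted when CLOSE is later than SUBMISSION. Below are the data points (sorted by CLOSE), the expected output, and the code I tried: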
import pandas as pd

df = pd.DataFrame({'REF_KEY': [1, 2, 3, 4, 5], 'SUBMISSION': ['2018-08-21', '2018-09-03', '2018-09-07', '2018-09-06', '2018-08-28'], 'CLOSE': ['2018-09-05', '2018-09-12', '2018-09-18', '2018-09-24', '2018-09-27']})
df['CLOSE'] = df['CLOSE'].astype('datetime64[ns]')
df['SUBMISSION'] = df['SUBMISSION'].astype('datetime64[ns]')
For REF_KEY == 1, ACCUM_DATE_DELTA should be the sum of:
- a 15-day difference ('2018-09-05' - '2018-08-21')
- a 2-day difference ('2018-09-05' - '2018-09-03')
- an 8-day difference ('2018-09-05' - '2018-08-28'), which makes it 25
For REF_KEY == 2, you get the sum of:
- a 22-day difference ('2018-09-12' - '2018-08-21')
- a 9-day difference ('2018-09-12' - '2018-09-03')
- a 5-day difference ('2018-09-12' - '2018-09-07')
- a 6-day difference ('2018-09-12' - '2018-09-06')
- a 15-day difference ('2018-09-12' - '2018-08-28')
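As a quick check of the hand arithmetic for REF_KEY == 2, the day differences can be recomputed directly. This is only a sketch: the variable names `submissions`, `close_2`, `diffs`, and `total` are made up for illustration, and the dates are copied from the data above.

```python
import pandas as pd

# SUBMISSION dates for REF_KEY 1..5, copied from the question's data
submissions = pd.to_datetime(
    ['2018-08-21', '2018-09-03', '2018-09-07', '2018-09-06', '2018-08-28']
)
close_2 = pd.Timestamp('2018-09-12')  # CLOSE for REF_KEY == 2

# Whole-day difference between this CLOSE and every SUBMISSION
diffs = [(close_2 - s).days for s in submissions]
total = sum(diffs)
print(diffs, total)  # [22, 9, 5, 6, 15] 57
```

All five differences are positive here, so none would be dropped by the CLOSE > SUBMISSION condition.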
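So for REF_KEY == 2, you can see that the differences from its close date include REF_KEY == [3, 4], because its CLOSE is later than their SUBMISSION dates. That gave me the idea of adding a condition that the CLOSE date must be later than the SUBMISSION date.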
df_2 = pd.DataFrame({'REF_KEY': [1, 2, 3, 4, 5],
'SUBMISSION': ['2018-08-21', '2018-09-03', '2018-09-07', '2018-09-06', '2018-08-28'], 'CLOSE': ['2018-09-05', '2018-09-12', '2018-09-18', '2018-09-24', '2018-09-27'], 'ACCUM_DATE_DELTA': [25, 57, 86, 116, 131]})
df_2['CLOSE'] = df_2['CLOSE'].astype('datetime64[ns]')
df_2['SUBMISSION'] = df_2['SUBMISSION'].astype('datetime64[ns]')
Code I tried:
df_2['ACCUM_DATE_DELTA'] = df_2['CLOSE']*len(df_2[df_2['CLOSE'] - df_2['SUBMISSION']]['SUBMISSION'].cumsum()) - df_2[df_2['CLOSE'] - df_2['SUBMISSION']]['SUBMISSION'].cumsum()
Answer from a uj5u.com user:
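- Cross merge to generate the Cartesian product of SUBMISSION x CLOSE
- Keep only the rows where CLOSE > SUBMISSION
- groupby the CLOSE date and sum that group's CLOSE - SUBMISSION days
- merge the ACCUM values back into the original df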
m = pd.merge(df.SUBMISSION, df.CLOSE, how='cross')  # cross-merge for all SUBMISSION x CLOSE combos
accum = (m.where(m.CLOSE > m.SUBMISSION)             # limit to CLOSE > SUBMISSION
          .groupby('CLOSE').SUBMISSION               # group by CLOSE
          .apply(lambda g: (g.name - g).sum())       # sum of all (CLOSE - SUBMISSION)
          .rename('ACCUM'))
df.merge(accum, on='CLOSE')                          # merge back to df
Output:
REF_KEY SUBMISSION CLOSE ACCUM
0 1 2018-08-21 2018-09-05 25 days
1 2 2018-09-03 2018-09-12 57 days
2 3 2018-09-07 2018-09-18 87 days
3 4 2018-09-06 2018-09-24 117 days
4 5 2018-08-28 2018-09-27 132 days
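The ACCUM column above is a Timedelta ("25 days"), whereas the expected output in the question uses plain integers. A possible variant of the same cross-merge idea that yields integer days is sketched below. It assumes pandas 1.2.0 or later for how='cross', and the column names DELTA_DAYS and ACCUM_DAYS are invented for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'REF_KEY': [1, 2, 3, 4, 5],
    'SUBMISSION': pd.to_datetime(['2018-08-21', '2018-09-03', '2018-09-07',
                                  '2018-09-06', '2018-08-28']),
    'CLOSE': pd.to_datetime(['2018-09-05', '2018-09-12', '2018-09-18',
                             '2018-09-24', '2018-09-27']),
})

# All SUBMISSION x CLOSE combinations (needs pandas >= 1.2.0)
m = df[['SUBMISSION']].merge(df[['CLOSE']], how='cross')

# Keep only pairs with CLOSE > SUBMISSION and convert the gap to integer days
m = (m.loc[m['CLOSE'] > m['SUBMISSION']]
       .assign(DELTA_DAYS=lambda d: (d['CLOSE'] - d['SUBMISSION']).dt.days))

# Sum the day gaps per CLOSE date, then attach them to the original rows
accum_days = m.groupby('CLOSE')['DELTA_DAYS'].sum().rename('ACCUM_DAYS')
out = df.merge(accum_days, left_on='CLOSE', right_index=True)
print(out)
```

Filtering with boolean indexing instead of where() drops the non-matching rows outright, so no NaT values reach the groupby.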
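Notes:
- how='cross' requires pandas 1.2.0 or later, so on earlier versions, merge on a dummy key column instead:
m = df[['SUBMISSION']].assign(key=0).merge(df[['CLOSE']].assign(key=0), on='key').drop(columns='key')
- As with Jonathan's solution, these day counts are off by 1 from your expected output.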
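To illustrate the note about older pandas versions, the sketch below builds both the how='cross' result and the dummy-key fallback and compares them after sorting. The names `cross`, `fallback`, `normalize`, and `same` are hypothetical, and the comparison ignores row order on purpose rather than assuming both merges return rows in the same order.

```python
import pandas as pd

df = pd.DataFrame({
    'REF_KEY': [1, 2, 3, 4, 5],
    'SUBMISSION': pd.to_datetime(['2018-08-21', '2018-09-03', '2018-09-07',
                                  '2018-09-06', '2018-08-28']),
    'CLOSE': pd.to_datetime(['2018-09-05', '2018-09-12', '2018-09-18',
                             '2018-09-24', '2018-09-27']),
})

# pandas >= 1.2.0: built-in cross join
cross = df[['SUBMISSION']].merge(df[['CLOSE']], how='cross')

# Older pandas: join both sides on a constant dummy key, then drop it
fallback = (df[['SUBMISSION']].assign(key=0)
            .merge(df[['CLOSE']].assign(key=0), on='key')
            .drop(columns='key'))

def normalize(frame):
    # Sort rows so the comparison does not depend on merge output order
    return frame.sort_values(['SUBMISSION', 'CLOSE']).reset_index(drop=True)

same = normalize(cross).equals(normalize(fallback))
print(cross.shape, fallback.shape, same)
```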
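Answer from a uj5u.com user:
I'm not sure whose calculation is correct. If it's mine, let me know whether this is what you want.
For 'REF_KEY' == 1, you can use this to find the accumulated days: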
(df['CLOSE'][0] - df['SUBMISSION']).dt.days.clip(lower=0).sum()
df['CLOSE'][0] is the first close date, which is compared against all submission dates; dt.days gives the number of days as integers:
(df['CLOSE'][0] - df['SUBMISSION']).dt.days
0 15
1 2
2 -2
3 -1
4 8
Name: SUBMISSION, dtype: int64
Use clip(lower=0).sum() to turn the negative values into zero and then sum:
(df['CLOSE'][0] - df['SUBMISSION']).dt.days.clip(lower=0).sum()
result = 25
To automate this, use apply() with a custom function:
def calc(x):
    # print((x - df['SUBMISSION']).dt.days.clip(lower=0).sum())
    return (x - df['SUBMISSION']).dt.days.clip(lower=0).sum()
df
REF_KEY SUBMISSION CLOSE
0 1 2018-08-21 2018-09-05
1 2 2018-09-03 2018-09-12
2 3 2018-09-07 2018-09-18
3 4 2018-09-06 2018-09-24
4 5 2018-08-28 2018-09-27
df['ACCUM_DATE_DELTA'] = df.apply(lambda x: calc(x['CLOSE']), axis=1)
REF_KEY SUBMISSION CLOSE ACCUM_DATE_DELTA
0 1 2018-08-21 2018-09-05 25
1 2 2018-09-03 2018-09-12 57
2 3 2018-09-07 2018-09-18 87
3 4 2018-09-06 2018-09-24 117
4 5 2018-08-28 2018-09-27 132
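If apply() becomes slow on a larger frame, one alternative is NumPy broadcasting. This is a different technique from the apply() approach above, not part of either answer, and it builds an n x n matrix in memory, so treat it as a sketch under that assumption. The variable names `close`, `submission`, `delta`, and `days` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'REF_KEY': [1, 2, 3, 4, 5],
    'SUBMISSION': pd.to_datetime(['2018-08-21', '2018-09-03', '2018-09-07',
                                  '2018-09-06', '2018-08-28']),
    'CLOSE': pd.to_datetime(['2018-09-05', '2018-09-12', '2018-09-18',
                             '2018-09-24', '2018-09-27']),
})

close = df['CLOSE'].to_numpy()            # shape (n,)
submission = df['SUBMISSION'].to_numpy()  # shape (n,)

# Row i holds CLOSE[i] minus every SUBMISSION, as a timedelta matrix of shape (n, n)
delta = close[:, None] - submission[None, :]

# Convert to days, zero out the negative gaps, and sum across each row
days = delta / np.timedelta64(1, 'D')
df['ACCUM_DATE_DELTA'] = np.clip(days, 0, None).sum(axis=1).astype(int)
print(df)
```

Clipping at zero plays the same role as clip(lower=0) in the answer: pairs where CLOSE is earlier than SUBMISSION contribute nothing to the sum.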
Please credit the source when reposting. Link to this article: https://www.uj5u.com/net/376312.html
