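I have two columns in a DataFrame, one containing the date and one containing the time. Both start out as strings. I would like to end up with them merged into a single column in datetime format.
The head of the DataFrame is: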
           Date  variable    value
0  '04/10/2020'   '00:30'   81.310
1  '05/10/2020'   '00:30'  121.245
2  '06/10/2020'   '00:30'   77.020
3  '07/10/2020'   '00:30'  100.705
4  '08/10/2020'   '00:30'  114.370
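They are in a DataFrame called df_flattened, which has roughly 20k rows. The code I am currently using is:
df_flattened['DateTime'] = df_flattened.apply(lambda x: x['Date'] + ' ' + x['variable'], axis=1)
df_flattened['DateTime'] = pd.to_datetime(df_flattened['DateTime'])
However, this takes about 2.6 seconds to run, and the dataset will grow in the future. Can anyone suggest a faster way?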
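Answer:
You can concatenate the columns directly instead of using apply:
df_flattened['DateTime'] = pd.to_datetime(df_flattened['Date'] + ' ' + df_flattened['variable'])
You can also specify the format of the concatenated datetime string, which makes the day-first order explicit and spares pandas from having to infer it:
df_flattened['DateTime'] = pd.to_datetime(df_flattened['Date'] + ' ' + df_flattened['variable'], format='%d/%m/%Y %H:%M')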
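To see why the explicit format is worth having here, the following is a minimal, self-contained sketch. The sample rows are made up to mirror the layout in the question (the `13/10/2020` row is an assumption added to show a date that can only be day-first). Without `format`, pandas may read a string such as `04/10/2020` month-first, so passing `%d/%m/%Y %H:%M` removes that ambiguity.

```python
import pandas as pd

# Hypothetical sample data in the same shape as the question's frame.
df_flattened = pd.DataFrame({
    "Date": ["04/10/2020", "05/10/2020", "13/10/2020"],
    "variable": ["00:30", "00:30", "01:00"],
    "value": [81.310, 121.245, 77.020],
})

# Vectorized string concatenation: no row-wise apply needed.
combined = df_flattened["Date"] + " " + df_flattened["variable"]

# The explicit format states that the day comes first.
df_flattened["DateTime"] = pd.to_datetime(combined, format="%d/%m/%Y %H:%M")

print(df_flattened[["Date", "variable", "DateTime"]])
```

Every row ends up in October 2020, which matches the day-first reading of the original strings.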
Performance with 20k rows:
#20k rows
df_flattened = pd.concat([df_flattened] * 4000, ignore_index=True)
In [44]: %%timeit
...: df_flattened['DateTime'] = df_flattened.apply(lambda x: x['Date'] + ' ' + x['variable'], axis=1)
...: df_flattened['DateTime'] = pd.to_datetime(df_flattened['DateTime'])
...:
...:
325 ms ± 26.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [45]: %timeit df_flattened['DateTime'] = pd.to_datetime(df_flattened['Date'] + ' ' + df_flattened['variable'])
11.9 ms ± 1.51 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [46]: %timeit df_flattened['DateTime'] = pd.to_datetime(df_flattened['Date'] + ' ' + df_flattened['variable'], format='%d/%m/%Y %H:%M')
9.55 ms ± 96.9 μs per loop (mean ± std. dev. of 7 runs, 100 loops each)
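If you want to reproduce the comparison outside IPython, here is a rough sketch as a plain script. The helper names (`build_frame`, `with_apply`, `vectorized`, `time_call`) are hypothetical and only for illustration, and the frame is synthetic, so the absolute timings will differ from the numbers above depending on your machine and pandas version. A single `perf_counter` measurement is also much noisier than `%timeit`.

```python
import time

import pandas as pd

FORMAT = "%d/%m/%Y %H:%M"


def build_frame(n_rows: int) -> pd.DataFrame:
    """Build a synthetic frame of string dates and times (hypothetical data)."""
    base = pd.DataFrame({
        "Date": ["04/10/2020", "05/10/2020", "06/10/2020", "07/10/2020", "08/10/2020"],
        "variable": ["00:30"] * 5,
    })
    return pd.concat([base] * (n_rows // 5), ignore_index=True)


def with_apply(df: pd.DataFrame) -> pd.Series:
    # Row-wise concatenation, as in the question.
    joined = df.apply(lambda x: x["Date"] + " " + x["variable"], axis=1)
    return pd.to_datetime(joined, format=FORMAT)


def vectorized(df: pd.DataFrame) -> pd.Series:
    # Column-wise concatenation, as in the answer.
    return pd.to_datetime(df["Date"] + " " + df["variable"], format=FORMAT)


def time_call(func, df):
    """Run func once and return its result with the elapsed wall-clock seconds."""
    start = time.perf_counter()
    result = func(df)
    return result, time.perf_counter() - start


df = build_frame(20_000)
slow_result, slow_seconds = time_call(with_apply, df)
fast_result, fast_seconds = time_call(vectorized, df)

# Both approaches should produce identical datetimes.
assert slow_result.equals(fast_result)
print(f"apply:      {slow_seconds * 1000:.1f} ms")
print(f"vectorized: {fast_seconds * 1000:.1f} ms")
```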
