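I have a dataset of stock information with timestamps spanning more than a month, but I only need the rows at 4:00 PM. The closest time to that in my data is 16:00:03, which is what I'm currently using. I hard-coded the August values by typing in each date manually, but I'd like to change this so that I can specify which month to use, or set a start and end date, instead of entering every day.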
df = df.loc[(df["timestamp"] == "2021-08-02 16:00:03") |
            (df["timestamp"] == "2021-08-03 16:00:03") |
            (df["timestamp"] == "2021-08-04 16:00:03") |
            (df["timestamp"] == "2021-08-05 16:00:03") |
            (df["timestamp"] == "2021-08-06 16:00:03")]
                  timestamp    bidprice    askprice
0       2021-08-02 14:59:03   99.937500   99.949219
1       2021-08-02 15:00:03   99.941406   99.945312
2       2021-08-02 15:01:03   99.941406   99.945312
3       2021-08-02 15:02:03   99.941406   99.945312
4       2021-08-02 15:03:03   99.941406   99.945312
...
468109  2021-09-01 22:55:02  110.500000  110.546875
468110  2021-09-01 22:56:02  110.500000  110.546875
468111  2021-09-01 22:57:02  110.500000  110.546875
468112  2021-09-01 22:58:02  110.484375  110.531250
468113  2021-09-01 22:59:02  110.484375  110.531250
uj5u.com user reply:
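First, convert the datetime strings to timestamps:
import datetime
import pandas as pd
df['timestamp'] = pd.to_datetime(df['timestamp'])
Then isolate the rows you want by comparing only the time-of-day part:
# 16:00:03 is the time closest to 4:00 PM in the question's data
df = df.loc[df['timestamp'].dt.time == datetime.time(hour=16, minute=0, second=3)]
Sorry, my code is untested, but this should at least put you on the right track.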
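To choose a month or a start and end date instead of listing every day, the time-of-day comparison can be combined with a date-range mask. The following is only a sketch under assumptions: the sample frame, the helper name `rows_at_time`, and the constant prices are made up for illustration, and it assumes the wanted rows always fall exactly on 16:00:03.

```python
import datetime

import pandas as pd

# Hypothetical stand-in for the question's data: one row per minute,
# always at :03 seconds, from 2021-08-02 to 2021-09-01.
df = pd.DataFrame({
    "timestamp": pd.date_range("2021-08-02 14:59:03", "2021-09-01 22:59:03", freq="min"),
})
df["bidprice"] = 99.9    # placeholder values, not real quotes
df["askprice"] = 100.0

TARGET = datetime.time(16, 0, 3)


def rows_at_time(frame, start, end, at=TARGET):
    """Rows dated from `start` to `end` (inclusive) whose time of day equals `at`."""
    ts = frame["timestamp"]
    # Add one day to `end` so the whole end date is included.
    in_range = (ts >= pd.Timestamp(start)) & (ts < pd.Timestamp(end) + pd.Timedelta(days=1))
    return frame.loc[in_range & (ts.dt.time == at)]


august = rows_at_time(df, "2021-08-01", "2021-08-31")      # a whole month
first_week = rows_at_time(df, "2021-08-02", "2021-08-06")  # explicit start and end dates
print(first_week)
```

If the seconds drift from day to day, an exact match like this will miss rows; in that case a nearest-match approach is the safer choice.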
uj5u.com user reply:
Use an asof merge against a DataFrame that holds a single series with 16:00:00 for each day. You can set the direction to 'nearest', 'forward', or 'backward' to get the matching logic you want.
Sample data
import numpy as np
import pandas as pd

N = 30000
np.random.seed(123)
# One row every 29 seconds, with roughly a millisecond of random jitter
df1 = pd.DataFrame({'timestamp': (pd.date_range('2021-08-01', freq='29s', periods=N)
                                  + pd.to_timedelta(np.random.normal(0, 1, N), unit='ms')),
                    'value': range(N)})
Code
# Daily 16:00:00 DataFrame
start_date = '2021-08-01 16:00:00'
end_date = '2021-08-11 16:00:00'
dfbase = pd.DataFrame({'date': pd.date_range(start_date, end_date, freq='D')})

# merge_asof requires both frames to be sorted on their keys
result = pd.merge_asof(dfbase, df1.sort_values('timestamp'),
                       left_on='date', right_on='timestamp',
                       direction='nearest', allow_exact_matches=True)
print(result)
                  date                     timestamp  value
0  2021-08-01 16:00:00 2021-08-01 15:59:53.999784590   1986
1  2021-08-02 16:00:00 2021-08-02 16:00:14.000160424   4966
2  2021-08-03 16:00:00 2021-08-03 16:00:05.000322262   7945
3  2021-08-04 16:00:00 2021-08-04 15:59:55.998303052  10924
4  2021-08-05 16:00:00 2021-08-05 15:59:46.998877694  13903
5  2021-08-06 16:00:00 2021-08-06 16:00:06.998954204  16883
6  2021-08-07 16:00:00 2021-08-07 15:59:58.000602203  19862
7  2021-08-08 16:00:00 2021-08-08 15:59:49.001400290  22841
8  2021-08-09 16:00:00 2021-08-09 16:00:08.998636467  25821
9  2021-08-10 16:00:00 2021-08-10 15:59:59.998385577  28800
10 2021-08-11 16:00:00 2021-08-11 01:39:31.001659917  29999
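Another option, which gives a result similar to the one above but with less information, is to set the index and then use DataFrame.asof. You can pass it the series of daily dates.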
df1 = df1.set_index('timestamp')
df1.asof(dfbase.date)
                       value
date
2021-08-01 16:00:00   1986.0
2021-08-02 16:00:00   4965.0
2021-08-03 16:00:00   7944.0
2021-08-04 16:00:00  10924.0
2021-08-05 16:00:00  13903.0
2021-08-06 16:00:00  16882.0
2021-08-07 16:00:00  19862.0
2021-08-08 16:00:00  22841.0
2021-08-09 16:00:00  25820.0
2021-08-10 16:00:00  28800.0
2021-08-11 16:00:00  29999.0
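So you get a similar result (some rows differ because this performs a 'backward' match), but it gives you no information about which exact timestamp was matched, and it does not support setting a tolerance to exclude bad matches (the last row, for example, may still be the nearest one, but it is a bad match).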
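For comparison, pd.merge_asof does accept a tolerance argument, so a far-away "nearest" row such as the last one can be left unmatched. The sketch below makes some assumptions of its own: it uses the same 29-second grid without the random jitter so that it is deterministic, and the one-minute tolerance is an arbitrary choice for illustration.

```python
import pandas as pd

N = 30000
# Same shape as the sample data, without the millisecond jitter (an assumption
# made here so the result is deterministic).
df1 = pd.DataFrame({
    "timestamp": pd.date_range("2021-08-01", freq="29s", periods=N),
    "value": range(N),
})

dfbase = pd.DataFrame({
    "date": pd.date_range("2021-08-01 16:00:00", "2021-08-11 16:00:00", freq="D"),
})

# Rows with no timestamp within one minute of 16:00:00 get NaT/NaN
# instead of a distant match.
result = pd.merge_asof(
    dfbase,
    df1.sort_values("timestamp"),
    left_on="date",
    right_on="timestamp",
    direction="nearest",
    tolerance=pd.Timedelta("1min"),
)

# Keep only the days that found a close enough match.
matched = result.dropna(subset=["timestamp"])
print(matched)
```

Here the data ends at 2021-08-11 01:39:31, so the 2021-08-11 16:00:00 row has no candidate within the tolerance and is dropped.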
Please credit the source when reposting. Link to this article: https://www.uj5u.com/caozuo/332425.html
