So I have a df like this:
import pandas as pd
import numpy as np
datatime = [('2019-09-15 00:15:00.000000000'),
('2019-09-15 00:30:00.000000000'),
('2019-09-15 00:45:00.000000000'),
('2019-09-15 01:00:00.000000000'),
('2019-09-15 01:15:00.000000000'),
('2019-09-15 01:30:00.000000000'),
('2019-09-15 01:45:00.000000000'),
('2019-09-15 02:00:00.000000000'),
('2019-09-15 02:15:00.000000000')]
p =[494.76,486.36,484.68,500.64,482.16,483.84,483.0,478.8,493.08,474.6]
q = [47.88,33.6,41.16,0.0,0.0,0.0,0.0,0.0,0.0,0.0]
df = pd.DataFrame(list(zip(datatime,p,q)), columns = [['datetime','p','q']])
df
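I am running a 30-day analysis by grouping my data into peak and off-peak hours. For that, I also need to determine the day of the week for each reading. I tried the pandas function: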
df.dt.day_name()
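But that does not work in this particular case, because for this function a day starts at 00:00:00, whereas in my program I need it to start at 00:15:00. Since I have 96 points per day, I thought about using a dictionary: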
days_of_the_week = {'Sunday': 1,'Monday': 2,'Tuesday': 3, 'Wednesday': 4, 'Thursday':5, 'Friday':6 , 'Saturday':7}
How can I apply this to my df so that every 96 points mark a new day?
Answer from a uj5u.com user:
You can add an offset with a Timedelta object when computing the weekday. This does not affect the values in the datetime column.
In [21]: dt_index = pd.date_range(start='2022-01-01', end='2022-01-01 23:45:00', periods=96)
In [23]: df = pd.DataFrame(zip(dt_index, np.random.rand(len(dt_index))), columns=['datetime', 'whatever'])
In [24]: df.tail()
Out[24]:
              datetime  whatever
91 2022-01-01 22:45:00  0.910446
92 2022-01-01 23:00:00  0.199106
93 2022-01-01 23:15:00  0.051808
94 2022-01-01 23:30:00  0.799284
95 2022-01-01 23:45:00  0.584663
In [25]: df['weekday'] = (df.datetime.astype('datetime64[ns]') + pd.Timedelta(seconds=15*60)).dt.day_name()
In [26]: df.tail()
Out[26]:
              datetime  whatever   weekday
91 2022-01-01 22:45:00  0.910446  Saturday
92 2022-01-01 23:00:00  0.199106  Saturday
93 2022-01-01 23:15:00  0.051808  Saturday
94 2022-01-01 23:30:00  0.799284  Saturday
95 2022-01-01 23:45:00  0.584663    Sunday
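The same offset idea can be sketched on timestamps in the question's format. This is only an illustration under a stated assumption: a day is taken to run from 00:15:00 through 00:00:00 of the next calendar day, so the 15 minutes are subtracted here. Whether to add or subtract depends on which day the boundary reading should belong to. The column names weekday and weekday_num are made up for the example; the dictionary is the one from the question.

```python
import pandas as pd

# A few timestamps in the question's format, around the day boundary.
stamps = [
    "2019-09-15 00:15:00",
    "2019-09-15 23:45:00",
    "2019-09-16 00:00:00",  # assumed to still belong to 2019-09-15
    "2019-09-16 00:15:00",
]
df = pd.DataFrame({"datetime": pd.to_datetime(stamps)})

# Mapping taken from the question.
days_of_the_week = {'Sunday': 1, 'Monday': 2, 'Tuesday': 3, 'Wednesday': 4,
                    'Thursday': 5, 'Friday': 6, 'Saturday': 7}

# Assumption: the midnight reading closes the previous day, so shift
# backwards by 15 minutes. The shift is applied to a temporary Series only;
# the datetime column itself is left unchanged.
shifted = df["datetime"] - pd.Timedelta(minutes=15)

df["weekday"] = shifted.dt.day_name()
df["weekday_num"] = df["weekday"].map(days_of_the_week)
print(df)
```

With this convention the midnight reading on 2019-09-16 is labelled with the same weekday as the readings from 2019-09-15, and the label changes at 00:15:00.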
Just a note on the way you construct your DataFrame.
df = pd.DataFrame(list(zip(datatime,p,q)), columns = [['datetime','p','q']])
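The call to list is unnecessary and may hurt performance on larger datasets. Also, you should not pass a nested list as the columns argument, because it has an unintended effect: pandas builds MultiIndex columns from it, so df.datetime returns a DataFrame instead of a Series.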
In [27]: df = pd.DataFrame(list(zip(datatime,p,q)), columns = [['datetime','p','q']])
In [28]: type(df.datetime)
Out[28]: pandas.core.frame.DataFrame
In [29]: df = pd.DataFrame(zip(datatime, p, q), columns=['datetime','p','q'])
In [30]: type(df.datetime)
Out[30]: pandas.core.series.Series
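To check the difference yourself, here is a small sketch using two rows of the question's data. It assumes a reasonably recent pandas version; the variable names nested and flat are only for illustration. It also shows that the .dt accessor needs a datetime Series, so the strings are converted with pd.to_datetime first.

```python
import pandas as pd

datatime = ['2019-09-15 00:15:00.000000000', '2019-09-15 00:30:00.000000000']
p = [494.76, 486.36]
q = [47.88, 33.6]

# Nested list for columns, as in the question.
nested = pd.DataFrame(list(zip(datatime, p, q)), columns=[['datetime', 'p', 'q']])
# Flat list for columns, as suggested in the answer.
flat = pd.DataFrame(zip(datatime, p, q), columns=['datetime', 'p', 'q'])

# Check which of the two frames ended up with MultiIndex columns.
print(isinstance(nested.columns, pd.MultiIndex))
print(isinstance(flat.columns, pd.MultiIndex))

# The column still holds strings; convert before using the .dt accessor.
flat['datetime'] = pd.to_datetime(flat['datetime'])
flat['weekday'] = flat['datetime'].dt.day_name()
print(flat)
```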
