我有特定日期的每小時用電量資料。我想用這些資料來“預測”接下來幾天的每小時用電量。第二天的值應該是前一天同一小時的值乘以比例因子f(例如 2)。
我擁有的資料框df看起來像這樣:
load_kWh
2021-01-01 00:00:00 1.0
2021-01-01 01:00:00 1.0
2021-01-01 02:00:00 1.0
2021-01-01 03:00:00 1.0
2021-01-01 04:00:00 1.0
2021-01-01 05:00:00 1.0
2021-01-01 06:00:00 1.0
2021-01-01 07:00:00 3.0
2021-01-01 08:00:00 3.0
2021-01-01 09:00:00 3.0
2021-01-01 10:00:00 3.0
2021-01-01 11:00:00 3.0
2021-01-01 12:00:00 3.0
2021-01-01 13:00:00 3.0
2021-01-01 14:00:00 3.0
2021-01-01 15:00:00 3.0
2021-01-01 16:00:00 3.0
2021-01-01 17:00:00 3.0
2021-01-01 18:00:00 3.0
2021-01-01 19:00:00 3.0
2021-01-01 20:00:00 1.0
2021-01-01 21:00:00 1.0
2021-01-01 22:00:00 1.0
2021-01-01 23:00:00 1.0
我希望輸出資料框df_ex看起來像這樣:
load_kWh
2021-01-01 00:00:00 1.0
2021-01-01 01:00:00 1.0
2021-01-01 02:00:00 1.0
2021-01-01 03:00:00 1.0
2021-01-01 04:00:00 1.0
2021-01-01 05:00:00 1.0
2021-01-01 06:00:00 1.0
2021-01-01 07:00:00 3.0
2021-01-01 08:00:00 3.0
2021-01-01 09:00:00 3.0
2021-01-01 10:00:00 3.0
2021-01-01 11:00:00 3.0
2021-01-01 12:00:00 3.0
2021-01-01 13:00:00 3.0
2021-01-01 14:00:00 3.0
2021-01-01 15:00:00 3.0
2021-01-01 16:00:00 3.0
2021-01-01 17:00:00 3.0
2021-01-01 18:00:00 3.0
2021-01-01 19:00:00 3.0
2021-01-01 20:00:00 1.0
2021-01-01 21:00:00 1.0
2021-01-01 22:00:00 1.0
2021-01-01 23:00:00 1.0
2021-01-02 00:00:00 2.0
2021-01-02 01:00:00 2.0
2021-01-02 02:00:00 2.0
2021-01-02 03:00:00 2.0
2021-01-02 04:00:00 2.0
2021-01-02 05:00:00 2.0
2021-01-02 06:00:00 2.0
2021-01-02 07:00:00 6.0
2021-01-02 08:00:00 6.0
2021-01-02 09:00:00 6.0
2021-01-02 10:00:00 6.0
2021-01-02 11:00:00 6.0
2021-01-02 12:00:00 6.0
2021-01-02 13:00:00 6.0
2021-01-02 14:00:00 6.0
2021-01-02 15:00:00 6.0
2021-01-02 16:00:00 6.0
2021-01-02 17:00:00 6.0
2021-01-02 18:00:00 6.0
2021-01-02 19:00:00 6.0
2021-01-02 20:00:00 2.0
2021-01-02 21:00:00 2.0
2021-01-02 22:00:00 2.0
2021-01-02 23:00:00 2.0
2021-01-03 00:00:00 4.0
2021-01-03 01:00:00 4.0
2021-01-03 02:00:00 4.0
2021-01-03 03:00:00 4.0
2021-01-03 04:00:00 4.0
2021-01-03 05:00:00 4.0
2021-01-03 06:00:00 4.0
2021-01-03 07:00:00 12.0
2021-01-03 08:00:00 12.0
2021-01-03 09:00:00 12.0
2021-01-03 10:00:00 12.0
2021-01-03 11:00:00 12.0
2021-01-03 12:00:00 12.0
2021-01-03 13:00:00 12.0
2021-01-03 14:00:00 12.0
2021-01-03 15:00:00 12.0
2021-01-03 16:00:00 4.0
2021-01-03 17:00:00 4.0
2021-01-03 18:00:00 4.0
2021-01-03 19:00:00 4.0
2021-01-03 20:00:00 4.0
2021-01-03 21:00:00 4.0
2021-01-03 22:00:00 4.0
2021-01-03 23:00:00 4.0
我嘗試了以下解決方案(df如上所述):
import pandas as pd
import datetime
start = '2021-01-01 00:00'
end = '2021-01-03 23:00'
freq = 'H'
index = pd.date_range(start,
end,
freq=freq)
df_ex = df.reindex(index)
i = df_ex.index[0].day
f = 2.0
df_ex.loc[df_ex.index.day == i 1] = df_ex.loc[df_ex.index.day == i] * f
print(df_ex)
結果是:
load_kWh
2021-01-01 00:00:00 1.0
2021-01-01 01:00:00 1.0
2021-01-01 02:00:00 1.0
2021-01-01 03:00:00 1.0
2021-01-01 04:00:00 1.0
... ...
2021-01-03 19:00:00 NaN
2021-01-03 20:00:00 NaN
2021-01-03 21:00:00 NaN
2021-01-03 22:00:00 NaN
2021-01-03 23:00:00 NaN
看來我在第一天之后用值填充行的嘗試沒有成功。該索引是 DateTimeIndex。
任何有關如何解決此問題的建議將不勝感激!
uj5u.com熱心網友回復:
要創建資料,您需要一次迭代一天。
假設原始資料至少有一整天的資料,那么你可以這樣做:
import pandas as pd
import itertools
import datetime as dt
start = "2021-01-01 00:00"
end = "2021-01-01 23:00"
freq = "H"
df = pd.DataFrame(
{"load_kWh": itertools.chain([1.0] * 7, [3.0] * 13, [1.0] * 4)},
index=pd.date_range(start, end, freq=freq),
)
def add_days_to_df(data: pd.DataFrame, number_of_days: int, k: float) -> pd.DataFrame:
data = data.copy()
for _ in range(number_of_days):
day = data[-24:]
day.index = dt.timedelta(days=1)
day *= k
data = pd.concat((data, day))
return data
print(add_days_to_df(data=df, number_of_days=2, k=2.0))
轉載請註明出處,本文鏈接:https://www.uj5u.com/qita/533405.html
