我有以下帶有“日期”和“值”列的資料框(df)。我正在尋找如何從變數 start_MM_DD 和 end_MM_DD(月和日)創建新資料框的解決方案。對于每一年,應創建一個具有相應值的列。資料框“df”可以更早開始或資料可能丟失,具體取決于變數的開始方式。
import pandas as pd
import numpy as np
df = pd.DataFrame({'date':pd.date_range(start='2020-01-03', end='2022-01-10')})
df['randNumCol'] = np.random.randint(1, 6, df.shape[0])
start_MM_DD = '01-01'
end_MM_DD = '01-15'
新的資料框應如下所示:

uj5u.com熱心網友回復:
# create your date range; the year does not matter
d_range = pd.date_range('2022-01-01', '2022-01-15').strftime('%m-%d')
# use boolean indexing to filter your frame based on the months and days you want
new = df[df['date'].dt.strftime('%m-%d').isin(d_range)].copy()
# get the year from the date column
new['year'] = new['date'].dt.year
# pivot the frame
new['date'] = new['date'].dt.strftime('%m-%d')
print(new.pivot('date', 'year', 'randNumCol'))
year 2020 2021 2022
date
01-01 NaN 3.0 5.0
01-02 NaN 4.0 2.0
01-03 2.0 4.0 3.0
01-04 3.0 3.0 3.0
01-05 2.0 2.0 4.0
01-06 5.0 5.0 3.0
01-07 1.0 3.0 3.0
01-08 2.0 5.0 2.0
01-09 2.0 2.0 5.0
01-10 5.0 5.0 2.0
01-11 5.0 5.0 NaN
01-12 2.0 3.0 NaN
01-13 1.0 2.0 NaN
01-14 1.0 3.0 NaN
01-15 4.0 3.0 NaN
uj5u.com熱心網友回復:
這只是一個按日期過濾的資料透視表:
import pandas as pd
import numpy as np
df = pd.DataFrame({'date':pd.date_range(start='2020-01-03', end='2022-01-10')})
df['randNumCol'] = np.random.randint(1, 6, df.shape[0])
df = (
df.loc[(df['date'].dt.month.eq(1)) & (df['date'].dt.day.between(1,15))]
.pivot_table(index=df['date'].dt.day,
columns=df['date'].dt.year)
).droplevel(0, axis=1).rename_axis(None)
print(df)
輸出
date 2020 2021 2022
1 NaN 2.0 1.0
2 NaN 5.0 4.0
3 3.0 1.0 2.0
4 3.0 1.0 2.0
5 4.0 5.0 5.0
6 5.0 1.0 4.0
7 2.0 4.0 2.0
8 4.0 3.0 3.0
9 5.0 5.0 3.0
10 4.0 3.0 3.0
11 1.0 1.0 NaN
12 2.0 2.0 NaN
13 1.0 4.0 NaN
14 3.0 2.0 NaN
15 1.0 3.0 NaN
轉載請註明出處,本文鏈接:https://www.uj5u.com/qiye/518231.html
上一篇:結合2系列并在R中洗掉NA
下一篇:洗掉資料框中字串列陣列中的子字串
