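I have a dataset with datetime values like this:
datetime
0 2012-04-01 07:00:00
. .
. .
I want to create separate columns for the weekday, hour, and month:
datetime weekday_1 ... weekday_7 hour_1 ... hour_7 ... hour_24 month_1 ... month_4 ... month_12
0 2012-04-01 07:00:00 0 1 0 1 0 0 1 0
(with Monday as weekday_1; the example date is a Sunday: weekday_7)
The only way I know to extract something from a datetime is:
df['month'] = df['datetime'].dt.month
But I can't figure out how to apply it to my problem.
Sorry if this sounds like a duplicate; I'm new to this, and the answers to similar questions weren't helpful enough. Thanks in advance.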
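A uj5u.com user replied:
Create a custom function:
import pandas as pd

# Column names: weekday_1..weekday_7, hour_1..hour_24, month_1..month_12.
# Use {i:02} instead of {i} to get a number on two digits
cols = [f'weekday_{i}' for i in range(1, 8)] + \
       [f'hour_{i}' for i in range(1, 25)] + \
       [f'month_{i}' for i in range(1, 13)]
def get_dummy(dt):
    l = [0] * (7 + 24 + 12)
    l[dt.weekday()] = 1  # slots 0-6, Monday first
    l[7 + (dt.hour - 1) % 24] = 1  # slots 7-30; midnight (hour 0) is assumed to map to hour_24
    l[dt.month + 30] = 1  # slots 31-42
    return pd.Series(dict(zip(cols, l)))
df = df.join(df['datetime'].apply(get_dummy))
Output:
>>> df.iloc[0]
datetime 2012-04-01 07:00:00
weekday_1 0
weekday_2 0
weekday_3 0
weekday_4 0
weekday_5 0
weekday_6 0
weekday_7 1 # <- Sunday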
hour_1 0
hour_2 0
hour_3 0
hour_4 0
hour_5 0
hour_6 0
hour_7 1 # <- 07:00
hour_8 0
hour_9 0
hour_10 0
hour_11 0
hour_12 0
hour_13 0
hour_14 0
hour_15 0
hour_16 0
hour_17 0
hour_18 0
hour_19 0
hour_20 0
hour_21 0
hour_22 0
hour_23 0
hour_24 0
month_1 0
month_2 0
month_3 0
month_4 1 # <- April
month_5 0
month_6 0
month_7 0
month_8 0
month_9 0
month_10 0
month_11 0
month_12 0
Name: 0, dtype: object
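The same slot arithmetic can also be applied to the whole column at once instead of row by row. The following is only a sketch under stated assumptions: the two sample timestamps and the names `flags`, `rows`, and `out` are made up for illustration, and midnight is assumed to belong in hour_24.

```python
import numpy as np
import pandas as pd

# Same column layout as above: 7 weekday slots, 24 hour slots, 12 month slots.
cols = ([f"weekday_{i}" for i in range(1, 8)]
        + [f"hour_{i}" for i in range(1, 25)]
        + [f"month_{i}" for i in range(1, 13)])

# Hypothetical sample data; the second row exercises the midnight case.
df = pd.DataFrame({"datetime": pd.to_datetime(["2012-04-01 07:00:00",
                                               "2012-12-01 00:00:00"])})
s = df["datetime"]

rows = np.arange(len(df))
flags = np.zeros((len(df), len(cols)), dtype=int)

# One flag per row in each group, set through integer (fancy) indexing.
flags[rows, s.dt.dayofweek.to_numpy()] = 1            # slots 0-6, Monday first
flags[rows, 7 + (s.dt.hour.to_numpy() - 1) % 24] = 1  # slots 7-30, hour 0 -> hour_24 (assumption)
flags[rows, 31 + s.dt.month.to_numpy() - 1] = 1       # slots 31-42

out = df.join(pd.DataFrame(flags, columns=cols, index=df.index))
print(out.loc[0, ["weekday_7", "hour_7", "month_4"]])
```

Because the flags are written into one preallocated array, this avoids building a pd.Series per row, which tends to matter on larger frames.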
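A uj5u.com user replied:
You can create columns for the weekday, hour, and month, and then apply get_dummies to them. Here is a link that covers the individual format codes: https://www.w3schools.com/python/python_datetime.asp
Here is my sample code for your question:
# Assume df is your DataFrame and df["datetime"] has a datetime64 dtype
df["weekday"] = df["datetime"].dt.strftime("%w")  # "0" (Sunday) through "6" (Saturday)
df["hour"] = df["datetime"].dt.strftime("%H")  # "00" through "23"
df["month"] = df["datetime"].dt.strftime("%m")  # "01" through "12"
df = pd.get_dummies(df[["weekday", "hour", "month"]])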
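One thing to keep in mind with pd.get_dummies is that it only emits columns for values that actually occur in the data, so a small sample will not produce all twelve month columns. A minimal sketch of one way to fill in the missing ones with reindex follows; the single-row frame and the names `dummies`, `all_months`, and `full` are illustrative assumptions, not part of the answer above.

```python
import pandas as pd

# Hypothetical one-row sample: only April is present.
df = pd.DataFrame({"datetime": pd.to_datetime(["2012-04-01 07:00:00"])})

month = df["datetime"].dt.month
dummies = pd.get_dummies(month, prefix="month", dtype=int)
print(list(dummies.columns))  # only the month that occurs in the data

# Reindex against the full list of expected names; absent months become 0.
all_months = [f"month_{i}" for i in range(1, 13)]
full = dummies.reindex(columns=all_months, fill_value=0)
print(full)
```

The same pattern should carry over to the weekday and hour columns by swapping in the corresponding list of expected names.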
A uj5u.com user replied:
You can use:
from datetime import datetime
import numpy as np
import pandas as pd

df = pd.DataFrame(data={'datetime':[datetime(2012,4,1,7,0,0),
                                    datetime(2012,12,1,8,0,0)]})
df['datetime'] = pd.to_datetime(df['datetime'])
df['month'] = df['datetime'].dt.month
df['weekday'] = df['datetime'].dt.dayofweek  # 0-based (Monday=0), so Sunday becomes weekday_6
df['hour'] = df['datetime'].dt.hour
for column in ['month', 'weekday', 'hour']:
    index = [col for col in df.columns if col!=column]
    df = df.pivot_table(index=index, columns=[column], aggfunc=np.count_nonzero).fillna(0).astype(bool).add_prefix(f'{column}_').reset_index()
#print(df)
#Here is the output as of now
# hour datetime month_4 month_12 weekday_5 weekday_6 hour_7 hour_8
# 0 2012-04-01 07:00:00 True False False True True False
# 1 2012-12-01 08:00:00 False True True False False True
# Build an empty frame with the full set of columns, then fill in the observed ones
other_cols = [f'weekday_{i}' for i in range(1, 8)] + [f'hour_{i}' for i in range(1, 25)] + [f'month_{i}' for i in range(1, 13)]
df_base = pd.DataFrame(columns= ['datetime'] + other_cols)
df_base = pd.concat([df_base, df]).fillna(0)
df_base[df_base.columns[1:]] = df_base[df_base.columns[1:]].fillna(0).astype(int)
print(df_base)
OUTPUT
datetime weekday_1 weekday_2 weekday_3 weekday_4 weekday_5 weekday_6 weekday_7 hour_1 hour_2 ... month_3 month_4 month_5 month_6 month_7 month_8 month_9 month_10 month_11 month_12
0 2012-04-01 07:00:00 0 0 0 0 0 1 0 0 0 ... 0 1 0 0 0 0 0 0 0 0
1 2012-12-01 08:00:00 0 0 0 0 1 0 0 0 0 ... 0 0 0 0 0 0 0 0 0 1
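In the output above, the Sunday row is flagged in weekday_6 rather than weekday_7, because dayofweek counts from Monday=0. A short sketch of the difference, and of shifting to the Monday=1 convention the question asks for, is given here; the variable names are illustrative only.

```python
import pandas as pd

ts = pd.Timestamp("2012-04-01 07:00:00")  # the Sunday from the question
zero_based = ts.dayofweek     # Monday=0 ... Sunday=6
one_based = ts.isoweekday()   # Monday=1 ... Sunday=7
print(ts.day_name(), zero_based, one_based)

# On a whole column, adding 1 to dayofweek gives the Monday=1 numbering.
s = pd.Series(pd.to_datetime(["2012-04-01 07:00:00", "2012-12-01 08:00:00"]))
weekday_1_to_7 = s.dt.dayofweek + 1
print(weekday_1_to_7.tolist())
```

Using the shifted values when building the weekday column would put the example date in weekday_7, as the question expects.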
Please credit the source when reposting. Link to this article: https://www.uj5u.com/houduan/385639.html
