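I'm trying to reshape a DataFrame at the monthly level, without much success. I have a DataFrame containing data that spans given periods: monthly, quarterly, or yearly. Essentially, I want to reshape it as follows: use the monthly values for as long as monthly data is available, then the quarterly values once the monthly data runs out, and then the yearly values once the quarterly data runs out. Do you know how I could do this?
Thanks a lot for your help!
Input: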
var_name begin_delivery_date end_delivery_date value
Monthly 2022 2022-01-01T06:00:00 2022-02-01T05:59:59 5
Monthly 2022 2022-02-01T06:00:00 2022-03-01T05:59:59 7
... ... ... ...
Quarterly 2022 2022-01-01T06:00:00 2022-04-01T06:00:00 10
... ... ... ...
Yearly 2022 2022-01-01T06:00:00 2023-01-01T06:00:00 49
Expected output:
date var_name value
2022-01-01 Monthly 2022 5
2022-02-01 Monthly 2022 7
2022-03-01 Quarterly 2022 10
2022-04-01 Yearly 2022 49
2022-05-01 Yearly 2022 49
2022-06-01 Yearly 2022 49
2022-07-01 Yearly 2022 49
2022-08-01 Yearly 2022 49
2022-09-01 Yearly 2022 49
2022-10-01 Yearly 2022 49
2022-11-01 Yearly 2022 49
2022-12-01 Yearly 2022 49
Input data to play with:
[
  {
    "begin_delivery_date": "2022-01-01T06:00:00",
    "var name": "Monthly 2022",
    "end_delivery_date": "2022-02-01T05:59:59",
    "value": 5
  },
  {
    "begin_delivery_date": "2022-02-01T06:00:00",
    "var name": "Monthly 2022",
    "end_delivery_date": "2022-03-01T05:59:59",
    "value": 7
  },
  {
    "begin_delivery_date": "2022-03-01T06:00:00",
    "var name": "Monthly 2022",
    "end_delivery_date": "2022-04-01T05:59:59",
    "value": 8
  },
  {
    "begin_delivery_date": "2022-04-01T06:00:00",
    "var name": "Monthly 2022",
    "end_delivery_date": "2022-05-01T05:59:59",
    "value": 9
  },
  {
    "begin_delivery_date": "2022-04-01T06:00:00",
    "var name": "Quarterly 2022",
    "end_delivery_date": "2022-07-01T05:59:59",
    "value": 10
  },
  {
    "begin_delivery_date": "2022-07-01T06:00:00",
    "var name": "Quarterly 2022",
    "end_delivery_date": "2022-10-01T05:59:59",
    "value": 11
  },
  {
    "begin_delivery_date": "2022-09-01T06:00:00",
    "var name": "Quarterly 2022",
    "end_delivery_date": "2023-01-01T05:59:59",
    "value": 12
  },
  {
    "begin_delivery_date": "2023-01-01T06:00:00",
    "var name": "Yearly 2023",
    "end_delivery_date": "2024-01-01T05:59:59",
    "value": 50
  },
  {
    "begin_delivery_date": "2024-01-01T06:00:00",
    "var name": "Yearly 2024",
    "end_delivery_date": "2025-01-01T05:59:59",
    "value": 60
  }
]
uj5u.com user reply:
IIUC (if I understand correctly),
import pandas as pd
import numpy as np
data = [ {
"begin_delivery_date": "2022-01-01T06:00:00",
"var name": "Monthly 2022",
"end_delivery_date": "2022-02-01T05:59:59",
"value": 5
},
{
"begin_delivery_date": "2022-02-01T06:00:00",
"var name": "Monthly 2022",
"end_delivery_date": "2022-03-01T05:59:59",
"value": 7
},
{
"begin_delivery_date": "2022-03-01T06:00:00",
"var name": "Monthly 2022",
"end_delivery_date": "2022-04-01T05:59:59",
"value": 8
},
{
"begin_delivery_date": "2022-04-01T06:00:00",
"var name": "Monthly 2022",
"end_delivery_date": "2022-05-01T05:59:59",
"value": 9
},
{
"begin_delivery_date": "2022-04-01T06:00:00",
"var name": "Quarterly 2022",
"end_delivery_date": "2022-07-01T05:59:59",
"value": 10
},
{
"begin_delivery_date": "2022-07-01T06:00:00",
"var name": "Quarterly 2022",
"end_delivery_date": "2022-10-01T05:59:59",
"value": 11
},
{
"begin_delivery_date": "2022-09-01T06:00:00",
"var name": "Quarterly 2022",
"end_delivery_date": "2023-01-01T05:59:59",
"value": 12
},
{
"begin_delivery_date": "2023-01-01T06:00:00",
"var name": "Yearly 2023",
"end_delivery_date": "2024-01-01T05:59:59",
"value": 50
},
{
"begin_delivery_date": "2024-01-01T06:00:00",
"var name": "Yearly 2024",
"end_delivery_date": "2025-01-01T05:59:59",
"value": 60
}
]
df = pd.DataFrame(data)
Create a list of dates from each row's date range, then explode the DataFrame so that every date gets its own row.
df['dates'] = [pd.date_range(s, e, freq='M') for s, e in zip(df['begin_delivery_date'], df['end_delivery_date'])]
df_out = df.explode('dates')
print(df_out)
Output:
begin_delivery_date var name end_delivery_date value dates
0 2022-01-01T06:00:00 Monthly 2022 2022-02-01T05:59:59 5 2022-01-31 06:00:00
1 2022-02-01T06:00:00 Monthly 2022 2022-03-01T05:59:59 7 2022-02-28 06:00:00
2 2022-03-01T06:00:00 Monthly 2022 2022-04-01T05:59:59 8 2022-03-31 06:00:00
3 2022-04-01T06:00:00 Monthly 2022 2022-05-01T05:59:59 9 2022-04-30 06:00:00
4 2022-04-01T06:00:00 Quarterly 2022 2022-07-01T05:59:59 10 2022-04-30 06:00:00
4 2022-04-01T06:00:00 Quarterly 2022 2022-07-01T05:59:59 10 2022-05-31 06:00:00
4 2022-04-01T06:00:00 Quarterly 2022 2022-07-01T05:59:59 10 2022-06-30 06:00:00
5 2022-07-01T06:00:00 Quarterly 2022 2022-10-01T05:59:59 11 2022-07-31 06:00:00
5 2022-07-01T06:00:00 Quarterly 2022 2022-10-01T05:59:59 11 2022-08-31 06:00:00
5 2022-07-01T06:00:00 Quarterly 2022 2022-10-01T05:59:59 11 2022-09-30 06:00:00
6 2022-09-01T06:00:00 Quarterly 2022 2023-01-01T05:59:59 12 2022-09-30 06:00:00
6 2022-09-01T06:00:00 Quarterly 2022 2023-01-01T05:59:59 12 2022-10-31 06:00:00
6 2022-09-01T06:00:00 Quarterly 2022 2023-01-01T05:59:59 12 2022-11-30 06:00:00
6 2022-09-01T06:00:00 Quarterly 2022 2023-01-01T05:59:59 12 2022-12-31 06:00:00
7 2023-01-01T06:00:00 Yearly 2023 2024-01-01T05:59:59 50 2023-01-31 06:00:00
7 2023-01-01T06:00:00 Yearly 2023 2024-01-01T05:59:59 50 2023-02-28 06:00:00
7 2023-01-01T06:00:00 Yearly 2023 2024-01-01T05:59:59 50 2023-03-31 06:00:00
7 2023-01-01T06:00:00 Yearly 2023 2024-01-01T05:59:59 50 2023-04-30 06:00:00
7 2023-01-01T06:00:00 Yearly 2023 2024-01-01T05:59:59 50 2023-05-31 06:00:00
7 2023-01-01T06:00:00 Yearly 2023 2024-01-01T05:59:59 50 2023-06-30 06:00:00
7 2023-01-01T06:00:00 Yearly 2023 2024-01-01T05:59:59 50 2023-07-31 06:00:00
7 2023-01-01T06:00:00 Yearly 2023 2024-01-01T05:59:59 50 2023-08-31 06:00:00
7 2023-01-01T06:00:00 Yearly 2023 2024-01-01T05:59:59 50 2023-09-30 06:00:00
7 2023-01-01T06:00:00 Yearly 2023 2024-01-01T05:59:59 50 2023-10-31 06:00:00
7 2023-01-01T06:00:00 Yearly 2023 2024-01-01T05:59:59 50 2023-11-30 06:00:00
7 2023-01-01T06:00:00 Yearly 2023 2024-01-01T05:59:59 50 2023-12-31 06:00:00
8 2024-01-01T06:00:00 Yearly 2024 2025-01-01T05:59:59 60 2024-01-31 06:00:00
8 2024-01-01T06:00:00 Yearly 2024 2025-01-01T05:59:59 60 2024-02-29 06:00:00
8 2024-01-01T06:00:00 Yearly 2024 2025-01-01T05:59:59 60 2024-03-31 06:00:00
8 2024-01-01T06:00:00 Yearly 2024 2025-01-01T05:59:59 60 2024-04-30 06:00:00
8 2024-01-01T06:00:00 Yearly 2024 2025-01-01T05:59:59 60 2024-05-31 06:00:00
8 2024-01-01T06:00:00 Yearly 2024 2025-01-01T05:59:59 60 2024-06-30 06:00:00
8 2024-01-01T06:00:00 Yearly 2024 2025-01-01T05:59:59 60 2024-07-31 06:00:00
8 2024-01-01T06:00:00 Yearly 2024 2025-01-01T05:59:59 60 2024-08-31 06:00:00
8 2024-01-01T06:00:00 Yearly 2024 2025-01-01T05:59:59 60 2024-09-30 06:00:00
8 2024-01-01T06:00:00 Yearly 2024 2025-01-01T05:59:59 60 2024-10-31 06:00:00
8 2024-01-01T06:00:00 Yearly 2024 2025-01-01T05:59:59 60 2024-11-30 06:00:00
8 2024-01-01T06:00:00 Yearly 2024 2025-01-01T05:59:59 60 2024-12-31 06:00:00
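The exploded frame above still holds several rows for some months (April 2022 appears as both Monthly and Quarterly), and its dates are month ends rather than the month starts shown in the expected output. A minimal sketch of one way to finish the job follows; it is not part of the original answer. It assumes that the 06:00 start of each delivery period can be shifted back by six hours to line up with calendar months, and the names `offset`, `priority`, `rank`, and `result` are made up for illustration. Only four rows of the question's data are used to keep it short.

```python
import pandas as pd

# Four rows taken from the question's data, for brevity.
data = [
    {"begin_delivery_date": "2022-03-01T06:00:00", "var name": "Monthly 2022",
     "end_delivery_date": "2022-04-01T05:59:59", "value": 8},
    {"begin_delivery_date": "2022-04-01T06:00:00", "var name": "Monthly 2022",
     "end_delivery_date": "2022-05-01T05:59:59", "value": 9},
    {"begin_delivery_date": "2022-04-01T06:00:00", "var name": "Quarterly 2022",
     "end_delivery_date": "2022-07-01T05:59:59", "value": 10},
    {"begin_delivery_date": "2022-07-01T06:00:00", "var name": "Quarterly 2022",
     "end_delivery_date": "2022-10-01T05:59:59", "value": 11},
]
df = pd.DataFrame(data)

# Assumption: periods run 06:00 to 05:59:59, so shifting by 6 hours
# makes each period start and end inside the calendar months it covers.
offset = pd.Timedelta(hours=6)
begin = pd.to_datetime(df["begin_delivery_date"]) - offset
end = pd.to_datetime(df["end_delivery_date"]) - offset

# One month-start timestamp ("MS") per month covered by each row.
df["date"] = [list(pd.date_range(s, e, freq="MS")) for s, e in zip(begin, end)]
out = df.explode("date")
out["date"] = pd.to_datetime(out["date"])

# Lower rank means finer granularity, which takes priority.
priority = {"Monthly": 0, "Quarterly": 1, "Yearly": 2}
out["rank"] = out["var name"].str.split().str[0].map(priority)

# Keep only the highest-priority row for every month.
result = (
    out.sort_values(["date", "rank"])
       .drop_duplicates("date", keep="first")
       .loc[:, ["date", "var name", "value"]]
       .reset_index(drop=True)
)
print(result)
```

With these four rows, April 2022 keeps the monthly value and May through September fall back to the quarterly values. Note also that recent pandas releases (2.2 and later) deprecate the `'M'` alias used in the answer in favor of `'ME'`; the `'MS'` alias used here is unaffected.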
uj5u.com user reply:
Create a DataFrame and shuffle it (data is the data you posted above)
df = pd.DataFrame(data)
df = df.sample(frac=1).reset_index(drop=True)
Split each value in var name into two separate columns, var_name_pediod and var_name_year
df["var_name_pediod"] = df["var name"].str.split(" ").str[0]
df["var_name_year"] = df["var name"].str.split(" ").str[1]
Create a dictionary that defines the sort order of the periods, and replace the values in the "var_name_pediod" column using it
sort_dic = {"Monthly":1,"Quarterly":2,"Yearly":3}
df["var_name_pediod"] = df["var_name_pediod"].replace(sort_dic)
Sort the values by the "var_name_pediod" column
df.sort_values(by=['var_name_pediod'], inplace=True)
Within each var_name_pediod group, sort by "var_name_year". A two-key sort does this directly; assign the result back, otherwise it is discarded
df = df.sort_values(by=['var_name_pediod', 'var_name_year']).reset_index(drop=True)
Done. If you don't need the extra columns, drop them
df.drop(columns=["var_name_pediod","var_name_year"],inplace=True)
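The sorting steps above can also be condensed into a single chain. This is only a sketch of the same idea, not the answerer's code: the helper column names `period_rank` and `year` and the variable `sorted_df` are invented here, and the four sample rows reuse labels and values from the question's data.

```python
import pandas as pd

# A few rows in shuffled order, using labels and values from the question.
df = pd.DataFrame({
    "var name": ["Yearly 2023", "Quarterly 2022", "Monthly 2022", "Yearly 2024"],
    "value": [50, 10, 5, 60],
})

# Sort order of the period types: finer granularity first.
order = {"Monthly": 1, "Quarterly": 2, "Yearly": 3}

# Split "Monthly 2022" into its period type (column 0) and year (column 1).
parts = df["var name"].str.split(" ", expand=True)

sorted_df = (
    df.assign(period_rank=parts[0].map(order), year=parts[1].astype(int))
      .sort_values(["period_rank", "year"])      # period type first, then year
      .drop(columns=["period_rank", "year"])     # remove the helper columns
      .reset_index(drop=True)
)
print(sorted_df)
```

Using `assign` keeps the helper columns out of the original frame, so no separate clean-up step is needed afterwards.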
