I need help filling the gaps in df_1 where month-start dates are missing (for example 01, 02, 05, and 07 through 11). I need a continuous run of months, i.e. all 12.
In:
import pandas as pd

# Note: the first Cost is the string '1', so the Cost column has object dtype.
df_1 = pd.DataFrame([['2021-03-01', 'Supp_1', 'Product_1', '1'],
                     ['2021-04-01', 'Supp_1', 'Product_1', 1],
                     ['2021-06-01', 'Supp_1', 'Product_1', 1],
                     ['2021-12-01', 'Supp_1', 'Product_1', 1.25]],
                    columns=['Date', 'Supplier', 'Product', 'Cost'])
Out:
         Date Supplier    Product  Cost
0  2021-03-01   Supp_1  Product_1     1
1  2021-04-01   Supp_1  Product_1     1
2  2021-06-01   Supp_1  Product_1     1
3  2021-12-01   Supp_1  Product_1  1.25
The expected result is:
          Date Supplier    Product  Cost
0   2021-01-01   Supp_1  Product_1
1   2021-02-01   Supp_1  Product_1
2   2021-03-01   Supp_1  Product_1     1
3   2021-04-01   Supp_1  Product_1     1
4   2021-05-01   Supp_1  Product_1
5   2021-06-01   Supp_1  Product_1     1
6   2021-07-01   Supp_1  Product_1
7   2021-08-01   Supp_1  Product_1
8   2021-09-01   Supp_1  Product_1
9   2021-10-01   Supp_1  Product_1
10  2021-11-01   Supp_1  Product_1
11  2021-12-01   Supp_1  Product_1  1.25
Once we have df_2, I can use ffill() and bfill() to fill the gaps in Cost.
Reply from a uj5u.com user:
You can use resample:
print(df_1.assign(Date=pd.to_datetime(df_1["Date"]))
          .set_index("Date")
          .resample("MS").asfreq()
          .reset_index())
        Date Supplier    Product  Cost
0 2021-03-01   Supp_1  Product_1     1
1 2021-04-01   Supp_1  Product_1     1
2 2021-05-01      NaN        NaN   NaN
3 2021-06-01   Supp_1  Product_1     1
4 2021-07-01      NaN        NaN   NaN
5 2021-08-01      NaN        NaN   NaN
6 2021-09-01      NaN        NaN   NaN
7 2021-10-01      NaN        NaN   NaN
8 2021-11-01      NaN        NaN   NaN
9 2021-12-01   Supp_1  Product_1  1.25
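Note that resample and asfreq only span the range between the earliest and latest dates already in the index, so with df_1 starting in March they do not create rows for January or February. One way to cover the whole year is to reindex against an explicit month-start range. This is only a minimal sketch: the names full_year and df_2 are illustrative, the target year is assumed to be the year of the first row, and the first Cost is entered as a numeric 1 rather than the string '1' from the question.

```python
import pandas as pd

df_1 = pd.DataFrame(
    [['2021-03-01', 'Supp_1', 'Product_1', 1],
     ['2021-04-01', 'Supp_1', 'Product_1', 1],
     ['2021-06-01', 'Supp_1', 'Product_1', 1],
     ['2021-12-01', 'Supp_1', 'Product_1', 1.25]],
    columns=['Date', 'Supplier', 'Product', 'Cost'])

dates = pd.to_datetime(df_1['Date'])

# Assumption: the wanted range is the calendar year of the first row.
year = dates.dt.year.iloc[0]
full_year = pd.date_range(f'{year}-01-01', f'{year}-12-01', freq='MS')

df_2 = (df_1.assign(Date=dates)
            .set_index('Date')
            .reindex(full_year)    # adds rows for every missing month start
            .rename_axis('Date')   # keep the index name for reset_index
            .reset_index())
print(df_2)
```

Rows for months that were absent come back with NaN in Supplier, Product and Cost, ready to be filled.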
Reply from a uj5u.com user:
You can use this pipeline. The key steps are calling set_index on the date and then using asfreq:
(df_1.assign(Date=pd.to_datetime(df_1['Date']))
     .set_index('Date')
     .asfreq('MS')
     .assign(Supplier=lambda d: d['Supplier'].ffill(),
             Product=lambda d: d['Product'].ffill()
             )
     .reset_index()
)
Output:
        Date Supplier    Product  Cost
0 2021-03-01   Supp_1  Product_1     1
1 2021-04-01   Supp_1  Product_1     1
2 2021-05-01   Supp_1  Product_1   NaN
3 2021-06-01   Supp_1  Product_1     1
4 2021-07-01   Supp_1  Product_1   NaN
5 2021-08-01   Supp_1  Product_1   NaN
6 2021-09-01   Supp_1  Product_1   NaN
7 2021-10-01   Supp_1  Product_1   NaN
8 2021-11-01   Supp_1  Product_1   NaN
9 2021-12-01   Supp_1  Product_1  1.25
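The last step mentioned in the question, filling Cost with ffill() and bfill(), could look roughly like this once a frame covering all twelve months exists. The frame df_2 below is built by hand purely for illustration, with NaN standing in for the empty Cost cells.

```python
import pandas as pd

nan = float('nan')

# Hand-built stand-in for the completed frame (assumed shape, numeric Cost).
df_2 = pd.DataFrame({
    'Date': pd.date_range('2021-01-01', '2021-12-01', freq='MS'),
    'Supplier': 'Supp_1',
    'Product': 'Product_1',
    'Cost': [nan, nan, 1, 1, nan, 1, nan, nan, nan, nan, nan, 1.25],
})

# ffill carries the last known cost forward; bfill then covers the
# leading months (January, February) that have no earlier value.
df_2['Cost'] = df_2['Cost'].ffill().bfill()
print(df_2)
```

With this ordering, January through November end up at 1 and December stays at 1.25. Swapping the calls (bfill first) would instead pull 1.25 back into July through November.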
Reply from a uj5u.com user:
Another option:
df_1['Date'] = pd.to_datetime(df_1['Date'])
df_1 = df_1.set_index('Date').asfreq('MS').reset_index()
df_1
        Date Supplier    Product  Cost
0 2021-03-01   Supp_1  Product_1     1
1 2021-04-01   Supp_1  Product_1     1
2 2021-05-01      NaN        NaN   NaN
3 2021-06-01   Supp_1  Product_1     1
4 2021-07-01      NaN        NaN   NaN
5 2021-08-01      NaN        NaN   NaN
6 2021-09-01      NaN        NaN   NaN
7 2021-10-01      NaN        NaN   NaN
8 2021-11-01      NaN        NaN   NaN
9 2021-12-01   Supp_1  Product_1  1.25
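As a quick sanity check, it can help to list which month starts are actually absent from the data. A small sketch, using the four dates from the question and an assumed full-year range for 2021 (the names full_year and missing are illustrative):

```python
import pandas as pd

# The four dates present in the question's df_1.
dates = pd.to_datetime(['2021-03-01', '2021-04-01', '2021-06-01', '2021-12-01'])

# Every month start of the (assumed) target year.
full_year = pd.date_range('2021-01-01', '2021-12-01', freq='MS')

# Month starts in the full range that do not appear in the data.
missing = full_year.difference(dates)
print(missing.strftime('%m').tolist())
# ['01', '02', '05', '07', '08', '09', '10', '11']
```

This matches the gaps described in the question: 01, 02, 05, and 07 through 11.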
Reply from a uj5u.com user:
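If I understand correctly, you want to expose the missing rows based on the combination of every month of the year with Supplier and Product, and then forward/backward fill the Cost column.
Perhaps the complete function from pyjanitor can help: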
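# pip install git+https://github.com/pyjanitor-devs/pyjanitor.git
import pandas as pd
import janitor  # importing registers the complete method on DataFrame
# df is assumed to be df_1 with Date converted to datetime (needed for .dt)
df = df_1.assign(Date=pd.to_datetime(df_1['Date']))
year = df.Date.dt.year.at[0]
months = pd.date_range(f"{year}-01-01", f"{year}-12-01", freq="MS")
months = dict(Date=months)
df.complete(months, 'Supplier', 'Product', sort=True)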
         Date Supplier    Product  Cost
0  2021-01-01   Supp_1  Product_1   NaN
1  2021-02-01   Supp_1  Product_1   NaN
2  2021-03-01   Supp_1  Product_1     1
3  2021-04-01   Supp_1  Product_1     1
4  2021-05-01   Supp_1  Product_1   NaN
5  2021-06-01   Supp_1  Product_1     1
6  2021-07-01   Supp_1  Product_1   NaN
7  2021-08-01   Supp_1  Product_1   NaN
8  2021-09-01   Supp_1  Product_1   NaN
9  2021-10-01   Supp_1  Product_1   NaN
10 2021-11-01   Supp_1  Product_1   NaN
11 2021-12-01   Supp_1  Product_1  1.25
You can then fill the Cost column upward or downward.
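If installing pyjanitor is not an option, a similar result can probably be had with plain pandas by cross-joining the Supplier/Product pairs with the full month range and merging the known costs back in. The sketch below is not the pyjanitor implementation, only an approximation under stated assumptions: the Supp_2 row is hypothetical and added just to show that groups stay separate, the names months, pairs, grid and full are illustrative, and only pairs that already occur in the data are expanded.

```python
import pandas as pd

df = pd.DataFrame(
    [['2021-03-01', 'Supp_1', 'Product_1', 1.0],
     ['2021-04-01', 'Supp_1', 'Product_1', 1.0],
     ['2021-06-01', 'Supp_1', 'Product_1', 1.0],
     ['2021-12-01', 'Supp_1', 'Product_1', 1.25],
     # Hypothetical second group, not part of the original question.
     ['2021-02-01', 'Supp_2', 'Product_2', 3.0]],
    columns=['Date', 'Supplier', 'Product', 'Cost'])
df['Date'] = pd.to_datetime(df['Date'])

# All month starts of the (assumed) target year, as a one-column frame.
months = pd.date_range('2021-01-01', '2021-12-01', freq='MS', name='Date')
pairs = df[['Supplier', 'Product']].drop_duplicates()

# Cross join: one row per existing (Supplier, Product) pair per month.
grid = pairs.merge(months.to_frame(index=False), how='cross')

# Attach the known costs; months without data get NaN.
full = grid.merge(df, on=['Date', 'Supplier', 'Product'], how='left')

# Fill inside each group so costs never leak across suppliers or products.
full['Cost'] = (full.groupby(['Supplier', 'Product'])['Cost']
                    .transform(lambda s: s.ffill().bfill()))
print(full[['Date', 'Supplier', 'Product', 'Cost']])
```

Filling per group matters as soon as there is more than one supplier or product. A plain ffill() over the whole column would carry the December cost of one group into the January of the next.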
