Extracting a table from a PDF produces the following DataFrame:
Date Transaction Details Withdrawals Deposits Balance
0 01-01-2020 Tx1-Description - Line1 1625.0 NaN 97994.82
1 NaN Line 2 NaN NaN NaN
2 01-01-2020 Tx2-Description - Line1 NaN 84994.82 90000.00
3 NaN Line 2 NaN NaN NaN
4 NaN Line 3 NaN NaN NaN
5 02-01-2020 Tx3-Description - Line1 71.0 NaN 84923.82
6 NaN Line 2 NaN NaN NaN
7 02-01-2020 Tx4-Description - Line1 NaN 80.00 90000.00
8 NaN Line 2 NaN NaN NaN
9 NaN Line 3 NaN NaN NaN
10 03-01-2020 Tx5-Description - Line1 100.0 NaN 85000.00
How can I merge the continuation rows of the Transaction Details column correctly?
Expected output:
Date Transaction Details Withdrawals Deposits Balance
0 01-01-2020 Tx1-Description - Line1 Line 2 1625.0 NaN 97994.82
1 01-01-2020 Tx2-Description - Line1 Line 2 Line 3 NaN 84994.82 90000.00
2 02-01-2020 Tx3-Description - Line1 Line 2 71.0 NaN 84923.82
3 02-01-2020 Tx4-Description - Line1 Line 2 Line 3 NaN 80.00 90000.00
4 03-01-2020 Tx5-Description - Line1 100.0 NaN 85000.00
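Answer from a uj5u.com user:
IIUC, you can use groupby to form the groups, starting a new group at each row that has a non-null "Date", and then aggregate: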
(df.groupby(df['Date'].notna().cumsum(), as_index=False)
.agg({'Date': 'first', 'Transaction Details': ' '.join,
'Withdrawals': 'sum', 'Deposits': 'sum', 'Balance': 'sum'})
)
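To see why the grouping key works, here is a minimal sketch that builds the key for the first six rows only. The `dates` Series is an illustrative stand-in for `df['Date']`, not part of the original answer.

```python
import pandas as pd

# Illustrative stand-in for df['Date']: a date marks the first line of a
# transaction, a missing value marks a continuation line.
dates = pd.Series(
    ["01-01-2020", None, "01-01-2020", None, None, "02-01-2020"],
    name="Date",
)

# notna() flags the rows that start a transaction; cumsum() turns those
# flags into a running group number that the continuation rows inherit.
key = dates.notna().cumsum()
print(key.tolist())  # → [1, 1, 2, 2, 2, 3]
```

Because the key counts transactions rather than comparing date values, two transactions on the same day (Tx1 and Tx2 here) still land in separate groups.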
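Note: summing turns the all-NaN amounts into 0. You can chain replace(0, float('nan')) if you need NaN back, but be aware that this also converts any genuine zero amounts.
Output of the aggregation: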
Date Transaction Details Withdrawals Deposits Balance
0 01-01-2020 Tx1-Description - Line1 Line 2 1625.0 0.00 97994.82
1 01-01-2020 Tx2-Description - Line1 Line 2 Line 3 0.0 84994.82 90000.00
2 02-01-2020 Tx3-Description - Line1 Line 2 71.0 0.00 84923.82
3 02-01-2020 Tx4-Description - Line1 Line 2 Line 3 0.0 80.00 90000.00
4 03-01-2020 Tx5-Description - Line1 100.0 0.00 85000.00
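If the NaN values should survive the aggregation, one possible variation (not part of the original answer) is to sum with `min_count=1`, which returns NaN when a group has no valid values. The sketch below is self-contained and rebuilds only the first two transactions by hand; the helper name `nan_sum` and the key name `group` are hypothetical.

```python
import numpy as np
import pandas as pd

# First two transactions from the question, rebuilt by hand for illustration.
df = pd.DataFrame({
    "Date": ["01-01-2020", np.nan, "01-01-2020", np.nan, np.nan],
    "Transaction Details": [
        "Tx1-Description - Line1", "Line 2",
        "Tx2-Description - Line1", "Line 2", "Line 3",
    ],
    "Withdrawals": [1625.0, np.nan, np.nan, np.nan, np.nan],
    "Deposits": [np.nan, np.nan, 84994.82, np.nan, np.nan],
    "Balance": [97994.82, np.nan, 90000.00, np.nan, np.nan],
})


def nan_sum(s):
    # Hypothetical helper: sum that stays NaN when every value is missing.
    return s.sum(min_count=1)


# Same grouping idea as the answer: a new group starts at each non-null Date.
key = df["Date"].notna().cumsum().rename("group")

out = (
    df.groupby(key)
      .agg({
          "Date": "first",                  # first non-null date in the group
          "Transaction Details": " ".join,  # glue the description lines
          "Withdrawals": nan_sum,
          "Deposits": nan_sum,
          "Balance": nan_sum,
      })
      .reset_index(drop=True)
)
print(out)
```

This assumes every Transaction Details cell is a string; a missing description would make `" ".join` raise a TypeError and would need to be filled or dropped first.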
Please credit the source when reposting. Link to this article: https://www.uj5u.com/qiye/429580.html