I have a dataframe, say:
Example:
import pandas as pd
df = pd.DataFrame({'Item': ['California', '2012%', '2013%', 'Arizona', '2012%', '%2019', 'Janu%ary'],
                   'col1': [0, 50, 50, 0, 10, 11, 14],
                   'col2': [0, 50, 40, 0, 15, 13, 15]})
Output=
         Item  col1  col2
1  California     0     0
2       2012%    50    50
3       2013%    50    40
4     Arizona     0     0
5       2012%    10    15
6       %2019    11    13
7    Janu%ary    14    15
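I want rows like "California" and "Arizona" (rows whose Item value contains no "%") to be treated as headers that must be prepended to their respective subheaders. Perhaps I could iterate over the rows and look for a pattern: a row without "%" is a header, a row with "%" is a subheader, and each subheader row gets the last header found added to it.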
Expected output=
               Item  col1  col2
1  California 2012%    50    50
2  California 2013%    50    40
3     Arizona 2012%    10    15
4     Arizona 2019%    11    13
5  Arizona January%    14    15
Answer from a uj5u.com user:
IIUC, you can build a boolean mask and use it for masking and indexing:
# does the name contain '%'? (you could use other conditions)
m = df['Item'].str.contains('%')
# mask the subheaders and ffill the "header", then concatenate
df['Item'] = df['Item'].mask(m).ffill() + ' ' + df['Item']
# drop the former header rows
df = df.loc[m]
Output:
               Item  col1  col2
1  California 2012%    50    50
2  California 2013%    50    40
4     Arizona 2012%    10    15
5     Arizona %2019    11    13
6  Arizona Janu%ary    14    15
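To see why this works, it can help to look at the intermediate Series on its own. The following is a minimal, self-contained sketch; it assumes the sixth Item value is '%2019' (as in the question's printed output), and the names `header` and `result` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Item': ['California', '2012%', '2013%', 'Arizona', '2012%', '%2019', 'Janu%ary'],
    'col1': [0, 50, 50, 0, 10, 11, 14],
    'col2': [0, 50, 40, 0, 15, 13, 15],
})

# True for subheader rows (those containing a literal '%')
m = df['Item'].str.contains('%', regex=False)

# mask(m) blanks out the subheader values so only the headers remain;
# ffill() then propagates each header down to the rows below it
header = df['Item'].mask(m).ffill()

# prepend the header to each subheader and keep only the subheader rows
result = df.loc[m].assign(Item=header[m] + ' ' + df.loc[m, 'Item'])
print(result)
```

Unlike the snippet above, this sketch leaves `df` untouched and builds a new frame, which makes it easier to rerun while experimenting.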
Alternative, using a real index:
m = df['Item'].str.contains('%')
df['index'] = df['Item'].mask(m).ffill()
df = df.loc[m].set_index('index')
Output:
                Item  col1  col2
index
California     2012%    50    50
California     2013%    50    40
Arizona        2012%    10    15
Arizona        %2019    11    13
Arizona     Janu%ary    14    15
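For comparison, the row-by-row idea described in the question can also be written as an explicit loop. This is only a sketch under the same assumption about the data ('%2019' as the sixth Item); `current_header`, `rows` and `looped` are hypothetical names, and the vectorized mask/ffill version is likely the better choice for large frames.

```python
import pandas as pd

df = pd.DataFrame({
    'Item': ['California', '2012%', '2013%', 'Arizona', '2012%', '%2019', 'Janu%ary'],
    'col1': [0, 50, 50, 0, 10, 11, 14],
    'col2': [0, 50, 40, 0, 15, 13, 15],
})

rows = []
current_header = None  # last header seen so far

# name=None makes itertuples yield plain tuples in column order
for item, col1, col2 in df.itertuples(index=False, name=None):
    if '%' not in item:
        # no '%': this row is a header, remember it and emit nothing
        current_header = item
    else:
        # '%' present: this row is a subheader, prefix it with the last header
        rows.append({'Item': f'{current_header} {item}', 'col1': col1, 'col2': col2})

looped = pd.DataFrame(rows)
print(looped)
```

Note that the resulting frame gets a fresh 0-based index, and a subheader that appears before any header would be prefixed with the string 'None' in this sketch.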
