I am using Pandas and have the following dataset (yes, all values are of type string):
import pandas as pd

data = {'stage': ['1', '1', '1', '1', '2', '2', '2', '4', '4', '4', '4', '4'],
        'hour': ['Berlim', '1', '2', '3', 'Munich', '1', '2', 'Leipzig', '1', '2', '3', '4'],
        'location': ['Berlim', '1', '2', '3', 'Munich', '1', '2', 'Leipzig', '1', '2', '3', '4']
        }
df = pd.DataFrame(data)
df
Output:
stage hour location
0 1 Berlim Berlim
1 1 1 1
2 1 2 2
3 1 3 3
4 2 Munich Munich
5 2 1 1
6 2 2 2
7 4 Leipzig Leipzig
8 4 1 1
9 4 2 2
10 4 3 3
11 4 4 4
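The goals are:
1) Repeat the value found in df['location'] on the following rows while df['location'] contains no letters.
2) Finally, apply a filter to remove the rows of df where df['hour'] == df['location'] (I'm not asking about this part, since I haven't tried it yet).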
So for 1), the desired output is:
stage hour location
0 1 Berlim Berlim
1 1 1 Berlim
2 1 2 Berlim
3 1 3 Berlim
4 2 Munich Munich
5 2 1 Munich
6 2 2 Munich
7 4 Leipzig Leipzig
8 4 1 Leipzig
9 4 2 Leipzig
10 4 3 Leipzig
11 4 4 Leipzig
For 2), the desired output is:
stage hour location
0 1 1 Berlim
1 1 2 Berlim
2 1 3 Berlim
3 2 1 Munich
4 2 2 Munich
5 4 1 Leipzig
6 4 2 Leipzig
7 4 3 Leipzig
8 4 4 Leipzig
So I started by trying to fill df['location'] first, which I couldn't manage. When I run the code below, I always get "Berlim" for every record.
import re

for index, row in df.iterrows():
    isHeader = bool(re.search('[A-Z]', row['location']))
    print('>>>> evaluation(location, isHeader) - ', row['location'], ' , ', isHeader)
    if isHeader == True:
        currentHeader = row['location']
        print("> new header to be used on the next rows: ", currentHeader)
        df['currentHeader'] = currentHeader
    else:
        print('> not a header, so ', currentHeader, 'will be used')
        df['location'] = row['location']
        print('> new pair: ', row['location'], currentHeader)
        df['currentHeader'] = currentHeader
df
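A minimal sketch of the same loop idea that keeps the header tracking but collects one value per row in a plain list and assigns the column once, after the loop. The names filled and current_header are hypothetical, chosen for illustration, and the letter test is widened to [A-Za-z] as an assumption.

```python
import re

import pandas as pd

data = {'stage': ['1', '1', '1', '1', '2', '2', '2', '4', '4', '4', '4', '4'],
        'hour': ['Berlim', '1', '2', '3', 'Munich', '1', '2', 'Leipzig', '1', '2', '3', '4'],
        'location': ['Berlim', '1', '2', '3', 'Munich', '1', '2', 'Leipzig', '1', '2', '3', '4']}
df = pd.DataFrame(data)

filled = []            # hypothetical name: one location value per row
current_header = None  # most recent location that contained a letter
for loc in df['location']:
    if re.search('[A-Za-z]', loc):  # a header row such as 'Berlim'
        current_header = loc
    filled.append(current_header)   # header rows keep their own name

df['location'] = filled  # assign the whole column once, after the loop
```

Because each row gets its own entry in filled, no iteration can overwrite the values of earlier rows.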
Current output:
stage hour location
0 1 Berlim Berlim
1 1 1 Berlim
2 1 2 Berlim
3 1 3 Berlim
4 2 Munich Berlim
5 2 1 Berlim
6 2 2 Berlim
7 4 Leipzig Berlim
8 4 1 Berlim
9 4 2 Berlim
10 4 3 Berlim
11 4 4 Berlim
Can anyone help me, please? It's a logic problem I'm failing at, and I just can't see why. If there is a better way to do this, feel free to share it.
Thanks!
Edit
I also tried this, but in this case it stores the last header value of df['location'], Leipzig, and applies it to every record.
import re

for index, row in df.iterrows():
    isHeader = bool(re.search('[A-Z]', row['location']))
    print('>>>> evaluation(location, isHeader) - ', row['location'], ' , ', isHeader)
    if isHeader == True:
        currentHeader = row['location']
        print("> new header to be used on the next rows: ", currentHeader)
        df['currentHeader'] = currentHeader
    else:
        print('> not a header, so ', df['currentHeader'], 'will be used')
        row['location'] = row['location']
        print("else: header", row['location'], currentHeader)
        df['currentHeader'] = currentHeader
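Both attempts end up with one value repeated in every row, and a likely reason is that assigning a scalar to a column, as in df['currentHeader'] = currentHeader, broadcasts that value to all rows. Each loop iteration therefore overwrites the entire column, and only the last assignment survives. A small demonstration on a toy frame (not the question's data):

```python
import pandas as pd

df = pd.DataFrame({'location': ['Berlim', '1', 'Munich', '1']})

df['currentHeader'] = 'Berlim'  # the scalar is broadcast to all four rows
df['currentHeader'] = 'Munich'  # this overwrites all four rows again
print(df['currentHeader'].tolist())  # ['Munich', 'Munich', 'Munich', 'Munich']
```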
Answer:
import pandas as pd
data = {'stage':['1', '1', '1', '1','2','2','2','4','4','4','4','4'],
'hour':['Berlim','1','2','3', 'Munich','1','2','Leipzig','1','2','3','4'],
'location':['Berlim','1','2','3', 'Munich','1','2','Leipzig','1','2','3','4']
}
df = pd.DataFrame(data)
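# Blank out the purely numeric locations (the non-header rows)
df.loc[df.location.str.isnumeric(), 'location'] = None
# Forward-fill: each blank takes the last header name above it
df.ffill(inplace=True)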
If you want to remove the non-numeric rows for the second output:
df.loc[df.hour.str.isnumeric()]
Output:
stage hour location
1 1 1 Berlim
2 1 2 Berlim
3 1 3 Berlim
5 2 1 Munich
6 2 2 Munich
8 4 1 Leipzig
9 4 2 Leipzig
10 4 3 Leipzig
11 4 4 Leipzig
Answer:
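I haven't read everything in detail, but IIUC, you can compute a boolean mask and use it both to mask and ffill the rows without words and to slice the output.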
Output #1:
# make a mask of rows that "have letters in it"
mask = df['location'].str.contains('[a-z]', case=False)
# use the mask to hide the non-match and fill with previous value
out = df.assign(location=df['location'].where(mask).ffill())
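To see what each step of this answer does, here is a short sketch that keeps the intermediate results, assuming the same df as in the question: where(mask) keeps the header names and turns every other value into NaN, and ffill() then copies the closest non-missing value above into each NaN. The names masked and filled are introduced only for illustration.

```python
import pandas as pd

data = {'stage': ['1', '1', '1', '1', '2', '2', '2', '4', '4', '4', '4', '4'],
        'hour': ['Berlim', '1', '2', '3', 'Munich', '1', '2', 'Leipzig', '1', '2', '3', '4'],
        'location': ['Berlim', '1', '2', '3', 'Munich', '1', '2', 'Leipzig', '1', '2', '3', '4']}
df = pd.DataFrame(data)

# True where the location contains at least one letter (a header row)
mask = df['location'].str.contains('[a-z]', case=False)
# Keep the header names, replace the numeric rows with NaN
masked = df['location'].where(mask)
# Forward-fill: each NaN takes the closest non-missing value above it
filled = masked.ffill()
print(pd.DataFrame({'mask': mask, 'masked': masked, 'filled': filled}))
```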
Output:
stage hour location
0 1 Berlim Berlim
1 1 1 Berlim
2 1 2 Berlim
3 1 3 Berlim
4 2 Munich Munich
5 2 1 Munich
6 2 2 Munich
7 4 Leipzig Leipzig
8 4 1 Leipzig
9 4 2 Leipzig
10 4 3 Leipzig
11 4 4 Leipzig
Output #2:
Same as above, but also use the (inverted) mask to slice the output:
mask = df['location'].str.contains('[a-z]', case=False)
out2 = df.assign(location=df['location'].where(mask).ffill())[~mask]
Or, starting from the previous output:
out2 = out[~mask]
Output:
stage hour location
1 1 1 Berlim
2 1 2 Berlim
3 1 3 Berlim
5 2 1 Munich
6 2 2 Munich
8 4 1 Leipzig
9 4 2 Leipzig
10 4 3 Leipzig
11 4 4 Leipzig
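This output keeps the original index labels (1, 2, 3, 5, ...), whereas the desired output for 2) in the question is numbered 0 to 8. A sketch of the same approach with reset_index(drop=True) appended, which renumbers the rows and discards the old labels, assuming the question's df:

```python
import pandas as pd

data = {'stage': ['1', '1', '1', '1', '2', '2', '2', '4', '4', '4', '4', '4'],
        'hour': ['Berlim', '1', '2', '3', 'Munich', '1', '2', 'Leipzig', '1', '2', '3', '4'],
        'location': ['Berlim', '1', '2', '3', 'Munich', '1', '2', 'Leipzig', '1', '2', '3', '4']}
df = pd.DataFrame(data)

mask = df['location'].str.contains('[a-z]', case=False)
out = df.assign(location=df['location'].where(mask).ffill())
# Drop the header rows, then renumber the remaining rows 0..n-1
out2 = out[~mask].reset_index(drop=True)
```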
Answer:
import pandas as pd
data = {'stage':['1', '1', '1', '1','2','2','2','4','4','4','4','4'],
'hour':['Berlim','1','2','3', 'Munich','1','2','Leipzig','1','2','3','4'],
'location':['Berlim','1','2','3', 'Munich','1','2','Leipzig','1','2','3','4']
}
df = pd.DataFrame(data)
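# Replace the purely numeric locations with None (missing)
df['location'] = df['location'].apply(lambda x: None if x.isdigit() else x)
# Forward-fill the missing values (equivalent to fillna(method='ffill'), which is deprecated since pandas 2.1)
df['location'] = df['location'].ffill()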
df
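A different technique, not taken from the answers above, sketched under the assumption that every header row is the only non-numeric location in its block: a cumulative sum of the header flag numbers the blocks, and groupby(...).transform('first') gives every row its block's first location, which is the header name. The names is_header and block are hypothetical.

```python
import pandas as pd

data = {'stage': ['1', '1', '1', '1', '2', '2', '2', '4', '4', '4', '4', '4'],
        'hour': ['Berlim', '1', '2', '3', 'Munich', '1', '2', 'Leipzig', '1', '2', '3', '4'],
        'location': ['Berlim', '1', '2', '3', 'Munich', '1', '2', 'Leipzig', '1', '2', '3', '4']}
df = pd.DataFrame(data)

# True on header rows ('Berlim', 'Munich', 'Leipzig'), False on numeric rows
is_header = ~df['location'].str.isnumeric()
# Running count of headers seen so far: 1,1,1,1,2,2,2,3,3,3,3,3
block = is_header.cumsum().rename('block')
# Every row takes the first location of its block, i.e. the header name
df['location'] = df.groupby(block)['location'].transform('first')
# Keep only the numeric rows, as in the desired output for 2)
out = df[~is_header]
```

The block numbers can also be handy later, for example to aggregate per location without repeating the string comparison.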