Iterating over a column in pandas and dynamically changing its values until a new one is found

I am using pandas and have the following dataset (yes, all values are strings):

import pandas as pd

data = {'stage':['1', '1', '1', '1','2','2','2','4','4','4','4','4'],
        'hour':['Berlim','1','2','3', 'Munich','1','2','Leipzig','1','2','3','4'],
        'location':['Berlim','1','2','3', 'Munich','1','2','Leipzig','1','2','3','4']
       }

df = pd.DataFrame(data)

df

Output:

    stage   hour        location
0   1       Berlim      Berlim
1   1       1           1
2   1       2           2
3   1       3           3
4   2       Munich      Munich
5   2       1           1
6   2       2           2
7   4       Leipzig     Leipzig
8   4       1           1
9   4       2           2
10  4       3           3
11  4       4           4

The goals are:

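1) Repeat the last value found in df['location'] on the following rows as long as df['location'] contains no letters.
2) Finally, apply a filter to remove the rows of df where df['hour'] == df['location']. I decided not to ask about this part because I haven't tried it yet.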

So for 1) the desired output is:

    stage   hour        location
0   1       Berlim      Berlim
1   1       1           Berlim
2   1       2           Berlim
3   1       3           Berlim
4   2       Munich      Munich
5   2       1           Munich
6   2       2           Munich
7   4       Leipzig     Leipzig
8   4       1           Leipzig
9   4       2           Leipzig
10  4       3           Leipzig
11  4       4           Leipzig

For 2) the desired output is:

    stage   hour        location
0   1       1           Berlim
1   1       2           Berlim
2   1       3           Berlim
3   2       1           Munich
4   2       2           Munich
5   4       1           Leipzig
6   4       2           Leipzig
7   4       3           Leipzig
8   4       4           Leipzig
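Goal 2, as stated, can be sketched directly once location has been filled: the header rows are exactly those where hour still equals location. This is a minimal sketch, assuming the fill for goal 1 uses a "contains a letter" check; the names has_letter and result are illustrative, and reset_index(drop=True) is included only to reproduce the 0 to 8 index shown above.

```python
import pandas as pd

data = {'stage': ['1', '1', '1', '1', '2', '2', '2', '4', '4', '4', '4', '4'],
        'hour': ['Berlim', '1', '2', '3', 'Munich', '1', '2', 'Leipzig', '1', '2', '3', '4'],
        'location': ['Berlim', '1', '2', '3', 'Munich', '1', '2', 'Leipzig', '1', '2', '3', '4']}
df = pd.DataFrame(data)

# Goal 1: keep values that contain a letter, blank out the rest, then forward-fill
has_letter = df['location'].str.contains('[A-Za-z]')
df['location'] = df['location'].where(has_letter).ffill()

# Goal 2: after the fill, header rows are exactly those where hour == location
result = df[df['hour'] != df['location']].reset_index(drop=True)
print(result)
```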

So I started by trying to fill df['location'] first, and I couldn't get it to work. When I run the code below, I always get "Berlim" for every record.

import re

for index, row in df.iterrows():
    isHeader = bool(re.search('[A-Z]', row['location']))
    print('>>>> evaluation(location, isHeader) - ',row['location'], ' , ', isHeader)
    if isHeader == True:
        currentHeader = row['location']
        print("> new header to be used on the next rows: ", currentHeader)
        df['currentHeader'] = currentHeader
    else:
        print('> not a header, so ',currentHeader, 'will be used')
        df['location'] = row['location']
        print('> new pair: ', row['location'], currentHeader)
        df['currentHeader'] = currentHeader
        
df

Current output:

    stage   hour        location
0   1       Berlim      Berlim
1   1       1           Berlim
2   1       2           Berlim
3   1       3           Berlim
4   2       Munich      Berlim
5   2       1           Berlim
6   2       2           Berlim
7   4       Leipzig     Berlim
8   4       1           Berlim
9   4       2           Berlim
10  4       3           Berlim
11  4       4           Berlim

Can anyone help me, please? It's a logic problem I'm failing at, and I just can't see why. If there is a better way to do this, feel free to share it.

Thanks!

EDIT: I also tried the following, but in this case it stores the last value of df['location'] and repeats it on all records, so every row gets Leipzig.

for index, row in df.iterrows():
    isHeader = bool(re.search('[A-Z]', row['location']))
    print('>>>> evaluation(location, isHeader) - ',row['location'], ' , ', isHeader)
    if isHeader == True:
        currentHeader = row['location']
        print("> new header to be used on the next rows: ", currentHeader)
        df['currentHeader'] = currentHeader
    else:
        print('> not a header, so ',df['currentHeader'], 'will be used')
        row['location'] = row['location']
        print("else: header", row['location'], currentHeader)
        df['currentHeader'] = currentHeader
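Both attempts go wrong for the same reason: df['location'] = value and df['currentHeader'] = currentHeader assign a single scalar to every row of the column, not only to the current row, so each iteration overwrites the whole column. In the second attempt, row['location'] = row['location'] assigns a value to itself and changes nothing. A minimal loop-based sketch that avoids both problems collects one value per row and assigns the column once after the loop (new_locations and current_header are illustrative names); the vectorized answers below are usually preferable for larger frames.

```python
import re
import pandas as pd

data = {'stage': ['1', '1', '1', '1', '2', '2', '2', '4', '4', '4', '4', '4'],
        'hour': ['Berlim', '1', '2', '3', 'Munich', '1', '2', 'Leipzig', '1', '2', '3', '4'],
        'location': ['Berlim', '1', '2', '3', 'Munich', '1', '2', 'Leipzig', '1', '2', '3', '4']}
df = pd.DataFrame(data)

current_header = None
new_locations = []
for value in df['location']:
    # Treat any value containing a letter as a new header (the question used '[A-Z]')
    if re.search('[A-Za-z]', value):
        current_header = value
    # Every row, header or not, gets the most recent header
    new_locations.append(current_header)

# Assign the whole column once, after the loop, instead of inside it
df['location'] = new_locations
print(df)
```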

Answer:

import pandas as pd

data = {'stage':['1', '1', '1', '1','2','2','2','4','4','4','4','4'],
        'hour':['Berlim','1','2','3', 'Munich','1','2','Leipzig','1','2','3','4'],
        'location':['Berlim','1','2','3', 'Munich','1','2','Leipzig','1','2','3','4']
       }
 
df = pd.DataFrame(data)

df.loc[df.location.str.isnumeric(),'location'] = None
df.ffill(inplace=True)
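Note that df.ffill(inplace=True) forward-fills every column of the frame. That is harmless here because only location has missing values, but a variant scoped to the one column can be sketched with Series.mask, which sets values to NaN where the condition is True (is_number is an illustrative name):

```python
import pandas as pd

data = {'stage': ['1', '1', '1', '1', '2', '2', '2', '4', '4', '4', '4', '4'],
        'hour': ['Berlim', '1', '2', '3', 'Munich', '1', '2', 'Leipzig', '1', '2', '3', '4'],
        'location': ['Berlim', '1', '2', '3', 'Munich', '1', '2', 'Leipzig', '1', '2', '3', '4']}
df = pd.DataFrame(data)

# mask() replaces values where the condition is True with NaN;
# ffill() then carries the last header forward within this column only
is_number = df['location'].str.isnumeric()
df['location'] = df['location'].mask(is_number).ffill()
print(df)
```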

If you want to drop the non-numeric rows for the second output:

df.loc[df.hour.str.isnumeric()]

Output:

   stage hour location
1      1    1   Berlim
2      1    2   Berlim
3      1    3   Berlim
5      2    1   Munich
6      2    2   Munich
8      4    1  Leipzig
9      4    2  Leipzig
10     4    3  Leipzig
11     4    4  Leipzig

Answer:

I didn't read everything in detail, but IIUC you can compute a boolean mask and use it both to mask and ffill the non-word rows and to slice the output.

Output #1:

# make a mask of rows that "have letters in it"
mask = df['location'].str.contains('[a-z]', case=False)

# use the mask to hide the non-match and fill with previous value
out = df.assign(location=df['location'].where(mask).ffill())

Output:

   stage     hour location
0      1   Berlim   Berlim
1      1        1   Berlim
2      1        2   Berlim
3      1        3   Berlim
4      2   Munich   Munich
5      2        1   Munich
6      2        2   Munich
7      4  Leipzig  Leipzig
8      4        1  Leipzig
9      4        2  Leipzig
10     4        3  Leipzig
11     4        4  Leipzig
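To see what each step of the masking approach does, here is a small sketch on a hypothetical five-value Series (s is made-up data shaped like the location column, not taken from the question): where keeps values where the mask is True and turns the rest into NaN, and ffill then copies the last non-missing value downward.

```python
import pandas as pd

# Hypothetical sample data, shaped like the question's location column
s = pd.Series(['Berlim', '1', '2', 'Munich', '1'])

mask = s.str.contains('[a-z]', case=False)  # True where the value contains a letter
hidden = s.where(mask)                      # values where mask is False become NaN
filled = hidden.ffill()                     # each NaN takes the last non-NaN value above it

print(pd.DataFrame({'original': s, 'mask': mask, 'where': hidden, 'ffill': filled}))
```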

Output #2:

Same as above, but also use the (inverted) mask to slice the output:

mask = df['location'].str.contains('[a-z]', case=False)
out2 = df.assign(location=df['location'].where(mask).ffill())[~mask]

Or, starting from the previous output:

out2 = out[~mask]

Output:

   stage hour location
1      1    1   Berlim
2      1    2   Berlim
3      1    3   Berlim
5      2    1   Munich
6      2    2   Munich
8      4    1  Leipzig
9      4    2  Leipzig
10     4    3  Leipzig
11     4    4  Leipzig

Answer:

import pandas as pd
data = {'stage':['1', '1', '1', '1','2','2','2','4','4','4','4','4'],
        'hour':['Berlim','1','2','3', 'Munich','1','2','Leipzig','1','2','3','4'],
        'location':['Berlim','1','2','3', 'Munich','1','2','Leipzig','1','2','3','4']
       }
df = pd.DataFrame(data)
df['location'] = df['location'].apply(lambda x: None if x.isdigit() else x)
# fillna(method='ffill') is deprecated since pandas 2.1; .ffill() does the same thing
df['location'] = df['location'].ffill()
df

Please credit the source when reposting. Link to this article: https://www.uj5u.com/net/425873.html

Tags: Python pandas
