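Replace DataFrame column values based on a condition in another DataFrame

I have two DataFrames, and I need to update the first one with values from the second wherever a match exists. In the example below, a student_Id is replaced with the corresponding "new_id" whenever it appears in the "old_id" column of updatedId.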

import pandas as pd
import numpy as np

student = {
    'Name': ['John', 'Jay', 'sachin', 'Geetha', 'Amutha', 'ganesh'],
    'gender': ['male', 'male', 'male', 'female', 'female', 'male'],
    'math score': [50, 100, 70, 80, 75, 40],
    'student_Id': ['1234', '6788', 'xyz', 'abcd', 'ok83', '234v'],
}

updatedId = {
    'old_id' : ['ok83', '234v'],
    'new_id' : ['83ko', 'v432'],
}

df_student = pd.DataFrame(student)
df_updated_id = pd.DataFrame(updatedId)

print(df_student)
print(df_updated_id)

# Method with np.where
for index, row in df_updated_id.iterrows():
    df_student['student_Id'] = np.where(df_student['student_Id'] == row['old_id'], row['new_id'],  df_student['student_Id'])
    
# print(df_student)

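# Method with Series.mask
# Assign the result back: mask(..., inplace=True) on a selected column is
# chained assignment and may not update df_student in newer pandas versions.
for index, row in df_updated_id.iterrows():
    df_student['student_Id'] = df_student['student_Id'].mask(df_student['student_Id'] == row['old_id'], row['new_id'])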

print(df_student)

Both methods above work and produce the correct result:

     Name  gender  math score student_Id
0    John    male          50       1234
1     Jay    male         100       6788
2  sachin    male          70        xyz
3  Geetha  female          80       abcd
4  Amutha  female          75       ok83
5  ganesh    male          40       234v

  old_id new_id
0   ok83   83ko
1   234v   v432

     Name  gender  math score student_Id
0    John    male          50       1234
1     Jay    male         100       6788
2  sachin    male          70        xyz
3  Geetha  female          80       abcd
4  Amutha  female          75       83ko
5  ganesh    male          40       v432

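However, the real student data has about 500,000 rows, and updated_id has 6,000 rows.

Because the loop is very slow, I am running into performance problems.

I added a simple timer to see how the run time grows with the number of df_updated_id records processed:

100 rows - numpy time=3.9020769596099854; mask time=3.9169061183929443

500 rows - numpy time=20.42293930053711; mask time=19.768696784973145

1000 rows - numpy time=40.06309795379639; mask time=37.26559829711914

My question is whether I can optimize this with a merge (joining the tables), or otherwise get rid of iterrows. I tried approaches like those in the following questions but could not get them to work: "Replace dataframe column values based on matching id in another dataframe" and "How to iterate over rows in a DataFrame in Pandas".

Please advise.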
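On the merge idea raised in the question, here is a minimal sketch of how a left join could replace the loop. It assumes that every old_id appears only once in df_updated_id (duplicate keys would duplicate student rows in the join); the name df_result and the trimmed-down sample columns are introduced here for illustration only.

```python
import pandas as pd

df_student = pd.DataFrame({
    'Name': ['John', 'Jay', 'sachin', 'Geetha', 'Amutha', 'ganesh'],
    'student_Id': ['1234', '6788', 'xyz', 'abcd', 'ok83', '234v'],
})
df_updated_id = pd.DataFrame({
    'old_id': ['ok83', '234v'],
    'new_id': ['83ko', 'v432'],
})

# Left join keeps every student row; new_id is NaN where no old_id matched.
# Assumption: old_id values are unique, otherwise rows would be duplicated.
merged = df_student.merge(
    df_updated_id, how='left', left_on='student_Id', right_on='old_id'
)

# Take new_id where it exists, otherwise keep the original student_Id.
merged['student_Id'] = merged['new_id'].fillna(merged['student_Id'])

# Drop the helper columns that the join brought in.
df_result = merged.drop(columns=['old_id', 'new_id'])
print(df_result)
```

The join does the matching in one vectorized pass instead of scanning the whole student column once per updated id.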

A uj5u.com user replied:

We can just use replace:

df_student.replace({'student_Id':df_updated_id.set_index('old_id')['new_id']},inplace=True)
df_student
Out[337]: 
     Name  gender  math score student_Id
0    John    male          50       1234
1     Jay    male         100       6788
2  sachin    male          70        xyz
3  Geetha  female          80       abcd
4  Amutha  female          75       83ko
5  ganesh    male          40       v432

A uj5u.com user replied:

You can also try map:

df_student['student_Id'] = (
    df_student['student_Id'].map(df_updated_id.set_index('old_id')['new_id'])
                            .fillna(df_student['student_Id'])
)
print(df_student)

# Output
     Name  gender  math score student_Id
0    John    male          50       1234
1     Jay    male         100       6788
2  sachin    male          70        xyz
3  Geetha  female          80       abcd
4  Amutha  female          75       83ko
5  ganesh    male          40       v432
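To get a feel for how the map approach behaves at the sizes mentioned in the question, the sketch below applies it to synthetic data with 500,000 students and 6,000 updated ids. The id strings ('id0', 'new0', and so on) and the names lookup and n_changed are made up for illustration, since the real data is not shown. It also assumes old_id is unique, because map needs a lookup Series with a unique index.

```python
import pandas as pd

n_students, n_updates = 500_000, 6_000  # sizes quoted in the question

# Synthetic ids (an assumption for illustration only).
df_student = pd.DataFrame(
    {'student_Id': [f'id{i}' for i in range(n_students)]}
)
df_updated_id = pd.DataFrame({
    'old_id': [f'id{i}' for i in range(n_updates)],
    'new_id': [f'new{i}' for i in range(n_updates)],
})

# Lookup Series: index = old id, value = new id (old_id assumed unique).
lookup = df_updated_id.set_index('old_id')['new_id']

# map returns NaN for ids without a match; fillna restores the original id.
df_student['student_Id'] = (
    df_student['student_Id'].map(lookup).fillna(df_student['student_Id'])
)

# Count how many ids were actually replaced.
n_changed = int(df_student['student_Id'].str.startswith('new').sum())
print(n_changed)
```

All ids are looked up in a single pass here, so no Python-level loop over df_updated_id is needed.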

A uj5u.com user replied:

Also, try replace with a dictionary comprehension:

df_student.replace({'student_Id':{o:n for o, n in zip(updatedId['old_id'], 
                                                      updatedId['new_id'])}})

Output:

     Name  gender  math score student_Id
0    John    male          50       1234
1     Jay    male         100       6788
2  sachin    male          70        xyz
3  Geetha  female          80       abcd
4  Amutha  female          75       83ko
5  ganesh    male          40       v432
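One detail worth keeping in mind with this last form: without inplace=True, replace returns a new DataFrame and leaves df_student untouched, so the result has to be assigned to keep it. A small sketch with a shortened sample (the names id_map and df_replaced are introduced here for illustration):

```python
import pandas as pd

updatedId = {'old_id': ['ok83', '234v'], 'new_id': ['83ko', 'v432']}
df_student = pd.DataFrame({'student_Id': ['1234', 'ok83', '234v']})

# dict(zip(...)) builds the same old -> new mapping as the comprehension.
id_map = dict(zip(updatedId['old_id'], updatedId['new_id']))

# replace returns a new DataFrame; df_student itself is not modified.
df_replaced = df_student.replace({'student_Id': id_map})

print(df_student['student_Id'].tolist())   # original ids, unchanged
print(df_replaced['student_Id'].tolist())  # ids after replacement
```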

Please credit the source when reposting. Link to this article: https://www.uj5u.com/gongcheng/451431.html

Tags: Python, pandas, dataframe, numpy