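I have 2 dataframes and need to get the System_Type column by matching on the "Name" column of both dataframes.
I have df1 with 500000 rows in this format:
Name  Timestamp   usage
AXCS  2022-01-01  5
BGXD  2022-02-01  70
HFSD  2022-03-01  45
AEVC  2022-01-01  25
BHRF  2022-02-01  12
and df2 with 550000 rows like this:
Name  System_Type
HFSD  Dev
BHRF  Test
BGXD  Prod
AEVC  Prod
AXCS  Test
I used the following code:
pd.merge(df1, df2, on="Name")
It takes a long time to process. Is there another way to do this? Please advise.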
Answer from a uj5u.com user:
You can do it like this:
import pandas as pd
df1 = pd.DataFrame({
    'System_Name': ['AXCS', 'BGXD', 'HFSD', 'AEVC', 'BHRF'],
    'Timestamp': ['2022-01-01', '2022-02-01', '2022-03-01', '2022-01-01', '2022-01-01'],
    'usage': [5, 70, 45, 25, 12],
})
df2 = pd.DataFrame({
    'System_Name': ['HFSD', 'BHRF', 'BGXD', 'AEVC', 'AXCS'],
    'System_Type': ['Dev', 'Test', 'Prod', 'Prod', 'Test'],
})
# Merge on all shared columns (here only System_Name), tagging where each row came from,
# then keep the rows whose key exists in both frames
df3 = pd.merge(df1, df2, how='outer', indicator='Exist')
df3 = df3.loc[df3['Exist'] == 'both']
# Same idea, but naming the join key explicitly
df3 = pd.merge(df1, df2, on="System_Name", how='outer', indicator='Exist')
df3 = df3.loc[df3['Exist'] == 'both']
print(df3)
# Output:
   System_Name   Timestamp  usage  System_Type  Exist
0         AXCS  2022-01-01      5         Test   both
1         BGXD  2022-02-01     70         Prod   both
2         HFSD  2022-03-01     45          Dev   both
3         AEVC  2022-01-01     25         Prod   both
4         BHRF  2022-01-01     12         Test   both
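As a side note, keeping only the rows marked 'both' after an outer merge should give the same rows as a plain inner merge, which does it in one step. The following is a minimal sketch on a small sample shaped like the frames above, not a benchmark; the variable names `outer_both`, `inner`, `a` and `b` are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    "System_Name": ["AXCS", "BGXD", "HFSD", "AEVC", "BHRF"],
    "Timestamp": ["2022-01-01", "2022-02-01", "2022-03-01", "2022-01-01", "2022-01-01"],
    "usage": [5, 70, 45, 25, 12],
})
df2 = pd.DataFrame({
    "System_Name": ["HFSD", "BHRF", "BGXD", "AEVC", "AXCS"],
    "System_Type": ["Dev", "Test", "Prod", "Prod", "Test"],
})

# Outer merge with an indicator column, then keep rows found in both frames.
outer_both = pd.merge(df1, df2, on="System_Name", how="outer", indicator="Exist")
outer_both = outer_both.loc[outer_both["Exist"] == "both"].drop(columns="Exist")

# Inner merge: returns only the keys present in both frames, in one step.
inner = pd.merge(df1, df2, on="System_Name", how="inner")

# Row order can differ between the two results, so sort before comparing.
key = ["System_Name", "Timestamp"]
a = outer_both.sort_values(key).reset_index(drop=True)
b = inner.sort_values(key).reset_index(drop=True)

# Raises AssertionError if the two results differ in content.
pd.testing.assert_frame_equal(a, b, check_dtype=False)
print(b)
```

Whether this is noticeably faster on 500000 rows depends on the data, so it is worth timing both on the real frames.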
Answer from a uj5u.com user:
You can use df2 as a dictionary-style mapping:
df1['System_Type'] = df1['Name'].map(df2.set_index('Name')['System_Type'])
print(df1)
# Output
   Name   Timestamp  usage  System_Type
0  AXCS  2022-01-01      5         Test
1  BGXD  2022-02-01     70         Prod
2  HFSD  2022-03-01     45          Dev
3  AEVC  2022-01-01     25         Prod
4  BHRF  2022-02-01     12         Test
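One thing worth checking before relying on the mapping: `map` with a Series needs that Series to have a unique index, and repeated Name values in df2 would also make `merge` return more rows than df1 has, which may be part of why the merge is slow. The sketch below assumes it is acceptable to keep the first System_Type seen for each Name; the duplicated "HFSD" row is invented here purely to illustrate the problem, and `lookup`, `n_dupes` and `merged` are illustrative names.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Name": ["AXCS", "BGXD", "HFSD", "AEVC", "BHRF"],
    "Timestamp": ["2022-01-01", "2022-02-01", "2022-03-01", "2022-01-01", "2022-02-01"],
    "usage": [5, 70, 45, 25, 12],
})
# Hypothetical df2 with a repeated Name ("HFSD"), to illustrate the issue.
df2 = pd.DataFrame({
    "Name": ["HFSD", "BHRF", "BGXD", "AEVC", "AXCS", "HFSD"],
    "System_Type": ["Dev", "Test", "Prod", "Prod", "Test", "Dev"],
})

# Count rows in the lookup table whose Name already appeared earlier.
n_dupes = int(df2["Name"].duplicated().sum())
print(f"duplicate Names in df2: {n_dupes}")

# Keep one row per Name so the lookup is unambiguous (first occurrence wins).
lookup = df2.drop_duplicates(subset="Name").set_index("Name")["System_Type"]

# Map each Name in df1 to its System_Type; unmatched Names become NaN.
df1["System_Type"] = df1["Name"].map(lookup)

# The same join via merge; validate="many_to_one" raises MergeError
# if the right-hand keys are not unique.
merged = pd.merge(
    df1.drop(columns="System_Type"),
    df2.drop_duplicates(subset="Name"),
    on="Name",
    how="left",
    validate="many_to_one",
)
print(df1)
```

If the duplicated Names carry conflicting System_Type values in the real data, dropping duplicates silently picks one of them, so that case needs a decision rather than a default.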
