We have two datasets, and the goal is to add columns from one dataset to the other. Here is a code example:
import pandas as pd
d1 = {'city_id': [116,1,1,1,116,1,1,116,1], 'key': [14,14,22,21,22,14,13,80,99]}
d2={'key':[14,22,80],'population':[2000,7500,11000],'median_income':[30000,50000,44000]}
df1 = pd.DataFrame(data=d1)
df2 = pd.DataFrame(data=d2)
print(df1)
print()
print(df2)
   city_id  key
0      116   14
1        1   14
2        1   22
3        1   21
4      116   22
5        1   14
6        1   13
7      116   80
8        1   99

   key  population  median_income
0   14        2000          30000
1   22        7500          50000
2   80       11000          44000
In the next step I do:
print(df1.loc[df1['city_id']==116].assign(
population=lambda x: x['key'].map(dict(zip(df2['key'],df2['population'])))
,median_income=lambda x:x['key'].map(dict(zip(df2['key'],df2['median_income'])))
))
   city_id  key  population  median_income
0      116   14        2000          30000
4      116   22        7500          50000
7      116   80       11000          44000
But when I try to assign the result back to the original DataFrame, I get an error:
df1.loc[df1['city_id']==116]=df1.loc[df1['city_id']==116].assign(
population=lambda x: x['key'].map(dict(zip(df2['key'],df2['population'])))
,median_income=lambda x:x['key'].map(dict(zip(df2['key'],df2['median_income'])))
)
ValueError: shape mismatch: value array of shape (3,4) could not be broadcast to indexing result of shape (3,2)
The expected result, however, is:
   city_id  key population median_income
0      116   14       2000         30000
1        1   14        NaN           NaN
2        1   22        NaN           NaN
3        1   21        NaN           NaN
4      116   22       7500         50000
5        1   14        NaN           NaN
6        1   13        NaN           NaN
7      116   80      11000         44000
8        1   99        NaN           NaN
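What could solve this problem?
Note: we cannot use merge, because there are actually more than 20 different city_id values, and merging once per city would create many suffixed columns such as "population_x", "population_y", "population_z" ... "median_income_x", "median_income_y", "median_income_z", which does not seem very convenient. The idea is to create a function for each city_id and use assign.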
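The idea in the note above, one lookup per city_id, can be sketched without merge and without suffixes. This is only an illustration under stated assumptions: the `lookups` dictionary and the second lookup table `df_city1` for city_id 1 are hypothetical and do not appear in the question. Each table fills only the rows of its own city, and rows without a matching key stay NaN.

```python
import pandas as pd

df1 = pd.DataFrame({'city_id': [116, 1, 1, 1, 116, 1, 1, 116, 1],
                    'key': [14, 14, 22, 21, 22, 14, 13, 80, 99]})
df2 = pd.DataFrame({'key': [14, 22, 80],
                    'population': [2000, 7500, 11000],
                    'median_income': [30000, 50000, 44000]})

# Hypothetical: a second lookup table, invented here for city_id 1.
df_city1 = pd.DataFrame({'key': [14, 22],
                         'population': [100, 200],
                         'median_income': [10000, 20000]})

# Hypothetical mapping from each city_id to its own lookup table.
lookups = {116: df2, 1: df_city1}

for city_id, lookup in lookups.items():
    mask = df1['city_id'] == city_id
    indexed = lookup.set_index('key')
    for col in ['population', 'median_income']:
        # Assigning to a column that does not exist yet creates it,
        # leaving NaN in the rows outside the mask.
        df1.loc[mask, col] = df1.loc[mask, 'key'].map(indexed[col])

print(df1)
```

Because every assignment targets a named column rather than the whole row block, the left-hand and right-hand sides always have matching shapes.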
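Reply from a uj5u.com user:
I don't think this is likely to be the right approach here. You can almost certainly get what you need with merge and melt. That said, you can simply add the two new, empty columns to the original DataFrame first.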
import numpy as np
import pandas as pd

d1 = {'city_id': [116,1,1,1,116,1,1,116,1], 'key': [14,14,22,21,22,14,13,80,99]}
d2 = {'key':[14,22,80],'population':[2000,7500,11000],'median_income':[30000,50000,44000]}
df1 = pd.DataFrame(data=d1)
df2 = pd.DataFrame(data=d2)

# Create the two target columns first, so that both sides of the
# assignment below have the same four columns.
df1[["population", "median_income"]] = np.nan
df1.loc[df1['city_id']==116] = df1.loc[df1['city_id']==116].assign(
    population=lambda x: x['key'].map(dict(zip(df2['key'], df2['population']))),
    median_income=lambda x: x['key'].map(dict(zip(df2['key'], df2['median_income']))),
)
This works the way you want.
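Why creating the columns first helps: a boolean-mask assignment through .loc needs the right-hand side to match the block selected on the left. A minimal sketch, assuming the same df1 and df2 as in the question (the names `mask` and `add_city_columns` are introduced here only for illustration), compares the shapes before and after the empty columns are added.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'city_id': [116, 1, 1, 1, 116, 1, 1, 116, 1],
                    'key': [14, 14, 22, 21, 22, 14, 13, 80, 99]})
df2 = pd.DataFrame({'key': [14, 22, 80],
                    'population': [2000, 7500, 11000],
                    'median_income': [30000, 50000, 44000]})

mask = df1['city_id'] == 116


def add_city_columns(frame):
    """Return frame with population and median_income looked up from df2 by key."""
    indexed = df2.set_index('key')
    return frame.assign(
        population=frame['key'].map(indexed['population']),
        median_income=frame['key'].map(indexed['median_income']),
    )


# Before: the selection has 2 columns but the value has 4, hence the ValueError.
shape_before_lhs = df1.loc[mask].shape
shape_before_rhs = add_city_columns(df1.loc[mask]).shape
print(shape_before_lhs)  # (3, 2)
print(shape_before_rhs)  # (3, 4)

# After creating the empty columns, both sides have 4 columns.
df1[['population', 'median_income']] = np.nan
shape_after_lhs = df1.loc[mask].shape
print(shape_after_lhs)   # (3, 4)

df1.loc[mask] = add_city_columns(df1.loc[mask])
print(df1)
```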
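Reply from a uj5u.com user:
It looks like you would be better off merging the two and then blanking out the new columns in the rows where city_id is not 116.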
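import numpy as np
# The left merge keeps every row of df1; rows of other cities are then blanked out.
df3 = df1.merge(df2, on='key', how='left')
df3.loc[df3['city_id'] != 116, ['population', 'median_income']] = np.nan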
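The suffix concern from the question ("population_x", "population_y", ...) arises only when a separate merge is run for each lookup table. A possible alternative, sketched here under the assumption that every city has its own lookup table with the same columns, is to stack those tables into one long table with a city_id column and merge once on both city_id and key. The second table `df_city1` and the `lookups` dictionary are hypothetical and not part of the original thread.

```python
import pandas as pd

df1 = pd.DataFrame({'city_id': [116, 1, 1, 1, 116, 1, 1, 116, 1],
                    'key': [14, 14, 22, 21, 22, 14, 13, 80, 99]})
df2 = pd.DataFrame({'key': [14, 22, 80],
                    'population': [2000, 7500, 11000],
                    'median_income': [30000, 50000, 44000]})

# Hypothetical: a second lookup table, invented here for city_id 1.
df_city1 = pd.DataFrame({'key': [14, 22],
                         'population': [100, 200],
                         'median_income': [10000, 20000]})

# Hypothetical mapping from each city_id to its own lookup table.
lookups = {116: df2, 1: df_city1}

# Stack all lookup tables, tagging each row with the city it belongs to.
long_lookup = pd.concat(
    [table.assign(city_id=city_id) for city_id, table in lookups.items()],
    ignore_index=True,
)

# One left merge on both columns: no suffixes, and unmatched rows get NaN.
# validate='m:1' raises if a (city_id, key) pair appears twice in the lookup.
df3 = df1.merge(long_lookup, on=['city_id', 'key'], how='left', validate='m:1')
print(df3)
```

With only the single table from the question, `lookups = {116: df2}` would leave every row of the other cities as NaN, which matches the expected result without a separate blanking step.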
