我有個問題。我想合并兩個資料框。一個df_neighbourhood_one是單熱編碼矩陣,一個df是普通資料幀。問題是當我合并兩個資料框時,我只得到 5 行,但應該有 7 行。
d = {'host_id': [1, 1, 2, 3, 4, 5, 6],
'id': [10, 11, 20, 30, 40, 50, 60],
'neighbourhood': ["Centrum", 'West', 'West', 'North', 'Centrum', 'West', 'West'],
'something': ['some1', 'some2', 'some3', 'some4', 'some5', 'some6', 'some7']}
df = pd.DataFrame(data=d)
print(df)
print()
df_neighbourhood = df.copy()
df_neighbourhood = df_neighbourhood.set_index(['host_id', 'id'])
df_neighbourhood_one = pd.get_dummies(df_neighbourhood['neighbourhood']).head()
print(df_neighbourhood_one)
df = pd.merge(df, df_neighbourhood_one, right_index=True, left_on=['host_id', 'id'])
print(df)
print(df.shape)
# df
host_id id neighbourhood something
0 1 10 Centrum some1
1 1 11 West some2
2 2 20 West some3
3 3 30 North some4
4 4 40 Centrum some5
5 5 50 West some6
6 6 60 West some7
# df_neighbourhood_one
Centrum North West
host_id id
1 10 1 0 0
11 0 0 1
2 20 0 0 1
3 30 0 1 0
4 40 1 0 0
# df but merged
host_id id neighbourhood something Centrum North West
0 1 10 Centrum some1 1 0 0
1 1 11 West some2 0 0 1
2 2 20 West some3 0 0 1
3 3 30 North some4 0 1 0
4 4 40 Centrum some5 1 0 0
(5, 7) # df shape
我想要的是
host_id id neighbourhood something Centrum North West
0 1 10 Centrum some1 1 0 0
1 1 11 West some2 0 0 1
2 2 20 West some3 0 0 1
3 3 30 North some4 0 1 0
4 4 40 Centrum some5 1 0 0
5 5 50 West some6 0 0 1
6 6 60 West some7 0 0 1
uj5u.com熱心網友回復:
只需how='left'與fillna.
此外,您head在創建df_neighbourhood_one. 不要用那個。
In [734]: df_neighbourhood_one = pd.get_dummies(df_neighbourhood['neighbourhood'])
In [735]: pd.merge(df, df_neighbourhood_one, right_index=True, left_on=['host_id', 'id'], how='left').fillna(0)
Out[735]:
host_id id neighbourhood something Centrum North West
0 1 10 Centrum some1 1 0 0
1 1 11 West some2 0 0 1
2 2 20 West some3 0 0 1
3 3 30 North some4 0 1 0
4 4 40 Centrum some5 1 0 0
5 5 50 West some6 0 0 1
6 6 60 West some7 0 0 1
轉載請註明出處,本文鏈接:https://www.uj5u.com/caozuo/360287.html
