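I have two DataFrames, df and df1. Some columns of df contain NULL, while df1 has non-null values for those same columns. I only need to overwrite the rows where the value is NULL.
Here is df: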
------------ -------------------- ------- --------------- -------------------- ---------- ------------
| Id| Name|Country| City| Address| Latitude| Longitude|
------------ -------------------- ------- --------------- -------------------- ---------- ------------
| 42949672960|Americana Resort ...| US| Dillon| 135 Main St| null| null|
| 42949672965|Comfort Inn Delan...| US| Deland|400 E Internation...| 29.054737| -81.297208|
| 60129542147|Ubaa Old Crawford...| US| Des Plaines| 5460 N River Rd| null| null|
Here is df1:
------------- -------------------- ------- ------------ -------------------- ---------- ------------
| Id| Name|Country| City| Address| Latitude| Longitude|
------------- -------------------- ------- ------------ -------------------- ---------- ------------
| 42949672960|Americana Resort ...| US| Dillon| 135 Main St|39.6286685|-106.0451009|
| 60129542147|Ubaa Old Crawford...| US| Des Plaines| 5460 N River Rd|42.0654049| -87.8916252|
------------- -------------------- ------- ------------ -------------------- ---------- ------------
I want this result:
------------ -------------------- ------- --------------- -------------------- ---------- ------------
| Id| Name|Country| City| Address| Latitude| Longitude|
------------ -------------------- ------- --------------- -------------------- ---------- ------------
| 42949672960|Americana Resort ...| US| Dillon| 135 Main St|39.6286685|-106.0451009|
| 42949672965|Comfort Inn Delan...| US| Deland|400 E Internation...| 29.054737| -81.297208|
...
...
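Answer from a uj5u.com user:
You can left join them (or inner join, if you only want the rows present in both DataFrames), then use coalesce to pick the first non-null latitude/longitude. In the example below, df1 plays the role of the question's df and df2 the role of the question's df1, each trimmed to three columns.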
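To make the semantics concrete, here is a minimal pure-Python model of "left join, then coalesce". It is not Spark code: `coalesce` and `left_join_fill` are hypothetical helpers written only for illustration, and the sketch assumes each id appears at most once on the right-hand side.

```python
def coalesce(*values):
    """Return the first value that is not None, or None if all are None."""
    for value in values:
        if value is not None:
            return value
    return None


def left_join_fill(left_rows, right_rows, key, cols):
    """Model a left join on `key` followed by coalesce over `cols`."""
    # Index the right side by key (assumes keys are unique on the right).
    right_by_key = {row[key]: row for row in right_rows}
    result = []
    for row in left_rows:
        # A left join keeps every left row; with no match, lookups give None.
        match = right_by_key.get(row[key], {})
        filled = dict(row)
        for col in cols:
            filled[col] = coalesce(row[col], match.get(col))
        result.append(filled)
    return result


left = [
    {"id": 42949672960, "lat": None, "lon": None},
    {"id": 42949672965, "lat": 29.054737, "lon": -81.297208},
    {"id": 60129542147, "lat": None, "lon": None},
]
right = [
    {"id": 42949672960, "lat": 39.6286685, "lon": -106.0451009},
    {"id": 60129542147, "lat": 42.0654049, "lon": -87.8916252},
]
filled = left_join_fill(left, right, "id", ["lat", "lon"])
```

The row for id 42949672965 has no match on the right and keeps its own values. An inner join would have dropped it instead, which is why a left join fits the result asked for in the question.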
df1
----------- --------- ----------
| id| lat| lon|
----------- --------- ----------
|42949672960| null| null|
|42949672965|29.054737|-81.297208|
|60129542147| null| null|
----------- --------- ----------
df2
----------- ---------- ------------
| id| lat| lon|
----------- ---------- ------------
|42949672960|39.6286685|-106.0451009|
|60129542147|42.0654049| -87.8916252|
----------- ---------- ------------
Join them:
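from pyspark.sql import functions as F

# A left join keeps every row of df1, even when df2 has no matching id.
(df1
    .join(df2, on=['id'], how='left')
    .select(
        F.col('id'),
        # coalesce returns the first non-null argument, so df1's value wins when present.
        F.coalesce(df1['lat'], df2['lat']).alias('lat'),
        F.coalesce(df1['lon'], df2['lon']).alias('lon')
    )
    .show()
)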
# ----------- ---------- ------------
# | id| lat| lon|
# ----------- ---------- ------------
# |42949672965| 29.054737| -81.297208|
# |60129542147|42.0654049| -87.8916252|
# |42949672960|39.6286685|-106.0451009|
# ----------- ---------- ------------
