I have two DataFrames that have the same values in columns "a" and "b" but different values in column "d".
This is df1:
df1 = spark.createDataFrame([
('c', 'd', 8),
('e', 'f', 8),
('c', 'j', 9),
], ['a', 'b', 'd'])
df1.show()
+---+---+---+
|  a|  b|  d|
+---+---+---+
|  c|  d|  8|
|  e|  f|  8|
|  c|  j|  9|
+---+---+---+
This is df2:
df2 = spark.createDataFrame([
('c', 'd', 7),
('e', 'f', 3),
('c', 'j', 8),
], ['a', 'b', 'd'])
df2.show()
+---+---+---+
|  a|  b|  d|
+---+---+---+
|  c|  d|  7|
|  e|  f|  3|
|  c|  j|  8|
+---+---+---+
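I want to get the difference between the values in column "d" (df1.d minus df2.d) for each matching row, while keeping columns "a" and "b".
Expected df3: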
+---+---+---+
|  a|  b|  d|
+---+---+---+
|  c|  d|  1|
|  e|  f|  5|
|  c|  j|  1|
+---+---+---+
I tried subtracting one DataFrame from the other, but it did not work; it just returns df1 unchanged:
df1.subtract(df2).show()
+---+---+---+
|  a|  b|  d|
+---+---+---+
|  c|  d|  8|
|  e|  f|  8|
|  c|  j|  9|
+---+---+---+
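Answer:
You can join the two DataFrames on "a" and "b" and subtract the "d" columns. Note that the result column is named "diff" here rather than "d":
df3 = df1.join(df2, on=['b', 'a'], how='outer').select('a', 'b', (df1.d - df2.d).alias('diff'))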
df3.show()
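To see why subtract() returned df1 unchanged while the join gives the wanted result, here is a minimal plain-Python sketch. It only emulates the behaviour on the sample rows and is not the Spark API; the names df1_rows, df2_rows, d2_by_key and diff_rows are made up for illustration. subtract() acts like a set difference on whole rows, so a row of df1 is removed only when an identical (a, b, d) row exists in df2, which is never the case here. The join instead matches rows on the key (a, b) and then subtracts the "d" values.

```python
# Plain-Python emulation (NOT the Spark API); rows are tuples (a, b, d).
df1_rows = [('c', 'd', 8), ('e', 'f', 8), ('c', 'j', 9)]
df2_rows = [('c', 'd', 7), ('e', 'f', 3), ('c', 'j', 8)]

# Set difference on whole rows, which is roughly what subtract() does:
# a df1 row is dropped only if the exact same (a, b, d) row is in df2.
df2_set = set(df2_rows)
set_difference = [row for row in df1_rows if row not in df2_set]

# Keyed join: look up df2's d by the key (a, b), then subtract.
d2_by_key = {(a, b): d for a, b, d in df2_rows}
diff_rows = [
    (a, b, d - d2_by_key[(a, b)])
    for a, b, d in df1_rows
    if (a, b) in d2_by_key  # assumption: keep only keys present in both
]

print(set_difference)
print(diff_rows)
```

The sketch keeps only keys found in both inputs, which corresponds to an inner join. With how='outer' as in the answer, a key that exists in only one DataFrame would still appear in the result, with a null difference.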
Tags: python, dataframe, pyspark, apache-spark-sql, jupyter
