I have an SQL query that I am trying to convert to PySpark. It has a join and multiple WHERE conditions:
UPDATE COMPANY1
INNER JOIN COMPANY2
ON COMPANY1.C1_PROFIT = COMPANY2.C2_PROFIT
SET COMPANY2.C2_TARGET = "1"
WHERE (((COMPANY2.C2_TARGET) Is Null)
AND ((COMPANY1.C1_SALES) Is Null)
AND ((COMPANY2.C2_PROFIT) Is Not Null));
The PySpark query I am trying to run (df_1 -> COMPANY2 and df_2 -> COMPANY1):
join = ((df_1.C2_PROFIT == df_2.C1_PROFIT) & \
        (df_1.C2_TARGET=='') & \
        (df_2.C1_SALES=='') & \
        (df_1.C2_PROFIT!=''))
df_1 = (df_1.alias('a')
    .join(df_2.alias('b'), join, 'left')
    .select(
        *[c for c in df_1.columns if c != 'C2_TARGET'],
        F.expr("nvl2(b.C1_PROFIT, '1', a.C2_TARGET) C2_TARGET")
    )
)
But I still get null values in the C2_TARGET column.
For reference: the C1_PROFIT column contains no nulls, but C2_PROFIT sometimes contains null values.
Sample input:
+-----------+----------+
| C1_PROFIT | C1_SALES |
+-----------+----------+
| 5637      | Positive |
| 7464      |          |
| 43645     |          |
| 64657     | Growth P |
+-----------+----------+
+-----------+-----------+
| C2_PROFIT | C2_TARGET |
+-----------+-----------+
|           |           |
| 7464      |           |
| 43645     |           |
| 64657     |           |
+-----------+-----------+
Expected result:

Reply from a uj5u.com user:
An earlier answer has an example of how to do
UPDATE A INNER JOIN B
...
SET A...
In your case, you SET B...:
UPDATE A INNER JOIN B
...
SET B...
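You have correctly switched the order of the dataframes.
What is not correct is comparing against '': an empty string is not the same as null. You must use .isNull() and .isNotNull() in your conditions.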
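The difference between '' and null can be seen without a Spark session. The sketch below uses Python's built-in sqlite3 module instead of Spark, purely as an illustration; Spark SQL follows the same rule that a comparison with NULL evaluates to NULL and is therefore not treated as a match. The table name, column names, and rows are assumptions made for this example (one C2_TARGET value is deliberately an empty string).

```python
import sqlite3

# In-memory table standing in for COMPANY2; the rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE company2 (c2_profit INTEGER, c2_target TEXT)")
conn.executemany(
    "INSERT INTO company2 VALUES (?, ?)",
    [(None, None), (7464, None), (43645, ""), (64657, None)],
)

# Comparing with '' only finds real empty strings: NULL = '' evaluates to
# NULL, and a WHERE clause (or a join condition) treats that as no match.
empty_matches = conn.execute(
    "SELECT c2_profit FROM company2 WHERE c2_target = '' ORDER BY c2_profit"
).fetchall()

# IS NULL is the test that actually finds the missing values.
null_matches = conn.execute(
    "SELECT c2_profit FROM company2 WHERE c2_target IS NULL ORDER BY c2_profit"
).fetchall()

print(empty_matches)  # [(43645,)]
print(null_matches)   # [(None,), (7464,), (64657,)]
```

In PySpark, the Column methods .isNull() and .isNotNull() play the role of IS NULL and IS NOT NULL here.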
Sample input:
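from pyspark.sql import SparkSession
from pyspark.sql import functions as F
# Reuse the active session (for example in a notebook) or start a new one.
spark = SparkSession.builder.getOrCreate()
# Here df_1 stands for COMPANY1; None becomes a real null.
df_1 = spark.createDataFrame(
    [(5637, 'Positive'),
     (7464, None),
     (43645, None),
     (64657, 'Growth P')],
    ['C1_PROFIT', 'C1_SALES'])
# df_2 stands for COMPANY2; the schema is given explicitly because
# C2_TARGET holds only None, which leaves Spark no type to infer.
df_2 = spark.createDataFrame(
    [(None, None),
     (7464, None),
     (43645, None),
     (64657, None)],
    'C2_PROFIT int, C2_TARGET string')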
Script:
# The null checks replace the comparisons with '' from the question.
join_on = (df_1.C1_PROFIT == df_2.C2_PROFIT) & \
          df_2.C2_TARGET.isNull() & \
          df_1.C1_SALES.isNull() & \
          df_2.C2_PROFIT.isNotNull()
# The left join keeps every COMPANY2 row; b.C1_PROFIT is non-null only
# for rows that matched, so nvl2 writes '1' there and keeps the old value otherwise.
df = (df_2.alias('a')
    .join(df_1.alias('b'), join_on, 'left')
    .select(
        *[c for c in df_2.columns if c != 'C2_TARGET'],
        F.expr("nvl2(b.C1_PROFIT, '1', a.C2_TARGET) C2_TARGET")
    )
)
df.show()
# +---------+---------+
# |C2_PROFIT|C2_TARGET|
# +---------+---------+
# |     null|     null|
# |     7464|        1|
# |    64657|     null|
# |    43645|        1|
# +---------+---------+
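To cross-check the output above without Spark, the semantics of the original UPDATE can be modeled in plain Python. This is only a sketch: apply_update is a hypothetical helper written for this illustration, not part of PySpark, and None stands in for SQL NULL. The rows are copied from the sample input.

```python
# Rows copied from the sample input; None stands for SQL NULL.
company1 = [(5637, "Positive"), (7464, None), (43645, None), (64657, "Growth P")]
company2 = [(None, None), (7464, None), (43645, None), (64657, None)]


def apply_update(c1_rows, c2_rows):
    """Return COMPANY2 rows with C2_TARGET set to '1' where the UPDATE matches."""
    # C1_PROFIT values whose C1_SALES is NULL; only these rows can match.
    profits_without_sales = {profit for profit, sales in c1_rows if sales is None}
    updated = []
    for profit, target in c2_rows:
        matches = (
            target is None                       # C2_TARGET Is Null
            and profit is not None               # C2_PROFIT Is Not Null
            and profit in profits_without_sales  # join on profit, C1_SALES Is Null
        )
        updated.append((profit, "1" if matches else target))
    return updated


result = apply_update(company1, company2)
for row in result:
    print(row)
```

The rows for 7464 and 43645 receive '1', while the null-profit row and the 64657 row (whose C1_SALES is 'Growth P') stay null, which agrees with the df.show() output; unlike Spark, this model keeps the input row order.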
