我試圖通過調整兩個不同表格的構建邏輯來將兩個不同的筆記本統一為一個。
第一條是這樣寫的:
spark.sql(''' SELECT CD_CLI,
MAX(VL_RPTD_UTZO) AS MAX_VL_RPTD_UTZO,
'2017-01-31' AS DT_MVTC
FROM vl_rptd_utzo
WHERE DT_EXTC BETWEEN '2016-07-31' AND '2016-12-31'
GROUP BY CD_CLI
''').createOrReplaceTempView('vl_rptd_max_utzo_2017_01_31')
第二個:
spark.sql('''SELECT CD_CLI,
CASE WHEN SUM(in_lim_crt) > 0
THEN ROUND(SUM(SUM_VL_TTL_FAT)/SUM(in_lim_crt), 4)
ELSE -99999999999
END AS VL_MED_FAT,
'2017-01-31' as DT_MVTC
FROM in_lim_fat
WHERE DT_MVTC BETWEEN '2016-07-31' AND '2016-12-31'
GROUP BY CD_CLI
''').createOrReplaceTempView('media_vl_fatura_2017_01_31')
我的,也許是天真的?,方法是聯合這兩個選擇,因為它們從同一來源呼叫相同的欄位:
spark.sql('''SELECT CD_CLI,
CASE WHEN SUM(in_lim_crt) > 0
THEN ROUND(SUM(SUM_VL_TTL_FAT)/SUM(in_lim_crt), 4)
ELSE -99999999999
END AS VL_MED_FAT,
'2017-01-31' as DT_MVTC
FROM in_lim_fat
WHERE DT_MVTC BETWEEN '2016-07-31' AND '2016-12-31'
GROUP BY CD_CLI
UNION
SELECT CD_CLI,
MAX(VL_RPTD_UTZO) AS MAX_VL_RPTD_UTZO,
'2017-01-31' AS DT_MVTC
FROM vl_rptd_utzo
WHERE DT_EXTC BETWEEN '2016-07-31' AND '2016-12-31'
GROUP BY CD_CLI
''').createOrReplaceTempView('new_table')
但是當我要求描述時:
spark.sql('describe new_table').show(10, False)
輸出是:
---------- ------------- -------
|col_name |data_type |comment|
---------- ------------- -------
|CD_CLI |int |null |
|VL_MED_FAT|decimal(38,4)|null |
|DT_MVTC |string |null |
---------- ------------- -------
為什么 MAX_VL_RPTD_UTZO 沒有出現在新表中?我是 sql 新手,也許這很天真和簡單,但我無法解決它。
uj5u.com熱心網友回復:
您的第一個選擇有CD_CLI,VL_MED_FAT和DT_MVTC
你的第二個選擇有CD_CLI,MAX_VL_RPTD_UTZO和DT_MVTC
Spark 將使用第一個查詢的列名作為模式并將其應用于聯合中的其他子序列查詢,因此 的值MAX_VL_RPTD_UTZO將出現在VL_MED_FAT.
編輯 #1:如果你想要 4 列,那么它必須在 2 個查詢之間保持一致,所以像這樣
select CD_CLI, VL_MED_FAT, null as MAX_VL_RPTD_UTZO, DT_MVTC from ...
union
select CD_CLI, null as VL_MED_FAT, MAX_VL_RPTD_UTZO, DT_MVTC from ...
轉載請註明出處,本文鏈接:https://www.uj5u.com/gongcheng/316303.html
