I need to union two results and apply collect_set, but I want to ignore nulls
val df1 = Seq(
("1","Adam","Angra", "Anastasia")
).toDF("id","fname", "mname", "lname")
df1.createOrReplaceTempView("df1")
val df2 = Seq(
("1",null,null, "Bosma")
).toDF("id","fname", "mname", "lname")
df2.createOrReplaceTempView("df2")
The df2 DataFrame always has null fname and mname. When grouping by id, I need to combine the lname values into a list.
Current query:
select id,fname,mname,collect_set(lname) as lname from (select * from df1 union select * from df2) group by id,fname, mname
Actual output
id fname mname lname
1 Adam Angra ["Anastasia"]
1 null null ["Bosma"]
Expected output
id fname mname lname
1 Adam Angra ["Anastasia","Bosma"]
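I need help getting the expected result above with a SQL query.
Answer from a uj5u.com user:
The rows are split because fname and mname are part of the GROUP BY key, and the null values form a group of their own. You can group by id alone and use the first function with its ignore-nulls argument set to true to pick the non-null fname and mname.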
val sql = """
select id,first(fname, true) as fname,first(mname, true) as mname,collect_set(lname) as lname from
(select * from df1 union select * from df2)
group by id
"""
val df = spark.sql(sql)
df.show()
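For comparison, here is a minimal sketch of the same idea written with the DataFrame API instead of a SQL string. It assumes a local SparkSession (the `master` and `appName` values are placeholders, not from the original post) and uses an explicit tuple type so the null fields are typed as String. Note that `Dataset.union` keeps duplicate rows, unlike SQL `union`, but `collect_set` removes duplicate lname values either way.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{collect_set, first}

// Local session for illustration only; in spark-shell, `spark` already exists.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("collect-set-ignore-null")
  .getOrCreate()
import spark.implicits._

// Explicit element type so the null fields are typed as String.
val df1 = Seq[(String, String, String, String)](("1", "Adam", "Angra", "Anastasia"))
  .toDF("id", "fname", "mname", "lname")
val df2 = Seq[(String, String, String, String)](("1", null, null, "Bosma"))
  .toDF("id", "fname", "mname", "lname")

// Group by id only; take the first non-null fname/mname and collect distinct lname values.
val result = df1.union(df2)
  .groupBy("id")
  .agg(
    first("fname", ignoreNulls = true).as("fname"),
    first("mname", ignoreNulls = true).as("mname"),
    collect_set("lname").as("lname")
  )

result.show(truncate = false)
```

The order of elements inside the collected set is not guaranteed. Also, if an id could have several different non-null fname or mname values, `first` returns whichever one it encounters first, so an aggregate such as `max` may be a more predictable choice in that case.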
