這是我的輸入:
val df = Seq(
("Adam","Angra", "Anastasia"),
("Boris","Borun", "Bisma"),
("Shawn","Samar", "Statham")
).toDF("fname", "mname", "lname")
df.createOrReplaceTempView("df")
我希望 Spark sql 輸出如下所示:
struct
{"data_description":"fname","data_details":"Adam"},{"data_description":"mname","data_details":"Angra"},{"data_description":"lname","data_details":"Anastasia"}
{"data_description":"fname","data_details":"Boris"},{"data_description":"mname","data_details":"Borun"},{"data_description":"lname","data_details":"Bisma"}
{"data_description":"fname","data_details":"Shawn"},{"data_description":"mname","data_details":"Samar"},{"data_description":"lname","data_details":"Statham"}
到目前為止,我嘗試了以下內容:
val df1 = spark.sql("""select concat(fname,':',mname,":",lname) as name from df""")
df1.createOrReplaceTempView("df1")
val df2 = spark.sql("""select named_struct('data_description','fname','data_details',split(name, ':')[0]) as struct1,named_struct('data_description','mname','data_details',split(name, ':')[1]) as struct2, named_struct('data_description','lname','data_details',split(name, ':')[2]) as struct3 from df1""")
df2.createOrReplaceTempView("df2")
上面的輸出:
struct1 struct2 struct3
{"data_description":"fname","data_details":"Adam"} {"data_description":"mname","data_details":"Angra"} {"data_description":"lname","data_details":"Anastasia"}
{"data_description":"fname","data_details":"Boris"} {"data_description":"mname","data_details":"Borun"} {"data_description":"lname","data_details":"Bisma"}
{"data_description":"fname","data_details":"Shawn"} {"data_description":"mname","data_details":"Samar"} {"data_description":"lname","data_details":"Statham"}
但是我得到了 3 個不同的結構。我需要在一個用逗號分隔的單一結構中
uj5u.com熱心網友回復:
sql陳述句如下,其他的你懂的。
val sql = """
select
concat_ws(
','
,concat('{"data_description":"fname","data_details":"',fname,'"}')
,concat('{"data_description":"mname","data_details":"',mname,'"}')
,concat('{"data_description":"lname","data_details":"',lname,'"}')
) as struct
from df
"""
uj5u.com熱心網友回復:
您可以創建結構陣列,然后to_json在希望輸出為字串時使用:
spark.sql("""
select to_json(array(
named_struct('data_description','fname','data_details', fname),
named_struct('data_description','mname','data_details', mname),
named_struct('data_description','lname','data_details', lname)
)) as struct
from df
""").show()
// ----------------------------------------------------------------------------------------------------------------------------------------------------------------
//|struct |
// ----------------------------------------------------------------------------------------------------------------------------------------------------------------
//|[{"data_description":"fname","data_details":"Adam"},{"data_description":"mname","data_details":"Angra"},{"data_description":"lname","data_details":"Anastasia"}]|
//|[{"data_description":"fname","data_details":"Boris"},{"data_description":"mname","data_details":"Borun"},{"data_description":"lname","data_details":"Bisma"}] |
//|[{"data_description":"fname","data_details":"Shawn"},{"data_description":"mname","data_details":"Samar"},{"data_description":"lname","data_details":"Statham"}] |
// ----------------------------------------------------------------------------------------------------------------------------------------------------------------
如果你有很多列,你可以像這樣動態生成 struct sql 運算式:
val structs = df.columns.map(c => s"named_struct('data_description','$c','data_details', $c)").mkString(",")
val df2 = spark.sql(s"""
select to_json(array($structs)) as struct
from df
""")
如果您不想使用陣列,您可以簡單地連接to_json3 個結構體的結果:
val structs = df.columns.map(c => s"to_json(named_struct('data_description','$c','data_details', $c))").mkString(",")
val df2 = spark.sql(s"""
select concat_ws(',', $structs) as struct
from df
""")
轉載請註明出處,本文鏈接:https://www.uj5u.com/houduan/387245.html
