val someDF = Seq(
(8, """{"details":{"decision":"ACCEPT","source":"Rules"}}"""),
(64, """{"details":{"decision":"ACCEPT","source":"Rules"}}""")
).toDF("number", "word")
someDF.show(false)
+------+--------------------------------------------------+
|number|word                                              |
+------+--------------------------------------------------+
|8     |{"details":{"decision":"ACCEPT","source":"Rules"}}|
|64    |{"details":{"decision":"ACCEPT","source":"Rules"}}|
+------+--------------------------------------------------+
Problem statement: I want to merge all columns into a single column, with the JSON kept as nested JSON in the output column rather than as a string with escaped quotes, which is what I get below.
What I tried:
someDF.toJSON.toDF.show(false)
// this escaped the quotes, which I don't want
+-----------------------------------------------------------------------------------+
|value                                                                              |
+-----------------------------------------------------------------------------------+
|{"number":8,"word":"{\"details\":{\"decision\":\"ACCEPT\",\"source\":\"Rules\"}}"} |
|{"number":64,"word":"{\"details\":{\"decision\":\"ACCEPT\",\"source\":\"Rules\"}}"}|
+-----------------------------------------------------------------------------------+
The same problem occurs with someDF.select(to_json(struct(col("*"))).alias("value")).
What I want:
+-----------------------------------------------------------------------+
|value                                                                  |
+-----------------------------------------------------------------------+
|{"number":8,"word":{"details":{"decision":"ACCEPT","source":"Rules"}}} |
|{"number":64,"word":{"details":{"decision":"ACCEPT","source":"Rules"}}}|
+-----------------------------------------------------------------------+
Is there a way to do this?
Update: although I use a simple DataFrame here, in reality I have hundreds of columns, so a manually defined schema will not work for me.
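Reply from a uj5u.com user:
The "word" column in "someDF" is of string type, so to_json treats it as a regular string and escapes its quotes. The key here is to convert the "word" column to a struct type (with from_json) before using to_json.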
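import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._
// Outside spark-shell, also import spark.implicits._ for toDF and the $"..." syntax.
val someDF = Seq(
(8, """{"details":{"decision":"ACCEPT","source":"Rules"}}"""),
(64, """{"details":{"decision":"ACCEPT","source":"Rules"}}""")
).toDF("number", "word")
// Schema of the JSON stored in "word": {"details": {"decision": string, "source": string}}
val schema = StructType(Seq(StructField("details", StructType(Seq(StructField("decision", StringType), StructField("source", StringType))))))
// Parse "word" into a struct first, so that to_json nests it instead of escaping it.
someDF.select(to_json(struct($"number", from_json($"word", schema).alias("word"))).alias("value")).show(false)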
Result:
+-----------------------------------------------------------------------+
|value                                                                  |
+-----------------------------------------------------------------------+
|{"number":8,"word":{"details":{"decision":"ACCEPT","source":"Rules"}}} |
|{"number":64,"word":{"details":{"decision":"ACCEPT","source":"Rules"}}}|
+-----------------------------------------------------------------------+
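The question's update rules out hand-written schemas. One possible workaround, sketched here under assumptions, is to let Spark infer each JSON column's schema by reading that column with spark.read.json before calling from_json. The helper name mergeToJson and the jsonColumns set are hypothetical, not part of the original answer. The sketch assumes you know (or can detect) which string columns hold JSON and that an extra pass over the data for inference is acceptable.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, from_json, struct, to_json}

// Local session for the sketch; in spark-shell `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("json-merge-sketch").getOrCreate()
import spark.implicits._

val someDF = Seq(
  (8, """{"details":{"decision":"ACCEPT","source":"Rules"}}"""),
  (64, """{"details":{"decision":"ACCEPT","source":"Rules"}}""")
).toDF("number", "word")

// Assumption: the caller knows which string columns contain JSON documents.
val jsonColumns = Set("word")

// Hypothetical helper: parse every JSON column with an inferred schema,
// leave the other columns untouched, then serialize the whole row.
def mergeToJson(df: DataFrame, jsonCols: Set[String]): DataFrame = {
  val cols: Seq[Column] = df.columns.toSeq.map { name =>
    if (jsonCols.contains(name)) {
      // Infer the schema by letting Spark read the column's values as JSON.
      val nonNull = df.select(col(name)).filter(col(name).isNotNull).as[String]
      val inferred = spark.read.json(nonNull).schema
      from_json(col(name), inferred).alias(name)
    } else {
      col(name)
    }
  }
  df.select(to_json(struct(cols: _*)).alias("value"))
}

val merged = mergeToJson(someDF, jsonColumns)
merged.show(false)
```

On the sample data this should print rows in the same form as the result above. Inference reads the column once per JSON column, so for very wide or very large tables it may be worth inferring from a sample instead.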
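Reply from a uj5u.com user:
You can retrieve the list of columns with the DataFrame's columns method, then build the JSON string manually by combining the built-in functions concat and concat_ws: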
import org.apache.spark.sql.functions.{col, concat, concat_ws, lit}
val result = someDF.select(
concat(
lit("{"),
concat_ws(
",",
someDF.columns.map(x => concat(lit("\""), lit(x), lit("\":"), col(x))): _*
),
lit("}")).as("value")
)
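The concat approach above inserts every column's value verbatim, so it only yields valid JSON when each string column already holds a JSON document. A rough sketch of a variant for mixed columns follows. The label column, the jsonCols set, and the helper rawJsonMerge are hypothetical names introduced for illustration, and the sketch assumes column names contain no characters that need JSON escaping.

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, concat, concat_ws, lit, regexp_replace, struct, to_json, when}

// Local session for the sketch; in spark-shell `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("concat-json-sketch").getOrCreate()
import spark.implicits._

// Hypothetical data: "label" is a plain string, "word" already holds JSON.
val mixedDF = Seq(
  (8, "vip", """{"details":{"decision":"ACCEPT","source":"Rules"}}"""),
  (64, "std", """{"details":{"decision":"ACCEPT","source":"Rules"}}""")
).toDF("number", "label", "word")

def rawJsonMerge(df: DataFrame, jsonCols: Set[String]): DataFrame = {
  val fragments: Seq[Column] = df.columns.toSeq.map { name =>
    if (jsonCols.contains(name)) {
      // Already JSON: emit "name":<raw text>. concat yields null for a null value.
      concat(lit("\"" + name + "\":"), col(name))
    } else {
      // Let to_json quote and escape the value, then strip the outer braces
      // so that only the "name":value pair remains. Null values yield null.
      when(col(name).isNotNull,
        regexp_replace(to_json(struct(col(name))), "^\\{|\\}$", ""))
    }
  }
  // concat_ws skips null fragments, so null columns are left out of the object.
  df.select(concat(lit("{"), concat_ws(",", fragments: _*), lit("}")).as("value"))
}

val result = rawJsonMerge(mixedDF, Set("word"))
result.show(false)
```

Unlike the from_json route, this never parses the JSON columns, so malformed input is passed through unchanged and the output is only as valid as the stored strings.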
