如何處理這種JSON格式的資料Spark Scala?
Note:對于每個 ID,我們可能有多個 Party
{
"123": {
"Partyl": {
"FIRSTNAMEBEN": "ABC",
"ALIASBEN": "",
"RELATIONSHIPTYPE": "ABC, FGHIJK LMN",
"DATEOFBIRTH": "7/Oct/1969"
},
"Party2": {
"FIRSTNAMEBEN": "ABCC",
"ALIASBEN": "",
"RELATIONSHIPTYPE": "ABC, FGHIJK LMN",
"DATEOFBIRTH": "7/Oct/1969"
}
},
"GeneratedTime": "2022-01-30 03:09:26"
}
Expected Output:
ID FIRSTNAMEBEN ALIASBEN RELATIONSHIPTYPE DATEOFBIRTH
123 ABC Null ABC, FGHIJK LMN 7/Oct/1969
123 ABCC Null ABC, FGHIJK LMN 7/Oct/1969
uj5u.com熱心網友回復:
您可以使用以下方式來實作您想要的。
val df = spark.read.option("multiline", true).json("...json")
val df1 = df.
select(explode(array("123.*")) as "level1").
withColumn("ID",lit(df.columns(0))).
withColumn("FIRSTNAMEBEN", col("level1.FIRSTNAMEBEN")).
withColumn("ALIASBEN", col("level1.ALIASBEN")).
withColumn("RELATIONSHIPTYPE", col("level1.RELATIONSHIPTYPE")).
withColumn("DATEOFBIRTH", col("level1.DATEOFBIRTH")).
drop("level1")
Output:
scala> df1.show(false)
--- ------------ -------- ---------------- -----------
|ID |FIRSTNAMEBEN|ALIASBEN|RELATIONSHIPTYPE|DATEOFBIRTH|
--- ------------ -------- ---------------- -----------
|123|ABCC | |ABC, FGHIJK LMN |7/Oct/1969 |
|123|ABC | |ABC, FGHIJK LMN |7/Oct/1969 |
--- ------------ -------- ---------------- -----------
uj5u.com熱心網友回復:
schema=StructType([StructField('123',MapType(StringType(),StructType([
StructField('ALIASBEN',StringType(),True),StructField('DATEOFBIRTH',StringType(),True),StructField('FIRSTNAMEBEN',StringType(),True),StructField('RELATIONSHIPTYPE',StringType(),True),]),True))])
df =spark.read.json(path= '/content/sample_data/new.json',multiLine=True,schema=schema)
df.select(explode('123')).select('value.*').show()
uj5u.com熱心網友回復:
// Please note that if your json data is multiline the do add an option for multiline
spark.read.option("multiline", true).json("/path/to/json/file")
.selectExpr("stack(1,'123',`123`) as (ID,value)")
.withColumn("Party1",col("value.Partyl"))
.withColumn("Party2",col("value.Party2"))
.selectExpr("ID","stack(2,'Party1',Party1,'Party2',Party2) as (key1,value1)")
.select(col("ID"),col("value1.*")).show
// Here is the output of the above code.
--- -------- ----------- ------------ ----------------
| ID|ALIASBEN|DATEOFBIRTH|FIRSTNAMEBEN|RELATIONSHIPTYPE|
--- -------- ----------- ------------ ----------------
|123| | 7/Oct/1969| ABC| ABC, FGHIJK LMN|
|123| | 7/Oct/1969| ABCC| ABC, FGHIJK LMN|
--- -------- ----------- ------------ ----------------
轉載請註明出處,本文鏈接:https://www.uj5u.com/qiye/431218.html
標籤:json 斯卡拉 阿帕奇火花 蜂巢 apache-spark-sql
上一篇:如何在php中訪問json資料
