I have a DataFrame (DF) containing this data:
+--------+------------------------------------------+
|recType |value                                     |
+--------+------------------------------------------+
|{"id": 1|{"id": 1, "user_id": 100, "price": 50}    |
...
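I can filter recType with contains, but how do I handle === and the quotation marks? I seem to run into some error every time.
A uj5u.com user replied:
As I understand it, the columns here are strings. If so, the from_json function can parse them into structs. Once the columns are structs, their fields can be compared directly, so the filter no longer needs any quoted JSON text.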
import org.apache.spark.sql.types.{StructField, StructType, IntegerType}
import org.apache.spark.sql.functions.from_json
import spark.implicits._ // enables the $"col" syntax (already in scope in spark-shell)

// Schema for the JSON string stored in recType
val recTypeSchema = StructType(Array(
  StructField("id", IntegerType, true)
))

// Schema for the JSON string stored in value
val valueSchema = StructType(Array(
  StructField("id", IntegerType, true),
  StructField("user_id", IntegerType, true),
  StructField("price", IntegerType, true)
))

// Replace each string column with its parsed struct
val parsedDf = df
  .withColumn("recType", from_json($"recType", recTypeSchema))
  .withColumn("value", from_json($"value", valueSchema))

parsedDf.printSchema
root
 |-- recType: struct (nullable = true)
 |    |-- id: integer (nullable = true)
 |-- value: struct (nullable = true)
 |    |-- id: integer (nullable = true)
 |    |-- user_id: integer (nullable = true)
 |    |-- price: integer (nullable = true)
parsedDf.filter($"recType.id" === 1).show
+-------+------------+
|recType|       value|
+-------+------------+
|    {1}|{1, 100, 50}|
+-------+------------+
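If the columns have to stay as strings, the quoting part of the question can also be handled directly. The following is only a rough sketch and not part of the original answer. The sample rows are hypothetical and assume recType holds complete JSON objects such as {"id": 1}. It compares a triple-quoted literal, an escaped literal, and get_json_object, which extracts a field as a string.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.get_json_object

// Local session for illustration; in spark-shell, getOrCreate returns the existing one.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("json-string-filter-sketch") // hypothetical app name
  .getOrCreate()
import spark.implicits._

// Hypothetical sample data (assumed, not taken from the question).
val df = Seq(
  ("""{"id": 1}""", """{"id": 1, "user_id": 100, "price": 50}"""),
  ("""{"id": 2}""", """{"id": 2, "user_id": 200, "price": 75}""")
).toDF("recType", "value")

// Triple-quoted literal: the inner double quotes need no escaping.
val byTripleQuoted = df.filter($"recType" === """{"id": 1}""")

// Ordinary string literal: each inner double quote is escaped with a backslash.
val byEscaped = df.filter($"recType" === "{\"id\": 1}")

// Extract the id field as a string and compare it; this does not depend on
// the exact spacing of the JSON text.
val byJsonPath = df.filter(get_json_object($"recType", "$.id") === "1")

byJsonPath.show(false)
```

The first two filters match only when the stored text is character-for-character identical to the literal, so get_json_object or from_json is usually the more robust choice. A substring test such as contains on "id": 1 would also match "id": 10.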
