I have this DataFrame:
+----+--------------------------------+
|name|dates                           |
+----+--------------------------------+
|A   |[[1994, 12, 11], [,,]]          |
|B   |[[1994, 12, 11], [1994, 12, 15]]|
+----+--------------------------------+
with this schema:
root
 |-- name: string (nullable = true)
 |-- dates: struct (nullable = true)
 |    |-- start_date: struct (nullable = true)
 |    |    |-- year: integer (nullable = true)
 |    |    |-- month: integer (nullable = true)
 |    |    |-- day: integer (nullable = true)
 |    |-- end_date: struct (nullable = true)
 |    |    |-- year: integer (nullable = true)
 |    |    |-- month: integer (nullable = true)
 |    |    |-- day: integer (nullable = true)
When all the fields inside end_date are null, I want to set end_date itself to null, giving this output:
+----+--------------------------------+
|name|dates                           |
+----+--------------------------------+
|A   |[[1994, 12, 11],]               |
|B   |[[1994, 12, 11], [1994, 12, 15]]|
+----+--------------------------------+
Answer from a uj5u.com community member:
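You can update the dates struct column by rebuilding a new struct from its existing fields, and use a when expression to check whether all of the end_date fields are null: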
import org.apache.spark.sql.functions._

val df2 = df.withColumn(
  "dates",
  struct(
    col("dates.start_date").alias("start_date"), // keep start_date as-is
    when(
      Seq("year", "month", "day")
        .map(x => col(s"dates.end_date.$x").isNull)
        .reduce(_ and _),
      lit(null).cast("struct<year:int,month:int,day:int>")
    ).otherwise(col("dates.end_date")).alias("end_date") // set end_date to null if all its fields are null
  )
)

df2.show(false)
// +----+--------------------------------+
// |name|dates                           |
// +----+--------------------------------+
// |A   |[[1994, 12, 11],]               |
// |B   |[[1994, 12, 11], [1994, 12, 15]]|
// +----+--------------------------------+
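On Spark 3.1 or later, a possibly simpler variant is to replace only the end_date field with Column.withField instead of rebuilding the whole struct. The sketch below assumes Spark 3.1+, and it reads the end_date field names and type from the schema rather than hard-coding them. The case classes Ymd, Dates and Record, the object NullEndDate and the method nullEmptyEndDate are hypothetical names introduced only to rebuild the sample data from the question; treat this as an untested alternative, not the answer's method.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, lit, when}
import org.apache.spark.sql.types.StructType

// Hypothetical case classes, used only to rebuild the sample data from the question.
// Option[Int] fields map to nullable integer columns.
case class Ymd(year: Option[Int], month: Option[Int], day: Option[Int])
case class Dates(start_date: Ymd, end_date: Ymd)
case class Record(name: String, dates: Dates)

object NullEndDate {
  // Sets dates.end_date to null when every field inside it is null.
  // Assumes Spark 3.1+, where Column.withField replaces a single struct field.
  def nullEmptyEndDate(df: DataFrame): DataFrame = {
    // Read the end_date struct type from the schema instead of hard-coding it.
    val datesType = df.schema("dates").dataType.asInstanceOf[StructType]
    val endType = datesType("end_date").dataType.asInstanceOf[StructType]

    // true when all of end_date's fields (here year, month, day) are null
    val allNull = endType.fieldNames
      .map(f => col(s"dates.end_date.$f").isNull)
      .reduce(_ && _)

    df.withColumn(
      "dates",
      col("dates").withField(
        "end_date",
        when(allNull, lit(null).cast(endType)).otherwise(col("dates.end_date"))
      )
    )
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("null-end-date").getOrCreate()
    import spark.implicits._

    // Rebuild the two rows from the question.
    val start = Ymd(Some(1994), Some(12), Some(11))
    val df = Seq(
      Record("A", Dates(start, Ymd(None, None, None))),
      Record("B", Dates(start, Ymd(Some(1994), Some(12), Some(15))))
    ).toDF()

    nullEmptyEndDate(df).show(false)
    spark.stop()
  }
}
```

For row A, end_date should come out as null while row B is left unchanged. Deriving the type from df.schema means the cast target stays in sync if fields are later added to end_date.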
When reposting, please credit the source. Original article: https://www.uj5u.com/yidong/383932.html
