I have two DataFrames (Dataset<Row>) with the same columns, but the fields of the structs inside the array column are in a different order.
df1:
root
 |-- root: string (nullable = false)
 |-- array_nested: array (nullable = false)
 |    |-- element: struct (containsNull = true)
 |    |    |-- array_id: integer (nullable = false)
 |    |    |-- array_value: string (nullable = false)
+----+------------+
|root|array_nested|
+----+------------+
|One |[[1, 1-One]]|
+----+------------+
df2:
root
 |-- root: string (nullable = false)
 |-- array_nested: array (nullable = false)
 |    |-- element: struct (containsNull = true)
 |    |    |-- array_value: string (nullable = false)
 |    |    |-- array_id: integer (nullable = false)
+----+------------+
|root|array_nested|
+----+------------+
|Two |[[2-Two, 2]]|
+----+------------+
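I want both DataFrames to have the same schema (array_id first, then array_value), but when I try the approach below, it produces extra nested arrays: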
List<Column> updatedStructNames = new ArrayList<>();
updatedStructNames.add(col("array_nested.array_id"));
updatedStructNames.add(col("array_nested.array_value"));
Column[] updatedStructNameArray = updatedStructNames.toArray(new Column[0]);
Dataset<Row> df3 = df2.withColumn("array_nested", array(struct(updatedStructNameArray)));
It produces this schema:
root
 |-- root: string (nullable = false)
 |-- array_nested: array (nullable = false)
 |    |-- element: struct (containsNull = false)
 |    |    |-- array_id: array (nullable = false)
 |    |    |    |-- element: integer (containsNull = true)
 |    |    |-- array_value: array (nullable = false)
 |    |    |    |-- element: string (containsNull = true)
+----+----------------+
|root|array_nested    |
+----+----------------+
|Two |[[[2], [2-Two]]]|
+----+----------------+
How can I give df2 the same schema as df1?
Answer:
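You can use the transform higher-order function (available in Spark SQL since 2.4) to rebuild each struct element of the array_nested column with its fields in the desired order. Your attempt produced extra arrays because col("array_nested.array_id") on an array of structs returns an array holding every element's array_id; struct(...) therefore combined two arrays, and array(...) wrapped that struct in yet another array.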
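// requires: import static org.apache.spark.sql.functions.expr;
// Rebuild each element as struct(array_id, array_value); the aliases name the new fields.
Dataset<Row> df3 = df2.withColumn(
        "array_nested",
        expr("transform(array_nested, x -> struct(x.array_id as array_id, x.array_value as array_value))")
);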
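The following is a self-contained sketch of the answer end to end, assuming Spark 2.4 or later on the classpath and a local master. The class ReorderNestedStruct and the helpers reorderNested and sample are hypothetical names used only for illustration: sample rebuilds one-row versions of df1 and df2 with the schemas from the question, reorderNested applies the transform expression from the answer, and the result is unioned with df1.

```java
import static org.apache.spark.sql.functions.expr;

import java.util.Arrays;
import java.util.Collections;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class ReorderNestedStruct {

    // Hypothetical helper: rebuilds every element of array_nested as
    // struct(array_id, array_value), naming the fields explicitly via aliases.
    static Dataset<Row> reorderNested(Dataset<Row> df) {
        return df.withColumn(
                "array_nested",
                expr("transform(array_nested, x -> "
                        + "struct(x.array_id AS array_id, x.array_value AS array_value))"));
    }

    // Hypothetical helper: builds a one-row DataFrame whose array elements use
    // the given field order (idFirst = true mirrors df1, false mirrors df2).
    static Dataset<Row> sample(SparkSession spark, boolean idFirst,
                               String root, int id, String value) {
        StructType elem = idFirst
                ? new StructType()
                        .add("array_id", DataTypes.IntegerType, false)
                        .add("array_value", DataTypes.StringType, false)
                : new StructType()
                        .add("array_value", DataTypes.StringType, false)
                        .add("array_id", DataTypes.IntegerType, false);
        StructType schema = new StructType()
                .add("root", DataTypes.StringType, false)
                .add("array_nested", DataTypes.createArrayType(elem, true), false);
        Row element = idFirst ? RowFactory.create(id, value) : RowFactory.create(value, id);
        Row row = RowFactory.create(root, Arrays.asList(element));
        return spark.createDataFrame(Collections.singletonList(row), schema);
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[1]")
                .appName("reorder-nested-struct")
                .getOrCreate();

        Dataset<Row> df1 = sample(spark, true, "One", 1, "1-One");
        Dataset<Row> df2 = sample(spark, false, "Two", 2, "2-Two");

        Dataset<Row> df3 = reorderNested(df2);
        df3.printSchema();

        // union matches columns by position; the element structs of df1 and df3
        // now list their fields in the same order, so the types line up.
        df1.union(df3).show(false);

        spark.stop();
    }
}
```

After the transform, printSchema will likely report the reordered fields as nullable = true, because the lambda variable x may itself be null (the array has containsNull = true). That nullability difference does not block union, which compares column types while ignoring nullability.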
