This is my code:
df_05_body = spark.sql("""
select
gtin
, principalBody.constituents
from
v_df_04""")
df_05_body.createOrReplaceTempView("v_df_05_body")
df_05_body.printSchema()
This is the schema:
root
|-- gtin: array (nullable = true)
| |-- element: string (containsNull = true)
|-- constituents: array (nullable = true)
| |-- element: array (containsNull = true)
| | |-- element: struct (containsNull = true)
| | | |-- constituentCategory: struct (nullable = true)
| | | | |-- value: string (nullable = true)
| | | | |-- valueRange: string (nullable = true)
How do I change the principalBody.constituents line in the SQL so that it reads the fields constituentCategory.value and constituentCategory.valueRange?
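Answer from a uj5u.com user:
The constituents column is an array of arrays of structs. If you want a flat result, flatten the nested array first and then explode it, so that each struct becomes its own row: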
df_05_body = spark.sql("""
WITH
v_df_04_exploded AS (
SELECT
gtin,
explode(flatten(principalBody.constituents)) AS constituent
FROM
v_df_04 )
SELECT
gtin,
constituent.constituentCategory.value,
constituent.constituentCategory.valueRange
FROM
v_df_04_exploded
""")
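Alternatively, apply inline to the flattened array. inline turns each struct into a row with one column per top-level field, so here the result has a single constituentCategory struct column next to gtin: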
df_05_body = spark.sql("""
SELECT
gtin,
inline(flatten(principalBody.constituents))
FROM
v_df_04
""")
