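I need to create a dictionary from the schema of a Spark DataFrame, i.e. from a pyspark.sql.types.StructType.
The code needs to iterate over the whole StructType and pick out only those StructField elements whose type is itself a StructType. When extracting them into the dictionary, it should use the name of the parent StructField as the key, while the value is the name of only the first nested (child) StructField.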
Example schema (StructType):
root
|-- field_1: int
|-- field_2: int
|-- field_3: struct
| |-- date: date
| |-- timestamp: timestamp
|-- field_4: int
Desired result:
{"field_3": "date"}
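Answer:
Iterating over df.schema yields its top-level StructFields, so you can walk the schema with a dictionary comprehension: keep only the fields whose type name is 'struct', and map each one's name to the name of its first child field (x.dataType[0]).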
{x.name: x.dataType[0].name for x in df.schema if x.dataType.typeName() == 'struct'}
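One caveat with the one-liner: an empty struct (struct<>) has no first child, so x.dataType[0] would raise an error. Below is a minimal sketch of the same top-level logic, with that case skipped. It works on the plain-dict form of the schema returned by df.schema.jsonValue(), so it can be tried without a Spark session. The function name first_child_names and the choice to skip empty structs are assumptions for illustration, not part of the original answer.

```python
from typing import Any, Dict


def first_child_names(schema_json: Dict[str, Any]) -> Dict[str, str]:
    """Map each top-level struct column to the name of its first child field.

    Assumes `schema_json` has the shape produced by `df.schema.jsonValue()`:
    {"type": "struct", "fields": [{"name": ..., "type": ..., ...}, ...]}.
    """
    result: Dict[str, str] = {}
    for field in schema_json["fields"]:
        ftype = field["type"]
        # Primitive types are plain strings ("integer", "date", ...);
        # struct types are dicts whose "type" key is "struct".
        if isinstance(ftype, dict) and ftype.get("type") == "struct":
            children = ftype["fields"]
            if children:  # skip empty structs instead of raising IndexError
                result[field["name"]] = children[0]["name"]
    return result


# Hand-written jsonValue()-style dict for the schema in the question.
schema = {
    "type": "struct",
    "fields": [
        {"name": "field_1", "type": "integer", "nullable": True, "metadata": {}},
        {"name": "field_2", "type": "integer", "nullable": True, "metadata": {}},
        {"name": "field_3", "nullable": True, "metadata": {}, "type": {
            "type": "struct",
            "fields": [
                {"name": "date", "type": "date", "nullable": True, "metadata": {}},
                {"name": "timestamp", "type": "timestamp", "nullable": True, "metadata": {}},
            ],
        }},
        {"name": "field_4", "type": "integer", "nullable": True, "metadata": {}},
    ],
}

print(first_child_names(schema))  # {'field_3': 'date'}
```

With a live DataFrame, the call would be first_child_names(df.schema.jsonValue()).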
Test #1
df = spark.createDataFrame([], 'field_1 int, field_2 int, field_3 struct<date:date,timestamp:timestamp>, field_4 int')
df.printSchema()
# root
# |-- field_1: integer (nullable = true)
# |-- field_2: integer (nullable = true)
# |-- field_3: struct (nullable = true)
# | |-- date: date (nullable = true)
# | |-- timestamp: timestamp (nullable = true)
# |-- field_4: integer (nullable = true)
{x.name: x.dataType[0].name for x in df.schema if x.dataType.typeName() == 'struct'}
# {'field_3': 'date'}
Test #2
df = spark.createDataFrame([], 'field_1 int, field_2 struct<col_int:int,col_long:long>, field_3 struct<date:date,timestamp:timestamp>')
df.printSchema()
# root
# |-- field_1: integer (nullable = true)
# |-- field_2: struct (nullable = true)
# | |-- col_int: integer (nullable = true)
# | |-- col_long: long (nullable = true)
# |-- field_3: struct (nullable = true)
# | |-- date: date (nullable = true)
# | |-- timestamp: timestamp (nullable = true)
{x.name: x.dataType[0].name for x in df.schema if x.dataType.typeName() == 'struct'}
# {'field_2': 'col_int', 'field_3': 'date'}
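Both tests involve only one level of nesting, and the comprehension inspects top-level columns only. If structs nested inside other structs should also be reported (the question does not ask for this), one possible recursive sketch works on the plain-dict form returned by df.schema.jsonValue(). The function name first_child_names_nested and the dotted-path keys such as "field_2.inner" are hypothetical conventions chosen for illustration. Structs inside arrays or maps are deliberately not visited.

```python
from typing import Any, Dict


def first_child_names_nested(struct_json: Dict[str, Any], prefix: str = "") -> Dict[str, str]:
    """Recursively map every struct field (keyed by dotted path) to its first child's name.

    Assumes `struct_json` has the shape produced by `df.schema.jsonValue()`.
    Array and map types are not descended into.
    """
    result: Dict[str, str] = {}
    for field in struct_json["fields"]:
        ftype = field["type"]
        if isinstance(ftype, dict) and ftype.get("type") == "struct":
            path = prefix + field["name"]
            children = ftype["fields"]
            if children:  # empty structs have no first child
                result[path] = children[0]["name"]
            # Recurse so structs nested inside this struct are also reported.
            result.update(first_child_names_nested(ftype, path + "."))
    return result


# Hypothetical schema: field_1 int, field_2 struct<inner:struct<a:int,b:int>,x:int>
schema = {
    "type": "struct",
    "fields": [
        {"name": "field_1", "type": "integer", "nullable": True, "metadata": {}},
        {"name": "field_2", "nullable": True, "metadata": {}, "type": {
            "type": "struct",
            "fields": [
                {"name": "inner", "nullable": True, "metadata": {}, "type": {
                    "type": "struct",
                    "fields": [
                        {"name": "a", "type": "integer", "nullable": True, "metadata": {}},
                        {"name": "b", "type": "integer", "nullable": True, "metadata": {}},
                    ],
                }},
                {"name": "x", "type": "integer", "nullable": True, "metadata": {}},
            ],
        }},
    ],
}

print(first_child_names_nested(schema))  # {'field_2': 'inner', 'field_2.inner': 'a'}
```

For schemas with a single level of nesting, such as the two tests above, this returns the same dictionary as the one-line comprehension.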
