I created the following DataFrame in PySpark:
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# In the pyspark shell `spark` already exists; getOrCreate() reuses it.
spark = SparkSession.builder.getOrCreate()

data_1 = [
    ("rule1", "", "1", "2", "3", "4"),
    ("rule2", "1", "3", "5", "6", "4"),
    ("rule3", "", "0", "1", "2", "5"),
    ("rule4", "0", "1", "3", "6", "2"),
]
schema = StructType(
    [
        StructField("_c0", StringType(), True),
        StructField("para1", StringType(), True),
        StructField("para2", StringType(), True),
        StructField("para3", StringType(), True),
        StructField("para4", StringType(), True),
        StructField("para5", StringType(), True),
    ]
)
df = spark.createDataFrame(data=data_1, schema=schema)
This gives:
+-----+-----+-----+-----+-----+-----+
|_c0  |para1|para2|para3|para4|para5|
+-----+-----+-----+-----+-----+-----+
|rule1|     |1    |2    |3    |4    |
|rule2|1    |3    |5    |6    |4    |
|rule3|     |0    |1    |2    |5    |
|rule4|0    |1    |3    |6    |2    |
+-----+-----+-----+-----+-----+-----+
I want to convert it into a dictionary like this:
dict = {'rule1': {'para2': '1', 'para3': '2', 'para4': '3', 'para5': '4'},
        'rule2': {'para1': '1', 'para2': '3', 'para3': '5', 'para4': '6', 'para5': '4'}, ...}
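Columns whose value is the empty string "" should not appear in the final dictionary. For example, "para1" is absent from the dictionary for "rule1", while all the other columns are present.
I tried this as a first attempt, but it does not give the result I want: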
dict1 = df.rdd.map(lambda row: row.asDict()).collect()
final_dict = {d['_c0']: d[col] for d in dict1 for col in df.columns}
# Returns {'rule1': '4', 'rule2': '4', 'rule3': '5', 'rule4': '2'}
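Note on why this happens: the flat comprehension loops over every column for each row but always assigns to the same key, d['_c0'], so each assignment overwrites the previous one and only the value of the last column (para5) survives. A minimal plain-Python sketch of that effect follows. The names `rows`, `columns` and `flat` are stand-ins introduced here for illustration, with hand-written dicts in place of the collected Spark rows:

```python
# Stand-in for df.rdd.map(lambda row: row.asDict()).collect(); no Spark needed.
rows = [
    {"_c0": "rule1", "para1": "", "para2": "1", "para3": "2", "para4": "3", "para5": "4"},
    {"_c0": "rule2", "para1": "1", "para2": "3", "para3": "5", "para4": "6", "para5": "4"},
]
columns = ["_c0", "para1", "para2", "para3", "para4", "para5"]  # stand-in for df.columns

# Same shape as the attempt above: one key per row, reassigned once per column.
flat = {d["_c0"]: d[col] for d in rows for col in columns}

# Only the value of the last column is left for each rule.
print(flat)  # {'rule1': '4', 'rule2': '4'}
```

Getting one inner dictionary per rule therefore needs a second, nested comprehension (or an inner loop) that builds the value for each key.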
Answer from a uj5u.com community member:
You can try this nested dictionary comprehension:
dict_rules = {r['_c0']: {k: v
                         for k, v in r.asDict().items()
                         if k != '_c0' and v != ''}
              for r in df.collect()}
# {'rule1': {'para2': '1', 'para3': '2', 'para4': '3', 'para5': '4'},
# 'rule2': {'para1': '1', 'para2': '3', 'para3': '5', 'para4': '6', 'para5': '4'},
# 'rule3': {'para2': '0', 'para3': '1', 'para4': '2', 'para5': '5'},
# 'rule4': {'para1': '0', 'para2': '1', 'para3': '3', 'para4': '6', 'para5': '2'}}
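If the same conversion is needed in more than one place, the comprehension could be wrapped in a small helper. The sketch below is one possible shape, not part of the original answer: the function name `rows_to_nested_dict`, its `key_col` parameter and the extra skipping of None values are all assumptions made for illustration. It accepts any iterable of dict-like rows, so it can be tried without a Spark session:

```python
from typing import Any, Dict, Iterable, Mapping


def rows_to_nested_dict(
    rows: Iterable[Mapping[str, Any]], key_col: str = "_c0"
) -> Dict[Any, Dict[str, Any]]:
    """Build {row[key_col]: {column: value}}, skipping empty-string and None values."""
    result = {}
    for row in rows:
        result[row[key_col]] = {
            k: v for k, v in row.items() if k != key_col and v not in ("", None)
        }
    return result


# Hand-written stand-ins for Row.asDict() output, so the sketch runs without Spark.
sample_rows = [
    {"_c0": "rule1", "para1": "", "para2": "1", "para3": "2", "para4": "3", "para5": "4"},
    {"_c0": "rule3", "para1": None, "para2": "0", "para3": "1", "para4": "2", "para5": "5"},
]
nested = rows_to_nested_dict(sample_rows)
print(nested["rule1"])  # {'para2': '1', 'para3': '2', 'para4': '3', 'para5': '4'}
```

With the DataFrame from the question, this would be called as rows_to_nested_dict(r.asDict() for r in df.collect()). Like the comprehension above, it relies on collect(), which pulls every row to the driver, so it suits a small rule table rather than a large dataset.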
Please credit the source when reposting. Link to this article: https://www.uj5u.com/qiye/527671.html
