I have a DataFrame with a map column:
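from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# "label" is inferred as a map<string,bigint> column from the Python dicts
sdf = spark.createDataFrame(
    [
        (1, {'Kira': 25, 'Lilly': 15}),
        (2, {'Tom': 14}),
    ],
    ["id", "label"]
)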
+---+-------------------------+
|id |label                    |
+---+-------------------------+
|1  |{Lilly -> 15, Kira -> 25}|
|2  |{Tom -> 14}              |
+---+-------------------------+
I want to put the keys in one column and the values in another, like this:
+---+-----+---+
|id |name |age|
+---+-----+---+
|1  |Kira |25 |
|1  |Lilly|15 |
|2  |Tom  |14 |
+---+-----+---+
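A uj5u.com user replied:
The long way: use the map collection functions map_keys and map_values to create the name and age array columns, zip the two arrays together, then use the inline function to explode the zipped array into one row per entry.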
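from pyspark.sql.functions import map_keys, map_values

# Split the map into two arrays, zip them, then explode with inline()
(sdf.withColumn('name', map_keys('label'))
    .withColumn('age', map_values('label'))
    .selectExpr('id', 'inline(arrays_zip(name, age))')
    .show())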
+---+-----+---+
| id| name|age|
+---+-----+---+
|  1|Lilly| 15|
|  1| Kira| 25|
|  2|  Tom| 14|
+---+-----+---+
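To make it clearer what each step of that pipeline does, here is a minimal plain-Python sketch of the same transformation. It does not use Spark at all; the helper name flatten_map_rows and the tuple-based row layout are assumptions made for illustration only, and the comments map each line to the Spark function it roughly corresponds to.

```python
# Plain-Python model of the map -> rows transformation (no Spark required).
# `rows` mirrors the question's data; `flatten_map_rows` is a hypothetical helper.
rows = [
    (1, {"Kira": 25, "Lilly": 15}),
    (2, {"Tom": 14}),
]


def flatten_map_rows(rows):
    """Turn (id, {name: age}) rows into one (id, name, age) row per map entry."""
    result = []
    for row_id, label in rows:
        names = list(label.keys())    # rough analogue of map_keys('label')
        ages = list(label.values())   # rough analogue of map_values('label')
        pairs = zip(names, ages)      # rough analogue of arrays_zip(name, age)
        for name, age in pairs:       # rough analogue of inline(...): one row per pair
            result.append((row_id, name, age))
    return result


flat = flatten_map_rows(rows)
for row in flat:
    print(row)
```

Note that a Python dict keeps insertion order, whereas the Spark output above lists Lilly before Kira, so the order of entries within a map should not be relied on; add an explicit sort if the row order matters.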
