I have a PySpark DataFrame that looks like this:
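from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()  # reuse or start a Spark session
data = [("James", "Joyce"),
        ("Michael", "Doglus"),
        ("Robert", "Connings"),
        ("Maria", "XYZ"),
        ("Jen", "PQR")]
df2 = spark.createDataFrame(data, ["Name", "Lots_of_names"])
df2.toPandas()  # pandas-style display with a 0..4 index, as below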
Name Lots_of_names
0 James Joyce
1 Michael Doglus
2 Robert Connings
3 Maria XYZ
4 Jen PQR
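I want to stack the two columns into a single long column (possibly in a new DataFrame), which would have 10 rows. Is there a way to do that? Thanks in advance.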
Answer:
You probably want something like this:
import pyspark.sql.functions as F
# Pack both columns into a two-element array per row, then explode it into one row per element.
df_out = df2.select(F.explode(F.array("Name", "Lots_of_names")).alias("one_col"))
This produces the following df_out:
# one_col
#------
# James
# Joyce
# Michael
# Doglus
# Robert
# Connings
# Maria
# XYZ
# Jen
# PQR
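To see why the output alternates first name and surname, here is a plain-Python model of what F.array followed by F.explode does to these five rows. It is only a sketch of the semantics, not how Spark implements it; `explode_columns` is a hypothetical helper, and the rows are modelled as plain dicts.

```python
# Plain-Python model (not Spark) of
#   df2.select(F.explode(F.array("Name", "Lots_of_names")))
# explode_columns is a hypothetical helper used only for illustration.

def explode_columns(rows, columns):
    """Flatten the given columns row by row into a single list.

    `rows` is a list of dicts standing in for DataFrame rows;
    `columns` lists the column names to stack, in order.
    """
    one_col = []
    for row in rows:
        array = [row[c] for c in columns]  # like F.array(*columns)
        for value in array:                # like F.explode(array)
            one_col.append(value)          # each element becomes its own row
    return one_col


data = [("James", "Joyce"),
        ("Michael", "Doglus"),
        ("Robert", "Connings"),
        ("Maria", "XYZ"),
        ("Jen", "PQR")]
rows = [dict(zip(["Name", "Lots_of_names"], r)) for r in data]

result = explode_columns(rows, ["Name", "Lots_of_names"])
print(len(result))  # 10
print(result[:4])   # ['James', 'Joyce', 'Michael', 'Doglus']
```

Each input row yields as many output rows as there are columns passed to F.array, so 5 rows × 2 columns gives the 10 rows asked for. In Spark itself, a DataFrame has no guaranteed global row order unless you sort explicitly, so treat the interleaved order shown above as typical rather than promised.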
