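Coming from SAS, I want to join multiple DataFrames in a single SQL join in PySpark. This is possible in SAS, but I get the impression that it is not possible in PySpark. My script looks like this: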
A.createOrReplaceTempView("A")
B.createOrReplaceTempView("B")
C.createOrReplaceTempView("C")
D = spark.sql("""select a.*, b.VAR_B, c.VAR_C
from A a left join B b on a.VAR == b.VAR
left join C c on a.VAR == c.VAR""")
Is this possible in PySpark? Thank you!
Reply from a uj5u.com user:
In PySpark, joins work much like they do in SQL.
First define a DataFrame for each table, for example
df_a = spark.sql('select * from a')
df_b = spark.sql('select * from b')
df_c = spark.sql('select * from c')
Then you can join them as follows:
df_joined_a = (df_a.join(df_b, df_a['VAR'] == df_b['VAR'], 'left')
    .select(df_a['*'], df_b['VAR'].alias('b_var')))
df_joined_c = (df_joined_a.join(df_c, df_joined_a['VAR'] == df_c['VAR'], 'left')
    .select(df_joined_a['*'], df_c['VAR'].alias('c_var')))
More examples can be found here: https://sparkbyexamples.com/pyspark/pyspark-join-explained-with-examples/
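For what it's worth, both left joins can also be written as one chain, and a single spark.sql statement with two joins should work as long as the query is in a triple-quoted string. Below is a minimal sketch, not a definitive implementation. The helper name left_join_all, the demo function, and the tiny sample tables are made up for illustration; only the VAR, VAR_B and VAR_C column names come from the question.

```python
from functools import reduce


def left_join_all(base, others, key):
    """Left-join every DataFrame in `others` onto `base` on the shared column `key`."""
    # Passing the column name through `on=` keeps a single copy of the key column.
    return reduce(lambda acc, df: acc.join(df, on=key, how="left"), others, base)


def demo():
    # PySpark is imported here so the helper above can be read without Spark installed.
    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .master("local[1]")
             .appName("multi-join-sketch")  # hypothetical app name
             .getOrCreate())

    # Made-up sample tables that share the key column VAR.
    df_a = spark.createDataFrame([(1, "x"), (2, "y")], ["VAR", "VAR_A"])
    df_b = spark.createDataFrame([(1, "b1")], ["VAR", "VAR_B"])
    df_c = spark.createDataFrame([(2, "c2")], ["VAR", "VAR_C"])

    # DataFrame API: both left joins in one chain.
    left_join_all(df_a, [df_b, df_c], "VAR").show()

    # SQL: the same two left joins in a single statement over temp views.
    df_a.createOrReplaceTempView("A")
    df_b.createOrReplaceTempView("B")
    df_c.createOrReplaceTempView("C")
    spark.sql("""
        select a.*, b.VAR_B, c.VAR_C
        from A a
        left join B b on a.VAR = b.VAR
        left join C c on a.VAR = c.VAR
    """).show()

    spark.stop()
```

Calling demo() in an environment with PySpark installed should print the same joined table twice. If B or C carry more columns than you need, select the key and the wanted column first, e.g. df_b.select("VAR", "VAR_B"), before passing them to the helper.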
