I want to join two different DataFrames (dfA and dfB), built as follows:
dfA.show()
+-----+-------+-------+
| id_A| name_A|address|
+-----+-------+-------+
|    1|   AAAA|  Paris|
|    4|   DDDD| Sydney|
+-----+-------+-------+
dfB.show()
+-----+-------+---------+
| id_B| name_B|      job|
+-----+-------+---------+
|    1|   AAAA|  Analyst|
|    2|   AERF| Engineer|
|    3|   UOPY| Gardener|
|    4|   DDDD|  Insurer|
+-----+-------+---------+
I need to use the following lists to perform the join:
val keyListA = List("id_A", "name_A")
val keyListB = List("id_B", "name_B")
A simple solution would be:
val join = dfA.join(
  dfB,
  dfA("id_A") === dfB("id_B") &&
    dfA("name_A") === dfB("name_B"),
  "left_outer")
Is there a syntax that lets me perform this join using the keyListA and keyListB lists?
Answer from a uj5u.com user:
If you really want to build the join expression from lists of column names:
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions._
val dfA: DataFrame = ???
val dfB: DataFrame = ???
val keyListA = List("id_A", "name_A", "property1_A", "property2_A", "property3_A")
val keyListB = List("id_B", "name_B", "property1_B", "property2_B", "property3_B")
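// Builds one equality condition per key pair and ANDs them together.
def joinExprsFrom(keyListA: List[String], keyListB: List[String]): Column =
  keyListA
    .zip(keyListB)                                            // pair keys by position
    .map { case (fromA, fromB) => col(fromA) === col(fromB) } // one condition per pair
    .reduce((acc, expr) => acc && expr)                       // combine with AND

dfA.join(
  dfB,
  joinExprsFrom(keyListA, keyListB),
  "left_outer")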
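You need to make sure that keyListA and keyListB have the same length and are non-empty: zip silently drops the extra elements of the longer list, and reduce throws an exception on an empty list.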
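Both preconditions can be checked up front. The following is a minimal sketch in plain Scala with no Spark dependency; `KeyPairs` and `pairKeys` are hypothetical names that do not appear in the original answer. It fails fast with `require` rather than relying on the behaviour of `zip` and `reduce` further down.

```scala
// Hypothetical helper, not part of the original answer.
object KeyPairs {
  // Pairs up the join keys of both sides after checking the two preconditions.
  def pairKeys(keyListA: List[String], keyListB: List[String]): List[(String, String)] = {
    // Fails with IllegalArgumentException if there are no keys at all.
    require(keyListA.nonEmpty, "key lists must not be empty")
    // Fails with IllegalArgumentException if the lists cannot be paired one-to-one.
    require(
      keyListA.length == keyListB.length,
      s"key lists differ in length: ${keyListA.length} vs ${keyListB.length}")
    keyListA.zip(keyListB)
  }
}
```

The validated pairs could then feed the same map and reduce steps, for example `KeyPairs.pairKeys(keyListA, keyListB).map { case (a, b) => dfA(a) === dfB(b) }.reduce(_ && _)`. Referencing the columns through `dfA(...)` and `dfB(...)` instead of `col(...)` is a design choice that keeps the condition unambiguous if the two DataFrames ever share a column name.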
