
最重要的 col1, col2.... 的數量是未知的,可能有數百列。所以我們將不得不動態地進行連接和分割
這怎么是火花斯卡拉?
uj5u.com熱心網友回復:
資料集物件具有columns為您提供列名陣列的屬性。很容易過濾 的列df1并留下那些存在于 中的列df2,然后用于map派生所需的列:
val df1 = Seq(("xyz", 10.0, 12.0),
("abc", 42.0, 7.0)).toDF("join_col", "col1", "col2")
val df2 = Seq(("xyz", 7.0, 22.0, 11.0),
("abc", 11.0, 9.0, 42.0)).toDF("join_col", "col1", "col2", "col3")
// Common columns in both datasets
val cols = df1.columns.filter(df2.columns.toSet)
val join_col = cols(0)
val joined = df1.join(df2, df1(join_col) === df2(join_col))
// Columns from df1
val df1Cols = cols.map(df1(_))
// Division columns renamed to div_whatever
val divCols = cols.drop(1).map((name) => df1(name) / df2(name) as s"div_${name}")
val finalTable = joined.select((df1Cols divCols) :_*)
finalTable.show(false)
// -------- ---- ---- ------------------ ------------------
// |join_col|col1|col2|div_col1 |div_col2 |
// -------- ---- ---- ------------------ ------------------
// |xyz |10.0|12.0|1.4285714285714286|0.5454545454545454|
// |abc |42.0|7.0 |3.8181818181818183|0.7777777777777778|
// -------- ---- ---- ------------------ ------------------
這個假設連接列是df1.
轉載請註明出處,本文鏈接:https://www.uj5u.com/houduan/486217.html
上一篇:Scala:提升陣列
下一篇:Scala:求和型別的函式映射
