Hi, I have two DataFrames like this:
import spark.implicits._
import org.apache.spark.sql._
val transformationDF = Seq(
("A_IN", "ain","String"),
("ADDR_HASH","addressHash","String")
).toDF("db3Column", "hudiColumn","hudiDatatype")
val addressDF=Seq(
("123","uyt"),
("124","qwe")
).toDF("A_IN", "ADDR_HASH")
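Now I want to rename the columns of addressDF and change their data types according to the values in transformationDF. The hudiColumn and hudiDatatype values in transformationDF should become the column names and data types of addressDF. I tried the following code, but it does not work: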
var db3ColumnName:String =_
var hudiColumnName:String =_
var hudiDatatypeName:String = _
for (row <- transformationDF.rdd.collect)
{
db3ColumnName = row.mkString(",").split(",")(0)
hudiColumnName= row.mkString(",").split(",")(1)
hudiDatatypeName = row.mkString(",").split(",")(2)
addressDF.withColumnRenamed(db3ColumnName,hudiColumnName).withColumn(hudiColumnName,col(hudiColumnName).cast(hudiDatatypeName))
}
Now, when I print addressDF, the changes are not reflected.

Can anyone help me with this?
Answer from a uj5u.com user:
This is a textbook case for foldLeft:
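import org.apache.spark.sql.functions.col

// Start from addressDF and apply one rename-and-cast step per row of transformationDF.
val finalDF = transformationDF.collect.foldLeft(addressDF) { case (df, row) =>
  val db3ColumnName = row.getString(0)    // current column name in addressDF
  val hudiColumnName = row.getString(1)   // target column name
  val hudiDatatypeName = row.getString(2) // target data type
  df.withColumnRenamed(db3ColumnName, hudiColumnName)
    .withColumn(hudiColumnName, col(hudiColumnName).cast(hudiDatatypeName))
}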
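Datasets in Spark are immutable. Every operation that "modifies" a Dataset actually returns a new object, while the object the operation was called on stays unchanged. The foldLeft above starts from addressDF and chains each transformation onto the intermediate object, which is passed as the first argument of the function given in the second parameter list. The return value of the current iteration becomes the input of the next iteration, and the return value of the last iteration is the return value of foldLeft itself.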
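The folding behaviour described above can be sketched without Spark. This is only an illustration under stated assumptions: `Schema`, `Rule`, `applyRule`, and `applyAll` are hypothetical names, and a plain immutable `Vector` of (column name, type name) pairs stands in for the DataFrame.

```scala
object FoldLeftSketch {
  // Hypothetical stand-in for a DataFrame schema: (column name, type name) pairs.
  type Schema = Vector[(String, String)]

  // One rule per row of transformationDF: db3Column, hudiColumn, hudiDatatype.
  final case class Rule(from: String, to: String, dataType: String)

  // Returns a NEW schema and leaves the input untouched, in the same spirit
  // as withColumnRenamed followed by a cast.
  def applyRule(schema: Schema, rule: Rule): Schema =
    schema.map {
      case (name, _) if name == rule.from => (rule.to, rule.dataType)
      case other                          => other
    }

  // The result of each step is fed into the next step; the result of the
  // last step is the result of foldLeft.
  def applyAll(schema: Schema, rules: Seq[Rule]): Schema =
    rules.foldLeft(schema)(applyRule)

  def main(args: Array[String]): Unit = {
    val schema: Schema = Vector("A_IN" -> "string", "ADDR_HASH" -> "string")
    val rules = Seq(
      Rule("A_IN", "ain", "string"),
      Rule("ADDR_HASH", "addressHash", "string")
    )
    println(applyAll(schema, rules)) // the transformed copy
    println(schema)                  // the original, still unchanged
  }
}
```

No mutable variable is needed here, because the accumulator of foldLeft carries the intermediate result from one step to the next.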
Answer from a uj5u.com user:
withColumnRenamed and withColumn each return a new Dataset instead of changing the one they are called on, so you have to assign the result back:
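import org.apache.spark.sql.functions.col

// Note: addressDF must be declared with `var` instead of `val`,
// otherwise the reassignment inside the loop does not compile.
var db3ColumnName: String = null
var hudiColumnName: String = null
var hudiDatatypeName: String = null
for (row <- transformationDF.collect) {
  // Read each field directly; joining and re-splitting on "," breaks on values that contain commas.
  db3ColumnName = row.getString(0)
  hudiColumnName = row.getString(1)
  hudiDatatypeName = row.getString(2)
  // Keep the returned Dataset so the next iteration builds on it.
  addressDF = addressDF.withColumnRenamed(db3ColumnName, hudiColumnName)
    .withColumn(hudiColumnName, col(hudiColumnName).cast(hudiDatatypeName))
}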
addressDF.printSchema()
Printing the schema of addressDF then returns:
root
|-- ain: string (nullable = true)
|-- addressHash: string (nullable = true)
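The difference between discarding and keeping the returned value can also be seen without Spark. The following is a minimal sketch, assuming an immutable `List` of column names as a stand-in for a DataFrame; `rename`, `discarded`, and `reassigned` are hypothetical helpers, not Spark API.

```scala
object ReassignSketch {
  // Hypothetical stand-in for withColumnRenamed: returns a new list,
  // the input list is never modified.
  def rename(columns: List[String], from: String, to: String): List[String] =
    columns.map(c => if (c == from) to else c)

  val renames: List[(String, String)] =
    List("A_IN" -> "ain", "ADDR_HASH" -> "addressHash")

  // Mirrors the loop in the question: every result is thrown away.
  def discarded(columns: List[String]): List[String] = {
    for ((from, to) <- renames) rename(columns, from, to)
    columns
  }

  // Mirrors the loop in this answer: every result is stored back in a var.
  def reassigned(columns: List[String]): List[String] = {
    var current = columns
    for ((from, to) <- renames) current = rename(current, from, to)
    current
  }

  def main(args: Array[String]): Unit = {
    val columns = List("A_IN", "ADDR_HASH")
    println(discarded(columns))  // still the original names
    println(reassigned(columns)) // the renamed columns
  }
}
```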
