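I am reading data (shown below) into a list of lists, and I want to convert it into a DataFrame with seven columns. The error I get is: requirement failed: number of columns doesn't match. Old column names (1): value, new column names (7): <list of columns>
What am I doing wrong, and how can I fix it?
Data: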
Column1, Column2, Column3, Column4, Column5, Column6, Column7
a,b,c,d,e,f,g
a2,b2,c2,d2,e2,f2,g2
Code:
val spark = SparkSession.builder.appName("er").master("local").getOrCreate()
import spark.implicits._
val erResponse = response.body.toString.split("\\\n")
val header = erResponse(0)
val body = erResponse.drop(1).map(x => x.split(",").toList).toList
val erDf = body.toDF()
erDf.show()
Answer from a uj5u.com user:
You get the "number of columns doesn't match" error because your erDf DataFrame contains only one column, named value, and each cell of that column holds an array:
+----------------------------+
|value                       |
+----------------------------+
|[a, b, c, d, e, f, g]       |
|[a2, b2, c2, d2, e2, f2, g2]|
+----------------------------+
You cannot match this single column against the seven column names contained in the header.
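To see the mismatch directly, here is a minimal sketch that builds the same kind of DataFrame from hard-coded rows and then attempts the seven-name rename. The sample rows and the names `singleColumnDf` and `renameAttempt` are illustrative assumptions, not part of the original code.

```scala
import org.apache.spark.sql.SparkSession
import scala.util.Try

val spark = SparkSession.builder.appName("er-demo").master("local[1]").getOrCreate()
import spark.implicits._

// Hard-coded stand-in for the parsed response body (assumed sample data)
val sampleBody = List(
  List("a", "b", "c", "d", "e", "f", "g"),
  List("a2", "b2", "c2", "d2", "e2", "f2", "g2")
)

// Each inner List[String] is encoded as one array cell in a single column
val singleColumnDf = sampleBody.toDF()
singleColumnDf.printSchema()
println(singleColumnDf.columns.mkString(", "))

// Renaming one column with seven names is rejected by toDF
val sevenNames = (1 to 7).map(i => s"Column$i")
val renameAttempt = Try(singleColumnDf.toDF(sevenNames: _*))
println(s"rename failed: ${renameAttempt.isFailure}")
```

Inspecting `columns` or calling `printSchema()` before renaming is a quick way to check how many columns Spark actually inferred.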
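The solution, starting from this erDf DataFrame, is to iterate over the list of header column names and build the columns one by one. Your complete code then becomes: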
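import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col // needed for col("value")

val spark = SparkSession.builder.appName("er").master("local").getOrCreate()
import spark.implicits._

val erResponse = response.body.toString.split("\\\n")
// Build the list of header column names; trim handles "," as well as ", " separators
val header = erResponse(0).split(",").map(_.trim)
val body = erResponse.drop(1).map(x => x.split(",").toList).toList
// For each (name, index) pair, add a column holding that element of the array, then drop the array
val erDf = header
  .zipWithIndex
  .foldLeft(body.toDF())((acc, elem) => acc.withColumn(elem._1, col("value")(elem._2)))
  .drop("value")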
This gives you the following erDf DataFrame:
+-------+-------+-------+-------+-------+-------+-------+
|Column1|Column2|Column3|Column4|Column5|Column6|Column7|
+-------+-------+-------+-------+-------+-------+-------+
|      a|      b|      c|      d|      e|      f|      g|
|     a2|     b2|     c2|     d2|     e2|     f2|     g2|
+-------+-------+-------+-------+-------+-------+-------+
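As a possible variation on the approach above (not the answer's own method), the same columns can be built in one `select` instead of a chain of `withColumn` calls. This is a sketch under assumptions: `raw` is a hard-coded stand-in for `response.body.toString`, and the names `headerNames`, `rows`, `projections` and `erDfSelect` are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder.appName("er-select").master("local[1]").getOrCreate()
import spark.implicits._

// Assumed sample input standing in for response.body.toString
val raw =
  "Column1, Column2, Column3, Column4, Column5, Column6, Column7\n" +
  "a,b,c,d,e,f,g\n" +
  "a2,b2,c2,d2,e2,f2,g2"

val lines = raw.split("\n")
val headerNames = lines.head.split(",").map(_.trim).toList
val rows = lines.tail.map(_.split(",").toList).toList

// One projection per header name: take element i of the array and alias it
val projections = headerNames.zipWithIndex.map { case (name, i) =>
  col("value")(i).as(name)
}

// select keeps only the projected columns, so no drop("value") is needed
val erDfSelect = rows.toDF().select(projections: _*)
erDfSelect.show()
```

A single `select` adds all columns in one projection, which tends to keep the query plan smaller than many chained `withColumn` calls when the header is wide. For seven columns either form should behave the same.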
