I am trying to manually create a dataset with a column of type Set:
case class Files(Record: String, ids: Set)
val files = Seq(
Files("202110260931", Set(770010, 770880)),
Files("202110260640", Set(770010, 770880)),
Files("202110260715", Set(770010, 770880))
).toDS()
files.show()
This gives me the following error:
>command-1888379816641405:10: error: type Set takes type parameters
case class Files(s3path: String, ids: Set)
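What exactly am I doing wrong?
Answer from a uj5u.com user:
Set is a parameterized type, so when you declare it in your Files case class you have to state the type of the elements it holds, for example Set[Int] for a set of integers. Your Files case class definition should therefore be: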
case class Files(Record: String, ids: Set[Int])
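To see the type parameter at work without Spark, here is a minimal plain-Scala sketch. The object name SetTypeParameterDemo is hypothetical and only used for illustration; the case class mirrors the corrected definition above.

```scala
object SetTypeParameterDemo {
  // Same shape as the corrected case class: the element type of the Set is spelled out.
  final case class Files(Record: String, ids: Set[Int])

  def main(args: Array[String]): Unit = {
    // Set(770010, 770880) is inferred as Set[Int], so it matches the declared field type.
    val row = Files("202110260931", Set(770010, 770880))
    println(row.ids.contains(770010))

    // A Set keeps each element only once, so the duplicate 770010 is dropped.
    val deduplicated = Files("202110260640", Set(770010, 770010, 770880))
    println(deduplicated.ids.size)
  }
}
```

Writing ids: Set without the brackets fails because Set on its own is a type constructor, not a complete type; the compiler needs to know what the set contains.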
So the complete code for creating a dataset with a set column is:
import org.apache.spark.sql.SparkSession

object ToDataset {
  private val spark = SparkSession.builder()
    .master("local[*]")
    .appName("test-app")
    .config("spark.ui.enabled", "false")
    .config("spark.driver.host", "localhost")
    .getOrCreate()

  def main(args: Array[String]): Unit = {
    import spark.implicits._
    val files = Seq(
      Files("202110260931", Set(770010, 770880)),
      Files("202110260640", Set(770010, 770880)),
      Files("202110260715", Set(770010, 770880))
    ).toDS()
    files.show()
  }

  case class Files(Record: String, ids: Set[Int])
}
This returns the following dataset:
+------------+----------------+
|      Record|             ids|
+------------+----------------+
|202110260931|[770010, 770880]|
|202110260640|[770010, 770880]|
|202110260715|[770010, 770880]|
+------------+----------------+
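The ids column is displayed like an array, but typed operations on the Dataset still see a Set[Int]. The following is a rough sketch of reading the column back, assuming Spark is on the classpath and is a version whose implicits include an encoder for Set (to my knowledge, 2.3 or later). The object name ReadSetColumn and the app name are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object ReadSetColumn {
  // Same case class as in the answer.
  final case class Files(Record: String, ids: Set[Int])

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("set-column-sketch") // hypothetical app name
      .getOrCreate()
    import spark.implicits._

    val files = Seq(
      Files("202110260931", Set(770010, 770880)),
      Files("202110260640", Set(770010, 770880))
    ).toDS()

    // Shows how Spark represents the Set field in the schema.
    files.printSchema()

    // Inside a typed map, each row is a Files instance and ids is a Set[Int] again.
    val sizes = files.map(f => f.ids.size).collect()
    println(sizes.mkString(", "))

    spark.stop()
  }
}
```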