My input is a CSV file; each line looks like this:
HX332780,14/7/5,OTHER OFFENSE,PROBATION VIOLATION,PARKING LOT/GARAGE(NON.RESID.),Y,N,1113
HX332854,14/7/5,OTHER OFFENSE,HARASSMENT BY TELEPHONE,APARTMENT,N,N,1533
HX332743,14/7/5,CRIMINAL DAMAGE,TO VEHICLE,STREET,N,N,1021
HX332735,14/7/5,THEFT,$500 AND UNDER,RESTAURANT,N,N,1014
......
Here is the simple processing code:
import org.apache.spark.{SparkConf, SparkContext}

object SparkPi {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("Spark Pi")
      .setMaster("spark://Master:7077")
      .setJars(List("/home/hadoop/Downloads/JetBrains.IntelliJ.xdowns/idea-IU-139.1117.1/spark-examples-1.5.2-hadoop2.6.0.jar"))
    val sc = new SparkContext(conf)
    val rawData = sc.textFile("/home/hadoop/123.csv")
    // Take the fourth-from-last comma-separated field of every line.
    val secondData = rawData.map(_.split(",").takeRight(4).head)
    // Count each distinct value, then collect the result to the driver.
    val thirdData = secondData.map(n => (n, 1)).reduceByKey(_ + _).collect()
    sc.stop()
  }
}
Running it on the cluster produces the following error:
15/12/09 22:11:09 WARN TaskSetManager: Lost task 1.0 in stage 0.0 (TID 1, 219.216.65.129): java.lang.ClassCastException: cannot assign instance of org.apache.spark.examples.SparkPi$$anonfun$2 to field org.apache.spark.rdd.RDD$$anonfun$flatMap$1$$anonfun$apply$4.cleanF$2 of type scala.Function1 in instance of org.apache.spark.rdd.RDD$$anonfun$flatMap$1$$anonfun$apply$4
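......
Could anyone tell me what went wrong? The error goes away if I remove collect. All I want is to count how often each distinct value appears in the fourth-from-last column across all lines....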
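The counting logic itself can be checked without a cluster. Below is a minimal sketch using plain Scala collections (no Spark); `ColumnCount`, `fourthFromLast`, and `countValues` are hypothetical names introduced here for illustration, and the sample rows are the ones quoted in the question. Note also that dropping collect probably only hides the error: Spark transformations are lazy, so without an action no task is sent to the executors at all.

```scala
object ColumnCount {
  // Fourth-from-last comma-separated field of one line.
  def fourthFromLast(line: String): String =
    line.split(",").takeRight(4).head

  // Number of occurrences of each distinct fourth-from-last value.
  def countValues(lines: Seq[String]): Map[String, Int] =
    lines.map(fourthFromLast)
      .groupBy(identity)
      .map { case (value, hits) => (value, hits.size) }

  def main(args: Array[String]): Unit = {
    // Sample rows copied from the question.
    val sample = Seq(
      "HX332780,14/7/5,OTHER OFFENSE,PROBATION VIOLATION,PARKING LOT/GARAGE(NON.RESID.),Y,N,1113",
      "HX332854,14/7/5,OTHER OFFENSE,HARASSMENT BY TELEPHONE,APARTMENT,N,N,1533",
      "HX332743,14/7/5,CRIMINAL DAMAGE,TO VEHICLE,STREET,N,N,1021",
      "HX332735,14/7/5,THEFT,$500 AND UNDER,RESTAURANT,N,N,1014"
    )
    countValues(sample).foreach { case (value, n) => println(s"$value\t$n") }
  }
}
```

If this prints the expected counts locally, the extraction logic is likely fine, which would point toward how the job is packaged and shipped to the cluster rather than toward the map/reduceByKey code itself.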
Reply from a uj5u.com user:
Friend, did you ever solve this problem?
Reply from a uj5u.com user:
Reportedly it is a compatibility problem between 2.0.1 and the Scala version. With 2.0.0 there is no problem...
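One way to start checking the version theory is to print the Scala version the driver is actually running on, so it can be compared by hand with the Scala version the cluster's Spark build was compiled against. A minimal sketch using only the Scala standard library; `VersionCheck` and `binaryVersion` are hypothetical names, and this code does not query the cluster itself.

```scala
object VersionCheck {
  // Reduce a full version string such as "2.10.4" to its
  // binary-compatibility prefix "2.10".
  def binaryVersion(full: String): String =
    full.split("\\.").take(2).mkString(".")

  def main(args: Array[String]): Unit = {
    // Version of the Scala library on the classpath of this JVM (driver side).
    val full = scala.util.Properties.versionNumberString
    println(s"Driver Scala version: $full (binary ${binaryVersion(full)})")
  }
}
```

If the binary version printed here differs from the one the cluster's Spark distribution was built with, a mismatch like the one suggested in the reply becomes a plausible explanation.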
Tag: Spark
