Here is the code:
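import java.util.Properties
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{SQLContext, SaveMode}
val sc = new SparkContext(new SparkConf().setMaster("local").setAppName("testApp"))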
val sqlContext = new SQLContext(sc)
implicit val region = Region.CN_NORTH_1
val tempS3Dir = "s3a://redshift-test/RedShift/red/"
// Configure the S3 connection settings
sqlContext.sparkContext.hadoopConfiguration.set("fs.s3a.access.key", "AKIA3ZwewewewCHYE")
sqlContext.sparkContext.hadoopConfiguration.set("fs.s3a.secret.key", "wg2mPMDNtcqeweweweweCSu7Q+JJHNPT2O")
sqlContext.sparkContext.hadoopConfiguration.set("fs.s3a.endpoint", "s3.cn-north-1.amazonaws.com.cn")
sqlContext.setConf("driver","com.amazon.redshift.jdbc4.Driver")
val dataDF = sqlContext.read
  .format("csv")
  .option("header", true)
  .load("s3a://redshift-test/RedShift/out/test0.csv")
// Read the table data (jdbcURL is not defined in the post; it is assumed to be set elsewhere)
val test_union = sqlContext.read
  .format("jdbc")
  .option("url", jdbcURL)
  .option("dbtable", "test_union")
  .load()
// Rows that are in dataDF but not in test_union
val data = dataDF.except(test_union)
data.show()
data.write
  .mode(SaveMode.Overwrite) // Overwrite means the target table is reloaded from scratch
  .option("header", true)
  .jdbc(jdbcURL, "test_test", new Properties)
sc.stop()
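It runs very slowly and just keeps spinning. Why is that? I think the problem is in the step val data = dataDF.except(test_union), but I don't know what to do about it.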
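One thing that can be checked from the code as posted: data.show() and data.write are two separate actions, so unless the result is cached, the CSV read, the JDBC read, and the except are each evaluated twice. Below is a minimal sketch of how one might inspect the plan and cache the result. It assumes a local SparkSession (rather than the SQLContext used above) and two small in-memory DataFrames, `left` and `right`, as hypothetical stand-ins for dataDF and test_union.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session just for this sketch
val spark = SparkSession.builder().master("local[*]").appName("exceptSketch").getOrCreate()
import spark.implicits._

// Hypothetical stand-ins for dataDF (CSV) and test_union (JDBC table)
val left  = Seq(("1", "a"), ("2", "b"), ("2", "b"), ("3", "c")).toDF("id", "name")
val right = Seq(("1", "a")).toDF("id", "name")

// cache() marks the result for reuse, so later actions do not recompute the whole plan
val diff = left.except(right).cache()

// Print the physical plan to see how Spark intends to execute the except
diff.explain()

// The first action materializes the cache; except also removes duplicate rows
val diffCount = diff.count()

// A second action on the same DataFrame now reads from the cache
diff.show()
```

If the plan printed by explain() shows the expensive part is the table scan rather than the comparison itself, the bottleneck may be the reads, not except.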
Reply from a uj5u.com user:
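I haven't done this exact operation myself, but I think except takes about as much work as a Cartesian product: both compare rows across the two datasets over and over, so in my opinion it can't be much faster.
Please credit the source when reposting. Link to this article: https://www.uj5u.com/qita/39329.html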
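To make the cost of except more concrete: as far as I know, recent Spark versions plan except as a left anti join on all columns followed by a distinct, which is shuffle-based rather than a literal Cartesian product. The sketch below writes that equivalent form out by hand and checks that it returns the same rows. The session, the DataFrames `left` and `right`, and their contents are all hypothetical stand-ins, not the poster's data.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session just for this sketch
val spark = SparkSession.builder().master("local[*]").appName("antiJoinSketch").getOrCreate()
import spark.implicits._

// Hypothetical stand-ins for dataDF and test_union
val left  = Seq(("1", "a"), ("2", "b"), ("2", "b"), ("3", "c")).toDF("id", "name")
val right = Seq(("1", "a"), ("4", "d")).toDF("id", "name")

// Null-safe equality (<=>) on every column, so NULLs match each other as they do in except
val cond = left.columns.map(c => left(c) <=> right(c)).reduce(_ && _)

// Keep left rows with no match in right, then drop duplicates
val viaAntiJoin = left.join(right, cond, "left_anti").distinct()
val viaExcept   = left.except(right)

// Compare the two results as sets of rows
val sameResult = viaAntiJoin.collect().toSet == viaExcept.collect().toSet
```

Separately from except, the JDBC read in the question sets no partitioning options, so Spark reads the whole table through a single connection into one partition; on a large table that alone can account for a long wait.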
Tags: Spark
