I'm posting this question after searching a lot online without finding an answer. I have a JSONArray in the following format:
[
  {
    "firstName":"John",
    "lastName":"Doe",
    "deparment" : {
      "DeptCode":"10",
      "deptName" : "HR"
    }
  },
  {
    "firstName":"Mel",
    "lastName":"Gibson",
    "deparment" : {
      "DeptCode":"20",
      "deptName" : "IT"
    }
  }
]
The JSONArray is org.json.simple.JSONArray (from the json-simple library). I am trying to convert it into a Spark DataFrame in Java, using the following code:
SparkConf conf = new SparkConf().setAppName("linecount").setMaster("local[*]");
SparkSession session = SparkSession.builder().config(conf).getOrCreate();
Dataset<Row> dataset = session.read().json(array.toString());
But no luck: I get the error below. I can also see that in Scala the DS method can be used to convert it into a DataFrame. Has anyone tried this before?
Exception in thread "main" java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: [{"firstName":"John%22,"lastName%22:"Doe%22%7D,{"firstName%22:"Mel%22,"lastName%22:"Gibson%22%7D%5D
at org.apache.hadoop.fs.Path.initialize(Path.java:206)
at org.apache.hadoop.fs.Path.<init>(Path.java:172)
at org.apache.spark.sql.execution.datasources.DataSource$.org$apache$spark$sql$execution$datasources$DataSource$$checkAndGlobPathIfNecessary(DataSource.scala:615)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:350)
at org.apache.spark.sql.execution.datasources.DataSource$$anonfun$14.apply(DataSource.scala:350)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
at scala.collection.immutable.List.foreach(List.scala:318)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:349)
at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:178)
at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:333)
at org.apache.spark.sql.DataFrameReader.json(DataFrameReader.scala:279)
at com.vikas.rawat.AnotherMainClass.main(AnotherMainClass.java:34)
Caused by: java.net.URISyntaxException: Relative path in absolute URI: [{"firstName":"John%22,"lastName%22:"Doe%22%7D,{"firstName%22:"Mel%22,"lastName%22:"Gibson%22%7D%5D
at java.net.URI.checkPath(Unknown Source)
at java.net.URI.<init>(Unknown Source)
at org.apache.hadoop.fs.Path.initialize(Path.java:203)
... 14 more
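Here is what triggers the exception. DataFrameReader.json(String) treats its argument as a file path, so the JSON text is handed to Hadoop's Path. Path takes everything before the first ':' as a URI scheme when no '/' comes earlier. The pure-JDK sketch below loosely imitates that split. It is a simplification, not Hadoop's source, and the class name PathAsUriDemo is hypothetical. It shows that java.net.URI rejects the result with the same reason seen in the stack trace.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class PathAsUriDemo {

    // Simplified imitation of how a path string is split before a URI is built:
    // text before the first ':' (with no earlier '/') is taken as the scheme.
    // Illustration only; this is not Hadoop's actual implementation.
    public static String reasonFor(String pathString) {
        int colon = pathString.indexOf(':');
        int slash = pathString.indexOf('/');
        String scheme = null;
        String path = pathString;
        if (colon != -1 && (slash == -1 || colon < slash)) {
            scheme = pathString.substring(0, colon); // e.g. [{"firstName"
            path = pathString.substring(colon + 1);  // e.g. "John"}]
        }
        try {
            // A scheme plus a path that does not start with '/' is rejected here
            new URI(scheme, null, path, null, null);
            return "ok";
        } catch (URISyntaxException e) {
            return e.getReason();
        }
    }

    public static void main(String[] args) {
        System.out.println(reasonFor("[{\"firstName\":\"John\"}]"));
        System.out.println(reasonFor("/tmp/people.json"));
    }
}
```

Running main prints "Relative path in absolute URI" for the JSON text and "ok" for an ordinary absolute path. In other words, the JSON was never parsed as JSON at all.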
Answer:
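You should create an RDD from the JSON string and pass it to the spark.read().json method. Passing the string directly does not work because json(String) interprets its argument as a file path, not as JSON content.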
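import java.util.Arrays;

import org.apache.spark.rdd.RDD;
import org.apache.spark.sql.SparkSession;
import org.json.simple.JSONArray;
import org.json.simple.JSONObject;
import org.json.simple.JSONValue;

SparkSession spark = SparkSession.builder().master("local").getOrCreate();
// The question's JSON, wrapped in a "root" object
String s = "{\"root\":[ \n" +
" {\n" +
" \"firstName\":\"John\",\n" +
" \"lastName\":\"Doe\",\n" +
" \"deparment\" : {\n" +
" \"DeptCode\":\"10\",\n" +
" \"deptName\" : \"HR\"\n" +
" }\n" +
" },\n" +
" {\n" +
" \"firstName\":\"Mel\",\n" +
" \"lastName\":\"Gibson\",\n" +
" \"deparment\" : {\n" +
" \"DeptCode\":\"20\",\n" +
" \"deptName\" : \"IT\"\n" +
" }\n" +
"}\n" +
"]}";
// Parse with json-simple and take the array
JSONObject json = (JSONObject) JSONValue.parse(s);
JSONArray msgsArray = (JSONArray) json.get("root");
// Put the array's JSON text into a one-element Scala Seq
scala.collection.Seq<String> seq = scala.collection.JavaConverters.asScalaIteratorConverter(
        Arrays.asList(msgsArray.toString()).iterator()).asScala().toSeq();
// Distribute it as an RDD<String>; json(RDD<String>) parses the contents instead of treating them as a path
RDD<String> jsonRDD = spark.sparkContext()
        .parallelize(seq, 4, scala.reflect.ClassTag$.MODULE$.apply(String.class));
spark.read().json(jsonRDD).show();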
+---------+---------+--------+
|deparment|firstName|lastName|
+---------+---------+--------+
| {10, HR}|     John|     Doe|
| {20, IT}|      Mel|  Gibson|
+---------+---------+--------+
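Since Spark 2.2, json(RDD<String>) is deprecated in favour of json(Dataset<String>). That means the Scala-converter step can be skipped by building a Dataset<String> directly from a Java list. The sketch below is minimal and assumes Spark 2.2 or later on the classpath. The class and method names are hypothetical, and in the question's code the argument would be array.toString(). As the output above shows, a string holding a top-level JSON array is expanded into one row per element.

```java
import java.util.Collections;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class JsonStringToDataFrame {

    // Wraps in-memory JSON text in a one-element Dataset<String> so Spark parses
    // its contents instead of treating it as a path. Assumes Spark 2.2+.
    public static Dataset<Row> fromJsonString(SparkSession spark, String jsonText) {
        List<String> lines = Collections.singletonList(jsonText);
        Dataset<String> jsonDs = spark.createDataset(lines, Encoders.STRING());
        return spark.read().json(jsonDs);
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("json-string-demo")
                .master("local[*]")
                .getOrCreate();
        // Same data as the question, hard-coded instead of produced by json-simple
        String json = "[{\"firstName\":\"John\",\"lastName\":\"Doe\","
                + "\"deparment\":{\"DeptCode\":\"10\",\"deptName\":\"HR\"}},"
                + "{\"firstName\":\"Mel\",\"lastName\":\"Gibson\","
                + "\"deparment\":{\"DeptCode\":\"20\",\"deptName\":\"IT\"}}]";
        Dataset<Row> df = fromJsonString(spark, json);
        df.printSchema();
        df.show();
        spark.stop();
    }
}
```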
Answer:
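You can load JSON from strings into a Dataset, with the caveat that each string must hold a complete JSON object (one object per string, as the Spark documentation puts it).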
From the Spark documentation:
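import spark.implicits._  // provides the String encoder that createDataset needs
// Alternatively, a DataFrame can be created for a JSON dataset represented by
// a Dataset[String] storing one JSON object per string
val otherPeopleDataset = spark.createDataset(
  """{"name":"Yin","address":{"city":"Columbus","state":"Ohio"}}""" :: Nil)
val otherPeople = spark.read.json(otherPeopleDataset)
otherPeople.show()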
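A Java counterpart of that documented approach can be applied to the question's json-simple array. It serialises each element separately, so every Dataset element is exactly one JSON object. This is a minimal sketch that assumes Spark 2.2+ and json-simple on the classpath, and that every array element is a JSONObject. OneObjectPerString, toJsonLines and toDataFrame are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;
import org.json.simple.JSONArray;
import org.json.simple.JSONObject;
import org.json.simple.JSONValue;

public class OneObjectPerString {

    // Serialises each array element on its own; throws ClassCastException
    // if an element is not a JSON object.
    public static List<String> toJsonLines(JSONArray array) {
        List<String> lines = new ArrayList<>();
        for (Object element : array) {
            lines.add(((JSONObject) element).toJSONString());
        }
        return lines;
    }

    // One JSON object per Dataset element, then let Spark infer the schema
    public static Dataset<Row> toDataFrame(SparkSession spark, JSONArray array) {
        Dataset<String> ds = spark.createDataset(toJsonLines(array), Encoders.STRING());
        return spark.read().json(ds);
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("one-object-per-string")
                .master("local[*]")
                .getOrCreate();
        JSONArray array = (JSONArray) JSONValue.parse(
                "[{\"firstName\":\"John\",\"lastName\":\"Doe\"},"
                + "{\"firstName\":\"Mel\",\"lastName\":\"Gibson\"}]");
        toDataFrame(spark, array).show();
        spark.stop();
    }
}
```

Splitting the array per element keeps each record independent, which matches the form the Spark documentation describes. Passing the whole array as one string, as in the first answer, also works.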
