How can Spark parse JSON that contains nested JSON?
For example:
{
"_index": "nginxacc-2016.09.30",
"_type": "logs",
"_id": "AVd6G5gNfVF4aGz4f2fE",
"_version": 1,
"_score": 1,
"_source": {
"@timestamp": "2016-09-30T00:00:09.000Z",
"clientip": "42.122.1.97",
"status": "200",
"@version": "1",
"geoip": {
"ip": "42.122.1.97",
"country_code2": "CN",
"country_code3": "CHN",
"country_name": "China",
"continent_code": "AS",
"region_name": "28",
"city_name": "Tianjin",
"latitude": 39.1422,
"longitude": 117.17669999999998,
"timezone": "Asia/Shanghai",
"real_region_name": "Tianjin",
"location": [
117.17669999999998,
39.1422
]
}
}
}
The contents of the geoip field cannot be read into the Spark DataFrame.
Any help would be appreciated.
Reply from a uj5u.com user:
Call printSchema to see which type the nested JSON field is mapped to, then process that field with an anonymous UDF.
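
As a rough illustration of the reply, here is a minimal PySpark sketch. It rests on several assumptions that are not in the original post: that each record is stored as a multi-line JSON object in a file, and that Spark infers geoip as a struct (which printSchema would confirm). If that holds, nested fields can usually be reached with dotted column paths rather than a UDF, so the sketch uses that simpler technique instead of the UDF the reply suggests. The names flatten, select_geoip and SAMPLE are hypothetical. The flatten helper is plain Python and only shows which dotted paths exist in the record; select_geoip needs pyspark installed and is not called here.

```python
import json

# Trimmed copy of the record from the question (only a few fields kept).
SAMPLE = """
{
  "_index": "nginxacc-2016.09.30",
  "_source": {
    "clientip": "42.122.1.97",
    "status": "200",
    "geoip": {
      "country_name": "China",
      "city_name": "Tianjin",
      "location": [117.17669999999998, 39.1422]
    }
  }
}
"""


def flatten(record, prefix=""):
    """Flatten nested dicts into dotted keys, e.g. '_source.geoip.city_name'.

    Hypothetical helper: it mirrors the dotted paths Spark accepts for
    struct fields, so you can see which column paths to select.
    """
    flat = {}
    for key, value in record.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def select_geoip(path):
    """Read the JSON file at `path` and pull nested geoip fields into columns.

    Assumes pyspark is installed; it is imported lazily so that the
    helper above still runs without it.
    """
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("nested-json").getOrCreate()
    # multiLine is needed when a single JSON object spans several lines.
    df = spark.read.option("multiLine", True).json(path)
    df.printSchema()  # check here that geoip was inferred as a struct
    return df.select(
        col("_source.clientip").alias("clientip"),
        col("_source.geoip.city_name").alias("city_name"),
        col("_source.geoip.location").getItem(0).alias("longitude"),
        col("_source.geoip.location").getItem(1).alias("latitude"),
    )


flat = flatten(json.loads(SAMPLE))
print(flat["_source.geoip.city_name"])  # prints Tianjin
```

If printSchema instead reports geoip as a plain string (for example because the field was stored as escaped JSON text), dotted paths will not work and the string has to be parsed first, which is where a UDF or a JSON-parsing function would come in.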
