I have a CSV file that was created from nested JSON. It contains columns of regular types (e.g. int, string) as well as JSON columns produced from the nested JSON:
attributes;business_id;categories;city;days_open;latitude;longitude;name;review_count;stars;state
{"AcceptsInsurance": False, "AgesAllowed": "allages", "Alcohol": "beer_and_wine", "Ambience": {"casual": True, "classy": False, "divey": False, "hipster": False, "intimate": False, "romantic": False, "touristy": False, "trendy": False, "upscale": False}, "BYOB": False, "BikeParking": True, "BusinessAcceptsBitcoin": False, "BusinessAcceptsCreditCards": True, "BusinessParking": {"garage": False, "lot": False, "street": True, "valet": False, "validated": False}, "ByAppointmentOnly": False, "Caters": True, "CoatCheck": False, "Corkage": False, "DogsAllowed": False, "DriveThru": False, "GoodForDancing": False, "GoodForKids": False, "GoodForMeal": {"breakfast": False, "brunch": False, "dessert": False, "dinner": False, "latenight": False, "lunch": False}, "HappyHour": True, "HasTV": True, "Music": None, "NoiseLevel": "average", "Open24Hours": False, "OutdoorSeating": True, "RestaurantsAttire": "casual", "RestaurantsCounterService": False, "RestaurantsDelivery": False, "RestaurantsGoodForGroups": True, "RestaurantsPriceRange": 2, "RestaurantsReservations": False, "RestaurantsTableService": True, "RestaurantsTakeOut": True, "Smoking": "no", "WheelchairAccessible": True, "WiFi": "free"};6iYb2HFDywm3zjuRg0shjw;["Gastropubs", "Food", "Beer Gardens", "Restaurants", "Bars", "American (Traditional)", "Beer Bar", "Nightlife", "Breweries"];Boulder;["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"];40.0175444;-105.2833481;Oskar Blues Taproom;86;4.0;CO
{"AcceptsInsurance": False, "AgesAllowed": "allages", "Alcohol": "beer_and_wine", "Ambience": {"casual": True, "classy": False, "divey": False, "hipster": False, "intimate": False, "romantic": False, "touristy": False, "trendy": False, "upscale": False}, "BYOB": False, "BikeParking": False, "BusinessAcceptsBitcoin": False, "BusinessAcceptsCreditCards": True, "BusinessParking": {"garage": True, "lot": False, "street": False, "valet": False, "validated": False}, "ByAppointmentOnly": False, "Caters": True, "CoatCheck": False, "Corkage": False, "DogsAllowed": False, "DriveThru": False, "GoodForDancing": False, "GoodForKids": True, "GoodForMeal": {"breakfast": True, "brunch": False, "dessert": False, "dinner": False, "latenight": False, "lunch": True}, "HappyHour": False, "HasTV": False, "Music": None, "NoiseLevel": "average", "Open24Hours": False, "OutdoorSeating": False, "RestaurantsAttire": "casual", "RestaurantsCounterService": False, "RestaurantsDelivery": False, "RestaurantsGoodForGroups": False, "RestaurantsPriceRange": 2, "RestaurantsReservations": False, "RestaurantsTableService": True, "RestaurantsTakeOut": True, "Smoking": "no", "WheelchairAccessible": False, "WiFi": "free"};tCbdrRPZA0oiIYSmHG3J0w;["Salad", "Soup", "Sandwiches", "Delis", "Restaurants", "Cafes", "Vegetarian"];Portland;["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"];45.5889058992;-122.5933307507;Flying Elephants at PDX;126;4.0;OR
{"AcceptsInsurance": False, "AgesAllowed": "allages", "Alcohol": "none", "Ambience": None, "BYOB": False, "BikeParking": False, "BusinessAcceptsBitcoin": False, "BusinessAcceptsCreditCards": True, "BusinessParking": {"garage": False, "lot": False, "street": True, "valet": False, "validated": False}, "ByAppointmentOnly": False, "Caters": False, "CoatCheck": False, "Corkage": False, "DogsAllowed": False, "DriveThru": False, "GoodForDancing": False, "GoodForKids": False, "GoodForMeal": None, "HappyHour": False, "HasTV": False, "Music": None, "NoiseLevel": "average", "Open24Hours": False, "OutdoorSeating": False, "RestaurantsAttire": "casual", "RestaurantsCounterService": False, "RestaurantsDelivery": False, "RestaurantsGoodForGroups": False, "RestaurantsPriceRange": 2, "RestaurantsReservations": True, "RestaurantsTableService": True, "RestaurantsTakeOut": False, "Smoking": "no", "WheelchairAccessible": False, "WiFi": "no"};bvN78flM8NLprQ1a1y5dRg;["Antiques", "Fashion", "Used", "Vintage & Consignment", "Shopping", "Furniture Stores", "Home & Garden"];Portland;["Thursday", "Friday", "Saturday", "Sunday"];45.5119069956;-122.6136928797;The Reclaimory;13;4.5;OR
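Can this file be processed with AWS Glue so that it can serve as input for AWS Athena / Hive (as used inside Athena)? In particular, how do I specify the data type for the JSON columns? Do I have to do this manually? Is the JSON well-formed, or should it be reformatted?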
Answer from a uj5u.com user:
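I'll try to answer all of your questions.
Can this file be processed with AWS Glue so that it can serve as input for AWS Athena / Hive (as used inside Athena)?
It should be possible. As long as you define the Hive table correctly, any CSV file can be loaded into it.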
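How do I specify the data type for the JSON columns?
In Hive you can store them as string. Given your JSON structure, you can then access individual elements with an expression such as get_json_object(json_col_str, '$.BusinessParking.garage').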
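Do I have to do this manually?
I think so, unless you have some utility that generates the DDL automatically. You can paste a few sample rows into a spreadsheet such as Excel and work out the table structure from there.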
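As a rough illustration of what such a DDL-generating utility could look like, here is a minimal Python sketch that guesses column types from the header and one sample row and prints a CREATE EXTERNAL TABLE statement. The table name businesses, the S3 location, and the helper names infer_hive_type and build_ddl are placeholders invented for this example, and the sample row is abbreviated from the question's first row. Inferring types from a single row is only a starting point, so the generated types should be reviewed by hand.

```python
def infer_hive_type(value: str) -> str:
    """Guess a Hive/Athena column type from one sample value."""
    for caster, hive_type in ((int, "int"), (float, "double")):
        try:
            caster(value)
            return hive_type
        except ValueError:
            continue
    # Anything that is not numeric, including the JSON columns, stays a string.
    return "string"


def build_ddl(header: str, sample_row: str, table: str, location: str) -> str:
    """Build a CREATE EXTERNAL TABLE statement for a semicolon-delimited file."""
    names = header.split(";")
    values = sample_row.split(";")
    if len(names) != len(values):
        raise ValueError("header and sample row have different field counts")
    columns = ",\n".join(
        f"  `{name}` {infer_hive_type(value)}" for name, value in zip(names, values)
    )
    return (
        f"CREATE EXTERNAL TABLE {table} (\n{columns}\n)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ';'\n"
        f"LOCATION '{location}'\n"
        "TBLPROPERTIES ('skip.header.line.count'='1');"
    )


header = ("attributes;business_id;categories;city;days_open;"
          "latitude;longitude;name;review_count;stars;state")
# Abbreviated sample row: the JSON columns are shortened for readability.
sample = ('{"WiFi": "free"};6iYb2HFDywm3zjuRg0shjw;["Food"];Boulder;["Monday"];'
          '40.0175444;-105.2833481;Oskar Blues Taproom;86;4.0;CO')

# Table name and S3 location are hypothetical placeholders.
ddl = build_ddl(header, sample, "businesses", "s3://your-bucket/path/")
print(ddl)
```

With this sample the JSON columns come out as string, latitude, longitude and stars as double, and review_count as int.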
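Is the JSON well-formed, or should it be reformatted?
I checked the last row of the example you gave, and the JSON object looks fine to me. I also ran it through https://jsonformatter.curiousconcept.com/, which validated it and pretty-printed it. You can use that tool to track down any discrepancies.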
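One point that may be worth double-checking: as pasted in the question, the attributes column contains False, True and None, which are Python literals rather than JSON's false, true and null. A strict JSON parser rejects those, so JSON functions may return NULL or fail on such values. The following Python sketch, which assumes the column really is stored that way in the file, checks a value with the standard json module and re-serializes it as strict JSON when needed. The helper name to_strict_json is invented for this example, and the input is abbreviated from the question's third row.

```python
import ast
import json

# Abbreviated from the question's third row; note the Python-style False/None.
raw = ('{"AcceptsInsurance": False, "Ambience": None, '
       '"BusinessParking": {"garage": False, "street": True}, "WiFi": "no"}')


def to_strict_json(text: str) -> str:
    """Return text unchanged if it is valid JSON, else re-serialize a Python literal."""
    try:
        json.loads(text)
        return text
    except json.JSONDecodeError:
        # literal_eval only evaluates literals, so it does not run arbitrary code.
        return json.dumps(ast.literal_eval(text))


fixed = to_strict_json(raw)
print(fixed)
```

Applying a conversion like this to every JSON column before uploading the file would give the query engine strict JSON to work with.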
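Answer from a uj5u.com user:
As long as there are no semicolons inside the JSON columns, you should be able to query this data with Athena. Define your table as CSV with a semicolon as the delimiter, and use string as the type for the JSON columns.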
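The no-semicolon condition can be checked before uploading. The sketch below is one simple way to do it in Python: it splits every line on the delimiter and reports the lines whose field count differs from the header's. The function name rows_with_wrong_field_count is invented here, and the three sample lines are made up for the demonstration (the last one deliberately hides a semicolon inside its JSON value).

```python
def rows_with_wrong_field_count(lines, delimiter=";"):
    """Return 1-based line numbers whose field count differs from the header's."""
    expected = None
    bad = []
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue  # ignore blank lines
        count = len(line.split(delimiter))
        if expected is None:
            expected = count  # the first non-empty line is treated as the header
        elif count != expected:
            bad.append(number)
    return bad


sample = [
    "attributes;business_id;city",
    '{"WiFi": "free"};6iYb2HFDywm3zjuRg0shjw;Boulder',
    '{"Note": "a;b"};hypothetical_id;Portland',  # made-up row with an embedded semicolon
]
print(rows_with_wrong_field_count(sample))  # prints [3]
```

For a real file, passing the open file object (for example open(path, encoding="utf-8")) in place of the list should work the same way, since the function only iterates over lines.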
When you query this table, you can use the JSON functions to query the JSON columns, for example:
SELECT json_extract_scalar(attributes, '$.AcceptsInsurance')
…
