I have a nested JSON file (100,000 lines) that looks like this:
{"UniqueId":"4224f3c9-323c-e911-a820-a7f2c9e35195","TransactionDateUTC":"2019-03-01 15:00:52.627 UTC","Itinerary":"MUC-CPH-ARN-MUC","OriginAirportCode":"MUC","DestinationAirportCode":"CPH","OneWayOrReturn":"Return","Segment":[{"DepartureAirportCode":"MUC","ArrivalAirportCode":"CPH","SegmentNumber":"1","LegNumber":"1","NumberOfPassengers":"1"},{"DepartureAirportCode":"ARN","ArrivalAirportCode":"MUC","SegmentNumber":"2","LegNumber":"1","NumberOfPassengers":"1"}]}
I'm trying to create a CSV so that it can be loaded easily into an RDBMS. I'm trying to use json_normalize() in pandas, but I get an error before I even get that far:
import json

with open('transactions.json') as data_file:
    data = json.load(data_file)
JSONDecodeError: Extra data: line 2 column 1 (char 466)
Answer:
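If your problem stems from reading the JSON file itself: the file holds one JSON object per line, which is why json.load() reports "Extra data". I would parse each line separately with
json.loads()
and then build the DataFrame from the parsed records, or let pandas read the whole file in that layout with
pd.read_json('transactions.json', lines=True)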
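A minimal sketch of both options, assuming the file uses the one-object-per-line (JSON Lines) layout shown in the question. The two records below are shortened, hypothetical stand-ins for the real data, held in a string so the example runs on its own:

```python
import json
from io import StringIO

import pandas as pd

# Two shortened, hypothetical records in JSON Lines layout (one object per line)
jsonl_text = (
    '{"UniqueId": "a", "OneWayOrReturn": "Return", "Segment": [{"SegmentNumber": "1"}]}\n'
    '{"UniqueId": "b", "OneWayOrReturn": "Return", "Segment": [{"SegmentNumber": "1"}]}\n'
)

# Option 1: parse each line separately with json.loads, then build a DataFrame
records = [json.loads(line) for line in StringIO(jsonl_text) if line.strip()]
df_loop = pd.DataFrame(records)

# Option 2: let pandas parse the JSON Lines layout directly
df_lines = pd.read_json(StringIO(jsonl_text), lines=True)

print(df_loop.shape, df_lines.shape)
```

With the real file, option 2 becomes pd.read_json('transactions.json', lines=True). In both cases the Segment column still holds lists of dicts, so it needs flattening before it goes into a CSV.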
If your problem stems from converting the JSON dict into a DataFrame, you can use the following:
import json
from io import StringIO

import pandas as pd

test = {"UniqueId":"4224f3c9-323c-e911-a820-a7f2c9e35195","TransactionDateUTC":"2019-03-01 15:00:52.627 UTC","Itinerary":"MUC-CPH-ARN-MUC","OriginAirportCode":"MUC","DestinationAirportCode":"CPH","OneWayOrReturn":"Return","Segment":[{"DepartureAirportCode":"MUC","ArrivalAirportCode":"CPH","SegmentNumber":"1","LegNumber":"1","NumberOfPassengers":"1"},{"DepartureAirportCode":"ARN","ArrivalAirportCode":"MUC","SegmentNumber":"2","LegNumber":"1","NumberOfPassengers":"1"}]}

# convert the dict to a JSON string and read it; each Segment entry becomes one row
df = pd.read_json(StringIO(json.dumps(test)), convert_axes=True)
# 'unpack' each Segment dict into its own columns and merge them with the original df
df = pd.concat([df, df.Segment.apply(pd.Series)], axis=1)
# remove the now-redundant dict column
df.drop('Segment', axis=1, inplace=True)
That would be my approach, but there may be more convenient ways.
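One possibly more convenient route is pandas.json_normalize with record_path and meta, which produces one row per segment with the chosen top-level fields repeated on each row. A sketch using the sample record from the question; which fields go into meta is an assumption about what you want to keep:

```python
import pandas as pd

test = {"UniqueId": "4224f3c9-323c-e911-a820-a7f2c9e35195",
        "TransactionDateUTC": "2019-03-01 15:00:52.627 UTC",
        "Itinerary": "MUC-CPH-ARN-MUC",
        "OriginAirportCode": "MUC",
        "DestinationAirportCode": "CPH",
        "OneWayOrReturn": "Return",
        "Segment": [
            {"DepartureAirportCode": "MUC", "ArrivalAirportCode": "CPH",
             "SegmentNumber": "1", "LegNumber": "1", "NumberOfPassengers": "1"},
            {"DepartureAirportCode": "ARN", "ArrivalAirportCode": "MUC",
             "SegmentNumber": "2", "LegNumber": "1", "NumberOfPassengers": "1"},
        ]}

# One row per element of "Segment"; the listed top-level fields are repeated on each row
df = pd.json_normalize(
    [test],  # a list of records, e.g. every parsed line of the file
    record_path="Segment",
    meta=["UniqueId", "TransactionDateUTC", "Itinerary",
          "OriginAirportCode", "DestinationAirportCode", "OneWayOrReturn"],
)
print(df)
```

For the whole file, parse each line with json.loads into a list, pass that list as the first argument, and write the result with df.to_csv("transactions.csv", index=False). Each CSV row is then one segment, which fits a child table keyed by UniqueId in the RDBMS.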
Answer:
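Step 1: iterate over a file of records
Since your file has one JSON record per line, you need to iterate over all the records in the file, which you can do like this: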
import json
from io import StringIO

import pandas as pd

with open('transactions.json', encoding="utf8") as data_file:
    for line in data_file:
        data = json.loads(line)
        # or
        df = pd.read_json(StringIO(line), convert_axes=True)
        # do something with data or df
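To see why the per-line loop is needed: json.load() and json.loads() expect exactly one JSON value per document, so on a file with one object per line they stop after the first object and raise "Extra data" at the start of line 2. A small sketch with two hypothetical one-line records:

```python
import json

# Two hypothetical one-line records, in the same layout as transactions.json
text = '{"UniqueId": "a"}\n{"UniqueId": "b"}\n'

# Parsing the whole text as one JSON document fails where the second object begins
error_msg = None
error_line = None
try:
    json.loads(text)
except json.JSONDecodeError as exc:
    error_msg = exc.msg      # the message text, without the position details
    error_line = exc.lineno  # line where parsing gave up

# Parsing line by line works: one dict per record
records = [json.loads(line) for line in text.splitlines() if line.strip()]
print(error_msg, error_line, [r["UniqueId"] for r in records])
```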
Step 2: write the CSV file
Now you can combine this with a csv.writer to convert the file to CSV.
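import csv

# newline="" is what the csv module docs recommend for files passed to csv.writer
with open('transactions.csv', "w", encoding="utf8", newline="") as csv_file:
    writer = csv.writer(csv_file)
    # Loop over each record, somehow:
    #     row = build list with row contents
    #     writer.writerow(row)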
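Putting it all together
I would read the first record once to get its keys to use as the CSV headers, then read the whole file and convert each record into one CSV row: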
import copy
import csv
import json

# Read the first JSON record to get the keys that we'll use as headers for the CSV file
with open('transactions.json', encoding="utf8") as data_file:
    keys = list(json.loads(next(data_file)).keys())

# Our CSV headers are going to be the keys from the first row, except for
# segments, which we'll replace (arbitrarily) by three numbered segment column
# headings.
keys.pop()  # drops "Segment", which is the last key in each record
base_keys = copy.copy(keys)
keys.extend(["Segment1", "Segment2", "Segment3"])

with open('transactions.csv', "w", encoding="utf8", newline="") as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow(keys)  # Write the CSV headers
    with open('transactions.json', encoding="utf8") as data_file:
        for line in data_file:
            data = json.loads(line)
            # Scalar fields first, then one cell per segment dict
            row = [data[k] for k in base_keys] + data["Segment"]
            writer.writerow(row)
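The resulting CSV file will still hold a whole segment record in each Segment i column (csv.writer writes each dict using its str() representation). The header also assumes at most three segments per record; a record with more segments produces a row longer than the header. If you want to format each segment differently, you can define a format_segment(segment) function and replace data["Segment"] with this list comprehension: [format_segment(segment) for segment in data["Segment"]]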
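A sketch of such a format_segment helper. The function name comes from the text above, but its output format (departure and arrival codes joined by a hyphen, e.g. MUC-CPH) is only an assumption, as are the hypothetical build_row helper and the padding to three segment columns to match the Segment1..Segment3 headers:

```python
import csv
import io
import json

NUM_SEGMENT_COLUMNS = 3  # hypothetical: matches the Segment1..Segment3 headers

def format_segment(segment):
    """Hypothetical formatting: 'DEP-ARR' airport codes for one segment."""
    return f'{segment["DepartureAirportCode"]}-{segment["ArrivalAirportCode"]}'

def build_row(data, base_keys):
    """Hypothetical helper: scalar fields followed by one formatted cell per segment."""
    segments = [format_segment(segment) for segment in data["Segment"]]
    # Pad with empty cells so short itineraries still line up with the headers
    # (records with more than three segments are not truncated)
    segments += [""] * (NUM_SEGMENT_COLUMNS - len(segments))
    return [data[k] for k in base_keys] + segments

line = ('{"UniqueId": "x", "Segment": ['
        '{"DepartureAirportCode": "MUC", "ArrivalAirportCode": "CPH"}, '
        '{"DepartureAirportCode": "ARN", "ArrivalAirportCode": "MUC"}]}')
row = build_row(json.loads(line), ["UniqueId"])

# Write to an in-memory buffer to show what the CSV line looks like
buf = io.StringIO()
csv.writer(buf).writerow(row)
print(buf.getvalue())
```

In the full script, writer.writerow(build_row(data, base_keys)) would take the place of the row built from data["Segment"].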
Please credit the source when reposting. Link to this article: https://www.uj5u.com/houduan/385961.html
