python-帶有多個物件的json到csv-有解無憂

我不時從 AWS sns 接收檔案 (json)。其中一些僅包含 1 個物件，而另一些則包含多個物件。

我正在嘗試運行一個 python 腳本來提取幾個列作為示例，但是由于這幾個物件或檔案的格式，我得到了多個錯誤。理想情況下，我想將所有資料放入列中，特別是“訊息”部分中包含的資料。（即eventType、from、to、bcc、messageId、timestamp等。

有人可以幫忙嗎？謝謝

腳本：

data = []
for line in open(folder   file, 'r', encoding='utf-8'):
    data.append(json.loads(line))
    pd.json_normalize(data)

    #df = pd.DataFrame(((d['Message']) for d in data), columns=['Message'])
    df = pd.DataFrame([(
         data[0]['Timestamp'],
         data[0]['Subject'],
         data[0]['Message']['eventType'])
     ], columns=['Timestamp', 'Subject', 'eventType'])

    df.to_csv(folder   'testOutput.csv', index=False, encoding='utf-8')
    print(df)

具有多個物件的示例檔案：

{"Type":"Notification","MessageId":"0579da9d-671f-547f-879a-8151d6048","TopicArn":"arn:aws:sns:region:123456789:SES-notification-s","Subject":"Event Notification","Message":"{\"eventType\":\"Click\",\"mail\":{\"timestamp\":\"2022-10-21T14:39:47.003Z\",\"source\":\"tester <[email protected]>\",\"sendingAccountId\":\"123456789\",\"messageId\":\"01000183fafc5f3b-4f5f3dc2-281a-472a-85-915f85289858-000000\",\"destination\":[\"[email protected]\",\"[email protected]\"],\"headersTruncated\":false,\"headers\":[{\"name\":\"Content-Type\",\"value\":\"multipart/mixed; boundary=\\\"===============4453571503606487627==\\\"\"},{\"name\":\"MIME-Version\",\"value\":\"1.0\"},{\"name\":\"Subject\",\"value\":\"Email tracking test\"},{\"name\":\"From\",\"value\":\"tester <[email protected]>\"},{\"name\":\"To\",\"value\":\"[email protected]\"},{\"name\":\"Bcc\",\"value\":\"[email protected]\"},{\"name\":\"Message-ID\",\"value\":\"null\"}],\"commonHeaders\":{\"from\":[\"tester <[email protected]>\"],\"to\":[\"[email protected]\"],\"bcc\":[\"[email protected]\"],\"messageId\":\"010001fafc5f33b-4f53dc2-281a-472a-8f65-915f889858-000000\",\"subject\":\"Email tracking test\"},\"tags\":{\"ses:operation\":[\"SendRawEmail\"],\"ses:configuration-set\":[\"ses-email-tracking\"],\"ses:source-ip\":[\"12.12.123.123\"],\"ses:from-domain\":[\"domain.dev\"],\"ses:caller-identity\":[\"caller_identity\"]}},\"click\":{\"timestamp\":\"2022-10-21T14:40:55.000Z\",\"ipAddress\":\"66.66.666.666\",\"userAgent\":\"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36\",\"link\":\"https://website.com\",\"linkTags\":null}}\n","Timestamp":"2022-10-21T14:40:55.066Z","UnsubscribeURL":"unsubscribeLink"}
{"Type":"Notification","MessageId":"03758739-75a9-5462-8361-dcf5c410e015","TopicArn":"arn:aws:sns:region:123456789:SES-notification-s","Subject":"Event Notification","Message":"{\"eventType\":\"ClickNum2\",\"mail\":{\"timestamp\":\"2022-10-21T14:39:47.003Z\",\"source\":\"tester <[email protected]>\",\"sendingAccountId\":\"123456789\",\"messageId\":\"01000183fafc5f3b-4f5f3dc2-281a-472a-85-915f85289858-000000\",\"destination\":[\"[email protected]\",\"[email protected]\"],\"headersTruncated\":false,\"headers\":[{\"name\":\"Content-Type\",\"value\":\"multipart/mixed; boundary=\\\"===============4453571503606487627==\\\"\"},{\"name\":\"MIME-Version\",\"value\":\"1.0\"},{\"name\":\"Subject\",\"value\":\"Email tracking test\"},{\"name\":\"From\",\"value\":\"tester <[email protected]>\"},{\"name\":\"To\",\"value\":\"[email protected]\"},{\"name\":\"Bcc\",\"value\":\"[email protected]\"},{\"name\":\"Message-ID\",\"value\":\"null\"}],\"commonHeaders\":{\"from\":[\"tester <[email protected]>\"],\"to\":[\"[email protected]\"],\"bcc\":[\"[email protected]\"],\"messageId\":\"010001fafc5f33b-4f53dc2-281a-472a-8f65-915f889858-000000\",\"subject\":\"Email tracking test\"},\"tags\":{\"ses:operation\":[\"SendRawEmail\"],\"ses:configuration-set\":[\"ses-email-tracking\"],\"ses:source-ip\":[\"12.12.123.123\"],\"ses:from-domain\":[\"domain.dev\"],\"ses:caller-identity\":[\"caller_identity\"]}},\"click2\":{\"timestamp\":\"2022-10-21T15:45:50.000Z\",\"ipAddress\":\"55.55.555.555\",\"userAgent\":\"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36\",\"link\":\"https://website.com\",\"linkTags\":null}}\n","Timestamp":"2022-10-21T15:50:50.066Z","UnsubscribeURL":"unsubscribeLink"}
{"Type":"Notification","MessageId":"03758739-75a9-5462-8361-dcf5c410e015","TopicArn":"arn:aws:sns:region:123456789:SES-notification-s","Subject":"Event Notification","Message":"{\"eventType\":\"ClickNum3\",\"mail\":{\"timestamp\":\"2022-10-21T14:39:47.003Z\",\"source\":\"tester <[email protected]>\",\"sendingAccountId\":\"123456789\",\"messageId\":\"01000183fafc5f3b-4f5f3dc2-281a-472a-85-915f85289858-000000\",\"destination\":[\"[email protected]\",\"[email protected]\"],\"headersTruncated\":false,\"headers\":[{\"name\":\"Content-Type\",\"value\":\"multipart/mixed; boundary=\\\"===============4453571503606487627==\\\"\"},{\"name\":\"MIME-Version\",\"value\":\"1.0\"},{\"name\":\"Subject\",\"value\":\"Email tracking test\"},{\"name\":\"From\",\"value\":\"tester <[email protected]>\"},{\"name\":\"To\",\"value\":\"[email protected]\"},{\"name\":\"Bcc\",\"value\":\"[email protected]\"},{\"name\":\"Message-ID\",\"value\":\"null\"}],\"commonHeaders\":{\"from\":[\"tester <[email protected]>\"],\"to\":[\"[email protected]\"],\"bcc\":[\"[email protected]\"],\"messageId\":\"010001fafc5f33b-4f53dc2-281a-472a-8f65-915f889858-000000\",\"subject\":\"Email tracking test\"},\"tags\":{\"ses:operation\":[\"SendRawEmail\"],\"ses:configuration-set\":[\"ses-email-tracking\"],\"ses:source-ip\":[\"12.12.123.123\"],\"ses:from-domain\":[\"domain.dev\"],\"ses:caller-identity\":[\"caller_identity\"]}},\"click3\":{\"timestamp\":\"2022-10-21T16:50:50.000Z\",\"ipAddress\":\"44.44.444.444\",\"userAgent\":\"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Safari/537.36\",\"link\":\"https://website.com\",\"linkTags\":null}}\n","Timestamp":"2022-10-21T16:50:50.066Z","UnsubscribeURL":"unsubscribeLink"}

uj5u.com熱心網友回復：

你可以試試這個來包含從 json 到 csv 檔案的所有欄位。

import pandas as pd
import json

df=pd.read_json('test.json',lines=True)
newdf=[]
for i,row in df.iterrows():
    data=json.loads(row['Message'])
    row['timestamp']=data['mail']['timestamp']
    row['eventType']=data['eventType']
    newdf.append(row)
df=pd.DataFrame(newdf)
df.to_csv("test.csv",index=False)

uj5u.com熱心網友回復：

如何決議整個輸入：

import fileinput
import json

import pandas as pd


def parse_input(filename):

    rows = (l for l in fileinput.input(filename))
    parsed_1 = [json.loads(row) for row in rows]
    for row in parsed_1:
        row['Message'] = json.loads(row['Message'])

    return parsed_1


input = parse_input('input.txt')

然后

df = pd.json_normalize(input)

并選擇所需的列？

In [5]: df
Out[5]: 
           Type                             MessageId  ...  Message.click3.link Message.click3.linkTags
0  Notification     0579da9d-671f-547f-879a-8151d6048  ...                  NaN                     NaN
1  Notification  03758739-75a9-5462-8361-dcf5c410e015  ...                  NaN                     NaN
2  Notification  03758739-75a9-5462-8361-dcf5c410e015  ...  https://website.com                     NaN

[3 rows x 39 columns]

轉載請註明出處，本文鏈接：https://www.uj5u.com/qiye/519945.html

標籤：PythonjsonCSV解析多列

上一篇：NodejsJson檢查特定鍵和值是否存在，然后獲取其值

下一篇：如何在python中將json文本轉換為png