I'm desperately trying to convert a nested JSON field in a CSV into DataFrame rows. Can you help?
Sample CSV row:
2021-09-26T08:25:43.021051958Z,"{""level"":""info"",""message"":""Success (Cached)"",""request"":""GET /api/v1/settings?id=3"",""httpCode"":200,""service"":""stats-vis-backend"",""timestamp"":""2021-09-26 08:25:43""}",ip-10-xxx-xxx-18.eu-central-1.compute.internal,podname-75ffdf6b-gns8v
Desired output (using only the JSON part):
| ID | message | request | httpCode | service | timestamp |
|---|---|---|---|---|---|
| 0 | Success (Cached) | GET /api/v1/settings?id=3 | 200 | stats-vis-backend | 2021-09-26 08:25:43 |
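I would be very happy if this were the DataFrame output structure. I tried json_normalize and similar approaches, but I'm still far from a solution.
Many thanks!!
Best, David
Full code attempt (based on SeaBean):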
import csv
import ast
import pandas as pd
# read CSV
df = pd.read_csv('/Users/David/xaa.csv',sep=',', header=None)
print(df.head(1))
# convert string of JSON/dict to real JSON/dict
# the JSON/dict is at column `1` (second column from left)
df[1] = df[1].apply(ast.literal_eval)
# Create dataframe from the JSON part
df_json = pd.DataFrame(df[1].tolist())
print(df_json.head(1))
Full output dump:
File "/Users/David/opt/anaconda3/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3437, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-7-86e494aa8f0c>", line 12, in <module>
df[1] = df[1].apply(ast.literal_eval)
File "/Users/David/opt/anaconda3/lib/python3.8/site-packages/pandas/core/series.py", line 4138, in apply
mapped = lib.map_infer(values, f, convert=convert_dtype)
File "pandas/_libs/lib.pyx", line 2467, in pandas._libs.lib.map_infer
File "/Users/David/opt/anaconda3/lib/python3.8/ast.py", line 59, in literal_eval
node_or_string = parse(node_or_string, mode='eval')
File "/Users/David/opt/anaconda3/lib/python3.8/ast.py", line 47, in parse
return compile(source, filename, mode, flags,
File "<unknown>", line 1
> next start
^
SyntaxError: invalid syntax
Sample output of df1:
0 {"level":"info","message":"Success (Cached)","...
1 {"level":"info","message":"Success (Cached)","...
2 {"level":"info","message":"Success (Cached)","...
3 {"level":"info","message":"Success","request":...
4 {"level":"info","message":"Success (Cached)","...
...
249995 {"level":"info","message":"Success (Cached)","...
249996 {"level":"info","message":"Success (Cached)","...
249997 {"level":"info","message":"Success (Cached)","...
249998 {"level":"info","message":"Success","request":...
249999 {"level":"info","message":"Success (Cached)","...
Name: 1, Length: 250000, dtype: object
Sample to_dict() output of df1:
{0: '{"level":"info","message":"Success (Cached)","request":"GET /api/v1/settings?id=3","httpCode":200,"service":"stats-vis-backend","timestamp":"2021-09-26 08:25:43"}',
1: '{"level":"info","message":"Success (Cached)","request":"GET /api/v1/settings?id=3","httpCode":200,"service":"stats-vis-backend","timestamp":"2021-09-26 08:26:17"}',
Output of print(df.iloc[[4480]]):
0 1 \
4480 2021-09-26T12:00:58.983344643Z > next start
2 \
4480 ip-10-xxx-xxxx-30.eu-central-1.compute.internal
3
4480 xxxx-converter-75ffxf6b-jq2w7
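Answer from a uj5u.com user:
You can call pd.DataFrame on the list of values from the second column (the one holding the JSON), after converting each JSON string into a real dict, as follows: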
import ast
import pandas as pd
# read CSV
df = pd.read_csv(r'mycsv.csv', sep=',', header=None)
# convert string of JSON/dict to real JSON/dict
# the JSON/dict is at column `1` (second column from left)
df[1] = df[1].apply(ast.literal_eval)
# Create dataframe from the JSON part
df_json = pd.DataFrame(df[1].tolist())
If you have already read the CSV into a DataFrame with column headers, use the column label of the second column instead of `1` in the code above.
Result:
print(df_json)
level message request httpCode service timestamp
0 info Success (Cached) GET /api/v1/settings?id=3 200 stats-vis-backend 2021-09-26 08:25:43
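The traceback in the question shows ast.literal_eval failing on a cell that contains `> next start` rather than JSON (row 4480 in the printed output). One possible way to cope with such rows is to parse with json.loads and skip cells that do not decode to an object. The sketch below is only an illustration under assumptions: the helper name parse_json_or_none and the two inline sample rows are made up for the example, and the rows are modeled on the ones shown in the question.

```python
import io
import json

import pandas as pd

# Two sample rows modeled on the question: one valid JSON cell and one
# marker row ("> next start") that is not JSON. This inline data is an
# assumption for illustration; replace it with the real CSV path.
sample_csv = (
    '2021-09-26T08:25:43.021051958Z,'
    '"{""level"":""info"",""message"":""Success (Cached)"",'
    '""request"":""GET /api/v1/settings?id=3"",""httpCode"":200,'
    '""service"":""stats-vis-backend"",""timestamp"":""2021-09-26 08:25:43""}",'
    'ip-10-xxx-xxx-18.eu-central-1.compute.internal,podname-75ffdf6b-gns8v\n'
    '2021-09-26T12:00:58.983344643Z,> next start,'
    'ip-10-xxx-xxxx-30.eu-central-1.compute.internal,xxxx-converter-75ffxf6b-jq2w7\n'
)


def parse_json_or_none(text):
    """Hypothetical helper: return the decoded dict, or None if the cell is not a JSON object."""
    try:
        value = json.loads(text)
    except (TypeError, ValueError):  # json.JSONDecodeError is a ValueError
        return None
    return value if isinstance(value, dict) else None


# Read the CSV without a header row, as in the answer above.
df = pd.read_csv(io.StringIO(sample_csv), sep=',', header=None)

# Column 1 holds the JSON strings; cells that fail to parse become None.
parsed = df[1].apply(parse_json_or_none)
valid = parsed.notna()

# Build the DataFrame from the valid rows only, keeping the original row index.
df_json = pd.DataFrame(parsed[valid].tolist(), index=df.index[valid])
print(df_json)
```

Keeping the original index makes it possible to join df_json back to the other CSV columns later, and `df[~valid]` lists the rows that were skipped so they can be inspected.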
