Example of the log file:
{"timestamp": "2022-01-14T00:12:21.000", "Field1": 10, "Field_Doc": {"f1": 0}}
{"timestamp": "2022-01-18T00:15:51.000", "Field_Doc": {"f1": 0, "f2": 1.7, "f3": 2}}
It should generate 5 files:
- timestamp.column
- Field1.column
- Field_Doc.f1.column
- Field_Doc.f2.column
- Field_Doc.f3.column
Example content of timestamp.column:
2022-01-14T00:12:21.000
2022-01-18T00:15:51.000
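Note: the fields in the log are dynamic, so do not assume that the fields shown here are the ones to expect.
Can someone tell me how to do this?
The log files are roughly 4 GB to 48 GB in size.
Reply from a uj5u.com community member: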
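If every JSON object is on a single line, you can open() the file and read it line by line with for line in file:, then use the json module to convert each line to a dictionary and process it.
You can use for key, value in data.items(): to handle each item separately. Use key to build the filename f"{key}.column", open that file in append mode "a", and write str(value) + "\n" to it.
Because you have nested dictionaries, you need isinstance(value, dict) to check whether a value is itself a dictionary such as {"f1": 0, "f2": 1.7, "f3": 2}, and then repeat the same code for that dictionary. This may require recursion.
Minimal working code.
I use io only to simulate a file in memory; you should use open(filename) instead.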
import io
import json

file_data = '''{"timestamp": "2022-01-14T00:12:21.000", "Field1": 10, "Field_Doc": {"f1": 0}}
{"timestamp": "2022-01-18T00:15:51.000", "Field_Doc": {"f1": 0, "f2": 1.7, "f3": 2}}'''

# --- functions ---

def process_dict(data, prefix=""):
    # Walk the dict; nested dicts extend the dotted key, leaf values are written out.
    for key, value in data.items():
        if prefix:
            key = prefix + "." + key
        if isinstance(value, dict):
            process_dict(value, key)
        else:
            # Append the value to "<dotted key>.column", one value per line.
            with open(key + '.column', "a") as f:
                f.write(str(value) + "\n")

# --- main ---

#file_obj = open("filename")
file_obj = io.StringIO(file_data)  # emulate file in memory

for line in file_obj:
    data = json.loads(line)
    print(data)
    process_dict(data)
    #process_dict(data, "some prefix for all files")
EDIT:
A more generic version: it takes a function as the second argument (func), so it can be used with different functions.
import io
import json

file_data = '''{"timestamp": "2022-01-14T00:12:21.000", "Field1": 10, "Field_Doc": {"f1": 0}}
{"timestamp": "2022-01-18T00:15:51.000", "Field_Doc": {"f1": 0, "f2": 1.7, "f3": 2}}'''

# --- functions ---

def process_dict(data, func, prefix=""):
    # Walk the dict and call func(dotted_key, value) for every leaf value.
    for key, value in data.items():
        if prefix:
            key = prefix + "." + key
        if isinstance(value, dict):
            process_dict(value, func, key)
        else:
            func(key, value)

def write_func(key, value):
    # Append the value to "<dotted key>.column", one value per line.
    with open(key + '.column', "a") as f:
        f.write(str(value) + "\n")

# --- main ---

#file_obj = open("filename")
file_obj = io.StringIO(file_data)  # emulate file in memory

for line in file_obj:
    data = json.loads(line)
    print(data)
    process_dict(data, write_func)
    #process_dict(data, write_func, "some prefix for all files")
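As one illustration of passing a different function, the sketch below collects the values in memory instead of writing files. The names collect_func and columns are hypothetical and do not appear in the original answer; process_dict is repeated (with the string concatenation written out) so the block runs on its own. This only makes sense for small inputs, not for multi-gigabyte logs.

```python
import json
from collections import defaultdict

def process_dict(data, func, prefix=""):
    # Call func(dotted_key, value) for every leaf value, recursing into nested dicts.
    for key, value in data.items():
        if prefix:
            key = prefix + "." + key
        if isinstance(value, dict):
            process_dict(value, func, key)
        else:
            func(key, value)

# Hypothetical in-memory store: dotted key -> list of values seen so far.
columns = defaultdict(list)

def collect_func(key, value):
    # Alternative to write_func: keep the value in memory instead of a file.
    columns[key].append(value)

lines = [
    '{"timestamp": "2022-01-14T00:12:21.000", "Field1": 10, "Field_Doc": {"f1": 0}}',
    '{"timestamp": "2022-01-18T00:15:51.000", "Field_Doc": {"f1": 0, "f2": 1.7, "f3": 2}}',
]

for line in lines:
    process_dict(json.loads(line), collect_func)

for key in sorted(columns):
    print(key, columns[key])
```

Because process_dict only knows about func(key, value), the traversal stays the same whether the values go to files, to memory, or anywhere else.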
Another idea to make it more generic is to create a function that flattens the dict, producing
{'timestamp': '2022-01-14T00:12:21.000', 'Field1': 10, 'Field_Doc.f1': 0}
{'timestamp': '2022-01-18T00:15:51.000', 'Field_Doc.f1': 0, 'Field_Doc.f2': 1.7, 'Field_Doc.f3': 2}
and then use a loop to write the elements.
import io
import json

file_data = '''{"timestamp": "2022-01-14T00:12:21.000", "Field1": 10, "Field_Doc": {"f1": 0}}
{"timestamp": "2022-01-18T00:15:51.000", "Field_Doc": {"f1": 0, "f2": 1.7, "f3": 2}}'''

# --- functions ---

def flatten_dict(data, prefix=""):
    # Return a single-level dict whose keys are dotted paths to the leaf values.
    result = {}
    for key, value in data.items():
        if prefix:
            key = prefix + "." + key
        if isinstance(value, dict):
            # Recurse into the nested dict and merge its flattened items.
            result.update(flatten_dict(value, key))
        else:
            result[key] = value
            #result.update( {key: value} )
    return result

# --- main ---

#file_obj = open("filename")
file_obj = io.StringIO(file_data)  # emulate file in memory

for line in file_obj:
    data = json.loads(line)
    print('before:', data)
    data = flatten_dict(data)
    #data = flatten_dict(data, "some prefix for all items")
    print('after :', data)
    print('---')
    # Write every flattened item of this line to its own column file.
    for key, value in data.items():
        with open(key + '.column', "a") as f:
            f.write(str(value) + "\n")
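For logs in the 4 GB to 48 GB range, reopening a file for every single value may dominate the run time. One possible sketch, assuming the number of distinct fields stays well below the operating system's open-file limit, keeps one handle per column open until the input is exhausted. The names split_columns, handles and out_dir are hypothetical and not part of the original answer, and the temporary directory is used only so the example runs anywhere.

```python
import json
import os
import tempfile

def flatten_dict(data, prefix=""):
    # Return a single-level dict whose keys are dotted paths to the leaf values.
    result = {}
    for key, value in data.items():
        if prefix:
            key = prefix + "." + key
        if isinstance(value, dict):
            result.update(flatten_dict(value, key))
        else:
            result[key] = value
    return result

def split_columns(lines, out_dir):
    # Hypothetical helper: write each field to "<out_dir>/<key>.column",
    # opening every column file once and reusing the handle afterwards.
    handles = {}
    try:
        for line in lines:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            for key, value in flatten_dict(json.loads(line)).items():
                f = handles.get(key)
                if f is None:
                    path = os.path.join(out_dir, key + ".column")
                    f = handles[key] = open(path, "a", encoding="utf-8")
                f.write(str(value) + "\n")
    finally:
        # Close every handle even if a line fails to parse.
        for f in handles.values():
            f.close()
    return sorted(handles)

sample = [
    '{"timestamp": "2022-01-14T00:12:21.000", "Field1": 10, "Field_Doc": {"f1": 0}}',
    '{"timestamp": "2022-01-18T00:15:51.000", "Field_Doc": {"f1": 0, "f2": 1.7, "f3": 2}}',
]

out_dir = tempfile.mkdtemp()  # stand-in for the real output directory
created = split_columns(sample, out_dir)
print(created)
```

With a real log, an open file object can be passed directly as lines, for example split_columns(open("filename"), out_dir), since iterating over a file yields one line at a time without loading the whole file into memory.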
