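I want to convert a text file to JSON Lines format using Python. It needs to work for text files of any length (in characters or words).
For example, I want to convert the following text: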
A lot of effort in classification tasks is placed on feature engineering and parameter optimization, and rightfully so.
These steps are essential for building models with robust performance. However, all these efforts can be wasted if you choose to assess these models with the wrong evaluation metrics.
into this:
{"text": "A lot of effort in classification tasks is placed on feature engineering and parameter optimization, and rightfully so."}
{"text": "These steps are essential for building models with robust performance. However, all these efforts can be wasted if you choose to assess these models with the wrong evaluation metrics."}
I tried this:
text = ""
with open("text.txt", encoding="utf8") as f:
    for line in f:
        text = {"text": line}
But no luck.
Reply from a uj5u.com user:
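A hacky way to do this is to paste the text file into a CSV. Make sure the first cell of the CSV contains the word text, so that it becomes the column header, then use the following code: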
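import pandas as pd

# `knowledge` is the path to the CSV file and `knowledge_jsonl` is the
# output path; both are assumed to be defined elsewhere.
df = pd.read_csv(knowledge)
df.to_json(knowledge_jsonl,
           orient="records",
           lines=True)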
Not ideal, but it works.
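One caveat with the CSV route: the sample text contains commas, so if the CSV is written as plain text rather than exported by a spreadsheet application, a line may be split into several columns. Below is a minimal sketch that uses the standard csv module to write every non-empty line as a single quoted field under a text header. It assumes the input lines are already in a Python list, and the function name lines_to_csv is hypothetical.

```python
import csv


def lines_to_csv(lines, csv_path):
    """Write each non-empty line as one quoted field under a 'text' header."""
    # newline="" lets the csv module control line endings itself.
    with open(csv_path, "w", encoding="utf8", newline="") as f:
        # QUOTE_ALL wraps every field in quotes, so commas inside a line
        # stay part of that field instead of starting a new column.
        writer = csv.writer(f, quoting=csv.QUOTE_ALL)
        writer.writerow(["text"])
        for line in lines:
            if line:
                writer.writerow([line])
```

A file written this way has a single text column, which is the shape the read_csv call above expects.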
Reply from a uj5u.com user:
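The basic idea of your for loop is right, but the line text = {"text": line} simply overwrites the previous value on every iteration, whereas what you want is to build up a list of lines.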
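To see the difference in isolation, here is a small sketch that runs both patterns over an in-memory list. The list sample_lines is a hypothetical stand-in for the lines of the file.

```python
sample_lines = ["first line", "second line", "third line"]  # hypothetical input

# Reassigning inside the loop: each iteration replaces the previous value,
# so only the last dictionary survives.
for line in sample_lines:
    text = {"text": line}

# Appending inside the loop: every dictionary is kept.
records = []
for line in sample_lines:
    records.append({"text": line})

print(text)          # {'text': 'third line'}
print(len(records))  # 3
```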
Try the following:
import json

# Generate a list of dictionaries
lines = []
with open("text.txt", encoding="utf8") as f:
    for line in f.read().splitlines():
        if line:
            lines.append({"text": line})

# Convert to a list of JSON strings
json_lines = [json.dumps(l) for l in lines]

# Join lines and save to .jsonl file
json_data = '\n'.join(json_lines)
with open('my_file.jsonl', 'w') as f:
    f.write(json_data)
splitlines removes the \n characters, and if line: skips empty lines.
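Since the question asks for something that works on files of any length, a streaming variant may be worth considering. It writes each JSON line as soon as it is read instead of holding the whole file in memory. This is a sketch, not part of the answer above: the function name text_to_jsonl and its return value are assumptions made for illustration.

```python
import json


def text_to_jsonl(src_path, dst_path):
    """Stream src_path line by line, writing one JSON object per non-empty line.

    Returns the number of JSON lines written.
    """
    count = 0
    with open(src_path, encoding="utf8") as src, \
            open(dst_path, "w", encoding="utf8") as dst:
        for line in src:
            # Iterating over the file keeps the trailing newline, so strip it.
            line = line.rstrip("\n")
            if line:
                dst.write(json.dumps({"text": line}) + "\n")
                count += 1
    return count
```

Calling text_to_jsonl("text.txt", "my_file.jsonl") would produce the same records as the list-based version. If the text contains non-ASCII characters and they should appear unescaped in the output, json.dumps accepts ensure_ascii=False.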
