I have many text files that all share the same structure (I have tidied them up), as shown below:
Annoying
------------------------
you are annoying me so much
you're incredibly annoying
I find you annoying
you are annoying
you're so annoying
how annoying you are
you annoy me
you are annoying me
you are irritating
you are such annoying
you're too annoying
you are very annoying
<Response>
Was going to say the same about you,
Ill try to fix that
Ok thanks for the feedback.
I need to convert them to JSON with the following structure:
{"tag": "Annoying",
"patterns": ["I find you annoying", "you are irritating" ...],
"responses": ["Ill try to fix that", "Ok thanks for the feedback." ...],
}
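So the first line is always the tag, every line before <Response> is a pattern, and everything after it is a response.
I have managed to get everything into JSON, but it comes out like this: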
{"Annoying":{"0":"------------------------","1":"you are annoying me so much",
"2":"you're incredibly annoying",
"3":"I find you annoying",
"4":"you are annoying",
"5":"you're so annoying"
}}
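This is not the right format. I think the steps here are:
- Load the input from the file into a dataframe with the following structure:
| tag | patterns | responses |
|---|---|---|
| Annoying | ... | ... |
| - | ... | ... |
- Convert the dataframe to JSON with the correct structure.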
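However, I have no idea how to implement this. I imagine it should work like this:
- When reading the file, always take the first line as the tag
- Put all following lines into patterns
- Check each line that is read for <Response>, and once it appears, switch the column to responses
Any help is appreciated!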
Answer from a uj5u.com user:
Here is a solution using only plain Python:
txt = """
Annoying
------------------------
you are annoying me so much
you're incredibly annoying
I find you annoying
you are annoying
you're so annoying
how annoying you are
you annoy me
you are annoying me
you are irritating
you are such annoying
you're too annoying
you are very annoying
<Response>
Was going to say the same about you,
Ill try to fix that
Ok thanks for the feedback.
""".strip()
raw_patterns, raw_responses = txt.split("<Response>")
# split in tag and actual pattern content
tag, raw_patterns2 = raw_patterns.split("\n------------------------")
patterns = raw_patterns2.strip().split("\n")
responses = raw_responses.strip().split("\n")
res = {
    "tag": tag,
    "patterns": patterns,
    "responses": responses
}
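Since the question mentions many files and wants JSON as the end result, the same split idea could be wrapped in a function and applied to a whole folder. This is only a sketch under a few assumptions: the names `parse_intent` and `convert_folder` are made up for illustration, the input files are assumed to end in `.txt`, and the output is assumed to be a single JSON array with one object per file.

```python
import json
from pathlib import Path

SEPARATOR = "<Response>"  # marker taken from the sample file


def parse_intent(text: str) -> dict:
    """Split one file's text into tag, patterns and responses."""
    head, _, tail = text.strip().partition(SEPARATOR)
    head_lines = [ln.strip() for ln in head.splitlines() if ln.strip()]
    tag = head_lines[0]
    # Drop the dashed rule under the tag (any line made only of '-').
    patterns = [ln for ln in head_lines[1:] if set(ln) != {"-"}]
    responses = [ln.strip() for ln in tail.splitlines() if ln.strip()]
    return {"tag": tag, "patterns": patterns, "responses": responses}


def convert_folder(src_dir: str, out_file: str) -> list:
    """Parse every *.txt file in src_dir and write one JSON array (assumed layout)."""
    intents = [
        parse_intent(p.read_text(encoding="utf-8"))
        for p in sorted(Path(src_dir).glob("*.txt"))
    ]
    Path(out_file).write_text(json.dumps(intents, indent=2), encoding="utf-8")
    return intents
```

Calling `convert_folder("intents", "intents.json")` would then produce one file containing all tags, where both paths are placeholders. Unlike the snippet above, this version does not depend on the exact number of dashes in the separator line.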
Answer from a uj5u.com user:
Here is another solution, using a for loop and the enumerate function.
import json

tag = ""
patterns = []
responses = []

# find the line number of <Response> (assumes the marker is present)
with open("yourFile.txt", "r") as data:
    for num, line in enumerate(data, 1):
        if "<Response>" in line:
            response_line = num

# parse the file and collect the lines into lists
with open("yourFile.txt", "r") as data:
    for num, line in enumerate(data, 1):
        text = line.strip()
        if num == 1:
            tag = text
        elif not text or set(text) == {"-"}:
            continue  # skip blank lines and the dashed separator
        elif num < response_line:
            patterns.append(text)
        elif num > response_line:
            responses.append(text)

# construct the dictionary from the variables and print it as JSON
dictionary = {"tag": tag, "patterns": patterns, "responses": responses}
print(json.dumps(dictionary, indent=2))
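As a variant of the loop above, the file can also be read just once, which is closer to the "switch the column when <Response> shows up" idea from the question. The following is a minimal sketch, not a definitive implementation: `parse_lines` is a hypothetical helper name, and it assumes that blank lines and lines made only of dashes carry no content.

```python
def parse_lines(lines):
    """Single pass: first non-empty line is the tag, then patterns
    until <Response>, then responses."""
    result = {"tag": None, "patterns": [], "responses": []}
    target = "patterns"  # key of the list that currently receives lines
    for raw in lines:
        line = raw.strip()
        if not line or set(line) == {"-"}:
            continue  # skip blank lines and the dashed rule
        if result["tag"] is None:
            result["tag"] = line
        elif line == "<Response>":
            target = "responses"  # everything after the marker is a response
        else:
            result[target].append(line)
    return result
```

Because it accepts any iterable of lines, it can be fed an open file directly, for example `with open("yourFile.txt") as f: parse_lines(f)`.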
