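I have a txt file like the one below. The dataset follows this template, and I want to use Python to convert it into 6 columns with the headers Id, Cause, Code, Event Time, Severity, and Severity Code: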
Id = 0005 Cause = ERROR
Code = 307 Event Time = 2020-11-09 10:16:48
Severity = WARNING
Severity Code = 5 Id = 0006 Cause = FAILURE
Code = 517 Event Time = 2020-11-09 10:19:47
Severity = MINOR Severity Code = 4
I would like to know whether the dataset above can be converted into the following:
Id    Cause    Code  Event Time           Severity  Severity Code
0005  ERROR    307   2020-11-09 10:16:48  WARNING   5
0006  FAILURE  517   2020-11-09 10:19:47  MINOR     4
Reply from a uj5u.com user:
Try this:
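import re
import pandas as pd

# Matches "key = value". This assumes fields on the same line are separated
# by two or more whitespace characters, so a single space inside a value
# (as in the event time) does not end the value.
pattern = re.compile(r"(.+?)=(.+?)(?:\s{2,}|$)")
data = []
item = {}
with open("data.txt") as fp:
    for line in fp:
        for m in pattern.finditer(line):
            key, value = [m.group(i).strip() for i in [1, 2]]
            if key == "Id":
                # An "Id" field starts a new record; save the previous one.
                if item:
                    data.append(item)
                item = {"Id": value}
            else:
                item[key] = value
# Save the last record, if the file contained any.
if item:
    data.append(item)
df = pd.DataFrame(data)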
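If the fields in the file are separated by single spaces only, as in the sample shown in the question, a pattern that relies on wide gaps cannot tell where a value ends. The following is a minimal sketch of one alternative, not the answerer's method: it assumes the six field names from the question are the only keys, and it ends each value at the next known key. The names KEYS, FIELD, and parse_records are made up for this illustration.

```python
import re

# Field names taken from the question. Longer names come first so that
# "Severity Code" is tried before "Severity" and "Code".
KEYS = ["Severity Code", "Event Time", "Severity", "Cause", "Code", "Id"]
KEY_RE = "|".join(re.escape(k) for k in KEYS)
# A value runs lazily up to the next "key =" or to the end of the text.
FIELD = re.compile(rf"\b({KEY_RE})\s*=\s*(.*?)(?=\s+(?:{KEY_RE})\s*=|\s*$)")


def parse_records(text):
    """Return one dict per record; a record starts at each "Id" field."""
    records = []
    for m in FIELD.finditer(text):
        key, value = m.group(1), m.group(2)
        if key == "Id":
            records.append({})
        if records:  # ignore any fields that appear before the first Id
            records[-1][key] = value
    return records


sample = """Id = 0005 Cause = ERROR
Code = 307 Event Time = 2020-11-09 10:16:48
Severity = WARNING
Severity Code = 5 Id = 0006 Cause = FAILURE
Code = 517 Event Time = 2020-11-09 10:19:47
Severity = MINOR Severity Code = 4"""

records = parse_records(sample)
```

Passing records to pd.DataFrame(records) should then give one row per record, with the six field names as columns.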
Reply from a uj5u.com user:
Here is one way to transform the data above; I hope it helps!
import re
import pandas as pd

x = """Id = 0005 Cause = ERROR
Code = 307 Event Time = 2020-11-09 10:16:48
Severity = WARNING
Severity Code = 5 Id = 0006 Cause = FAILURE
Code = 517 Event Time = 2020-11-09 10:19:47
Severity = MINOR Severity Code = 4"""

# Collapse every run of whitespace (including newlines) into a single space.
formatted_text = ' '.join(x.split())

ids = re.findall(r"Id = (\S+)", formatted_text)
cause = re.findall(r"Cause = (\S+)", formatted_text)
# The lookbehind keeps "Code" from matching inside "Severity Code".
code = re.findall(r"(?<!Severity )Code = (\S+)", formatted_text)
# An event time is two tokens: the date and the time of day.
event_time = re.findall(r"Event Time = (\S+ \S+)", formatted_text)
severity = re.findall(r"Severity = (\S+)", formatted_text)
severity_code = re.findall(r"Severity Code = (\S+)", formatted_text)

info_dict = {
    "Id": ids,
    "Cause": cause,
    "Code": code,
    "Event Time": event_time,
    "Severity": severity,
    "Severity Code": severity_code,
}
df = pd.DataFrame.from_dict(info_dict)
print(df)
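Because this approach collects each column with its own findall call, a record that lacks one field leaves that column's list shorter than the others, and building the DataFrame then fails with a length error. A small check before that step can report which column is short. This is only a sketch; check_aligned is a hypothetical helper name, not part of pandas.

```python
def check_aligned(columns):
    """Return columns unchanged, or raise if the lists differ in length."""
    lengths = {name: len(values) for name, values in columns.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"columns have different lengths: {lengths}")
    return columns


# Example with made-up lists: both columns hold two values, so this passes.
aligned = check_aligned({"Id": ["0005", "0006"], "Cause": ["ERROR", "FAILURE"]})
```

In the code above it would be called as check_aligned(info_dict) just before the DataFrame is built.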