I have a very large file with about 900k values. It consists of repeated blocks of values like this:
/begin throw
COLOR red
DESCRIPTION
"cashmere sofa throw"
10
10
156876
DIMENSION
140
200
STORE_ADDRESS 59110
/end throw
The values keep changing, but I need each block in the following form:
/begin throw
STORE_ADDRESS 59110
COLOR red
DESCRIPTION "cashmere sofa throw" 10 10 156876
DIMENSION 140 200
/end throw
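Currently, my approach is to remove the newlines and put spaces in their place.
The store address is constant throughout the file, so I thought of removing it from its index and inserting it before the description: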
text_file = open(filename, 'r')
filedata = text_file.readlines();
for num,line in enumerate(filedata,0):
    if '/begin' in line:
        for index in range(num, len(filedata)):
            if "store_address 59110 " in filedata[index]:
                filedata.remove(filedata[index])
                filedata.insert(filedata[index-7])
                break
            if "DESCRIPTION" in filedata[index]:
                try:
                    filedata[index] = filedata[index].replace("\n", " ")
                    filedata[index+1] = filedata[index+1].replace(" ","").replace("\n", " ")
                    filedata[index+2] = filedata[index+2].replace(" ","").replace("\n", " ")
                    filedata[index+3] = filedata[index+3].replace(" ","").replace("\n", " ")
                    filedata[index+4] = filedata[index+4].replace(" ","").replace("\n", " ")
                    filedata[index+5] = filedata[index+5].replace(" ","").replace("\n", " ")
                    filedata[index+6] = filedata[index+6].replace(" ","").replace("\n", " ")
                    filedata[index+7] = filedata[index+7].replace(" ","").replace("\n", " ")
                    filedata[index+8] = filedata[index+8].replace(" ","")
                except IndexError:
                    print("Error Index DESCRIPTION:", index, num)
            if "DIMENSION" in filedata[index]:
                try:
                    filedata[index] = filedata[index].replace("\n", " ")
                    filedata[index+1] = filedata[index+1].replace(" ","").replace("\n", " ")
                    filedata[index+2] = filedata[index+2].replace(" ","").replace("\n", " ")
                    filedata[index+3] = filedata[index+3].replace(" ","")
                except IndexError:
                    print("Error Index DIMENSION:", index, num)
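After that, I write filedata to another file.
This approach takes too long to run (almost an hour and a half) because, as mentioned, the file is large. I would like to know whether there is a faster way to solve this.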
Reply from a uj5u.com user:
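You can read the file one structure at a time, so that you do not have to hold the whole content in memory and manipulate it there. By structure, I mean all the values between /begin throw and /end throw, including those two lines. This should be much faster.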
def rearrange_structure_and_write_into_file(structure, output_file):
    # TODO: rearrange the elements in structure and write the result into output_file
    pass

current_structure = ""
with open(filename, 'r') as original_file:
    with open(output_filename, 'w') as output_file:
        for line in original_file:
            # Accumulate the lines of the current structure
            current_structure += line
            if "/end throw" in line:
                # The structure is complete: process it, then start a new one
                rearrange_structure_and_write_into_file(current_structure, output_file)
                current_structure = ""
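One possible way to fill in the TODO above, as a minimal sketch only. It assumes every structure looks like the sample in the question (a keyword line, optionally followed by value lines) and that STORE_ADDRESS should come first in the output. The KEYWORDS tuple and the function body are illustrative assumptions, not part of the original answer.

```python
import io

# Assumed keyword set and output order, taken from the sample in the question.
KEYWORDS = ("STORE_ADDRESS", "COLOR", "DESCRIPTION", "DIMENSION")

def rearrange_structure_and_write_into_file(structure, output_file):
    """Write one /begin throw ... /end throw block in the desired layout."""
    fields = {}      # keyword -> list of value strings, in input order
    current = None   # keyword whose values are currently being collected
    for raw in structure.splitlines():
        line = raw.strip()
        if not line or line in ("/begin throw", "/end throw"):
            continue
        head, _, rest = line.partition(" ")
        if head in KEYWORDS:
            current = head
            fields[current] = [rest] if rest else []
        elif current is not None:
            # A value line that belongs to the most recent keyword
            fields[current].append(line)
    output_file.write("/begin throw\n")
    for keyword in KEYWORDS:
        if keyword in fields:
            output_file.write(" ".join([keyword] + fields[keyword]) + "\n")
    output_file.write("/end throw\n")

# Try it on the sample block from the question, using an in-memory file.
sample = "\n".join([
    '/begin throw', 'COLOR red', 'DESCRIPTION', '"cashmere sofa throw"',
    '10', '10', '156876', 'DIMENSION', '140', '200',
    'STORE_ADDRESS 59110', '/end throw',
]) + "\n"
out = io.StringIO()
rearrange_structure_and_write_into_file(sample, out)
print(out.getvalue())
```

Because each structure is small, building a fresh dictionary per structure stays cheap, and nothing is ever inserted into or removed from a file-sized list.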
Reply from a uj5u.com user:
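Inserting and removing values from a long list is likely to make this code slower than it needs to be, and it also makes the code vulnerable to errors and difficult to reason about. If any entry has no store_address, the code will not work correctly: it will keep searching through the remaining entries until it finds a store address.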
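A small illustration of one such pitfall, using made-up data: list.remove deletes the first element equal to its argument, scanning from the front of the list, so with a store address that is identical in every entry, filedata.remove(filedata[index]) does not necessarily delete the line at index. Deleting by position with del avoids that, although both operations still shift every later element.

```python
# Two entries with an identical STORE_ADDRESS line (illustrative data only).
lines = ["/begin throw", "STORE_ADDRESS 59110", "/end throw",
         "/begin throw", "COLOR red", "STORE_ADDRESS 59110", "/end throw"]

by_value = list(lines)
by_value.remove(by_value[5])  # searches from the start and deletes index 1

by_index = list(lines)
del by_index[5]               # deletes exactly the element at index 5

print(by_value)
print(by_index)
```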
A better approach would be to break the code down into functions that parse each entry and output it:
KEYWORDS = ["STORE_ADDRESS", "COLOR", "DESCRIPTION", "DIMENSION"]

def parse_lines(lines):
    """ Parse throw data from lines in the old format """
    current_section = None
    r = {}
    for line in lines:
        words = line.strip().split(" ")
        if words[0] in KEYWORDS:
            if words[1:]:
                # Keyword with its value on the same line, e.g. "COLOR red"
                r[words[0]] = " ".join(words[1:])
            else:
                # Bare keyword: its values follow on the next lines
                current_section = r[words[0]] = []
        elif current_section is not None:
            current_section.append(line.strip())
    return r

def output_throw(throw):
    """ Output a throw entry as lines of text in the new format """
    yield "/begin throw"
    for keyword in KEYWORDS:
        if keyword in throw:
            value = throw[keyword]
            if isinstance(value, list):
                value = " ".join(value)
            yield f"{keyword} {value}"
    yield "/end throw"

with open(filename) as in_file, open("output.txt", "w") as out_file:
    entry = []
    for line in in_file:
        line = line.strip()
        if line == "/begin throw":
            entry = []
        elif line == "/end throw":
            # Entry complete: parse it and write it out in KEYWORDS order
            throw = parse_lines(entry)
            for out_line in output_throw(throw):
                out_file.write(out_line + "\n")
        else:
            entry.append(line)
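Alternatively, if you really need to maximize performance by removing all unnecessary operations, you could read, parse and write in a single long conditional, like this: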
with open(filename) as in_file, open("output.txt", "w") as out_file:
    entry = []          # buffered output lines of the current entry
    in_section = False  # True while collecting the values of a bare keyword

    def write(line):
        out_file.write(line + "\n")

    for line in in_file:
        line = line.strip()
        if not line:
            continue  # skip blank lines, which have no first word
        first = line.split()[0]
        if line == "/begin throw":
            in_section = False
            write(line)
            entry = []
        elif line == "/end throw":
            in_section = False
            for line_ in entry:
                write(line_)
            write(line)
        elif first == "STORE_ADDRESS":
            # Written immediately, so it lands right after "/begin throw"
            in_section = False
            write(line)
        elif line in KEYWORDS:
            in_section = True
            entry.append(line)
        elif first in KEYWORDS:
            in_section = False
            entry.append(line)
        elif in_section:
            # Value line: join it onto the keyword line it belongs to
            entry[-1] += " " + line
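To check the single-pass version against the sample from the question without touching real files, the same logic can be wrapped in a function that takes file-like objects. This is only a sketch: the function name convert and the use of io.StringIO are assumptions made for illustration.

```python
import io

KEYWORDS = ["STORE_ADDRESS", "COLOR", "DESCRIPTION", "DIMENSION"]

def convert(in_file, out_file):
    """Stream entries from in_file to out_file in the new layout."""
    entry = []          # buffered output lines of the current entry
    in_section = False  # True while collecting the values of a bare keyword
    for line in in_file:
        line = line.strip()
        if not line:
            continue
        first = line.split()[0]
        if line == "/begin throw":
            in_section = False
            out_file.write(line + "\n")
            entry = []
        elif line == "/end throw":
            in_section = False
            for buffered in entry:
                out_file.write(buffered + "\n")
            out_file.write(line + "\n")
        elif first == "STORE_ADDRESS":
            # Written at once; everything else is buffered until /end throw
            in_section = False
            out_file.write(line + "\n")
        elif line in KEYWORDS:
            in_section = True
            entry.append(line)
        elif first in KEYWORDS:
            in_section = False
            entry.append(line)
        elif in_section:
            entry[-1] += " " + line

# The sample block from the question, as an in-memory file.
source = io.StringIO("\n".join([
    '/begin throw', 'COLOR red', 'DESCRIPTION', '"cashmere sofa throw"',
    '10', '10', '156876', 'DIMENSION', '140', '200',
    'STORE_ADDRESS 59110', '/end throw',
]) + "\n")
result = io.StringIO()
convert(source, result)
print(result.getvalue())
```

Taking file-like objects rather than a filename means the same function works with open(...) handles for the real 900k-value file.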
