I have a txt file that looks like this:
Quod equidem non reprehendo;
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Quibus natura iure responderit non esse verum aliunde finem beate vivendi, a se principia rei gerendae peti; Quae enim adhuc protulisti, popularia sunt, ego autem a te elegantiora desidero. Duo Reges: constructio interrete. Tum Lucius: Mihi vero ista valde probata sunt, quod item fratri puto. Bestiarum vero nullum iudicium puto. Nihil enim iam habes, quod ad corpus referas; Deinde prima illa, quae in congressu solemus: Quid tu, inquit, huc? Et homini, qui ceteris animantibus plurimum praestat, praecipue a natura nihil datum esse dicemus?
=========================================================================
Planet    Number    festival      animal
                    colour        book
Mercury   First     firecrack     phone
Venus     Last      kite          computer
Earth     Country   rangoli       tv
Jupiter   C.COD     bomb
---------------------------------------------------------------------
11        4526      diwali        dog
                    holi          bigb
12        Joe       diwali        111
45        Doe       sankaranti    acer
65        UK        diwali        pan
67        22        diwali
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Planet    Number    festival      animal
                    colour        book
Mercury   First     firecrack     phone
Venus     Last      kite          computer
Earth     Country   rangoli       tv
Jupiter   C.COD     bomb
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
45        5637      ganesh        tiger
                    holi          cinema
67        micael    holi          222
78        john      diwali        xamoi
90        france    diwali        hp
34        34        diwali
I want to convert this text file to CSV format.
My code:
from itertools import groupby, chain

with open("file.txt", "r") as fin,\
     open("file.csv", "w") as fout:
    # group consecutive lines into blank / non-blank runs
    for key, group in groupby(fin, key=lambda line: bool(line.strip())):
        if key:
            # transpose the rows of the block into columns
            zipped = zip(*(line.rstrip().split() for line in group))
            fout.write(",".join(chain(*zipped)) + "\n")
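One likely reason this attempt misbehaves: zip(*rows) transposes rows into columns, but zip stops at the shortest row, and split() throws away where each value sat on the line. A small illustration using two lines from the header block (a sketch of the pitfall, not a fix):

```python
from itertools import zip_longest

# Two header lines after line.split(): "colour book" sits under the
# festival/animal columns in the file, but split() only yields 2 fields.
rows = [
    ["Planet", "Number", "festival", "animal"],
    ["colour", "book"],
]

# zip() truncates to the shortest row, so two whole columns disappear.
columns = list(zip(*rows))
print(columns)  # [('Planet', 'colour'), ('Number', 'book')]

# zip_longest() keeps all four columns, but the padding goes at the end,
# so "colour" and "book" are still attached to the wrong columns.
padded = list(zip_longest(*rows, fillvalue=""))
print(padded)  # [('Planet', 'colour'), ('Number', 'book'), ('festival', ''), ('animal', '')]
```

Getting the columns right therefore needs the character positions of the values, not just whitespace splitting.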
Answer:
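This does what you want. It's just a matter of gathering fields until we hit the trigger to write them out, skipping the leading text, and ignoring every header block except the first.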
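with open('file.txt') as fin, open('file.csv', 'w') as fout:
    gather = []      # fields collected since the last separator
    skipping = True  # True until the "====" line ends the preamble
    first = True     # only the first header block is written
    for line in fin:
        if skipping:
            skipping = line.find('====') < 0
        elif line.find('----') >= 0:
            # separator: flush the gathered fields, but drop repeated headers
            if gather and (first or gather[0] != 'Planet'):
                print(','.join(gather), file=fout)
            gather = []
            first = False
        else:
            gather.extend(line.strip().split())
    if gather:
        print(','.join(gather), file=fout)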
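Note that gathering with split() writes each block's fields in row order (Planet,Number,festival,animal,colour,...). If the column-major order of the file.csv result further down is what's wanted, and the blocks really are column-aligned (the read_fwf answer below makes the same assumption), a stdlib-only sketch could infer the column spans from the character positions that are non-blank in any line of a block, then read each column top to bottom. The helper names column_spans and flatten_block are hypothetical, and the sample block assumes a particular alignment:

```python
def column_spans(lines):
    """Infer (start, end) spans of fixed-width columns in a block.

    A position belongs to a column if any line has a non-blank
    character there; each maximal run of such positions is one column.
    Assumes columns are separated by at least one all-blank position.
    """
    width = max((len(line) for line in lines), default=0)
    used = [any(i < len(line) and not line[i].isspace() for line in lines)
            for i in range(width)]
    spans, start = [], None
    for i, flag in enumerate(used + [False]):  # sentinel closes the last run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            spans.append((start, i))
            start = None
    return spans


def flatten_block(lines):
    """Return the block's non-blank cells in column-major order."""
    cells = []
    for start, end in column_spans(lines):
        for line in lines:
            cell = line[start:end].strip()
            if cell:
                cells.append(cell)
    return cells


# Assumed alignment of the first header block.
block = [
    "Planet    Number    festival      animal",
    "                    colour        book",
    "Mercury   First     firecrack     phone",
    "Venus     Last      kite          computer",
    "Earth     Country   rangoli       tv",
    "Jupiter   C.COD     bomb",
]
print(",".join(flatten_block(block)))
```

This prints the same header line as the read_fwf answer's result. Each block could then be written with ",".join(flatten_block(lines)), skipping repeated header blocks as the answers here do.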
Answer:
I believe you can use the pandas library to convert the txt file to CSV:
# import the pandas library
import pandas as pd

# read the given file
# and create a DataFrame
dataframe1 = pd.read_csv("input_file.txt")

# store the DataFrame in a CSV file
dataframe1.to_csv('output_file.csv',
                  index=False)
Answer:
The relevant blocks of the file seem to have a roughly fixed-width column structure, so you could try using pandas.read_fwf on them:
from io import StringIO
from itertools import groupby
import pandas as pd

def keep(line): return bool(line.strip()) and not line.startswith("---")

with open('file.txt', 'r') as fin,\
     open('file.csv','w') as fout:
    # skip the leading text up to and including the "===" line
    while True:
        if next(fin).startswith("==="): break
    first = True
    for key, group in groupby(fin, key=keep):
        if key:
            # read the block as fixed-width columns, then flatten it column by column
            line = ",".join(
                pd.read_fwf(StringIO("".join(group)), header=None)
                .stack().sort_index(level=1).dropna().astype(str)
                .str.replace(r"^(-?\d+)\.0+$", r"\1", regex=True)
            ) + "\n"
            if first:
                header, first = line, False
                fout.write(line)
            elif line != header:
                fout.write(line)
Result in file.csv:
Planet,Mercury,Venus,Earth,Jupiter,Number,First,Last,Country,C.COD,festival,colour,firecrack,kite,rangoli,bomb,animal,book,phone,computer,tv
11,12,45,65,67,4526,Joe,Doe,UK,22,diwali,holi,diwali,sankaranti,diwali,diwali,dog,bigb,111,acer,pan
45,67,78,90,34,5637,micael,john,france,34,ganesh,holi,holi,diwali,diwali,diwali,tiger,cinema,222,xamoi,hp
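The groupby(fin, key=keep) loop relies on itertools.groupby grouping only consecutive items with the same key: each run of content lines between dashed separators becomes one group, and the key=False groups (the separators) are skipped. A small stdlib-only illustration with shortened lines from the sample file:

```python
from itertools import groupby


def keep(line):
    # same predicate as in the answer: non-blank and not a dashed separator
    return bool(line.strip()) and not line.startswith("---")


lines = [
    "Planet    Number\n",
    "Mercury   First\n",
    "--------------\n",
    "11        4526\n",
    "12        Joe\n",
    "--------------\n",
]

# Each consecutive run of keep() == True lines is joined into one block,
# which is the string the answer hands to pd.read_fwf via StringIO.
blocks = ["".join(group) for key, group in groupby(lines, key=keep) if key]
for block in blocks:
    print(repr(block))
```

Because grouping is by consecutive runs, a header block that appears again later in the file becomes a separate group, which is why the answer compares each flattened line against the first header.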
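If you don't care about the number formatting, you can remove the .str.replace(r"^(-?\d+)\.0+$", r"\1", regex=True) call. Without it, numeric columns that contain blank cells (such as the first column of each data block) are read as floats, so their values come out as 11.0, 12.0 and so on.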
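A quick stdlib check of what that substitution does: the pattern matches an optional minus sign, one or more digits, a decimal point and only zeros after it, and keeps the integer part. The helper name strip_float_zero is hypothetical:

```python
import re

# Pattern from the read_fwf answer: integer part, ".", then only zeros.
PATTERN = r"^(-?\d+)\.0+$"


def strip_float_zero(text):
    """Turn '11.0' back into '11'; leave every other string unchanged."""
    return re.sub(PATTERN, r"\1", text)


for value in ["11.0", "-3.00", "4526", "2.5", "C.COD"]:
    print(value, "->", strip_float_zero(value))
```

Values with a real fractional part such as 2.5, and non-numeric strings such as C.COD, pass through untouched, so the replacement is safe to apply to every stacked cell.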
But is this really the actual format of your file?
Please credit the source when reposting. Link to this article: https://www.uj5u.com/ruanti/441871.html
