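I have a large amount of sensor log data in the form of [key=value] pairs, and I need to parse it into proper data columns. I found this code, which solves a similar problem: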
import pandas as pd

# Read the file and keep only comma-separated lines
lines = []
with open('/path/to/test.txt', 'r') as infile:
    for line in infile:
        if "," not in line:
            continue
        else:
            lines.append(line.strip().split(","))

row_names = []
column_data = {}
# Pad shorter rows so every row has the same number of fields
max_length = max(len(line) for line in lines)
for line in lines:
    while len(line) < max_length:
        line.append(f'{len(line)-1}=NaN')
# The first two fields form the row name; the rest are key=value pairs
for line in lines:
    row_names.append(" ".join(line[:2]))
    for info in line[2:]:
        (k, v) = info.split("=")
        if k in column_data:
            column_data[k].append(v)
        else:
            column_data[k] = [v]
df = pd.DataFrame(column_data)
df.index = row_names
print(df)
df.to_csv('/path/to/test.csv')
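The code above works for data in the form "Priority=0, X=776517049", but my data looks like [Priority=0][X=776517049], with no separator between the two columns. How can I do this in Python? Here is a link to sample data, containing the raw data and, below it, the expected parsed data that I produced by hand: https://docs.google.com/spreadsheets/d/1EVTVL8RAkrSHhZO48xV1uEGqOzChQVf4xt7mHkTcqzs/edit?usp=sharing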
Answer from a uj5u.com user:
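I downloaded the sheet as a CSV file.
Because your file has several tables on one sheet, I limited the read to 100 rows; you can remove that parameter.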
raw = pd.read_csv(
    "logdata - Sheet1.csv",  # filename
    skiprows=1,              # skip the first row
    nrows=100,               # use 100 rows, remove in your example
    usecols=[0],             # only use the first column
    header=None              # your dataset has no column names
)
Then you can use a regular expression to extract the values:
df = raw[0].str.extract(r"\[Priority=(\d*)\] \[GPS element=\[X=(\d*)\] \[Y=(\d*)\] \[Speed=(\d*)\]")
And set the column names:
df.columns = ["Priority", "X", "Y", "Speed"]
Result:
    Priority          X          Y  Speed
0          0  776517049  128887449      4
1          0  776516816  128887733      0
2          0  776516816  128887733      0
3          0  776516833  128887166      0
4          0  776517200  128886133      0
5          0  776516883  128885933      8
.....................................
99         0  776494483  128908783      0
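The regex above hard-codes the four field names. If the set of keys varies from line to line, a more generic approach might be to collect every innermost [key=value] pair with `re.findall` and let pandas line up the columns. This is only a sketch under assumptions: the `sample_lines` below are made up to match the [Priority=0][X=776517049] shape described in the question, and `PAIR` and `parse_line` are hypothetical names, not part of the original answer.

```python
import re
import pandas as pd

# Matches one innermost [key=value] pair: the key has no brackets or "=",
# the value has no brackets. An outer wrapper such as "[GPS element=[X=..."
# is skipped, but the pairs nested inside it are still picked up.
PAIR = re.compile(r"\[([^\[\]=]+)=([^\[\]]*)\]")

def parse_line(line):
    """Return a dict of the [key=value] pairs found in one log line."""
    return dict(PAIR.findall(line))

# Made-up sample lines (assumption), in the shape described in the question
sample_lines = [
    "[Priority=0][X=776517049][Y=128887449][Speed=4]",
    "[Priority=0][X=776516816][Y=128887733]",  # Speed is missing here
]

# A list of dicts becomes one row per line; missing keys become NaN
parsed = pd.DataFrame([parse_line(line) for line in sample_lines])
print(parsed)
```

The extracted values are strings, so a numeric conversion (for example `pd.to_numeric` per column) would still be needed afterwards. If a key repeats within one line, `dict` keeps only the last occurrence, which may or may not be what you want for your logs.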
