I have a data source that I periodically download to a CSV file. It looks like this:
TABLE # 196712 / 9000_
>= 10 : 0.002
>= 5 : 0.001
>= 2 : 0.0005
>= 1 : 0.0002
>= 0.5 : 0.0001
>= 0.2 : 0.0001
>= 0.1 : 0.0001
>= 0.0001 : 0.0001
TABLE # 196714 / Dark
>= 0.0001 : 5e-05
TABLE # 196715 / GBD
>= 25 : 0.01
>= 10 : 0.005
>= 5 : 0.0025
>= 0.1 : 0.001
>= 0.0005 : 0.005
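I want to parse the file and sort the data into a dictionary, where the number after the hash sign is the unique id (the new dict key) and the lines that follow it (starting with >=) are the volume plus the associated penalty value.
Something like this: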
{196712: [(10,0.002),(5,0.001),(2,0.0005),(1,0.0002),(0.5,0.0001),(0.2,0.0001),(0.1,0.0001),(0.0001, 0.0001)],
196714: [(0.0001,5e-05)],
196715: [(25,0.01),(10,0.005),(5,0.0025),(0.1,0.001),(0.0005,0.005)]}
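Outside of Python I would filter this with grep and take the lines that follow each match, but the varying number of lines between ids makes that more complicated. I am also open to any other, more convenient data structure.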
Reply from a uj5u.com user:
Try:
s = """\
TABLE # 196712 / 9000_
>= 10 : 0.002
>= 5 : 0.001
>= 2 : 0.0005
>= 1 : 0.0002
>= 0.5 : 0.0001
>= 0.2 : 0.0001
>= 0.1 : 0.0001
>= 0.0001 : 0.0001
TABLE # 196714 / Dark
>= 0.0001 : 5e-05
TABLE # 196715 / GBD
>= 25 : 0.01
>= 10 : 0.005
>= 5 : 0.0025
>= 0.1 : 0.001
>= 0.0005 : 0.005"""
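import re

out = {}
# Outer regex: capture each table id and the block of lines that follows it,
# up to the next TABLE header or the end of the string.
for table, data in re.findall(
    r"^TABLE # (\d+).*?\n(.*?)(?=^TABLE|\Z)", s, flags=re.M | re.S
):
    table = int(table)
    # Inner regex: capture each "volume : penalty" pair inside the block.
    for a, b in re.findall(r"([\de.+-]+)\s*:\s*([\de.+-]+)", data):
        out.setdefault(table, []).append((float(a), float(b)))

print(out)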
Prints:
{
196712: [
(10.0, 0.002),
(5.0, 0.001),
(2.0, 0.0005),
(1.0, 0.0002),
(0.5, 0.0001),
(0.2, 0.0001),
(0.1, 0.0001),
(0.0001, 0.0001),
],
196714: [(0.0001, 5e-05)],
196715: [
(25.0, 0.01),
(10.0, 0.005),
(5.0, 0.0025),
(0.1, 0.001),
(0.0005, 0.005),
],
}
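The two regex passes in this answer can be easier to follow when wrapped in a function. The following is only a sketch under stated assumptions: the function name parse_tables and the shortened two-table sample are hypothetical, not part of the answer. Unlike the setdefault version, it also keeps a table that has no data lines, as an empty list.

```python
import re

# Hypothetical, shortened sample in the same format as the question's data.
sample = """\
TABLE # 196714 / Dark
>= 0.0001 : 5e-05
TABLE # 196715 / GBD
>= 25 : 0.01
>= 10 : 0.005
"""


def parse_tables(text):
    """Return {table_id: [(volume, penalty), ...]} for text in the TABLE format."""
    out = {}
    # Outer pass: one match per block, from a "TABLE # <id>" header up to
    # the next header (re.M lets ^ match at line starts) or the end of text.
    blocks = re.findall(
        r"^TABLE # (\d+).*?\n(.*?)(?=^TABLE|\Z)", text, flags=re.M | re.S
    )
    for table, data in blocks:
        # Inner pass: every "<number> : <number>" pair inside the block.
        pairs = re.findall(r"([\de.+-]+)\s*:\s*([\de.+-]+)", data)
        out[int(table)] = [(float(a), float(b)) for a, b in pairs]
    return out


parsed = parse_tables(sample)
print(parsed)
```

To run it on the downloaded file instead of an inline string, the text could be read first, for example with pathlib.Path(...).read_text(), and passed to the function.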
Reply from a uj5u.com user:
# Requires Python 3.10+ (structural pattern matching).
import fileinput
import re
from collections import defaultdict


def parse_records(lines):
    for l in lines:
        # A header line yields the table id as a plain string.
        if m := re.match(r'TABLE # (\d+) /.*', l):
            yield m.groups()[0]
        # A data line yields a (volume, penalty) tuple of strings.
        if m := re.match(r'>= (\S+)\s+: (.*)', l):
            yield m.groups()


result = defaultdict(list)
record_id = None
for l in parse_records(fileinput.input()):
    match l:
        # A 2-tuple is a data row; sequence patterns never match a str.
        case (volume, penalty):
            result[record_id].append((float(volume), float(penalty)))
        # Anything else is a table id.
        case table_id:
            record_id = int(table_id)

print("{")
for key, value in result.items():
    print(f" {key}: {value}")
print("}")
Run:
% python3 t2.py < input.txt
{
196712: [(10.0, 0.002), (5.0, 0.001), (2.0, 0.0005), (1.0, 0.0002), (0.5, 0.0001), (0.2, 0.0001), (0.1, 0.0001), (0.0001, 0.0001)]
196714: [(0.0001, 5e-05)]
196715: [(25.0, 0.01), (10.0, 0.005), (5.0, 0.0025), (0.1, 0.001), (0.0005, 0.005)]
}
%
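Since the question also asks for other approaches, here is a minimal regex-free sketch that walks the lines once and uses only string methods. The function name parse_lines and the short sample list are hypothetical; it assumes every header starts with "TABLE #" and every data line starts with ">=", as in the question's data.

```python
def parse_lines(lines):
    """Build {table_id: [(volume, penalty), ...]} from an iterable of lines."""
    result = {}
    current = None  # row list of the table currently being filled
    for raw in lines:
        line = raw.strip()
        if line.startswith("TABLE #"):
            # "TABLE # 196712 / 9000_" -> the id sits between '#' and '/'.
            table_id = int(line.split("#", 1)[1].split("/", 1)[0])
            current = result.setdefault(table_id, [])
        elif line.startswith(">=") and current is not None:
            # ">= 10 : 0.002" -> volume "10", penalty "0.002".
            volume, _, penalty = line[2:].partition(":")
            current.append((float(volume), float(penalty)))
    return result


# Hypothetical sample lines in the same format as the question's data.
sample_lines = [
    "TABLE # 196714 / Dark",
    ">= 0.0001 : 5e-05",
    "TABLE # 196715 / GBD",
    ">= 25 : 0.01",
]
tables = parse_lines(sample_lines)
print(tables)
```

Because the function accepts any iterable of lines, an open file object can be passed to it directly. Data lines that appear before the first TABLE header are skipped rather than stored under a None key.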
When reposting, please credit the source. Link to this article: https://www.uj5u.com/houduan/520936.html
