I scraped data from a website and managed to get the following output:
['Datas',
'1999',
'2000',
'2001',
'Receita',
'líquida',
'29592',
'49782',
'57511',
'Custos',
'-18937',
'-28938',
'-34855',
'IR',
'e',
'CSSL',
'-486',
'-4361',
'-3875',
[...]]
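I need to join the text entries together so I can later use them as column headers in Pandas, so I wrote the following loop with if statements as a test: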
array = ['FIRST',1,2,3,'PALAVRA',8,3,"FRASE","SEGUIDA",3,"CUSTO","DE","OPERAÇÃO",5]
arrayb = str(array).replace(",","\n").replace(" ","").replace("'","").replace("[","").replace("]","").replace("'","")
arrayc = arrayb.splitlines()
last_key = None
fszsr = {}
for i in arrayc:
    if i.isalpha():
        idx=arrayc.index(i)
        idxp1=idx+1
        idxp1b=arrayc[idxp1]
        if idxp1b.isalpha():
            idxp2=idxp1+1
            idxp2b=arrayc[idxp2]
            arrayc.remove(idxp1b)
            if idxp2b.isalpha():
                last_key=i+idxp1b+idxp2b
                arrayc.remove(idxp2b)
                fszsr[last_key] = [i+idxp1b+idxp2b]
            else:
                last_key=i+idxp1b
                fszsr[last_key] = [i+idxp1b]
        else:
            last_key=i
            fszsr[last_key] = [i]
    else:
        last_key=i
        fszsr[last_key].append(i)
fszsr
But running it only gives this:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
~\AppData\Local\Temp/ipykernel_19444/1702247249.py in <module>
     26     else:
     27         last_key=i
---> 28         fszsr[last_key].append(i)
     29 fszsr
KeyError: '1'
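I can't figure out what I'm doing wrong. I tried converting the list to a str, but it still doesn't work. If I just change the append line so that it only keeps last_key, I get this output: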
{'FIRST': ['FIRST'],
'1': ['1'],
'2': ['2'],
'3': ['3'],
'PALAVRA': ['PALAVRA'],
'8': ['8'],
'FRASESEGUIDA': ['FRASESEGUIDA'],
'CUSTODEOPERAÇÃO': ['CUSTODEOPERAÇÃO'],
'5': ['5']}
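Also, here is how I handled the scraped data. Maybe there is a better way to work with the raw data? This is the first output I got from scraping the website: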
'1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021\nReceita líquida\n29.592 49.782 57.511 69.176 95.742 108.201 136.605 158.238 170.577 215.118 182.710 213.273 244.176 281.379 304.889 337.259 321.638 282.589 283.695 349.836 302.245 272.069 393.450\nCustos\n-18.937 -28.938 -34.855 -44.205 -52.893 -63.100 -77.107 -94.665 -104.398 -141.623 -109.037 -136.051 -166.939 -210.472 -233.725 -256.335 -223.062 -192.611 -192.100 -225.293 -180.140 -148.107 -192.500\nResultado bruto\n10.655 20.844 22.656 24.971 42.849 45.101 59.498 63.573 66.179 73.495 73.673 77.222 77.237 70.907 71.164 80.924 98.576 89.978 91.595 124.543 122.105 123.962 200.950\nMargem bruta\n36% 42% 39% 36% 45% 42% 44% 40% 39% 34% 40% 36% 32% 25% 23% 24% 31% 32% 32% 36% 40% 46% 51%\nDespesas oper.\n-7.975 -6.815 -9.777 -14.236 -14.082 -15.209 -19.728 -21.625 -29.854 -24.591 -28.116 -31.647 -33.008 -39.431 -36.807 -102.841 -111.764 -73.496 -53.822 -59.667 -40.404 -74.341 19.601\nRes. operacional\n2.680 14.029 12.879 10.735 28.767 29.892 39.770 41.948 36.325 48.904 45.557 45.575 44.229 31.476 34.357 -21.917 -13.188 16.482 37.773 64.876 81.701 49.621 220.551\nMargem Oper.\n9% 28% 22% 16% 30% 28% 29% 27% 21% 23% 25% 21% 18% 11% 11% -6% -4% 6% 13% 19% 27% 18% 56%\nRes. Financeiro\n-399 309 1.032 1.166 -1.377 -3.171 -3.213 -1.341 -785 -698 -2.349 2.562 122 -3.722 -6.202 -3.899 -28.041 -27.185 -31.599 -21.100 -34.459 -49.584 -38.640\nIR e CSSL\n-486 -4.361 -3.875 -4.008 -7.815 -7.249 -10.802 -11.896 -11.272 -15.961 -9.977 -12.235 -11.241 -6.794 -5.147 3.892 6.058 -2.342 -5.797 -17.078 -16.400 6.209 -45.918\nOp. descontin.\n0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 10.128 0 0\nLucro líquido\n1.795 9.977 10.036 7.893 19.575 19.472 25.755 28.711 24.268 32.245 33.231 35.902 33.110 20.960 23.008 -21.924 -35.171 -13.045 377 26.698 40.970 6.246 135.993\nMargem líquida\n6% 20% 17% 11% 20% 18% 19% 18% 14% 15% 18% 17% 14% 7% 8% -7% -11% -5% 0% 8% 14% 2% 35%'
Thanks in advance!
Answer:
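You can read each element and check whether it is a number. If it isn't, use it to build the key; if it is, append it to the list stored under the most recent key. collections.defaultdict simplifies the code, although it isn't strictly necessary.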
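To see why the question's code raises KeyError and what defaultdict changes, here is a minimal sketch (the names `plain` and `auto` are illustrative only). In the question's final `else` branch, `last_key = i` replaces the label with the number itself (for example '1'), and that key was never created in the dict, so `.append` fails.

```python
from collections import defaultdict

# A plain dict raises KeyError when appending to a key that was never created,
# which is what the question's `else` branch does once last_key becomes '1'.
plain = {}
try:
    plain['1'].append('1')
except KeyError as exc:
    print('KeyError:', exc)  # KeyError: '1'

# defaultdict(list) creates an empty list the first time a key is accessed,
# so appending to a new key just works.
auto = defaultdict(list)
auto['FIRST'].append(1)
auto['FIRST'].append(2)
print(dict(auto))  # {'FIRST': [1, 2]}
```

Note that defaultdict alone would not fix the question's code: the numbers would still end up under their own keys because `last_key = i` overwrites the label. The essential change is to keep the last label as the key. With a plain dict, `plain.setdefault(key, []).append(value)` has the same effect as defaultdict.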
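from collections import defaultdict

# `array` is the test list from the question
out = defaultdict(list)
key = None
reset_key = True
for item in array:
    if str(item).isdigit():
        # a number: store it under the most recent label
        out[key].append(item)
        reset_key = True
    elif reset_key:
        # first word after a run of numbers: start a new label
        key = item
        reset_key = False
    else:
        # consecutive words: extend the current label
        key += f' {item}'
dict(out)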
Output:
{'FIRST': [1, 2, 3],
'PALAVRA': [8, 3],
'FRASE SEGUIDA': [3],
'CUSTO DE OPERAÇÃO': [5]}
Input:
array = ['FIRST',1,2,3,'PALAVRA',8,3,"FRASE","SEGUIDA",3,"CUSTO","DE","OPERAÇÃO",5]
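On the real scraped data the `str(item).isdigit()` check needs care: values such as '-18937' are negative, and '-18937'.isdigit() is False, so they would be treated as label words. Since the raw string from the question already puts each label on its own line followed by a line of values, one alternative is to parse it line by line. The sketch below assumes exactly that layout (years on the first line, then alternating label and value lines), that '.' is a thousands separator, and that a trailing '%' marks a percentage; `parse_raw` and `parse_value` are hypothetical helper names, not part of any library.

```python
from __future__ import annotations


def parse_value(token: str) -> float:
    """Turn one scraped token into a number.

    Assumption: '.' is a thousands separator (pt-BR style, '29.592' -> 29592)
    and a trailing '%' marks a percentage ('36%' -> 0.36).
    """
    if token.endswith('%'):
        return int(token[:-1].replace('.', '')) / 100
    return int(token.replace('.', ''))


def parse_raw(raw: str) -> tuple[list[str], dict[str, list[float]]]:
    """Split the raw scraped string into (years, {row label: values}).

    Assumes the layout shown in the question: the first line holds the years,
    then each row label sits on its own line, followed by one line of values.
    """
    lines = raw.strip().split('\n')
    years = lines[0].split()
    table = {}
    # Labels are on lines 1, 3, 5, ... and their values on the line after.
    for label, values in zip(lines[1::2], lines[2::2]):
        table[label] = [parse_value(v) for v in values.split()]
    return years, table


# Shortened excerpt of the raw string from the question.
sample = ('1999 2000 2001\nReceita líquida\n29.592 49.782 57.511\n'
          'Custos\n-18.937 -28.938 -34.855\nMargem bruta\n36% 42% 39%')
years, table = parse_raw(sample)
print(years)            # ['1999', '2000', '2001']
print(table['Custos'])  # [-18937, -28938, -34855]
```

If pandas is installed, `pd.DataFrame(table, index=years)` then gives one column per row label (Receita líquida, Custos, ...) and one row per year, which is the column-header layout the question is after. Multi-word labels stay intact because they are read as whole lines rather than rebuilt from individual words.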