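I am trying to scrape data from a website, but I am stuck: I either hit an "index out of range" error or end up with two separate rows per product in the .csv file. By "index out of range" I mean that some records on the site may have empty values, and I don't know how to put the right condition into the loop. I followed a few guides, but they got me nowhere.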
# Imports are not shown in the original post; these are assumed from the aliases used below.
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
import ssl
import certifi

my_url = uReq('website', context=ssl.create_default_context(cafile=certifi.where()))
uClient = my_url
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, "html.parser")
containers = page_soup.select('div.header__title, div.info__cta')
container = containers[0]
filename = "products.csv"
f = open(filename, "w")
headers = "Product_Name, PriceWithVAT, PriceWithoutVAT, Stock\n"
f.write(headers)
for container in containers:
    productName = container.findAll("span", {"class": "sku"})
    name = productName[0].text if container.findAll("span", {"class": "sku"}) else "lack name"
    priceWithVAT = container.findAll("span", {"class": "price-intax"})
    price = priceWithVAT[0].text if container.findAll("span", {"class": "price-intax"}) else "lack price"
    priceWithoutVAT = container.findAll("span", {"class": "price-extax"})
    priceNot = priceWithoutVAT[0].text if container.findAll("span", {"class": "price-extax"}) else "lack price2"
    stock = container.findAll("p", {"class": "stock in-stock"})
    stock = stock[0].text if container.findAll("p", {"class": "stock in-stock"}) else "lack on stock"
    f.write(name + "," + price + "," + priceNot + "," + stock + "\n" + "\n")
f.close()
Then, in the .csv file, I get results for the whole page, but each product is split across two rows, for example:
CORRECT,lack price,lack price2,lack on stock
lack name,CORRECT,CORRECT,CORRECT
My expected output:
CORRECT, CORRECT, CORRECT, CORRECT
(CORRECT means the right data scraped from the website)
When I remove
if container.findAll("span", {"class":"sku"}) else "lack name"
and the similar conditions from the loop, I get an index out of range error, which is expected, because some values are empty.
Could you help me fix the code?
Answer from a uj5u.com community member:
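You need to change your logic slightly here. Rather than treating the product name and the product information as separate containers, I would grab the whole container that holds everything for one product. You will notice that each product sits in an <li> tag inside a <ul> tag.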
So let's first get the <ul> tag whose class starts with 'products', and then get all the <li> tags inside it. We then iterate over each of them and extract the data we need.
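To illustrate why looping over each <li> gives one row per product, here is a minimal sketch on a made-up HTML fragment. The markup and the name SAMPLE_HTML are assumptions for illustration only; they mimic the structure described above, not the site's real page.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup that only mimics the structure described above.
SAMPLE_HTML = """
<ul class="products columns-4">
  <li>
    <div class="header__title"><span class="sku">A-1</span></div>
    <div class="info__cta"><span class="price-intax">10.00</span></div>
  </li>
  <li>
    <div class="header__title"><span class="sku">B-2</span></div>
    <div class="info__cta"><span class="price-intax">20.00</span></div>
  </li>
</ul>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Selecting the two div types separately yields two containers per product,
# which is why the original loop wrote two rows for each item.
split_containers = soup.select("div.header__title, div.info__cta")

# Selecting each <li> yields one container holding both pieces of data.
product_list = soup.find("ul", {"class": re.compile("^products")})
products = product_list.find_all("li")

pairs = [
    (li.find("span", {"class": "sku"}).text,
     li.find("span", {"class": "price-intax"}).text)
    for li in products
]
print(len(split_containers), len(products))
print(pairs)
```

With this fragment the selector from the question returns twice as many containers as there are products, while the <li> loop returns exactly one container per product.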
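As you said, some tags are missing, so we wrap each lookup in a try/except. It tries to get the data and, if that fails, falls back to the default value set in the except branch.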
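The same fallback idea can also be written as a small helper instead of repeated try/except blocks. This is only a sketch: the function name text_or_default and the sample fragment are hypothetical. It relies on find returning None when nothing matches, so no exception has to be caught. Note that it strips surrounding whitespace, which plain .text does not.

```python
from bs4 import BeautifulSoup

def text_or_default(parent, tag, css_class, default):
    """Return the stripped text of the first matching tag, or `default` if absent."""
    node = parent.find(tag, {"class": css_class})
    return node.get_text(strip=True) if node is not None else default

# Made-up fragment: the product has a name but no price.
html = '<li><span class="sku"> ZZ 1 </span></li>'
product = BeautifulSoup(html, "html.parser").li

name = text_or_default(product, "span", "sku", "lack name")
price = text_or_default(product, "span", "price-intax", "lack price")
print(name, "|", price)
```

Whether a helper or try/except reads better is a matter of taste; the helper keeps the loop body to one line per field.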
Also, pandas is a very good and useful library to use and learn, so I went with it instead of writing to the csv file by hand as you did.
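If adding pandas is not an option, the standard library's csv.DictWriter can write the same kind of list of dictionaries. A minimal sketch with made-up sample rows follows; it writes to an in-memory buffer, so for real output you would pass a file opened with newline="" instead of buffer.

```python
import csv
import io

# Made-up rows in the same shape the scraping loop builds.
rows = [
    {"productName": "A-1", "priceWithVAT": "10.00",
     "priceWithoutVAT": "8.13", "stock": 5},
    {"productName": "B, 2", "priceWithVAT": "lack price",
     "priceWithoutVAT": "lack price2", "stock": "lack on stock"},
]

fieldnames = ["productName", "priceWithVAT", "priceWithoutVAT", "stock"]

buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(rows)  # values containing commas are quoted automatically

csv_text = buffer.getvalue()
print(csv_text)
```

Unlike joining strings with "," by hand, the writer quotes any value that itself contains a comma, so a product name such as "B, 2" stays in one column.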
Code:
import requests
from bs4 import BeautifulSoup
import pandas as pd
import re

url = 'https://specjal.com/sklep/'
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")

# One <li> per product, inside the <ul> whose class starts with 'products'
products = soup.find('ul', {'class': re.compile('^products')}).find_all('li')

rows = []
for product in products:
    # find() returns None when the tag is missing, so .text raises AttributeError
    try:
        productName = product.find('span', {'class': 'sku'}).text
    except AttributeError:
        productName = 'lack name'

    try:
        priceWithVAT = product.find('span', {'class': 'price-intax'}).text
    except AttributeError:
        priceWithVAT = 'lack price'

    try:
        priceWithoutVAT = product.find('span', {'class': 'price-extax'}).text
    except AttributeError:
        priceWithoutVAT = 'lack price2'

    try:
        # The first word of the stock text is the quantity
        stock = int(product.find('p', {'class': 'stock in-stock'}).text.split()[0])
    except (AttributeError, IndexError, ValueError):
        stock = 'lack on stock'
        # consider changing the above line to stock = 0

    row = {
        'productName': productName,
        'priceWithVAT': priceWithVAT,
        'priceWithoutVAT': priceWithoutVAT,
        'stock': stock}
    rows.append(row)

df = pd.DataFrame(rows)
df.to_csv('products.csv', index=False)
Output:
print(df)
             productName   priceWithVAT    priceWithoutVAT          stock
 0       ZZ 90*105*4 VAY   14.86zł/szt.   12.08 zł bez VAT             10
 1       ZZ 85*100*5 VAY   13.76zł/szt.   11.19 zł bez VAT             10
 2        ZZ 80*95*4 VAY   12.66zł/szt.   10.29 zł bez VAT             20
 3        ZZ 75*90*4 VAY   11.01zł/szt.    8.95 zł bez VAT             20
 4        ZZ 70*85*4 VAY    9.91zł/szt.    8.06 zł bez VAT             20
 5        ZZ 65*80*5 VAY    9.36zł/szt.    7.61 zł bez VAT             20
 6        ZZ 65*80*4 VAY    9.36zł/szt.    7.61 zł bez VAT             20
 7        ZZ 60*75*5 VAY    8.25zł/szt.    6.71 zł bez VAT             14
 8        ZZ 55*65*4 VAY    7.71zł/szt.    6.27 zł bez VAT             10
 9        ZZ 50*60*4 VAY    6.61zł/szt.    5.37 zł bez VAT             20
10        ZZ 45*55*4 VAY    6.05zł/szt.    4.92 zł bez VAT             20
11        ZZ 40*50*4 VAY    5.39zł/szt.    4.38 zł bez VAT             17
12        ZZ 35*45*4 VAY     4.8zł/szt.     3.9 zł bez VAT             30
13        ZZ 30*40*4 VAY    4.26zł/szt.    3.46 zł bez VAT             20
14            XPA 710 CT   39.61zł/szt.    32.2 zł bez VAT  lack on stock
15           UCP 202 KBF    19.7zł/szt.   16.02 zł bez VAT  lack on stock
16        U298/U291 SET9  188.04zł/szt.  152.88 zł bez VAT  lack on stock
17             U 64*80*8    11.8zł/szt.    9.59 zł bez VAT              2
18              U 6*10*3    2.51zł/szt.    2.04 zł bez VAT              4
19        U 45*53*10 RSB    7.55zł/szt.    6.14 zł bez VAT  lack on stock
20     U 30*40*7 K21 NBR       8zł/szt.     6.5 zł bez VAT              5
21      U 180*200*14 K50   37.74zł/szt.   30.68 zł bez VAT  lack on stock
22     U 16*24*5,5 NI300    8.56zł/szt.    6.96 zł bez VAT             13
23      U 140*160*14 K50   21.92zł/szt.   17.82 zł bez VAT  lack on stock
24      U 140*160*14 K23   23.71zł/szt.   19.28 zł bez VAT              3
25          TR16*4*540MM   38.27zł/szt.   31.11 zł bez VAT  lack on stock
26          TP 600 8M/20   156.7zł/szt.   127.4 zł bez VAT  lack on stock
27             TP 15*1,5   27.56zł/szt.   22.41 zł bez VAT  lack on stock
28           ST 3568 LFT   94.34zł/szt.    76.7 zł bez VAT  lack on stock
29           SC07A87CS32   47.32zł/szt.   38.47 zł bez VAT  lack on stock
30        SC04B19CS31PX2    46.3zł/szt.   37.64 zł bez VAT              3
31                 R28-9   96.05zł/szt.   78.09 zł bez VAT              2
32           R 2-6 ZZ SS   13.47zł/szt.   10.95 zł bez VAT  lack on stock
33         QJ 213 MPA C3  412.06zł/szt.  335.01 zł bez VAT  lack on stock
34               PJ 1219    5.97zł/szt.    4.85 zł bez VAT  lack on stock
35        OW1 115*94*8,1   15.72zł/szt.   12.78 zł bez VAT              2
36       OGNIWO 08B-3 CL    7.23zł/szt.    5.88 zł bez VAT              7
37      NU 2311 ETVP2 C3  408.34zł/szt.  331.98 zł bez VAT  lack on stock
38         NJ 2210 ET C4  195.19zł/szt.  158.69 zł bez VAT              4
39           NJ 209 ETVP  101.89zł/szt.   82.84 zł bez VAT              2
40           NA 4901 CZH   11.64zł/szt.    9.46 zł bez VAT  lack on stock
41          MR 16277 2RS      32zł/szt.   26.02 zł bez VAT              4
42        ŁAŃCUCH 08 B-3   76.38zł/szt.    62.1 zł bez VAT             20
43            KP 16 L100   33.86zł/szt.   27.53 zł bez VAT  lack on stock
44          K 81130 SRBF  132.45zł/szt.  107.68 zł bez VAT              2
45      JL 68145/111 NAF   17.59zł/szt.    14.3 zł bez VAT  lack on stock
46  HTF O 45-7 A G5 N C3     lack price        lack price2  lack on stock
47          HRC 35*45*45   37.08zł/szt.   30.15 zł bez VAT              6
48             HK 3520 B   22.39zł/szt.    18.2 zł bez VAT  lack on stock
49           HGY 15*21*1    0.74zł/szt.     0.6 zł bez VAT              8
