I am trying to extract the list items from this page.
Code
import bs4
import requests
import pandas as pd

wagon_stock_url = 'https://parramattamg.com.au/up4053-961230-mg-hs-2020.html'
# Send a browser-like User-Agent header with the request.
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36'
}
response = requests.get(wagon_stock_url, headers=headers)
soup = bs4.BeautifulSoup(response.text, 'html.parser')
name = soup.select(".stockItemInfo")
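I know that soup.select(".stockItemInfo") returns the elements with that class as a list, but how do I get each item while iterating?
Reply from a uj5u.com user:
You are close to a solution. Just add li to your CSS selector, and you will get a result set containing all of the list elements: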
name = soup.select(".stockItemInfo li")
--> [<li> <span><strong>Vehicle</strong></span>: 2020 MG HS </li>, <li> <span><strong>Series</strong></span>: SAS23 MY20 </li>, <li> <span><strong>Badge</strong></span>: Vibe DCT FWD </li>, <li> <span><strong>Colour</strong></span>: White </li>, <li> <span><strong>Odometer</strong></span>: 11,213kms </li>, <li> <span><strong>Body</strong></span>: Wagon </li>, <li> <span><strong>Engine</strong></span>: 1.5 litre, 4-cylinder </li>, <li> <span><strong>Fuel Type</strong></span>: Petrol </li>, <li> <span><strong>Transmission</strong></span>: 7-speed Automatic </li>, <li> <span><strong>Doors</strong></span>: 5-door </li>, <li> <span><strong>Seats</strong></span>: 5 </li>, <li> <span><strong>Trim</strong></span>: Black </li>, <li> <span><strong>VIN</strong></span>: LSJA24U92LN012249 </li>, <li> <span><strong>Registration</strong></span>: EIT61T </li>, <li> <span><strong>Stock Number</strong></span>: UP4053 </li>, <li> <span><strong>MY</strong></span>: 20 </li>]
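To address the iteration part of the question directly: the result of select() is list-like, so a plain for loop visits each item. The sketch below runs against a small inline sample rather than the live page; the sample markup is modeled on the output above, and the wrapping div and ul elements are an assumption about the page structure.

```python
from bs4 import BeautifulSoup

# Sample markup modeled on the output above; the wrapping <div>/<ul> is assumed.
SAMPLE_HTML = """
<div class="stockItemInfo">
  <ul>
    <li> <span><strong>Vehicle</strong></span>: 2020 MG HS </li>
    <li> <span><strong>Series</strong></span>: SAS23 MY20 </li>
    <li> <span><strong>Colour</strong></span>: White </li>
  </ul>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# select() returns a list-like ResultSet of Tag objects, so it can be looped over.
item_texts = []
for li in soup.select(".stockItemInfo li"):
    # get_text(strip=True) strips each text fragment before joining them.
    item_texts.append(li.get_text(strip=True))

print(item_texts)
```

Each li in the loop is a Tag, so any of the usual Tag methods (get_text, select_one, attribute access) can be used inside the loop body.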
Or get only the names as a list:
names = [x.text for x in soup.select(".stockItemInfo li strong")]
--> ['Vehicle', 'Series', 'Badge', 'Colour', 'Odometer', 'Body', 'Engine', 'Fuel Type', 'Transmission', 'Doors', 'Seats', 'Trim', 'VIN', 'Registration', 'Stock Number', 'MY']
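If the goal is to look up a single field by its name, the names can serve as dictionary keys. This is only a sketch on inline sample markup (the wrapping div and ul are assumed, and the variable name specs is made up for illustration); it takes the label from the strong tag and treats everything after the first colon as the value.

```python
from bs4 import BeautifulSoup

# Sample markup modeled on the list items shown above; the wrapper elements are assumed.
SAMPLE_HTML = """
<div class="stockItemInfo">
  <ul>
    <li> <span><strong>Vehicle</strong></span>: 2020 MG HS </li>
    <li> <span><strong>Colour</strong></span>: White </li>
    <li> <span><strong>Engine</strong></span>: 1.5 litre, 4-cylinder </li>
  </ul>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

specs = {}
for li in soup.select(".stockItemInfo li"):
    # The label is the text of the <strong> tag inside the item.
    label = li.select_one("strong").get_text(strip=True)
    # Everything after the first colon in the item's text is treated as the value.
    _, _, value = li.get_text().partition(":")
    specs[label] = value.strip()

print(specs["Colour"])
```

A dictionary suits single lookups; for tabular work, a list of name/value records is usually the more convenient shape.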
Get a list of dictionaries with names and values
If you want to post-process the result, pass it to pd.DataFrame(data)...
data = []
for x in soup.select(".stockItemInfo li"):
    # Each item's text looks like "Vehicle: 2020 MG HS"; split it at the colon.
    item = x.text.strip().split(':')
    data.append({
        'name': item[0],
        'value': item[1]
    })
data
Output
[{'name': 'Vehicle', 'value': ' 2020 MG HS'},
{'name': 'Series', 'value': ' SAS23 MY20'},
{'name': 'Badge', 'value': ' Vibe DCT FWD'},
{'name': 'Colour', 'value': ' White'},
{'name': 'Odometer', 'value': ' 11,213kms'},
{'name': 'Body', 'value': ' Wagon'},
{'name': 'Engine', 'value': ' 1.5 litre, 4-cylinder'},
{'name': 'Fuel Type', 'value': ' Petrol'},
{'name': 'Transmission', 'value': ' 7-speed Automatic'},
{'name': 'Doors', 'value': ' 5-door'},
{'name': 'Seats', 'value': ' 5'},
{'name': 'Trim', 'value': ' Black'},
{'name': 'VIN', 'value': ' LSJA24U92LN012249'},
{'name': 'Registration', 'value': ' EIT61T'},
{'name': 'Stock Number', 'value': ' UP4053'},
{'name': 'MY', 'value': ' 20'}]
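As a rough illustration of the pd.DataFrame(data) step mentioned above, the sketch below loads a few of the records from the output into a DataFrame and strips the leading space that the split on ':' leaves in each value. The three rows are copied from the output; the lookup variable is a made-up name used only for this example.

```python
import pandas as pd

# A few records copied from the output above (values still carry a leading space).
data = [
    {'name': 'Vehicle', 'value': ' 2020 MG HS'},
    {'name': 'Colour', 'value': ' White'},
    {'name': 'Odometer', 'value': ' 11,213kms'},
]

df = pd.DataFrame(data)

# Remove the leading space left over from splitting on ':'.
df['value'] = df['value'].str.strip()

# Optional: index the values by name for quick lookups.
lookup = df.set_index('name')['value']
print(lookup['Odometer'])
```

Stripping inside the DataFrame keeps the scraping loop unchanged; the same cleanup could instead be done with item[1].strip() while building the list.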
Reply from a uj5u.com user:
A minimal working solution so far:
Code
import bs4
import requests
import pandas as pd

wagon_stock_url = 'https://parramattamg.com.au/up4053-961230-mg-hs-2020.html'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36'
}
response = requests.get(wagon_stock_url, headers=headers)
soup = bs4.BeautifulSoup(response.text, 'html.parser')

data = []
# Select only the <li> elements directly inside the list under .stockItemInfo.
items = soup.select(".stockItemInfo > ul > li")
for item in items:
    # Split "Vehicle: 2020 MG HS" at the first colon into a name and a value.
    name, value = item.get_text(strip=True).split(':', 1)
    data.append([name, value.strip()])

cols = ["Name", "Value"]
df = pd.DataFrame(data, columns=cols)
print(df)
# df.to_csv('info.csv', index=False)  # uncomment to save the data to a CSV file
Output:
            Name                  Value
0        Vehicle             2020 MG HS
1         Series             SAS23 MY20
2          Badge           Vibe DCT FWD
3         Colour                  White
4       Odometer              11,213kms
5           Body                  Wagon
6         Engine  1.5 litre, 4-cylinder
7      Fuel Type                 Petrol
8   Transmission      7-speed Automatic
9          Doors                 5-door
10         Seats                      5
11          Trim                  Black
12           VIN      LSJA24U92LN012249
13  Registration                 EIT61T
14  Stock Number                 UP4053
15            MY                     20
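One thing to watch when splitting on ':' is a value that itself contains a colon, or an item with no colon at all. A possible way to handle both cases is str.partition, sketched below. The helper name split_label_value and the second and third test strings are hypothetical and do not come from the page.

```python
def split_label_value(text):
    """Split 'Label: value' at the first colon (hypothetical helper).

    Returns (label, value); value is None when the text has no colon.
    """
    label, sep, value = text.partition(":")
    if not sep:
        # No colon found: keep the whole text as the label.
        return text.strip(), None
    return label.strip(), value.strip()


samples = [
    "Vehicle: 2020 MG HS",   # taken from the output above
    "Listed: 10:30 today",   # made-up value containing a colon
    "No separator here",     # made-up item without a colon
]
rows = [split_label_value(s) for s in samples]
print(rows)
```

Because partition only splits at the first colon and never raises when the separator is missing, the loop does not fail on an unexpected item the way indexing into split(':') would.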
Please credit the source when reposting. Article link: https://www.uj5u.com/houduan/370853.html
