Getting the href of an a tag

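I want to scrape data from the cinch.co.uk website. I am using Python with the BeautifulSoup4 and Requests libraries.

For each car ad, I want to follow its link and then scrape the car's data. This is the HTML and CSS of each ad. I can see that when I have not clicked on the h3 tag the text is ..., but when I click on it, it is different.

The problem I have is that when I get down to the h3 tag level (where the a tag is), BeautifulSoup does not seem to see the a tag: after I run ad = car.find('div', {'class': 'jB_k1'}).find('h3') and then print(ad), I get this. The only reference to the ad's link is in the a tag, so I cannot get the link from any other tag. Am I having this problem because the site uses ::before?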
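One way to narrow this down is to check whether the a tag exists at all in the HTML that gets parsed. The sketch below is only an illustration: the two sample HTML strings, the example href, and the helper name find_ad_href are hypothetical, while the class names jB_k1 and jB_dD are taken from the question. Note that ::before content is generated by CSS and is never part of the HTML, so BeautifulSoup would not see it either way; an empty h3 more likely means the link is added to the page after it loads.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the real page structure may differ.
STATIC_HTML = """
<article>
  <div class="jB_k1">
    <h3><a class="jB_dD" href="/used-cars/example-ad">Example car</a></h3>
  </div>
</article>
"""

# Hypothetical response in which the link is only filled in later by JavaScript.
EMPTY_HTML = '<article><div class="jB_k1"><h3></h3></div></article>'


def find_ad_href(car):
    """Return the href of the ad link inside `car`, or None if any level is missing."""
    container = car.find('div', {'class': 'jB_k1'})
    if container is None:
        return None
    heading = container.find('h3')
    if heading is None:
        return None
    link = heading.find('a', {'class': 'jB_dD'})
    if link is None:
        return None
    return link.get('href')


static_car = BeautifulSoup(STATIC_HTML, 'html.parser').find('article')
empty_car = BeautifulSoup(EMPTY_HTML, 'html.parser').find('article')

print(find_ad_href(static_car))
print(find_ad_href(empty_car))
```

If the same lookup returns None on the soup built from response.text, the link is probably not in the downloaded HTML, and no selector change will recover it.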

This is what I have tried so far:

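import requests
from bs4 import BeautifulSoup

# Shared session so that connections are reused across requests
session = requests.Session()

"""
Method to get the HTML of a page
website - URL of the page

return - HTML of the page
"""
def getData(website):
    response = session.get(website)
    soup = BeautifulSoup(response.text, 'html.parser')
    return soup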

"""
Method to get to  the next page
soup - html of a page

return - url of the next page or none if it doesn't exist
"""
def getNextPage(soup):
    pages = soup.find('ul', {'class' :'cf_gY'})
    pages = soup.find_all('li', {'class' : 'cf_kD'})
       
    website = None
    for page in pages:
        if page.find('a', {'aria-label' : 'Next page'}):
            website = 'http://www.cinch.co.uk' + str(page.find('a')['href'])
    
    return website
        
"""
Method to click onto a car ad
car - HTML of the car ad

return - URL of the car ad or none if it doesn't exist
"""
def getIntoPage(car):
    ad = 'https://www.cinch.co.uk' + car.find('a', {'class' : 'jB_dD'})['href']
    return ad

# website is assumed to already hold the URL of the first results page
while True:

    soup = getData(website)
    website = getNextPage(soup)
    nr = 1

    # finds all the cars
    cars = soup.find('ol', {'class': 'fJ_gY'})
    cars = soup.find_all('article', {'class': 'lC_gQ lC_RB'})

    for car in cars:

        ad = car.find('div', {'class': 'jB_k1'}).find('h3')
        getIntoPage(ad)
        break
    break

My break statements are only there to test a single ad, since there are a great many ads on the site.

Reply from a uj5u.com user:

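You are running into this problem because the site uses JavaScript, which the requests module cannot render. The only solution I have found so far is to use selenium with a webdriver, which loads the page in a browser so that the JavaScript runs. Unfortunately, as far as I know, the requests module cannot handle dynamic content.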
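The approach described in this reply could be sketched roughly as follows. This is not a tested scraper for the site, only an outline under assumptions: render_page and extract_ad_links are hypothetical helper names, the a.jB_dD selector is borrowed from the question and may change whenever the site is rebuilt, and Chrome plus a driver that Selenium can locate are assumed to be installed.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = 'https://www.cinch.co.uk'  # base URL used in the question


def render_page(url, timeout=10):
    """Load `url` in headless Chrome and return the HTML after JavaScript has run."""
    # Imported here so the parsing helper below also works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument('--headless')
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Assumption: ad links carry the class jB_dD, as in the question.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, 'a.jB_dD'))
        )
        return driver.page_source
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()


def extract_ad_links(html, base_url=BASE_URL):
    """Return absolute URLs for every ad link found in the given HTML."""
    soup = BeautifulSoup(html, 'html.parser')
    links = []
    for a in soup.find_all('a', {'class': 'jB_dD'}):
        href = a.get('href')
        if href:
            links.append(urljoin(base_url, href))
    return links
```

With a listing URL in hand, extract_ad_links(render_page(url)) would return the ad URLs, and the rest of the original BeautifulSoup code can keep working on the rendered HTML. Using urljoin instead of string concatenation avoids doubled or missing slashes when the href is relative.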

Tags: Python, html, css, web-scraping, beautifulsoup