Python BeautifulSoup cannot get data from a div with a specific class

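I'm developing a program that scrapes Metacritic for information about the movies in my library and displays it, but some parts, such as getting the rating, always return nothing. What am I doing wrong?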

from bs4 import BeautifulSoup
import requests
import os

def ratingsGet(headers, movie):
    movie = movie.lower().replace(" ","-")
    detail_link = "https://www.metacritic.com/movie/" + movie + "/details"
    detail_page = requests.get(detail_link, headers = headers) 
    soup = BeautifulSoup(detail_page.content, "html.parser")
    #g_data = soup.select('tr.movie_rating td.data span')
    g_data = soup.find_all("div", {"class": "movie_rating"})
    print(g_data)

    if g_data!= []:
        return g_data[0].text
    else:
        return "Failed"

def getMovieInfo():
    headers={'User-Agent': 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_5_0) AppleWebKit/536.1 (KHTML, like Gecko) Chrome/58.0.849.0 Safari/536.1'}
    
    for movie in os.listdir("D:/Movies/"):
        movie = movie.lower().replace(".mp4","")
        print(movie)
        print("Rating: " + ratingsGet(headers, movie))
        print("Home release year: " + rYearGet(headers, movie))
        break

HTML snippet:

<table class="details" summary="13 Going on 30 Details and Credits">
<tr class="runtime">
<td class="label">Runtime:</td>
<td class="data">98 min</td>
</tr>
<tr class="movie_rating">
<td class="label">Rating:</td>
<td class="data">
                                                                            Rated PG-13 for some sexual content and brief drug references.
                                                                    </td>
</tr>
<tr class="company">
<td class="label">Production:</td>
<td class="data">Revolution Studios</td>
</tr>

Answer:

As you said, you need to look for the "tr" (not the "div"). I'll add a couple of points to that answer.

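Try using just find (there is no need for find_all).
If the result of find is not None, call find again on that result to get only the text, like this: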

g_data.find("td", { "class": "data" }).text

The overall code would look like this:

def ratingsGet(headers, movie):
    movie = movie.lower().replace(" ","-")
    detail_link = "https://www.metacritic.com/movie/" + movie + "/details"
    detail_page = requests.get(detail_link, headers = headers)
    soup = BeautifulSoup(detail_page.content, "html.parser")
    g_data = soup.find("tr", {"class": "movie_rating"})

    # Check if that tr exists
    if g_data is not None:
        g_data = g_data.find("td", { "class": "data" })

    # Check if the td inside of it exists
    if g_data is not None:
        return g_data.text.strip()
    return "Failed"
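
To check the approach above without hitting the network, here is a minimal sketch that runs the same two-step find against a trimmed copy of the HTML fragment from the question. The function name extract_rating and the HTML constant are made up for this illustration, and the live Metacritic page may be structured differently from the fragment.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the HTML fragment from the question (not the live page).
HTML = """
<table class="details" summary="13 Going on 30 Details and Credits">
<tr class="runtime">
<td class="label">Runtime:</td>
<td class="data">98 min</td>
</tr>
<tr class="movie_rating">
<td class="label">Rating:</td>
<td class="data">
        Rated PG-13 for some sexual content and brief drug references.
</td>
</tr>
</table>
"""


def extract_rating(html):
    """Hypothetical helper: return the rating text, or None if it is missing."""
    soup = BeautifulSoup(html, "html.parser")
    # Step 1: find the row that carries the class.
    row = soup.find("tr", {"class": "movie_rating"})
    if row is None:
        return None
    # Step 2: find the data cell inside that row.
    cell = row.find("td", {"class": "data"})
    if cell is None:
        return None
    return cell.text.strip()


soup = BeautifulSoup(HTML, "html.parser")
# The original lookup: no <div> has this class, so the result is empty.
print(soup.find_all("div", {"class": "movie_rating"}))
# The corrected lookup returns the cell text with surrounding whitespace removed.
print(extract_rating(HTML))
```

Returning None instead of the string "Failed" is only a choice made for this sketch; it lets the caller tell a missing rating apart from real text.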

Answer:

I was simply looking for the wrong element...

def ratingsGet(headers, movie):
    movie = movie.lower().replace(" ","-")
    detail_link = "https://www.metacritic.com/movie/" + movie + "/details"
    detail_page = requests.get(detail_link, headers = headers)
    soup = BeautifulSoup(detail_page.content, "html.parser")
    #g_data = soup.select('tr.movie_rating td.data span')
    g_data = soup.find_all("tr", {"class": "movie_rating"})

    # Only index into the result once we know it is not empty
    if g_data != []:
        print(g_data[0].text.strip())
        return g_data[0].text
    else:
        return "Failed"
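
The commented-out select call in the code above may have failed for a related reason. Judging only from the HTML fragment in the question, the data cell contains no span, so a selector ending in "span" matches nothing. The sketch below assumes that fragment is representative of the page and uses select_one with the shorter selector; the HTML constant is a stand-in, not the live page.

```python
from bs4 import BeautifulSoup

# Minimal stand-in for the details table shown in the question.
HTML = """
<table class="details">
<tr class="movie_rating">
<td class="label">Rating:</td>
<td class="data">
    Rated PG-13 for some sexual content and brief drug references.
</td>
</tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")

# The commented-out selector ends in "span", but this cell holds no <span>.
with_span = soup.select("tr.movie_rating td.data span")

# Dropping "span" targets the <td> itself; select_one returns None on no match.
cell = soup.select_one("tr.movie_rating td.data")
rating = cell.get_text(strip=True) if cell is not None else "Failed"

print(with_span)
print(rating)
```

A CSS selector expresses the row and cell lookup in one step, which can be easier to read than two nested find calls, though both should give the same result on this fragment.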
