BeautifulSoup returns an empty list when scraping a YouTube channel

I'm trying to use this code to get some public information about a YouTube channel (the API isn't suitable for this task).

Code example:

import re
import json
import requests
from bs4 import BeautifulSoup

URL = "https://www.youtube.com/c/Rozziofficial/about"
soup = BeautifulSoup(requests.get(URL).content, "html.parser")

# We locate the JSON data using a regular-expression pattern
data = re.search(r"var ytInitialData = ({.*});", str(soup)).group(1)

# Uncomment to view all the data
# print(json.dumps(data))

# This converts the JSON data to a python dictionary (dict)
json_data = json.loads(data)

# This is the info from the webpage on the right-side under "stats", it contains the data you want
stats = json_data["contents"]["twoColumnBrowseResultsRenderer"]["tabs"][5]["tabRenderer"]["content"]["sectionListRenderer"]["contents"][0]["itemSectionRenderer"]["contents"][0]["channelAboutFullMetadataRenderer"]

print("Channel Views:", stats["viewCountText"]["simpleText"])
print("Joined:", stats["joinedDateText"]["runs"][1]["text"])

Expected result (this worked fine 6 months ago):

Joined: Jun 30, 2007

...but now I get:

AttributeError: 'NoneType' object has no attribute 'group'

The traceback shows that the error is on this line:

data = re.search(r"var ytInitialData = ({.*});", str(soup)).group(1)

Could you help fix this code so that it works again and returns the data?

Any help is appreciated, thanks.

A uj5u.com user replied:

Your code works fine:

import re
import json
import requests
from bs4 import BeautifulSoup

URL = "https://www.youtube.com/c/Rozziofficial/about"
soup = BeautifulSoup(requests.get(URL).content, "html.parser")

# We locate the JSON data using a regular-expression pattern
data = re.search(r"var ytInitialData = ({.*});", str(soup)).group(1)

# Uncomment to view all the data
# print(json.dumps(data))

# This converts the JSON data to a python dictionary (dict)
json_data = json.loads(data)

# This is the info from the webpage on the right-side under "stats", it contains the data you want
stats = json_data["contents"]["twoColumnBrowseResultsRenderer"]["tabs"][5]["tabRenderer"]["content"]["sectionListRenderer"]["contents"][0]["itemSectionRenderer"]["contents"][0]["channelAboutFullMetadataRenderer"]

print("Channel Views:", stats["viewCountText"]["simpleText"])
print("Joined:", stats["joinedDateText"]["runs"][1]["text"])

Output:

Channel Views: 1,12,94,125?? ???
Joined: 30 ???, 2007

A uj5u.com user replied:

You aren't actually using BeautifulSoup here. You're just fetching the raw text and searching it for a string.
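Since the lookup is really a plain string search, it can be done without BeautifulSoup at all. The following is a minimal sketch, not a confirmed fix: the helper name `extract_initial_data` is hypothetical, the marker text is copied from the regex in the question, and it assumes the page still embeds the data as a JSON object right after that marker. It returns `None` when the marker is missing instead of raising `AttributeError`, and uses `json.JSONDecoder.raw_decode` so that text after the JSON object is ignored.

```python
import json

# Marker taken from the regex in the question; whether the live page
# still contains it is an assumption.
MARKER = "var ytInitialData = "


def extract_initial_data(html):
    """Return the embedded JSON object as a dict, or None if the marker is absent."""
    start = html.find(MARKER)
    if start == -1:
        # The marker is missing, so there is nothing to parse
        return None
    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever follows it (for example the trailing ";</script>")
    obj, _end = json.JSONDecoder().raw_decode(html, start + len(MARKER))
    return obj


# Offline sample standing in for requests.get(URL).text
sample = '<script>var ytInitialData = {"contents": {"a": 1}};</script>'
print(extract_initial_data(sample))            # {'contents': {'a': 1}}
print(extract_initial_data("<html></html>"))   # None
```

With a real response you would pass `requests.get(URL).text` in place of `sample` and check for `None` before indexing into the result.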

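This is the problem with web scraping. YouTube changed their JavaScript, and that variable no longer exists. We don't know what you're looking for, but your current approach won't work. You may actually need to use Selenium to run the JavaScript and extract the information from the DOM.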
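Because the page layout can change without notice, a hard-coded path such as `["tabs"][5]` is also fragile. One possible way to soften that, sketched below, is to search the parsed data for the renderer key by name. The helper `find_first_key` and the trimmed `json_data` sample are hypothetical and only illustrate the idea; the key name `channelAboutFullMetadataRenderer` comes from the question's code, and there is no guarantee the live page still uses it.

```python
def find_first_key(node, key):
    """Depth-first search for `key` in nested dicts/lists; return its value or None."""
    if isinstance(node, dict):
        if key in node:
            return node[key]
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        # Strings, numbers, booleans and None have no children to search
        return None
    for child in children:
        found = find_first_key(child, key)
        if found is not None:
            return found
    return None


# Hypothetical, heavily trimmed stand-in for the parsed page data
json_data = {
    "contents": {"tabs": [
        {"tabRenderer": {}},
        {"tabRenderer": {"content": {
            "channelAboutFullMetadataRenderer": {
                "viewCountText": {"simpleText": "123 views"}
            }
        }}},
    ]}
}

stats = find_first_key(json_data, "channelAboutFullMetadataRenderer")
if stats is None:
    print("Renderer not found; the page layout may have changed")
else:
    print("Channel Views:", stats["viewCountText"]["simpleText"])
```

This does not help if the data is no longer in the page source at all; in that case a browser-driven approach such as the Selenium suggestion above would still be needed.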


Tags: Python, web scraping, BeautifulSoup, YouTube
