I am trying to use this code to get some public information about a YouTube channel (the API is not suitable for this task).
Code example:
import re
import json
import requests
from bs4 import BeautifulSoup
URL = "https://www.youtube.com/c/Rozziofficial/about"
soup = BeautifulSoup(requests.get(URL).content, "html.parser")
# We locate the JSON data using a regular-expression pattern
data = re.search(r"var ytInitialData = ({.*});", str(soup)).group(1)
# Uncomment to view all the data
# print(json.dumps(data))
# This converts the JSON data to a python dictionary (dict)
json_data = json.loads(data)
# This is the info from the webpage on the right-side under "stats", it contains the data you want
stats = json_data["contents"]["twoColumnBrowseResultsRenderer"]["tabs"][5]["tabRenderer"]["content"]["sectionListRenderer"]["contents"][0]["itemSectionRenderer"]["contents"][0]["channelAboutFullMetadataRenderer"]
print("Channel Views:", stats["viewCountText"]["simpleText"])
print("Joined:", stats["joinedDateText"]["runs"][1]["text"])
Expected result (this worked fine 6 months ago):
Joined: Jun 30, 2007
...but now I get:
AttributeError: 'NoneType' object has no attribute 'group'
The traceback shows that the error is on this line:
data = re.search(r"var ytInitialData = ({.*});", str(soup)).group(1)
Could you help fix this code so that it works again and returns the data?
Any help is appreciated, thanks.
Answer from a uj5u.com user:
Your code works fine.
import re
import json
import requests
from bs4 import BeautifulSoup
URL = "https://www.youtube.com/c/Rozziofficial/about"
soup = BeautifulSoup(requests.get(URL).content, "html.parser")
# We locate the JSON data using a regular-expression pattern
data = re.search(r"var ytInitialData = ({.*});", str(soup)).group(1)
# Uncomment to view all the data
# print(json.dumps(data))
# This converts the JSON data to a python dictionary (dict)
json_data = json.loads(data)
# This is the info from the webpage on the right-side under "stats", it contains the data you want
stats = json_data["contents"]["twoColumnBrowseResultsRenderer"]["tabs"][5]["tabRenderer"]["content"]["sectionListRenderer"]["contents"][0]["itemSectionRenderer"]["contents"][0]["channelAboutFullMetadataRenderer"]
print("Channel Views:", stats["viewCountText"]["simpleText"])
print("Joined:", stats["joinedDateText"]["runs"][1]["text"])
Output:
Channel Views: 1,12,94,125?? ???
Joined: 30 ???, 2007
Answer from a uj5u.com user:
You aren't actually using BeautifulSoup here. You are just fetching the raw text and searching it for a string.
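Since the lookup is only a string search, it can be run on the raw response text without BeautifulSoup. Below is a minimal sketch, not a confirmed fix: the helper name `find_initial_data` is hypothetical, and it assumes the page still embeds a `var ytInitialData = {...}` assignment, which may no longer be true. It checks the match before using it, so a missing variable yields `None` instead of the `AttributeError` from the question, and it lets the JSON decoder find the end of the object rather than relying on a greedy `({.*});` pattern.

```python
import json
import re

# Matches the assignment prefix only; the JSON object itself is parsed below.
_ASSIGNMENT = re.compile(r"var ytInitialData\s*=\s*")


def find_initial_data(html):
    """Return the ytInitialData object as a dict, or None if it is not found.

    `html` is the raw page text (for example, `requests.get(URL).text`).
    """
    match = _ASSIGNMENT.search(html)
    if match is None:
        # The variable is absent from this response; the caller decides what to do.
        return None
    try:
        # raw_decode parses one JSON value starting at the given index and
        # ignores whatever follows it (such as the trailing ";</script>").
        obj, _end = json.JSONDecoder().raw_decode(html, match.end())
    except json.JSONDecodeError:
        return None
    return obj
```

To use it with the code from the question, pass `requests.get(URL).text` and handle the `None` case before indexing into the result.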
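This is the problem with web scraping. YouTube changed their JavaScript, and that variable no longer exists. We don't know what you are looking for, but your current approach won't work. You may actually need to use Selenium to run the JavaScript and extract the information from the DOM.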
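Because the page layout can change without notice, a hard-coded path such as `["tabs"][5]` is also fragile even when the JSON is present. As a rough sketch under that assumption, the hypothetical helper `find_key` below searches the parsed data for a key by name instead of by position. Whether current pages still contain a `channelAboutFullMetadataRenderer` key is not confirmed here, so the result has to be checked for `None`.

```python
def find_key(node, key):
    """Return the first value stored under `key` in nested dicts/lists, else None."""
    if isinstance(node, dict):
        if key in node:
            return node[key]
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        # Strings, numbers, booleans and None have no children to search.
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None
```

With the question's code, `find_key(json_data, "channelAboutFullMetadataRenderer")` would stand in for the long indexed expression, as long as that key exists somewhere in the data.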
