I'm trying to use the following code to scrape all <a> tags from a website whose href attribute matches the pattern /how-to-use/[a-zA-Z]+
Here is the code:
import requests
from bs4 import BeautifulSoup
import re

webpage = requests.get('https://www.talkenglish.com/vocabulary/top-1500-nouns.aspx').content
soup = BeautifulSoup(webpage, "html.parser")

def has_how_to_use(tag):
    pattern = re.compile('\/how-to-use\/[a-zA-Z]+')
    return bool(re.search(pattern, tag.attr('href')))

word_list = soup.find_all(has_how_to_use)
But I keep getting an error saying that a NoneType object is not callable, and I can't tell which part is evaluating to a NoneType object.
Answer from a uj5u.com user:
You can pass a compiled regular expression to find_all() as the href keyword argument to find all <a> tags whose href contains your pattern:
soup = BeautifulSoup(webpage, "html.parser")
# Match <a> tags whose href contains /how-to-use/ followed by one or more letters
for tag in soup.find_all("a", href=re.compile(r"/how-to-use/[a-zA-Z]+")):
    print(tag)
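As for where the NoneType comes from: Beautiful Soup treats an unknown attribute on a tag, such as tag.attr, as a search for a child tag with that name. There is no <attr> element, so tag.attr is None, and calling it as tag.attr('href') raises the error. If you would rather keep the function-based filter, here is a minimal sketch using tag.get() with a guard for tags that have no href. The SAMPLE_HTML string is made up for illustration and stands in for the real page, so the snippet runs without a network request:

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page (an assumption for this demo).
SAMPLE_HTML = """
<a href="/how-to-use/time">time</a>
<a href="/vocabulary/top-1500-nouns.aspx">nouns</a>
<a name="top">anchor without href</a>
<p>not a link</p>
"""

# Compile once, outside the filter, since the filter runs for every tag.
PATTERN = re.compile(r"/how-to-use/[a-zA-Z]+")

def has_how_to_use(tag):
    # tag.get() returns None when the attribute is missing, so check before searching.
    href = tag.get("href")
    return tag.name == "a" and href is not None and bool(PATTERN.search(href))

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# tag.attr is resolved as "find a child tag named attr", which does not exist here.
first_link = soup.find("a")
print(first_link.attr)

word_list = soup.find_all(has_how_to_use)
hrefs = [tag["href"] for tag in word_list]
print(hrefs)
```

The href keyword form in the answer above is shorter and skips tags without an href for you. The function form is mainly useful when the condition involves more than one attribute.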
