I want to write a program that computes statistics for a YouTube channel. This is my code.
import re
import requests
from bs4 import BeautifulSoup
r = requests.get("https://filmot.com/channel/UCX6OQ3DkcsbYNE6H8uQQuVA")
soup = BeautifulSoup(r.text , "html.parser")
val=soup.find_all("span",attrs={"class":"badge"})
res = re.findall(r"class=\"fa fa-thumbs-up\"></i>(.*)\<" , str(val))
print(res)
But it returns this result, a single long string instead of one value per video:
['404.1K</span>, <span >Entertainment</span>, <span >8m1s</span>, <span >18 Dec 2021</span>, <span ><i aria-hidden="true" ></i>10M</span>, <span ><i aria-hidden="true" ></i>957.2K</span>, <span >Entertainment</span>, <span >12m9s</span>, <span >16 Dec 2021</span>, <span ><i aria-hidden="true" ></i>14.6M</span>, <span ><i aria-hidden="true" ></i>1.4M</span>, <span >Entertainment</span>, <span >12m4s</span>, <span >10 Dec 2021</span>, <span ><i aria-hidden="true" ></i>11.3M</span>, <span ><i aria-hidden="true" ></i>1.1M</span>, <span ><i aria-hidden="true" ></i>5.1K</span>, <span >Entertainment</span>, <span >11m1s</span>, <span >24 Nov 2021</span>, <span ><i aria-hidden="true" ></i>17.5M</span>, <span ><i aria-hidden="true" ></i>2.8M</span>, <span ><i aria-hidden="true" ></i>3.5K</span>, <span >Entertainment</span>, <span >25m41s</span>, <span >29 Oct 2021</span>, <span ><i aria-hidden="true" ></i>17M</span>, <span ><i aria-hidden="true" ></i>2M</span>, <span ><i aria-hidden="true" ></i>6K</span>, <span >Entertainment</span>, <span >4m55s</span>, <span >23 Oct 2021</span>, <span ><i aria-hidden="true" ></i>19.4M</span>, <span ><i aria-hidden="true" ></i>1.4M</span>, <span ><i aria-hidden="true" ></i>12.5K</span>, <span >Entertainment</span>, <span >15m42s</span>, <span >12 Oct 2021</span>, <span ><i aria-hidden="true" ></i>127.7K</span>, <span ><i aria-hidden="true" ></i>15.3K</span>, <span >Entertainment</span>, <span >5m20s</span>, <span >26 Sep 2021</span>, <span ><i aria-hidden="true" ></i>7.7M</span>, <span ><i aria-hidden="true" ></i>777.1K</span>, <span ><i aria-hidden="true" ></i>6.1K</span>, <span >Entertainment</span>, <span >8m2s</span>, <span >04 Sep 2021</span>, <span ><i aria-hidden="true" ></i>48.4M</span>, <span ><i aria-hidden="true" ></i>2.5M</span>, <span ><i aria-hidden="true" ></i>24.1K</span>, <span >Entertainment</span>, <span >12m40s</span>, <span >31 Aug 2021</span>, <span ><i aria-hidden="true" ></i>69.8M</span>, <span ><i 
aria-hidden="true" ></i>3M</span>, <span ><i aria-hidden="true" ></i>38.6K</span>, <span >Entertainment</span>, <span >19m25s</span>, <span >07 Aug 2021</span>, <span ><i aria-hidden="true" ></i>53.3M</span>, <span ><i aria-hidden="true" ></i>2.2M</span>, <span ><i aria-hidden="true" ></i>29.1K</span>, <span >Entertainment</span>, <span >16m40s</span>, <span >24 Jul 2021</span>, <span ><i aria-hidden="true" ></i>44.6M</span>, <span ><i aria-hidden="true" ></i>1.7M</span>, <span ><i aria-hidden="true" ></i>21.4K</span>, <span >Entertainment</span>, <span >10m45s</span>, <span >10 Jul 2021</span>, <span ><i aria-hidden="true" ></i>42.2M</span>, <span ><i aria-hidden="true" ></i>1.7M</span>, <span ><i aria-hidden="true" ></i>24.1K</span>, <span >Entertainment</span>, <span >11m34s</span>, <span >26 Jun 2021</span>, <span ><i aria-hidden="true" ></i>53.6M</span>, <span ><i aria-hidden="true" ></i>1.8M</span>, <span ><i aria-hidden="true" ></i>30.6K</span>, <span >Entertainment</span>, <span >12m33s</span>, <span >12 Jun 2021</span>, <span ><i aria-hidden="true" ></i>49.5M</span>, <span ><i aria-hidden="true" ></i>1.9M</span>, <span ><i aria-hidden="true" ></i>29.2K</span>, <span ....
I tested the pattern on regex101.com and the result there was correct.
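Reply from a uj5u.com user:
If you want to use a regular expression, a positive lookbehind works better in this case, for example
(?<=class=\"fa fa-thumbs-up\"></i>)[\d\w.]+ as in res = re.findall(r"(?<=class=\"fa fa-thumbs-up\"></i>)[\d\w.]+" , str(val)). .* can be tricky because . matches any character and * repeats it between zero and unlimited times (an example of a greedy regex operator), so the match runs on to the last < in the string rather than stopping at the first one.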
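To make the greedy behavior concrete, here is a minimal sketch on a made-up string. The `sample` value is an assumption shaped like `str(val)`, not data fetched from the real page. It compares the original `(.*)` pattern with the lookbehind version and with a non-greedy `(.*?)` variant.

```python
import re

# Hypothetical sample shaped like str(val); not fetched from the real page.
sample = (
    '[<span class="badge"><i aria-hidden="true" class="fa fa-thumbs-up"></i>404.1K</span>, '
    '<span class="badge">Entertainment</span>, '
    '<span class="badge"><i aria-hidden="true" class="fa fa-thumbs-up"></i>957.2K</span>]'
)

# Greedy: .* runs to the last "<" in the string, so everything collapses into one match.
greedy = re.findall(r'class="fa fa-thumbs-up"></i>(.*)<', sample)

# Lookbehind plus a restricted character class: the match ends before the next "<".
lookbehind = re.findall(r'(?<=class="fa fa-thumbs-up"></i>)[\w.]+', sample)

# Non-greedy alternative: .*? stops at the first "<" after each icon.
lazy = re.findall(r'class="fa fa-thumbs-up"></i>(.*?)<', sample)

print(len(greedy))
print(lookbehind)
print(lazy)
```

The greedy pattern yields one long string, which is what the question shows, while the other two yield one value per badge.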
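Reply from a uj5u.com user:
If you are already using BeautifulSoup, you do not need a regular expression.
Extract the text from every item in val that contains an i node with class fa fa-thumbs-up: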
for v in val:
    if v.find("i", attrs={'class': 'fa fa-thumbs-up'}):
        print(v.text)
Or collect them into a list:
values = [v.text for v in val if v.find("i", attrs={'class': 'fa fa-thumbs-up'})]
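The same idea can also be sketched with a CSS selector, which does not depend on the order in which the classes are written. The `html` string below is hypothetical markup shaped like the badges in the question (the `fa-eye` class is an invented placeholder for the view-count icon), so treat this as an illustration rather than a description of the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical markup shaped like the badges in the question; not the real page.
html = """
<span class="badge"><i aria-hidden="true" class="fa fa-eye"></i>10M</span>
<span class="badge"><i aria-hidden="true" class="fa fa-thumbs-up"></i>404.1K</span>
<span class="badge">Entertainment</span>
<span class="badge"><i aria-hidden="true" class="fa-thumbs-up fa"></i>957.2K</span>
"""

soup = BeautifulSoup(html, "html.parser")

# Select the thumbs-up icons that sit directly inside a badge span,
# then read the text of the enclosing span.
likes = [
    icon.parent.get_text(strip=True)
    for icon in soup.select("span.badge > i.fa-thumbs-up")
]
print(likes)
```

Only the two badges with a thumbs-up icon are kept; the view count and the category badge are skipped.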
Reply from a uj5u.com user:
You can probably skip most of the regex work: iterate over the ResultSet and apply a much simpler regex to each entry:
res = list()
for entry in val:
    if "fa-thumbs-up" in str(entry):
        tmp = re.search(r"</i>(.*)</span>", str(entry))
        if tmp:
            res.append(tmp.group(1))
Then:
print(res[:10])
Output:
['404.1K', '957.2K', '1.4M', '1.1M', '2.8M', '2M', '1.4M', '15.3K', '777.1K', '2.5M']
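Since the goal is to compute something from these values, the abbreviated strings need to become numbers first. The following is a minimal sketch, assuming K, M, and B mean thousand, million, and billion; `parse_count` and `MULTIPLIERS` are hypothetical names introduced here, and the results are only as precise as the rounded values shown on the page.

```python
from decimal import Decimal

# Assumed suffixes: K = thousand, M = million, B = billion.
MULTIPLIERS = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}

def parse_count(text):
    """Convert an abbreviated count such as '404.1K' to an int."""
    text = text.strip().upper()
    if text and text[-1] in MULTIPLIERS:
        # Decimal avoids binary floating-point rounding on values like 1.1.
        return int(Decimal(text[:-1]) * MULTIPLIERS[text[-1]])
    return int(Decimal(text))

res = ['404.1K', '957.2K', '1.4M']
counts = [parse_count(r) for r in res]
print(counts)
print(sum(counts))
```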
