I want to get only the target URL, without the redirect link around it. My code is:
from bs4 import BeautifulSoup
html = '<a href="/biz_redir?url=https://aceplumbingandrooter.com&cachebuster=1642876680&website_link_type=website&src_bizid=hqjCHBGnEj4nECnLJBvjQw&s=2caa69aa7350cca9ad00f1fd1d5a6346f341dd43e1ede874aa2eaa94d6a3458f" rel="noopener nofollow" role="link" target="_blank">https://aceplumbingandrooter.c…</a>'
soup = BeautifulSoup(html, 'lxml')
The tag's ['href'] value contains:
href="/biz_redir?url=https://aceplumbingandrooter.com&cachebuster=1642876680&website_link_type=website&src_bizid=hqjCHBGnEj4nECnLJBvjQw&s=2caa69aa7350cca9ad00f1fd1d5a6346f341dd43e1ede874aa2eaa94d6a3458f"
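I only want the linked site: aceplumbingandrooter.com
Answer from a uj5u.com user:
You can use the urllib.parse module. The URL you are looking for is one of the query parameters of /biz_redir, so we first need to extract the 'url' parameter from it.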
from urllib.parse import urlparse, parse_qs
url = '/biz_redir?url=https://aceplumbingandrooter.com&' \
'cachebuster=1642876680&website_link_type=website&' \
'src_bizid=hqjCHBGnEj4nECnLJBvjQw&s=2caa69aa7350cca9ad00' \
'f1fd1d5a6346f341dd43e1ede874aa2eaa94d6a3458f'
parsed_url = urlparse(url)
print(parse_qs(parsed_url.query)['url'][0])
This gives you the full URL, https://aceplumbingandrooter.com. You can then parse that URL again and take its netloc. Here is the complete code:
from urllib.parse import urlparse, parse_qs
url = '/biz_redir?url=https://aceplumbingandrooter.com&' \
'cachebuster=1642876680&website_link_type=website&' \
'src_bizid=hqjCHBGnEj4nECnLJBvjQw&s=2caa69aa7350cca9ad00' \
'f1fd1d5a6346f341dd43e1ede874aa2eaa94d6a3458f'
parsed_url = urlparse(url)
new = parse_qs(parsed_url.query)['url'][0]
new = urlparse(new)
print(new.netloc)
Output:
aceplumbingandrooter.com
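If you need to do this for many links, the two parsing steps can be wrapped in a small helper. The following is only a sketch: the function name `extract_target_host` and its `param` argument are made up for illustration, and it assumes the target is always carried in a single query parameter. It returns None instead of raising KeyError when the parameter is missing.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_target_host(href: str, param: str = "url") -> Optional[str]:
    """Return the host of the URL carried in a redirect link's query string.

    Hypothetical helper for illustration; returns None when the
    parameter is absent or the value has no host part.
    """
    # First pass: split the redirect link and read its query parameters.
    query = parse_qs(urlparse(href).query)
    values = query.get(param)
    if not values:
        return None
    # parse_qs percent-decodes values, so an encoded target URL works too.
    # Second pass: parse the target URL itself and keep only the host.
    host = urlparse(values[0]).netloc
    return host or None


href = "/biz_redir?url=https://aceplumbingandrooter.com&cachebuster=1642876680"
print(extract_target_host(href))      # aceplumbingandrooter.com
print(extract_target_host("/about"))  # None
```

With the question's soup object, the input would be the tag's ['href'] value, for example soup.a['href'].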
