I can't extract the pagination URLs from the page.
I tried something like this:
url = "https://in.indeed.com/jobs?q=software engineer &l=Kerala"
soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")
Links1 = soup.find_all("div",{"class:","pagination"})
url = [Links1.find(('a')['href'] for tag in Links1)]
WEbsite=f'https://in.indeed.com{url[0]}'
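But it doesn't return the full list of URLs. I need the URL so I can navigate to the next page.
A uj5u.com user replied:
Are you after only the "next page" link, or do you want all of the links?
So do you want: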
/jobs?q=software engineer &l=Kerala&start=10
Or are you after all of these?
/jobs?q=software engineer &l=Kerala&start=10
/jobs?q=software engineer &l=Kerala&start=20
/jobs?q=software engineer &l=Kerala&start=30
/jobs?q=software engineer &l=Kerala&start=40
/jobs?q=software engineer &l=Kerala&start=10
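A few issues:
Links1 is a list of elements, because find_all() returns a list. You then call .find('a') on that list, which doesn't work. Also, since you need the href attribute, consider using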
find('a',href=True)
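To make the difference concrete, here is a minimal sketch that runs against a small inline HTML snippet instead of the live site. The markup is a simplified, hypothetical stand-in for Indeed's pagination block, not its real HTML.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup standing in for the real pagination block.
html = """
<div class="pagination">
  <a href="/jobs?start=10">2</a>
  <a href="/jobs?start=20">3</a>
  <a>no href on this one</a>
</div>
"""
soup = BeautifulSoup(html, "html.parser")

divs = soup.find_all("div", {"class": "pagination"})  # a list-like ResultSet of Tags
div = soup.find("div", {"class": "pagination"})       # a single Tag (or None)

# Calling .find() on the ResultSet raises AttributeError.
try:
    divs.find("a")
    list_find_failed = False
except AttributeError:
    list_find_failed = True

# Calling .find() on a single Tag works; href=True skips <a> tags without an href.
first_href = div.find("a", href=True)["href"]
print(list_find_failed, first_href)
```

So either iterate over the result of find_all() and call .find() on each element, or call find() once and work with the single element it returns.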
So this is how I would do it:
import requests
from bs4 import BeautifulSoup
url = "https://in.indeed.com/jobs?q=software engineer &l=Kerala"
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36'}
soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")
Links1 = soup.find_all("div",{"class":"pagination"})
url = [tag.find('a',href=True)['href'] for tag in Links1]
website=f'https://in.indeed.com{url[0]}'
Output:
print(website)
https://in.indeed.com/jobs?q=software engineer &l=Kerala&start=10
To get all of those links:
import requests
from bs4 import BeautifulSoup
url = "https://in.indeed.com/jobs?q=software engineer &l=Kerala"
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36'}
soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")
Links1 = soup.find("div",{"class":"pagination"})
urls = [tag['href'] for tag in Links1.find_all('a',href=True)]
website=f'https://in.indeed.com{urls[0]}'
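Since the pagination block can list the same href twice (the "next" link repeats start=10 in the list above), it may help to turn every relative href into a full URL and drop the repeats. The following is only a sketch: absolute_links is a hypothetical helper name, and the sample hrefs are simplified placeholders rather than real scraped values.

```python
from urllib.parse import urljoin

BASE = "https://in.indeed.com"

def absolute_links(hrefs, base=BASE):
    """Join relative hrefs to the base URL, dropping duplicates but keeping order."""
    seen = set()
    result = []
    for href in hrefs:
        full = urljoin(base, href)
        if full not in seen:
            seen.add(full)
            result.append(full)
    return result

# Placeholder hrefs; in practice this would be the `urls` list built above.
sample_hrefs = [
    "/jobs?q=dev&start=10",
    "/jobs?q=dev&start=20",
    "/jobs?q=dev&start=10",  # the "next" link repeats the first page link
]
pages = absolute_links(sample_hrefs)
print(pages)
```

urljoin comes from the standard library, so nothing beyond requests and bs4 is needed for the scraping itself.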
A uj5u.com user replied:
You should use find() instead of find_all(); then this modified list of URLs should work:
Links1 = soup.find("div",{"class":"pagination"})
urls = [i['href'] for i in Links1.find_all('a') if 'href' in i.attrs]
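One thing to keep in mind with find(): it returns None when nothing matches, for example if the site serves a different page than expected. A hedged sketch of a guard around that case follows; pagination_hrefs is a hypothetical helper name, and the HTML strings are made-up test inputs.

```python
from bs4 import BeautifulSoup

def pagination_hrefs(html):
    """Return the hrefs inside div.pagination, or an empty list if the div is missing."""
    soup = BeautifulSoup(html, "html.parser")
    div = soup.find("div", {"class": "pagination"})
    if div is None:  # find() found no pagination block
        return []
    return [a["href"] for a in div.find_all("a", href=True)]

# Made-up inputs: one page with a pagination block, one without.
with_links = pagination_hrefs(
    '<div class="pagination"><a href="/jobs?start=10">2</a><a>no href</a></div>'
)
without_links = pagination_hrefs("<p>no pagination here</p>")
print(with_links, without_links)
```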
