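I want to scrape this dictionary for its different verbs. Each verb appears at "https://www.spanishdict.com/conjugate/" plus the verb. So, for example, for the verb "hacer" we have: https://www.spanishdict.com/conjugate/hacer
I want to scrape all possible links that contain a verb conjugation and return them as a list of strings. So I did the following: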
import requests
from bs4 import BeautifulSoup
url = 'https://www.spanishdict.com/conjugate/'
for i in url:
    reqs = requests.get(url + str())
    soup = BeautifulSoup(reqs.text, 'html.parser')
    urls = []
    for link in soup.find_all('a'):
        urls.append(link.get('href'))
    print(urls)
But when I print the URLs, I only get some empty lists.
Sample of the expected output:
['https://www.spanishdict.com/conjugate/hacer', 'https://www.spanishdict.com/conjugate/tener',...etc]
Reply from a uj5u.com user:
When you iterate over "url", you are iterating over a string. Look at this code:
url = 'https://www.spanishdict.com/conjugate/'
for i in url:
    print(i)
This yields each character of the URL:
h
t
t
p
s
:
/
/
w
w
w
<truncated>
There is also a mistake here:
reqs = requests.get(url + str())
I'm not sure what you are trying to do, but 'url + str()' is just the URL plus an empty string, i.e. the URL itself.
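To make both points concrete, here is a minimal sketch. It assumes the intent was one URL per verb; the `verbs` list is hypothetical (it only contains the two verbs named in the question), and `BASE` is just a local name for the URL prefix.

```python
BASE = 'https://www.spanishdict.com/conjugate/'

# str() called with no arguments returns the empty string,
# so appending it leaves the URL unchanged.
assert BASE + str() == BASE

# Hypothetical verb list (the two verbs mentioned in the question).
verbs = ['hacer', 'tener']

# Loop over the verbs, not over the characters of the URL string.
verb_urls = [BASE + verb for verb in verbs]
print(verb_urls)
```

The loop variable has to come from a collection of verbs; the URL string itself only ever yields single characters.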
If you remove the for loop and the unnecessary empty string, you get what I think you were trying to get:
import requests
from bs4 import BeautifulSoup
url = 'https://www.spanishdict.com/conjugate/'
reqs = requests.get(url)
soup = BeautifulSoup(reqs.text, 'html.parser')
urls = []
for link in soup.find_all('a'):
    urls.append(link.get('href'))
print(urls)
This produces:
['/', '/learn', '/translation', '/conjugation', '/vocabulary', '#', '/translation', '/conjugation', '/vocabulary', '/guide', '/pronunciation', '/wordoftheday', '/learn', '/guide/spanish-present-tense-forms', '/guide/spanish-present-progressive-forms', '/guide/spanish-preterite-tense-forms', '/guide/spanish-imperfect-tense-forms', '/guide/simple-future-regular-forms-and-tenses', '/guide/spanish-present-subjunctive', '/guide/commands', '/guide/spanish-imperfect-subjunctive', '/guide', '/drill?drill_start_source=conjugation hubpage', 'https://play.google.com/store/apps/details?id=com.spanishdict.spanishdict&referrer=utm_campaign=adhesion', '/wordoftheday', '/translate/patinar', '/', 'https://www.ingles.com/verbos', 'https://www.curiositymedia.com/', 'https://help.spanishdict.com/', '/company/privacy', '/company/tos', '/sitemap', '/', 'https://www.ingles.com/verbos', '/translation', '/conjugation', '/vocabulary', '/learn', '/guide', '/wordoftheday', 'https://www.curiositymedia.com/', '/company/privacy', '/company/tos', '/sitemap', 'https://help.spanishdict.com/', 'https://help.spanishdict.com/contact', 'https://www.facebook.com/pages/SpanishDict/92805940179', 'https://twitter.com/spanishdict', 'https://www.instagram.com/spanishdict/', 'https://itunes.apple.com/us/app/spanishdict/id332510494', 'https://play.google.com/store/apps/details?id=com.spanishdict.spanishdict&referrer=utm_source=sd-footer']
Is this list of links what you are after?
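Most of the hrefs in that list are relative paths. As a rough sketch, `urllib.parse.urljoin` from the standard library can turn them into absolute URLs, after which they can be filtered for conjugation pages. The `hrefs` sample below is copied from the output above; `BASE`, `absolute` and `conjugate_links` are names made up for this illustration.

```python
from urllib.parse import urljoin

BASE = 'https://www.spanishdict.com/conjugate/'

# A few hrefs copied from the printed list above.
hrefs = ['/', '/learn', '/translate/patinar', 'https://www.ingles.com/verbos']

# Resolve each href against the page URL; absolute hrefs pass through unchanged.
absolute = [urljoin(BASE, h) for h in hrefs]

# Keep only links that point below /conjugate/ (excluding the hub page itself).
conjugate_links = [u for u in absolute if '/conjugate/' in u and u != BASE]

print(absolute)
print(conjugate_links)
```

For this sample the filtered list is empty, which matches the observation that the static HTML of the page does not contain per-verb conjugation links.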
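Reply from a uj5u.com user:
Edit
Hopefully I understand what you mean; if so, the question should be improved. To get the information out of the JavaScript, you can parse the response with a regular expression: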
import requests
import json
import re
r = requests.get('https://www.spanishdict.com/conjugation')
m = re.search(r'window.SD_COMPONENT_DATA = ({.*})', r.text)
['https://www.spanishdict.com/conjugate/' + w for x in json.loads(m.group(1))['searchQuickLinkSections'] for w in x['words']]
Output
['https://www.spanishdict.com/conjugate/tener',
'https://www.spanishdict.com/conjugate/hacer',
'https://www.spanishdict.com/conjugate/ser',
'https://www.spanishdict.com/conjugate/estar',
'https://www.spanishdict.com/conjugate/haber',
'https://www.spanishdict.com/conjugate/ir',
'https://www.spanishdict.com/conjugate/poder',
'https://www.spanishdict.com/conjugate/decir',
'https://www.spanishdict.com/conjugate/cerrar',
'https://www.spanishdict.com/conjugate/mentir',
'https://www.spanishdict.com/conjugate/dormir',
'https://www.spanishdict.com/conjugate/recordar',
'https://www.spanishdict.com/conjugate/seguir',
'https://www.spanishdict.com/conjugate/medir',
'https://www.spanishdict.com/conjugate/adquirir',
'https://www.spanishdict.com/conjugate/jugar',
'https://www.spanishdict.com/conjugate/vestirse',
'https://www.spanishdict.com/conjugate/divertirse',
'https://www.spanishdict.com/conjugate/acostarse',
'https://www.spanishdict.com/conjugate/ponerse',
'https://www.spanishdict.com/conjugate/despertarse',
'https://www.spanishdict.com/conjugate/sentirse',
'https://www.spanishdict.com/conjugate/levantarse',
'https://www.spanishdict.com/conjugate/sentarse',
'https://www.spanishdict.com/conjugate/gustar',
'https://www.spanishdict.com/conjugate/alegrar',
'https://www.spanishdict.com/conjugate/quedar',
'https://www.spanishdict.com/conjugate/encantar',
'https://www.spanishdict.com/conjugate/parecer',
'https://www.spanishdict.com/conjugate/faltar',
'https://www.spanishdict.com/conjugate/doler',
'https://www.spanishdict.com/conjugate/interesar']
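The regex approach can be tried offline as well. The sketch below runs the same idea against a small hand-written string instead of the live page: `sample_html` is invented for illustration and only mimics the `searchQuickLinkSections` / `words` shape used above, and `extract_verb_urls` is a hypothetical helper name. It also returns an empty list when the pattern is not found, since `re.search` returns `None` in that case.

```python
import json
import re

BASE = 'https://www.spanishdict.com/conjugate/'

# Invented sample; the real page embeds a much larger JSON object.
sample_html = (
    '<script>window.SD_COMPONENT_DATA = '
    '{"searchQuickLinkSections": [{"words": ["tener", "hacer"]}, {"words": ["ser"]}]};'
    '</script>'
)

def extract_verb_urls(html):
    """Hypothetical helper: pull the embedded JSON out of the page and build the URLs."""
    # Greedy match from the first "{" to the last "}" on the same line.
    m = re.search(r'window\.SD_COMPONENT_DATA = (\{.*\})', html)
    if m is None:
        return []  # pattern not present, e.g. the page layout changed
    data = json.loads(m.group(1))
    return [BASE + w for section in data['searchQuickLinkSections'] for w in section['words']]

print(extract_verb_urls(sample_html))
print(extract_verb_urls('<html></html>'))
```

Because the match is greedy and `.` does not cross newlines, this relies on the JSON object sitting on a single line, as it does in the sample.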
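To get your expected output you need a list of verbs. Your question does not name a source for one, so as a starting point for generating this information I used the verbs-top-500 list and a list comprehension.
For every <a> whose href contains translate, it concatenates your URL with the verb, which is the text of the <div> that is a direct child of the <a>: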
['https://www.spanishdict.com/conjugate/' + a.div.text for a in soup.select('a[href*="translate"]')]
Example
import requests
from bs4 import BeautifulSoup
url = 'https://www.spanishdict.com/lists/1690101/verbs-top-500'
# Browser-like User-Agent header sent with the request
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36'
}
r = requests.get(url, headers=headers)
soup = BeautifulSoup(r.text, 'lxml')
# Build one conjugation URL per <a> whose href contains "translate/"
urls = ['https://www.spanishdict.com/conjugate/' + a.div.text for a in soup.select('a[href*="translate/"]')]
Output
['https://www.spanishdict.com/conjugate/procurar', 'https://www.spanishdict.com/conjugate/podar', 'https://www.spanishdict.com/conjugate/pillar', 'https://www.spanishdict.com/conjugate/perrear', 'https://www.spanishdict.com/conjugate/perfeccionar', 'https://www.spanishdict.com/conjugate/perdonar', 'https://www.spanishdict.com/conjugate/pegar', 'https://www.spanishdict.com/conjugate/pasear', 'https://www.spanishdict.com/conjugate/ordenar', 'https://www.spanishdict.com/conjugate/ondear', 'https://www.spanishdict.com/conjugate/ojalar', 'https://www.spanishdict.com/conjugate/ocultar', 'https://www.spanishdict.com/conjugate/nombrar',...]
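To see what the selector and `a.div.text` do without hitting the site, here is a small sketch on hand-written markup. The `sample_html` is invented and the real list page may be structured differently. Two details it illustrates: `a.div` returns the first `<div>` found anywhere inside the `<a>` (not strictly a direct child), and it is `None` when the `<a>` contains no `<div>`, so the comprehension skips those anchors instead of raising `AttributeError`.

```python
from bs4 import BeautifulSoup

BASE = 'https://www.spanishdict.com/conjugate/'

# Invented markup for illustration; the real page may differ.
sample_html = '''
<a href="/translate/procurar"><div>procurar</div><div>to try</div></a>
<a href="/translate/podar"><div>podar</div></a>
<a href="/translate/pillar">pillar</a>
<a href="/guide">Guide</a>
'''

# html.parser ships with Python, so no extra parser package is needed here.
soup = BeautifulSoup(sample_html, 'html.parser')

urls = [
    BASE + a.div.get_text(strip=True)
    for a in soup.select('a[href*="translate/"]')
    if a.div is not None  # anchors without a <div> are skipped
]
print(urls)
```

The third anchor matches the selector but has no `<div>`, and the fourth does not match the selector at all, so only the first two verbs end up in the list.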
