How do I get all the "href" links from the company profile boxes?

Can anyone tell me what is wrong? I am new to Python and I want to get all the links from this page. Here is my code:

import requests
from bs4 import BeautifulSoup
import pandas as pd

re=requests.get('https://www.industrystock.com/en/companies/Agriculture')
re
soup = BeautifulSoup(re.text, 'lxml')
link_list = []
page1 = soup.find_all('a', class_ = 'btn awe-info gotoJS iconColor_white')
page1
for i in page1:
        link = (i.get('href'))
        link_list.append(link)

Reply from a uj5u.com user:

The company profile links are stored in the data-href attribute, not in href:

import requests
from bs4 import BeautifulSoup

r = requests.get("https://www.industrystock.com/en/companies/Agriculture")
soup = BeautifulSoup(r.content, "lxml")
page1 = soup.find_all("a", class_="btn awe-info gotoJS iconColor_white")
for i in page1:
    print(i["data-href"])

Prints:

https://www.industrystock.com/en/company/profile/ARCA-Internet-Services-Ltd./370071
https://www.industrystock.com/en/company/profile/Забайкальская-аграрная-Ассоциация-образовательных-и-научных-учреждений/256182
https://www.industrystock.com/en/company/profile/...Vá?-INTERIéR-s.r.o./534809
https://www.industrystock.com/en/company/profile/1-WITOS-s.r.o./529071
https://www.industrystock.com/en/company/profile/1.-TOU?E?SKá-s.r.o./544981
https://www.industrystock.com/en/company/profile/1.HEFAISTOS-s.r.o./541263
https://www.industrystock.com/en/company/profile/1.HRADECKá-ZEMěDěLSKá-a.s./548267
https://www.industrystock.com/en/company/profile/1.MAXIMA-INTERNATIONAL-s.r.o./530049
https://www.industrystock.com/en/company/profile/1.MIROSLAVSKá-STROJíRNA-spol.-s-r.o./544781
https://www.industrystock.com/en/company/profile/1.VASTO-spol.-s-r.o./535985
https://www.industrystock.com/en/company/profile/1C-PRO-s.r.o./534831
https://www.industrystock.com/en/company/profile/1CSC-a.s./528169
https://www.industrystock.com/en/company/profile/1P-CONTROL/549995
https://www.industrystock.com/en/company/profile/2-ES-spol.-s-r.o./547849
https://www.industrystock.com/en/company/profile/2-G-SERVIS-spol.-s-r.o./528391
https://www.industrystock.com/en/company/profile/2-JCP-a.s./537151
https://www.industrystock.com/en/company/profile/2-THETA-ASE-s.r.o./545079
https://www.industrystock.com/en/company/profile/2LMAKERS-s.r.o./542127
https://www.industrystock.com/en/company/profile/2M-SERVIS-s.r.o./550923
https://www.industrystock.com/en/company/profile/2M-STATIC-s.r.o./549935
https://www.industrystock.com/en/company/profile/2M-STROJE-s.r.o./539885
https://www.industrystock.com/en/company/profile/2TMOTORS-s.r.o./543869
https://www.industrystock.com/en/company/profile/2VV-s.r.o./538993
https://www.industrystock.com/en/company/profile/2xSERVIS-s.r.o./528321
https://www.industrystock.com/en/company/profile/3-PLUS-1-SERVICE-s.r.o./535103
https://www.industrystock.com/en/company/profile/3-TOOLING-s.r.o./540599
https://www.industrystock.com/en/company/profile/3B-SOCIáLNí-FIRMA-s.r.o./535127
https://www.industrystock.com/en/company/profile/3D-KOVáRNA-s.r.o./549765
https://www.industrystock.com/en/company/profile/3D-TECH-spol.-s-r.o./548047
https://www.industrystock.com/en/company/profile/3DNC-SYSTEMS-s.r.o./549379
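To collect the values into a list rather than print them, the same lookup can be wrapped in a small helper. This is only a minimal sketch: the HTML snippet and the function name `extract_profile_links` are invented for illustration and merely imitate the structure described above (an `a` tag that carries its URL in `data-href`). It uses `html.parser` so that it runs without lxml installed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup imitating the profile buttons; not copied from the real page.
SAMPLE_HTML = """
<div>
  <a class="btn awe-info gotoJS iconColor_white" data-href="https://example.com/profile/1">Profile</a>
  <a class="btn awe-info gotoJS iconColor_white" data-href="https://example.com/profile/2">Profile</a>
  <a class="other" href="https://example.com/unrelated">Other</a>
</div>
"""

def extract_profile_links(html):
    """Return the data-href value of every profile button in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    buttons = soup.find_all("a", class_="btn awe-info gotoJS iconColor_white")
    # .get() returns None when the attribute is missing, so skip those tags.
    return [a.get("data-href") for a in buttons if a.get("data-href")]

links = extract_profile_links(SAMPLE_HTML)

# The same buttons have no href attribute at all, which is why .get('href') gives None.
first_button = BeautifulSoup(SAMPLE_HTML, "html.parser").find("a", class_="btn awe-info gotoJS iconColor_white")
href_value = first_button.get("href")
```

On markup shaped like this, `links` holds the two profile URLs and `href_value` is `None`, which matches the symptom in the question.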

Reply from a uj5u.com user:

Try this:

import requests
from bs4 import BeautifulSoup

response = requests.get('https://www.industrystock.com/en/companies/Agriculture')
soup = BeautifulSoup(response.text, 'lxml')
link_list = []
page1 = soup.find_all('a', {"class":'btn awe-info gotoJS iconColor_white'})
for i in page1:
    link = i['href']
    link_list.append(link)

I would also suggest using html.parser if you are not scraping XML.
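Switching parsers only changes the second argument to `BeautifulSoup`. The sketch below is a hedged illustration on an invented HTML snippet, not the real page: `html.parser` ships with Python, while `lxml` is a separate package that has to be installed. It also shows the difference between indexing an attribute and using `.get()` when the attribute may be missing.

```python
from bs4 import BeautifulSoup

# Hypothetical snippet for illustration only.
html = (
    '<a class="btn gotoJS" data-href="https://example.com/profile/1">Profile</a>'
    '<a class="btn gotoJS">No link</a>'
)

# "html.parser" is Python's built-in parser; "lxml" needs the lxml package.
soup = BeautifulSoup(html, "html.parser")

link_list = []
for tag in soup.find_all("a", {"class": "gotoJS"}):
    # tag["data-href"] would raise KeyError on the second tag;
    # .get() returns None instead, so the loop can skip it.
    link = tag.get("data-href")
    if link is not None:
        link_list.append(link)
```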
