I want this code to find every link in the file "input.html", but it only finds and displays the first one. Here is the code:
import codecs
from bs4 import BeautifulSoup
fd = codecs.open('input.html', 'r')
def clean(html):
    soup = BeautifulSoup(html, "lxml")
    for link in soup.find_all('a'):
        link.extract()
        text = link.get('href')
        return text
A uj5u.com user replied:
It could be:
import codecs
from bs4 import BeautifulSoup
fd = codecs.open('input.html', 'r')
def clean(html):
    soup = BeautifulSoup(html, "lxml")
    text = []  # created per call so repeated calls do not accumulate links
    for link in soup.find_all('a'):
        link.extract()
        text.append(link.get('href'))
    return text  # return once, after the loop has visited every link
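A uj5u.com user replied:
Your return statement sits inside the loop body, so the function exits after the first iteration and returns only that one href. Do this instead: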
def clean(html):
    soup = BeautifulSoup(html, "lxml")
    links = []
    for link in soup.find_all('a'):
        link.extract()
        text = link.get('href')
        links.append(text)
    return links
Alternatively, you can use a simple list comprehension instead of the function:
soup = BeautifulSoup(html, "lxml")
links = [link.extract().get('href') for link in soup.find_all('a')]
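To see why the placement of return matters, here is a minimal sketch that does not need BeautifulSoup at all. The function names first_only and collect_all and the href values are made up for illustration; they are not from the question.

```python
def first_only(items):
    # return is inside the loop body, so the function exits on the first pass
    for item in items:
        return item


def collect_all(items):
    # accumulate inside the loop, return once after the loop has finished
    results = []
    for item in items:
        results.append(item)
    return results


hrefs = ["/a", "/b", "/c"]  # hypothetical href values
print(first_only(hrefs))    # /a
print(collect_all(hrefs))   # ['/a', '/b', '/c']
```

The first function mirrors the behaviour described in the question, the second mirrors the fixed version.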
A uj5u.com user replied:
It looks like you end up with only one link when the loop finishes. You can use this:
def clean(html):
    soup = BeautifulSoup(html, 'html.parser')
    hrefs = soup.find_all('a')
    links = []
    if hrefs:
        for href in hrefs:
            href.extract()
            link = href.get('href')
            links.append(link)
    return links
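For a quick end-to-end check, here is a small sketch that runs on an inline string instead of the real file. The sample markup is hypothetical and only stands in for the contents of input.html. It also skips extract(), which is not needed just to read the href, and passes href=True to find_all so that a tags without an href do not end up as None in the result.

```python
from bs4 import BeautifulSoup


def clean(html):
    soup = BeautifulSoup(html, "html.parser")
    # href=True keeps only <a> tags that actually carry an href attribute
    return [a["href"] for a in soup.find_all("a", href=True)]


# Hypothetical sample markup standing in for the contents of input.html
sample = (
    '<p><a href="https://example.com/1">one</a> '
    '<a name="x">anchor</a> '
    '<a href="/two">two</a></p>'
)
links = clean(sample)
print(links)  # ['https://example.com/1', '/two']
```

With the real file, something like clean(open('input.html', encoding='utf-8').read()) should work, assuming the file is UTF-8 encoded.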
