import requests
from bs4 import BeautifulSoup
url = 'http://www.columbia.edu/~fdc/sample.html'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
items = soup.find_all('h3')
print(items)
I get this result:
[<h3 id="contents">CONTENTS</h3>, <h3 id="basics">1. Creating a Web Page</h3>, <h3 id="syntax">2. HTML Syntax</h3>......
How can I get this output instead?
[CONTENTS, 1. Creating a Web Page, 2. HTML Syntax...
A uj5u.com user replied:
If you want a list of the text inside the h3 tags, you can iterate over all the h3 tags and keep only each tag's text.
import requests
from bs4 import BeautifulSoup
url = 'http://www.columbia.edu/~fdc/sample.html'
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')
items = [h3.text for h3 in soup.find_all('h3')]
print(items)
Output:
['CONTENTS', '1. Creating a Web Page', '2. HTML Syntax', '3. Special Characters', '4. Converting Plain Text to HTML', '5. Effects', '6. Lists', '7. Links', '8. Tables', '9. Viewing Your Web Page', '10. Installing Your Web Page on the Internet', '11. Where to go from here', '12. Postscript: Cell Phones']
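If you want to try the same idea without hitting the network, here is a minimal sketch that runs the extraction on an inline HTML string. The `SAMPLE_HTML` string and the `heading_texts` helper are made up for illustration and are not part of the original answer. It uses `get_text().strip()` rather than `.text` so that stray whitespace around a heading is dropped.

```python
from bs4 import BeautifulSoup

# Hypothetical inline HTML standing in for the downloaded page.
SAMPLE_HTML = """
<h3 id="contents">CONTENTS</h3>
<h3 id="basics">1. Creating a <i>Web</i> Page</h3>
<h3 id="syntax">  2. HTML Syntax  </h3>
"""


def heading_texts(html, tag="h3"):
    """Return the text of every `tag` element found in `html`."""
    soup = BeautifulSoup(html, "html.parser")
    # get_text() joins the text of nested tags; strip() trims the ends.
    return [el.get_text().strip() for el in soup.find_all(tag)]


if __name__ == "__main__":
    print(heading_texts(SAMPLE_HTML))
```

For the live page you would pass `response.text` in place of `SAMPLE_HTML`. A tag name that does not occur in the document simply gives an empty list.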
Tags: Python, web scraping
