import requests
from bs4 import BeautifulSoup
USER_AGENT = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:65.0) Gecko/20100101 Firefox/65.0"
headers = {"user-agent": USER_AGENT}  # add the user agent
url = 'https://www.meishichina.com/YuanLiao/category/rql'
resp = requests.get(url, headers=headers)
soup = BeautifulSoup(resp.content, "html.parser")
print(soup)
print(soup.text)
Looping over all sections:
lists = ['rql', 'scl', 'shucailei', 'guopinlei', 'mmdr', 'tiaoweipinl', 'yaoshiqita']
for l in lists:
    url = 'https://www.meishichina.com/YuanLiao/category/' + l
    resp = requests.get(url, headers=headers)
    soup = BeautifulSoup(resp.content, "html.parser")
    print(soup)
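The loop above builds each section URL by joining a base path and a slug. A minimal sketch of that step using `urljoin` from the standard library is shown here; the helper name `section_urls` and the constants `BASE` and `SLUGS` are hypothetical names introduced for illustration, and no network request is made.

```python
from urllib.parse import urljoin

# Base path and slugs taken from the question; the names BASE and SLUGS are illustrative.
BASE = "https://www.meishichina.com/YuanLiao/category/"
SLUGS = ["rql", "scl", "shucailei", "guopinlei", "mmdr", "tiaoweipinl", "yaoshiqita"]


def section_urls(slugs, base=BASE):
    """Return one absolute URL per section slug (hypothetical helper)."""
    # urljoin appends the slug because the base ends with a slash.
    return [urljoin(base, slug) for slug in slugs]


for url in section_urls(SLUGS):
    print(url)
```

Each resulting URL can then be passed to `requests.get(url, headers=headers)` exactly as in the loop.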
The category can be found in:
<h3>
...
</h3>
The items can be found in:
<li>
<a href="https://www.meishichina.com/YuanLiao/YaRou/" target="_blank" title="...">
...
</a>
</li>
How can I extract the sections, categories, and items and save them as a dataframe? Thanks.
Update:
for el in soup.find_all('ul'):
    for i in el.find_all('a', href=True):
        print(list(i.children))
Answer from a uj5u.com community member:
To get all sections, categories, and items into one dataframe, you can use this example:
import requests
import pandas as pd
from bs4 import BeautifulSoup

url = "https://www.meishichina.com/YuanLiao/"
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:92.0) Gecko/20100101 Firefox/92.0"
}

soup = BeautifulSoup(requests.get(url, headers=headers).content, "html.parser")
# Collect (name, link) for every section in the navigation bar.
sections = [(a.text, a["href"]) for a in soup.select(".nav_wrap2 li a")]

all_data = []
for s_name, s in sections:
    soup = BeautifulSoup(
        requests.get(s, headers=headers).content, "html.parser"
    )
    # Each .category_sub block holds one category heading and its item links.
    for cat in soup.select(".category_sub"):
        print(cat.h3.text)
        for i in cat.select("li a"):
            print(i.text, i["href"])
            all_data.append([s_name, s, cat.h3.text, i.text, i["href"]])
        print("-" * 80)

df = pd.DataFrame(
    all_data,
    columns=["section", "section_link", "category", "item", "item_link"],
)
print(df)
df.to_csv("data.csv", index=False)
This prints:
--------------------------------------------------------------------------------
section section_link category item item_link
0 首頁 https://www.meishichina.com/YuanLiao/ 時令與熱門 雞肉 https://www.meishichina.com/YuanLiao/JiRou/
1 首頁 https://www.meishichina.com/YuanLiao/ 時令與熱門 雞翅 https://www.meishichina.com/YuanLiao/JiChi/
2 首頁 https://www.meishichina.com/YuanLiao/ 時令與熱門 雞蛋 https://www.meishichina.com/YuanLiao/JiDan/
3 首頁 https://www.meishichina.com/YuanLiao/ 時令與熱門 牛肉 https://www.meishichina.com/YuanLiao/NiuRou/
4 首頁 https://www.meishichina.com/YuanLiao/ 時令與熱門 豬肉 https://www.meishichina.com/YuanLiao/ZhuRou/
5 首頁 https://www.meishichina.com/YuanLiao/ 時令與熱門 排骨 https://www.meishichina.com/YuanLiao/PaiGu/
....
It also saves the result to data.csv.
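The selector logic in the answer can also be tried offline, which may help when checking the output shape without hitting the site. The following is a sketch under stated assumptions: `SAMPLE_HTML` is a hand-written fragment (not copied from the site) that mimics the `.category_sub` layout, the `example.com` links and item names are placeholders, and `extract_rows` is a hypothetical helper. It assumes `beautifulsoup4` and `pandas` are installed.

```python
import io

import pandas as pd
from bs4 import BeautifulSoup

# Hand-written fragment (an assumption) that mimics the layout the answer's
# selectors rely on: a .category_sub block with an <h3> and a list of links.
SAMPLE_HTML = """
<div class="category_sub">
  <h3>Category A</h3>
  <ul>
    <li><a href="https://example.com/item-1/" target="_blank" title="Item 1">Item 1</a></li>
    <li><a href="https://example.com/item-2/" target="_blank" title="Item 2">Item 2</a></li>
  </ul>
</div>
<div class="category_sub">
  <h3>Category B</h3>
  <ul>
    <li><a href="https://example.com/item-3/" target="_blank" title="Item 3">Item 3</a></li>
  </ul>
</div>
"""


def extract_rows(html, section, section_link):
    """Return one row per item link (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for cat in soup.select(".category_sub"):
        category = cat.h3.get_text(strip=True)
        # Only anchors that actually carry an href are kept.
        for a in cat.select("li a[href]"):
            rows.append([section, section_link, category, a.get_text(strip=True), a["href"]])
    return rows


rows = extract_rows(SAMPLE_HTML, "Demo section", "https://example.com/")
df = pd.DataFrame(
    rows,
    columns=["section", "section_link", "category", "item", "item_link"],
)

# Write to an in-memory buffer instead of data.csv so nothing touches disk.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
print(buffer.getvalue())
```

Swapping the buffer for a file name such as `"data.csv"` gives the same on-disk result as the answer's final line.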
Please credit the source when reposting. Link to this article: https://www.uj5u.com/qiye/331296.html


