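I am trying to get information from pages 1, 2, 3, ... of https://myanimelist.net/topanime.php?limit=0 (the site does not use page=1, page=2, etc.; it uses limit=0, limit=50, limit=100, ...). The problem is that when the code loops over the number of pages I want, it fetches the information from all of them but saves only the last page's information to the new CSV file.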
import csv

from bs4 import BeautifulSoup
from selenium import webdriver


def main(number):
    driver = webdriver.Chrome()
    url = 'https://myanimelist.net/topanime.php?limit={}'
    if number <= 1:
        return url.format(0)
    elif number >= 2:
        # One request per page: limit=0, 50, 100, ...
        for limit in range(0, (int(number) * 50), 50):
            driver.get(url.format(limit))
            soup = BeautifulSoup(driver.page_source, 'html.parser')
            results = soup.find_all('tr', class_='ranking-list')
            # The file is opened in 'w' mode on every pass of the loop
            with open('MAL_topanime.csv', 'w', newline='', encoding='utf-8') as f:
                writer = csv.writer(f)
                header = ['Anime', 'Date', 'No_eps', 'Ranking', 'Score']
                writer.writerow(header)
                for result in results:
                    Anime = result.find('h3', class_='hoverinfo_trigger fl-l fs14 fw-b anime_ranking_h3').text.replace('\n', '')
                    Date = result.find('div', class_='information di-ib mt4').text.replace('\n', '')
                    No_eps = result.find('div', class_='information di-ib mt4').text.replace('\n', '')
                    Ranking = result.find('td', class_='rank ac').text.replace('\n', '')
                    Score = result.find('div', class_='js-top-ranking-score-col di-ib al').text.replace('\n', '')
                    info = [Anime, Date, No_eps, Ranking, Score]
                    writer.writerow(info)
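The symptom described above is consistent with how `open(..., 'w')` behaves: mode `'w'` truncates the file every time it is opened, so opening the file once per page leaves only the last page on disk. A minimal sketch of the usual alternative, opening the file once and looping over the pages inside the `with` block. The names `fake_pages` and `write_all_pages` and the sample rows are hypothetical stand-ins, used so the sketch runs without a browser or network access:

```python
import csv
import os
import tempfile

# Stand-in for scraped pages: each inner list plays the role of one page's rows.
# (Hypothetical data; the real rows would come from BeautifulSoup.)
fake_pages = [
    [["Anime A", "9.1"], ["Anime B", "9.0"]],  # would be limit=0
    [["Anime C", "8.9"], ["Anime D", "8.8"]],  # would be limit=50
]


def write_all_pages(path, pages):
    """Open the file once, write the header once, then write every page's rows."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Anime", "Score"])
        for page in pages:  # the page loop lives inside the `with` block
            for row in page:
                writer.writerow(row)


out_path = os.path.join(tempfile.mkdtemp(), "MAL_topanime.csv")
write_all_pages(out_path, fake_pages)

# Read the file back to confirm that rows from both pages survived.
with open(out_path, newline="", encoding="utf-8") as f:
    rows = list(csv.reader(f))
print(len(rows))  # 5: the header plus four data rows
```

Opening the file in append mode (`'a'`) inside the loop would also keep earlier pages, but the header would then have to be written separately so it is not repeated for each page.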
Answer from a uj5u.com user:
You can try the following example:
import requests
from bs4 import BeautifulSoup
import pandas as pd

data = []
# limit=0, 50, 100 covers the first three pages (50 entries each)
for limit in range(0, 150, 50):
    r = requests.get(f'https://myanimelist.net/topanime.php?limit={limit}')
    soup = BeautifulSoup(r.content, 'html.parser')
    results = soup.find_all('tr', class_='ranking-list')

    for result in results:
        Anime = result.find('h3', class_='hoverinfo_trigger fl-l fs14 fw-b anime_ranking_h3').text.replace('\n', '')
        Date = result.find('div', class_='information di-ib mt4').text.replace('\n', '')
        # Same selector as Date, so No_eps holds the same text
        No_eps = result.find('div', class_='information di-ib mt4').text.replace('\n', '')
        Ranking = result.find('td', class_='rank ac').text.replace('\n', '')
        Score = result.find('div', class_='js-top-ranking-score-col di-ib al').text.replace('\n', '')
        # Collect the rows from every page in one list
        data.append({
            'Anime': Anime,
            'Date': Date,
            'No_eps': No_eps,
            'Ranking': Ranking,
            'Score': Score
        })

# Build the DataFrame once, after all pages have been processed
df = pd.DataFrame(data)
print(df)
Output:
Anime ... Score
0 Fullmetal Alchemist: Brotherhood ... 9.12
1 Bleach: Sennen Kessen-hen ... 9.11
2 Kaguya-sama wa Kokurasetai: Ultra Romantic ... 9.10
3 Gintama° ... 9.08
4 Steins;Gate ... 9.08
.. ... ... ...
145 Mushishi Zoku Shou: Odoro no Michi ... 8.44
146 Saenai Heroine no Sodatekata Fine ... 8.44
147 Wu Liuqi Zhi Xuanwu Guo Pian ... 8.44
148 JoJo no Kimyou na Bouken Part 3: Stardust Crus... ... 8.44
149 Gintama: Yorinuki Gintama-san on Theater 2D ... 8.43
[150 rows x 5 columns]
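The question asked for a CSV file, while the example above only prints the DataFrame. One way to finish is `DataFrame.to_csv`. A minimal sketch, assuming a small hand-written `data` list in place of the scraped rows and a temporary output path (both are illustrative and not part of the original answer):

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample standing in for the scraped `data` list;
# only two of the five columns are shown to keep the sketch short.
data = [
    {"Anime": "Fullmetal Alchemist: Brotherhood", "Score": "9.12"},
    {"Anime": "Steins;Gate", "Score": "9.08"},
]
df = pd.DataFrame(data)

out_path = os.path.join(tempfile.mkdtemp(), "MAL_topanime.csv")
# index=False leaves the 0, 1, 2, ... row labels out of the file.
df.to_csv(out_path, index=False, encoding="utf-8")

# Read the file back as strings to check what was written.
round_trip = pd.read_csv(out_path, dtype=str)
print(round_trip.shape)
```

Because the DataFrame is built from all pages before it is written, the file is created once and nothing is overwritten between pages.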
