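I want to download some movie reviews from IMDb so that I can use them for my LDA model (for school).
But the default reviews page only contains 25 reviews (for example https://www.imdb.com/title/tt0111161/reviews/?ref_=tt_ql_urv). If I want more, I have to press the "Load More" button at the bottom of the page, which gives me another 25 reviews.
I don't know how to automate this in Python. I can't simply go to https://www.imdb.com/title/tt0111161/reviews/?ref_=tt_ql_urv/2 or add a page parameter such as ?page=2.
How can I use Python to automatically go through the pages of the IMDb reviews site?
Also, is this made so difficult on purpose?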
Reply from a uj5u.com user:
When I click Load More with DevTools open in Chrome/Firefox (tab: Network, filter: XHR), I see a link like this:
https://www.imdb.com/title/tt0111161/reviews/_ajax?ref_=undefined&paginationKey=g4xolermtiqhejcxxxgs753i36t52q343mpt34pjada6qpye4w6qtalmfyy7wfxcwfzuwsyh
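It contains paginationKey=g4x...
I found the same value in the HTML, in <div ... data-key="g4x...">, so I take data-key and use it to build the link for the next page.
Example code:
First, I get the HTML from the normal URL and extract the review titles. Next, I get data-key and build the URL that returns the next reviews. I repeat this in a for loop to get 3 pages, but you could use a while True loop instead and keep repeating as long as data-key still exists.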
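The two observations above (the paginationKey parameter in the XHR URL and the matching data-key attribute in the HTML) can be checked with a small standard-library sketch: one helper reads paginationKey out of a URL copied from the Network tab, the other builds the next-page URL from a data-key value. The helper names pagination_key_from_url and next_page_url are hypothetical; the /reviews/_ajax path and the ref_=undefined value are copied from the captured URL, and IMDb may have changed this endpoint since the answer was written.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit

# Endpoint seen in DevTools (Network tab, XHR filter); assumed, may change.
AJAX_TEMPLATE = "https://www.imdb.com/title/{title_id}/reviews/_ajax"


def pagination_key_from_url(url: str) -> Optional[str]:
    """Return the paginationKey query parameter of a captured XHR URL, or None."""
    query = parse_qs(urlsplit(url).query)
    values = query.get("paginationKey")
    return values[0] if values else None


def next_page_url(title_id: str, data_key: str) -> str:
    """Build the 'Load More' request URL from a data-key found in the HTML."""
    # Parameter order and the ref_ value mirror the captured URL.
    params = {"ref_": "undefined", "paginationKey": data_key}
    return AJAX_TEMPLATE.format(title_id=title_id) + "?" + urlencode(params)
```

Feeding the data-key from the HTML into next_page_url should reproduce the URL that the browser requested, which is a quick way to confirm the two values really are the same key.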
import requests
from bs4 import BeautifulSoup

s = requests.Session()
#s.headers['User-Agent'] = 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:93.0) Gecko/20100101 Firefox/93.0'

# get first/full page
url = 'https://www.imdb.com/title/tt0111161/reviews/?ref_=tt_ql_urv'
r = s.get(url)
soup = BeautifulSoup(r.text, 'html.parser')

# review titles are <a class="title"> links
items = soup.find_all('a', {'class': 'title'})
for number, title in enumerate(items, 1):
    print(number, '>', title.text.strip())

# get next page(s)
for _ in range(3):
    # the key for "Load More" is stored in <div ... data-key="...">
    div = soup.find('div', {'data-key': True})
    if div is None:
        break  # no data-key left, so there are no more pages
    print('---', div['data-key'], '---')

    url = 'https://www.imdb.com/title/tt0111161/reviews/_ajax'
    payload = {
        'ref_': 'tt_ql_urv',
        'paginationKey': div['data-key']
    }
    #headers = {'X-Requested-With': 'XMLHttpRequest'}
    r = s.get(url, params=payload) #, headers=headers)
    soup = BeautifulSoup(r.text, 'html.parser')

    items = soup.find_all('a', {'class': 'title'})
    for number, title in enumerate(items, 1):
        print(number, '>', title.text.strip())
Result:
1 > Enthralling, fantastic, intriguing, truly remarkable!
2 > "I Had To Go To Prison To Learn To Be A Crook"
3 > Masterpiece
4 > All-time prison film classic
5 > Freeman gives it depth
6 > impressive
7 > Simply a great story that is moving and uplifting
8 > An incredible movie. One that lives with you.
9 > "I'm a convicted murderer who provides sound financial planning".
10 > IMDb and the Greatest Film of All Time
11 > never give up hope
12 > The Shawshank Redemption
13 > Brutal Anti-Bible Bigotry Prevails Again
14 > Time and Pressure.
15 > A classic
16 > An extraordinary and unforgettable film about a bank veep who is convicted of murders and sentenced to the toughest prison
17 > A genre picture, but a satisfying one...
18 > Why it is ranked so highly.
19 > Exceptional
20 > Shawshank Redemption- Prison Film is Redeemed by Quality ****
21 > A Classic Film On Hope And Redemption
22 > Compelling masterpiece
23 > Relentless Storytelling
24 > Some birds aren't meant to be caged.
25 > Good , But It Is Overrated By Some
--- g4xolermtiqhejcxxxgs753i36t52q343mpt34pjada6qpye4w6qtalmfyy7wfxcwfzuwsyh ---
1 > Stephen King's prison tale with a happy ending...
2 > Breaking Big Rocks Into Little Rocks
3 > Over Rated
4 > Terrific stuff!
5 > Highly Overrated But Still Good
6 > Superb
7 > Beautiful movie
8 > Tedious, overlong, with "hope" being the second word spoken in just about every sentence... who cares?
9 > Excellent Stephen King adaptation; flawless Robbins & Freeman
10 > Good for the spirit
11 > Entertaining Prison Movie Isn't Nearly as Good as Its Rabid Fan Base Would Lead You to Believe
12 > Observations...
13 > Why can't they make films like this anymore?
14 > Shawshank Redemption Comes Out Clean
15 > Hope Springs Eternal:Rita Hayworth And The Shawshank Redemption.
16 > Redeeming.
17 > You don't understand! I'm not supposed to be here!
18 > A Story Of Hope & Resilence
19 > Salvation lies within....
20 > Pretty good movie...one of those that you do not really need to watch from beginning to end.
21 > A film of Eloquence
22 > A great film of a helping hand leading to end-around justice
23 > about freedom
24 > Reputation notwithstanding, this is powerful stuff
25 > The best film ever made!
--- g4uorermtiqhejcxxxgs753i36t52q343eod34plapeoqp27z6b2lhdevccn5wyrz2vmgufh ---
1 > A Sort of Secular Redemption
2 > In virus times, we need this hope.
3 > The placement isn't an exaggeration
4 > A true story of friendship and hard times
5 > Escape from Shawshank
6 > Great Story Telling
7 > moving and emotionally impactful(if you liked "The Green Mile" you will like this movie)
8 > Super Good - Highly Recommended
9 > I can see why this is rated Number 1 on IMDb.
# ...
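The answer suggests replacing the fixed three-page for loop with a loop that keeps going while a data-key is still present. Below is a minimal sketch of that idea. It swaps BeautifulSoup for the standard library's html.parser so it runs without extra packages, takes the HTTP call as a fetch function so it can be tried offline, and adds a hypothetical max_pages cap so a key that never disappears cannot cause an endless loop. It assumes the markup described above (review titles in <a class="title">, the next key in a <div data-key="...">); the names ReviewPageParser, scrape_review_titles and Fetch are illustrative, not part of any library.

```python
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional

# fetch(url, params) -> HTML text; params is None for the first page.
Fetch = Callable[[str, Optional[Dict[str, str]]], str]


class ReviewPageParser(HTMLParser):
    """Collects <a class="title"> texts and the first non-empty div data-key."""

    def __init__(self) -> None:
        super().__init__()
        self.titles: List[str] = []
        self.data_key: Optional[str] = None
        self._in_title = False
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if tag == "a" and "title" in classes:
            self._in_title = True
            self._buffer = []
        elif tag == "div" and self.data_key is None and attributes.get("data-key"):
            self.data_key = attributes["data-key"]

    def handle_data(self, data):
        if self._in_title:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_title:
            self.titles.append("".join(self._buffer).strip())
            self._in_title = False


def scrape_review_titles(title_id: str, fetch: Fetch, max_pages: int = 50) -> List[str]:
    """Follow data-key links until a page has none, or max_pages is reached."""
    url = f"https://www.imdb.com/title/{title_id}/reviews/"
    params: Optional[Dict[str, str]] = None
    titles: List[str] = []
    for _ in range(max_pages):
        parser = ReviewPageParser()
        parser.feed(fetch(url, params))
        parser.close()
        titles.extend(parser.titles)
        if parser.data_key is None:
            break  # no "Load More" key left: this was the last page
        # Same endpoint and payload as the requests example above.
        url = f"https://www.imdb.com/title/{title_id}/reviews/_ajax"
        params = {"ref_": "tt_ql_urv", "paginationKey": parser.data_key}
    return titles
```

With requests, fetch could be lambda url, params: s.get(url, params=params).text, where s is the requests.Session() from the example above.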
Please credit the source when reposting. Link to this article: https://www.uj5u.com/ruanti/341154.html
