Good day, folks. My task is to collect people's names and emails from this site: https://www.espeakers.com/s/nsas/search?available_on=&awards&budget=0,10&bureau_id=304&distance=1000&fee=false&items_per_page=3701&language=en&location=&norecord=false&nt=0&page=0&presenter_type=&q=[]&require&review=false&sort=speakername&video=false&virtual=false
I am using Selenium and Python to scrape it, but I am having trouble accessing each person's URL. A sample of the structure of one person's card is:
<div class="col-xs-12 col-sm-6 col-md-4 col-lg-3">
<div class="speaker-tile" id="sid12026">
<div class="speaker-thumb" style='background-image: url("https://streamer.espeakers.com/assets/6/12026/159445.jpg"); background-size: contain;'>
<div class="row">
<div class="col-xs-8 text-left">
</div>
<div class="col-xs-4 text-right speaker-top-actions">
<i class="fa fa-ellipsis-h fa-fw">
</i>
</div>
</div>
</div>
<div class="speaker-details">
<div class="speaker-name">
Alex Aanderud
</div>
<div class="row" style="margin-top: 15px;">
<div class="col-xs-12 col-sm-12">
<div class="speaker-location">
<i class="fa fa-map-marker mp-tertiary-background">
</i>
AZ
<span>
,
</span>
US
</div>
</div>
<div class="col-sm-6 col-xs-12">
<div class="speaker-awards">
</div>
</div>
</div>
<div class="speaker-oneline text-left">
<p>
</p>
<div>
Certified Trainer of Advanced Integrative Psychology and Certified John Maxwell Speaker, Trainer, Coach, will transform your organization and improve your results.
</div>
</div>
<div class="speaker-assets">
<div class="row">
</div>
</div>
<div class="speaker-actions">
<div class="row">
<div class="text-center col-xs-12">
<div class="btn btn-flat mp-primary btn-block">
<span class="hidden-xs hidden-sm">
View Profile
</span>
<span class="visible-xs visible-sm">
Profile
</span>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
When you click
<span class="hidden-xs hidden-sm">
View Profile
</span>
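it takes you to a page with the person's information, where I can access it. How can I do this with Selenium, or is there another solution that could help me? Thanks!
Answer from a uj5u.com user:
If you look closely, all of the profile URLs have the form
https://www.espeakers.com/s/nsas/profile/id
where id is a 5-digit number, for example 27397. In the card markup, that number is the id attribute of the speaker-tile div without its "sid" prefix (for example, sid12026 gives 12026). So you only need to extract the id and concatenate it with the base URL to get the profile URL.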
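from selenium.webdriver.common.by import By
# driver: a WebDriver with the search page already loaded; tile ids look like 'sid12026', so [3:] drops 'sid'
url = 'https://www.espeakers.com/s/nsas/profile/'
profile_urls = [url + el.get_attribute('id')[3:] for el in driver.find_elements(By.CSS_SELECTOR, '.speaker-tile')]
names = [el.text for el in driver.find_elements(By.CSS_SELECTOR, '.speaker-name')]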
names is a list containing all the names, and profile_urls is a list containing the corresponding profile URLs.
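The two list comprehensions above pair names and URLs only by position, so a tile without a name element would shift one list against the other. As a rough sketch (not part of the original answer), the id-to-URL step can be pulled into a small helper that handles one tile at a time. The function names profile_url_from_tile_id and pair_names_with_urls are hypothetical, and the code assumes every tile id has the form "sid" followed by digits, as in the sample card.

```python
import re

BASE_URL = "https://www.espeakers.com/s/nsas/profile/"


def profile_url_from_tile_id(tile_id: str) -> str:
    """Turn a tile id such as 'sid12026' into a profile URL.

    Assumes ids look like 'sid' + digits (as in the sample card);
    anything else raises ValueError instead of yielding a bad URL.
    """
    match = re.fullmatch(r"sid(\d+)", tile_id)
    if match is None:
        raise ValueError(f"unexpected tile id: {tile_id!r}")
    return BASE_URL + match.group(1)


def pair_names_with_urls(tiles):
    """Map an iterable of (tile_id, raw_name) pairs to (name, url) pairs.

    Reading the id and the name from the same tile keeps them aligned.
    """
    return [(name.strip(), profile_url_from_tile_id(tile_id))
            for tile_id, name in tiles]


if __name__ == "__main__":
    # Values taken from the sample card in the question.
    for name, url in pair_names_with_urls([("sid12026", "\nAlex Aanderud\n")]):
        print(name, url)
```

With Selenium, the (tile_id, raw_name) pairs could come from looping over driver.find_elements(By.CSS_SELECTOR, '.speaker-tile') and, for each tile, reading tile.get_attribute('id') and tile.find_element(By.CSS_SELECTOR, '.speaker-name').text.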
