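Life has been hard for me lately. I have been trying to scrape a website ( https://osf.io/preprints/discover?subject=bepress|Social and Behavioral Sciences ) using the HTML below, and I have tried several approaches to get it to work. I need the div with the ID ember499, which sits at the very bottom. This div contains the entire site, and if I cannot access it I cannot scrape anything. The body tag holds 4 divs, MathJax_Message, ember-basic-dropdown-wormhole, ember-bootstrap-wormhole, and ember499, as shown below: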
<body class="ember-application">
<div id="MathJax_Message" style="display: none;"></div>
<noscript>
<p>
For full functionality of this site it is necessary to enable JavaScript.
Here are
<a href='https://www.enable-javascript.com/' target='_blank'> instructions for enabling JavaScript in your web browser</a>.
</p>
</noscript>
<script> window.prerenderReady = false; </script>
<script src="//cdnjs.cloudflare.com/ajax/libs/jquery/2.2.4/jquery.min.js"></script>
<script src="//cdnjs.cloudflare.com/ajax/libs/ember.js/2.18.0/ember.min.js"></script>
<script>
(function() {
var encodedConfig = document.head.querySelector("meta[name$='/config/environment']").content;
var config = JSON.parse(unescape(encodedConfig));
var assetSuffix = config.ASSET_SUFFIX ? '-' + config.ASSET_SUFFIX : '';
var origin = window.location.origin;
window.isProviderDomain = !~config.OSF.url.indexOf(origin);
var prefix = '/' + (window.isProviderDomain ? '' : 'preprints/') + 'assets/';
[
'vendor',
'preprint-service'
].forEach(function (name) {
var script = document.createElement('script');
script.src = prefix + name + assetSuffix + '.js';
script.async = false;
document.body.appendChild(script);
var link = document.createElement('link');
link.rel = 'stylesheet';
link.href = prefix + name + assetSuffix + '.css';
document.head.appendChild(link);
});
})();
</script><script src="/preprints/assets/vendor-f46d275519d6cf7078493fc4564ccd3c7dc419ed.js"></script><script src="/preprints/assets/preprint-service-f46d275519d6cf7078493fc4564ccd3c7dc419ed.js"></script>
<script src="https://cdn.ravenjs.com/3.22.1/ember/raven.min.js"></script>
<script>
var encodedConfig = document.head.querySelector("meta[name$='/config/environment']").content;
var config = JSON.parse(unescape(encodedConfig));
if (config.sentryDSN) {
Raven.config(config.sentryDSN, config.sentryOptions || {}).install();
}
</script>
<div id="ember-basic-dropdown-wormhole"></div>
<div id="ember-bootstrap-wormhole"></div>
<div id="ember499" class="ember-view">
<!---->
<div id="ember538" class="__new-osf-navbar__b7554 ember-view"><div class="osf-nav-wrapper">
<nav id="navbarScope" role="navigation" class="navbar navbar-inverse navbar-fixed-top">
<div class="container">
<div class="navbar-header">
<a href="/" aria-label="Go home" class="navbar-brand">
<span class="osf-navbar-logo"></span>
</a>
<div class="service-name">
<a href="https://osf.io/preprints/">
<span class="hidden-xs"> OSF </span>
I tried printing all the divs contained in the body, like this:
wormhole = driver.find_element(By.CLASS_NAME, 'ember-application')
divs = wormhole.find_elements(By.TAG_NAME, 'div')
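I tried locating the div by XPATH, by ID, and by more or less everything else. When I append the ID of each div to a list and print it, I get:
['MathJax_Message', 'ember-basic-dropdown-wormhole', 'ember-bootstrap-wormhole']
With len(divs), it only reaches div[3], which does not normally happen on other websites: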
OUTPUT
MathJax_Message
ember-basic-dropdown-wormhole
ember-bootstrap-wormhole
Process finished with exit code 0
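I tried scrolling to the middle of the page in case the div was hidden, inspecting what is inside each of the 3 divs above, going straight to the class name I want, and using find_elements(By.XPATH, '//*'). Each attempt either returns only the same 3 divs mentioned above or reports "element not found". I cannot think of anything else to do or try.
Please guide me, Stack Gods.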
Answer from a uj5u.com user:
You need to add a delay with time.sleep() so the page has time to render before you look for the divs:
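import time
from selenium.webdriver.common.by import By
# Assumes `driver` is the WebDriver instance already created in the question
driver.get('https://osf.io/preprints/discover?subject=bepress|Social and Behavioral Sciences')
time.sleep(5)  # fixed pause so the JavaScript-rendered content can appear
print(len(driver.find_elements(By.CSS_SELECTOR,".ember-application div")))
for ele in driver.find_elements(By.CSS_SELECTOR,".ember-application div"):
    print(ele.text)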
Output:
275
OSF PREPRINTS
Add a Preprint
Search
Support
Donate
Sign Up Sign In
Preprint Archive Search
powered by
Search
2,365,609 searchable as of November 01, 2022
Partner Repositories
Previous
Next
Sort by: Relevance
Active Filters:
Clear filters
Social and Behavioral Sciences
Refine your search by
Providers
OSF Preprints (50,302)
AfricArXiv (394)
AgriXiv (426)
Arabixiv (328)
arXiv (1,324,846)
BioHackrXiv (29)
bioRxiv (48,109)
BodoArXiv (110)
Cogprints (283)
CoP (1)
EarthArXiv (1,755)
EcoEvoRxiv (940)
ECSarXiv (241)
EdArXiv (1,088)
engrXiv (2,088)
FocUS Archive (52)
Frenxiv (148)
INA-Rxiv (16,605)
IndiaRxiv (148)
LawArXiv (1,374)
LIS Scholarship Archive (310)
MarXiv (456)
MediArXiv (201)
MetaArXiv (457)
MindRxiv (286)
NutriXiv (84)
PaleorXiv (219)
PeerJ (4,747)
Preprints.org (21,880)
PsyArXiv (25,458)
RePEc (848,335)
SocArXiv (11,629)
SportRxiv (386)
Thesis Commons (1,857)
Subject
Architecture
Arts and Humanities
Business
Education
Engineering
Law
Life Sciences
Medicine and Health Sciences
Physical Sciences and Mathematics
Social and Behavioral Sciences
Do you want to add your own research as a preprint?
Add a preprint
The Corporate Social Responsibility is just a twist in a M\"obius Strip
Solferino, NazariaSolferino, Viviana
Last edited: Oct 13, 2015 UTC
Finance Social and Behavioral Sciences Economics
In recent years economics agents and systems have became more and more interacting and juxtaposed, therefore the social sciences need to rely on the studies of physical sciences to analyze this complexity in the relationships. According to this point of view we rely on the geometrical model of the M ...
arXiv
Some suggestions on dealing with measurement error in linkage analyses
Marko BachlMichael Scharkow
Last edited: Jul 2, 2018 UTC
Social and Behavioral Sciences Communication
Linkage analysis is a sophisticated media effect research design that reconstructs the likely exposure to relevant media messages of individual survey respondents by complementing the survey data with a content analysis. It is an important improvement over survey-only designs: Instead of predicting ...
OSF Preprints
Imagined Interdependence: Manipulating Discourse Changes How People Construe Interdependence
Jiří Münich ... and so on
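Answer from a uj5u.com user:
First of all, the id of the div you are trying to scrape changes every time the page loads. To grab that specific div you therefore have to use an XPATH that does not depend on the full id (you can use any other selector if you like, but I prefer XPATH).
Please find the code below, and let me know if you have any questions.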
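import time
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
wait = WebDriverWait(driver, 20)  # assumptions: `driver` already exists; 20 s timeout
driver.get("https://osf.io/preprints/discover?subject=bepress|Social and Behavioral Sciences")
# Attribute values reconstructed from the HTML in the question (assumption)
ember = wait.until(EC.presence_of_element_located(
    (By.XPATH, '//*[@class="ember-application"]/div[contains(@id,"ember") and @class="ember-view"]')))
print(ember.get_attribute('innerHTML'))
time.sleep(2)
driver.quit()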
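The selection idea can be tried without a browser. What follows is only a minimal sketch, assuming a simplified, well-formed copy of the page body: the `SAMPLE_BODY` string and the `find_ember_root` helper are hypothetical names made up for illustration, not part of the site or of Selenium. It picks the direct child div of the body whose class is `ember-view` and whose id starts with `ember`, so the generated numeric suffix does not matter.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified copy of the rendered <body>; the real page is far larger.
SAMPLE_BODY = """
<body class="ember-application">
  <div id="MathJax_Message" style="display: none;"></div>
  <div id="ember-basic-dropdown-wormhole"></div>
  <div id="ember-bootstrap-wormhole"></div>
  <div id="ember499" class="ember-view"><nav id="navbarScope"></nav></div>
</body>
""".strip()


def find_ember_root(body_xml):
    """Return the direct child <div> with class 'ember-view' and an id starting with 'ember'."""
    body = ET.fromstring(body_xml)
    # The class filter matters: the two wormhole divs also have ids starting
    # with "ember", but they carry no class attribute.
    for div in body.findall("./div[@class='ember-view']"):
        if div.get("id", "").startswith("ember"):
            return div
    return None


root_div = find_ember_root(SAMPLE_BODY)
print(root_div.get("id"))

# Same markup with a different generated suffix, as on a later page load.
other = find_ember_root(SAMPLE_BODY.replace("ember499", "ember731"))
print(other.get("id"))
```

Matching on the stable class plus an id prefix is what keeps the lookup working across page loads; an exact match on `ember499` would stop working as soon as the suffix changed.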
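You should also use an explicit wait to get the HTML element you need. time.sleep() is not recommended, because it simply pauses the program for a fixed time without knowing whether the element has appeared.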
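To make the difference concrete, here is a small browser-free sketch of the polling idea behind an explicit wait. It is not Selenium's implementation: `wait_until` and `element_present` are hypothetical names, and the 0.3-second delay is an invented stand-in for a page that renders its content shortly after loading. Selenium's WebDriverWait behaves along similar lines, checking a condition repeatedly until it succeeds or a timeout expires.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Call `condition` repeatedly until it returns a truthy value or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Hypothetical stand-in for a page whose content appears about 0.3 s after loading.
ready_at = time.monotonic() + 0.3


def element_present():
    return "ember-view" if time.monotonic() >= ready_at else None


start = time.monotonic()
found = wait_until(element_present, timeout=5.0, poll_interval=0.1)
elapsed = time.monotonic() - start
print(found, f"found after {elapsed:.1f}s")
```

The call returns as soon as the condition holds rather than always spending the full timeout, and it raises an error if the content never shows up. A fixed time.sleep(5) does neither.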
