I am trying to retrieve some DOM elements with Selenium, using Java throughout, but I get this error when I try:
Exception in thread "main" org.openqa.selenium.StaleElementReferenceException: stale element reference: element is not attached to the page document
I am still new to all of this, but the code I use to retrieve the DOM element is:
driver.get("https://www.qp.alberta.ca/570.cfm?frm_isbn=9780779808571&search_by=link");
String pagePdfUrl = driver.findElement(By.xpath("//img[@alt='View PDF']//..//parent::a")).getAttribute("href");
I believe the error means it cannot find the given XPath, even though that XPath exists. Any help would be greatly appreciated.
Thank you.
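Answer from a uj5u.com user:
The link has an href attribute that holds a PDF URL, but that URL opens the PDF inside a web page. So I read the URL from the href attribute, took the PDF name from it, and concatenated that name with the https://www.qp.alberta.ca/documents/Acts/ URL.
You can write code like the following to get the PDF URL.
Code to get the PDF URL: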
driver = new ChromeDriver();
/* The URL below is hard-coded; parameterize it to suit your requirements. */
driver.get("https://www.qp.alberta.ca/570.cfm?frm_isbn=9780779808571&search_by=link");
String pagePdfUrl = driver.findElement(By.xpath("//img[@alt='View PDF']//..//parent::a")).getAttribute("href");
System.out.println("Page PDF URL: " + pagePdfUrl);
/* The PDF name is the text between "page=" and ".cfm&" in the href. */
String pdfName = StringUtils.substringBetween(pagePdfUrl, "page=", ".cfm&");
driver.get("https://www.qp.alberta.ca/documents/Acts/" + pdfName + ".pdf");
Code to download the PDF:
Required ChromeOptions:
ChromeOptions options = new ChromeOptions();
HashMap<String, Object> chromeOptionsMap = new HashMap<String, Object>();
chromeOptionsMap.put("plugins.plugins_disabled", new String[] { "Chrome PDF Viewer" });
chromeOptionsMap.put("plugins.always_open_pdf_externally", true);
chromeOptionsMap.put("download.default_directory", "C:\\Users\\Downloads\\test\\");
options.setExperimentalOption("prefs", chromeOptionsMap);
options.addArguments("--headless");
Accessing the PDF:
driver = new ChromeDriver(options);
driver.get("https://www.qp.alberta.ca/570.cfm?frm_isbn=9780779808571&search_by=link");
String pagePdfUrl = driver.findElement(By.xpath("//img[@alt='View PDF']//..//parent::a")).getAttribute("href");
System.out.println("Page PDF URL: " + pagePdfUrl);
String pdfName = StringUtils.substringBetween(pagePdfUrl, "page=", ".cfm&");
System.out.println("Only PDF URL: " + "https://www.qp.alberta.ca/documents/Acts/" + pdfName + ".pdf");
driver.get("https://www.qp.alberta.ca/documents/Acts/" + pdfName + ".pdf");
Output:
Page PDF URL: https://www.qp.alberta.ca/1266.cfm?page=2017ch18_unpr.cfm&leg_type=Acts&isbncln=9780779808571
Only PDF URL: https://www.qp.alberta.ca/documents/Acts/2017ch18_unpr.pdf
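The options above point download.default_directory at a folder, but driver.get(...) does not necessarily wait for the file to finish writing, so quitting the driver straight away may leave an incomplete download. A minimal sketch of one way to wait for the file, assuming the download folder is dedicated to this run and starts out empty, and relying on Chrome's .crdownload suffix for in-progress downloads. The class name DownloadWaiter and both method names are made up for illustration and are not part of Selenium:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.util.Optional;

// Hypothetical helper, not a Selenium API.
final class DownloadWaiter {

    /** Polls dir until a finished .pdf shows up or the timeout elapses. */
    public static Optional<Path> waitForPdf(Path dir, Duration timeout, Duration pollInterval)
            throws IOException, InterruptedException {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            Optional<Path> found = findFinishedPdf(dir);
            if (found.isPresent() || System.nanoTime() >= deadline) {
                return found; // empty if the timeout was reached first
            }
            Thread.sleep(pollInterval.toMillis());
        }
    }

    /**
     * Returns a .pdf file in dir, or empty if there is none yet or if any
     * .crdownload file is present (Chrome is still writing a download).
     * Assumption: dir holds nothing but this run's download.
     */
    public static Optional<Path> findFinishedPdf(Path dir) throws IOException {
        Path pdf = null;
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                String name = entry.getFileName().toString().toLowerCase();
                if (name.endsWith(".crdownload")) {
                    return Optional.empty();
                }
                if (name.endsWith(".pdf")) {
                    pdf = entry;
                }
            }
        }
        return Optional.ofNullable(pdf);
    }
}
```

It would be called after the final driver.get(...), for example DownloadWaiter.waitForPdf(Paths.get("C:\\Users\\Downloads\\test\\"), Duration.ofSeconds(30), Duration.ofMillis(500)), and driver.quit() would follow only once a path comes back. The 30 second timeout and 500 ms interval are arbitrary choices.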
Importing StringUtils:
import org.apache.commons.lang3.StringUtils;
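StringUtils comes from Apache Commons Lang, so the import above needs that library on the classpath. If pulling it in for a single call is not wanted, the same extraction can be sketched with plain String.indexOf. The class name PdfUrlBuilder and the method toDirectPdfUrl are hypothetical names chosen for this example; the base URL and the "page=" / ".cfm&" markers are the ones used in the answer:

```java
// Hypothetical helper that mirrors the answer's extraction step
// without depending on Apache Commons Lang.
final class PdfUrlBuilder {

    // Base URL taken from the answer above.
    private static final String ACTS_BASE = "https://www.qp.alberta.ca/documents/Acts/";

    /** Returns the text between the first open and the next close, or null if either is missing. */
    public static String substringBetween(String text, String open, String close) {
        if (text == null) {
            return null;
        }
        int start = text.indexOf(open);
        if (start < 0) {
            return null;
        }
        start += open.length();
        int end = text.indexOf(close, start);
        if (end < 0) {
            return null;
        }
        return text.substring(start, end);
    }

    /** Turns the href of the "View PDF" link into the direct PDF URL. */
    public static String toDirectPdfUrl(String pageHref) {
        String pdfName = substringBetween(pageHref, "page=", ".cfm&");
        if (pdfName == null) {
            throw new IllegalArgumentException("No page=<name>.cfm& segment in: " + pageHref);
        }
        return ACTS_BASE + pdfName + ".pdf";
    }

    public static void main(String[] args) {
        // The href printed in the output above.
        String href = "https://www.qp.alberta.ca/1266.cfm?page=2017ch18_unpr.cfm&leg_type=Acts&isbncln=9780779808571";
        System.out.println(toDirectPdfUrl(href));
    }
}
```

Throwing on a missing segment is a design choice for this sketch: with the original code, a null pdfName would be silently concatenated into a URL ending in "null.pdf".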
