我正在重新訪問網路抓取以嘗試開發一種可以從資料庫中提取資料的工具。
在這里,我使用的是在以下位置找到的物質檔案:
因此,對于此示例,整個所需的輸出將跨越 3 列(雖然只是想提取資料,但現在在 3 列中并不重要):
Workers - Hazard via inhalation route, DNEL (Derived No Effect Level), 238 mg/m3
Workers - Hazard via dermal route, DNEL (Derived No Effect Level), 84 mg/kg bw/day
General Population - Hazard via inhalation route, DNEL (Derived No Effect Level), 70 mg/m3
General Population - Hazard via dermal route, DNEL (Derived No Effect Level), 51 mg/kg bw/day
General Population - Hazard via oral route, DNEL (Derived No Effect Level), 24 mg/kg bw/day
我遇到的問題是我使用類元素“HorDL”來獲取此資訊,但不幸的是,此類不僅限于每條路線以藍色突出顯示的部分。因此, ("HorDL")(0) 可以找到,但 ("HorDL")(1) 會立即為同一路線提取下面的資訊。
出于這個原因,我懷疑使用這個類元素來提取資訊并不是最好的方法,但是我想不出任何其他方法來做到這一點。
我已經有辦法拉出相關的檔案,所以如果這行得通,它將成為一個只提取相關資訊的 Neat 工具。我考慮過拉取所有資訊,然后在 excel 中應用過濾器,但我認為這不是一個特別優雅的解決方案。
非常感謝任何回應。
uj5u.com熱心網友回復:
這假設您只想要帶有關鍵字的 DNEL,Workers并且General Population在標題中,并且在其中排除 DNELHazard for the eyes
注意:你應該宣告你所有的變數,Option Explicit在你的模塊頂部插入以幫助你執行它。
Option Explicit
Public Sub GetContents()
Const DNELTitle As Long = 1
Const DNELAssessment As Long = 2
Const DNELValue As Long = 3
Const resultFirstCell As String = "A1" 'Change the first cell address to insert the result accordingly
Dim ws As Worksheet
Set ws = ThisWorkbook.Worksheets("Sheet1") 'Change worksheet name accordingly
'Start ECHA Search via XML HTTP Request
Dim XMLReq As New MSXML2.XMLHTTP60
Dim HTMLDoc As New MSHTML.HTMLDocument
XMLReq.Open "Get", "https://echa.europa.eu/registration-dossier/-/registered-dossier/16016/7/1", False
XMLReq.send
If XMLReq.Status = 200 Then
HTMLDoc.body.innerHTML = XMLReq.responseText
'==== Loop through each anchors and get the relevant ID for interested DNEL
Dim anchors As Object
Set anchors = HTMLDoc.getElementById("SectionAnchors")
Set anchors = anchors.getElementsByTagName("a")
Dim anchorsColl As Collection
Set anchorsColl = New Collection
Dim i As Long
For i = 0 To anchors.Length - 1
Dim anchorText As String
anchorText = anchors(i).innerText
If InStr(anchorText, "Workers - ") <> 0 Or _
InStr(anchorText, "General Population - ") <> 0 Then
If InStr(anchorText, "Additional Information") = 0 And _
InStr(anchorText, "Hazard for the eyes") = 0 Then
anchorsColl.Add Replace(anchors(i).href, "about:blank#", vbNullString)
End If
End If
Next i
'====
If anchorsColl.Count <> 0 Then
Dim outputArr() As String
ReDim outputArr(1 To anchorsColl.Count, 1 To 3) As String
For i = 1 To anchorsColl.Count
Dim anchorEle As Object
Set anchorEle = HTMLDoc.getElementById(anchorsColl(i))
outputArr(i, DNELTitle) = anchorEle.innerText
'Loop through the anchor's sibling until it finds the DL tag to extract the values
Do While anchorEle.nodeName <> "DL"
Set anchorEle = anchorEle.NextSibling
Loop
'Assumes that the assessment conclusion is in the first DD tag
'Assumes that the value is in the second DD tag
outputArr(i, DNELAssessment) = anchorEle.getElementsByTagName("dd")(0).innerText
outputArr(i, DNELValue) = anchorEle.getElementsByTagName("dd")(1).innerText
Next i
'Write the extraction result to the worksheet starting from A1
ws.Range(resultFirstCell).Resize(UBound(outputArr, 1), 3).Value = outputArr
Else
Debug.Print "No DNEL found."
End If
Set ws = Nothing
Set HTMLDoc = Nothing
Else
MsgBox "Problem" & vbNewLine & XMLReq.Status & " - " & XMLReq.statusText
End If
Set XMLReq = Nothing
End Sub
uj5u.com熱心網友回復:
到目前為止我自己的答案。
接下來,我將回圈遍歷值串列以回傳每個值的 DNEL。還需要包括某種錯誤處理。
Sub GetData()
'Start ECHA Search via XML HTTP Request
Dim XMLReq As New MSXML2.XMLHTTP60
Dim HTMLDoc As New MSHTML.HTMLDocument
XMLReq.Open "Get", "https://echa.europa.eu/registration-dossier/-/registered-dossier/16016/7/1", False
XMLReq.send
If XMLReq.Status <> 200 Then
MsgBox "Problem" & vbNewLine & XMLReq.Status & " - " & XMLReq.statusText
Exit Sub
End If
HTMLDoc.body.innerHTML = XMLReq.responseText
'Retrieve Data for General population
'Defines class element for each route
Dim Route(1 To 3) As String
Route(1) = "sGeneralPopulationHazardViaInhalationRoute"
Route(2) = "sGeneralPopulationHazardViaDermalRoute"
Route(3) = "sGeneralPopulationHazardViaOralRoute"
'Loops through each element
r = 4
c = 6
Dim i As Long
For i = 1 To UBound(Route, 1)
Set Info = HTMLDoc.getElementById(Route(i))
Debug.Print Info.innerText
Set Info = HTMLDoc.getElementById(Route(i)).NextSibling.NextSibling.NextSibling
Set Data = Info.getElementsByTagName("dd")(0)
Debug.Print Data.innerText
Set Data = Info.getElementsByTagName("dd")(1)
Debug.Print Data.innerText
Cells(r, c) = Data.innerText
c = c 1
Next i
r = r 1
End Sub
轉載請註明出處,本文鏈接:https://www.uj5u.com/ruanti/374118.html
上一篇:ExcelJs動態超鏈接不路由
