I have an HTML table:
<div class="parameters">
<div class="property">property 1</div>
<div class="value">value</div>
</div>
<div class="parameters">
<div class="property">property 2</div>
<div class="value">value</div>
</div>
<div class="parameters">
<div class="property">property 3</div>
<div class="value">value</div>
</div>
<div class="parameters">
<div class="property">property 4</div>
<div class="value">value</div>
</div>
I need to capture the value of property 4...
for item in response.css('div.parameters'):
    name = item.xpath('//div[text()[contains(.,"property 4")]]/following::div[1]/text()').get()
But it doesn't work. Where is the error?
Reply from a uj5u.com user:
//div[contains(.,"property 4")]/./div//text()
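In the XPath expression above, the /. step selects the context node itself; it does not move up one level. The expression works because contains(., "property 4") tests an element's full string value, so it also matches the outer div.parameters that wraps the property. From that outer div, /div selects both child divs and //text() returns their text, so the output is: property 4 value
The final XPath expression: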
' '.join(response.xpath('//div[contains(.,"property 4")]/./div//text()').getall())
Demonstrated in the scrapy shell:
In [1]: from scrapy.selector import Selector
In [2]: %paste
html ='''
<div class="parameters">
<div class="property">property 1</div>
<div class="value">value 1</div>
</div>
<div class="parameters">
<div class="property">property 2</div>
<div class="value">value 2</div>
</div>
<div class="parameters">
<div class="property">property 3</div>
<div class="value">value 3</div>
</div>
<div class="parameters">
<div class="property">property 4</div>
<div class="value">value</div>
</div>
'''
## -- End pasted text --
In [3]: sel = Selector(text=html)
In [4]:
...: ' '.join(sel.xpath('//div[contains(.,"property 4")]/./div//text()').getall())
Out[4]: 'property 4 value'
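The expression above returns the property name together with the value. If only the value is wanted, one option is to match the row by its property child and then step to the value child. The following is a minimal sketch, not the answerer's method. It assumes the standard library's xml.etree.ElementTree (chosen so it runs without Scrapy; it supports only a subset of XPath) and a hypothetical <root> wrapper so the fragment parses as XML.

```python
import xml.etree.ElementTree as ET

# Sample rows modelled on the question; <root> is a hypothetical wrapper.
DOC = """
<root>
  <div class="parameters">
    <div class="property">property 3</div>
    <div class="value">value 3</div>
  </div>
  <div class="parameters">
    <div class="property">property 4</div>
    <div class="value">value 4</div>
  </div>
</root>
"""

root = ET.fromstring(DOC)

# Select the row that has a child div whose text is exactly "property 4",
# then take that row's child div with class "value".
path = ".//div[@class='parameters'][div='property 4']/div[@class='value']"
node = root.find(path)
value = node.text if node is not None else None
print(value)
```

In Scrapy, a comparable XPath 1.0 expression would be //div[@class="parameters"][div[@class="property"]="property 4"]/div[@class="value"]/text(), assuming the property text has no surrounding whitespace.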
Reply from a uj5u.com user:
Try:
from lxml import etree as ET
xml_doc = """
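<root>
  <div class="parameters">
    <div class="property">property 1</div>
    <div class="value">value 1</div>
  </div>
  <div class="parameters">
    <div class="property">property 2</div>
    <div class="value">value 2</div>
  </div>
  <div class="parameters">
    <div class="property">property 3</div>
    <div class="value">value 3</div>
  </div>
  <div class="parameters">
    <div class="property">property 4</div>
    <div class="value">value 4</div>
  </div>
</root>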
"""
parsed = ET.fromstring(xml_doc)
properties = parsed.xpath('//div[contains(@class, "property")]')
values = parsed.xpath('//div[contains(@class, "value")]')
out = {p.text: v.text for p, v in zip(properties, values)}
print(out["property 4"])
Output:
value 4
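Pairing two flat lists with zip relies on every row having both a property and a value. A hedged sketch of the alternative, pairing inside each row, is shown here. It assumes the standard library's xml.etree.ElementTree rather than lxml, and a hypothetical sample in which the first row has no value.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the first row is missing its value div.
DOC = """
<root>
  <div class="parameters">
    <div class="property">property 1</div>
  </div>
  <div class="parameters">
    <div class="property">property 2</div>
    <div class="value">value 2</div>
  </div>
</root>
"""

root = ET.fromstring(DOC)

# Positional pairing: the lists differ in length, so zip misaligns them.
props = root.findall(".//div[@class='property']")
vals = root.findall(".//div[@class='value']")
by_position = {p.text: v.text for p, v in zip(props, vals)}

# Per-row pairing: look up both children relative to the same row.
by_row = {}
for row in root.findall(".//div[@class='parameters']"):
    prop = row.find("div[@class='property']")
    val = row.find("div[@class='value']")
    if prop is not None:
        by_row[prop.text] = val.text if val is not None else None

print(by_position)  # {'property 1': 'value 2'}
print(by_row)       # {'property 1': None, 'property 2': 'value 2'}
```

The same idea carries over to lxml or Scrapy by using paths that start from the row element (for example ./div[@class="value"]) instead of paths that start with //.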
