4G bands (A)	4G bands (B)	5G bands (A)	5G bands (B)	5G bands (C)
1, 2, 3 - A2643	1, 2, 3 - A2484	1, 2, 3 - A2643	1, 2, 3 - A2484	1, 2, 3 - A2641

How can I get the output above from the following HTML table?

<table cellspacing="0">

    <tr class="tr-hover">
    <th rowspan="15" scope="row">Network</th>
    <td class="ttl"><a href="network-bands.php3">Technology</a></td>
    <td class="nfo"><a href="#" class="link-network-detail" data-spec="nettech">GSM / CDMA / HSPA / EVDO / LTE / 5G</a></td>
    </tr>

    <tr class="tr-toggle">
    <td class="ttl"><a href="network-bands.php3">2G bands</a></td>
    <td class="nfo" data-spec="net2g">GSM 850 / 900 / 1800 / 1900 - SIM 1 & SIM 2 (dual-SIM)</td>
    </tr>
    
    <tr class="tr-toggle" data-spec-optional>
    <td class="ttl">&nbsp;</td>
    <td class="nfo">CDMA 800 / 1900 </td>
    </tr>

    <tr class="tr-toggle">
    <td class="ttl"><a href="network-bands.php3">3G bands</a></td>
    <td class="nfo" data-spec="net3g">HSDPA 850 / 900 / 1700(AWS) / 1900 / 2100 </td>
    </tr>

    <tr class="tr-toggle" data-spec-optional>
    <td class="ttl">&nbsp;</td>
    <td class="nfo">CDMA2000 1xEV-DO </td>
    </tr>

    <tr class="tr-toggle">
    <td class="ttl"><a href="network-bands.php3">4G bands</a></td>
    <td class="nfo" data-spec="net4g">1, 2, 3, 4, 5, 7, 8, 12, 13, 17, 18, 19, 20, 25, 26, 28, 30, 32, 34, 38, 39, 40, 41, 42, 46, 48, 66 - A2643, A2644, A2645</td>
    </tr>

    <tr class="tr-toggle" data-spec-optional>
    <td class="ttl">&nbsp;</td>
    <td class="nfo">1, 2, 3, 4, 5, 7, 8, 11, 12, 13, 14, 17, 18, 19, 20, 21, 25, 26, 28, 29, 30, 32, 34, 38, 39, 40, 41, 42, 46, 48, 66, 71 - A2484, A2641</td>
    </tr>

    <tr class="tr-toggle">
    <td class="ttl"><a href="network-bands.php3">5G bands</a></td>
    <td class="nfo" data-spec="net5g">1, 2, 3, 5, 7, 8, 12, 20, 25, 28, 30, 38, 40, 41, 48, 66, 77, 78, 79 SA/NSA/Sub6 - A2643, A2644</td>
    </tr>

    <tr class="tr-toggle" data-spec-optional>
    <td class="ttl">&nbsp;</td>
    <td class="nfo">1, 2, 3, 5, 7, 8, 12, 20, 25, 28, 29, 30, 38, 40, 41, 48, 66, 71, 78, 79, 258, 260, 261 SA/NSA/Sub6/mmWave - A2484</td>
    </tr>

    <tr class="tr-toggle" data-spec-optional>
    <td class="ttl">&nbsp;</td>
    <td class="nfo">1, 2, 3, 5, 7, 8, 12, 20, 25, 28, 29, 30, 38, 40, 41, 48, 66, 71, 77, 78, 79 SA/NSA/Sub6 - A2641</td>
    </tr>

    <tr class="tr-toggle">
    <td class="ttl"><a href="glossary.php3?term=3g">Speed</a></td>
    <td class="nfo" data-spec="speed">HSPA 42.2/5.76 Mbps, LTE-A, 5G, EV-DO Rev.A 3.1 Mbps</td>
    </tr>

</table>

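My problem is the rows marked <tr data-spec-optional>. They are siblings of the other rows, so nothing in the hierarchy ties them to the heading they continue.

I have written a spider that successfully crawls the whole structure of the online portal and collects the information I need. Only these optional rows in the table cause me trouble.

This approach gave me hope, but I could not get it to work: "How to select and extract text between two elements?"

Any kind of help would be much appreciated!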

Edit

I have to use Items. When I use @SuperUser's solution, I get the following error:

TypeError: 'str' object does not support item assignment

A uj5u.com user replied:

To be honest, I'm sure this isn't the best approach, but it works. If I think of another way, I'll edit the answer.

In [1]: html = """<html>
   ...: <body>
   ...: <table cellspacing="0">
   ...: 
   ...:     <tr class="tr-toggle">
   ...:     <td class="ttl"><a href="network-bands.php3">4G bands</a></td>
   ...:     <td class="nfo" data-spec="net4g">1, 2, 3 - A2643</td>
   ...:     </tr>
   ...: 
   ...:     <tr class="tr-toggle" data-spec-optional>
   ...:     <td class="ttl">&nbsp;</td>
   ...:     <td class="nfo">1, 2, 3 - A2484</td>
   ...:     </tr>
   ...: 
   ...:     <tr class="tr-toggle">
   ...:     <td class="ttl"><a href="network-bands.php3">5G bands</a></td>
   ...:     <td class="nfo" data-spec="net5g">1, 2, 3 - A2643</td>
   ...:     </tr>
   ...: 
   ...:     <tr class="tr-toggle" data-spec-optional>
   ...:     <td class="ttl">&nbsp;</td>
   ...:     <td class="nfo">1, 2, 3 - A2484</td>
   ...:     </tr>
   ...: 
   ...:     <tr class="tr-toggle" data-spec-optional>
   ...:     <td class="ttl">&nbsp;</td>
   ...:     <td class="nfo">1, 2, 3 - A2641</td>
   ...:     </tr>
   ...: 
   ...: </table>
   ...: </body>
   ...: </html>"""

In [2]: from scrapy import Selector

In [3]: response = Selector(text=html)

In [4]: for row in response.xpath('//tr[@class="tr-toggle"]'):
   ...:     if row.xpath('.//a'):
   ...:         # A row with a link starts a new group, so restart at 'A'.
   ...:         ch = 'A'
   ...:         title1 = row.xpath('.//a/text()').get()
   ...:     else:
   ...:         # An optional row continues the previous group: next letter.
   ...:         ch = chr(ord(ch) + 1)
   ...:     title = title1 + f' ({ch})'
   ...:     data = row.xpath('.//td[@class="nfo"]/text()').get()
   ...:     print(f'"{title}" : "{data}"')
   ...:
"4G bands (A)" : "1, 2, 3 - A2643"
"4G bands (B)" : "1, 2, 3 - A2484"
"5G bands (A)" : "1, 2, 3 - A2643"
"5G bands (B)" : "1, 2, 3 - A2484"
"5G bands (C)" : "1, 2, 3 - A2641"
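
The lettering logic in the loop above can also be separated from the XPath code, which makes it easier to return a dictionary instead of printing. The following is only a minimal sketch: `label_rows` and the `rows` list are hypothetical names, and it assumes each row has already been reduced to a `(title, value)` pair in which a blank or `&nbsp;` title marks an optional row.

```python
from string import ascii_uppercase


def label_rows(rows):
    """Label (title, value) pairs as 'title (A)', 'title (B)', ...

    A row with a blank title is treated as a continuation of the
    most recent titled row and receives the next letter.
    """
    labelled = {}
    current_title = None
    index = 0
    for title, value in rows:
        title = (title or "").strip()  # str.strip() also removes '\xa0' (&nbsp;)
        if title:
            current_title = title
            index = 0  # a titled row restarts the lettering at 'A'
        elif current_title is None:
            continue  # continuation row with no heading before it: skip
        else:
            index += 1
        # Assumption: a group never has more than 26 rows.
        key = f"{current_title} ({ascii_uppercase[index]})"
        labelled[key] = value.strip()
    return labelled


# Hypothetical input mirroring the simplified table above.
rows = [
    ("4G bands", "1, 2, 3 - A2643"),
    ("\xa0", "1, 2, 3 - A2484"),
    ("5G bands", "1, 2, 3 - A2643"),
    ("\xa0", "1, 2, 3 - A2484"),
    ("\xa0", "1, 2, 3 - A2641"),
]
labelled = label_rows(rows)
```

In a spider, the pairs would come from the same row-by-row XPath queries used in the loop; only the bookkeeping moves into the function.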

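A uj5u.com user replied:

This answer is inspired by @SuperUser's answer, which uses ordinal values to increment the letter. You can use XPath to get the headers and the values, then combine them to form the final dictionary.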

import re

titles = response.xpath("//tr[@class='tr-toggle']/td[@class='ttl']/descendant-or-self::*/text()").getall()

for i, title in enumerate(titles):
    if title.strip() == '':
        # Blank title (&nbsp;): reuse the previous title with the next letter.
        prev_letter = re.search(r"\((.)\)$", titles[i-1]).group(1)
        new_title = f"{titles[i-1][:-4]} ({chr(ord(prev_letter) + 1)})"
        titles[i] = new_title
    else:
        titles[i] = f"{title} (A)"

values = response.xpath("//tr[@class='tr-toggle']/td[@class='nfo']/text()").getall()

results = {}
for title, value in zip(titles, values):
    results[title] = value

When you print the results dictionary, you get the following:

{'4G bands (A)': '1, 2, 3 - A2643',
 '4G bands (B)': '1, 2, 3 - A2484',
 '5G bands (A)': '1, 2, 3 - A2643',
 '5G bands (B)': '1, 2, 3 - A2484',
 '5G bands (C)': '1, 2, 3 - A2641'}
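
Regarding the TypeError in the question's edit: Python raises that message when code assigns by key to a string. The failing spider code is not shown, so the following is only a guess at the cause, with `item` and the `url` key as hypothetical names. It reproduces the error and then stores the results in a plain dict, which Scrapy accepts as an item.

```python
results = {
    "4G bands (A)": "1, 2, 3 - A2643",
    "4G bands (B)": "1, 2, 3 - A2484",
    "5G bands (A)": "1, 2, 3 - A2643",
    "5G bands (B)": "1, 2, 3 - A2484",
    "5G bands (C)": "1, 2, 3 - A2641",
}

# Assumed cause: the variable holding the "item" is actually a str.
item = "network"
try:
    item["4G bands (A)"] = results["4G bands (A)"]
except TypeError as exc:
    message = str(exc)  # 'str' object does not support item assignment

# Assigning into a dict works, and the keys need not be known in advance.
item = {"url": "https://example.com/phone"}  # hypothetical field
item.update(results)
```

A declared `scrapy.Item` subclass only accepts keys that were defined as fields, so dynamically generated keys such as "5G bands (C)" are simpler to carry in a dict.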


How can I scrape this HTML table correctly with Scrapy?
