How do I skip the first table and the second table header when parsing a local HTML file in Python?

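I am trying to parse a local HTML file, and I don't understand why the same code produces different results for the sample HTML text and for the whole HTML file. Can anyone help? I would really appreciate it. Sample HTML text: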

s = '''
<table width=90%>
   <tr>
      <td align="center" width=18%></td>
      <td align="left" width=15%></td>
   </tr>
</table>
<table border>
   <tr>
      <td nowrap="nowrap"><b>Rec</b></td>
      <td align="RIGHT" nowrap="nowrap"><b>ID</b></td>
   </tr>
   <tr>
      <td align="RIGHT" nowrap="nowrap" VALIGN=TOP>1</td>
      <td nowrap="nowrap" VALIGN=TOP><a href="smthing?DID=ID">ID<br />100100</a></td>
   </tr>
</table>
<p>
<style type="text/css">
 .....
</style>
<table border>
   <tr>
      <td nowrap="nowrap"><b>Rec</b></td>
      <td align="RIGHT" nowrap="nowrap"><b>ID</b></td>
   </tr>
   <tr>
      <td align="RIGHT" nowrap="nowrap" VALIGN=TOP>2</td>
      <td nowrap="nowrap" VALIGN=TOP><a href="smthing?DID=ID">ID<br />101101</a></td>
   </tr>
</table>
 '''

I tried the following:

from bs4 import BeautifulSoup

# To parse the whole file instead of the sample string:
# with open('myfile.html', 'r', encoding='utf-8') as f:
#     s = f.read()
soup = BeautifulSoup(s, "html.parser")
# Nested list: one entry per table, each holding its rows of cell text
tables = [
    [
        [td.get_text(strip=True) for td in tr.find_all('td')]
        for tr in table.find_all('tr')
    ]
    for table in soup.find_all('table')
]
# Flat list with the text of every <td> in the document
table_data = [i.text for i in soup.find_all('td')]

print(table_data)

Expected output:

Rec   ID
1     ID100100
2     ID101101

Current output:

['', '', 'Rec', 'ID', '1', 'ID100100', 'Rec', 'ID', '2', 'ID101101']

Also, when I run the same code on the whole HTML file, the result looks like this. Am I missing something here?

'', '</tr>', '', '</table>', '', '</table>', '', '</center>', '', '<hr />', '', '<center>', '',

Answer from a uj5u.com user:

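You can pick the values you need out of the flat list by index (negative indices count from the end of the list):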

from bs4 import BeautifulSoup

s = '''
<table width=90%>
   <tr>
      <td align="center" width=18%></td>
      <td align="left" width=15%></td>
   </tr>
</table>
<table border>
   <tr>
      <td nowrap="nowrap"><b>Rec</b></td>
      <td align="RIGHT" nowrap="nowrap"><b>ID</b></td>
   </tr>
   <tr>
      <td align="RIGHT" nowrap="nowrap" VALIGN=TOP>1</td>
      <td nowrap="nowrap" VALIGN=TOP><a href="smthing?DID=ID">ID<br />100100</a></td>
   </tr>
</table>
<p>
<style type="text/css">
 .....
</style>
<table border>
   <tr>
      <td nowrap="nowrap"><b>Rec</b></td>
      <td align="RIGHT" nowrap="nowrap"><b>ID</b></td>
   </tr>
   <tr>
      <td align="RIGHT" nowrap="nowrap" VALIGN=TOP>2</td>
      <td nowrap="nowrap" VALIGN=TOP><a href="smthing?DID=ID">ID<br />101101</a></td>
   </tr>
</table>
 '''


 
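soup = BeautifulSoup(s, "html.parser")

# Text of every <td> in the document, in order, across all three tables
table_data = [i.text for i in soup.find_all('td')]

# Pick the wanted cells by position, counting from the end of the list
rec = table_data[-3]    # 'ID' header cell of the last table
num_1 = table_data[-5]  # ID cell from the first data table
num_2 = table_data[-1]  # ID cell from the last data table

data = []
data.append([rec, num_1, num_2])
print(data)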

Output:

[['ID', 'ID100100', 'ID101101']]
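
The fixed indices above only work while the document has exactly this number of cells. As a possible alternative, here is a minimal sketch that walks the tables row by row, skips the first (layout) table, and keeps a repeated header row only once, which is closer to the expected output in the question. The helper name `extract_rows`, its `skip_tables` parameter, and the shortened `SAMPLE` string are illustrative assumptions, not part of the original post, and the sketch assumes the tables are not nested inside each other.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the sample HTML in the question (assumption)
SAMPLE = '''
<table width=90%>
   <tr><td align="center"></td><td align="left"></td></tr>
</table>
<table border>
   <tr><td><b>Rec</b></td><td><b>ID</b></td></tr>
   <tr><td>1</td><td><a href="smthing?DID=ID">ID<br />100100</a></td></tr>
</table>
<p>
<table border>
   <tr><td><b>Rec</b></td><td><b>ID</b></td></tr>
   <tr><td>2</td><td><a href="smthing?DID=ID">ID<br />101101</a></td></tr>
</table>
'''


def extract_rows(html, skip_tables=1):
    """Return the header row once, followed by every data row.

    Hypothetical helper: skips the first `skip_tables` tables and drops
    any later row whose cells repeat the first row that was kept.
    """
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    header = None
    for table in soup.find_all("table")[skip_tables:]:
        for tr in table.find_all("tr"):
            cells = [td.get_text(strip=True) for td in tr.find_all("td")]
            if not cells:
                continue  # row without <td> cells
            if header is None:
                header = cells  # first row seen is treated as the header
                rows.append(cells)
            elif cells != header:
                rows.append(cells)  # keep data rows, skip repeated headers
    return rows


rows = extract_rows(SAMPLE)
for row in rows:
    print("\t".join(row))
```

To run this against the local file, read it first (for example with `open('myfile.html', encoding='utf-8')`) and pass the resulting string to `extract_rows`. If the whole file still gives unexpected results, the markup may be malformed; Beautiful Soup's parsers ("html.parser", "lxml", "html5lib") can build different trees from invalid HTML, so trying another installed parser is one thing to check.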


Tags: Python, html, pandas, parsing, beautifulsoup
