Failing to write the contents of multiple tables to a CSV file using pandas.read_html()

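I am trying to scrape tabular content from this webpage and write it to a CSV file using pandas.read_html(). The page has two tables that match the same selector, table.table--overflow[aria-label^='Financials'], and I want to grab both of them. My current implementation prints the contents of both tables but writes only the last one to the CSV file.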

import requests
import pandas as pd
from bs4 import BeautifulSoup

link = 'https://www.marketwatch.com/investing/stock/mbin/financials/balance-sheet'

def get_tabular_content(s, link):
    res = s.get(link)
    soup = BeautifulSoup(res.text, "lxml")
    for selector in soup.select("table.table--overflow[aria-label^='Financials']"):
        df = pd.read_html(str(selector))[0]
        df.to_csv('marketwatch.csv', header=True, index=False)
        print(df)

with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36'
    get_tabular_content(s,link)

How can I write the contents of multiple tables to a single CSV file using pandas.read_html()?

Answer from a uj5u.com user:

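Stop overwriting the output file. Each call to df.to_csv('marketwatch.csv', ...) inside the loop opens the file in write mode, so every table replaces the one before it. Use a unique file name per table instead. That gives you multiple output files, one for each HTML table on the page.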
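
The unique-name approach could be sketched roughly as follows. The helper name write_tables_separately, the numbered file-name pattern, and the placeholder frames are assumptions made for illustration; in the scraper, the frames would be the ones returned by pd.read_html() for each matched table.

```python
import tempfile
from pathlib import Path

import pandas as pd


def write_tables_separately(frames, out_dir, prefix="marketwatch"):
    """Write each DataFrame to its own CSV and return the paths written.

    Hypothetical helper: the name and the file-name pattern are illustrative.
    """
    out_dir = Path(out_dir)
    paths = []
    for i, df in enumerate(frames, start=1):
        # The counter in the file name keeps a later table from
        # overwriting an earlier one.
        path = out_dir / f"{prefix}_{i}.csv"
        df.to_csv(path, header=True, index=False)
        paths.append(path)
    return paths


# Placeholder data standing in for the two scraped tables.
tables = [
    pd.DataFrame({"Item": ["row A"], "2020": ["1"]}),
    pd.DataFrame({"Item": ["row B"], "2020": ["2"]}),
]

out_dir = tempfile.mkdtemp()
written = write_tables_separately(tables, out_dir)
print([p.name for p in written])
```

Numbering the files with enumerate() is only one option; a name derived from the table's aria-label would work just as well if the labels are distinct.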

If you want a single CSV containing the data from all the HTML tables, append each df to list_of_df and, once the loop has finished, call frame = pd.concat(list_of_df, axis=0, ignore_index=True).

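# soup is the BeautifulSoup object built in get_tabular_content()
list_of_df = []
for selector in soup.select("table.table--overflow[aria-label^='Financials']"):
    df = pd.read_html(str(selector))[0]
    list_of_df.append(df)  # collect every table instead of writing it right away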


frame = pd.concat(list_of_df, axis=0, ignore_index=True)
frame.to_csv('marketwatch.csv', header=True, index=False)
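
As an alternative to concatenating in memory, each table can be appended to the same file by passing mode="a" to to_csv() and writing the header only for the first table. The sketch below is a swapped-in variant, not the code from the answer above; the helper name append_tables and the placeholder frames are assumptions made for illustration.

```python
import tempfile
from pathlib import Path

import pandas as pd


def append_tables(frames, path):
    """Append each DataFrame to one CSV, writing the header only once.

    Hypothetical helper used to illustrate to_csv(mode="a").
    """
    path = Path(path)
    if path.exists():
        path.unlink()  # start clean so reruns do not accumulate rows
    for i, df in enumerate(frames):
        # mode="a" appends instead of truncating the file;
        # header=(i == 0) writes the column names for the first table only.
        df.to_csv(path, mode="a", header=(i == 0), index=False)


# Placeholder data standing in for the two scraped tables.
tables = [
    pd.DataFrame({"Item": ["row A", "row B"], "2019": ["1", "2"], "2020": ["3", "4"]}),
    pd.DataFrame({"Item": ["row C"], "2019": ["5"], "2020": ["6"]}),
]

csv_path = Path(tempfile.mkdtemp()) / "marketwatch.csv"
append_tables(tables, csv_path)
combined = pd.read_csv(csv_path)
print(len(combined))
```

Appending writes rows positionally, so it is only safe when every table has the same columns in the same order. pd.concat() aligns on column names and fills missing cells with NaN, which makes it the more forgiving choice when the tables differ.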

Output ('marketwatch.csv') - 75 records

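Item Item,2016,2017,2018,2019,2020,5-year trend
Total Cash & Due from Banks Total Cash & Due from Banks,10.04M,18.91M,25.86M,13.91M,10.06M,
Cash & Due from Banks Growth Cash & Due from Banks Growth,-,88.37%,36.76%,-46.20%,-27.65%,
Investments - Total Investments - Total,1.69B,1.92B,1.67B,3.19B,3.95B,
...
Return on Average Total Equity Return on Average Total Equity,-,-,-,24.66%,
Accumulated Minority Interest Accumulated Minority Interest,-,-,-,-,
Total Equity Total Equity,206.29M,367.47M,421.24M,653.73M,810.62M,
Liabilities & Shareholders' Equity Liabilities & Shareholders' Equity,2.72B,3.39B,3.88B,6.37B,9.65B,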

Please credit the source when reposting. Link to this article: https://www.uj5u.com/qukuanlian/320264.html
