BeautifulSoup can't find the table on iShares

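For a while now I've been trying to scrape ETF data from iShares.com for an ongoing project. I'm building scrapers for several pages, but the pages are all structured identically. Basically, I've run into two problems: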

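1. I keep getting the error "AttributeError: 'NoneType' object has no attribute 'tr'", even though I'm fairly sure I've selected the correct table.
2. When I use "Inspect Element" on some of the pages, I have to click "Show more" before the markup for all rows appears.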

I'm not a computer scientist, but I've tried many different approaches, unfortunately without success, so I hope you can help.

URL: https://www.ishares.com/uk/individual/en/products/251382/ishares-msci-world-minimum-volatility-ucits-etf

The table is on that page under "Holdings". It can also be located with these paths: JS Path: document.querySelector("#allHoldingsTable > tbody"); XPath: //*[@id="allHoldingsTable"]/tbody
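The JS path above is a CSS selector, and BeautifulSoup's `select_one` accepts the same syntax. `#allHoldingsTable` matches an element by its `id`, whereas `.allHoldingsTable` matches by `class`; when nothing matches, `select_one` returns `None`, and reading `.tr` from `None` raises exactly the `AttributeError` quoted above. A minimal offline illustration, using a made-up HTML snippet (not the real page) whose `<tbody>` is empty, assuming, as the answer below says, that the rows are filled in client-side:

```python
from bs4 import BeautifulSoup

# Made-up stand-in for the page: the table element exists, but its body rows
# are assumed to be added later by JavaScript, so the static <tbody> is empty.
html = """
<table id="allHoldingsTable">
  <thead><tr><th>Ticker</th><th>Name</th></tr></thead>
  <tbody></tbody>
</table>
"""
soup = BeautifulSoup(html, "html.parser")

by_class = soup.select_one(".allHoldingsTable")       # class selector: no match
by_id = soup.select_one("#allHoldingsTable")          # id selector, as in the JS path
rows = soup.select("#allHoldingsTable > tbody > tr")  # body rows present in the HTML

print(by_class)          # None, so by_class.tr would raise AttributeError
print(by_id is not None)
print(len(rows))         # nothing to scrape from the static HTML
```

Even with the correct selector, the static HTML only helps if the rows are actually in it, which is why the answer suggests rendering the page or calling the data endpoint directly.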

Code:

import requests
import pandas as pd
from bs4 import BeautifulSoup


urls = [
'https://www.ishares.com/uk/individual/en/products/251382/ishares-msci-world-minimum-volatility-ucits-etf'
]

all_data = []
for url in urls:
    print("Loading URL {}".format(url))

    # load the page into soup:
    soup = BeautifulSoup(requests.get(url).content, "html.parser")

    # find correct table:
    tbl = soup.select_one(".allHoldingsTable")

    # remove the first row (it's not header):
    tbl.tr.extract()

    # convert the html to pandas DF:
    df = pd.read_html(str(tbl),thousands='.', decimal=',')[0]

    # move the first row to header:
    df.columns = map(lambda x: str(x).replace("*", "").strip(), df.loc[0])
    df = df.loc[1:].reset_index(drop=True).rename(columns={"nan": "Name"})

    df["Company"] = soup.h1.text.split("\n")[0].strip()
    df["URL"] = url
    all_data.append(df.loc[:, ~df.isna().all()])

df = pd.concat(all_data, ignore_index=True)
print(df)


# write into the existing workbook, replacing the sheet if it already exists
# (mode='a' and if_sheet_exists need pandas >= 1.3; assigning writer.book and
# calling writer.save() no longer work in pandas 2.x)
path = '/Users/karlemilthulstrup/Downloads/ishares.xlsx'
with pd.ExcelWriter(path, engine='openpyxl', mode='a', if_sheet_exists='replace') as writer:
    df.to_excel(writer, index=False)

Answer:

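As mentioned in the comments, the data is rendered dynamically. If you don't want to access the data directly, you could use something like Selenium, which lets the page render; you can then get at the table the way you do above.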
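A minimal sketch of the Selenium route, assuming Selenium 4 with Chrome and chromedriver installed. The helper names, the CSS selector used in the wait, and the 20-second timeout are assumptions rather than anything taken from the page. It does not click "Show more" or dismiss any cookie banner, so whether every row ends up in the DOM is unverified. The Selenium imports sit inside the fetch function so the parser can be used on its own.

```python
from bs4 import BeautifulSoup


def fetch_rendered_html(url, timeout=20):
    """Hypothetical helper: load `url` in Chrome and return the HTML after JS ran."""
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Wait until at least one body row of the holdings table is present.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located(
                (By.CSS_SELECTOR, "#allHoldingsTable > tbody > tr")
            )
        )
        return driver.page_source
    finally:
        driver.quit()


def parse_holdings_table(html):
    """Hypothetical helper: turn #allHoldingsTable into a list of row dicts."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.select_one("#allHoldingsTable")
    if table is None:
        raise ValueError("#allHoldingsTable not found; was the page rendered?")
    headers = [th.get_text(strip=True) for th in table.select("thead th")]
    rows = []
    for tr in table.select("tbody > tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        rows.append(dict(zip(headers, cells)))
    return rows
```

Usage would be `rows = parse_holdings_table(fetch_rendered_html(url))`, followed by `pd.DataFrame(rows)` if a DataFrame is wanted. Raising `ValueError` instead of letting `None.tr` fail makes an unrendered page easier to diagnose.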

There is also a button that downloads the holdings as a CSV file for you. Why not use that?
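A downloaded file can be read with the standard `csv` module. The exact layout of the iShares CSV is not described in this thread, so this sketch rests on a hypothetical assumption: some fund metadata lines may precede the holdings, and the header row is the first row whose first cell equals `Ticker`. Adjust `header_first_cell` to whatever the real file uses.

```python
import csv
import io


def read_holdings_csv(text, header_first_cell="Ticker"):
    """Return holdings rows as dicts from the text of a downloaded CSV.

    Hypothetical assumption: the header row is the first row whose first cell
    equals `header_first_cell`; any lines before it are skipped.
    """
    lines = text.lstrip("\ufeff").splitlines()  # drop a byte-order mark, if any
    for i, line in enumerate(lines):
        cells = next(csv.reader([line]), [])
        if cells and cells[0].strip() == header_first_cell:
            break
    else:
        raise ValueError(f"no header row starting with {header_first_cell!r}")
    reader = csv.DictReader(io.StringIO("\n".join(lines[i:])))
    # Keep only rows with a value in the first column (skips trailing footers).
    return [row for row in reader if (row.get(header_first_cell) or "").strip()]
```

Splitting on lines first keeps the header search simple; it would mis-handle quoted fields containing newlines, which this sketch assumes the file does not have.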

But if you have to scrape it, you can request the data directly in JSON format. Just parse it:

import requests
import json
import pandas as pd

# JSON endpoint for the full holdings list (note fileType=json in the query string)
url = 'https://www.ishares.com/uk/individual/en/products/251382/ishares-msci-world-minimum-volatility-ucits-etf/1506575576011.ajax?tab=all&fileType=json'
r = requests.get(url)
r.raise_for_status()
r.encoding = 'utf-8-sig'  # strips a leading byte-order mark, if present
jsonData = json.loads(r.text)

rows = []
for each in jsonData['aaData']:
    # each holding is a positional list; numeric cells are {'display': ..., 'raw': ...}
    row = {'Issuer Ticker': each[0],
           'Name': each[1],
           'Sector': each[2],
           'Asset Class': each[3],
           'Market Value': each[4]['display'],
           'Market Value Raw': each[4]['raw'],
           'Weight (%)': each[5]['display'],
           'Weight (%) Raw': each[5]['raw'],
           'Notional Value': each[6]['display'],
           'Notional Value Raw': each[6]['raw'],
           'Nominal': each[7]['display'],
           'Nominal Raw': each[7]['raw'],
           'ISIN': each[8],
           'Price': each[9]['display'],
           'Price Raw': each[9]['raw'],
           'Location': each[10],
           'Exchange': each[11],
           'Market Currency': each[12]}
    rows.append(row)

df = pd.DataFrame(rows)

Output:

print(df)
    Issuer Ticker  ... Market Currency
0              VZ  ...             USD
1             ROG  ...             CHF
2            NESN  ...             CHF
3              WM  ...             USD
4             PEP  ...             USD
..            ...  ...             ...
309          ESH2  ...             USD
310          TUH2  ...             USD
311           JPY  ...             USD
312    MARGIN_JPY  ...             JPY
313    MARGIN_SGD  ...             SGD

[314 rows x 18 columns]
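The loop above hard-codes one dictionary key per index. A more compact sketch drives the flattening from a list of column names instead, assuming every dict cell carries the same `display`/`raw` pair seen above: such a cell becomes two columns, and plain cells are copied as-is. The column names follow the answer's; the sample payload is made up and only mimics the shape of `aaData`, including a leading byte-order mark that `utf-8-sig` strips.

```python
import json

import pandas as pd


def flatten_aadata(payload, columns):
    """Flatten an {'aaData': [[...], ...]} payload into a DataFrame.

    Dict cells with 'display' and 'raw' keys become '<name>' and '<name> Raw';
    every other cell is copied unchanged.
    """
    rows = []
    for record in payload["aaData"]:
        row = {}
        for name, cell in zip(columns, record):
            if isinstance(cell, dict) and "display" in cell and "raw" in cell:
                row[name] = cell["display"]
                row[f"{name} Raw"] = cell["raw"]
            else:
                row[name] = cell
        rows.append(row)
    return pd.DataFrame(rows)


COLUMNS = [
    "Issuer Ticker", "Name", "Sector", "Asset Class", "Market Value",
    "Weight (%)", "Notional Value", "Nominal", "ISIN", "Price",
    "Location", "Exchange", "Market Currency",
]

# Made-up one-row payload with the same positional shape as the real response.
sample = {"aaData": [[
    "XYZ", "EXAMPLE CORP", "Utilities", "Equity",
    {"display": "1,000.00", "raw": 1000.0}, {"display": "1.50", "raw": 1.5},
    {"display": "1,000.00", "raw": 1000.0}, {"display": "20", "raw": 20},
    "XX0000000000", {"display": "50.00", "raw": 50.0},
    "Example Country", "Example Exchange", "USD",
]]}
payload_bytes = ("\ufeff" + json.dumps(sample)).encode("utf-8")
df = flatten_aadata(json.loads(payload_bytes.decode("utf-8-sig")), COLUMNS)
print(df.shape)  # (1, 18): 13 positional cells, 5 of them split into display/raw
```

Thirteen positional cells with five `display`/`raw` pairs give the same 18 columns as the output above, and a renamed or misspelled column only has to be fixed in one list.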
