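I scraped some information about S&P 500 stocks from this site: https://www.slickcharts.com/sp500. The scraping itself works fine: if I add a print statement to the for loop below, all of the data is displayed. In other words, this code: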
# Web-scraped S&P 500 data for 500 US stocks.
import requests
import pandas as pd
from bs4 import BeautifulSoup
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36 Edg/96.0.1054.62'}
url = 'https://www.slickcharts.com/sp500' # Data from SlickCharts
page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.text, 'html.parser')
table1 = soup.find('table', attrs={'class':'table table-hover table-borderless table-sm'})
for row in table1.find_all('tr'):
    all_td_tags = row.find_all('td')
    if len(all_td_tags) > 0:
        company = all_td_tags[1].text
        symbol = all_td_tags[2].text
        weight = all_td_tags[3].text
        price = all_td_tags[4].text
        chg = all_td_tags[5].text
        perChg = all_td_tags[6].text
        print(company, '|', symbol, '|', weight, '|', price, '|', chg, '|', perChg)
Output:
Apple Inc. | AAPL | 6.866056 | 176.34 | 0.06 | (0.03%)
Microsoft Corporation | MSFT | 6.279809 | 334.50 | -0.19 | (-0.06%)
Amazon.com Inc. | AMZN | 3.729209 | 3,418.46 | -2.91 | (-0.09%)
Alphabet Inc. Class A | GOOGL | 2.208863 | 2,938.00 | -0.33 | (-0.01%)
Tesla Inc | TSLA | 2.169114 | 1,069.30 | 2.30 | (0.22%)
Alphabet Inc. Class C | GOOG | 2.056323 | 2,942.00 | -0.85 | (-0.03%)
Meta Platforms Inc. Class A | FB | 1.982391 | 336.00 | 0.76 | (0.23%)
NVIDIA Corporation | NVDA | 1.851853 | 295.60 | -0.80 | (-0.27%)
...
However, when I write the code with a DataFrame (which I want so that I can look up a specific stock, e.g. enter "AAPL" and get that stock's price, weight, etc.):
# Web-scraped S&P 500 data for 500 US stocks.
import requests
import pandas as pd
from bs4 import BeautifulSoup
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36 Edg/96.0.1054.62'}
url = 'https://www.slickcharts.com/sp500' # Data from SlickCharts
page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.text, 'html.parser')
table1 = soup.find('table', attrs={'class':'table table-hover table-borderless table-sm'})
for row in table1.find_all('tr'):
    all_td_tags = row.find_all('td')
    if len(all_td_tags) > 0:
        company = all_td_tags[1].text
        symbol = all_td_tags[2].text
        weight = all_td_tags[3].text
        price = all_td_tags[4].text
        chg = all_td_tags[5].text
        perChg = all_td_tags[6].text
df = pd.DataFrame({'Company': [company], 'Symbol': [symbol], 'Weight': [weight], 'Price': [price], 'Change': [chg], 'Percent_Change': [perChg]})
print(df.head())
I only get information for one stock, when I should be getting the whole table:
Company Symbol Weight Price Change Percent_Change
0 News Corporation Class B NWS 0.006948 22.75 0.20 (0.89%)
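What am I doing wrong with the DataFrame so that it only shows that one stock (which happens to be the last one in the table)?
Update
I replaced the definition of df like this: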
# Web-scraped S&P 500 data for 500 US stocks.
import requests
import pandas as pd
from bs4 import BeautifulSoup
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36 Edg/96.0.1054.62'}
url = 'https://www.slickcharts.com/sp500' # Data from SlickCharts
page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.text, 'html.parser')
table1 = soup.find('table', attrs={'class':'table table-hover table-borderless table-sm'})
for row in table1.find_all('tr'):
    all_td_tags = row.find_all('td')
    if len(all_td_tags) > 0:
        company = all_td_tags[1].text
        symbol = all_td_tags[2].text
        weight = all_td_tags[3].text
        price = all_td_tags[4].text
        chg = all_td_tags[5].text
        perChg = all_td_tags[6].text
        # print(company, '|', symbol, '|', weight, '|', price, '|', chg, '|', perChg)
df = pd.read_html(str(table1))[0]
print(df)
But my output looks like this:
# Company Symbol Weight Price Chg % Chg
0 1 Apple Inc. AAPL 6.866056 176.34 0.06 (0.03%)
1 2 Microsoft Corporation MSFT 6.279809 334.50 -0.19 (-0.06%)
2 3 Amazon.com Inc. AMZN 3.729209 3418.46 -2.91 (-0.09%)
3 4 Alphabet Inc. Class A GOOGL 2.208863 2938.00 -0.33 (-0.01%)
4 5 Tesla Inc TSLA 2.169114 1069.30 2.30 (0.22%)
.. ... ... ... ... ... ... ...
500 501 Discovery Inc. Class A DISCA 0.009951 24.25 -0.17 (-0.70%)
501 502 Under Armour Inc. Class A UAA 0.009792 20.62 0.00 (0.00%)
502 503 Gap Inc. GPS 0.008945 17.28 0.00 (0.00%)
503 504 Under Armour Inc. Class C UA 0.008667 17.55 0.00 (0.00%)
504 505 News Corporation Class B NWS 0.006948 22.75 0.20 (0.89%)
How do I make the second column of numbers go away?
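Answer:
Because you keep reassigning company, symbol, weight, etc. on every iteration, those variables only hold the values from the last row you parsed. A DataFrame built from them after the loop therefore contains a single row.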
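If you would rather keep the explicit loop, one option is to append each parsed row to a list and build the DataFrame once, after the loop has finished. The sketch below is only an illustration: SAMPLE_HTML is a made-up two-row stand-in for the real page (so it runs without a network request), and the rows list is a name introduced here, not something from the original code.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical two-row stand-in for the SlickCharts table, so the sketch runs offline.
SAMPLE_HTML = """
<table class="table table-hover table-borderless table-sm">
  <tr><th>#</th><th>Company</th><th>Symbol</th><th>Weight</th><th>Price</th><th>Chg</th><th>% Chg</th></tr>
  <tr><td>1</td><td>Apple Inc.</td><td>AAPL</td><td>6.866056</td><td>176.34</td><td>0.06</td><td>(0.03%)</td></tr>
  <tr><td>2</td><td>Microsoft Corporation</td><td>MSFT</td><td>6.279809</td><td>334.50</td><td>-0.19</td><td>(-0.06%)</td></tr>
</table>
"""

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
table1 = soup.find('table')

rows = []  # one dict per stock, filled inside the loop
for row in table1.find_all('tr'):
    all_td_tags = row.find_all('td')
    if len(all_td_tags) > 0:  # skip the header row, which has only <th> cells
        rows.append({
            'Company': all_td_tags[1].text.strip(),
            'Symbol': all_td_tags[2].text.strip(),
            'Weight': all_td_tags[3].text.strip(),
            'Price': all_td_tags[4].text.strip(),
            'Change': all_td_tags[5].text.strip(),
            'Percent_Change': all_td_tags[6].text.strip(),
        })

# Build the DataFrame once, after every row has been collected.
df = pd.DataFrame(rows)
print(df)
```

All values are still strings at this point; convert them with pd.to_numeric if you need to do arithmetic on them.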
You can use pd.read_html instead. It returns a list of DataFrames, one for each <table> tag in the HTML fragment. You found only one table with soup.find, so it is element 0:
df = pd.read_html(str(table1))[0]
Output:
# Company Symbol Weight Price Chg % Chg
1 Apple Inc. AAPL 6.866056 176.34 0.06 (0.03%)
2 Microsoft Corporation MSFT 6.279809 334.50 -0.19 (-0.06%)
3 Amazon.com Inc. AMZN 3.729209 3418.46 -2.91 (-0.09%)
4 Alphabet Inc. Class A GOOGL 2.208863 2938.00 -0.33 (-0.01%)
5 Tesla Inc TSLA 2.169114 1069.30 2.30 (0.22%)
...
Trim and rename the frame as needed.
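As a rough sketch of what that trimming might look like: the frame below is built by hand from three rows of the output above, as a stand-in for the result of pd.read_html, so the example runs without a network request. The tidy variable and the target column names Change and Percent_Change (borrowed from the question's own DataFrame) are choices made for this illustration. Dropping the # column removes the site's rank numbers, and indexing by Symbol replaces the default 0, 1, 2, ... index, which also makes lookups by ticker straightforward.

```python
import pandas as pd

# Hand-built stand-in for the frame returned by pd.read_html,
# using three rows copied from the output shown above.
df = pd.DataFrame({
    '#': [1, 2, 3],
    'Company': ['Apple Inc.', 'Microsoft Corporation', 'Amazon.com Inc.'],
    'Symbol': ['AAPL', 'MSFT', 'AMZN'],
    'Weight': [6.866056, 6.279809, 3.729209],
    'Price': [176.34, 334.50, 3418.46],
    'Chg': [0.06, -0.19, -2.91],
    '% Chg': ['(0.03%)', '(-0.06%)', '(-0.09%)'],
})

tidy = (
    df.drop(columns='#')  # remove the site's rank column
      .rename(columns={'Chg': 'Change', '% Chg': 'Percent_Change'})
      .set_index('Symbol')  # ticker replaces the default integer index
)

# Look up a single stock by its ticker symbol.
aapl = tidy.loc['AAPL']
print(aapl['Price'], aapl['Weight'])
```

If you only want to hide the integer index when printing, without changing the frame, print(df.to_string(index=False)) is another option.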
