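Python 3 web scraping: I'm trying to extract a table from HTML data and store it in a new DataFrame. I need all the 'td' values, but when I iterate, the loop returns only a single row instead of all of them. Below are my code and output: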
!pip install yfinance
!pip install pandas
!pip install requests
!pip install bs4
!pip install plotly
import yfinance as yf
import pandas as pd
import requests
from bs4 import BeautifulSoup
import plotly.graph_objects as go
from plotly.subplots import make_subplots
def make_graph(stock_data, revenue_data, stock):
    # Two stacked subplots sharing the date axis: share price on top, revenue below
    fig = make_subplots(rows=2, cols=1, shared_xaxes=True, subplot_titles=("Historical Share Price", "Historical Revenue"), vertical_spacing = .3)
    # Keep only the data up to the chosen cut-off dates
    stock_data_specific = stock_data[stock_data.Date <= '2021-06-14']
    revenue_data_specific = revenue_data[revenue_data.Date <= '2021-04-30']
    fig.add_trace(go.Scatter(x=pd.to_datetime(stock_data_specific.Date), y=stock_data_specific.Close.astype("float"), name="Share Price"), row=1, col=1)
    fig.add_trace(go.Scatter(x=pd.to_datetime(revenue_data_specific.Date), y=revenue_data_specific.Revenue.astype("float"), name="Revenue"), row=2, col=1)
    fig.update_xaxes(title_text="Date", row=1, col=1)
    fig.update_xaxes(title_text="Date", row=2, col=1)
    fig.update_yaxes(title_text="Price ($US)", row=1, col=1)
    fig.update_yaxes(title_text="Revenue ($US Millions)", row=2, col=1)
    fig.update_layout(showlegend=False,
                      height=900,
                      title=stock,
                      xaxis_rangeslider_visible=True)
    fig.show()
tsla = yf.Ticker("TSLA")
tsla
tesla_data = tsla.history(period="max")
tesla_data
tesla_data.reset_index(inplace=True)
tesla_data.head()
url = "https://www.macrotrends.net/stocks/charts/TSLA/tesla/revenue"
html_data = requests.get(url).text
soup = BeautifulSoup(html_data, 'html.parser')
tesla_revenue = pd.DataFrame(columns=["Date", "Revenue"])
for row in soup.find("tbody").find_all('tr'):
    col = row.find_all("td")
    date = col[0].text
    revenue = col[1].text
tesla_revenue = tesla_revenue.append({"Date":date, "Revenue":revenue}, ignore_index=True)
tesla_revenue
| | Date | Revenue |
|---|---|---|
| 0 | 2008 | $15 |
Answer:
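What's wrong?
The scraping itself works, but you append the data outside the loop, so the append runs only once, after the loop finishes, and you always get just the last iteration's row.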
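To see why, here is a minimal, library-free sketch of the same control flow. A plain list stands in for the DataFrame, and the three sample rows are taken from the outputs shown in this thread:

```python
rows = [("2020", "$31,536"), ("2019", "$24,578"), ("2008", "$15")]

# Buggy: the append is dedented, so it runs once after the loop,
# using whatever `date` and `revenue` held on the final iteration.
buggy = []
for row in rows:
    date = row[0]
    revenue = row[1]
buggy.append({"Date": date, "Revenue": revenue})

# Fixed: the append is part of the loop body, so it runs once per row.
fixed = []
for row in rows:
    date = row[0]
    revenue = row[1]
    fixed.append({"Date": date, "Revenue": revenue})

print(buggy)       # [{'Date': '2008', 'Revenue': '$15'}]
print(len(fixed))  # 3
```

The buggy version reproduces the single `2008 | $15` row from the question.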
How do you fix it?
Fix the indentation so the append is inside the loop:
tesla_revenue = pd.DataFrame(columns=["Date", "Revenue"])
for row in soup.find("tbody").find_all('tr'):
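    col = row.find_all("td")
    date = col[0].text
    revenue = col[1].text
    # Now inside the loop, so one row is appended per <tr>
    # (DataFrame.append exists only in pandas < 2.0)
    tesla_revenue = tesla_revenue.append({"Date":date, "Revenue":revenue}, ignore_index=True)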
tesla_revenue
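`DataFrame.append` was deprecated in pandas 1.4 and removed in 2.0, so on a current pandas the loop above fails with `AttributeError`. A sketch of the usual replacement: collect plain dicts in a list inside the loop and build the DataFrame once at the end. The inline `html_data` is a made-up stand-in for the macrotrends page so the example runs offline; the real page's markup may differ.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Stand-in for requests.get(url).text (hypothetical markup)
html_data = """
<table>
  <tbody>
    <tr><td>2020</td><td>$31,536</td></tr>
    <tr><td>2019</td><td>$24,578</td></tr>
    <tr><td>2018</td><td>$21,461</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html_data, "html.parser")

records = []
for row in soup.find("tbody").find_all("tr"):
    col = row.find_all("td")
    # Inside the loop: one dict per <tr>
    records.append({"Date": col[0].text, "Revenue": col[1].text})

# Build the DataFrame once, after every row has been collected
tesla_revenue = pd.DataFrame(records, columns=["Date", "Revenue"])
print(tesla_revenue)
```

Building the frame once also avoids copying the whole DataFrame on every iteration, which repeated `append` calls did.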
Full example:
from bs4 import BeautifulSoup
import requests
import pandas as pd
url = "https://www.macrotrends.net/stocks/charts/TSLA/tesla/revenue"
html_data = requests.get(url).text
soup = BeautifulSoup(html_data, 'html.parser')
tesla_revenue = pd.DataFrame(columns=["Date", "Revenue"])
for row in soup.find("tbody").find_all('tr'):
    col = row.find_all("td")
    date = col[0].text
    revenue = col[1].text
    tesla_revenue = tesla_revenue.append({"Date":date, "Revenue":revenue}, ignore_index=True)
tesla_revenue
Output:
| | Date | Revenue |
|---|---|---|
| 0 | 2020 | $31,536 |
| 1 | 2019 | $24,578 |
| 2 | 2018 | $21,461 |
| 3 | 2017 | $11,759 |
| 4 | 2016 | $7,000 |
| 5 | 2015 | $4,046 |
| 6 | 2014 | $3,198 |
| ... | ... | ... |
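The scraped `Revenue` values are strings such as `$31,536`, so `make_graph`'s `revenue_data_specific.Revenue.astype("float")` would fail on them. One way to strip the `$` sign and thousands separators before plotting, sketched on a few rows like those in the output above:

```python
import pandas as pd

tesla_revenue = pd.DataFrame(
    {"Date": ["2020", "2019", "2018"], "Revenue": ["$31,536", "$24,578", "$21,461"]}
)

# Remove "$" and "," with a regex character class, then convert to float
tesla_revenue["Revenue"] = (
    tesla_revenue["Revenue"].str.replace(r"[$,]", "", regex=True).astype(float)
)
print(tesla_revenue["Revenue"].tolist())  # [31536.0, 24578.0, 21461.0]
```

If the live table contains empty cells, those rows would need to be dropped before the `astype(float)` call.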
Answer:
Find the main table by its tag and class:
import requests
from bs4 import BeautifulSoup
res=requests.get("https://www.macrotrends.net/stocks/charts/TSLA/tesla/revenue")
soup=BeautifulSoup(res.text,"html.parser")
table=soup.find("table",class_="historical_data_table table")
main_data=table.find_all("tr")
Now collect each row's cell values into a list, building a list of lists where each inner list becomes one DataFrame row:
main_lst=[]
for i in main_data[1:]:  # skip the header row
    lst=[data.get_text(strip=True) for data in i.find_all("td")]
    main_lst.append(lst)
Now build a DataFrame from that data and display it:
import pandas as pd
df=pd.DataFrame(columns=["Date","Price"],data=main_lst)
df
Output:
Date Price
0 2020 $31,536
1 2019 $24,578
2 2018 $21,461
3 2017 $11,759
...
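The same class-based lookup, sketched offline against a small inline snippet. The markup is a hypothetical stand-in that assumes the revenue table carries `class="historical_data_table table"`, as the selector above implies. Passing the full string to `class_` matches a `class` attribute with exactly that value, and `main_data[1:]` skips the header row, whose cells are `<th>` rather than `<td>` and would otherwise produce an empty row:

```python
import pandas as pd
from bs4 import BeautifulSoup

# Hypothetical stand-in for res.text; the live page's markup may differ.
html = """
<table class="historical_data_table table">
  <thead><tr><th>Year</th><th>Revenue</th></tr></thead>
  <tbody>
    <tr><td>2020</td><td> $31,536 </td></tr>
    <tr><td>2019</td><td>$24,578</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", class_="historical_data_table table")
main_data = table.find_all("tr")

main_lst = []
for tr in main_data[1:]:  # header row has no <td> cells, so skip it
    # strip=True trims the surrounding whitespace from each cell's text
    main_lst.append([td.get_text(strip=True) for td in tr.find_all("td")])

df = pd.DataFrame(columns=["Date", "Price"], data=main_lst)
print(df)
```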
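Or do it in one line with pandas. `pd.read_html` returns a list with one DataFrame per table on the page (it needs lxml, or BeautifulSoup with html5lib, installed):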
df=pd.read_html("https://www.macrotrends.net/stocks/charts/TSLA/tesla/revenue")
print(len(df))
print(df[0])
Output:
6
Date Price
0 2020 $31,536
1 2019 $24,578
2 2018 $21,461
3 2017 $11,759
...
