Python 3 web scraping: loop returns only one iteration

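(Python 3 web scraping) I am trying to extract a table from HTML data and store it in a new DataFrame. I need all the 'td' values, but when I iterate, the loop returns only the first row instead of all rows. My code and output are below.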

!pip install yfinance
!pip install pandas
!pip install requests
!pip install bs4
!pip install plotly

import yfinance as yf
import pandas as pd
import requests
from bs4 import BeautifulSoup
import plotly.graph_objects as go
from plotly.subplots import make_subplots

def make_graph(stock_data, revenue_data, stock):
 fig = make_subplots(rows=2, cols=1, shared_xaxes=True, subplot_titles=("Historical Share Price", "Historical Revenue"), vertical_spacing = .3)
 stock_data_specific = stock_data[stock_data.Date <= '2021-06-14']
 revenue_data_specific = revenue_data[revenue_data.Date <= '2021-04-30']
 fig.add_trace(go.Scatter(x=pd.to_datetime(stock_data_specific.Date, infer_datetime_format=True), y=stock_data_specific.Close.astype("float"), name="Share Price"), row=1, col=1)
 fig.add_trace(go.Scatter(x=pd.to_datetime(revenue_data_specific.Date, infer_datetime_format=True), y=revenue_data_specific.Revenue.astype("float"), name="Revenue"), row=2, col=1)
 fig.update_xaxes(title_text="Date", row=1, col=1)
 fig.update_xaxes(title_text="Date", row=2, col=1)
 fig.update_yaxes(title_text="Price ($US)", row=1, col=1)
 fig.update_yaxes(title_text="Revenue ($US Millions)", row=2, col=1)
 fig.update_layout(showlegend=False,
 height=900,
 title=stock,
 xaxis_rangeslider_visible=True)
 fig.show()

tsla = yf.Ticker("TSLA")
tsla

tesla_data = tsla.history(period="max")
tesla_data


tesla_data.reset_index(inplace=True)
tesla_data.head()

url = "https://www.macrotrends.net/stocks/charts/TSLA/tesla/revenue"
html_data  = requests.get(url).text


soup = BeautifulSoup(html_data, 'html.parser')

tesla_revenue = pd.DataFrame(columns=["Date", "Revenue"])
for row in soup.find("tbody").find_all('tr'): 
 col = row.find_all("td")
 date = col[0].text
 revenue = col[1].text
tesla_revenue = tesla_revenue.append({"Date":date, "Revenue":revenue}, ignore_index=True)
tesla_revenue

	Date	Revenue
0	2008	$15

A uj5u.com user replied:

What happens?

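The loop itself works, but you append the data outside the loop, so the append runs only once and you always get just the result of the last iteration.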
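To see the effect in isolation, here is a minimal sketch that needs no network access. The `rows` list is made-up sample data standing in for the parsed table rows; it is not taken from the original post.

```python
# Hypothetical sample rows standing in for the parsed <tr> elements.
rows = [("2020", "$31,536"), ("2019", "$24,578"), ("2018", "$21,461")]

# Append placed AFTER the loop: it runs once, with the last values seen.
collected_outside = []
for date, revenue in rows:
    pass  # date and revenue are rebound on every iteration
collected_outside.append({"Date": date, "Revenue": revenue})

# Append placed INSIDE the loop: it runs once per row.
collected_inside = []
for date, revenue in rows:
    collected_inside.append({"Date": date, "Revenue": revenue})

print(len(collected_outside))  # 1
print(len(collected_inside))   # 3
```

Only the indentation of the append line differs between the two loops.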

How to fix it?

Fix the indentation so that the append statement is inside the loop.

tesla_revenue = pd.DataFrame(columns=["Date", "Revenue"])
for row in soup.find("tbody").find_all('tr'): 
    col = row.find_all("td")
    date = col[0].text
    revenue = col[1].text
    tesla_revenue = tesla_revenue.append({"Date":date, "Revenue":revenue}, ignore_index=True)
tesla_revenue

Example

from bs4 import BeautifulSoup
import requests
import pandas as pd

url = "https://www.macrotrends.net/stocks/charts/TSLA/tesla/revenue"
html_data  = requests.get(url).text

soup = BeautifulSoup(html_data, 'html.parser')

tesla_revenue = pd.DataFrame(columns=["Date", "Revenue"])
for row in soup.find("tbody").find_all('tr'): 
    col = row.find_all("td")
    date = col[0].text
    revenue = col[1].text
    tesla_revenue = tesla_revenue.append({"Date":date, "Revenue":revenue}, ignore_index=True)
tesla_revenue

Output

	Date	Revenue
0	2020	$31,536
1	2019	$24,578
2	2018	$21,461
3	2017	$11,759
4	2016	$7,000
5	2015	$4,046
6	2014	$3,198
...	...	...
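One caveat: `DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0, so the loop in the example raises `AttributeError` on current versions. A minimal sketch of an alternative, assuming the same two-column layout, collects plain dictionaries and builds the DataFrame once at the end. The `rows_to_frame` helper and the inline sample cells are illustrative and not part of the original answer.

```python
import pandas as pd


def rows_to_frame(cell_rows):
    """Build a Date/Revenue DataFrame from an iterable of cell-text lists."""
    records = []
    for cells in cell_rows:
        # Skip rows that do not have at least the two cells we need.
        if len(cells) < 2:
            continue
        records.append({"Date": cells[0], "Revenue": cells[1]})
    # Constructing the frame once avoids copying it on every iteration.
    return pd.DataFrame(records, columns=["Date", "Revenue"])


# Hypothetical sample input; with BeautifulSoup this would be
# [[td.text for td in tr.find_all("td")] for tr in tbody.find_all("tr")].
sample = [["2020", "$31,536"], ["2019", "$24,578"], []]
tesla_revenue = rows_to_frame(sample)
print(tesla_revenue)
```

Building the frame from a list also tends to be faster than growing a DataFrame row by row.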

A uj5u.com user replied:

Find the main table using the appropriate tag and class

import requests
from bs4 import BeautifulSoup

res=requests.get("https://www.macrotrends.net/stocks/charts/TSLA/tesla/revenue")

soup=BeautifulSoup(res.text,"html.parser")
# Locate the revenue table by its tag and class attribute
table=soup.find("table",class_="historical_data_table table")
main_data=table.find_all("tr")

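Now collect the cell text of each row into a list, and gather those lists into a list of lists that supplies the row data for the DataFrame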

main_lst=[]
for i in main_data[1:]:
    lst=[data.get_text(strip=True) for data in i.find_all("td")]
    main_lst.append(lst)
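The same steps can be tried offline. The sketch below runs them against a small inline HTML snippet; the snippet is invented for illustration and only mimics the assumed structure of the page (a table with class `historical_data_table table`, a header row, then data rows). It also guards against `find` returning `None`, which happens when the class name does not match.

```python
from bs4 import BeautifulSoup

# Invented HTML that mimics the assumed table structure.
html = """
<table class="historical_data_table table">
  <thead><tr><th colspan="2">Annual Revenue</th></tr></thead>
  <tbody>
    <tr><td>2020</td><td>$31,536</td></tr>
    <tr><td>2019</td><td>$24,578</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", class_="historical_data_table table")
if table is None:
    raise ValueError("table not found; check the class name")

main_lst = []
for tr in table.find_all("tr"):
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    # Header rows contain <th> only, so their cell list is empty.
    if cells:
        main_lst.append(cells)

print(main_lst)
```

Filtering on a non-empty cell list is an alternative to slicing with `main_data[1:]` and does not depend on the header being exactly one row.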

Now build a DataFrame from that data and display it

import pandas as pd
df=pd.DataFrame(columns=["Date","Price"],data=main_lst)
df

Output:

    Date    Price
0   2020    $31,536
1   2019    $24,578
2   2018    $21,461
3   2017    $11,759
...
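The `Price` column still holds strings such as `$31,536`. If numeric values are needed, for plotting for example, a sketch like the following strips the currency formatting. The sample rows are copied from the output shown here, and the column name `Price_num` is a made-up name.

```python
import pandas as pd

df = pd.DataFrame(
    [["2020", "$31,536"], ["2019", "$24,578"], ["2018", "$21,461"]],
    columns=["Date", "Price"],
)

# Remove "$" and "," (regex character class), then convert to numbers.
# errors="coerce" turns anything unparsable, such as "", into NaN.
df["Price_num"] = pd.to_numeric(
    df["Price"].str.replace(r"[$,]", "", regex=True), errors="coerce"
)
print(df)
```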

Using pandas in a one-liner

df=pd.read_html("https://www.macrotrends.net/stocks/charts/TSLA/tesla/revenue")
print(len(df))
print(df[0])

Output

6

    Date    Price
0   2020    $31,536
1   2019    $24,578
2   2018    $21,461
3   2017    $11,759

...
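`pd.read_html` returns a list of DataFrames, one per `<table>` it finds, which is why `len(df)` prints 6 and the revenue table is selected with `df[0]`. Relying on position can break if the page layout changes, so one option is to pick the table by its shape instead. The sketch below is only an illustration under that assumption: `pick_table` is a hypothetical helper, and the hand-built frames stand in for the list that `read_html` would return.

```python
import pandas as pd


def pick_table(tables, n_cols, min_rows=1):
    """Return the first DataFrame with n_cols columns and at least min_rows rows."""
    for table in tables:
        if table.shape[1] == n_cols and len(table) >= min_rows:
            return table
    raise ValueError("no table with the requested shape")


# Stand-ins for the list returned by pd.read_html(...).
tables = [
    pd.DataFrame({"A": [1], "B": [2], "C": [3]}),
    pd.DataFrame({"Date": [2020, 2019], "Price": ["$31,536", "$24,578"]}),
]
revenue = pick_table(tables, n_cols=2, min_rows=2)
print(revenue)
```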
