I have the following code:
# Import libraries
import requests
from bs4 import BeautifulSoup
import pandas as pd
url = 'https://www.ipma.pt/pt/otempo/obs.superficie/table-top-stations-all.jsp'
page = requests.get(url)
soup = BeautifulSoup(page.text, 'lxml')
# Get the content for tab_Co id
temp_table = soup.find('table', id='tab_Co')
# Create Headers
headers = []
for i in temp_table.find_all('th'):
    title = i.text
    headers.append(title)
# Create DataFrame with the headers as columns
mydata = pd.DataFrame(columns = headers)
# This is where the script goes wrong
# Create loop that retrieves information and appends it to the DataFrame
for j in temp_table.find_all('tr')[1:]:
    row_data = j.find_all('td')
    row = [i.text for i in row_data]
    length = len(mydata)
    mydata.loc[length] = row
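Assigning one row at a time with `mydata.loc[length] = row` fails with a `ValueError` whenever a row's cell count differs from the number of headers, and growing a DataFrame row by row is slow on large tables. A minimal sketch, assuming a small hypothetical HTML table in place of the live IPMA page (the markup and the `Estação` header are made up), that collects the rows in a list, skips mismatched ones, and builds the DataFrame once:

```python
from bs4 import BeautifulSoup
import pandas as pd

# Hypothetical stand-in for page.text; the real IPMA markup may differ.
html = """
<table id="tab_Co">
  <tr><th>Estação</th><th>Temperatura Max (oC)</th><th>Temperatura Min (oC)</th></tr>
  <tr><td>Station A</td><td>21.3</td><td>9.8</td></tr>
  <tr><td>Station B</td><td>18.0</td><td>7.5</td></tr>
  <tr><td colspan="3">no data</td></tr>
</table>
"""

# html.parser ships with Python, so lxml is not required for this sketch.
soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", id="tab_Co")

# Header names come from the <th> cells, as in the question.
headers = [th.get_text(strip=True) for th in table.find_all("th")]

rows = []
for tr in table.find_all("tr")[1:]:  # skip the header row
    cells = [td.get_text(strip=True) for td in tr.find_all("td")]
    # Assigning a row of the wrong length with .loc would raise ValueError,
    # so rows that don't match the header count are skipped here.
    if len(cells) == len(headers):
        rows.append(cells)

# Build the DataFrame once instead of enlarging it row by row.
mydata = pd.DataFrame(rows, columns=headers)
print(mydata)
```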
What exactly am I doing wrong? The end goal is a DataFrame from which I can extract the top 4 values of each of these columns:
['Temperatura Max (oC)',
 'Temperatura Min (oC)',
 'Prec. acumulada (mm)',
 'Rajada máxima (km/h)',
 'Humidade Max (%)',
 'Humidade Min (%)',
 'Pressão atm. (hPa)']
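I will then use these to generate daily images. Any ideas? Thanks in advance!
Disclaimer: this is a non-profit project, and the solution will not be used commercially.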
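If "top 4" means the four largest values per column, the scraped strings first need converting to numbers, after which `DataFrame.nlargest` does the selection (if it means the first four rows, `mydata.head(4)` is enough). A sketch with made-up values and a hypothetical `Estação` column standing in for the station name:

```python
import pandas as pd

# Hypothetical data in the shape the scraper produces: every cell is a string.
mydata = pd.DataFrame({
    "Estação": ["A", "B", "C", "D", "E"],
    "Temperatura Max (oC)": ["21.3", "18.0", "25.1", "19.4", "-"],
    "Rajada máxima (km/h)": ["40", "55", "32", "61", "47"],
})

value_columns = ["Temperatura Max (oC)", "Rajada máxima (km/h)"]

# Convert to numbers; placeholders such as "-" become NaN and are not selected below.
for col in value_columns:
    mydata[col] = pd.to_numeric(mydata[col], errors="coerce")

# For each variable, keep the 4 rows with the largest values, plus the station name.
top4 = {col: mydata.nlargest(4, col)[["Estação", col]] for col in value_columns}

for col, frame in top4.items():
    print(frame.to_string(index=False))
```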
Answer:
This works, based on Falsovsky's solution on GitHub:
# Import libraries
import requests
import pandas as pd
import json
import re
# Define target URL
url = 'https://www.ipma.pt/pt/otempo/obs.superficie/table-top-stations-all.jsp'
# Get URL information
page = requests.get(url)
# After inspecting the page apply a regex search
search = re.search(r'var observations = (.*?);', page.text, re.DOTALL)
# Create dict by loading the json information
json_data = json.loads(search.group(1))
# Create Dataframe from json result
df1 = pd.concat({k: pd.DataFrame(v).T for k, v in json_data.items()}, axis=0)
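The regex approach can be tried offline against a hypothetical fragment of `page.text`. The nesting assumed here (time step, then station id, then fields) and every key name are invented for illustration and may not match the real `observations` object; the sketch also checks for a failed match before calling `json.loads`:

```python
import json
import re

import pandas as pd

# Hypothetical fragment of page.text; the real structure of `observations`
# on the IPMA page is assumed here, not confirmed.
page_text = """
<script>
var observations = {"2022-04-01T10:00": {"st_A": {"temperatura": 14.2, "humidade": 81},
                                         "st_B": {"temperatura": 12.9, "humidade": 88}},
                    "2022-04-01T11:00": {"st_A": {"temperatura": 15.6, "humidade": 75},
                                         "st_B": {"temperatura": 13.4, "humidade": 84}}};
</script>
"""

# Non-greedy match up to the first ';' after the assignment, as in the answer.
match = re.search(r"var observations = (.*?);", page_text, re.DOTALL)
if match is None:
    raise ValueError("'var observations' not found; the page layout may have changed")

json_data = json.loads(match.group(1))

# Each inner dict becomes a stations x fields frame; the outer keys become
# the first level of a MultiIndex (time step, station).
df1 = pd.concat({k: pd.DataFrame(v).T for k, v in json_data.items()}, axis=0)
print(df1)
```

Because the pattern is non-greedy, it stops at the first `;` after the assignment, so a `;` inside a JSON string value would truncate the match and make `json.loads` fail.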
Answer:
Looking at the page source (view-source:https://www.ipma.pt/pt/otempo/obs.superficie/table-top-stations-all.jsp), it is clear that the data is in th cells rather than td cells, so try row_data = j.find_all('th').
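Whether the values sit in `th` or `td` cells can be checked by parsing a single row both ways. A sketch assuming a hypothetical row whose first cell is a `th`: `find_all('td')` skips it, while passing a list of tag names to `find_all` returns both kinds in document order:

```python
from bs4 import BeautifulSoup

# Hypothetical row in which the station name is a <th> and the value is a <td>.
html = """
<table id="tab_Co">
  <tr><th>Estação</th><th>Temperatura Max (oC)</th></tr>
  <tr><th>Station A</th><td>21.3</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", id="tab_Co")

for tr in table.find_all("tr")[1:]:  # skip the header row
    only_td = [c.get_text(strip=True) for c in tr.find_all("td")]
    # A list of tag names matches either tag, keeping document order.
    all_cells = [c.get_text(strip=True) for c in tr.find_all(["th", "td"])]
    print(only_td, all_cells)
```

If both lists come back empty for every row of the real page, the table is probably filled in by JavaScript, in which case the var observations approach from the other answer is the one to use.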
Please credit the source when reposting. Link to this article: https://www.uj5u.com/net/455152.html
