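I'm trying to scrape a web page with hourly energy prices, and I want to use the data for home automation: if the hourly price <= the baseload price, certain devices should be switched on via MQTT for those hours. I managed to extract the baseload price and the hourly prices from their column. The output for that column does not seem to be a single list but 24 separate lists. Is that correct? How can I fix this so that the hourly prices can be compared with the baseload price?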
import datetime
import pytz
import requests
from bs4 import BeautifulSoup as bs
today_utc = pytz.utc.localize(datetime.datetime.utcnow())
today = today_utc.astimezone(pytz.timezone("Europe/Amsterdam"))
text_today = today.strftime("%Y-%m-%d")  # four-digit year (YYYY-MM-DD)
print(today)
print(text_today)
yesterday = datetime.datetime.now(tz=pytz.timezone("Europe/Amsterdam")) - datetime.timedelta(1)
text_yesterday = yesterday.strftime("%Y-%m-%d")  # four-digit year (YYYY-MM-DD)
print(yesterday)
print(text_yesterday)
url_part1 = 'https://www.epexspot.com/en/market-data?market_area=NL&trading_date='
url_part2 = '&delivery_date='
url_part3 = '&underlying_year=&modality=Auction&sub_modality=DayAhead&technology=&product=60&data_mode=table&period=&production_period='
url_text = url_part1 + text_yesterday + url_part2 + text_today + url_part3
print(url_text)
html_text = requests.get(url_text).text
#print(html_text)
soup = bs(html_text,'lxml')
#print(soup.prettify())
baseload = soup.find_all('div', class_='flex day-1')
for baseload_price in baseload:
    baseload_price = baseload_price.find('span').text.replace(' ', '')
    print(baseload_price)
table = soup.find_all('tr',{'class':"child"})
#print(table)
for columns in table:
    column3 = columns.find_all('td')[3:]
    #print(columns)
    column3_text = [td.text.strip() for td in column3]
    column3_text = column3_text
    print(column3_text)
A uj5u.com user replied:
You just need to use join:
column3_text = "".join([td.text.strip() for td in column3])
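A uj5u.com user replied:
In the loop for columns in table, you create a new list column3_text on every iteration. If you want column3_text to be a single list covering the next 24 hours, you can replace that for loop with: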
column3_text = [column.find_all("td")[3].text.strip() for column in table]
Also, if you are going to compare the baseload price with the hourly prices, you need to convert the strings to floats or decimals. :)
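The conversion mentioned above could look roughly like the sketch below. It is only an illustration under assumptions: the helper name to_price is hypothetical, the sample strings stand in for the scraped column3_text values, and the page is assumed to use "." as the decimal separator (with "," only as an optional thousands separator).

```python
from decimal import Decimal


def to_price(text: str) -> Decimal:
    """Convert a scraped price string such as ' 124.47 ' to a Decimal.

    Assumption: '.' is the decimal separator; any ',' is a thousands separator.
    """
    return Decimal(text.strip().replace(",", ""))


# Sample strings standing in for the scraped values (not live data).
baseload_price = to_price("144.32")
column3_text = ["124.47", "145.42", "143.35"]

# One flat list of numbers instead of one list per table row.
hourly_prices = [to_price(text) for text in column3_text]

# Indexes (hours) whose price is at or below the baseload price.
cheap_hours = [
    hour for hour, price in enumerate(hourly_prices) if price <= baseload_price
]
print(cheap_hours)
```

Decimal avoids binary rounding surprises when prices are compared for equality; float(text) would work the same way if exact equality does not matter.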
A uj5u.com user replied:
If you want to compare values, use pandas.
Here's how:
import datetime
import urllib.parse
import pandas as pd
import requests
from bs4 import BeautifulSoup
headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:104.0) Gecko/20100101 Firefox/104.0",
}
today = datetime.datetime.today().strftime("%Y-%m-%d")
yesterday = (
    datetime.datetime.today() - datetime.timedelta(days=1)
).strftime("%Y-%m-%d")
url = "https://www.epexspot.com/en/market-data?"
data = {
    "market_area": "NL",
    "trading_date": yesterday,
    "delivery_date": today,
    "underlying_year": "",
    "modality": "Auction",
    "sub_modality": "DayAhead",
    "technology": "",
    "product": "60",
    "data_mode": "table",
    "period": "",
    "production_period": "",
}
query_url = f"{url}{urllib.parse.urlencode(data)}"
with requests.Session() as s:
    s.headers.update(headers)
    response = s.get(query_url).text
baseload = (
    BeautifulSoup(response, "html.parser")
    .select_one(".day-1 > span:nth-child(1)")
    .text
)
print(f"Baselaod: {baseload}")
df = pd.concat(pd.read_html(response, flavor="lxml"), ignore_index=True)
df.columns = range(df.shape[1])
df = df.drop(df.columns[[4, 5, 6, 7]], axis=1)
df['is_higher'] = df[[3]].apply(lambda x: (x >= float(baseload)), axis=1)
df['price_diff'] = df[[3]].apply(lambda x: (x - float(baseload)), axis=1)
df = df.set_axis(
    [
        "buy_volume",
        "sell_volume",
        "volume",
        "price",
        "is_higher",
        "price_diff",
    ],
    axis=1,
    copy=False,
)
df.insert(
    0,
    "hours",
    [
        f"0{value}:00 - {value + 1}:00" if value < 10
        else f"{value}:00 - {value + 1}:00"
        for value in range(0, 24)
    ],
)
print(df)
Output:
Baselaod: 144.32
hours buy_volume sell_volume ... price is_higher price_diff
0 00:00 - 1:00 2052.2 3608.7 ... 124.47 False -19.85
1 01:00 - 2:00 2467.8 3408.9 ... 119.09 False -25.23
2 02:00 - 3:00 2536.8 3220.5 ... 116.32 False -28.00
3 03:00 - 4:00 2552.0 3206.5 ... 114.60 False -29.72
4 04:00 - 5:00 2524.4 3010.0 ... 115.07 False -29.25
5 05:00 - 6:00 2542.4 3342.7 ... 123.54 False -20.78
6 06:00 - 7:00 2891.2 3872.2 ... 145.42 True 1.10
7 07:00 - 8:00 3413.2 3811.0 ... 166.40 True 22.08
8 08:00 - 9:00 3399.4 3566.0 ... 168.00 True 23.68
9 09:00 - 10:00 2919.3 3159.4 ... 153.30 True 8.98
10 10:00 - 11:00 2680.2 3611.5 ... 143.35 False -0.97
11 11:00 - 12:00 2646.8 3722.3 ... 141.95 False -2.37
12 12:00 - 13:00 2606.4 3723.3 ... 141.96 False -2.36
13 13:00 - 14:00 2559.7 3232.3 ... 145.96 True 1.64
14 14:00 - 15:00 2544.9 3261.2 ... 155.00 True 10.68
15 15:00 - 16:00 2661.7 3428.0 ... 169.15 True 24.83
16 16:00 - 17:00 3072.2 3529.4 ... 173.36 True 29.04
17 17:00 - 18:00 3593.7 3091.4 ... 192.00 True 47.68
18 18:00 - 19:00 3169.0 3255.4 ... 182.86 True 38.54
19 19:00 - 20:00 2710.1 3630.3 ... 167.96 True 23.64
20 20:00 - 21:00 2896.3 3728.8 ... 147.17 True 2.85
21 21:00 - 22:00 3160.3 3639.2 ... 136.78 False -7.54
22 22:00 - 23:00 3506.2 3196.3 ... 119.90 False -24.42
23 23:00 - 24:00 3343.8 3414.1 ... 100.00 False -44.32
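To connect this back to the original goal (switching something on via MQTT when the hourly price is at or below the baseload price), here is a minimal sketch of the decision step only. The prices and the baseload value are copied from the output above; the function name build_schedule, the topic name, and the JSON payload layout are assumptions made for illustration. Actually publishing the payload would need an MQTT client library, which is left out here.

```python
import json

# Values copied from the output above (baseload and the 24 hourly prices).
BASELOAD = 144.32
HOURLY_PRICES = [
    124.47, 119.09, 116.32, 114.60, 115.07, 123.54,
    145.42, 166.40, 168.00, 153.30, 143.35, 141.95,
    141.96, 145.96, 155.00, 169.15, 173.36, 192.00,
    182.86, 167.96, 147.17, 136.78, 119.90, 100.00,
]

# Hypothetical topic name; use whatever your home automation setup expects.
TOPIC = "home/energy/cheap_hours"


def build_schedule(prices, baseload):
    """Map each hour (0-23) to 'ON' if its price is <= baseload, else 'OFF'."""
    return {
        hour: ("ON" if price <= baseload else "OFF")
        for hour, price in enumerate(prices)
    }


schedule = build_schedule(HOURLY_PRICES, BASELOAD)
on_hours = [hour for hour, state in schedule.items() if state == "ON"]

# Payload layout is an assumption; an MQTT client would publish it to TOPIC.
payload = json.dumps({"baseload": BASELOAD, "on_hours": on_hours})
print(TOPIC, payload)
```

Keeping the decision in a plain function makes it easy to check without a broker; the publish call can then be added around it once the schedule looks right.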
