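I have a dataset recording the daily maximum and minimum temperatures for 2005-2014. I want to create a chart that plots the highest and lowest temperature for each day of the year over that period (so each calendar day has exactly one maximum and one minimum). I was able to build a df of the absolute minimum and maximum for each day from the dataset; here is the example for the maximum: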
import pandas as pd
import matplotlib.pyplot as plt
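weather_05_14 = pd.read_csv('data/C2A2_data/BinnedCsvs_d400/fb441e62df2d58994928907a91895ec62c2c42e6cd075c2700843b89.csv')
# assumption: 'Date' must be a datetime column for the .dt accessor used below
weather_05_14['Date'] = pd.to_datetime(weather_05_14['Date'])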
# splitting 2005-2014 df dates into separate columns for easier analysis
weather_05_14['Year'] = weather_05_14['Date'].dt.strftime('%Y')
weather_05_14['Month'] = weather_05_14['Date'].dt.strftime('%m')
weather_05_14['Day'] = weather_05_14['Date'].dt.strftime('%d')
# extracting the min and max temperatures for each day, regardless of year
max_temps = weather_05_14.loc[weather_05_14.groupby(['Day', 'Month'], sort=False)
['Data_Value'].idxmax()][['Data_Value', 'Date']]
max_temps.rename(columns={'Data_Value': 'Max'}, inplace=True)
This is what the data frame looks like:

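Now this is where my problem is. I want to plot this data in a line chart by month/day, ignoring the year, so that it is in order. My idea was to do this by setting every data point's year to the same value (since the year won't appear in the final chart anyway), and this is what I did to achieve that: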
max_temps['Date'] = max_temps['Date'].apply(lambda x: x.replace(year=2005))
But I get this error:
ValueError: day is out of range for month

I have also tried taking the separate Day, Month, and Year columns that I used for grouping, including them in the max_temps df, changing the year, and then combining them into a new column converted to a datetime object, but I get a similar error:
max_temps['Year'] = 2005

max_temps['New Date'] = pd.to_datetime(max_temps[['Year', 'Month', 'Day']])
Error: ValueError: cannot assemble the datetimes: day is out of range for month
I have also tried to ignore this issue and then plot with the pandas plot function like:
max_temps.plot(x=['Month', 'Day'], y=['Max'])
Which does work but then I don't get the full functionality of matplotlib (as far as I can tell anyway, I'm new to these libraries).
It gives me this graph:

This is close to the result I'm looking for, but I'd like to use matplotlib to do it.
I feel like I'm making the problem harder than it needs to be but I don't know how. If anyone has any advice or suggestions I would greatly appreciate it, thanks!
Answer:
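As @Jody Klymak pointed out, the reason max_temps['Date'] = max_temps['Date'].apply(lambda x: x.replace(year=2005)) doesn't work is that your full dataset probably spans leap years and therefore includes February 29. When you set the year to 2005, pandas tries to create 2005-02-29, which does not exist, so it raises ValueError: day is out of range for month. You can work around this by choosing a leap year such as 2004 instead of 2005.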
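The leap-day failure is easy to reproduce in isolation. The following is a minimal sketch, assuming a small made-up set of dates (they are not taken from the original dataset); it only illustrates that replacing the year with a non-leap year fails while a leap year such as 2004 works.

```python
import pandas as pd

# Hypothetical sample dates; the second one is a leap day.
dates = pd.Series(pd.to_datetime(['2008-02-28', '2008-02-29', '2012-03-01']))

# Moving a leap day into a non-leap year cannot work: 2005-02-29 does not exist.
try:
    dates.apply(lambda d: d.replace(year=2005))
    failed = False
except ValueError:
    failed = True

# Moving every date into a leap year keeps February 29 valid.
normalized = dates.apply(lambda d: d.replace(year=2004))
normalized_labels = [d.strftime('%Y-%m-%d') for d in normalized]
```

If the year still ends up on an axis label, keep in mind that 2004 here is only a placeholder and not a real observation year.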
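My solution would be to ignore the year entirely and create a new column holding the month and day in "01-01" format. Because the month comes first and both parts are zero-padded, sorting these strings puts them in chronological order regardless of the year.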
Here is an example:
import pandas as pd
import matplotlib.pyplot as plt
max_temps = pd.DataFrame({
'Max': [15.6,13.9,13.3,10.6,12.8,18.9,21.7],
'Date': ['2005-01-01','2005-01-02','2005-01-03','2007-01-04','2007-01-05','2008-01-06','2008-01-07']
})
max_temps['Date'] = pd.to_datetime(max_temps['Date'])
## use string formatting to create a new column with Month-Day
max_temps['Month_Day'] = max_temps['Date'].dt.strftime('%m') + "-" + max_temps['Date'].dt.strftime('%d')
plt.plot(max_temps['Month_Day'], max_temps['Max'])
plt.show()
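The same month-day key can also serve as the grouping column, which gives both the record high and the record low per calendar day in one step. This is only a sketch under assumptions: the sample values are invented, the column names Date and Data_Value follow the question, and the names weather and extremes are hypothetical.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical sample data spanning several years, including leap days.
weather = pd.DataFrame({
    'Date': pd.to_datetime(['2005-01-01', '2006-01-01', '2008-02-29',
                            '2012-02-29', '2005-03-01', '2010-03-01']),
    'Data_Value': [15.6, 10.2, 3.1, 7.4, 12.8, 9.9],
})

# Zero-padded month-day key: sorts chronologically as a plain string.
weather['Month_Day'] = weather['Date'].dt.strftime('%m-%d')

# Highest and lowest value per calendar day across all years.
# groupby sorts the keys by default, so the index is already in order.
extremes = weather.groupby('Month_Day')['Data_Value'].agg(['max', 'min'])

# Plain matplotlib, so the full Axes API stays available.
fig, ax = plt.subplots()
ax.plot(extremes.index, extremes['max'], label='Record high')
ax.plot(extremes.index, extremes['min'], label='Record low')
ax.set_xlabel('Day of year (MM-DD)')
ax.set_ylabel('Temperature')
ax.legend()

# String x values are placed at positions 0..n-1, so ticks can be thinned
# by position; here only the first day of each month keeps a tick.
first_of_month = [i for i, key in enumerate(extremes.index) if key.endswith('-01')]
ax.set_xticks(first_of_month)
fig.tight_layout()
```

Call plt.show() afterwards to display the figure. Because the key never contains a year, February 29 needs no special handling in this approach.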

