Getting the first and last day of each month from a DataFrame

This is what my DataFrame looks like:

datetime      open      high     low       close    
2006-01-02    4566.95   4601.35  4542.00   4556.25
2006-01-03    4531.45   4605.45  4531.45   4600.25  
2006-01-04    4619.55   4707.60  4616.05   4694.14
.
.
.

I need to calculate the monthly return as a percentage.

Formula: (Month Closing Price - Month Open Price) / Month Open Price

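I can't seem to get the opening and closing price for each month, because for most months my df has no entry for the 1st of the month. That makes the calculation difficult.

Any help would be greatly appreciated!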

A uj5u.com user replied:

You need to use the groupby and agg functions to get the first and last value of each column for every month:

import pandas as pd
df = pd.read_csv("dt.txt")
df["datetime"] = pd.to_datetime(df["datetime"])
df.set_index("datetime", inplace=True)
resultDf = df.groupby([df.index.year, df.index.month]).agg(["first", "last"])
resultDf["new_column"] = (resultDf[("close", "last")] - resultDf[("open", "first")])/resultDf[("open", "first")]
resultDf.index.rename(["year", "month"], inplace=True)
resultDf.reset_index(inplace=True)
resultDf

The code above produces a DataFrame with MultiIndex columns. So, for example, if you want the rows for 2010, you can do the following:

resultDf[resultDf["year"] == 2010]
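
To try this approach without a dt.txt file, here is a minimal self-contained sketch. The January rows come from the question; the February rows are invented purely for illustration, and the variable name `monthly_return` is my own choice rather than part of the answer. It assumes the rows can be sorted by date, since "first" and "last" follow row order within each group.

```python
import pandas as pd

# January rows are from the question; February rows are made up for illustration.
df = pd.DataFrame(
    {
        "datetime": ["2006-01-02", "2006-01-03", "2006-01-04", "2006-02-01", "2006-02-02"],
        "open": [4566.95, 4531.45, 4619.55, 4700.00, 4710.00],
        "high": [4601.35, 4605.45, 4707.60, 4730.00, 4750.00],
        "low": [4542.00, 4531.45, 4616.05, 4690.00, 4705.00],
        "close": [4556.25, 4600.25, 4694.14, 4720.00, 4747.00],
    }
)
df["datetime"] = pd.to_datetime(df["datetime"])
# Sort by date so that "first"/"last" mean earliest/latest trading day.
df = df.set_index("datetime").sort_index()

# First and last value of every column, per (year, month).
result = df.groupby([df.index.year, df.index.month]).agg(["first", "last"])
result.index.names = ["year", "month"]

# (month close - month open) / month open, using the MultiIndex column keys.
month_open = result[("open", "first")]
month_close = result[("close", "last")]
monthly_return = (month_close - month_open) / month_open

print(monthly_return)
```

The month's "open" here is simply the open of the earliest row present in that month, so it does not matter that the 1st of the month is missing from the data.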

A uj5u.com user replied:

You can create a custom grouper, as follows:

import pandas as pd
import numpy as np
from io import StringIO

csvfile = StringIO(
"""datetime\topen\thigh\tlow\tclose
2006-01-02\t4566.95\t4601.35\t4542.00\t4556.25
2006-01-03\t4531.45\t4605.45\t4531.45\t4600.25  
2006-01-04\t4619.55\t4707.60\t4616.05\t4694.14""")

df = pd.read_csv(csvfile, sep = '\t', engine='python')

df.datetime = pd.to_datetime(df.datetime, format = "%Y-%m-%d")

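# Group by calendar month (month-end frequency); newer pandas versions spell this alias 'ME'.
# axis=0 is the default and is deprecated in recent pandas, so it is omitted.
dg = df.groupby(pd.Grouper(key='datetime', freq='M'))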

Each group in dg then holds one month of data, and because we converted datetime to a pandas datetime type, we can use ordinary arithmetic on it:

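def monthly_return(datetime, close_value, open_value):
    # np.argmin/np.argmax return positions within the group, not index labels,
    # so use .iloc; plain [] would look up labels and fail for later months.
    index_start = np.argmin(datetime)
    index_end = np.argmax(datetime)
    return (close_value.iloc[index_end] - open_value.iloc[index_start]) / open_value.iloc[index_start]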

dg.apply(lambda x : monthly_return(x.datetime, x.close, x.open))
Out[97]: 
datetime
2006-01-31    0.02785
Freq: M, dtype: float64

Of course, you could use a purely functional approach instead of the monthly_return function.
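
One possible way to do this without a helper function is sketched below, using named aggregation. This is my own illustration, not the answerer's code: the column names `month_open`, `month_close` and `monthly_return` are hypothetical, and grouping by `dt.to_period("M")` is swapped in for the `pd.Grouper` used above. It assumes sorting by date is acceptable, since "first" and "last" depend on row order.

```python
import pandas as pd
from io import StringIO

# Same three rows as in the question, tab-separated.
csvfile = StringIO(
    "datetime\topen\thigh\tlow\tclose\n"
    "2006-01-02\t4566.95\t4601.35\t4542.00\t4556.25\n"
    "2006-01-03\t4531.45\t4605.45\t4531.45\t4600.25\n"
    "2006-01-04\t4619.55\t4707.60\t4616.05\t4694.14\n"
)

df = pd.read_csv(csvfile, sep="\t")
df["datetime"] = pd.to_datetime(df["datetime"], format="%Y-%m-%d")
df = df.sort_values("datetime")

# One row per calendar month: open of the earliest day, close of the latest day.
monthly = df.groupby(df["datetime"].dt.to_period("M")).agg(
    month_open=("open", "first"),
    month_close=("close", "last"),
)
monthly["monthly_return"] = (
    monthly["month_close"] - monthly["month_open"]
) / monthly["month_open"]

print(monthly)
```

For January 2006 this gives roughly 0.02785, in line with the output shown above.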


Tags: Python, pandas, DataFrame, date
