Suppose I have the following data:
   Date       time     price minute FOMC  Daily.Return
   <date>     <time>   <dbl>  <dbl> <fct>        <dbl>
 1 2005-01-03 16:00:00  120.    960 FALSE       -1.24
 2 2005-01-04 16:00:00  119.    960 FALSE       -1.44
 3 2005-01-05 16:00:00  118.    960 FALSE       -0.354
 4 2005-01-06 16:00:01  119.    960 FALSE        0.245
 5 2005-01-07 15:59:00  119.    959 FALSE       -0.328
 6 2005-01-10 16:00:00  119.    960 FALSE        0.506
 7 2005-01-11 16:00:00  118.    960 FALSE       -0.279
 8 2005-01-12 16:00:01  119.    960 FALSE        0.329
 9 2005-01-13 16:00:00  118.    960 FALSE       -0.787
10 2005-01-14 16:00:00  118.    960 FALSE        0.372
I want to summarise Daily.Return for each group defined by the FOMC variable, which is either TRUE or FALSE. That is easy with dplyr. I run the following:
daily.SPY %>% group_by(FOMC) %>%
  summarise(Mean = 100 * mean(Daily.Return),
            Median = 100 * median(Daily.Return),
            Vol = 100 * sqrt(252) * sd(Daily.Return/100))
As expected, I get the following:
FOMC Mean Median Vol
<fct> <dbl> <dbl> <dbl>
1 FALSE 0.00551 5.24 14.9
2 TRUE 20.8 1.20 17.6
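However, I would like a third row that performs the same calculations without grouping. It would compute the mean, median, and standard deviation for the whole sample, not conditional on the group. What is the simplest way to do this within the tidyverse? Thanks!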
Answer from a uj5u.com user:
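One option is to row-bind your data with a duplicate of itself in which you mutate() the FOMC variable to "ALL", so that the whole sample is summarised as a separate group when you group_by() and summarise().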
library(tidyverse)
set.seed(1)
daily.SPY <- tibble(
  FOMC = factor(rep(c(TRUE, FALSE), each = 25)),
  Daily.Return = c(cumsum(rnorm(25)), cumsum(rnorm(25)))
)
daily.SPY %>%
  bind_rows(., mutate(., FOMC = "ALL")) %>%
  group_by(FOMC) %>%
  summarise(Mean = 100 * mean(Daily.Return),
            Median = 100 * median(Daily.Return),
            Vol = 100 * sqrt(252) * sd(Daily.Return/100))
#> # A tibble: 3 x 4
#> FOMC Mean Median Vol
#> <chr> <dbl> <dbl> <dbl>
#> 1 ALL 58.4 -6.57 32.3
#> 2 FALSE -80.3 -53.6 13.9
#> 3 TRUE 197. 151. 30.5
Created on 2022-01-11 by the reprex package (v2.0.1)
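In the output above, FOMC becomes a character column after the row-bind, so the groups sort alphabetically and "ALL" lands in the first row. If the overall row should come last, as the question asks, one possible adjustment is to turn FOMC into a factor with an explicit level order before grouping. The sketch below assumes the same simulated daily.SPY as above; the names stacked and result are only illustrative.

```r
library(dplyr)

# Same simulated data as in the answer above
set.seed(1)
daily.SPY <- tibble(
  FOMC = factor(rep(c(TRUE, FALSE), each = 25)),
  Daily.Return = c(cumsum(rnorm(25)), cumsum(rnorm(25)))
)

# Stack the original rows on top of a copy labelled "ALL"
stacked <- bind_rows(
  mutate(daily.SPY, FOMC = as.character(FOMC)),
  mutate(daily.SPY, FOMC = "ALL")
)

# An explicit level order puts the overall row last in the summary
result <- stacked %>%
  mutate(FOMC = factor(FOMC, levels = c("FALSE", "TRUE", "ALL"))) %>%
  group_by(FOMC) %>%
  summarise(Mean = 100 * mean(Daily.Return),
            Median = 100 * median(Daily.Return),
            Vol = 100 * sqrt(252) * sd(Daily.Return / 100))

result
```

The values are the same as before; only the row order changes, because summarise() returns grouped rows in factor-level order.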
Answer from a uj5u.com user:
You can create a function that summarises the data:
summarize_returns <- function(data) {
  data %>%
    summarise(
      Mean = 100 * mean(Daily.Return),
      Median = 100 * median(Daily.Return),
      Vol = 100 * sqrt(252) * sd(Daily.Return / 100),
      .groups = "drop"
    )
}
Then you can combine the grouped and ungrouped summaries with dplyr::bind_rows():
data %>%
  group_by(FOMC) %>%
  summarize_returns() %>%
  bind_rows(
    data %>% summarize_returns() %>% mutate(FOMC = "Total")
  )
# A tibble: 3 x 4
FOMC Mean Median Vol
<chr> <dbl> <dbl> <dbl>
1 FALSE -13.6 -13.3 15.5
2 TRUE 14.4 8.79 16.6
3 Total 0.992 -1.08 16.2
My data:
library(tidyverse)
set.seed(123)
data <- tibble(
  # FOMC is stored as character so it can be row-bound with the "Total" label
  FOMC = as.character(sample(c(TRUE, FALSE), 100, replace = TRUE)),
  Daily.Return = rnorm(100)
)
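The two steps in this answer (summarise by group, then append an ungrouped summary) can be wrapped into a single helper. The following is only a sketch under stated assumptions: summarise_with_total(), its group_col argument (a column name passed as a string), and total_label are hypothetical names that appear in neither the question nor the answers. It converts the grouping column to character first, so it should also cope with a factor FOMC like the one in the question.

```r
library(dplyr)

# Hypothetical helper: per-group summary plus one overall row
summarise_with_total <- function(data, group_col, total_label = "Total") {
  # Same three statistics as in the answer above
  summarise_returns <- function(d) {
    summarise(
      d,
      Mean = 100 * mean(Daily.Return),
      Median = 100 * median(Daily.Return),
      Vol = 100 * sqrt(252) * sd(Daily.Return / 100),
      .groups = "drop"
    )
  }

  # Grouped summary; the grouping column is coerced to character
  by_group <- data %>%
    group_by(across(all_of(group_col))) %>%
    summarise_returns() %>%
    mutate(across(all_of(group_col), as.character))

  # Ungrouped summary, labelled in the same column
  overall <- summarise_returns(data)
  overall[[group_col]] <- total_label

  bind_rows(by_group, overall)
}

# Usage with simulated data; here FOMC is a factor, as in the question
set.seed(123)
data <- tibble(
  FOMC = factor(sample(c(TRUE, FALSE), 100, replace = TRUE)),
  Daily.Return = rnorm(100)
)
res <- summarise_with_total(data, "FOMC")
res
```

Passing the column name as a string keeps the helper free of tidy-evaluation syntax; a version built on `{{ }}` would let callers write the bare column name instead.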
