我有一個包含兩個變數的資料框。第一個變數是從 1 到 2016 年連續跨越的年份,第二個變數是我感興趣的值。例如
ts <- data.frame(Year=c(1:2016), TS=sample(seq(from=-1.5, to=1.5,by=0.01), size=2016, replace=TRUE))
我需要從上次觀察 (2016-50=1966) 之前的 50 年開始計算 21 個移動平均值,每個值包含一個范圍為 10、11、12、...30 年的視窗,第一個值與年份對齊2016. 我需要的是一個包含 23 列(“Year”、“TS”、“av10”、“av11”、“av12”、...“av30”)和從第 1 年到 2016 年的 2016 行的資料框. 因此,我這樣做了:
subset1966 <- ts[1:1966,]
subset1966$Year.50 <- c(51:2016)
ts.50yrs <- subset1966%>%
mutate(av10 = rollmean(TS, k = 10, fill = NA, align = "right"),
av11 = rollmean(TS, k = 11, fill = NA, align = "right"),
av12 = rollmean(TS, k = 12, fill = NA, align = "right"),
av13 = rollmean(TS, k = 13, fill = NA, align = "right"),
av14 = rollmean(TS, k = 14, fill = NA, align = "right"),
av15 = rollmean(TS, k = 15, fill = NA, align = "right"),
av16 = rollmean(TS, k = 16, fill = NA, align = "right"),
av17 = rollmean(TS, k = 17, fill = NA, align = "right"),
av18 = rollmean(TS, k = 18, fill = NA, align = "right"),
av19 = rollmean(TS, k = 19, fill = NA, align = "right"),
av20 = rollmean(TS, k = 20, fill = NA, align = "right"),
av21 = rollmean(TS, k = 21, fill = NA, align = "right"),
av22 = rollmean(TS, k = 22, fill = NA, align = "right"),
av23 = rollmean(TS, k = 23, fill = NA, align = "right"),
av24 = rollmean(TS, k = 24, fill = NA, align = "right"),
av25 = rollmean(TS, k = 25, fill = NA, align = "right"),
av26 = rollmean(TS, k = 26, fill = NA, align = "right"),
av27 = rollmean(TS, k = 27, fill = NA, align = "right"),
av28 = rollmean(TS, k = 28, fill = NA, align = "right"),
av29 = rollmean(TS, k = 29, fill = NA, align = "right"),
av30 = rollmean(TS, k = 30, fill = NA, align = "right"))
ts.50yrs.df <- ts.50yrs[,-c(1:2)]
colnames(ts.50yrs.df)[1] <- "Year"
df.ts.50 <- full_join(ts,ts.50yrs.df)
我在上次觀察前 100、150、200 和 250 年的不同年份重復相同的程式。例如
subset1766 <- data[1:1766,]
subset1766$Year.250 <- c(251:2016)
ts.250yrs <- subset1766%>%
mutate(av10 = rollmean(TS, k = 10, fill = NA, align = "right"),
av11 = rollmean(TS, k = 11, fill = NA, align = "right"),
av12 = rollmean(TS, k = 12, fill = NA, align = "right"),
av13 = rollmean(TS, k = 13, fill = NA, align = "right"),
av14 = rollmean(TS, k = 14, fill = NA, align = "right"),
av15 = rollmean(TS, k = 15, fill = NA, align = "right"),
av16 = rollmean(TS, k = 16, fill = NA, align = "right"),
av17 = rollmean(TS, k = 17, fill = NA, align = "right"),
av18 = rollmean(TS, k = 18, fill = NA, align = "right"),
av19 = rollmean(TS, k = 19, fill = NA, align = "right"),
av20 = rollmean(TS, k = 20, fill = NA, align = "right"),
av21 = rollmean(TS, k = 21, fill = NA, align = "right"),
av22 = rollmean(TS, k = 22, fill = NA, align = "right"),
av23 = rollmean(TS, k = 23, fill = NA, align = "right"),
av24 = rollmean(TS, k = 24, fill = NA, align = "right"),
av25 = rollmean(TS, k = 25, fill = NA, align = "right"),
av26 = rollmean(TS, k = 26, fill = NA, align = "right"),
av27 = rollmean(TS, k = 27, fill = NA, align = "right"),
av28 = rollmean(TS, k = 28, fill = NA, align = "right"),
av29 = rollmean(TS, k = 29, fill = NA, align = "right"),
av30 = rollmean(TS, k = 30, fill = NA, align = "right"))
ts.250yrs.df <- ts.250yrs[,-c(1:2)]
colnames(ts.250yrs.df)[1] <- "Year"
df.ts.250 <- full_join(ts,ts.50yrs.df)
And then I merged the data frames together. Eg.:
df <- cbind(df.ts.50, df.ts.100, df.ts.150, df.ts.200, df.ts.250)
However, what I would like to do next is to repeat the same moving average with the start year that goes from 50, 51, 52, ...250, meaning starting in 1966 (2016-50) and going until 1766 (2016-250) and having a moving average ranging from 10, 11, 12, ...30 years for each starting year. Additionally, all the values would need to be aligned to the year 2016.
因此,這將是一個包含 4221 列(201(= 50 到 250 年)* 21(= 10 到 30 年的視窗))的資料框,加上“Year”和“TS”兩列。列名類似于“Year”、“TS”、“av50.10yr”、“av50.11yr”、“av50.12yr”、...、“av50.30yr”、“av51.10yr”、“ av51.11yr", "av51.12yr", ... "av51.30yr", ... "av250.10yr", "av250.11yr", "av250.12yr", ..., "av250.30yr" ) 和 2016 行,從第 1 年到 2016 年。
uj5u.com熱心網友回復:
在這里,我創建了一個自定義函式moving_mean(),它接受資料框和起始年份。然后,我使用mutate從 2016 年開始的年份序列(例如 50)創建一個新列。然后,我創建了一個名稱串列,這些名稱將用于資料框中的列名(例如,av50.10yr)。然后,我用map2對mutate新列滾動的年數(例如,10,11,...,30)。然后,我過去常常purrr::reduce將給定的起始年份(例如 50)的所有資料幀合并為一個。接下來,我用map跑的功能,moving_mean從50對原資料幀,每個起始年到250 ts。然后,我將 201 個資料幀放入同一個資料幀(再次使用reduce)。最后,我將這個資料框加入了原始資料框,它添加了 4,221 個新列到ts.
library(tidyverse)
moving_mean <- function(data, y){
new_year <- paste0("Year.", as.character(y))
x_new <- data %>%
filter(Year <= (tail(Year, n=1) - y)) %>%
rowwise %>%
dplyr::mutate({{new_year}} := (Year y)) %>%
ungroup()
varnames <- unlist(map(10:30, function(x) paste0("av", y, ".", x, "yr")))
map2(10:30, varnames, function(x, y) x_new %>%
dplyr::mutate({{y}} := rollmean(TS, k = x, fill = NA, align = "right")) %>%
as.data.frame()) %>%
reduce(left_join, by = c(names(x_new)[1:3])) %>%
select(-c("Year", "TS")) %>%
dplyr::rename(Year = 1)
}
results <- map(c(seq(50, 250, 1)), ~ moving_mean(ts, .x)) %>%
reduce(left_join, by = "Year") %>%
left_join(ts, ., by = "Year")
輸出(這里只列印一小部分)
dim(results)
# [1] 2016 4223
results[60:70, 1:15]
Year TS av50.10yr av50.11yr av50.12yr av50.13yr av50.14yr av50.15yr av50.16yr av50.17yr av50.18yr av50.19yr av50.20yr av50.21yr av50.22yr
60 60 0.63 -0.080 NA NA NA NA NA NA NA NA NA NA NA NA
61 61 -1.50 0.123 -0.01727273 NA NA NA NA NA NA NA NA NA NA NA
62 62 -0.07 0.295 0.15545455 0.02416667 NA NA NA NA NA NA NA NA NA NA
63 63 -0.84 0.338 0.30000000 0.17166667 0.04923077 NA NA NA NA NA NA NA NA NA
64 64 -0.36 0.317 0.39363636 0.35416667 0.23153846 0.1135714 NA NA NA NA NA NA NA NA
65 65 0.34 0.442 0.29000000 0.36250000 0.32846154 0.2164286 0.1073333 NA NA NA NA NA NA NA
66 66 -0.45 0.312 0.35363636 0.22166667 0.29384615 0.2671429 0.1666667 0.067500 NA NA NA NA NA NA
67 67 -1.17 0.415 0.25000000 0.29333333 0.17615385 0.2464286 0.2246667 0.133125 0.04176471 NA NA NA NA NA
68 68 -0.77 0.355 0.42363636 0.27166667 0.31000000 0.2000000 0.2640000 0.242500 0.15529412 0.06777778 NA NA NA NA
69 69 0.51 0.477 0.45090909 0.50583333 0.35923077 0.3885714 0.2806667 0.335625 0.31117647 0.22500000 0.1384211 NA NA NA
70 70 -0.25 0.331 0.42272727 0.40333333 0.45769231 0.3250000 0.3546667 0.255625 0.30882353 0.28722222 0.2068421 0.1255 NA NA
資料
set.seed(28)
ts <- data.frame(Year = c(1:2016),
TS = sample(
seq(from = -1.5, to = 1.5, by = 0.01),
size = 2016,
replace = TRUE
))
uj5u.com熱心網友回復:
您可以將邏輯放入一個函式中makeAvg(),該函式只需要年數并從中減去max(dat$year),即50給出所需的1:1966,
# makeAvg <- \(s, ks=10:30, x='TS', data=dat) {
# sbs <- 1:(max(data$Year) - s)
# X <- data[sbs, ]
# cbind(X, sapply(ks, \(k) zoo::rollmean(X[[x]], k=k, fill=NA, align="right")) |>
# `colnames<-`(paste0('av', ks, '_', s)))
# }
(編輯:為了避免緩慢zoo::rollmean,請嘗試下面的這個版本,它使用的速度RcppRoll::roll_mean大約快 20 倍。)
makeAvg <- \(s, ks=10:30, x='TS', data=dat) {
sbs <- 1:(max(data$Year) - s)
X <- data[sbs, ]
cbind(X,
sapply(ks, \(k) RcppRoll::roll_mean(X[[x]], n=k, fill=NA, align='r')) |>
`colnames<-`(paste0('av', ks, '_', s)))
}
注意: R 版本 >= 4.1
然后拋出這個:
makeAvg(50)[8:13, ]
# Year TS av10_50 av11_50 av12_50 av13_50 av14_50 av15_50 av16_50
# 8 8 -0.23 NA NA NA NA NA NA NA
# 9 9 -1.27 NA NA NA NA NA NA NA
# 10 10 -0.62 -0.448 NA NA NA NA NA NA
# 11 11 0.14 -0.332 -0.3945455 NA NA NA NA NA
# 12 12 -0.41 -0.375 -0.3390909 -0.3958333 NA NA NA NA
# 13 13 -1.31 -0.429 -0.4600000 -0.4200000 -0.4661538 NA NA NA
# av17_50 av18_50 av19_50 av20_50 av21_50 av22_50 av23_50 av24_50 av25_50 av26_50
# 8 NA NA NA NA NA NA NA NA NA NA
# 9 NA NA NA NA NA NA NA NA NA NA
# 10 NA NA NA NA NA NA NA NA NA NA
# 11 NA NA NA NA NA NA NA NA NA NA
# 12 NA NA NA NA NA NA NA NA NA NA
# 13 NA NA NA NA NA NA NA NA NA NA
# av27_50 av28_50 av29_50 av30_50
# 8 NA NA NA NA
# 9 NA NA NA NA
# 10 NA NA NA NA
# 11 NA NA NA NA
# 12 NA NA NA NA
# 13 NA NA NA NA
然后把它starts放入一個向量中,
starts <- seq(50, 250, 1)
在您使用回圈lapply,并把它變成Reduce()有merge():
res <- Reduce(\(...) merge(..., all=TRUE), lapply(starts, makeAvg)) ## runs ~25 s
這給出了一個具有完全預期維度維度的資料框,
dim(res)
# [1] 1966 4223
和這些名字:
names(res)
# [1] "Year" "TS" "av10_50" "av11_50" "av12_50" "av13_50" "av14_50"
# [8] "av15_50" "av16_50" "av17_50" "av18_50" "av19_50" "av20_50" "av21_50"
# [...]
資料:
set.seed(42)
dat <- data.frame(Year=c(1:2016),
TS=sample(seq(from=-1.5, to=1.5,by=0.01), size=2016, replace=TRUE))
轉載請註明出處,本文鏈接:https://www.uj5u.com/caozuo/399449.html
