How do I loop a linear regression over every n rows in R? [duplicate]

This question already has answers here: Rolling regression in R using rollapply (2 answers).

I'm currently working with a dataset that looks like this:

streamflow  x1   x2   x3   x4
145         24   25   43   44
198         30   45   66   54
285         32   39   54   47
...         ...  ...  ...  ...

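There are 4408 rows in total. What I want to do is fit the linear regression streamflow ~ x1 + x2 + x3 + x4 on rows 1 to 20, then on rows 2 to 21, then on rows 3 to 22, and so on until the end, so that I get one set of coefficients for each run. I know I probably need a for loop, but I can't work out how to make it operate on each 20-row window. Any suggestions would be greatly appreciated. Thanks in advance.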
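With 4408 rows and a window of 20, the window that starts at row k covers rows k to k + 19, so the last complete window starts at row 4408 - 20 + 1 = 4389. A small sketch of that index arithmetic; the names `n_rows`, `width`, `starts` and `ends` are only illustrative:

```r
n_rows <- 4408   # number of rows in the data set from the question
width  <- 20     # rows per regression window

# Window k covers rows k:(k + width - 1); the last window must end exactly at n_rows
n_windows <- n_rows - width + 1
starts <- seq_len(n_windows)
ends   <- starts + width - 1

n_windows      # 4389
range(ends)    # 20 4408
```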

for(i in 1:nrow(CFbasin)) {
  y <- CFbasin[i:(i+20), 2]
  x1 <- CFbasin[i:(i+20), 3]
  x2 <- CFbasin[i:(i+20), 4]
  mod_coef[i] <- coef(lm(y ~ x1 + x2))
}

So this is what I wrote, but it doesn't give me the result I want.
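The loop above has a few problems: i:(i+20) selects 21 rows rather than 20; the loop runs past the last complete window, so the final windows pull in NA rows; only x1 and x2 enter the model; and mod_coef[i] <- ... tries to put a whole coefficient vector into a single element (which keeps only the first value, or fails if mod_coef does not exist yet). A minimal corrected sketch, assuming a data frame with columns named streamflow and x1 to x4 as in the sample above. The data are simulated stand-ins and `CFbasin_demo` is a hypothetical name, not the asker's real data:

```r
# Simulated stand-in for the question's data (hypothetical; replace with the real data frame)
set.seed(1)
n <- 100
CFbasin_demo <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n), x4 = rnorm(n))
CFbasin_demo$streamflow <- with(CFbasin_demo, 200 + 10 * x1 - 5 * x2 + 3 * x3 + rnorm(n))

width <- 20
n_windows <- nrow(CFbasin_demo) - width + 1   # stop at the last complete window

# Preallocate one row of coefficients per window: intercept + 4 slopes
mod_coef <- matrix(NA_real_, nrow = n_windows, ncol = 5,
                   dimnames = list(NULL, c("(Intercept)", "x1", "x2", "x3", "x4")))

for (i in seq_len(n_windows)) {
  rows <- i:(i + width - 1)                   # exactly 20 rows: i, ..., i + 19
  fit <- lm(streamflow ~ x1 + x2 + x3 + x4, data = CFbasin_demo[rows, ])
  mod_coef[i, ] <- coef(fit)                  # store the whole coefficient vector in row i
}

dim(mod_coef)   # 81 5
head(mod_coef)
```

Preallocating the matrix keeps one row per window, so row i always holds the coefficients for rows i to i + 19.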

Answer:

Here is a base R approach using the iris dataset:

data(iris)
str(iris)
# 'data.frame': 150 obs. of  5 variables:
#  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
#  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
#  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
#  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
#  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

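Since there are 150 rows, we compute the number of rolling 20-row groups (150 - 20 + 1 = 131) and use it to build a matrix with 20 rows and 131 columns listing the row numbers to use in each regression: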

rows <- nrow(iris)
last <- rows + 1 - 20
idx <- sapply(1:last, seq, length.out=20)
str(idx)
#  num [1:20, 1:131] 1 2 3 4 5 6 7 8 9 10 ...
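To see what sapply(1:last, seq, length.out=20) builds, here is the same call on a toy case with 5 rows and a window of 3 (the sizes are chosen only for illustration). Each column holds the row numbers of one window:

```r
rows  <- 5
width <- 3
last  <- rows + 1 - width    # 3 windows
# seq(k, length.out = width) gives k, k + 1, ..., k + width - 1
idx_small <- sapply(1:last, seq, length.out = width)
idx_small
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    2    3    4
# [3,]    3    4    5
```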

So we have 131 columns, each identifying a set of 20 rows for one regression. Now run the 131 regressions and save the coefficients:

results <- lapply(1:131, function(x) lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width, iris[idx[, x], ]))
coeffs <- t(sapply(results, coef))
head(coeffs)
#      (Intercept) Sepal.Width Petal.Length Petal.Width
# [1,]  0.88165253   1.1027541    0.4335847  -1.3039612
# [2,]  0.64094220   1.1111668    0.6186075  -1.4860753
# [3,]  0.28030724   1.2120241    0.6477881  -1.7022181
# [4,] -0.01943516   1.1879500    0.8971728  -1.6773764
# [5,]  0.46106345   0.9888293    0.9230228  -0.8457783
# [6,]  0.92206667   0.9734378    0.5716684  -0.5058189
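A variation on the same idea, sketched here as one possible design choice: take the number of windows from ncol(idx) instead of hard-coding 131, and use vapply(), which checks that every regression returns exactly four coefficients:

```r
data(iris)
width <- 20
idx <- sapply(seq_len(nrow(iris) - width + 1), seq, length.out = width)

# One regression per column of idx; ncol(idx) replaces the hard-coded 131
coeffs <- t(vapply(seq_len(ncol(idx)), function(j) {
  coef(lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width,
          data = iris[idx[, j], ]))
}, FUN.VALUE = numeric(4)))   # errors if any fit returns a different number of values

dim(coeffs)   # 131 4
```

Because the column count comes from idx, changing width (or the data) needs no other edits.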

Each regression is stored as an element of the list results, so the first regression is results[[1]]:

summary(results[[1]])
# 
# Call:
# lm(formula = Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width, 
#     data = iris[idx[, x], ])
# 
# Residuals:
#      Min       1Q   Median       3Q      Max 
# -0.26396 -0.17137 -0.00562  0.13582  0.36386 
# 
# Coefficients:
#              Estimate Std. Error t value Pr(>|t|)    
# (Intercept)    0.8817     0.6730   1.310    0.209    
# Sepal.Width    1.1028     0.1748   6.309 1.04e-05 ***
# Petal.Length   0.4336     0.3448   1.257    0.227    
# Petal.Width   -1.3040     0.7924  -1.646    0.119    
# ---
# Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
# 
# Residual standard error: 0.204 on 16 degrees of freedom
# Multiple R-squared:  0.8078,  Adjusted R-squared:  0.7717 
# F-statistic: 22.41 on 3 and 16 DF,  p-value: 5.666e-06

Getting at the statistics computed by summary takes a little more work:

Rsq <- sapply(results, function(x) summary(x)$adj.r.squared)
quantile(Rsq)
#        0%       25%       50%       75%      100% 
# 0.1635166 0.4471409 0.6298927 0.8417655 0.9278258
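If several statistics are needed per window, they can be pulled out of each summary in a single pass. A sketch that rebuilds results the same way as above; which statistics to keep (here R-squared, adjusted R-squared, residual standard error and the p-value of the Sepal.Width slope) is just an example:

```r
data(iris)
width <- 20
idx <- sapply(seq_len(nrow(iris) - width + 1), seq, length.out = width)
results <- lapply(seq_len(ncol(idx)), function(j)
  lm(Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width, data = iris[idx[, j], ]))

# One row per window, one column per statistic taken from summary.lm
stats <- t(vapply(results, function(fit) {
  s <- summary(fit)
  c(r.squared     = s$r.squared,
    adj.r.squared = s$adj.r.squared,
    sigma         = s$sigma,                          # residual standard error
    p.Sepal.Width = coef(s)["Sepal.Width", "Pr(>|t|)"])
}, FUN.VALUE = numeric(4)))

head(stats)
```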

Answer:

If I understand you correctly, you could do something like this:

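Everything before the line mutate(group20rows = as.integer(gl(n(), 20, n())), .before=1) %>% only prepares fake data shaped like yours.
That line creates a grouping column that numbers consecutive, non-overlapping blocks of 20 rows (rows 1-20, 21-40, ...).
Group by that column and apply group_split(), which gives you a list of 20-row data frames.
Apply map_dfr() to run your regression on each group.
Use glance() from the broom package to collect all the results neatly in one data frame.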
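The same grouping can be sketched in base R with split(), without the tidyverse. Like gl() in this answer, it forms consecutive, non-overlapping blocks of 20 rows (the last block has only 10 rows for the 150-row iris data), so it mirrors this answer rather than a rolling window. The renamed columns follow the answer's fake data; the object names dat, block, blocks and r2 are illustrative:

```r
data(iris)
dat <- iris[, 1:4]
names(dat) <- c("streamflow", "x1", "x2", "x3")   # same renaming as the fake data above

# Block id: 1 for rows 1-20, 2 for rows 21-40, ..., 8 for rows 141-150
block  <- ceiling(seq_len(nrow(dat)) / 20)
blocks <- split(dat, block)

# One R-squared per block
r2 <- vapply(blocks, function(d) summary(lm(streamflow ~ x1 + x2 + x3, data = d))$r.squared,
             FUN.VALUE = numeric(1))

sapply(blocks, nrow)   # rows per block: 20 20 20 20 20 20 20 10
r2
```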

library(tidyverse)
library(broom)

colnames1 <- c("streamflow", "x1", "x2", "x3", "x4")

iris %>% 
  select(-5) %>% 
  mutate(Sepal.Length1 = Sepal.Length) %>% 
  rename_with(~colnames1) %>% 
  mutate(streamflow=streamflow*100) %>% 
  mutate(group20rows = as.integer(gl(n(), 20, n())), .before=1) %>% 
  mutate(group20rows = as_factor(group20rows)) %>% 
  group_by(group20rows) %>% 
  group_split() %>% 
  map_dfr(.f = function(df){
    lm(streamflow ~ x1 + x2 + x3, data = df) %>% 
      glance() %>% 
      add_column(group20rows = unique(df$group20rows), .before=1)
  })

Output:

  group20rows r.squared adj.r.squared sigma statistic     p.value    df logLik   AIC   BIC deviance df.residual  nobs
  <fct>           <dbl>         <dbl> <dbl>     <dbl>       <dbl> <dbl>  <dbl> <dbl> <dbl>    <dbl>       <int> <int>
1 1               0.808         0.772  20.4     22.4  0.00000567      3  -86.5 183.   188.    6662.          16    20
2 2               0.335         0.210  26.2      2.69 0.0815          3  -91.4 193.   198.   10961.          16    20
3 3               0.873         0.850  32.0     36.8  0.000000207     3  -95.5 201.   206.   16420.          16    20
4 4               0.511         0.419  34.4      5.57 0.00819         3  -96.9 204.   209.   18919.          16    20
5 5               0.530         0.442  30.0      6.01 0.00609         3  -94.2 198.   203.   14417.          16    20
6 6               0.877         0.854  28.0     38.0  0.000000165     3  -92.8 196.   200.   12502.          16    20
7 7               0.787         0.747  32.5     19.7  0.0000128       3  -95.8 202.   207.   16895.          16    20
8 8               0.644         0.466  28.0      3.62 0.0845          3  -45.0  99.9  101.    4719.           6    10
