我正在嘗試對資料集運行非線性回歸,因此我想為每個組運行一個新的回歸。資料框很像這樣:
Date <- as.POSIXct(c("2021-05-25","2021-05-20", "2021-05-21","2021-05-22",
"2021-05-23","2021-05-24" ,"2021-05-25","2021-05-20", "2021-05-21","2021-05-22",
"2021-05-23","2021-05-24" ,"2021-05-25","2021-05-20", "2021-05-21","2021-05-22",
"2021-05-23","2021-05-24" ,"2021-05-25","2021-05-20", "2021-05-21","2021-05-22",
"2021-05-23","2021-05-24" ,"2021-05-25"))
Ts <- rnorm(25, mean=10, sd=0.5)
Exp_flux <- 3.5*exp((Ts-10)/10)
Collar <- as.factor(c("t1","t2","t3","t4","t5","t1","t2","t3","t4","t5","t1","t2","t3","t4",
"t5","t1","t2","t3","t4","t5","t1","t2","t3","t4","t5"))
df <- data.frame(Date,Collar,Ts,Exp_flux)
df
Date Collar Ts Exp_flux
1 2021-05-25 t1 9.596453 3.361570
2 2021-05-20 t2 8.870983 3.126334
3 2021-05-21 t3 10.011902 3.504168
4 2021-05-22 t4 10.480873 3.672418
5 2021-05-23 t5 10.264998 3.593989
6 2021-05-24 t1 10.196256 3.569368
7 2021-05-25 t2 9.523135 3.337014
8 2021-05-20 t3 10.315953 3.612349
9 2021-05-21 t4 9.510503 3.332801
10 2021-05-22 t5 10.300981 3.606945
11 2021-05-23 t1 10.788605 3.787187
12 2021-05-24 t2 10.226902 3.580323
13 2021-05-25 t3 9.005530 3.168683
14 2021-05-20 t4 10.752006 3.773351
15 2021-05-21 t5 9.335704 3.275051
16 2021-05-22 t1 9.345418 3.278234
17 2021-05-23 t2 10.034693 3.512164
18 2021-05-24 t3 10.754786 3.774401
19 2021-05-25 t4 9.655313 3.381415
20 2021-05-20 t5 10.670903 3.742872
21 2021-05-21 t1 8.986950 3.162801
22 2021-05-22 t2 10.441217 3.657883
23 2021-05-23 t3 10.446326 3.659753
24 2021-05-24 t4 10.550104 3.697931
25 2021-05-25 t5 10.442247 3.658260
我的目標是對每種衣領型別運行 Exp_flux 與 Ts 的單獨回歸。我知道我可以將主要資料集分成每個衣領的子集并手動執行每個回歸,但實際上有 20 多種衣領型別,我認為必須有更有效的方法來做到這一點。我嘗試使用包的nlsList功能nlme,它只給出一個空串列或(在以前的情況下)僅第一個項圈的回歸:
fit.collars <- nlsList(Exp_Flux ~ SRref*q^((Ts-10)/10)| Collar,
data=df, start=list(SRref=3, q=2), na.action = na.omit )
summary(fit.collars)
Error in class(val) <- c("summary.nlsList", class(val)) :
attempt to set an attribute on NULL
我一定是錯誤地使用了 nlsList 函式,但我不知道是怎么回事。關于此功能的教程在線非常少。任何人都可以就此或相對簡單的替代方案提出建議嗎?
uj5u.com熱心網友回復:
讓我參考help("nls"):
nls 的默認設定通常會在人為的“零殘差”資料問題上失敗。
如果我添加一些白噪聲并修復錯字,我會成功匹配。
set.seed(42)
Date <- as.POSIXct(c("2021-05-25","2021-05-20", "2021-05-21","2021-05-22",
"2021-05-23","2021-05-24" ,"2021-05-25","2021-05-20", "2021-05-21","2021-05-22",
"2021-05-23","2021-05-24" ,"2021-05-25","2021-05-20", "2021-05-21","2021-05-22",
"2021-05-23","2021-05-24" ,"2021-05-25","2021-05-20", "2021-05-21","2021-05-22",
"2021-05-23","2021-05-24" ,"2021-05-25"))
Ts <- rnorm(25, mean=10, sd=0.5)
Exp_flux <- 3.5*exp((Ts-10)/10) rnorm(25, sd = 0.01)
Collar <- as.factor(c("t1","t2","t3","t4","t5","t1","t2","t3","t4","t5","t1","t2","t3","t4",
"t5","t1","t2","t3","t4","t5","t1","t2","t3","t4","t5"))
df <- data.frame(Date,Collar,Ts,Exp_flux)
library(nlme)
fit.collars <- nlsList(Exp_flux ~ SRref*q^((Ts-10)/10)| Collar,
data=df, start=list(SRref=3, q=2), na.action = na.omit )
summary(fit.collars)
#works
如果您真的想要合并的殘差標準誤,請仔細考慮。
uj5u.com熱心網友回復:
有幾個問題:
- 公式中的 Exp_Flux 與 Exp_flux 不同的是列名
- 該問題使用沒有 set.seed 的亂數,因此資料不可重現。我們使用了最后注釋中顯示的資料來保證重現性。
- 可能需要更好的起始值。使用最后注釋中的資料,問題中的起始值按原樣作業,但由于資料不可重現,我們添加了更好的起始值,以防它們不在實際資料中。
- 添加 control = nls.control(scaleOffset = 1) 引數以處理零殘差。請注意, scaleOffset 是在 R 4.1.2 中引入的,并且在 R 的早期版本中不可用。
代碼 -
library(nlme)
# get starting values
fit0 <- lm(log(Exp_flux) ~ I((Ts-10)/10), df)
st <- setNames(exp(coef(fit0)), c("SRref", "q"))
fo2 <- Exp_flux ~ SRref * q^((Ts-10)/10) | Collar
fit2 <- nlsList(fo2, data=df, start = st, na.action = na.omit,
control = nls.control(scaleOffset = .1))
fit2
給予:
Call:
Model: Exp_flux ~ SRref * q^((Ts - 10)/10) | Collar
Data: df
Coefficients:
SRref q
t1 3.5 2.718282
t2 3.5 2.718282
t3 3.5 2.718282
t4 3.5 2.718282
t5 3.5 2.718282
Degrees of freedom: 25 total; 15 residual
Residual standard error: 1.152352e-15
分組與未分組
請注意,此資料按領分組并不重要。我們已經可以從系數相同的情況下觀察到,但如果實際資料不是這種情況,這就是使用方差分析執行測驗的方法。
# ungrouped
fo3 <- Exp_flux ~ SRref * q^((Ts-10)/10)
fit3 <- nls(fo3, data = df, start = st,
na.action = na.omit, control = list(scaleOffset = 1))
# grouped
fo4 <- Exp_flux ~ SRref[Collar] * q[Collar]^((Ts-10)/10)
fit4 <- nls(fo4, data = df,
start = list(SRref = rep(st[[1]], 5), q = rep(st[[2]], 5)),
na.action = na.omit, control = list(scaleOffset = 1))
anova(fit3, fit4)
給予:
Analysis of Variance Table
Model 1: Exp_flux ~ SRref * q^((Ts - 10)/10)
Model 2: Exp_flux ~ SRref[Collar] * q[Collar]^((Ts - 10)/10)
Res.Df Res.Sum Sq Df Sum Sq F value Pr(>F)
1 23 1.9919e-29
2 15 1.9919e-29 8 0 0 1
線性模型
請注意,簡單的線性模型非常適合資料。
plot(Exp_flux ~ Ts, df, col = df$Collar)
fm0 <- lm(Exp_flux ~ Ts, df)
abline(fm0)

筆記
我們使用了這些資料:
set.seed(123)
Date <- as.POSIXct(c("2021-05-25","2021-05-20", "2021-05-21","2021-05-22",
"2021-05-23","2021-05-24" ,"2021-05-25","2021-05-20", "2021-05-21","2021-05-22",
"2021-05-23","2021-05-24" ,"2021-05-25","2021-05-20", "2021-05-21","2021-05-22",
"2021-05-23","2021-05-24" ,"2021-05-25","2021-05-20", "2021-05-21","2021-05-22",
"2021-05-23","2021-05-24" ,"2021-05-25"))
Ts <- rnorm(25, mean=10, sd=0.5)
Exp_flux <- 3.5*exp((Ts-10)/10)
Collar <- as.factor(c("t1","t2","t3","t4","t5","t1","t2","t3","t4","t5","t1","t2","t3","t4",
"t5","t1","t2","t3","t4","t5","t1","t2","t3","t4","t5"))
df <- data.frame(Date,Collar,Ts,Exp_flux)
更新 3
從本答案的先前版本中的問題中復制公式時出錯,并進行了更改以使其正常作業。現在已經修復了這些問題,現在它可以使用 (1) 改進可能需要或不需要的起始值和 (2) 添加 scaleOffset 引數。@Roland 指出模型錯誤,模型殘差為零。
還添加了關于比較分組與未分組以及使用簡單線性模型的部分。
轉載請註明出處,本文鏈接:https://www.uj5u.com/qiye/375194.html
上一篇:在R中合并多對資料幀
