How can I efficiently use my script to correct for seasonal drift of loggers in R?

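I have installed a number of CO2 loggers in water, recording CO2 every hour in open water. Before and after deployment, I characterized the loggers at 3 different CO2 concentrations.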

I assume that the seasonal drift of the error is linear
I assume that the error between my characterization points is linear
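
As an illustration of how these two assumptions combine, here is a minimal sketch for a single reading. It uses the characterization values from the reproducible example further down; the names `w`, `measured.now` and `reading` are hypothetical, and `approx()` from base R stands in for the lookup-table approach used in the script.

```r
# Known concentrations and the logger readings before/after deployment
actual        <- c(0, 400, 1000)
measured.pre  <- c(105, 520, 1150)
measured.post <- c(115, 585, 1250)

# Assumption 1: drift is linear in time.
# w is the fraction of the deployment elapsed (0 = installed, 1 = retrieved).
w <- 0.5
measured.now <- measured.pre + w * (measured.post - measured.pre)

# Assumption 2: the error is linear between characterization points,
# so a reading maps back to a true value by piecewise-linear interpolation.
reading   <- 600
corrected <- approx(x = measured.now, y = actual, xout = reading)$y
print(corrected)
```

Halfway through the deployment the logger would be expected to read 110, 552.5 and 1200 at the three known concentrations, so a reading of 600 falls in the upper segment and is mapped to a value between 400 and 1000.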

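My script is based on a for loop that goes through each timestamp and corrects the value. This works, but unfortunately it is not fast enough. I know this can be done in under a second, but I am not sure how. I am looking for advice and would appreciate it if someone could show me how to do it.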

Reproducible example in base R:

start <- as.POSIXct("2022-08-01 00:00:00")#time when logger is installed
stop <- as.POSIXct("2022-09-01 00:00:00")#time when retrieved
dt <- seq.POSIXt(start,stop,by=3600)#generate datetime column, measured hourly
#generate a bunch of values within my measured range
co2 <- round(rnorm(length(dt),mean=600,sd=100))
#generate dummy dataframe
dummy <- data.frame(dt,co2)

#actual values used in characterization
actual <- c(0,400,1000)

#measured in the container by the instruments being characterized
measured.pre <- c(105,520,1150)
measured.post <- c(115,585,1250)

diff.pre <- measured.pre-actual#diff at precharacterization
diff.post <- measured.post-actual#diff at post

#linear interpolation of how deviance from actual values change throughout the season
#I assume that the temporal drift is linear 
diff.0 <- seq(diff.pre[1],diff.post[1],length.out=length(dummy$dt))
diff.400 <- seq(diff.pre[2],diff.post[2],length.out = length(dummy$dt))
diff.1000 <-  seq(diff.pre[3],diff.post[3],length.out = length(dummy$dt))

#creates a data frame with the assumed drift at each increment throughout the season
dummy <- data.frame(dummy,diff.0,diff.400,diff.1000)

#this loop makes a 3-point calibration at each day in the dummy data set
co2.corrected <- vector()
for(i in 1:nrow(dummy)){
  print(paste0("row: ",i))#to show the progress of the loop
  diff.0 <- dummy$diff.0[i]#get the differences at characterization increments
  diff.400 <- dummy$diff.400[i]
  diff.1000 <- dummy$diff.1000[i]
  #values below are only used for encompassing the range of measured values in the characterization
  #this is based on the interpolated difference at the given time point and the known concentrations used 
  measured.0 <- diff.0 + 0
  measured.400 <- diff.400 + 400
  measured.1000 <- diff.1000 + 1000
  
  #linear difference between calibration at 0 and 400
  seg1 <- seq(diff.0,diff.400,length.out=measured.400-measured.0)
  #linear difference between calibration at 400 and 1000
  seg2 <- seq(diff.400,diff.1000,length.out=measured.1000-measured.400)
  #bind them together to get one vector
  correction.ppm <- c(seg1,seg2)
  
  
  #the complete range of measured co2 in the characterization.
  #in reality it can not be below 0 and thus it can not be below the minimum measured in the range
  measured.co2.range <- round(seq(measured.0,measured.1000,length.out=length(correction.ppm)))
  #generate a table from which we can characterize the measured values from
  correction.table <- data.frame(measured.co2.range,correction.ppm)
  
  co2 <- dummy$co2[i] #measured co2 at the current row
  #find the measured value in the table and extract the difference
  diff <- correction.table$correction.ppm[match(co2,correction.table$measured.co2.range)]
  #correct the value and save it to vector
  co2.corrected[i] <- co2-diff
  
}
#generate column with calibrated values
dummy$co2.corrected <- co2.corrected

Answer:

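Here is my understanding after looking at your code. You have a series of CO2 concentration readings, but they need to be corrected based on characterization measurements taken at the start and at the end of the time series. Both sets of characterization measurements were made with three known concentrations: 0, 400, and 1000.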

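Your code appears to be attempting bilinear interpolation (over time and concentration) to apply the required correction. This is easy to vectorize: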

set.seed(1)
start <- as.POSIXct("2022-08-01 00:00:00")#time when logger is installed
stop <- as.POSIXct("2022-09-01 00:00:00")#time when retrieved
dt <- seq.POSIXt(start,stop,by=3600)#generate datetime column, measured hourly
#generate a bunch of values within my measured range
co2 <- round(rnorm(length(dt),mean=600,sd=100))

#actual values used in characterization
actual <- c(0,400,1000)

#measured in the container by the instruments being characterized
measured.pre <- c(105,520,1150)
measured.post <- c(115,585,1250)
# interpolate the reference concentrations over time
cref <- mapply(seq, measured.pre, measured.post, length.out = length(dt))
#generate dummy dataframe with corrected values
dummy <- data.frame(
  dt,
  co2,
  co2.corrected = ifelse(
    co2 < cref[,2],
    actual[1] + (co2 - cref[,1])*(actual[2] - actual[1])/(cref[,2] - cref[,1]),
    actual[2] + (co2 - cref[,2])*(actual[3] - actual[2])/(cref[,3] - cref[,2])
  )
)
head(dummy)
#>                    dt co2 co2.corrected
#> 1 2022-08-01 00:00:00 537      416.1905
#> 2 2022-08-01 01:00:00 618      493.2432
#> 3 2022-08-01 02:00:00 516      395.9776
#> 4 2022-08-01 03:00:00 760      628.2707
#> 5 2022-08-01 04:00:00 633      507.2542
#> 6 2022-08-01 05:00:00 518      397.6533
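
The same calculation can be generalized to any number of characterization points. The following is a minimal sketch and not part of the original answer: the function name `correct_drift` is hypothetical, and it assumes the readings are evenly spaced in time between the pre- and post-deployment characterizations, as in the example above. Readings outside the characterized range are extrapolated from the nearest segment, which matches the behavior of the `ifelse()` version.

```r
# Hypothetical helper: vectorized drift correction for k characterization points.
# co2           : logger readings, evenly spaced from installation to retrieval
# measured.pre  : logger readings at the known concentrations before deployment
# measured.post : logger readings at the known concentrations after deployment
# actual        : the known concentrations (increasing)
correct_drift <- function(co2, measured.pre, measured.post, actual) {
  n <- length(co2)
  k <- length(actual)
  w <- seq(0, 1, length.out = n)  # fraction of the deployment elapsed

  # n x k matrix: reference readings interpolated linearly over time
  cref <- outer(1 - w, measured.pre) + outer(w, measured.post)

  # Segment index per row: count of reference readings at or below the
  # reading, clamped to 1..(k - 1) so out-of-range values are extrapolated
  seg <- rowSums(co2 >= cref)
  seg <- pmin(pmax(seg, 1), k - 1)

  idx <- seq_len(n)
  lo  <- cref[cbind(idx, seg)]      # lower reference reading of the segment
  hi  <- cref[cbind(idx, seg + 1)]  # upper reference reading of the segment

  # Linear interpolation within the segment
  actual[seg] + (co2 - lo) * (actual[seg + 1] - actual[seg]) / (hi - lo)
}

# Usage with three example readings (first, middle and last time step)
out <- correct_drift(
  co2           = c(537, 618, 516),
  measured.pre  = c(105, 520, 1150),
  measured.post = c(115, 585, 1250),
  actual        = c(0, 400, 1000)
)
print(out)
```

Indexing the matrix with `cbind(idx, seg)` picks one element per row, which avoids writing a separate `ifelse()` branch for each segment.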

Answer:

I don't know exactly what you are computing (I feel it could be done differently), but you can improve the speed by:

removing print, which takes a lot of time inside a loop
removing the data.frame creation in each iteration, which is slow and not needed here

This loop should be faster:

for(i in 1:nrow(dummy)){
  diff.0 <- dummy$diff.0[i]
  diff.400 <- dummy$diff.400[i]
  diff.1000 <- dummy$diff.1000[i]
  
  measured.0 <- diff.0 + 0
  measured.400 <- diff.400 + 400
  measured.1000 <- diff.1000 + 1000
  
  seg1 <- seq(diff.0,diff.400,length.out=measured.400-measured.0)
  seg2 <- seq(diff.400,diff.1000,length.out=measured.1000-measured.400)
  correction.ppm <- c(seg1,seg2)
  
  s <- seq(measured.0,measured.1000,length.out=length(correction.ppm))
  measured.co2.range <- round(s)
  
  co2 <- dummy$co2[i]
  diff <- correction.ppm[match(co2, measured.co2.range)]
  co2.corrected[i] <- co2-diff
}

P.S. The slowest part in my tests is now round(s). Maybe it can be removed or rewritten...
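
One possible way to drop `round(s)` and the lookup table altogether is to interpolate the correction directly with base R's `approx()`. This is a sketch under assumptions, not the method from the question: it swaps the rounded `match()` lookup for exact linear interpolation, preallocates the result vector, and uses a few made-up example readings in place of the full `dummy` data frame.

```r
actual        <- c(0, 400, 1000)
measured.pre  <- c(105, 520, 1150)
measured.post <- c(115, 585, 1250)
co2 <- c(537, 618, 516, 760)  # example readings, evenly spaced in time
n   <- length(co2)            # assumes n > 1

diff.pre  <- measured.pre - actual
diff.post <- measured.post - actual

co2.corrected <- numeric(n)   # preallocate instead of growing the vector
for (i in seq_len(n)) {
  w <- (i - 1) / (n - 1)                             # fraction of deployment elapsed
  diff.now     <- diff.pre + w * (diff.post - diff.pre)  # drifted offsets at this time
  measured.now <- actual + diff.now                  # expected readings at this time
  # Interpolate the offset at the current reading; rule = 2 holds the
  # end offsets constant outside the characterized range (an assumption)
  diff <- approx(measured.now, diff.now, xout = co2[i], rule = 2)$y
  co2.corrected[i] <- co2[i] - diff
}
print(co2.corrected)
```

Because the interpolation is exact, no reading can fail to match a table entry, so the result never contains `NA` from a missed `match()`.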
