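I have some temperature data. I want to write a simple QA/QC script that looks through it and flags (in the QA/QC sense) data that need verification or manual inspection. I'd like it to essentially append flags to an existing column rather than create a brand-new column for each individual flag. I have a way to do this, but it's not elegant. Is there a cleaner way to do it?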
d<-data.frame(time=1:20, temp=c(1:5,-60,7:10,NA,12:15,160,17:20))
time is just the observation sequence, and temp is some made-up temperature data.
d$Flag[is.na(d$temp)]<-"MISSING" #flag the missing data
d$Flag[d$temp>120&!is.na(d$temp)]<-paste(d$Flag[d$temp>120&!is.na(d$temp)],"High",sep="_") #flag data beyond a threshold
d$Flag[d$temp<(-40)&!is.na(d$temp)]<-paste(d$Flag[d$temp<(-40)&!is.na(d$temp)],"Low",sep="_") #flag data below a threshold
dtIdx<-which(abs(diff(d$temp,lag=1))>10) #set an index vector of changes >10 based on first derivative
d$Flag[dtIdx]<-paste(d$Flag[dtIdx],"D10",sep="_") #select data and paste in new codes
d$Flag<-gsub("NA_","",d$Flag) #strip NA that is introduced to flags
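This creates the variable Flag and then sequentially overwrites it with itself plus the new information from each new condition. It works, but it feels messy. I also don't like having to clean up the NA that gets introduced. Can I avoid them from the start?
Answer from a uj5u.com user:
Here is an approach using the tidyverse. For dtIdx, I use that information to create a temporary new column, then create the Flag column for the other labels (i.e., MISSING, High, and Low) with case_when. Then I unite the two columns, which ignores NA and also drops dtIdx.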
library(tidyverse)
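df %>%
  mutate(
    # diff() returns n - 1 values, so pad with FALSE to match the number of rows
    dtIdx = ifelse(c(abs(diff(temp, lag = 1)) > 10, FALSE), "D10", NA),
    # case_when() returns NA for rows that match none of the conditions
    Flag = case_when(
      is.na(temp) ~ "MISSING",
      temp > 120 ~ "High",
      temp < -40 ~ "Low"
    )
  ) %>%
  # na.rm = TRUE skips NA when pasting; remove = TRUE drops the input columns
  unite(
    "Flag",
    c(dtIdx, Flag),
    sep = "_",
    remove = TRUE,
    na.rm = TRUE
  )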
Output
   time temp     Flag
 1    1    1
 2    2    2
 3    3    3
 4    4    4
 5    5    5      D10
 6    6  -60  D10_Low
 7    7    7
 8    8    8
 9    9    9
10   10   10
11   11   NA  MISSING
12   12   12
13   13   13
14   14   14
15   15   15      D10
16   16  160 D10_High
17   17   17
18   18   18
19   19   19
20   20   20
Data
df <- structure(
  list(
    time = 1:20,
    temp = c(1, 2, 3, 4, 5, -60, 7, 8, 9, 10, NA, 12, 13, 14, 15, 160, 17, 18, 19, 20)
  ),
  class = "data.frame",
  row.names = c(NA, -20L)
)
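To see why the dtIdx column in the tidyverse answer is padded with a trailing FALSE, here is a minimal sketch on a hypothetical four-value vector x (an illustration only, not part of the original data). diff() returns one fewer element than its input, so the padding restores the length, and the flag lands on the observation just before each jump.

```r
# Hypothetical toy vector, used only to illustrate the alignment.
x <- c(1, 2, 50, 51)

# diff() yields length(x) - 1 differences: 1, 48, 1
jump <- abs(diff(x, lag = 1)) > 10

# Pad with FALSE so the result lines up with x; the last row has no successor.
padded <- c(jump, FALSE)

# The flag marks the row before the jump (here, row 2).
flags <- ifelse(padded, "D10", NA)
print(flags)
```

If the flag should instead sit on the row after the jump, padding at the front, as in c(FALSE, jump), would shift it by one position.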
Answer from a uj5u.com user:
You can abstract a function out of the procedure you are using. Something like this:
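flag <- function(..., init, sep = "_") {
  # Each argument in ... is a list(index, label); fold them over init in order.
  # The \(x, y) lambda shorthand requires R >= 4.1.
  trimws(Reduce(
    \(x, y) replace(x, y[[1L]], paste(x[y[[1L]]], y[[2L]], sep = sep)),
    list(...), init = init
  ), "left", sep) # strip the leading separator left by pasting onto ""
}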
Then apply it like this:
d$Flag <- flag(
  list(is.na(d$temp), "MISSING"),
  list(which(d$temp > 120), "High"),
  list(which(d$temp < -40), "Low"),
  list(which(abs(diff(d$temp, lag = 1)) > 10), "D10"),
  init = character(nrow(d))
)
Output
   time temp     Flag
 1    1    1
 2    2    2
 3    3    3
 4    4    4
 5    5    5      D10
 6    6  -60  Low_D10
 7    7    7
 8    8    8
 9    9    9
10   10   10
11   11   NA  MISSING
12   12   12
13   13   13
14   14   14
15   15   15      D10
16   16  160 High_D10
17   17   17
18   18   18
19   19   19
20   20   20
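The fold inside flag can be traced step by step. The following is a minimal sketch, assuming a hypothetical toy vector temp and a named helper step that mirrors the anonymous function in the answer; both names are introduced here for illustration only. Passing accumulate = TRUE to Reduce keeps every intermediate flag vector, which makes it easier to see where the leading separator comes from.

```r
# Hypothetical toy input, used only to illustrate the fold.
temp <- c(5, -60, NA, 160)

# One folding step: append the rule's label to the flags at the rule's index.
step <- function(flags, rule) {
  idx <- rule[[1L]]
  replace(flags, idx, paste(flags[idx], rule[[2L]], sep = "_"))
}

rules <- list(
  list(is.na(temp), "MISSING"),
  list(which(temp > 120), "High"),
  list(which(temp < -40), "Low")
)

# accumulate = TRUE returns init plus the state after each rule.
states <- Reduce(step, rules, accumulate = TRUE, init = character(length(temp)))

# Pasting onto "" leaves a leading "_", which trimws() removes.
final <- trimws(states[[length(states)]], "left", "_")
print(final)
```

Starting from empty strings rather than NA is what removes the need for the gsub("NA_", ...) cleanup in the question.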
Or use factor with interaction.
# fct_explicit_na() is deprecated since forcats 1.0.0 in favor of fct_na_value_to_level()
na_as <- forcats::fct_explicit_na
DEFAULT <- ""
d$Flag <- trimws(whitespace = "_", interaction(
  sep = "_",
  factor(is.na(d$temp), labels = c(DEFAULT, "MISSING")),
  na_as(factor(findInterval(d$temp, c(-40, 120)), labels = c("Low", DEFAULT, "High")), DEFAULT),
  na_as(factor(abs(c(diff(d$temp, lag = 1), NA)) > 10, labels = c(DEFAULT, "D10")), DEFAULT)
))
You get the same output as above.
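To make the trimming step in the interaction approach concrete, here is a minimal sketch on two hypothetical toy factors, a and b (illustration names, not from the answer). Each factor uses the empty string as its "no flag" level, so interaction leaves stray separators at the ends, and trimws with whitespace = "_" strips them.

```r
# Hypothetical toy factors; "" stands for "no flag".
a <- factor(c(FALSE, TRUE), labels = c("", "MISSING"))
b <- factor(c(TRUE, FALSE), labels = c("", "D10"))

# interaction() pastes the level labels together with the separator.
raw <- as.character(interaction(a, b, sep = "_"))
print(raw)

# Remove leading and trailing separators left by the empty labels.
clean <- trimws(raw, whitespace = "_")
print(clean)
```

Note that trimws only strips separators at the ends, so this relies on the empty labels never sitting between two non-empty ones.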
