我使用的 data.table 代碼作業正常,但我無法轉換為包含ifelse陳述句。我使用以下表示:
set.seed(1645)
Place <- c(rep("Copenhagen",7),rep("Berlin",11),rep("Roma",12))
Year <- c(rep("2020",4),rep("2021",3),rep("2020",6),rep("2021",5),rep("2019",4),rep("2020",4),rep("2021",4))
Value1 <- c(runif(3),NA,runif(8),NA,runif(9),NA,runif(7))
Value2 <- c(runif(4),NA,runif(2),runif(6),NA,NA,runif(11),NA,NA,runif(2))
df <- data.frame(Place,Year,Value1,Value2)
> df
Place Year Value1 Value2
1 Copenhagen 2020 0.10517697 0.865935100
2 Copenhagen 2020 0.96597760 0.579956282
3 Copenhagen 2020 0.47262307 0.346569960
4 Copenhagen 2020 NA 0.478763951
5 Copenhagen 2021 0.90030423 NA
6 Copenhagen 2021 0.14444142 0.280377315
7 Copenhagen 2021 0.73801550 0.302816525
8 Berlin 2020 0.13961383 0.641314310
9 Berlin 2020 0.40221211 0.756374251
10 Berlin 2020 0.49613139 0.070459347
11 Berlin 2020 0.95190545 0.184497038
12 Berlin 2020 0.40182901 0.407892240
13 Berlin 2020 NA 0.002209376
14 Berlin 2021 0.38310025 NA
15 Berlin 2021 0.76417492 NA
16 Berlin 2021 0.29001287 0.632133629
17 Berlin 2021 0.84478784 0.365406326
18 Berlin 2021 0.55547323 0.493870653
19 Roma 2019 0.44198733 0.067744090
20 Roma 2019 0.50403809 0.847876518
21 Roma 2019 0.85358805 0.952393606
22 Roma 2019 0.74996137 0.887583928
23 Roma 2020 NA 0.631937527
24 Roma 2020 0.08303509 0.993400333
25 Roma 2020 0.74205719 0.589183185
26 Roma 2020 0.27552659 0.522451407
27 Roma 2021 0.39518410 NA
28 Roma 2021 0.38390124 NA
29 Roma 2021 0.36605674 0.942102065
30 Roma 2021 0.32014949 0.375689863
如果組中 <= 25% NA,我想按地點和年份計算平均值。沒有我的情況,這可以正常作業:
setDT(df)
df_means <- df[,.(Value1_mean = mean(Value1),Value2_mean = mean(Value2)), by = .(Place,Year)]
> df_means
Place Year Value1_mean Value2_mean
1: Copenhagen 2020 NA 0.4257258
2: Copenhagen 2021 0.3581245 NA
3: Berlin 2020 NA 0.3935807
4: Berlin 2021 0.3729461 NA
5: Roma 2019 0.4572996 0.3956536
6: Roma 2020 NA 0.6494491
7: Roma 2021 0.4142637 NA
我沒有包含該ifelse陳述句,這不起作用:
df_means2 <- df[,.(Value1_mean = ifelse(sum(is.na(Value1))/length(Value1)>=0.25,NA,mean(Value1,na.rm=TRUE)),
Value2_mean = ifelse(sum(is.na(Value2))/length(Value2)>=0.25,NA,mean(Value2,na.rm=TRUE))),
by = .(Place,Year)]
我檢查了這些帖子1、2和3沒有解決我的問題。我的預期結果應該是這樣的:
> df_means2
Place Year Value1_mean Value2_mean
1 Copenhagen 2020 mean mean
2 Copenhagen 2021 mean <NA>
3 Berlin 2020 mean mean
4 Berlin 2021 mean <NA>
5 Roma 2019 mean mean
6 Roma 2020 mean mean
7 Roma 2021 mean <NA>
如何轉換我的代碼?
uj5u.com熱心網友回復:
我們可能會使用 if/else
library(data.table)
df[, lapply(.SD, function(x) if(mean(is.na(x)) <= 0.25)
mean(x, na.rm = TRUE) else NA_real_), by = .(Place, Year)]
-輸出
Place Year Value1 Value2
1: Copenhagen 2020 0.5145925 0.5678063
2: Copenhagen 2021 0.5942537 NA
3: Berlin 2020 0.4783384 0.3437911
4: Berlin 2021 0.5675098 NA
5: Roma 2019 0.6373937 0.6888995
6: Roma 2020 0.3668730 0.6842431
7: Roma 2021 0.3663229 NA
在 OP 的代碼中,使用NA_real_而不是NA默認情況下logical 這會在class
df[,.(Value1_mean = ifelse(sum(is.na(Value1))/length(Value1)>0.25,
NA_real_,mean(Value1,na.rm=TRUE)),
Value2_mean = ifelse(sum(is.na(Value2))/length(Value2)>0.25,
NA_real_,
mean(Value2,na.rm=TRUE))),
by = .(Place,Year)]
-輸出
Place Year Value1_mean Value2_mean
1: Copenhagen 2020 0.5145925 0.5678063
2: Copenhagen 2021 0.5942537 NA
3: Berlin 2020 0.4783384 0.3437911
4: Berlin 2021 0.5675098 NA
5: Roma 2019 0.6373937 0.6888995
6: Roma 2020 0.3668730 0.6842431
7: Roma 2021 0.3663229 NA
轉載請註明出處,本文鏈接:https://www.uj5u.com/shujuku/371052.html
