我正在使用 tapply 來計算組均值,但我無法從輸出中獲得這些分組是什么。分組是兩個變數 Var1 和 Var2 的組合。用一些代碼演示會更容易
set.seed(123)
df <- mtcars
df$VAR1 <- sample(c("A","B"), nrow(mtcars) , replace = TRUE)
df$VAR2 <- sample(c("X","Y"), nrow(mtcars) , replace = TRUE)
df_result <- data.frame(apply(df[,sapply(df,is.numeric)], 2, function(x) tapply(x, list(df$VAR1,df$VAR2),mean)) )
輸出
> df_result
mpg cyl disp hp drat wt qsec vs am gear carb
1 22.3800 5.8 194.0500 126.80 3.74900 2.803000 18.44600 0.500 0.5 3.70 2.100
2 17.9900 6.8 284.0700 162.70 3.19900 3.645300 17.82700 0.400 0.2 3.30 2.600
3 18.9125 6.5 252.7875 168.50 3.74375 3.366125 17.16625 0.375 0.5 3.75 3.625
4 21.9750 5.0 144.9000 112.75 3.91500 2.885000 17.77500 0.500 0.5 4.50 3.500
如您所見,我無法分辨哪一行是 Var1 和 Var2 的哪個組合。這 4 個組應該是 AX、AY、BX、BY。有誰知道如何添加該表?對代碼的任何修改或簡化都是理想的。甚至有可能嗎?是的,我知道使用“dplyr”和“aggregate”可以更簡單地完成,但是我想以某種方式使用 sapply/tapply/lapply 來做到這一點。如果我做錯了什么,請簡化。非常感謝任何幫助。謝謝
uj5u.com熱心網友回復:
老實說,我不認為這是使用 apply 系列函式的正確問題。如果您想了解它們,您可以使用其他各種示例。
像這樣aggregate的功能是為此而構建的 -
aggregate(.~VAR1 VAR2, df, mean)
顯然,您可以用劍剪紙,但這不是它的用途。這是使用tapply-
tmp <- unique(df[c('VAR1', 'VAR2')])
rownames(tmp) <- NULL
cbind(tmp[with(tmp, order(VAR2, VAR1)), ],
sapply(df[,sapply(df,is.numeric)], function(x)
tapply(x, list(df$VAR1,df$VAR2),mean)))
uj5u.com熱心網友回復:
這應該你更接近你想要的。(tapply雖然不使用。)您可以使用outer()將自定義Vectorized 函式應用于"VAR1"和 的組合"VAR2"。我們還使用outerwithpaste來識別組合。
v <- lapply(df[, c('VAR1', 'VAR2')], unique) ## get levels
nm <- c("mpg", "cyl", "disp", "hp", "drat", "wt", "qsec", "vs", "am",
"gear", "carb") ## vector of columns to loop over
對于mean我們可以申請colMeans列子集nm。
V_MEAN <- Vectorize(\(x, y) colMeans(df[df$VAR1 == x & df$VAR2 == y, nm]),
SIMPLIFY=F)
sapply(outer(v[[1]], v[[2]], V_MEAN), I) |>
`colnames<-`(outer(v[[1]], v[[2]], paste, sep='_'))
# A_X B_X A_Y B_Y
# mpg 22.380 17.9900 18.912500 21.975
# cyl 5.800 6.8000 6.500000 5.000
# disp 194.050 284.0700 252.787500 144.900
# hp 126.800 162.7000 168.500000 112.750
# drat 3.749 3.1990 3.743750 3.915
# wt 2.803 3.6453 3.366125 2.885
# qsec 18.446 17.8270 17.166250 17.775
# vs 0.500 0.4000 0.375000 0.500
# am 0.500 0.2000 0.500000 0.500
# gear 3.700 3.3000 3.750000 4.500
# carb 2.100 2.6000 3.625000 3.500
對于weighted.mean,我們在 中使用匿名函式sapply,在其中定義w=。
set.seed(42)
df$wgt <- runif(nrow(df)) ## fabricate weights
V_W_MEAN <- Vectorize(\(x, y) {
dat <- df[df$VAR1 == x & df$VAR2 == y, ]
sapply(dat[, nm], \(z) weighted.mean(z, w=dat[, 'wgt']))
}, SIMPLIFY=F)
sapply(outer(v[[1]], v[[2]], V_W_MEAN), I) |>
`colnames<-`(outer(v[[1]], v[[2]], paste, sep='_'))
# A_X B_X A_Y B_Y
# mpg 21.4040177 18.7455432 17.4546812 20.9215362
# cyl 5.8600143 6.6403719 6.7614803 5.4227808
# disp 199.3270864 274.2125329 264.9109708 145.5146065
# hp 125.5585798 164.2029158 183.3016365 135.5210857
# drat 3.7055945 3.2798961 3.7334560 3.8744869
# wt 2.8267939 3.5926588 3.5684028 2.8345649
# qsec 18.4069666 17.8164896 16.9251482 16.9464657
# vs 0.4587740 0.4358106 0.2869283 0.3257468
# am 0.4255709 0.2440034 0.4766754 0.6742532
# gear 3.6095371 3.3727337 3.7589968 4.6742532
# carb 2.1471308 2.6620944 4.1912125 4.3421659
如果您更喜歡串列作為結果,您可以使用lapply(outer...) |> setNames(outer(...)).
筆記: "R version 4.1.2 (2021-11-01)"
資料:
df <- structure(list(mpg = c(21, 21, 22.8, 21.4, 18.7, 18.1, 14.3,
24.4, 22.8, 19.2, 17.8, 16.4, 17.3, 15.2, 10.4, 10.4, 14.7, 32.4,
30.4, 33.9, 21.5, 15.5, 15.2, 13.3, 19.2, 27.3, 26, 30.4, 15.8,
19.7, 15, 21.4), cyl = c(6, 6, 4, 6, 8, 6, 8, 4, 4, 6, 6, 8,
8, 8, 8, 8, 8, 4, 4, 4, 4, 8, 8, 8, 8, 4, 4, 4, 8, 6, 8, 4),
disp = c(160, 160, 108, 258, 360, 225, 360, 146.7, 140.8,
167.6, 167.6, 275.8, 275.8, 275.8, 472, 460, 440, 78.7, 75.7,
71.1, 120.1, 318, 304, 350, 400, 79, 120.3, 95.1, 351, 145,
301, 121), hp = c(110, 110, 93, 110, 175, 105, 245, 62, 95,
123, 123, 180, 180, 180, 205, 215, 230, 66, 52, 65, 97, 150,
150, 245, 175, 66, 91, 113, 264, 175, 335, 109), drat = c(3.9,
3.9, 3.85, 3.08, 3.15, 2.76, 3.21, 3.69, 3.92, 3.92, 3.92,
3.07, 3.07, 3.07, 2.93, 3, 3.23, 4.08, 4.93, 4.22, 3.7, 2.76,
3.15, 3.73, 3.08, 4.08, 4.43, 3.77, 4.22, 3.62, 3.54, 4.11
), wt = c(2.62, 2.875, 2.32, 3.215, 3.44, 3.46, 3.57, 3.19,
3.15, 3.44, 3.44, 4.07, 3.73, 3.78, 5.25, 5.424, 5.345, 2.2,
1.615, 1.835, 2.465, 3.52, 3.435, 3.84, 3.845, 1.935, 2.14,
1.513, 3.17, 2.77, 3.57, 2.78), qsec = c(16.46, 17.02, 18.61,
19.44, 17.02, 20.22, 15.84, 20, 22.9, 18.3, 18.9, 17.4, 17.6,
18, 17.98, 17.82, 17.42, 19.47, 18.52, 19.9, 20.01, 16.87,
17.3, 15.41, 17.05, 18.9, 16.7, 16.9, 14.5, 15.5, 14.6, 18.6
), vs = c(0, 0, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0,
0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1), am = c(1,
1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1,
0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1), gear = c(4, 4, 4, 3,
3, 3, 3, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 4, 4, 4, 3, 3, 3,
3, 3, 4, 5, 5, 5, 5, 5, 4), carb = c(4, 4, 1, 1, 2, 1, 4,
2, 2, 4, 4, 3, 3, 3, 4, 4, 4, 1, 2, 1, 1, 2, 2, 4, 2, 1,
2, 2, 4, 6, 8, 2), VAR1 = c("A", "A", "A", "B", "A", "B",
"B", "B", "A", "A", "B", "B", "B", "A", "B", "A", "B", "A",
"A", "A", "A", "B", "A", "A", "A", "A", "B", "B", "A", "B",
"A", "B"), VAR2 = c("X", "Y", "Y", "X", "X", "X", "X", "Y",
"X", "Y", "Y", "X", "X", "X", "X", "Y", "X", "X", "Y", "X",
"X", "X", "X", "Y", "Y", "X", "Y", "X", "X", "Y", "Y", "X"
)), row.names = c("Mazda RX4", "Mazda RX4 Wag", "Datsun 710",
"Hornet 4 Drive", "Hornet Sportabout", "Valiant", "Duster 360",
"Merc 240D", "Merc 230", "Merc 280", "Merc 280C", "Merc 450SE",
"Merc 450SL", "Merc 450SLC", "Cadillac Fleetwood", "Lincoln Continental",
"Chrysler Imperial", "Fiat 128", "Honda Civic", "Toyota Corolla",
"Toyota Corona", "Dodge Challenger", "AMC Javelin", "Camaro Z28",
"Pontiac Firebird", "Fiat X1-9", "Porsche 914-2", "Lotus Europa",
"Ford Pantera L", "Ferrari Dino", "Maserati Bora", "Volvo 142E"
), class = "data.frame")
轉載請註明出處,本文鏈接:https://www.uj5u.com/net/389574.html
標籤:r
