Here is my data:
mydata <- structure(list(group = c("a", "a", "a", "a", "a", "a", "a", "a",
"a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a",
"a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a",
"a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a",
"a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a",
"a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a",
"a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a",
"a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a", "a",
"a", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b",
"b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b",
"b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b",
"b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b", "b"),
age = c(27, 21, 27, 21, 24, 27, 27, 30, 27, 27, 27, 27, 27,
21, 27, 24, 24, 27, 21, 24, 30, 30, 24, 24, 27, 18, 27, 24,
18, 18, 9, 12, 15, 12, 15, 12, 9, 9, 12, 15, 15, 18, 18,
15, 21, 21, 15, 21, 12, 21, 15, 30, 21, 18, 21, 21, 24, 21,
24, 24, 27, 24, 18, 27, 9, 21, 27, 21, 21, 21, 27, 24, 27,
24, 30, 30, 30, 27, 27, 24, 27, 27, 24, 24, 30, 27, 27, 30,
21, 24, 21, 27, 24, 24, 24, 24, 24, 24, 24, 21, 34, 25, 27,
35, 27, 28, 32, 33, 32, 9, 9, 8, 15, 29, 30, 10, 40, 31,
27, 40, 28, 31, 17, 19, 35, 29, 23, 15, 16, 26, 27, 25, 23,
24, 25, 25, 13, 36, 25, 27, 35, 35, 24, 21, 25, 10, 23, 5,
34, 21)), row.names = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L,
10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L, 21L, 22L,
23L, 24L, 25L, 26L, 27L, 28L, 29L, 30L, 31L, 32L, 33L, 34L, 35L,
36L, 37L, 38L, 39L, 40L, 41L, 42L, 43L, 44L, 45L, 46L, 47L, 48L,
49L, 50L, 51L, 52L, 53L, 54L, 55L, 56L, 57L, 58L, 59L, 60L, 61L,
62L, 63L, 64L, 65L, 66L, 67L, 68L, 69L, 70L, 71L, 72L, 73L, 74L,
75L, 76L, 77L, 78L, 79L, 80L, 81L, 82L, 83L, 84L, 85L, 86L, 87L,
88L, 89L, 90L, 91L, 92L, 93L, 94L, 95L, 96L, 97L, 98L, 99L, 100L,
101L, 102L, 103L, 104L, 105L, 106L, 107L, 108L, 109L, 110L, 111L,
112L, 113L, 114L, 115L, 116L, 117L, 118L, 119L, 120L, 121L, 122L,
123L, 124L, 125L, 126L, 127L, 128L, 129L, 130L, 132L, 133L, 134L,
135L, 136L, 137L, 138L, 139L, 140L, 141L, 142L, 143L, 144L, 145L,
146L, 147L, 148L, 149L, 150L, 151L), class = "data.frame")
This is the code I use to plot the jittered points (ggplot2::geom_point) together with their distribution (ggdist::stat_halfeye):
library(ggplot2)
library(ggdist)
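ggplot(mydata, aes(x = group, y = age)) +
  # density slab only: .width = 0 and point_colour = NA suppress the interval and point
  ggdist::stat_halfeye(
    adjust = .5,
    width = .6,
    .width = 0,
    justification = -.3,
    point_colour = NA
  ) +
  # raw observations, jittered horizontally with a fixed seed
  geom_point(
    size = 1.3,
    alpha = .3,
    position = position_jitter(
      seed = 1, width = .1
    )
  )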
The resulting plot:

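The age values in both groups are integers, yet the points in group b are clearly jittered more. Likewise, the distribution of group b is much smoother than that of group a. Why does this happen, and how can I make the distribution of group a as smooth as that of group b?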
Answer:
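Your data are integers in both groups, but they are not at all evenly distributed. Look at this:
library(dplyr)  # provides %>%, group_by(), summarize(), n_distinct() and count()
mydata %>% group_by(group) %>% summarize(n_distinct(age))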
# # A tibble: 2 × 2
# group `n_distinct(age)`
# <chr> <int>
# 1 a 8
# 2 b 25
mydata %>% count(group, age)
# group age n
# 1 a 9 4
# 2 a 12 5
# 3 a 15 7
# 4 a 18 7
# 5 a 21 19
# 6 a 24 24
# 7 a 27 25
# 8 a 30 9
# 9 b 5 1
# 10 b 8 1
# 11 b 9 2
# 12 b 10 2
# 13 b 13 1
# 14 b 15 2
# 15 b 16 1
# 16 b 17 1
# 17 b 19 1
# 18 b 21 2
# 19 b 23 3
# 20 b 24 2
# 21 b 25 6
# 22 b 26 1
# 23 b 27 5
# 24 b 28 2
# 25 b 29 2
# 26 b 30 1
# 27 b 31 2
# 28 b 32 2
# 29 b 33 1
# 30 b 34 2
# 31 b 35 4
# 32 b 36 1
# 33 b 40 2
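Group a has 8 distinct age values, and most of its observations (68 of 100) sit at 21, 24, or 27, each of which occurs 19 or more times. No other integer between 19 and 29 appears in group a.
Group b has 25 distinct age values, and no value occurs more than 6 times. It is spread more evenly than group a and has far fewer gaps.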
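One way to put a number on those gaps is to look at the spacing between neighbouring distinct values in each group. The sketch below uses only base R; the two vectors are typed in by hand from the count() output above, and the names a_ages, b_ages, gap_a and gap_b are made up for illustration.

```r
# Distinct age values per group, copied from the count() output above
a_ages <- seq(9, 30, by = 3)                                   # 9, 12, ..., 30
b_ages <- c(5, 8, 9, 10, 13, 15, 16, 17, 19, 21, 23:36, 40)

# Distance from each distinct value to the next larger one
gap_a <- diff(a_ages)
gap_b <- diff(b_ages)

length(a_ages)   # number of distinct values in group a
length(b_ages)   # number of distinct values in group b
range(gap_a)     # group a only ever moves in steps of 3
table(gap_b)     # group b mostly moves in steps of 1 or 2
```

Group a only takes values on a grid with step 3, while group b has many neighbouring values one year apart, which is why its points and its density look more continuous.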
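"The points in group b are clearly jittered more."
That is not clear to me at all. The ?position_jitter help page describes the height and width arguments as follows: "If omitted, defaults to 40% of the resolution of the data: this means the jitter values will occupy 80% of the implied bins". I am not sure whether this "resolution of the data" is computed per group. I suspect it is not; my guess is that the same jitter is applied to both groups. (In fact, I would say it is clear from your plot that the vertical jitter in group a does not fill 80% of the bin width, so the points in both groups are jittered by the same amount.) Either way, I would suggest specifying both the height and the width of the jitter explicitly and keeping them small, say 0.1 or less.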
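As a minimal sketch of that suggestion, the snippet below sets both width and height in position_jitter() instead of relying on the default height. The data frame toy is a hypothetical stand-in with made-up values; swap in mydata to apply it to the real data.

```r
library(ggplot2)

# Hypothetical stand-in for mydata (values invented for illustration)
toy <- data.frame(
  group = rep(c("a", "b"), each = 6),
  age   = c(21, 21, 24, 24, 27, 27, 19, 22, 25, 26, 29, 33)
)

p <- ggplot(toy, aes(x = group, y = age)) +
  geom_point(
    size = 1.3,
    alpha = .3,
    # width and height are both given, so neither falls back to the
    # default of 40% of the resolution of the data
    position = position_jitter(seed = 1, width = .1, height = .1)
  )

p  # draw the plot
```

With height = .1 every point moves by at most 0.1 years vertically in either group, so any remaining difference between the groups comes from the data rather than from the jitter.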
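"Likewise, the distribution of group b is much smoother than that of group a."
That is right: the data in group b are more smoothly distributed than the data in group a. The plot is an accurate reflection of your data.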
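If a smoother curve for group a is still wanted, the main lever is the kernel bandwidth. The call in the question passes adjust = .5, and in base R's density() the adjust argument multiplies the bandwidth; the sketch below assumes ggdist's adjust behaves the same way, which is worth confirming in its documentation. The vector a_age is rebuilt from the count() output above, and d_narrow and d_wide are illustrative names.

```r
# Group a ages rebuilt from the count() output above (100 observations)
a_age <- rep(seq(9, 30, by = 3), times = c(4, 5, 7, 7, 19, 24, 25, 9))

# Same data, two bandwidth multipliers
d_narrow <- density(a_age, adjust = 0.5)  # half the default bandwidth
d_wide   <- density(a_age, adjust = 1.5)  # one and a half times the default

d_narrow$bw  # bandwidth actually used with adjust = 0.5
d_wide$bw    # three times larger

# Overlay both estimates to compare their smoothness
plot(d_narrow, main = "Group a: effect of adjust")
lines(d_wide, lty = 2)
```

A larger adjust gives a smoother curve, but it does so by blurring the spikes at 21, 24 and 27 that are really in the data, so the smoother picture is also a less faithful one.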