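How to change the shape of outliers in geom_boxplot to match the sample ID

Imagine I have data like this (it is only dummy data, and the code may not be pretty, but it is what I came up with):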
# load the packages used below
library(dplyr)
library(ggplot2)

# create dummy data
df <- data.frame()
set.seed(42)
os <- 0   # row offset, also added to the values of the current block
sam <- 1  # running sample number
for (time in as.factor(c('T0', 'T1'))) {
  if (time == 'T1') {
    sam <- 1  # restart numbering so a sample keeps its ID at T1
  }
  for (group in as.factor(c('A','B'))) {
    for (pat in 1:10) {
      df[pat + os, 'Sample'] <- paste('P', pat, '_', sam, sep = '')
      df[pat + os, 'Time'] <- time
      df[pat + os, 'Group'] <- group
      df[pat + os, 'Value'] <- rnorm(1) + os
      # add outlier, they are the same in each group in this example,
      # but can differ in the real data set
      if (pat == 2 | pat == 9) {
        print(pat)
        df[pat + os, 'Value'] <- df[pat + os, 'Value'] + 10
      }
      sam <- sam + 1
    }
    os <- os + 10
  }
}
# mark outliers in table (beyond 1.5 * IQR from the quartiles)
df = df %>%
  group_by(Group, Time) %>%
  mutate(is_outlier = case_when(Value > quantile(Value)[4] + 1.5*IQR(Value) ~ TRUE,
                                Value < quantile(Value)[2] - 1.5*IQR(Value) ~ TRUE,
                                TRUE ~ FALSE))
This leads to the following plot:
ggplot(df, aes(x = Time,
               y = Value,
               label = Time)) +
  geom_boxplot(outlier.colour = 'red',
               outlier.shape = 1,
               outlier.size = 2
  ) +
  facet_grid(~factor(Group),
             switch = 'x',
             scales = 'free_y')

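Goal:
What I want is to be able to see, for each group A or B, whether the outliers are the same samples. For example, whether the outliers shown in A T0 are the same as the ones in A T1. More specifically, the outlier drawn as a circle in A T0 should also be a circle in A T1, while the second outlier in A T1 should get a different shape (e.g. a triangle). My original data has around 5 or 6 time points, so it would be nice to tell just by looking at the plot whether an outlier remains an outlier. In some cases my original data set has around 5-8 outliers.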
In group B we can reuse the same shapes as in group A, although the sample IDs there differ from those in group A.
I want to use basic shapes like triangles, circles, asterisks and so on (I know the number of shapes is limited, but it should suffice for my kind of data set). I also know that I could label the data points, but I don't want that. Different colours would be okay too, but I'd prefer different shapes.
I guess I have to calculate the outliers separately and then maybe use geom_point with aes(shape = df$Sample) or something similar. But I can't figure it out.
Does anybody have a hint or a solution based on my dummy data? That would be awesome :-)
Best TMC
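Answer from a uj5u.com user:
I came up with a very ugly solution. I'm pretty sure there is a prettier way to do this, but here is the complete code.
First we create the dummy data: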
# start with a clean environment
rm(list=ls())
# create a function to load or install all necessary libraries
install.load.package <- function(x) {
  if (!require(x, character.only = TRUE))
    install.packages(x)
  require(x, character.only = TRUE)
}
package_vec <- c("ggplot2",
                 "dplyr"
)
sapply(package_vec, install.load.package)
# now to the data
df <- data.frame()
set.seed(42)
os <- 0   # row offset, also added to the values of the current block
sam <- 1  # running sample number
for (time in as.factor(c('T0', 'T1'))) {
  if (time == 'T1') {
    sam <- 1  # restart numbering so a sample keeps its ID at T1
  }
  for (group in as.factor(c('A','B'))) {
    for (pat in 1:10) {
      df[pat + os, 'Sample'] <- paste('P', pat, '_', sam, sep = '')
      df[pat + os, 'Time'] <- time
      df[pat + os, 'Group'] <- group
      df[pat + os, 'Value'] <- rnorm(1) + os
      # add outlier, they are the same in each group in this example,
      # but can differ in the real data set
      if (pat == 2 | pat == 9) {
        print(pat)
        df[pat + os, 'Value'] <- df[pat + os, 'Value'] + 10
      }
      sam <- sam + 1
    }
    os <- os + 10
  }
}
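Then we calculate the outliers as follows and create a new column that holds the sample ID of each outlier. If a row is not an outlier, 'X' is inserted instead.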
# calculate outliers (beyond 1.5 * IQR from the quartiles)
df = df %>%
  group_by(Group, Time) %>%
  mutate(is_outlier = case_when(Value > quantile(Value)[4] + 1.5*IQR(Value) ~ as.character(Sample),
                                Value < quantile(Value)[2] - 1.5*IQR(Value) ~ as.character(Sample),
                                TRUE ~ as.character('X')))
df$Group <- as.factor(df$Group)
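The rule used in the case_when() call above can also be written as a small helper. The following is only a sketch: the function name is_tukey_outlier and its argument k are made up for illustration and are not part of the original answer.

```r
# Hypothetical helper (not from the original answer): flags values that lie
# below Q1 - k * IQR or above Q3 + k * IQR, with k = 1.5 by default.
is_tukey_outlier <- function(x, k = 1.5) {
  q <- quantile(x, probs = c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  fence <- k * (q[2] - q[1])
  x < q[1] - fence | x > q[2] + fence
}

values <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 30)
flags <- is_tukey_outlier(values)
values[flags]  # the values flagged as outliers
```

Inside a grouped mutate() the same helper could be used as ifelse(is_tukey_outlier(Value), as.character(Sample), 'X'), which should give the same column as the case_when() version.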
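Now we replace the sample ID with a number. Within each group, the first outlier sample gets the number 1, the second gets the number 2, and so on. If there are more outliers than shapes available to geom_point, the code has to be adapted. But let's assume we never have that many (R's numbered point shapes run from 0 to 25).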
for (group in levels(df$Group)) {
  count <- 1
  in_group <- df$Group == group
  for (id in levels(as.factor(df$is_outlier[in_group]))) {
    # only touch rows of the current group
    rows <- which(in_group & df$is_outlier == id)
    if (id == 'X') {
      df[rows, 'is_outlier'] <- as.character(NA)
    } else {
      df[rows, 'is_outlier'] <- as.character(count)
      count <- count + 1
    }
  }
}
This overwrites the previously created column. It introduces NAs for the 'X' values.
Now we can plot the data:
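The numbering step could probably also be done without explicit loops. The sketch below is an assumption-laden alternative, not the answer's method: it uses a small toy data frame, keeps a logical is_outlier flag (instead of the ID-or-'X' column), and writes the numbers into a new, hypothetical column shape_id.

```r
library(dplyr)

# Toy stand-in for the answer's data (values are made up for illustration).
toy <- data.frame(
  Group      = c("A", "A", "A", "A", "B", "B", "B"),
  Time       = c("T0", "T0", "T1", "T1", "T0", "T1", "T1"),
  Sample     = c("P2", "P9", "P2", "P5", "P12", "P12", "P19"),
  is_outlier = c(TRUE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE)
)

toy <- toy %>%
  group_by(Group) %>%
  mutate(
    # position of the sample among the distinct outlier samples of its group
    shape_id = match(Sample, unique(Sample[is_outlier])),
    # rows that are not outliers themselves get NA
    shape_id = ifelse(is_outlier, shape_id, NA_integer_)
  ) %>%
  ungroup()

toy$shape_id
```

Because the numbering is done per Group, a sample keeps its number at every time point, and group B starts again at 1, as in the loop version.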
ggplot(df, aes(x = Time,
               y = Value,
               label = Time)) +
  # hide the default outlier points, they are drawn by geom_point below
  geom_boxplot(outlier.shape = NA) +
  geom_point(data = df,
             shape = as.numeric(df$is_outlier),
             color = 'red') +
  facet_grid(~factor(Group),
             switch = 'x',
             scales = 'free_y')
This results in the following plot:

Now we can see whether an outlier stays an outlier from T0 to T1. Be aware that Group B reuses the same shapes, although these are totally different samples. To give every sample its own shape across groups, one has to adapt the code above the plotting code, but that leaves potentially fewer shapes available per group.
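A possibly more idiomatic way to draw the points is to map the shape inside aes() and pick the shapes with scale_shape_manual(), so that ggplot2 matches shapes to rows itself and also draws a legend. This is a minimal sketch on made-up toy data; the column name shape_id and the hand-set outlier flags are assumptions, not the answer's actual data.

```r
library(ggplot2)

# Toy data: shape_id numbers the outlier samples within each group and is NA
# for ordinary points (flags set by hand here, not computed).
toy <- data.frame(
  Group    = rep(c("A", "B"), each = 8),
  Time     = rep(rep(c("T0", "T1"), each = 4), times = 2),
  Value    = c(1, 2, 12, 14,  1.5, 2.5, 13, 3,
               11, 12, 25, 27,  10, 13, 24, 12),
  shape_id = c(NA, NA, 1, 2,  NA, NA, 1, NA,
               NA, NA, 1, 2,  NA, NA, 1, NA)
)
outliers <- subset(toy, !is.na(shape_id))

p <- ggplot(toy, aes(x = Time, y = Value)) +
  geom_boxplot(outlier.shape = NA) +          # hide the default outlier points
  geom_point(data = outliers,
             aes(shape = factor(shape_id)),   # shape is mapped row by row
             colour = "red", size = 2) +
  scale_shape_manual(values = c("1" = 1, "2" = 2), name = "Outlier") +
  facet_grid(~ Group, switch = "x")
p
```

If the outliers were numbered over the whole data set instead of per group, the values vector would simply need one entry per number, so that every sample gets its own shape across groups.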
If one of you has a smoother and more elegant solution, I'd be happy to learn.
Best TMC
