Plotting DNA nucleotide data in R using loops and PDF graphics

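My boss asked me to plot a DNA nucleotide matrix using the pdf graphics functions in R. I have some code to start from, but I can't figure it out and have spent far too much time trying! I know there may be other methods or packages for visualizing this kind of genetic data, and I'm definitely interested in hearing about them, but I also need to do this the way it was assigned to me.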

I have sequence data in R that looks like this:

> head(b)

Sequence                            X236 X237 X238 X239 X240 X241 X242 X244 X246 X247 X248 X249 X250 X251 X252 X253 X254 X255 X256 X257 X258 X259
1    L19088.1                         G    G    G    G    G    A    G    A    C    C    A    A    G    A    T    G    G    C    C    G    A    A   
2    chr1_43580199_43586187           ·    ·    ·    ·    ·    ·    ·    ·    ·    ·    ·    ·    ·    ·    ·    ·    ·    ·    ·    ·    g    g

There are 1040 rows and 483 columns in total, and each character can be A, a, G, g, T, t, C, c, a middle dot, or X.

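I want to color the different characters and plot them in a heatmap-like way. Dots and X's do not need to be colored. The code I am working with so far is: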

pdf( 
  sprintf( 
    "%s/L1.pdf", 
    out_dir),
  width = 8.5, height = 11 )
par(omi = rep(0.5,4))
par(mai = rep(0.5,4))
par(bg  = "#eeeeee")
plot( NULL, 
      xlim = c(1,100), ylim = c(1,140), 
      xlab = NA,       ylab = NA, 
      xaxt = "n",      yaxt = "n",
      bty = "n",       asp = 1 )

plot_width <- 100
w <- plot_width / 600


genome_colors <- list()
genome_colors[["A"]] <- "#ea0064"
genome_colors[["a"]] <- "#ea0064"
genome_colors[["C"]] <- "#008a3f"
genome_colors[["c"]] <- "#008a3f"
genome_colors[["G"]] <- "#116eff"
genome_colors[["g"]] <- "#116eff"
genome_colors[["T"]] <- "#cf00dc"
genome_colors[["t"]] <- "#cf00dc"

I <- nrow(b)
J <- ncol(b)
for ( i in 1:I ){
 for ( j in i:J ){
 # plot nucleotide as rectangle with color and text label, something like:

 # plot nucleotides with genome_colors
 # rect( (j-1)*w, top-(i-1)*w, j*w, top-i*w, col = color, border = NA )
 }
 # text( (j+1)*w, top-(i-1)*w, labels = i, cex = 0.05, col = "#dddddd" )
}
dev.off()

If anyone can help me with the plotting loop or point me in a useful direction, I would be very grateful!

A uj5u.com user replied:

Assuming df is your wide-format data frame (one column per position, one row per sequence), for example:

df <- structure(list(sequence = c("L19088.1", "chr1_43580199_43586187"
), X236 = c("G", "."), X237 = c("G", "."), X238 = c("A", "a"), 
    X239 = c("T", "C"), X240 = c("A", "c"), X241 = c("G", "G"
    )), class = "data.frame", row.names = 1:2)

## > df
##                 sequence X236 X237 X238 X239 X240 X241
## 1               L19088.1    G    G    A    T    A    G
## 2 chr1_43580199_43586187    .    .    a    C    c    G

...you can use the packages ggplot2 and tidyr from the tidyverse like this:

library(tidyr)
library(ggplot2)

df %>%
  ## reshape to long table
  ## (one column each for sequence, position and nucleotide):
  pivot_longer(-sequence, ## stack all columns *except* sequence
               names_to = 'position',
               values_to = 'nucleotide'
               ) %>%
  ## create the plot:
  ggplot() +
  geom_tile(aes(x = position, y = sequence, fill = nucleotide),
              height = .9 ## adjust to visually separate sequences
            ) +
  scale_fill_manual(values = c('A'='#ea0064', 'a'='#ea0064', 'C'='#008a3f',
                              'c'='#008a3f', 'G'='#116eff', 'g'='#116eff',
                              'T'='#cf00dc', 't'='#cf00dc', '.'='#a0a0a0'
                              )
                    ) +
  labs(x = 'x-axis-title', y='y-axis-title') +
  ## remove x-axis (=position) elements: they'll probably be too dense:
  theme(axis.title.x = element_blank(),
        axis.text.x = element_blank(),
        axis.ticks.x = element_blank()
        )

^^^ For styling the plot, see e.g. the ggplot themes.
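One thing to watch with the plot above: ggplot2 orders a character column alphabetically on a discrete axis, so position labels such as X99 and X100 would not appear in column order. A minimal sketch of one way around this, assuming the original column order is the order you want, is to turn the labels into a factor with explicit levels before plotting. The `positions` vector here is made up for illustration and is not from the question's data.

```r
# Hypothetical position labels, in the order they appear as columns
positions <- c("X98", "X99", "X100", "X101")

# Alphabetical sorting (what a plain character axis would use) breaks the order
alphabetical <- sort(positions)

# A factor with explicit levels keeps the column order on the axis
position_factor <- factor(positions, levels = positions)

print(alphabetical)
print(levels(position_factor))
```

In the pipeline above this would amount to setting the position column of the long table to factor(position, levels = names(df)[-1]) before the call to ggplot(). The same idea applies to sequence on the y-axis if the row order matters.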

Save the plot with the convenience wrapper ggsave:

ggsave(filename = 'my_plot.pdf',
       width = 12, ## inches; to fill DIN A4 landscape
       height = 8
       )
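Called without a plot argument, ggsave() saves the last plot that was displayed. A small sketch that passes the plot object explicitly, so the result does not depend on that implicit state. The scatter plot of the built-in cars data and the temporary output path are placeholders, not part of the answer.

```r
library(ggplot2)

# Placeholder plot; in practice this would be the nucleotide tile plot
p <- ggplot(cars, aes(x = speed, y = dist)) + geom_point()

# Hypothetical output location in the session's temporary directory
out_file <- file.path(tempdir(), "my_plot.pdf")

# Pass the plot explicitly instead of relying on the last displayed plot
ggsave(filename = out_file, plot = p, width = 12, height = 8)
```

Keeping the plot in a variable also makes it easy to save the same figure at several sizes.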

When using the pdf() function, don't forget to explicitly print your plot:

pdf(file = 'my_plot.pdf',
    ## ... other parameters
)
print( ## you need to print the plot
  ggplot(cars, aes(x = speed, y = dist)) + geom_point()
)
dev.off()
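For the base-graphics route the question asks about, the loop could look roughly like the sketch below. It is only one possible reading of the asker's setup: the toy data frame b, the value of top, the axis limits starting at 0, and the temporary output path are all assumptions. Note that the inner loop runs over columns 2:ncol(b) (skipping the name column) rather than i:J as in the question, and that the colour lookup is made case-insensitive with toupper().

```r
# Toy stand-in for the question's data frame `b` (first column = sequence name)
b <- data.frame(
  Sequence = c("L19088.1", "chr1_43580199_43586187"),
  X236 = c("G", "."), X237 = c("G", "."), X238 = c("A", "a"),
  X239 = c("T", "C"), X240 = c("A", "c"), X241 = c("G", "G"),
  stringsAsFactors = FALSE
)

# One colour per base; lower-case letters are mapped via toupper() below
genome_colors <- c(A = "#ea0064", C = "#008a3f", G = "#116eff", T = "#cf00dc")

out_file <- file.path(tempdir(), "L1.pdf")  # hypothetical output path
pdf(out_file, width = 8.5, height = 11)
par(omi = rep(0.5, 4), mai = rep(0.5, 4), bg = "#eeeeee")
plot(NULL, xlim = c(0, 100), ylim = c(0, 140),
     xlab = NA, ylab = NA, xaxt = "n", yaxt = "n", bty = "n", asp = 1)

w   <- 100 / 600   # cell size in plot units, as in the question
top <- 140         # assumed y coordinate of the top edge of the first row

n_drawn <- 0       # counts coloured cells, only for checking the sketch
for (i in seq_len(nrow(b))) {
  for (j in 2:ncol(b)) {                  # column 1 holds the sequence name
    color <- genome_colors[toupper(b[i, j])]
    if (is.na(color)) next                # dots, X etc. stay uncoloured
    rect((j - 2) * w, top - (i - 1) * w, (j - 1) * w, top - i * w,
         col = color, border = NA)
    n_drawn <- n_drawn + 1
  }
}
dev.off()
```

With the cell size from the question, 1040 rows would need about 173 plot units of height, which is more than the 140 available here, so for the full data a smaller w or several pages would probably be needed. rect() also accepts vectors of coordinates and colours, so the inner loop could be replaced by a single call per row if speed becomes an issue.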
