I have a data frame like this:
seqnames pos strand nucleotide count
id1 12 A 13
id1 13 C 25
id2 24 G 10
id2 25 T 25
id2 26 A 10
id3 10 C 5
But it has more than 100,000 rows in total, and seqnames has 3138 levels. I want to split it into a list of data frames by seqnames, so I used the split function:
data_list <- split(data,data$seqnames)
But it only returns the following:
List of 3138
$ id1:'data.frame': 0 obs. of 6 variables:
..$ seqnames : Factor w/ 3138 levels "id1","id2",..:
..$ pos : int(0)
..$ strand : Factor w/ 3 levels " ","-","*":
..$ nucleotide: Factor w/ 8 levels "A","C","G","T",..:
..$ count : int(0)
..$ sample_id : chr(0)
$ id2:'data.frame': 0 obs. of 6 variables:
..$ seqnames : Factor w/ 3138 levels "id1","id2",..:
..$ pos : int(0)
..$ strand : Factor w/ 3 levels " ","-","*":
..$ nucleotide: Factor w/ 8 levels "A","C","G","T",..:
..$ count : int(0)
..$ sample_id : chr(0)
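I don't know why this happens, because I have already used split on a made-up data frame containing only numbers (with far fewer rows than this one, of course) and it worked. How can I solve this?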
Answer from a uj5u.com user:
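There are simply many unused levels, because the column 'seqnames' is a factor. split has a drop option (drop = TRUE; the default is FALSE) that removes the list elements for those levels. Otherwise, they are returned as data.frames with 0 rows. If we would rather replace those elements with NULL, find the elements whose number of rows (nrow) is 0 and assign NULL to them.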
data_list <- split(data,data$seqnames)
> str(data_list)
List of 5
$ id1:'data.frame': 2 obs. of 5 variables:
..$ seqnames : Factor w/ 5 levels "id1","id2","id3",..: 1 1
..$ pos : int [1:2] 12 13
..$ strand : chr [1:2] " " " "
..$ nucleotide: chr [1:2] "A" "C"
..$ count : int [1:2] 13 25
$ id2:'data.frame': 3 obs. of 5 variables:
..$ seqnames : Factor w/ 5 levels "id1","id2","id3",..: 2 2 2
..$ pos : int [1:3] 24 25 26
..$ strand : chr [1:3] " " " " " "
..$ nucleotide: chr [1:3] "G" "T" "A"
..$ count : int [1:3] 10 25 10
$ id3:'data.frame': 1 obs. of 5 variables:
..$ seqnames : Factor w/ 5 levels "id1","id2","id3",..: 3
..$ pos : int 10
..$ strand : chr " "
..$ nucleotide: chr "C"
..$ count : int 5
$ id4:'data.frame': 0 obs. of 5 variables:
..$ seqnames : Factor w/ 5 levels "id1","id2","id3",..:
..$ pos : int(0)
..$ strand : chr(0)
..$ nucleotide: chr(0)
..$ count : int(0)
$ id5:'data.frame': 0 obs. of 5 variables:
..$ seqnames : Factor w/ 5 levels "id1","id2","id3",..:
..$ pos : int(0)
..$ strand : chr(0)
..$ nucleotide: chr(0)
..$ count : int(0)
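The empty id4 and id5 elements above come from the unused levels. As a minimal sketch of the drop option mentioned earlier, the block below rebuilds a cut-down version of the sample data (only three columns, which is a simplification for illustration) and splits it with drop = TRUE; calling droplevels on the factor before splitting should give the same grouping.

```r
# Cut-down stand-in for the sample data: 3 levels in use, 2 unused (id4, id5)
data <- data.frame(
  seqnames = factor(c("id1", "id1", "id2", "id2", "id2", "id3"),
                    levels = c("id1", "id2", "id3", "id4", "id5")),
  pos = c(12L, 13L, 24L, 25L, 26L, 10L),
  count = c(13L, 25L, 10L, 25L, 10L, 5L)
)

# drop = TRUE skips levels that do not occur in the data
dropped <- split(data, data$seqnames, drop = TRUE)
names(dropped)

# Alternative: remove the unused levels first, then split as usual
relevelled <- split(data, droplevels(data$seqnames))
names(relevelled)
```

Note that drop = TRUE only affects which list elements are created; the seqnames column inside each piece still carries all of the original levels unless droplevels is applied to it as well.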
Assign NULL to the empty elements:
data_list[sapply(data_list, nrow) == 0] <- list(NULL)
-check again
> str(data_list)
List of 5
$ id1:'data.frame': 2 obs. of 5 variables:
..$ seqnames : Factor w/ 5 levels "id1","id2","id3",..: 1 1
..$ pos : int [1:2] 12 13
..$ strand : chr [1:2] " " " "
..$ nucleotide: chr [1:2] "A" "C"
..$ count : int [1:2] 13 25
$ id2:'data.frame': 3 obs. of 5 variables:
..$ seqnames : Factor w/ 5 levels "id1","id2","id3",..: 2 2 2
..$ pos : int [1:3] 24 25 26
..$ strand : chr [1:3] " " " " " "
..$ nucleotide: chr [1:3] "G" "T" "A"
..$ count : int [1:3] 10 25 10
$ id3:'data.frame': 1 obs. of 5 variables:
..$ seqnames : Factor w/ 5 levels "id1","id2","id3",..: 3
..$ pos : int 10
..$ strand : chr " "
..$ nucleotide: chr "C"
..$ count : int 5
$ id4: NULL
$ id5: NULL
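If the NULL placeholders are not needed afterwards, they can be removed from the list entirely. The sketch below uses a small hypothetical list named parts (not the real data_list) that mimics the structure shown above, and filters out the NULL entries with base R's Filter and Negate.

```r
# Hypothetical list mimicking the result above: two data frames and one NULL
parts <- list(
  id1 = data.frame(pos = c(12L, 13L)),
  id2 = data.frame(pos = 24L),
  id4 = NULL
)

# Keep only the elements that are not NULL; names are preserved
kept <- Filter(Negate(is.null), parts)
names(kept)
```

The same pattern should apply to data_list after the NULL assignment, since it only depends on the elements being NULL.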
data
data <- structure(list(seqnames = structure(c(1L, 1L, 2L, 2L, 2L,
3L), .Label = c("id1",
"id2", "id3", "id4", "id5"), class = "factor"), pos = c(12L,
13L, 24L, 25L, 26L, 10L), strand = c(" ", " ", " ", " ", " ",
" "), nucleotide = c("A", "C", "G", "T", "A", "C"), count = c(13L,
25L, 10L, 25L, 10L, 5L)), row.names = c(NA, -6L), class = "data.frame")
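To confirm that unused levels are the cause before splitting, one option is to compare the number of declared levels with the number of values actually present. This is a minimal sketch on a standalone factor built to match the seqnames column of the sample data; the variable names are illustrative only.

```r
# Factor matching the sample seqnames column: 5 declared levels, 3 in use
seqnames <- factor(c("id1", "id1", "id2", "id2", "id2", "id3"),
                   levels = c("id1", "id2", "id3", "id4", "id5"))

n_declared <- nlevels(seqnames)          # levels the factor knows about
n_present  <- length(unique(seqnames))   # levels that actually occur

# Levels with no rows; each one becomes a 0-row element when drop = FALSE
unused <- setdiff(levels(seqnames), as.character(unique(seqnames)))

# table() also shows a count of 0 for every unused level
counts <- table(seqnames)
```

If n_declared is larger than n_present, split with the default drop = FALSE will return one empty data frame for each entry in unused.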
