I have a data frame with 3 columns. The first holds string values, some of which contain commas; the other two hold an ID and area information.
df <- data.frame(column1 = c("ab 34, 35, 36", "cb 23", "df 45, 46", "gh 21"),
column2 = c("ID_27", "ID_28", "ID_29", "ID_30"),
column3 = c("area51", "area52", "area53", "area54"))
head(df)
column1 column2 column3
1 ab 34, 35, 36 ID_27 area51
2 cb 23 ID_28 area52
3 df 45, 46 ID_29 area53
4 gh 21 ID_30 area54
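I want to modify the values in the first column so that the comma-separated lists disappear: the two-letter prefix should be applied to each number, and each comma-separated value should go on its own row. At the same time, the values in the other columns should be duplicated, as in the example below: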
new_df <- tibble::tibble(column1 = c("ab34", "ab35", "ab36", "cb23", "df45", "df46", "gh21"),
column2 = c("ID_27", "ID_27", "ID_27", "ID_28", "ID_29", "ID_29", "ID_30"),
column3 = c("area51", "area51", "area51", "area52", "area53", "area53", "area54"))
head(new_df)
# A tibble: 6 x 3
column1 column2 column3
<chr> <chr> <chr>
1 ab34 ID_27 area51
2 ab35 ID_27 area51
3 ab36 ID_27 area51
4 cb23 ID_28 area52
5 df45 ID_29 area53
6 df46 ID_29 area53
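Does anyone know what R code can do this, either with the tidyverse or the old-school (base R) way? By the way, there's no need to go from a data frame to a tibble; that's only for the example. My goal is to transform the data frame.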
Answer:
One possible solution:
library(tidyverse)
df <- data.frame(column1 = c("ab 34, 35, 36", "cb 23", "df 45, 46", "gh 21"),
column2 = c("ID_27", "ID_28", "ID_29", "ID_30"),
column3 = c("area51", "area52", "area53", "area54"))
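df %>%
# replace each ", " with the row's prefix, then drop the first space:
# "ab 34, 35, 36" -> "ab 34ab35ab36" -> "ab34ab35ab36"
mutate(column1 = str_replace_all(column1, ", ", str_extract(column1, "^\\S+")) %>%
str_remove(., " ")) %>%
# split wherever a digit is followed by a non-digit
separate_rows(column1, sep = "(?<=\\d)(?=\\D)")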
#> # A tibble: 7 × 3
#> column1 column2 column3
#> <chr> <chr> <chr>
#> 1 ab34 ID_27 area51
#> 2 ab35 ID_27 area51
#> 3 ab36 ID_27 area51
#> 4 cb23 ID_28 area52
#> 5 df45 ID_29 area53
#> 6 df46 ID_29 area53
#> 7 gh21 ID_30 area54
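To make the trick in this answer easier to follow, here is a small sketch that runs the same `stringr` calls outside the pipeline on the example values and prints the intermediate strings that `separate_rows()` receives. Note that it relies on the separator being exactly ", ": with an input like "ab 34,35" the replacement never fires and the split yields "ab34" and ",35".

```r
library(stringr)

x <- c("ab 34, 35, 36", "cb 23", "df 45, 46", "gh 21")

# Step 1: replace every ", " with that row's prefix ("ab", "cb", ...)
step1 <- str_replace_all(x, ", ", str_extract(x, "^\\S+"))
# Step 2: drop the first remaining space
step2 <- str_remove(step1, " ")
print(step2)
# "ab34ab35ab36" "cb23" "df45df46" "gh21"

# Step 3: split at each digit -> non-digit boundary (zero-width lookarounds),
# which is what separate_rows(sep = ...) does row by row
print(str_split(step2, "(?<=\\d)(?=\\D)"))
```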
Answer:
Here is a (hopefully) easier-to-understand, step-by-step approach:
library(dplyr)
library(stringr)
library(tidyr)
df %>%
mutate(
# extract the prefix:
prefix = str_extract(column1, "^\\w+"),
# extract the digits in a list:
digits = str_extract_all(column1, "\\d+")) %>%
# cast the list values in `digits` into long format:
unnest_longer(digits) %>%
# paste `prefix`, `digits` together:
mutate(column1 = paste0(prefix, digits)) %>%
# remove obsolete columns:
select(-c(prefix, digits))
# A tibble: 7 × 3
column1 column2 column3
<chr> <chr> <chr>
1 ab34 ID_27 area51
2 ab35 ID_27 area51
3 ab36 ID_27 area51
4 cb23 ID_28 area52
5 df45 ID_29 area53
6 df46 ID_29 area53
7 gh21 ID_30 area54
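The question also asks about the old-school base R route. Here is a minimal sketch of the same extract-prefix, extract-digits, repeat-rows idea without tidyverse packages, assuming the prefix is everything before the first space and every run of digits in `column1` should receive that prefix. It is not taken from either answer above.

```r
# Example input from the question (stringsAsFactors = FALSE for R < 4.0)
df <- data.frame(column1 = c("ab 34, 35, 36", "cb 23", "df 45, 46", "gh 21"),
                 column2 = c("ID_27", "ID_28", "ID_29", "ID_30"),
                 column3 = c("area51", "area52", "area53", "area54"),
                 stringsAsFactors = FALSE)

# Prefix: everything before the first space ("ab", "cb", ...)
prefix <- sub(" .*$", "", df$column1)
# All digit runs per row, as a list of character vectors
nums <- regmatches(df$column1, gregexpr("[0-9]+", df$column1))
# Number of output rows each input row expands to
n <- lengths(nums)

# Repeat each row n times, then overwrite column1 with prefix + number
new_df <- df[rep(seq_len(nrow(df)), n), ]
new_df$column1 <- paste0(rep(prefix, n), unlist(nums))
rownames(new_df) <- NULL
new_df
```

The result stays a plain `data.frame`, which matches the asker's goal of transforming the data frame rather than switching to a tibble.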
