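I have several large data frames, each containing a column (call it timeperiod) whose values are text strings. Every value ends with a specific string (such as V.1to2 or V.2to3), but the prefixes differ. I would like to replace all values that share the same ending with a single new value. Here is an example.
Given a data frame like this: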
df <- data.frame (Location = c("a","b","c","d","e","f","g","h"),
timeperiod = c("A.V.1to2", "D.V.1to2", "A.V.1to2","D.V.2to3","A.V.3to4","H.V.3to4","A.V.4to5","D.V.4to5"))
which looks like this:
Location timeperiod
1 a A.V.1to2
2 b D.V.1to2
3 c A.V.1to2
4 d D.V.2to3
5 e A.V.3to4
6 f H.V.3to4
7 g A.V.4to5
8 h D.V.4to5
my expected/desired output looks like this:
df2
Location timeperiod
1 a 1
2 b 1
3 c 1
4 d 2
5 e 3
6 f 3
7 g 4
8 h 4
df2 <- data.frame (Location = c("a","b","c","d","e","f","g","h"),
timeperiod = c(1, 1, 1, 2, 3, 3, 4, 4))
I know I can do:
df$timeperiod[df$timeperiod =="A.V.1to2"] <- "1"
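But given the size of my datasets, and because I need to repeat this for several data frames whose timeperiod prefixes are inconsistent, I was hoping to use something like this in dplyr: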
library(dplyr)
df$timeperiod <- revalue(df$timeperiod, c(ends_with(V.1to2)="1"))
df$timeperiod <- revalue(df$timeperiod, c(ends_with(V.2to3)="2"))
#etc..
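so that I can repeat the procedure across many different values and many different worksheets. This does not work, though, and it seems inefficient anyway, so any solution that is faster than renaming each specific value would be enough.
Thanks for your help.
Reply from a uj5u.com user:
We can use str_extract, which returns the first run of digits in each string: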
library(dplyr)
library(stringr)
df %>%
mutate(timeperiod = str_extract(timeperiod, '\\d+'))
Location timeperiod
1 a 1
2 b 1
3 c 1
4 d 2
5 e 3
6 f 3
7 g 4
8 h 4
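Since the question asks for something that can be repeated over several data frames, the str_extract call above could be wrapped in a small helper and applied with lapply. This is only a sketch: the helper name recode_timeperiod and the list dfs are hypothetical, and the two small data frames merely stand in for the real ones. It also converts the result to integer, whereas the answer above leaves it as a character string.

```r
library(dplyr)
library(stringr)

# Hypothetical helper: keep only the first run of digits in timeperiod
# and store it as an integer rather than a string.
recode_timeperiod <- function(data) {
  data %>%
    mutate(timeperiod = as.integer(str_extract(timeperiod, "\\d+")))
}

# Two small example data frames standing in for the real ones (assumed).
dfs <- list(
  first  = data.frame(Location = c("a", "b"),
                      timeperiod = c("A.V.1to2", "D.V.2to3")),
  second = data.frame(Location = c("c", "d"),
                      timeperiod = c("H.V.3to4", "A.V.4to5"))
)

# Apply the same recoding to every data frame in the list.
recoded <- lapply(dfs, recode_timeperiod)
```

Keeping the data frames in a named list means the recoding is written once and the results stay addressable by name, for example recoded$first.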
Reply from a uj5u.com user:
We can use dplyr and stringr. First extract the last 6 characters of timeperiod, then group_by timeperiod, and finally use cur_group_id() to number the groups:
library(dplyr)
library(stringr)
df %>% mutate(timeperiod = str_extract(timeperiod, '.{6}$'))%>%
group_by(timeperiod)%>%
mutate(timeperiod = cur_group_id())%>%
ungroup()
# A tibble: 8 × 2
Location timeperiod
<chr> <int>
1 a 1
2 b 1
3 c 1
4 d 2
5 e 3
6 f 3
7 g 4
8 h 4
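One point worth checking with this approach: cur_group_id() numbers the groups by their position among the sorted group keys, so it matches the digit in the suffix only when every period from 1 upward is present. The sketch below illustrates this with a made-up data frame df_gap in which period 1 is missing (an assumption for illustration), and compares it with reading the number directly from the suffix.

```r
library(dplyr)
library(stringr)

# Example data in which period 1 is absent (assumed for illustration).
df_gap <- data.frame(Location = c("a", "b", "c"),
                     timeperiod = c("D.V.2to3", "A.V.4to5", "H.V.4to5"))

# Group-id approach: ids reflect the order of the sorted suffixes.
by_group_id <- df_gap %>%
  mutate(suffix = str_extract(timeperiod, ".{6}$")) %>%
  group_by(suffix) %>%
  mutate(id = cur_group_id()) %>%
  ungroup()

by_group_id$id   # 1 2 2: positions among the suffixes, not 2 4 4

# Reading the digits that precede "to<digits>" at the end of the string
# keeps the original numbering.
by_digit <- as.integer(str_extract(df_gap$timeperiod, "\\d+(?=to\\d+$)"))
by_digit         # 2 4 4
```

If consecutive ids are what is wanted, the group-id version is fine; if the number embedded in the suffix matters, extracting it directly is the safer choice.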
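Reply from a uj5u.com user:
Maybe this is what you are looking for. It removes every letter and punctuation character, then keeps the first remaining character: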
df <- data.frame (Location = c("a","b","c","d","e","f","g","h"),
timeperiod = c("A.V.1to2", "D.V.1to2", "A.V.1to2","D.V.2to3","A.V.3to4","H.V.3to4","A.V.4to5","D.V.4to5"))
df$timeperiod <- substr(gsub('[[:alpha:]]|[[:punct:]]', '', df$timeperiod), 1, 1)
df
Location timeperiod
1 a 1
2 b 1
3 c 1
4 d 2
5 e 3
6 f 3
7 g 4
8 h 4
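The strip-and-truncate approach above works for the sample data, but it relies on the prefix containing no digits and on the period being a single digit. As a hedged base R sketch, the pattern can instead be anchored to the V.<n>to<m> ending. The value "A1.V.10to11" is hypothetical and included only to show where the two approaches differ.

```r
# Example values; "A1.V.10to11" is a hypothetical value with a digit in the
# prefix and a two-digit period.
x <- c("A.V.1to2", "D.V.2to3", "A1.V.10to11")

# Strip-and-truncate approach from the answer above.
stripped <- substr(gsub("[[:alpha:]]|[[:punct:]]", "", x), 1, 1)
stripped   # "1" "2" "1"

# Anchored alternative: capture the digits between "V." and "to" at the
# end of the string, then convert them to integer.
anchored <- as.integer(sub(".*V\\.([0-9]+)to[0-9]+$", "\\1", x))
anchored   # 1 2 10
```

Values that do not match the pattern are returned unchanged by sub() and become NA (with a warning) after as.integer(), which makes unexpected formats easy to spot.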
