I have a data frame like this:
dummy_data <- structure(list(
  Date = c("24/06/2002", "24/06/2002", "01/07/2002", "01/07/2002", "08/07/2002",
           "08/07/2002", "15/07/2002", "17/07/2002", "22/07/2002", "22/07/2002",
           "29/07/2002"),
  Temp_id = c("ABC", "M567", "M567", "M567", "XYZ", "XYZ",
              "T300/500,XYZ", "T300/390,XYZ", "0000,M300", "1234,M678", "ABC")),
  class = "data.frame", row.names = c(NA, -11L))
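Some rows of the Temp_id column contain additional text.
How can I remove the part before the "," (including the comma) and keep the rest of the string in the column?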
Required output:
dummy_data <- structure(list(
  Date = c("24/06/2002", "24/06/2002", "01/07/2002", "01/07/2002", "08/07/2002",
           "08/07/2002", "15/07/2002", "17/07/2002", "22/07/2002", "22/07/2002",
           "29/07/2002"),
  Temp_id = c("ABC", "M567", "M567", "M567", "XYZ", "XYZ",
              "XYZ", "XYZ", "M300", "M678", "ABC")),
  class = "data.frame", row.names = c(NA, -11L))
Answer:
This is your Temp_id column:
Temp_id <- c("ABC", "M567", "M567", "M567", "XYZ", "XYZ",
             "T300/500,XYZ", "T300/390,XYZ", "0000,M300", "1234,M678", "ABC")
which prints:
[1] "ABC" "M567" "M567" "M567" "XYZ" "XYZ" "T300/500,XYZ"
[8] "T300/390,XYZ" "0000,M300" "1234,M678" "ABC"
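A simple approach is gsub(), a function that replaces every match of the regular-expression pattern you give it with another string. Here the pattern ^.*, matches everything from the start of the string up to and including the comma, and that is replaced with nothing ('').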
gsub('^.*,','',Temp_id)
[1] "ABC" "M567" "M567" "M567" "XYZ" "XYZ" "XYZ" "XYZ" "M300" "M678" "ABC"
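In case you're not familiar with the regex symbols:
^ -> start of the string
. -> any single character
* -> repeat the preceding . zero or more times; it is greedy, so the match runs to the last comma in the string
, -> a literal comma, where the match ends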
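For the question's data the greedy behaviour makes no difference, since each value contains at most one comma. If a value could contain several commas and only the text before the first one should be dropped, [^,]* (which cannot cross a comma) is an alternative to .*. A small base-R sketch on a made-up vector x (not from the question) contrasting the two patterns:

```r
# Hypothetical values; "A,B,C" has two commas to show the difference
x <- c("A,B,C", "T300/500,XYZ", "ABC")

# Greedy .* runs to the LAST comma, so only the text after it survives
greedy <- sub("^.*,", "", x)

# [^,]* cannot cross a comma, so only the prefix up to the FIRST comma is removed
first_only <- sub("^[^,]*,", "", x)

print(greedy)      # "C"   "XYZ" "ABC"
print(first_only)  # "B,C" "XYZ" "ABC"
```

Strings with no comma, like "ABC", are returned unchanged by both patterns because sub() leaves non-matching elements as they are.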
Applied to the data frame:
dummy_data$Temp_id = gsub('^.*,','',dummy_data$Temp_id)
> dummy_data
Date Temp_id
1 24/06/2002 ABC
2 24/06/2002 M567
3 01/07/2002 M567
4 01/07/2002 M567
5 08/07/2002 XYZ
6 08/07/2002 XYZ
7 15/07/2002 XYZ
8 17/07/2002 XYZ
9 22/07/2002 M300
10 22/07/2002 M678
11 29/07/2002 ABC
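To check that the cleaned column really matches the required output, one option is to compare it with identical(). The sketch below rebuilds the question's data with data.frame() and uses sub() rather than gsub(); because the pattern is anchored with ^, it can match at most once per string, so both functions should give the same result here. The name expected_id is introduced only for illustration.

```r
# Input data from the question (character columns by default in R >= 4.0)
dummy_data <- data.frame(
  Date = c("24/06/2002", "24/06/2002", "01/07/2002", "01/07/2002", "08/07/2002",
           "08/07/2002", "15/07/2002", "17/07/2002", "22/07/2002", "22/07/2002",
           "29/07/2002"),
  Temp_id = c("ABC", "M567", "M567", "M567", "XYZ", "XYZ",
              "T300/500,XYZ", "T300/390,XYZ", "0000,M300", "1234,M678", "ABC")
)

# Temp_id values from the question's required output (hypothetical variable name)
expected_id <- c("ABC", "M567", "M567", "M567", "XYZ", "XYZ",
                 "XYZ", "XYZ", "M300", "M678", "ABC")

# sub() replaces only the first match, which is enough for an anchored pattern
dummy_data$Temp_id <- sub("^.*,", "", dummy_data$Temp_id)

print(identical(dummy_data$Temp_id, expected_id))  # TRUE
```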
Answer:
With dplyr and stringr:
library(dplyr)
library(stringr)
dummy_data |>
mutate(Temp_id = case_when(str_detect(Temp_id, ",") ~ str_extract(Temp_id, "(?<=,).*$"),
TRUE ~ Temp_id))
#or using `ifelse()`
dummy_data |>
mutate(Temp_id = ifelse(str_detect(Temp_id, ","),
str_extract(Temp_id, "(?<=,).*$"),
Temp_id))
#> Date Temp_id
#> 1 24/06/2002 ABC
#> 2 24/06/2002 M567
#> 3 01/07/2002 M567
#> 4 01/07/2002 M567
#> 5 08/07/2002 XYZ
#> 6 08/07/2002 XYZ
#> 7 15/07/2002 XYZ
#> 8 17/07/2002 XYZ
#> 9 22/07/2002 M300
#> 10 22/07/2002 M678
#> 11 29/07/2002 ABC
Created on 2022-10-13 with reprex v2.0.2
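The same "only change rows that contain a comma" logic can be written in base R without dplyr or stringr, using grepl() in place of str_detect() and sub() in place of str_extract(). This is a sketch on a small subset of the question's values, not the answer's exact code; the pattern ^[^,]*, is chosen so that, like the lookbehind "(?<=,).*$", it keeps everything after the first comma.

```r
x <- c("ABC", "T300/500,XYZ", "0000,M300")  # subset of the question's Temp_id values

# grepl() plays the role of str_detect(); fixed = TRUE treats "," literally
has_comma <- grepl(",", x, fixed = TRUE)

# Drop everything up to and including the first comma, only where a comma exists
cleaned <- ifelse(has_comma, sub("^[^,]*,", "", x), x)

print(cleaned)  # "ABC"  "XYZ"  "M300"
```

Since sub() already leaves strings without a match unchanged, the ifelse() wrapper is not strictly necessary; it is kept only to mirror the structure of the dplyr version.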
Answer:
This also works:
library(dplyr)
library(stringr)
dummy_data %>%
mutate(Temp_id = str_extract(Temp_id, "[^,]+$"))
Date Temp_id
1 24/06/2002 ABC
2 24/06/2002 M567
3 01/07/2002 M567
4 01/07/2002 M567
5 08/07/2002 XYZ
6 08/07/2002 XYZ
7 15/07/2002 XYZ
8 17/07/2002 XYZ
9 22/07/2002 M300
10 22/07/2002 M678
11 29/07/2002 ABC
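Here [^,]+$ matches a sequence of one or more characters that are not commas ([^,]+) running up to the end of the string ($), so it effectively removes everything before the comma (including the comma itself), if there is one.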
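Without stringr, a roughly equivalent extraction can be sketched with base R's regexpr() and regmatches(). This is an illustration on a few of the question's values, not the answer's code. One caveat worth knowing: unlike str_extract(), which returns NA for a non-match, regmatches() silently drops elements with no match (for example "" or a value ending in ","), so the result only lines up with the input when every element matches.

```r
x <- c("ABC", "T300/500,XYZ", "1234,M678")  # subset of the question's Temp_id values

# regexpr() locates the first match of "[^,]+$" in each string:
# a run of non-comma characters that reaches the end of the string
m <- regexpr("[^,]+$", x)

# regmatches() extracts the matched text (non-matching elements are dropped)
tail_part <- regmatches(x, m)

print(tail_part)  # "ABC"  "XYZ"  "M678"
```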
Alternatively, we can do it in base R:
sub(".*?([^,]+)$", "\\1", dummy_data$Temp_id)
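Here .*? is a "lazy" match of anything that comes before a run of non-comma characters ([^,]+) reaching the end of the string ($), and \\1 is a backreference to the sequence captured by the parentheses (...).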
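One way to see what the capture group holds is to wrap the backreference in brackets in the replacement string. The sketch below runs the same pattern on a few of the question's values; passing perl = TRUE is an addition not present in the answer, used here so the lazy .*? follows PCRE semantics, and for these inputs it is expected to agree with the default engine.

```r
x <- c("ABC", "T300/390,XYZ", "0000,M300")  # subset of the question's Temp_id values

# Brackets around \\1 make the captured group ([^,]+) visible in the output
shown <- sub(".*?([^,]+)$", "[\\1]", x, perl = TRUE)
print(shown)  # "[ABC]"  "[XYZ]"  "[M300]"

# Replacing the whole match with just \\1 keeps the captured part only
kept <- sub(".*?([^,]+)$", "\\1", x, perl = TRUE)
print(kept)   # "ABC"  "XYZ"  "M300"
```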
