I have the following list:
MoreInfos <- list("\n \n London\n \n \n \n Service Green\n \n \n \n Posted: 02 Feb 2022\n \n \n \n ",
"\n \n London\n \n \n \n Service Green\n \n \n \n Posted: 21 Oct 2021\n \n \n \n ",
"\n \n London\n \n \n \n Service Green\n \n \n \n Posted: 18 Mar 2021\n \n \n \n ",
"\n \n London\n \n \n \n Service Green\n \n \n \n Posted: 14 Nov 2021\n \n \n \n ",
"\n \n San Francisco, Singapore, London\n \n \n \n Services & Solutions\n \n \n \n Posted: 30 Jan 2020\n \n \n \n ",
"\n \n San Francisco, Singapore, London\n \n \n \n Solutions\n \n \n \n Posted: 08 Jan 2002\n \n \n \n ")
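I want to get rid of all the "\n" sequences and the surrounding whitespace in the list. I also need to extract the three string parts (city, service, date) into separate columns of a new data frame and format the date.
The output should look like this: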
> df
                              City              Service       Date
1                           London        Service Green 02.02.2022
2                           London        Service Green 21.10.2021
3                           London        Service Green 18.03.2021
4                           London        Service Green 14.11.2021
5 San Francisco, Singapore, London Services & Solutions 30.01.2020
6 San Francisco, Singapore, London            Solutions 08.01.2002
So far I have tried str_replace_all and gsub, but the result seems overly complicated to me.
MoreInfos <- str_replace_all(MoreInfos, c("\n"),"|" )
MoreInfos <- gsub("(\\S)\\s{2,}", "\\1", MoreInfos, perl=TRUE)
MoreInfos <- str_replace_all(MoreInfos, c("\\|\\|\\|\\|"),"|" )
I am sure there is a simple solution.
Reply from a uj5u.com user:
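# Trim each string, drop the "Posted:" label, then collapse every run of
# two or more spaces/newlines into a single ":" field separator
a <- gsub('[ \n]{2,}', ':', sub('Posted:', '', trimws(unlist(MoreInfos))))
read.table(text = a, col.names = c('City', 'Service', 'Date'), sep = ':') |>
  transform(Date = as.Date(Date, "%d %b %Y"))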
                              City              Service       Date
1                           London        Service Green 2022-02-02
2                           London        Service Green 2021-10-21
3                           London        Service Green 2021-03-18
4                           London        Service Green 2021-11-14
5 San Francisco, Singapore, London Services & Solutions 2020-01-30
6 San Francisco, Singapore, London            Solutions 2002-01-08
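The question asks for dates such as 02.02.2022, whereas as.Date() prints them in ISO form. Below is a minimal sketch of the extra formatting step, assuming the result is stored in a variable named df (the name used in the question's desired output) and using only two shortened sample strings. Two caveats: format() returns character strings rather than Date values, and parsing "%b" relies on the session locale using English month abbreviations.

```r
# Two sample strings in the same shape as MoreInfos (shortened for illustration)
MoreInfos <- list(
  "\n \n London\n \n \n \n Service Green\n \n \n \n Posted: 02 Feb 2022\n \n \n \n ",
  "\n \n San Francisco, Singapore, London\n \n \n \n Services & Solutions\n \n \n \n Posted: 30 Jan 2020\n \n \n \n "
)

# Same cleaning step as in the answer above
a <- gsub('[ \n]{2,}', ':', sub('Posted:', '', trimws(unlist(MoreInfos))))
df <- read.table(text = a, col.names = c('City', 'Service', 'Date'), sep = ':')

# Parse the date ("%b" expects English month abbreviations in the current locale),
# then render it as dd.mm.yyyy; the result is a character column, not a Date
df$Date <- format(as.Date(df$Date, "%d %b %Y"), "%d.%m.%Y")
df
```

If the column should stay a real Date for sorting or arithmetic, it may be better to keep the as.Date() result and apply format() only when printing or exporting.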
Reply from a uj5u.com user:
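Here is one approach, which could probably be simplified further. I split each string on newlines and keep the 3rd, 7th and 11th elements, which correspond to the three variables you want to extract: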
library(stringr)
library(dplyr)
library(lubridate)  # provides dmy()

MoreInfos %>%
  str_split(pattern = "\\n", simplify = TRUE) %>%  # one matrix column per line
  as_tibble() %>%                                   # columns are named V1, V2, ...
  select(3, 7, 11) %>%
  mutate(
    across(where(is.character), trimws),
    V11 = dmy(str_remove(V11, "Posted: "))
  ) %>%
  rename(City = V3, Service = V7, Date = V11)
Output:
# A tibble: 6 × 3
  City                             Service              Date      
  <chr>                            <chr>                <date>    
1 London                           Service Green        2022-02-02
2 London                           Service Green        2021-10-21
3 London                           Service Green        2021-03-18
4 London                           Service Green        2021-11-14
5 San Francisco, Singapore, London Services & Solutions 2020-01-30
6 San Francisco, Singapore, London Solutions            2002-01-08
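The positions 3, 7 and 11 depend on the exact number of blank lines in each string. As a possible simplification, here is a base R sketch (not the answerer's code) that instead drops every piece that is empty after trimming, so only the three real fields remain. The shortened sample data and the variable names parts and df are assumptions made for illustration.

```r
# Two sample strings in the same shape as MoreInfos (shortened for illustration)
MoreInfos <- list(
  "\n \n London\n \n \n \n Service Green\n \n \n \n Posted: 02 Feb 2022\n \n \n \n ",
  "\n \n San Francisco, Singapore, London\n \n \n \n Services & Solutions\n \n \n \n Posted: 30 Jan 2020\n \n \n \n "
)

# Split on newlines, trim each piece, and keep only the non-empty ones
parts <- lapply(strsplit(unlist(MoreInfos), "\n", fixed = TRUE), function(x) {
  x <- trimws(x)
  x[nzchar(x)]
})

# Guard: this sketch assumes every string yields exactly three fields
stopifnot(all(lengths(parts) == 3))

# Stack the fields row by row and name the columns
df <- as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)
names(df) <- c("City", "Service", "Date")

# Strip the "Posted:" label and parse the date
# ("%b" expects English month abbreviations in the current locale)
df$Date <- as.Date(sub("^Posted:\\s*", "", df$Date), format = "%d %b %Y")
df
```

The stopifnot() guard fails loudly if a string ever contains a different number of non-empty lines, which the fixed-position version would not notice.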
