Splitting strings in a column by a character in R

I have a column in a data frame with prices like this:

prices
$1,50 $1,20
$1,50
$1,75 $1,25 $1,35

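In short, each row can contain several prices. I want to split them on the $ sign into separate columns. For the example above, this is the result I need: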

prices               price1 price2 price3
$1,50 $1,20          1,50   1,20   NA
$1,50                1,50   NA     NA
$1,75 $1,25 $1,35    1,75   1,25   1,35

I tried the following, but neither option does what I need. Any help is appreciated.

str_split(prices, pattern = '[$]') # I get a column with values like this c("", "1,50")
separate(prices, sep = '[$]', into = c("price1", "price2"), remove = FALSE) 
#Price1 is created empty and I am trying to use it in a function, 
#so in some dataframes the number of prices can vary.

Answer from a uj5u.com user:

One approach using dplyr (unnest_wider comes from tidyr):

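library(dplyr)
library(tidyr)  # provides unnest_wider()

df %>% 
  rowwise() %>% 
  # split each row on spaces, then strip the literal "$" from every piece
  mutate(price = list(gsub("$", "", strsplit(prices, " ")[[1]], fixed = TRUE))) %>% 
  unnest_wider(price, names_sep = "")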

Output:

  prices            price1 price2 price3
  <chr>             <chr>  <chr>  <chr> 
1 $1,50 $1,20       1,50   1,20   NA    
2 $1,50             1,50   NA     NA    
3 $1,75 $1,25 $1,35 1,75   1,25   1,35

Input:

df = structure(list(prices = c("$1,50 $1,20", "$1,50", "$1,75 $1,25 $1,35"
)), class = "data.frame", row.names = c(NA, -3L))

Answer from a uj5u.com user:

In base R, you can do the following:

read.table(text=df$prices, fill=TRUE, header = FALSE, sep='$', dec = ',')[-1]
    V2   V3   V4
1 1.50 1.20   NA
2 1.50   NA   NA
3 1.75 1.25 1.35

If you want them as character strings with the comma kept, rather than as numbers, you can do the following:

read.table(text=df$prices, fill=TRUE, header=FALSE, sep='$', na.strings='')[-1]
     V2    V3   V4
1 1,50   1,20 <NA>
2  1,50  <NA> <NA>
3 1,75  1,25  1,35

You can then rename the columns, e.g. set the names to paste0('prices', seq(ncol(df1))), where df1 holds the result of the read.table call.
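Since the question mentions wrapping this in a function where the number of prices varies between data frames, here is a minimal base R sketch of that idea. The helper name `split_prices` and its `col` and `prefix` arguments are hypothetical and not part of any answer above. It strips the "$" signs before splitting, which avoids the leading empty piece that splitting on "$" itself produces.

```r
# Hypothetical helper (not from the original answers): split a column of
# "$"-prefixed prices into price1..priceN, where N is found from the data.
split_prices <- function(data, col = "prices", prefix = "price") {
  # Remove the literal "$" signs, then split each row on runs of whitespace.
  parts <- strsplit(gsub("$", "", data[[col]], fixed = TRUE), "\\s+")
  # Pad every row with NA up to the length of the longest row.
  n <- max(lengths(parts))
  padded <- lapply(parts, function(p) { length(p) <- n; p })
  # Bind the padded rows into a character matrix, then into a data frame.
  wide <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
  names(wide) <- paste0(prefix, seq_len(n))
  cbind(data, wide)
}

df <- data.frame(prices = c("$1,50 $1,20", "$1,50", "$1,75 $1,25 $1,35"))
res <- split_prices(df)
res
```

The result keeps the prices as character strings. The sketch assumes the prices within a row are separated by whitespace and that no row starts with whitespace.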

Answer from a uj5u.com user:

If your default locale uses a comma as the decimal separator, then:

library(tidyverse)
options("readr.default_locale" = readr::locale(decimal_mark = ","))

df <- tibble(prices =
               c("$1,50 $1,20",
                 "$1,50",
                 "$1,75 $1,25 $1,35"))
df |>
  mutate(prices = prices |>
           str_split(" ") |>
           map( ~ str_remove(., "\\$"))) |>
  unnest_wider(prices) |>
  mutate(across(everything(), readr::parse_number))
#> New names:
#> New names:
#> New names:
#> • `` -> `...1`
#> • `` -> `...2`
#> # A tibble: 3 × 3
#>    ...1  ...2  ...3
#>   <dbl> <dbl> <dbl>
#> 1  1.5   1.2  NA   
#> 2  1.5  NA    NA   
#> 3  1.75  1.25  1.35

Otherwise:

df |>
  mutate(prices = prices |>
           str_split(" ") |>
           map( ~ str_remove(., "\\$"))) |>
  unnest_wider(prices) |>
  mutate(across(everything(), ~ readr::parse_number(., locale = readr::locale(decimal_mark = ","))))
#> New names:
#> New names:
#> New names:
#> • `` -> `...1`
#> • `` -> `...2`
#> # A tibble: 3 × 3
#>    ...1  ...2  ...3
#>   <dbl> <dbl> <dbl>
#> 1  1.5   1.2  NA   
#> 2  1.5  NA    NA   
#> 3  1.75  1.25  1.35
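To illustrate why the locale matters in the two variants above, the following small sketch compares readr::parse_number with and without a comma-decimal locale. The vector `x` and the variable names are made up for the example. It relies on readr's default locale treating "," as a grouping mark.

```r
library(readr)

x <- c("1,50", "1,20", NA)

# Default locale: "," is the grouping mark, so it is dropped
# instead of being read as a decimal separator.
with_default <- parse_number(x)

# Comma-decimal locale: "," is read as the decimal mark.
comma_locale <- locale(decimal_mark = ",")
with_comma <- parse_number(x, locale = comma_locale)

with_default
with_comma
```

With the default locale the first value parses as 150 rather than 1.5, which is why the answer either sets the default locale or passes one explicitly.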

Answer from a uj5u.com user:

With cSplit:

library(splitstackshape)
s <- cSplit(df, "prices", "$", type.convert = TRUE)[, -1]  # drop the empty piece before the first "$"
df[, paste0("price", 1:ncol(s))] <- s

#             prices  price1  price2  price3
#1       $1,50 $1,20    1,50    1,20    <NA>
#2             $1,50    1,50    <NA>    <NA>
#3 $1,75 $1,25 $1,35    1,75    1,25    1,35

Answer from a uj5u.com user:

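In this approach we convert the data to long form with separate_rows, modify it with transform, and then convert it back to wide form with reshape. We mix dplyr, tidyr and base functions, choosing whichever gives the shorter code.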

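1) Add a column P that is a copy of prices and a column row that numbers the rows, split the prices column into rows, add a column n that numbers the prices within each original row, and then convert to wide form. In this case reshape needs slightly less code than pivot_wider, although the latter could be used instead. We also use transform, which is like mutate except that it outputs a data frame, which reshape needs. Finally, select the columns we need.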

library(dplyr)
library(tidyr)

DF %>% 
  mutate(P = prices, prices = gsub("\\$", "", prices), row = 1:n()) %>% 
  separate_rows(prices, sep = " ") %>%
  transform(n = ave(1:nrow(.), row, FUN = seq_along))  %>%
  reshape(dir = "wide", idvar = c("row", "P"), timevar = "n", sep = "") %>%
  select(prices = P, everything(), -row)

giving:

             prices prices1 prices2 prices3
1       $1,50 $1,20    1,50    1,20    <NA>
3             $1,50    1,50    <NA>    <NA>
4 $1,75 $1,25 $1,35    1,75    1,25    1,35
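The answer notes that pivot_wider could be used in place of reshape. A minimal sketch of that variant follows. It is not the answerer's code, and the helper columns `value`, `row` and `n` are names chosen for the example.

```r
library(dplyr)
library(tidyr)

DF <- data.frame(prices = c("$1,50 $1,20", "$1,50", "$1,75 $1,25 $1,35"))

out <- DF %>%
  # keep the original string in prices; work on a "$"-free copy
  mutate(row = row_number(),
         value = gsub("$", "", prices, fixed = TRUE)) %>%
  # long form: one row per individual price
  separate_rows(value, sep = " ") %>%
  # number the prices within each original row
  group_by(row) %>%
  mutate(n = row_number()) %>%
  ungroup() %>%
  # wide form: price1, price2, ... with NA where a row has fewer prices
  pivot_wider(names_from = n, values_from = value, names_prefix = "price") %>%
  select(-row)

out
```

Keeping `row` as an id column until the end means that two rows with identical price strings are not merged during the pivot.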

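2) If you want the price columns converted to numeric and the decimal mark in the current locale is a dot, use the code below, which replaces the comma with a dot and adds convert=TRUE to separate_rows. If the comma is the decimal mark in the current locale, omit the second prices assignment in the mutate below.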

DF %>% 
  mutate(P = prices, prices = gsub("\\$", "", prices), 
         prices = gsub(",", ".", prices),
         row = 1:n()) %>% 
  separate_rows(prices, sep = " ", convert = TRUE) %>%
  transform(n = ave(1:nrow(.), row, FUN = seq_along))  %>%
  reshape(dir = "wide", idvar = c("row", "P"), timevar = "n", sep = "") %>%
  select(prices = P, everything(), -row)

Note

Input in reproducible form:

DF <-
structure(list(prices = c("$1,50 $1,20", "$1,50", "$1,75 $1,25 $1,35"
)), class = "data.frame", row.names = c(NA, -3L))

Please credit the source when reposting. Link to this article: https://www.uj5u.com/qita/492581.html

Tags: r, string
