How to split sentences into new variables in R (using zero-one encoding)?

I have data like this:

V1  V2
1   orange, apple
2   orange, lemon
3   lemon, apple
4   orange, lemon, apple
5   lemon
6   apple
7   orange
8   lemon, apple

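I want to split the V2 variable as follows:

The V2 column contains three categories: "orange", "lemon", "apple".
For each category I want to create a new column (variable) that indicates whether that category name appears in V2 (0/1).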

I tried this:

df %>% separate(V2, into = c("orange", "lemon", "apple"))

...and got this result, but it is not what I expected.

  V1 orange lemon apple
1  1   orange   apple    <NA>
2  2   orange   lemon    <NA>
3  3    lemon   apple    <NA>
4  4   orange   lemon   apple
5  5    lemon    <NA>    <NA>
6  6    apple    <NA>    <NA>
7  7   orange    <NA>    <NA>
8  8    lemon   apple    <NA>

What I mean is a result like this:

V1  orange  lemon   apple
1   1   0   1
2   1   1   0
3   0   1   1
4   1   1   1
5   0   1   0
6   0   0   1
7   1   0   0
8   0   1   1
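For comparison, the target table can also be built in base R without any tidyverse packages. This is a minimal sketch, not taken from either answer below; it assumes the category names are known in advance (the vector categories is a hypothetical name) and that items within a cell are separated by a comma plus optional whitespace.

```r
# Sketch: one 0/1 indicator column per category, base R only.
df <- data.frame(
  V1 = 1:8,
  V2 = c("orange, apple", "orange, lemon", "lemon, apple",
         "orange, lemon, apple", "lemon", "apple", "orange",
         "lemon, apple"),
  stringsAsFactors = FALSE
)

# Split each cell into its items: a comma followed by optional whitespace
items <- strsplit(df$V2, ",\\s*")

# Assumed category list, in the column order the question asks for;
# unique(unlist(items)) would discover the categories automatically instead
categories <- c("orange", "lemon", "apple")

# For each category: 1L if it occurs among the row's items, otherwise 0L
for (category in categories) {
  df[[category]] <- vapply(items, function(x) as.integer(category %in% x), integer(1))
}

df$V2 <- NULL  # drop the original text column
print(df)
```

Matching with %in% on the split items, rather than searching the whole string with grepl(), avoids false positives such as a hypothetical "pineapple" cell being counted as "apple".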

Answer:

You could try pivoting:

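library(dplyr)
library(tidyr)
df |> 
  separate_rows(V2, sep = ", ") |>   # long format: one row per V1/item pair
  mutate(ind = 1) |>                 # indicator value for every item present
  pivot_wider(names_from = V2,       # each distinct item becomes a column
              values_from = ind,
              values_fill = 0)       # items absent from a row get 0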

The output is:

# A tibble: 8 × 4
     V1 orange apple lemon
  <int>  <dbl> <dbl> <dbl>
1     1      1     1     0
2     2      1     0     1
3     3      0     1     1
4     4      1     1     1
5     5      0     0     1
6     6      0     1     0
7     7      1     0     0
8     8      0     1     1

The data I used:

V1 <- 1:8
V2 <- c("orange, apple", "orange, lemon", 
        "lemon, apple", "orange, lemon, apple",
        "lemon", "apple", "orange", 
        "lemon, apple")
df <- tibble(V1, V2)
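The pivot approach works in two steps: separate_rows() turns the data into long format (one row per V1/item pair), and pivot_wider() spreads the items back out into columns, filling absent combinations with 0. A rough base-R analogue of the same long-then-wide idea is sketched below as an illustration only (it is not part of the answer); it uses table() for the widening step, which would give counts above 1 if an item were repeated within a single cell.

```r
# Long-then-wide reshaping in base R (illustrative analogue of the tidyr pipeline)
df <- data.frame(
  V1 = 1:8,
  V2 = c("orange, apple", "orange, lemon", "lemon, apple",
         "orange, lemon, apple", "lemon", "apple", "orange",
         "lemon, apple"),
  stringsAsFactors = FALSE
)

# Long format: repeat each V1 once per item in its cell (like separate_rows)
items <- strsplit(df$V2, ", ")
long <- data.frame(V1 = rep(df$V1, lengths(items)), V2 = unlist(items))

# Wide format: cross-tabulate V1 by item; missing pairs become 0 (like values_fill = 0)
tab <- table(long$V1, long$V2)
wide <- cbind(V1 = as.integer(rownames(tab)), as.data.frame.matrix(tab))
print(wide)
```

Unlike pivot_wider(), which keeps the items in order of first appearance, table() sorts the item columns alphabetically and returns integer counts.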

Answer:

We could use dummy_cols from the fastDummies package:

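library(stringr)
library(fastDummies)
library(dplyr)
# Split V2 on a comma followed by optional whitespace; drop the original V2 column
dummy_cols(df, "V2", split = ",\\s*", remove_selected_columns = TRUE) %>% 
  # dummy columns are named like "V2_orange"; strip everything up to the "_"
  rename_with(~ str_remove(.x, '.*_'))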

-output

# A tibble: 8 × 4
     V1 apple lemon orange
  <int> <int> <int>  <int>
1     1     1     0      1
2     2     0     1      1
3     3     1     1      0
4     4     1     1      1
5     5     0     1      0
6     6     1     0      0
7     7     0     0      1
8     8     1     1      0
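Assuming dummy_cols() treats its split argument as a regular expression (as base strsplit() does by default), a pattern such as ",\\s*" (a comma followed by zero or more whitespace characters) handles cells with or without spaces after the comma, whereas a pattern that demands exact whitespace would miss a cell like "orange,lemon". A quick base-R check of that pattern, independent of fastDummies, on a few made-up inputs:

```r
# Check the separator regex on hypothetical cells with different spacing after the comma
x <- c("orange, apple", "orange,lemon", "lemon,   apple")
parts <- strsplit(x, ",\\s*")  # comma, then zero or more whitespace characters
print(parts)

# The distinct categories that would become dummy columns
print(sort(unique(unlist(parts))))
```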

When reprinting, please credit the source. Link to this article: https://www.uj5u.com/qukuanlian/527010.html
