Converting characters to NA based on a condition in R dplyr

I have a data frame that looks like this:

library(tidyverse)
df3 <- tibble(col1 = c("apple",rep("banana",3)),
              col2 = c("aple", "banan","bananb","banat"), 
              count_col1 = c(1,4,4,4), 
              count_col2 = c(4,1,1,1))
df3
#> # A tibble: 4 × 4
#>   col1   col2   count_col1 count_col2
#>   <chr>  <chr>       <dbl>      <dbl>
#> 1 apple  aple            1          4
#> 2 banana banan           4          1
#> 3 banana bananb          4          1
#> 4 banana banat           4          1

Created on 2022-02-17 by the reprex package (v2.0.1)

I want to group_by col1 and, when count_col2 > count_col1, convert the value of col1 to NA; when count_col1 > count_col2, convert the value of col2 to NA.

I want my data to look like this:

#> # A tibble: 4 × 4
#>   col1    col2        count_col1 count_col2
#>   <chr>   <chr>          <dbl>      <dbl>
#> 1   NA    aple             1          4
#> 2 banana   NA              4          1
#> 3 banana   NA              4          1
#> 4 banana   NA              4          1

I am not sure whether this can be done with mutate(case_when...). My attempt so far has failed:

df3 %>% 
  group_by(col1) %>% 
  mutate(case_when(count_col2 > count_col1 ~ col1==NA,
                   count_col1 > count_col2 ~ col2==NA ))

A uj5u.com user replied:

You can use ifelse() to get the desired output, i.e.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

df3 <- tibble(col1 = c("apple",rep("banana",3)),
              col2 = c("aple", "banan","bananb","banat"), 
              count_col1 = c(1,4,4,4), 
              count_col2 = c(4,1,1,1))
df3
#> # A tibble: 4 × 4
#>   col1   col2   count_col1 count_col2
#>   <chr>  <chr>       <dbl>      <dbl>
#> 1 apple  aple            1          4
#> 2 banana banan           4          1
#> 3 banana bananb          4          1
#> 4 banana banat           4          1

df3 %>% 
  group_by(col1) %>% 
  mutate(col1 = ifelse(count_col2 > count_col1, NA, col1),
         col2 = ifelse(count_col1 > count_col2, NA, col2))
#> # A tibble: 4 × 4
#> # Groups:   col1 [2]
#>   col1   col2  count_col1 count_col2
#>   <chr>  <chr>      <dbl>      <dbl>
#> 1 <NA>   aple           1          4
#> 2 banana <NA>           4          1
#> 3 banana <NA>           4          1
#> 4 banana <NA>           4          1

Created on 2022-02-18 by the reprex package (v2.0.1)

Or with case_when():

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

df3 <- tibble(col1 = c("apple",rep("banana",3)),
              col2 = c("aple", "banan","bananb","banat"), 
              count_col1 = c(1,4,4,4), 
              count_col2 = c(4,1,1,1))
df3
#> # A tibble: 4 × 4
#>   col1   col2   count_col1 count_col2
#>   <chr>  <chr>       <dbl>      <dbl>
#> 1 apple  aple            1          4
#> 2 banana banan           4          1
#> 3 banana bananb          4          1
#> 4 banana banat           4          1

df3 %>% 
  group_by(col1) %>% 
  mutate(col1 = case_when(count_col2 > count_col1 ~ NA_character_,
                          TRUE ~ col1),
         col2 = case_when(count_col1 > count_col2 ~ NA_character_, 
                          TRUE ~ col2))
#> # A tibble: 4 × 4
#> # Groups:   col1 [2]
#>   col1   col2  count_col1 count_col2
#>   <chr>  <chr>      <dbl>      <dbl>
#> 1 <NA>   aple           1          4
#> 2 banana <NA>           4          1
#> 3 banana <NA>           4          1
#> 4 banana <NA>           4          1

Created on 2022-02-18 by the reprex package (v2.0.1)
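
A related option, offered as a sketch and not part of the answer above, is dplyr's if_else(). It checks that both branches have the same type, which is why the typed NA_character_ is used for the character columns here. The name res_if_else is only an illustrative assumption, and group_by is left out because the comparison is made row by row.

```r
library(dplyr)

df3 <- tibble(col1 = c("apple", rep("banana", 3)),
              col2 = c("aple", "banan", "bananb", "banat"),
              count_col1 = c(1, 4, 4, 4),
              count_col2 = c(4, 1, 1, 1))

# if_else() requires both branches to share a type, so NA_character_
# is paired with the character columns col1 and col2.
res_if_else <- df3 %>%
  mutate(col1 = if_else(count_col2 > count_col1, NA_character_, col1),
         col2 = if_else(count_col1 > count_col2, NA_character_, col2))

res_if_else
```

This should give the same values as the ifelse() and case_when() versions, just without the grouping.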

Does this solve your problem?

A uj5u.com user replied:

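I am not sure you really need group_by here, because even with group_by each value of count_col1 is compared with the corresponding value of count_col2. Nothing is happening within the "groups".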
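
To illustrate the point about group_by, here is a minimal sketch that runs the same ifelse() logic without any grouping. The names cond and ungrouped are illustrative assumptions. The comparison is element-wise, so it lines up row by row whether or not the data are grouped.

```r
library(dplyr)

df3 <- tibble(col1 = c("apple", rep("banana", 3)),
              col2 = c("aple", "banan", "bananb", "banat"),
              count_col1 = c(1, 4, 4, 4),
              count_col2 = c(4, 1, 1, 1))

# The condition is evaluated per row; groups play no role in it.
cond <- df3$count_col2 > df3$count_col1
cond

# Same replacement logic as the ifelse() answer, without group_by().
ungrouped <- df3 %>%
  mutate(col1 = ifelse(count_col2 > count_col1, NA, col1),
         col2 = ifelse(count_col1 > count_col2, NA, col2))

ungrouped
```

The values in col1 and col2 should match the grouped output, and the result carries no grouping that would later need ungroup().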

Here is a base R option -

df3$col1[df3$count_col2 > df3$count_col1] <- NA
df3$col2[df3$count_col1 > df3$count_col2] <- NA
df3

#  col1   col2  count_col1 count_col2
#  <chr>  <chr>      <dbl>      <dbl>
#1 NA     aple           1          4
#2 banana NA             4          1
#3 banana NA             4          1
#4 banana NA             4          1
