How do I create indicator variables for the lowest and highest dates in R, with a skip rule?

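I want indicator variables that tell me whether a date is the lowest or the highest within each ID group. However, rows where practice is 1 should not be counted. Below is what the data frame looks like now, followed by what I want it to look like.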

ID	date	practice
1	02-27-2020	1
1	04-21-2021	0
1	06-24-2022	0
2	03-21-2019	0
2	09-19-2020	0
2	01-21-2021	0

Desired result:

ID	date	practice	lowest	highest
1	02-27-2020	1	0	0
1	04-21-2021	0	1	0
1	06-24-2022	0	0	1
2	03-21-2019	0	1	0
2	09-19-2020	0	0	0
2	01-21-2021	0	0	1

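Answer:

This code uses the tidyverse. Note that I had to coerce date to the Date class; by default, the mm-dd-YYYY format is read in as character, and min(date) and max(date) would then pick out different values as the minimum and maximum than the true earliest and latest dates.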

x<-'
ID  date    practice
1   02-27-2020  1
1   04-21-2021  0
1   06-24-2022  0
2   03-21-2019  0
2   09-19-2020  0
2   01-21-2021  0'

df1 <- read.table(textConnection(x), header = TRUE)
library(tidyverse)

df1$date <- as.Date(df1$date, format = "%m-%d-%Y")

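# Within each ID, flag the earliest and latest dates among rows with practice == 0.
# The extra practice == 0 check keeps a practice == 1 row from being flagged
# if it happens to share a date with the minimum or maximum.
desired_result <- df1 %>%
  group_by(ID) %>%
  mutate(
    lowest = ifelse(practice == 0 & date == min(date[practice == 0]), 1, 0),
    highest = ifelse(practice == 0 & date == max(date[practice == 0]), 1, 0)
  )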

desired_result
# A tibble: 6 × 5
# Groups:   ID [2]
     ID date       practice lowest highest
  <int> <date>        <int>  <dbl>   <dbl>
1     1 2020-02-27        1      0       0
2     1 2021-04-21        0      1       0
3     1 2022-06-24        0      0       1
4     2 2019-03-21        0      1       0
5     2 2020-09-19        0      0       0
6     2 2021-01-21        0      0       1
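
The point about coercing the dates can be checked on its own. The following is a minimal sketch using the three ID 2 dates from the question; the variable names are illustrative only. Character strings in mm-dd-YYYY form are compared character by character, so the month decides the order and the year is effectively ignored.

```r
# Dates for ID 2 from the question, still stored as character strings
dates_chr <- c("03-21-2019", "09-19-2020", "01-21-2021")

# String comparison: ordered by the leading month digits, not chronologically
chr_min <- min(dates_chr)  # "01-21-2021"
chr_max <- max(dates_chr)  # "09-19-2020"

# After coercion to Date, min() and max() are chronological
dates <- as.Date(dates_chr, format = "%m-%d-%Y")
date_min <- min(dates)     # 2019-03-21
date_max <- max(dates)     # 2021-01-21
```

Without the coercion, the row dated 01-21-2021 would be treated as the lowest date for ID 2 rather than the highest.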

Answer:

Here is a base R solution using ave.

x<-'
ID  date    practice
1   02-27-2020  1
1   04-21-2021  0
1   06-24-2022  0
2   03-21-2019  0
2   09-19-2020  0
2   01-21-2021  0'

df1 <- read.table(textConnection(x), header = TRUE)
df1$date <- as.Date(df1$date, "%m-%d-%Y")

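# For each ID x practice group, mark rows whose date equals the group minimum.
# ave() expects a numeric x, hence as.integer(date); \(x) requires R >= 4.1.
y1 <- with(df1, ave(as.integer(date), ID, practice, FUN = \(x) {
  if(length(x))
    min(x) == x
  else NULL
}))
# Same idea for the group maximum.
y2 <- with(df1, ave(as.integer(date), ID, practice, FUN = \(x) {
  if(length(x))
    max(x) == x
  else NULL
}))

# Keep the flags only on rows where practice is not 1.
df1$lowest <- as.integer(y1 & (df1$practice != 1))
df1$highest <- as.integer(y2 & (df1$practice != 1))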
df1
#>   ID       date practice lowest highest
#> 1  1 2020-02-27        1      0       0
#> 2  1 2021-04-21        0      1       0
#> 3  1 2022-06-24        0      0       1
#> 4  2 2019-03-21        0      1       0
#> 5  2 2020-09-19        0      0       0
#> 6  2 2021-01-21        0      0       1

Created on 2022-04-25 by the reprex package (v2.0.1)
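
One edge case the sample data does not exercise is an ID whose rows all have practice equal to 1. Taking min() or max() of the practice == 0 subset, as the tidyverse answer does, then operates on an empty vector, which makes R emit a warning. The following is a minimal base R sketch that guards against this. The helper flag_extreme, the data frame df2, and the extra ID 3 rows are hypothetical and not part of the original question.

```r
# Hypothetical helper: returns 1L for eligible rows (practice == 0) whose date
# equals pick() of the eligible dates, and 0L for every other row.
flag_extreme <- function(date, practice, pick = min) {
  eligible <- practice == 0
  out <- integer(length(date))
  if (any(eligible)) {
    out[eligible & date == pick(date[eligible])] <- 1L
  }
  out
}

# Sample data from the question plus a made-up ID 3 with no practice == 0 rows
df2 <- data.frame(
  ID = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L),
  date = as.Date(c("02-27-2020", "04-21-2021", "06-24-2022",
                   "03-21-2019", "09-19-2020", "01-21-2021",
                   "05-01-2020", "07-15-2021"), format = "%m-%d-%Y"),
  practice = c(1L, 0L, 0L, 0L, 0L, 0L, 1L, 1L)
)

# Apply the helper per ID, then restore the original row order with unsplit()
by_id <- split(df2, df2$ID)
df2$lowest <- unsplit(
  lapply(by_id, function(g) flag_extreme(g$date, g$practice, min)), df2$ID)
df2$highest <- unsplit(
  lapply(by_id, function(g) flag_extreme(g$date, g$practice, max)), df2$ID)
df2
```

Under these assumptions, IDs 1 and 2 get the same flags as in the answers above, and both rows of ID 3 get 0 in each column without any warning.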

Answer:

A data.table approach:

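library(data.table)
# Assumes df holds the input data frame whose dput() output is given at the end of this answer
f <- function(x,p) list(1*(x==min(x[p!=1])), 1*(x==max(x[p!=1])))
setDT(df)[,date:=as.IDate(date, "%m-%d-%Y")][,c("lowest","highest"):=f(date,practice), by=ID][]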

Output:

      ID       date practice lowest highest
   <int>     <IDat>    <int>  <num>   <num>
1:     1 2020-02-27        1      0       0
2:     1 2021-04-21        0      1       0
3:     1 2022-06-24        0      0       1
4:     2 2019-03-21        0      1       0
5:     2 2020-09-19        0      0       0
6:     2 2021-01-21        0      0       1

Input:

df <- structure(list(ID = c(1L, 1L, 1L, 2L, 2L, 2L), date = c("02-27-2020", 
"04-21-2021", "06-24-2022", "03-21-2019", "09-19-2020", "01-21-2021"
), practice = c(1L, 0L, 0L, 0L, 0L, 0L)), row.names = c(NA, -6L
), class = "data.frame")
