Row IDs within grouped data, ordered by a specific column

Here is a reproducible dataset:

id<- c("U1", "U2", "U3", "U2", "U5", "U5")
date<- c("2020-02-01", "2020-05-06", "2020-04-01", "2020-07-09", "2020-11-01", "2020-12-01")
result<- c(1:6)
dt<- data.frame(id, date, result)

> dt
  id       date result
1 U1 2020-02-01      1
2 U2 2020-05-06      2
3 U3 2020-04-01      3
4 U2 2020-07-09      4
5 U5 2020-11-01      5
6 U5 2020-12-01      6

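I would like to create a loop (or there may be another way) that looks at each unique ID and its test dates, and adds a new column telling me which test number each row is, ordered by test date. The output would look like this: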

  id       date result     type
1 U1 2020-02-01      1 Result 1
2 U2 2020-05-06      2 Result 1
3 U3 2020-04-01      3 Result 1
4 U2 2020-07-09      4 Result 2
5 U5 2020-11-01      5 Result 1
6 U5 2020-12-01      6 Result 2

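U2 has two results, ordered by test date, and U5 has two results, ordered by test date. As a bonus question, I would also like to find the time difference between the successive tests for each unique ID, again as a separate column. It would look like this: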

  id       date result     type       time
1 U1 2020-02-01      1 Result 1 First Test
2 U2 2020-05-06      2 Result 1 First Test
3 U3 2020-04-01      3 Result 1 First Test
4 U2 2020-07-09      4 Result 2    64 Days
5 U5 2020-11-01      5 Result 1 First Test
6 U5 2020-12-01      6 Result 2    30 Days

Answer from a uj5u.com user:

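We can sort by date with arrange(), group with group_by(id), and number the rows within each group using row_number(). It is always advisable to convert dates to a proper Date class column before doing any transformation.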

library(dplyr)
library(glue)

dt %>%
    mutate(date = as.Date(date)) %>%
    group_by(id) %>%
    arrange(date) %>%
    mutate(type = glue("Result {row_number()}")) %>%
    ungroup()

# A tibble: 6 × 4
  id    date       result type    
  <chr> <date>      <int> <glue>  
1 U1    2020-02-01      1 Result 1
2 U3    2020-04-01      3 Result 1
3 U2    2020-05-06      2 Result 1
4 U2    2020-07-09      4 Result 2
5 U5    2020-11-01      5 Result 1
6 U5    2020-12-01      6 Result 2
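
The result above comes back sorted by date, whereas the table in the question keeps the original row order. If that order matters, one possible alternative is to rank the dates within each id instead of sorting the whole data frame. The following is a minimal base R sketch of that idea, not part of the original answer; the name test_no is only illustrative.

```r
# Rebuild the question's data so the sketch is self-contained.
dt <- data.frame(
  id     = c("U1", "U2", "U3", "U2", "U5", "U5"),
  date   = as.Date(c("2020-02-01", "2020-05-06", "2020-04-01",
                     "2020-07-09", "2020-11-01", "2020-12-01")),
  result = 1:6
)

# Rank each date within its id; equal dates keep their original order.
# ave() applies the function per group and returns values in row order.
test_no <- ave(as.numeric(dt$date), dt$id,
               FUN = function(d) rank(d, ties.method = "first"))

# Build the label column without reordering the rows.
dt$type <- paste("Result", test_no)
dt
```

Because ave() returns its values in the original row order, the rows stay where they were and only the new column is added.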

For the bonus question, a simple subtraction works: date - first(date) gives the time elapsed since each id's first test:

dt %>%
    mutate(date = as.Date(date)) %>%
    group_by(id) %>%
    arrange(date) %>%
    mutate(type = glue("Result {row_number()}"),
           time = date - first(date)) %>%
    ungroup()

# A tibble: 6 × 5
  id    date       result type     time   
  <chr> <date>      <int> <glue>   <drtn> 
1 U1    2020-02-01      1 Result 1  0 days
2 U3    2020-04-01      3 Result 1  0 days
3 U2    2020-05-06      2 Result 1  0 days
4 U2    2020-07-09      4 Result 2 64 days
5 U5    2020-11-01      5 Result 1  0 days
6 U5    2020-12-01      6 Result 2 30 days
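
Note that date - first(date) measures the time since the first test. With at most two tests per id, as in this data, that equals the gap between consecutive tests, but the two would differ for an id with three or more tests. A possible variant, assuming the gap to the previous test is what is wanted, uses dplyr's lag() and labels the first row of each id as "First Test", as in the question. The names gap_days and out are only illustrative.

```r
library(dplyr)

# Rebuild the question's data so the sketch is self-contained.
dt <- data.frame(
  id     = c("U1", "U2", "U3", "U2", "U5", "U5"),
  date   = c("2020-02-01", "2020-05-06", "2020-04-01",
             "2020-07-09", "2020-11-01", "2020-12-01"),
  result = 1:6
)

out <- dt %>%
  mutate(date = as.Date(date)) %>%
  arrange(id, date) %>%            # sort by date within each id
  group_by(id) %>%
  mutate(
    type = paste("Result", row_number()),
    # Days since the previous test of the same id; NA for the first test.
    gap_days = as.numeric(difftime(date, lag(date), units = "days")),
    time = if_else(is.na(gap_days), "First Test", paste(gap_days, "Days"))
  ) %>%
  ungroup()

out
```

Keeping the numeric gap_days column alongside the text label makes it possible to do further arithmetic on the gaps, which the character column time would not allow.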
