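I have a database with 142 columns, one of which is called "Date" (class POSIXct), and I want to create a new column that groups consecutive dates together. Dates that are more than 2 days apart should go into different groups.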

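I'd also like to name the group levels after the month in which each run of consecutive dates begins (e.g., January 3, 2018 -> January 12, 2018 = group level called "January sampling event"; February 27, 2018 -> March 1, 2018 = group level called "February sampling event"; etc.).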

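I've seen very similar questions, such as "Grouping consecutive dates in R" and "R: grouping dates that are next to each other", but I couldn't get them to work with my data.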

Edit: an example of my data (for some reason, the last row shows dates a year apart being grouped together):

> dput(df)
structure(list(Date = structure(c(17534, 17535, 17536, 17537, 
18279, 18280, 18281, 18282, 17932), class = "Date"), group = c(1, 
1, 1, 1, 2, 2, 2, 2, 2)), row.names = c(NA, -9L), class = c("tbl_df", 
"tbl", "data.frame"))

My attempt:

df$group <- 1 + c(0, cumsum(ifelse(diff(df$Date) > 1, 1, 0)))
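The "dates a year apart grouped together" symptom comes from row order: in the example data the last row (2019-02-05) follows 2020-01-21, so `diff()` returns -350 days for it, which is never `> 1`, and the row stays in group 2. A minimal sketch of a fix, assuming it is fine to reorder the rows: sort by date first, then start a new group whenever the gap exceeds 2 days, following the rule stated in the question (use `> 1` to keep the attempt's threshold). The data are rebuilt as a plain `data.frame` so the snippet runs on its own. If the real column is POSIXct, convert it with `as.Date()` first, because `diff()` on POSIXct values returns a `difftime` whose units R picks automatically.

```r
# Example data from the question's dput() output, as a plain data.frame
df <- data.frame(Date = as.Date(c(17534, 17535, 17536, 17537,
                                  18279, 18280, 18281, 18282, 17932),
                                origin = "1970-01-01"))

# Sort by date: diff() only compares each row with the row above it
df <- df[order(df$Date), , drop = FALSE]

# Gap in days to the previous row (0 for the first row)
gap <- c(0, diff(as.numeric(df$Date)))

# A gap of more than 2 days starts a new group; cumsum() numbers the groups
df$group <- cumsum(gap > 2) + 1
df
```

After sorting, 2019-02-05 gets a group of its own (group 2) and the January 2020 run becomes group 3.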

Answer:

Remove the time from the date-time

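Without seeing your data (or similar example data) it's hard to say exactly where the problem lies, but my guess is that the date-time format (the 00:00:00 part) is tripping up as.Date.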

One solution is to extract only the date part and then try again using just that:

# here are your date-times
date_time <- "2018-01-03 00:00:00"

# this matches 4 digits (0-9), a dash, 2 digits, a dash, and 2 digits,
# optionally surrounded by a single space
date_pattern <- " ?([0-9]{4}-[0-9]{2}-[0-9]{2}) ?"

# libraries needed
library(stringr)
library(magrittr) # for the %>% pipe

# this pulls out the text matching date_pattern
date_new <- str_extract(date_time, date_pattern) %>% 
  str_squish()   # this removes surrounding white space

# this is the new date without the time
date_new

# then convert it to a Date
date_new <- as.Date(date_new)

Try converting your date column to a date this way, then rerun your grouping.

If your dates are in a different format and you need to adjust the regular expression, here is some material on regular expressions: https://stackoverflow.com/a/49286794/16502170
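If the column is already POSIXct, as the question says, the regex step may not be needed: base R's `as.Date()` has a POSIXct method that drops the time of day. This is an alternative to the approach above, sketched with hypothetical values. The `tz` argument is set explicitly because the calendar day of a date-time depends on the time zone it is evaluated in.

```r
# Hypothetical POSIXct values, created in UTC so the result does not depend
# on the machine's local time zone
date_time <- as.POSIXct(c("2018-01-03 00:00:00", "2018-01-12 14:30:00"), tz = "UTC")

# as.Date() drops the time of day; tz says which zone defines the calendar day
date_only <- as.Date(date_time, tz = "UTC")

class(date_only)
date_only
```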

Group dates

Let's start with an example data frame that has a date column:

# here's a bunch of example dates:
library(lubridate)
dates2 <- seq.Date(as.Date("2018-03-01"),by="days",length.out = 60)

#here's the dataframe
exampl_df <- data.frame(animals = rep(c("cats","dogs","rabbits"),20), dates=dates2,
                        numbers= rep(1:3,20))

Here's what it looks like:

head(exampl_df)
  animals      dates numbers
1    cats 2018-03-01       1
2    dogs 2018-03-02       2
3 rabbits 2018-03-03       3
4    cats 2018-03-04       1
5    dogs 2018-03-05       2
6 rabbits 2018-03-06       3

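Next, let's build a sequence of every day between the earliest and latest date. This step matters because the data may be missing dates that we still want to count when measuring gaps between days.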

# this is a day by day sequence from the earliest day in your data to the latest day
date_sequence <- seq.Date(from = min(dates2),max(dates2),by="day")

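Then let's make a sequence of numbers in which each number is repeated seven times. If you want a new group every three days instead, change `each` to 3. `length.out = length(date_sequence)` tells R to give this vector as many entries as the min-to-max date sequence: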

# and then if you want a new group every seven days you can make this number sequence
groups <- rep(1:length(date_sequence),each= 7, length.out = length(date_sequence) )
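To see what `each` and `length.out` do together, here is the same kind of `rep()` call on a hypothetical 10-day sequence with `each = 3`: `each` repeats every group number 3 times, and `length.out` then cuts the result to one entry per day, so the last group can be shorter than the others.

```r
# Hypothetical 10-day sequence, grouped into blocks of 3 days
n_days <- 10
groups_small <- rep(seq_len(n_days), each = 3, length.out = n_days)
groups_small
# [1] 1 1 1 2 2 2 3 3 3 4
```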

Then attach these groups to date_sequence to build a grouping index:

date_grouping_index <- data.frame(a=date_sequence,b=groups)

Then you can join the groups onto the original data frame:

library(dplyr)
example_df2 <- exampl_df %>% 
  inner_join(date_grouping_index, by=c("dates"="a"))
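If you would rather not load dplyr for this step, base R's `match()` can do the same lookup. This is a swapped-in alternative, not part of the approach above; it rebuilds the example objects so it runs on its own. Unlike `inner_join()`, which drops rows with no match, `match()` keeps every row and would give `NA` for a date missing from `date_sequence` (none are missing here, since the sequence spans the earliest to the latest date).

```r
# Rebuild the example objects from the steps above
dates2 <- seq.Date(as.Date("2018-03-01"), by = "day", length.out = 60)
exampl_df <- data.frame(animals = rep(c("cats", "dogs", "rabbits"), 20),
                        dates = dates2,
                        numbers = rep(1:3, 20))
date_sequence <- seq.Date(from = min(dates2), to = max(dates2), by = "day")
groups <- rep(seq_along(date_sequence), each = 7,
              length.out = length(date_sequence))

# match() gives each row's position in the day-by-day sequence,
# which is then used to look up that day's group number
exampl_df$b <- groups[match(exampl_df$dates, date_sequence)]
head(exampl_df, n = 10)
```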

Here's what we get:

head(example_df2,n=10)
   animals      dates numbers b
1     cats 2018-03-01       1 1
2     dogs 2018-03-02       2 1
3  rabbits 2018-03-03       3 1
4     cats 2018-03-04       1 1
5     dogs 2018-03-05       2 1
6  rabbits 2018-03-06       3 1
7     cats 2018-03-07       1 1
8     dogs 2018-03-08       2 2
9  rabbits 2018-03-09       3 2
10    cats 2018-03-10       1 2

You should then be able to use column b with group_by() or aggregate() to summarise your data.
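For example, with base R's `aggregate()`, summing the `numbers` column within each 7-day block. The data are rebuilt here so the snippet runs on its own; summing `numbers` is only a placeholder for whatever summary you actually need.

```r
# Rebuild example_df2: 60 days of example data with a 7-day block number b
dates2 <- seq.Date(as.Date("2018-03-01"), by = "day", length.out = 60)
example_df2 <- data.frame(animals = rep(c("cats", "dogs", "rabbits"), 20),
                          dates = dates2,
                          numbers = rep(1:3, 20),
                          b = rep(1:60, each = 7, length.out = 60))

# One row per block: the sum of `numbers` over the days in that block
block_totals <- aggregate(numbers ~ b, data = example_df2, FUN = sum)
block_totals

# dplyr equivalent:
# example_df2 %>% group_by(b) %>% summarise(numbers = sum(numbers))
```

The 60 days give 9 blocks: eight full weeks plus a final 4-day block.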

Using the data provided in the question

# original data
df <- structure(list(Date = structure(c(17534, 17535, 17536, 17537,
                                        18279, 18280, 18281, 18282, 17932),
                                      class = "Date"),
                     group = c(1, 1, 1, 1, 2, 2, 2, 2, 2)),
                row.names = c(NA, -9L),
                class = c("tbl_df", "tbl", "data.frame"))

# plus the extra step from the question
df$group2 <- 1 + c(0, cumsum(ifelse(diff(df$Date) > 1, 1, 0)))

The approach above:

date_sequence <- seq.Date(from = min(df$Date),max(df$Date),by="day")
groups <- rep(1:length(date_sequence),each= 7, length.out = length(date_sequence) )
date_grouping_index <- data.frame(a=date_sequence,groups=groups)

example_df2<- df %>% 
  inner_join(date_grouping_index, by=c("Date"="a"))

Looks like it worked?

example_df2
# A tibble: 9 x 4
  Date       group group2 groups
  <date>     <dbl>  <dbl>  <int>
1 2018-01-03     1      1      1
2 2018-01-04     1      1      1
3 2018-01-05     1      1      1
4 2018-01-06     1      1      1
5 2020-01-18     2      2    107
6 2020-01-19     2      2    107
7 2020-01-20     2      2    107
8 2020-01-21     2      2    107
9 2019-02-05     2      2     57
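One caveat worth checking: the `groups` column counts fixed 7-day blocks from the earliest date, which is not the same as grouping runs of consecutive dates. It works for the question's data because each run there happens to fall inside a single block. The sketch below compares the two on hypothetical dates. The block number is computed with integer division, which gives the same numbering as the `rep(..., each = 7)` index; the run number uses the "new group when the gap exceeds 2 days" rule from the question.

```r
# Hypothetical dates: 2018-01-03 on its own, then a run from 2018-01-07 to 2018-01-11
dates <- as.Date(c("2018-01-03", "2018-01-07", "2018-01-08",
                   "2018-01-09", "2018-01-10", "2018-01-11"))

# Fixed 7-day blocks counted from the earliest date
block <- as.integer(dates - min(dates)) %/% 7L + 1L

# Runs: a gap of more than 2 days to the previous date starts a new group
gap <- c(0, diff(as.numeric(dates)))
run <- cumsum(gap > 2) + 1

data.frame(dates, block, run)
```

Here `block` puts 2018-01-03 together with the start of the next run despite the 4-day gap, and splits that run at 2018-01-10, while `run` keeps the run intact.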

Here's how you can build group names from the month and year:

example_df2$group_name <- paste0("sampling number ",
                                example_df2$groups,
                                " (",
                                month.name[month(example_df2$Date)],
                                "-",
                                year(example_df2$Date),
                                ")")
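The question asks for labels such as "January sampling event", named after the month in which each run starts, whereas the code above labels every row by its own month. A sketch of the start-month version, assuming the sort-then-`cumsum()` grouping rather than the 7-day blocks: `ave()` finds the first date of each group, and the year is included so that runs starting in January of different years get different labels. `month.name` is used instead of `format(..., "%B")` so the month names are English regardless of locale.

```r
# Question's data, sorted, grouped by gaps of more than 2 days
df <- data.frame(Date = as.Date(c(17534, 17535, 17536, 17537,
                                  18279, 18280, 18281, 18282, 17932),
                                origin = "1970-01-01"))
df <- df[order(df$Date), , drop = FALSE]
df$group <- cumsum(c(0, diff(as.numeric(df$Date))) > 2) + 1

# First date of each group, repeated on every row of that group
first_date <- as.Date(ave(as.numeric(df$Date), df$group, FUN = min),
                      origin = "1970-01-01")

# Label from the start month and year, kept as a factor in date order
df$event <- paste(month.name[as.integer(format(first_date, "%m"))],
                  format(first_date, "%Y"), "sampling event")
df$event <- factor(df$event, levels = unique(df$event))
df
```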

Source: https://www.uj5u.com/houduan/461647.html
