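I want to create a unique index or ID for blocks of time in R.
I have a large dataset with second-level timestamps. In principle, there are gaps in time that let me "group" the readings into blocks and assign each block a unique index or number.
I'll try to create a reproducible example, but keep in mind that the duration of each block varies, the gaps between blocks are not evenly spaced, and a block may run from one day into the next.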
#this is what the dataframe would look like
DateTime
2021-07-12 20:28:26 CDT
2021-07-12 20:28:27 CDT
2021-07-12 20:28:28 CDT
2021-07-12 20:28:29 CDT
2021-07-12 20:28:30 CDT
2021-07-12 23:14:28 CDT
2021-07-12 23:14:29 CDT
2021-07-12 23:14:30 CDT
2021-07-12 23:14:31 CDT
2021-07-12 23:14:32 CDT
2021-07-12 23:14:33 CDT
2021-07-12 23:14:34 CDT
2021-07-12 23:14:35 CDT
2021-07-12 23:14:36 CDT
2021-07-27 17:16:05 CDT
2021-07-27 17:16:06 CDT
2021-07-27 17:16:07 CDT
2021-07-27 17:16:08 CDT
2021-07-27 17:16:09 CDT
2021-07-27 17:16:10 CDT
2021-07-27 17:16:11 CDT
2021-07-27 17:16:12 CDT
2021-07-27 17:16:13 CDT
2021-07-27 17:16:14 CDT
2021-07-27 17:16:15 CDT
#this is for reproducing the times
structure(c(1626139706, 1626139707, 1626139708, 1626139709, 1626139710, 1626149668, 1626149669, 1626149670, 1626149671, 1626149672, 1626149673, 1626149674, 1626149675, 1626149676, 1627424165, 1627424166, 1627424167, 1627424168, 1627424169, 1627424170, 1627424171, 1627424172, 1627424173, 1627424174, 1627424175),
class = c("POSIXct", "POSIXt"), tzone = "")
Again, I want to assign a unique number to each time period/block. The result would look like this:
DateTime Index
2021-07-12 20:28:26 CDT 1
2021-07-12 20:28:27 CDT 1
2021-07-12 20:28:28 CDT 1
2021-07-12 20:28:29 CDT 1
2021-07-12 20:28:30 CDT 1
2021-07-12 23:14:28 CDT 2
2021-07-12 23:14:29 CDT 2
2021-07-12 23:14:30 CDT 2
2021-07-12 23:14:31 CDT 2
2021-07-12 23:14:32 CDT 2
2021-07-12 23:14:33 CDT 2
2021-07-12 23:14:34 CDT 2
2021-07-12 23:14:35 CDT 2
2021-07-12 23:14:36 CDT 2
2021-07-27 17:16:05 CDT 3
2021-07-27 17:16:06 CDT 3
2021-07-27 17:16:07 CDT 3
2021-07-27 17:16:08 CDT 3
2021-07-27 17:16:09 CDT 3
2021-07-27 17:16:10 CDT 3
2021-07-27 17:16:11 CDT 3
2021-07-27 17:16:12 CDT 3
2021-07-27 17:16:13 CDT 3
2021-07-27 17:16:14 CDT 3
2021-07-27 17:16:15 CDT 3
#edit: something like this is a possibility but isn't included in the reproducible example.
DateTime Index
2021-07-15 23:59:59 CDT 4
2021-07-16 00:00:00 CDT 4
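This is the closest thing I found to what I'm looking for: How to create a unique ID for each nighttime period of consecutive dates?
But I'm not sure how to proceed from there. Any help would be greatly appreciated. Thanks.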
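The requirement in the question (blocks of varying length separated by breaks, possibly running past midnight as in the edit) can be sketched in base R without any packages. This is a minimal sketch, assuming that readings more than gap_secs seconds apart belong to different blocks and that the timestamps are already sorted; gap_secs and the sample timestamps are hypothetical, and UTC is used only so the example does not depend on the local time zone.

```r
# Hypothetical readings: one block that crosses midnight, then a later block
x <- as.POSIXct(c("2021-07-15 23:59:58", "2021-07-15 23:59:59",
                  "2021-07-16 00:00:00", "2021-07-16 00:00:01",
                  "2021-07-16 03:00:00", "2021-07-16 03:00:01"),
                tz = "UTC")

gap_secs <- 1                        # assumed threshold between blocks
gaps <- diff(as.numeric(x))          # seconds between consecutive readings
# The first reading always opens block 1; each gap above the threshold opens a new block
Index <- cumsum(c(TRUE, gaps > gap_secs))

data.frame(DateTime = x, Index = Index)
```

Because only the difference between neighbouring readings is compared, the change of date at midnight has no effect: the first four readings share index 1 and the last two get index 2.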
Answer:
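library(dplyr)
# Start a new Index whenever a reading is more than 60 seconds after the previous one
data.frame(DateTime) %>%
  mutate(Index = 1 + cumsum(as.numeric(DateTime - lag(DateTime, default = min(DateTime)),
                                       units = "secs") > 60))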
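This starts a new group every time there is a break of more than one minute. Date-times are stored "behind the scenes" as seconds, so a one-minute gap from the previous ("lagged") value corresponds to a difference of 60. cumsum counts how many large breaks have occurred so far, and adding 1 makes the first block start at 1.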
DateTime Index
1 2021-07-12 18:28:26 1
2 2021-07-12 18:28:27 1
3 2021-07-12 18:28:28 1
4 2021-07-12 18:28:29 1
5 2021-07-12 18:28:30 1
6 2021-07-12 21:14:28 2
7 2021-07-12 21:14:29 2
8 2021-07-12 21:14:30 2
9 2021-07-12 21:14:31 2
10 2021-07-12 21:14:32 2
11 2021-07-12 21:14:33 2
12 2021-07-12 21:14:34 2
13 2021-07-12 21:14:35 2
14 2021-07-12 21:14:36 2
15 2021-07-27 15:16:05 3
16 2021-07-27 15:16:06 3
17 2021-07-27 15:16:07 3
18 2021-07-27 15:16:08 3
19 2021-07-27 15:16:09 3
20 2021-07-27 15:16:10 3
21 2021-07-27 15:16:11 3
22 2021-07-27 15:16:12 3
23 2021-07-27 15:16:13 3
24 2021-07-27 15:16:14 3
25 2021-07-27 15:16:15 3
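To make the point about seconds concrete, here is a minimal base-R sketch of the same computation on the underlying numbers. It rebuilds the question's 25 timestamps from their epoch seconds (the same values as in the question's structure() call, here with tz = "UTC" as an assumption) and uses head() in place of dplyr's lag(); the 60-second threshold is the one used in this answer.

```r
# The question's 25 timestamps, written as three runs of epoch seconds
secs <- c(1626139706 + 0:4, 1626149668 + 0:8, 1627424165 + 0:10)
DateTime <- as.POSIXct(secs, origin = "1970-01-01", tz = "UTC")

s <- as.numeric(DateTime)            # POSIXct is stored as seconds since 1970-01-01 UTC
prev <- c(s[1], head(s, -1))         # lagged values; the first row is compared with itself
big_break <- (s - prev) > 60         # TRUE on the first reading after a break longer than one minute
Index <- 1 + cumsum(big_break)       # running count of big breaks, starting at 1

which(big_break)                     # rows that start a new block: 6 and 15
table(Index)                         # readings per block: 5, 9 and 11
```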
Answer:
If we're looking for an index that increases by 1 each time the minute changes, we can use floor_date:
library(lubridate)
library(tibble)
library(dplyr)
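# Round each timestamp down to its minute, then number the distinct minutes in order of appearance
tibble(DateTime) %>%
  mutate(Index = floor_date(DateTime, unit = "minute"),
         Index = match(Index, unique(Index)))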
Output:
# A tibble: 25 × 2
DateTime Index
<dttm> <int>
1 2021-07-12 21:28:26 1
2 2021-07-12 21:28:27 1
3 2021-07-12 21:28:28 1
4 2021-07-12 21:28:29 1
5 2021-07-12 21:28:30 1
6 2021-07-13 00:14:28 2
7 2021-07-13 00:14:29 2
8 2021-07-13 00:14:30 2
9 2021-07-13 00:14:31 2
10 2021-07-13 00:14:32 2
# … with 15 more rows
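Because this approach buckets readings by calendar minute rather than by the gaps between them, it matches the sample data only because each block there falls inside a single minute; a block that crosses a minute boundary would be split in two. A minimal base-R sketch of the difference, using the midnight case from the question's edit. Here format() stands in for floor_date(), the timestamps are hypothetical, and UTC is an assumption made only to keep the example independent of the local time zone.

```r
# Two readings one second apart that straddle midnight (hypothetical)
x <- as.POSIXct(c("2021-07-15 23:59:59", "2021-07-16 00:00:00"), tz = "UTC")

# Minute buckets, numbered in order of appearance (same idea as floor_date + match)
bucket <- format(x, "%Y-%m-%d %H:%M")
by_minute <- match(bucket, unique(bucket))

# Gap-based index: a new block only when readings are more than 60 s apart
by_gap <- cumsum(c(TRUE, diff(as.numeric(x)) > 60))

data.frame(DateTime = x, by_minute, by_gap)
```

The minute-based index gives the two readings different numbers (1 and 2), while the gap-based index keeps them together (1 and 1).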
Answer:
Here is another approach:
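library(dplyr)
tibble(DateTime) %>%
  # previous timestamp; the first row is compared with itself
  mutate(DateTime1 = lag(DateTime, default = DateTime[1])) %>%
  # gap to the previous reading (a difftime; 0 for the first row)
  mutate(helper = DateTime - DateTime1) %>%
  # any gap other than exactly 1 second starts a new block
  group_by(Index = cumsum(helper != 1)) %>%
  select(-DateTime1, -helper)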
Data:
DateTime <- structure(c(1626139706, 1626139707, 1626139708, 1626139709, 1626139710, 1626149668, 1626149669, 1626149670, 1626149671, 1626149672, 1626149673, 1626149674, 1626149675, 1626149676, 1627424165, 1627424166, 1627424167, 1627424168, 1627424169, 1627424170, 1627424171, 1627424172, 1627424173, 1627424174, 1627424175),
class = c("POSIXct", "POSIXt"), tzone = "")
Output:
DateTime Index
<dttm> <int>
1 2021-07-13 03:28:26 1
2 2021-07-13 03:28:27 1
3 2021-07-13 03:28:28 1
4 2021-07-13 03:28:29 1
5 2021-07-13 03:28:30 1
6 2021-07-13 06:14:28 2
7 2021-07-13 06:14:29 2
8 2021-07-13 06:14:30 2
9 2021-07-13 06:14:31 2
10 2021-07-13 06:14:32 2
11 2021-07-13 06:14:33 2
12 2021-07-13 06:14:34 2
13 2021-07-13 06:14:35 2
14 2021-07-13 06:14:36 2
15 2021-07-28 00:16:05 3
16 2021-07-28 00:16:06 3
17 2021-07-28 00:16:07 3
18 2021-07-28 00:16:08 3
19 2021-07-28 00:16:09 3
20 2021-07-28 00:16:10 3
21 2021-07-28 00:16:11 3
22 2021-07-28 00:16:12 3
23 2021-07-28 00:16:13 3
24 2021-07-28 00:16:14 3
25 2021-07-28 00:16:15 3
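The subtraction DateTime - DateTime1 returns a difftime, whose units R chooses automatically from the smallest absolute difference. In this answer the first difference is 0, so the units come out as seconds and helper != 1 means "not exactly one second apart". With other data the units can change, which silently changes what a comparison such as != 1 or > 60 means. A small sketch with hypothetical readings spaced exactly one minute apart (UTC is an assumption):

```r
# Hypothetical readings exactly one minute apart
x <- as.POSIXct(c("2021-07-12 20:00:00", "2021-07-12 20:01:00",
                  "2021-07-12 20:02:00"), tz = "UTC")

d <- diff(x)                 # difftime with automatically chosen units
units(d)                     # "mins", because every gap is at least 60 s
d != 1                       # FALSE FALSE: compared in minutes, not seconds

# Converting explicitly keeps the comparison in seconds
d_secs <- as.numeric(d, units = "secs")   # 60 60
d_secs != 1                               # TRUE TRUE: each reading starts a new block
```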
