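For each event, I am trying to count how many earlier events occurred in the previous winter (and in the winters before that) within the same group (class and area).
Here is a simple dataset (my actual dataset has 85,000 records):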
# creating a simple data.frame
class <- c(1,1,1,1,1,2,2,2,2)
area <- c("a", "a","b", "a", "a","b", "a", "a","b" )
event <- as.Date(c("2023-04-01", "2022-12-01", "2022-01-01",
"2021-12-01", "2022-12-01", "2022-12-01",
"2020-04-01", "2022-04-01", "2022-04-01"))
df <- data.frame(class, area, event)
str(df) # checking the structure of the data.frame
df <- df[order(class, area, event),] # sorting the order
df
df$events_in_previous_winter <- c(0,1,1,2,0,0,0,0,0) # this is the desired answer
df
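I have tried counting with dplyr, using group_by(class, area) and mutate, but I could not get it to work.
I define winter as months 12, 1 and 2 (December, January and February).
For each group (each unique pairing of class and area), I want to know how many events occurred in the winter before the event in question.
Any ideas?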
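Answer:
I would use a grouped summary to create a separate previous_winter_events data frame containing the number of events in each winter. You can then get the number of events in the previous winter with dplyr::lag(). (You can also get the events from two winters ago by setting lag(x, n = 2), and so on.) Then use a left join to merge these values back into the original data frame.
This solution uses a winter_year helper column to group each year's December with the following January and February, even though they fall in different calendar years. I use tidyr::complete() to add the years without events to previous_winter_events.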
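library(dplyr)
library(tidyr)
library(lubridate)

# Label each event with its "winter year": shifting the date back two months
# puts December, January and February of one winter in the same calendar year.
event_df <- event_df %>%
  mutate(winter_year = year(event %m-% months(2))) %>%
  arrange(event_class, area, event)

# Count winter events per group and winter, then lag the counts by one winter.
previous_winter_events <- event_df %>%
  complete(event_class, area, winter_year = full_seq(winter_year, 1)) %>%
  group_by(event_class, area, winter_year) %>%
  summarize(
    events_this_winter = sum(month(event) %in% c(12, 1, 2)),
    .groups = "drop_last"
  ) %>%
  mutate(
    events_in_previous_winter = dplyr::lag(events_this_winter, default = 0)
  ) %>%
  ungroup()

# Merge the lagged counts back; the join keys are stated explicitly.
event_df <- event_df %>%
  left_join(previous_winter_events, by = c("event_class", "area", "winter_year")) %>%
  select(!c(winter_year, events_this_winter)) # remove helper columns

event_df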
Output:
   event_class area      event events_in_previous_winter
1            1    a 2018-01-01                         0
2            1    a 2021-12-01                         0
3            1    a 2022-01-31                         0
4            1    a 2022-12-01                         2
5            1    a 2022-12-01                         2
6            1    a 2023-04-01                         2
7            1    b 2022-01-01                         0
8            2    a 2020-04-01                         0
9            2    a 2022-04-01                         0
10           2    b 2022-04-01                         0
11           2    b 2022-12-01                         0
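To see why the winter_year helper groups a December with the following January and February, the date shift can be checked on its own. This is a minimal sketch, assuming lubridate is installed; the `dates` vector is made up for illustration and is not part of the original data.

```r
library(lubridate)

# Hypothetical sample dates covering winter and non-winter months
dates <- as.Date(c("2021-11-15", "2021-12-01", "2022-01-31",
                   "2022-02-28", "2022-03-01", "2023-04-01"))

# Shifting back two months maps Dec/Jan/Feb of one winter onto one calendar year;
# %m-% rolls back to the last valid day when the target day does not exist
winter_year <- year(dates %m-% months(2))

# Side-by-side view: the date, whether it is a winter month, and its winter_year
data.frame(date = dates,
           is_winter = month(dates) %in% c(12, 1, 2),
           winter_year = winter_year)
```

Here the December 2021, January 2022 and February 2022 dates all receive winter_year 2021, while the March 2022 date receives 2022. Lagging by one winter_year therefore always points at the most recently completed winter, for winter and non-winter events alike.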
Data:
# Added a couple additional test cases from OP comments.
# Changed names of `df` and `class` because those are function names in R.
event_df <- data.frame(
  event_class = c(1,1,1,1,1,1,1,2,2,2,2),
  area = c("a", "a","a", "a","b", "a", "a","b", "a", "a","b"),
  event = as.Date(c("2018-01-01", "2023-04-01", "2022-12-01", "2022-01-31",
                    "2022-01-01", "2021-12-01", "2022-12-01", "2022-12-01",
                    "2020-04-01", "2022-04-01", "2022-04-01"))
)
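The lag step, including the lag(x, n = 2) variant mentioned in the answer, can be illustrated on a plain vector. This is a small sketch, assuming dplyr is installed; the counts in `events_this_winter` are hypothetical values for a single group, ordered by winter_year.

```r
library(dplyr)

# Hypothetical per-winter counts for one group, ordered by winter_year
events_this_winter <- c(1, 0, 0, 2, 3)

# Count from one winter earlier; the first winter has no predecessor, so it gets 0
prev1 <- dplyr::lag(events_this_winter, default = 0)

# Count from two winters earlier
prev2 <- dplyr::lag(events_this_winter, n = 2, default = 0)

data.frame(events_this_winter, prev1, prev2)
```

Because lag() shifts by row position rather than by year, the result is only correct when every winter has a row. That is why the answer fills in the missing years with tidyr::complete() before lagging.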
