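Good evening,
I have a very large dataset in R, and I'm trying to find the best way to loop through it to solve a problem. Think of the data as historical employee work hours. It looks like this: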
rawTable:
Department Name Date Hours
Engineering Mary 2021-01-01 8
Engineering Mary 2021-01-02 8
Engineering Mary 2021-01-03 0
Engineering Mary 2021-01-04 6
Sales Barry 2021-01-01 0
Sales Barry 2021-01-02 12
Sales Barry 2021-01-03 12
Sales Barry 2021-01-04 12
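I have about 3,200 people on the list, with one row per person for each day of the year, so the table is obviously large.
I need to add two columns to the table:
The first is LDO, which shows (for each day) the person's last day off.
The second is WSH, which shows how many hours the person has worked since their last day off. It would look like this: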
rawTable:
Department Name Date Hours LDO WSH
Engineering Mary 2021-01-01 8 2020-12-31 8
Engineering Mary 2021-01-02 8 2020-12-31 16
Engineering Mary 2021-01-03 0 2021-01-03 0
Engineering Mary 2021-01-04 6 2021-01-03 6
Sales Barry 2021-01-01 0 2021-01-01 0
Sales Barry 2021-01-02 12 2021-01-01 12
Sales Barry 2021-01-03 12 2021-01-01 24
Sales Barry 2021-01-04 12 2021-01-01 36
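I tried using a for loop to apply the logic row by row. For each row, if Hours is zero, then LDO = Date and WSH = 0. If not, then LDO = LDO from the previous row and WSH = WSH from the previous row + Hours. With a table this size, it takes forever and a half to run.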
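For reference, the row-by-row rule described above can be sketched roughly as follows. This is only an illustration of the logic, not the asker's actual code: the names ldo, wsh and new_person are made up here, the rows are assumed to be sorted by person and then by date, and the "day before the first record" starting value is inferred from the expected output in the question.

```r
# Small example data matching the table in the question (dates as Date).
rawTable <- data.frame(
  Department = rep(c("Engineering", "Sales"), each = 4),
  Name       = rep(c("Mary", "Barry"), each = 4),
  Date       = rep(as.Date("2021-01-01") + 0:3, times = 2),
  Hours      = c(8, 8, 0, 6, 0, 12, 12, 12)
)

# Pre-allocate the result vectors (hypothetical names).
n   <- nrow(rawTable)
ldo <- rep(as.Date(NA), n)
wsh <- numeric(n)

for (i in seq_len(n)) {
  # TRUE on the first row of each person (assumes rows sorted by Name, Date).
  new_person <- i == 1 || rawTable$Name[i] != rawTable$Name[i - 1]
  if (rawTable$Hours[i] == 0) {
    # Day off: it becomes the last day off and the counter resets.
    ldo[i] <- rawTable$Date[i]
    wsh[i] <- 0
  } else if (new_person) {
    # Assumption: with no earlier record, use the day before the first date.
    ldo[i] <- rawTable$Date[i] - 1
    wsh[i] <- rawTable$Hours[i]
  } else {
    # Working day: carry LDO forward and add today's hours.
    ldo[i] <- ldo[i - 1]
    wsh[i] <- wsh[i - 1] + rawTable$Hours[i]
  }
}
rawTable$LDO <- ldo
rawTable$WSH <- wsh
```

Because every iteration depends on the previous row, a loop like this cannot be vectorized as written, which is part of why it is slow on roughly 3,200 people times 365 days.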
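Next, I created a function that, given a row, uses a copy of the big table and, based on a "which" statement, tells me the row number of the last day on or before that row's date on which the person worked 0 hours. This also took a very long time, and on top of that I haven't even gotten to the WSH part. It looks like this: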
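rawLU <- rawTable  # lookup copy of the full table
# For one row x, return the row number of that person's most recent
# zero-hour day on or before x's date (0 if there is none).
LDO <- function(x) {
  max(c(0, which((rawLU$Name == x["Name"]) &
                 (rawLU$Hours == 0) &
                 (rawLU$Date <= x["Date"]))))
}
LastOff <- apply(rawTable, 1, LDO)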
I know there's a simpler way to do this, and I also know that I can't seem to figure it out.
Can anyone help? Thanks in advance!
Mike
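Answer:
Here is a possible solution using dplyr.
Take the Date value where Hours == 0, then use fill to carry the previous day off down to the other rows. WSH can then be calculated with cumsum.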
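library(dplyr)
library(tidyr)
rawTable %>%
  mutate(Date = as.Date(Date)) %>%
  group_by(Department, Name) %>%
  # keep the date on days off, NA on working days
  mutate(LDO = if_else(Hours == 0, Date, as.Date(NA))) %>%
  # carry the most recent day off down to the following rows
  fill(LDO) %>%
  # no day off seen yet: use the day before the person's first record
  mutate(LDO = if_else(is.na(LDO), min(Date) - 1, LDO)) %>%
  # running total of hours within each person and LDO stretch
  group_by(LDO, .add = TRUE) %>%
  mutate(WSH = cumsum(Hours)) %>%
  ungroup()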
# Department Name Date Hours LDO WSH
# <chr> <chr> <date> <int> <date> <int>
#1 Engineering Mary 2021-01-01 8 2020-12-31 8
#2 Engineering Mary 2021-01-02 8 2020-12-31 16
#3 Engineering Mary 2021-01-03 0 2021-01-03 0
#4 Engineering Mary 2021-01-04 6 2021-01-03 6
#5 Sales Barry 2021-01-01 0 2021-01-01 0
#6 Sales Barry 2021-01-02 12 2021-01-01 12
#7 Sales Barry 2021-01-03 12 2021-01-01 24
#8 Sales Barry 2021-01-04 12 2021-01-01 36
Data
rawTable <- structure(list(Department = c("Engineering", "Engineering", "Engineering",
"Engineering", "Sales", "Sales", "Sales", "Sales"), Name = c("Mary",
"Mary", "Mary", "Mary", "Barry", "Barry", "Barry", "Barry"),
Date = c("2021-01-01", "2021-01-02", "2021-01-03", "2021-01-04",
"2021-01-01", "2021-01-02", "2021-01-03", "2021-01-04"),
Hours = c(8L, 8L, 0L, 6L, 0L, 12L, 12L, 12L)), class = "data.frame", row.names = c(NA, -8L))
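To make the fill step in the answer above more concrete, here is a rough base R sketch of the same "carry the last day off forward" idea for a single person. It swaps in cummax on row indices instead of tidyr's fill, so it is an illustration of the concept rather than what fill does internally. The vector names (dates, hours, off_idx, last_off, ldo) are made up for this example, and the values are Mary's rows from the sample data.

```r
# One person's records, taken from Mary's rows in the example data.
dates <- as.Date("2021-01-01") + 0:3
hours <- c(8, 8, 0, 6)

# Row index on days off, 0 on working days.
off_idx <- ifelse(hours == 0, seq_along(hours), 0L)

# cummax carries the latest day-off index forward to the following rows.
last_off <- cummax(off_idx)

# Default for rows with no earlier day off: the day before the first record.
ldo <- rep(dates[1] - 1, length(dates))

# Where a day off has been seen, look up its date by index.
seen <- last_off > 0
ldo[seen] <- dates[last_off[seen]]

ldo
# [1] "2020-12-31" "2020-12-31" "2021-01-03" "2021-01-03"
```

On the full table this would have to be applied per person, which is what group_by takes care of in the dplyr version.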
Answer:
library(dplyr)
rawTable %>%
  # grp increases by one at every zero-hour row, so each stretch gets its own id
  group_by(Department, Name, grp = cumsum(Hours == 0)) %>%
  mutate(Date = as.Date(Date),
         # a stretch that does not start with a day off gets the day before its first date
         LDO = first(Date) - (first(Hours) > 0),
         WSH = cumsum(Hours))
# A tibble: 8 x 7
# Groups:   Department, Name, grp [3]
#   Department  Name  Date       Hours   grp LDO          WSH
#   <chr>       <chr> <date>     <int> <int> <date>     <int>
# 1 Engineering Mary  2021-01-01     8     0 2020-12-31     8
# 2 Engineering Mary  2021-01-02     8     0 2020-12-31    16
# 3 Engineering Mary  2021-01-03     0     1 2021-01-03     0
# 4 Engineering Mary  2021-01-04     6     1 2021-01-03     6
# 5 Sales       Barry 2021-01-01     0     2 2021-01-01     0
# 6 Sales       Barry 2021-01-02    12     2 2021-01-01    12
# 7 Sales       Barry 2021-01-03    12     2 2021-01-01    24
# 8 Sales       Barry 2021-01-04    12     2 2021-01-01    36
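The key to this answer is the grp counter: cumsum(Hours == 0) goes up by one on every day off, so all rows between two days off share the same value and can be summed as a group. If adding packages is not an option, roughly the same idea can be sketched in base R with ave. This is a hedged sketch, not part of the original answer: the helper names (dept, name, stretch, first_day) are made up, the counter here restarts for each person rather than running over the whole table, and the rows are assumed to be sorted by date within each person.

```r
# Example data from the question, with Date already converted.
rawTable <- data.frame(
  Department = rep(c("Engineering", "Sales"), each = 4),
  Name       = rep(c("Mary", "Barry"), each = 4),
  Date       = rep(as.Date("2021-01-01") + 0:3, times = 2),
  Hours      = c(8L, 8L, 0L, 6L, 0L, 12L, 12L, 12L)
)
dept <- rawTable$Department
name <- rawTable$Name

# Per-person counter that steps up on every day off;
# 0 means no day off has been seen yet for that person.
stretch <- ave(as.integer(rawTable$Hours == 0), dept, name, FUN = cumsum)

# Running total of hours within each person and stretch.
rawTable$WSH <- ave(rawTable$Hours, dept, name, stretch, FUN = cumsum)

# First date of each stretch, computed on the numeric day count.
first_day <- ave(as.numeric(rawTable$Date), dept, name, stretch,
                 FUN = function(d) rep(d[1], length(d)))

# A stretch with no day off yet gets the day before its first date.
rawTable$LDO <- as.Date(first_day, origin = "1970-01-01") -
  as.integer(stretch == 0)
```

Every step is a grouped, vectorized operation, so no row depends on an explicit loop over the previous row.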
