I want to generate a random day of the month based on the month of the year. My current code is:
df$new_day = case_when(
  df$new_month == 2 ~ (floor(runif(1, min=1, max=28))),
  df$new_month == 1 ~ floor(runif(1, min=1, max=31)),
  df$new_month == 3 ~ floor(runif(1, min=1, max=31)),
  df$new_month == 5 ~ floor(runif(1, min=1, max=31)),
  df$new_month == 7 ~ floor(runif(1, min=1, max=31)),
  df$new_month == 8 ~ floor(runif(1, min=1, max=31)),
  df$new_month == 10 ~ floor(runif(1, min=1, max=31)),
  df$new_month == 12 ~ floor(runif(1, min=1, max=31)),
  TRUE ~ floor(runif(1, min=1, max=30))
)
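However, every row for a given month gets the same day. For example, every date in February comes out as the 23rd.
How can I truly randomize the day within each month?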
A helpful uj5u.com user replied:
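You are explicitly asking for a single random number each time with runif(1, ...). Use runif(n(), ...) instead. Keep in mind that the right-hand side is not called once per row; it is evaluated once, and the result is applied to every row that satisfies the condition. In the example below, May has three rows, but runif is called as runif(1, ...), so that single number is used for all three rows.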
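Sample data:
library(dplyr)  # needed for %>%, arrange() and mutate()
set.seed(42)
# ten random dates in 2022, sorted, with the month number (1-12) extracted
df <- data.frame(day = as.Date("2022-01-01") + sample(364, size=10)) %>%
  arrange(day) %>%
  mutate(month = as.POSIXlt(day)$mon + 1L)  # POSIXlt months are 0-based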
df
# day month
# 1 2022-02-19 2
# 2 2022-03-16 3
# 3 2022-05-03 5
# 4 2022-05-09 5
# 5 2022-05-27 5
# 6 2022-06-03 6
# 7 2022-08-17 8
# 8 2022-10-31 10
# 9 2022-11-18 11
# 10 2022-12-31 12
Broken:
library(dplyr)
set.seed(42)
df %>%
  mutate(
    new_day = case_when(
      month == 2 ~ floor(runif(1, 1, 28)),
      month %in% c(9, 4, 6, 11) ~ floor(runif(1, 1, 30)),
      TRUE ~ floor(runif(1, 1, 31))
    )
  )
# day month new_day
# 1 2022-02-19 2 25
# 2 2022-03-16 3 9
# 3 2022-05-03 5 9
# 4 2022-05-09 5 9
# 5 2022-05-27 5 9
# 6 2022-06-03 6 28
# 7 2022-08-17 8 9
# 8 2022-10-31 10 9
# 9 2022-11-18 11 28
# 10 2022-12-31 12 9
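To show that runif is called only once per condition rather than once per matching row, I'll add a message call to each branch. If runif(1, ...) were evaluated once per matching row, we would see "31d" printed to the console seven times and "30d" twice, but we don't.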
set.seed(42)
df %>%
  mutate(
    new_day = case_when(
      month == 2 ~ { message("Feb: ", length(month)); floor(runif(1, 1, 28)); },
      month %in% c(9, 4, 6, 11) ~ { message("30d: ", length(month)); floor(runif(1, 1, 30)); },
      TRUE ~ { message("31d: ", length(month)); floor(runif(1, 1, 31)); }
    )
  )
# Feb: 10
# 30d: 10
# 31d: 10
# day month new_day
# 1 2022-02-19 2 25
# 2 2022-03-16 3 9
# 3 2022-05-03 5 9
# 4 2022-05-09 5 9
# 5 2022-05-27 5 9
# 6 2022-06-03 6 28
# 7 2022-08-17 8 9
# 8 2022-10-31 10 9
# 9 2022-11-18 11 28
# 10 2022-12-31 12 9
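This shows that the RHS of each condition is evaluated once, over all rows of the frame. Note that every time runif is called, it sees all values of month (df has 10 rows).
Instead, use n() (the number of rows in the current group, which here is the whole frame):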
set.seed(42)
df %>%
  mutate(
    new_day = case_when(
      month == 2 ~ floor(runif(n(), 1, 28)),
      month %in% c(9, 4, 6, 11) ~ floor(runif(n(), 1, 30)),
      TRUE ~ floor(runif(n(), 1, 31))
    )
  )
# day month new_day
# 1 2022-02-19 2 25
# 2 2022-03-16 3 5
# 3 2022-05-03 5 30
# 4 2022-05-09 5 29
# 5 2022-05-27 5 3
# 6 2022-06-03 6 28
# 7 2022-08-17 8 12
# 8 2022-10-31 10 28
# 9 2022-11-18 11 14
# 10 2022-12-31 12 26
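For contrast, here is a minimal base R sketch of what a genuine one-draw-per-row approach could look like. It is not part of the answer above: the lookup vector month_lengths is a hypothetical helper that assumes a non-leap year, and the month vector simply repeats the months from the sample data.

```r
# Hypothetical lookup table: days per month in a non-leap year (an assumption).
month_lengths <- c(31L, 28L, 31L, 30L, 31L, 30L, 31L, 31L, 30L, 31L, 30L, 31L)

set.seed(42)
# Months taken from the sample data above.
month <- c(2L, 3L, 5L, 5L, 5L, 6L, 8L, 10L, 11L, 12L)

# Largest valid day for each row.
max_day <- month_lengths[month]

# One sample.int() call per row: each row gets its own draw from 1..max_day.
new_day <- vapply(max_day, function(m) sample.int(m, 1L), integer(1))

data.frame(month, max_day, new_day)
```

Because sample.int(m, 1L) draws an integer from 1 to m inclusive, the last day of each month can be selected, which floor(runif(n, 1, m)) does not allow.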
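A helpful uj5u.com user replied:
Sample from a seq.Date sequence built from the components stored in a POSIXlt object. We can easily set the day to 1 and advance to the next month (then subtract one day) to get the first and last day of the month. This automatically accounts for leap years and the like.
f <- \(x) {
  # POSIXlt stores year as years since 1900 and mon as 0-11
  sample(with(as.POSIXlt(x),
              # all days from the first to the last day of x's month
              seq.Date(as.Date(ISOdate(year + 1900, mon + 1, 1, 0)),
                       as.Date(ISOdate(year + 1900, mon + 2, 1, 0)) - 1, 'day')),
         1)
}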
res <- transform(df, new_date=do.call(c, lapply(df$date, f)))
res
# x date new_date
# 1 0.9148060 2021-06-17 2021-06-22
# 2 0.9370754 2022-08-13 2022-08-18
# 3 0.2861395 2020-08-23 2020-08-13
# 4 0.8304476 2022-07-30 2022-07-28
# 5 0.6417455 2021-07-20 2021-07-05
# 6 0.5190959 2021-09-23 2021-09-04
# 7 0.7365883 2020-09-12 2020-09-02
# 8 0.1346666 2022-05-20 2022-05-24
# 9 0.6569923 2021-05-09 2021-05-18
# 10 0.7050648 2019-09-16 2019-09-03
# 11 0.4577418 2022-08-30 2022-08-24
# 12 0.7191123 2020-04-25 2020-04-23
# 13 0.9346722 2022-08-14 2022-08-17
# 14 0.2554288 2019-01-24 2019-01-21
# 15 0.4622928 2022-03-27 2022-03-26
# 16 0.9400145 2019-10-26 2019-10-18
# 17 0.9782264 2020-02-10 2020-02-06
# 18 0.1174874 2019-11-10 2019-11-06
# 19 0.4749971 2022-08-08 2022-08-02
# 20 0.5603327 2021-04-15 2021-04-20
I'm not sure whether you want dates or numbers. If you want the new month and day as numbers, you can do this:
within(res, {
  new_date <- do.call(c, lapply(df$date, f))
  month <- strftime(new_date, '%m')
  day <- strftime(new_date, '%d')
}) |>
  type.convert(as.is=TRUE)
# x date new_date day month
# 1 0.9148060 2021-06-17 2021-06-03 3 6
# 2 0.9370754 2022-08-13 2022-08-22 22 8
# 3 0.2861395 2020-08-23 2020-08-21 21 8
# 4 0.8304476 2022-07-30 2022-07-02 2 7
# 5 0.6417455 2021-07-20 2021-07-23 23 7
# 6 0.5190959 2021-09-23 2021-09-06 6 9
# 7 0.7365883 2020-09-12 2020-09-26 26 9
# 8 0.1346666 2022-05-20 2022-05-10 10 5
# 9 0.6569923 2021-05-09 2021-05-08 8 5
# 10 0.7050648 2019-09-16 2019-09-05 5 9
# 11 0.4577418 2022-08-30 2022-08-01 1 8
# 12 0.7191123 2020-04-25 2020-04-17 17 4
# 13 0.9346722 2022-08-14 2022-08-07 7 8
# 14 0.2554288 2019-01-24 2019-01-04 4 1
# 15 0.4622928 2022-03-27 2022-03-13 13 3
# 16 0.9400145 2019-10-26 2019-10-10 10 10
# 17 0.9782264 2020-02-10 2020-02-09 9 2
# 18 0.1174874 2019-11-10 2019-11-29 29 11
# 19 0.4749971 2022-08-08 2022-08-12 12 8
# 20 0.5603327 2021-04-15 2021-04-20 20 4
Data:
df <- structure(list(x = c(0.914806043496355, 0.937075413297862, 0.286139534786344,
0.830447626067325, 0.641745518893003, 0.519095949130133, 0.736588314641267,
0.13466659723781, 0.656992290401831, 0.705064784036949, 0.45774177624844,
0.719112251652405, 0.934672247152776, 0.255428824340925, 0.462292822543532,
0.940014522755519, 0.978226428385824, 0.117487361654639, 0.474997081561014,
0.560332746244967), date = structure(c(18795, 19217, 18497, 19203,
18828, 18893, 18517, 19132, 18756, 18155, 19234, 18377, 19218,
17920, 19078, 18195, 18302, 18210, 19212, 18732), class = "Date")), class = "data.frame", row.names = c(NA,
-20L))
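The same idea (first day of the month plus a random offset) can also be written in vectorized form, without lapply. The sketch below is an illustration under assumptions, not the answer's own method: the helper names days_in_month_of and random_date_in_month are hypothetical. It finds the first day of the following month by adding 31 days to the first of the month and truncating again, so December dates never require passing a month value of 13.

```r
# Hypothetical helper: number of days in the month that each date falls in.
days_in_month_of <- function(x) {
  first <- as.Date(format(x, "%Y-%m-01"))
  # first + 31 always lands in the following month; truncate to its first day
  next_first <- as.Date(format(first + 31, "%Y-%m-01"))
  as.integer(next_first - first)
}

# Hypothetical helper: one random date per element, within the same month.
random_date_in_month <- function(x) {
  first <- as.Date(format(x, "%Y-%m-01"))
  n_days <- days_in_month_of(x)
  # runif() is vectorized over max; floor() yields offsets 0 .. n_days - 1
  first + floor(runif(length(x), min = 0, max = n_days))
}

set.seed(42)
x <- as.Date(c("2020-02-10", "2019-12-16", "2021-04-15"))
days_in_month_of(x)
random_date_in_month(x)
```

Leap years are handled by the date arithmetic itself, since the month length is computed as the difference between two first-of-month dates.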
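A helpful uj5u.com user replied:
You can create a small helper function that returns the number of days in each month.
library(dplyr)  # for case_when()
month_days <- function(x) case_when(
  x == 2 ~ 28,
  x %in% c(1,3,5,7,8,10,12) ~ 31,  # December also has 31 days
  TRUE ~ 30
)
Then you can use the fact that runif is vectorized over max= to get all the values at once. Note that because you are applying floor(), you need to add 1 to the maximum so that the last day of the month has a chance of being drawn.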
set.seed(22)
# test data
N <- 50
dd <- data.frame(new_month = sample(1:12, N, replace=TRUE))
# max is a vector, so each row gets its own upper bound (days in month + 1)
dd$new_day <- floor( runif( length(dd$new_month), min=1, max=month_days(dd$new_month) + 1 ) )
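The effect of adding 1 to the maximum can be checked empirically. The following is a small illustrative sketch in base R; the variable names and the sample size are arbitrary choices, and 28 stands in for February.

```r
set.seed(1)
n <- 1e5  # enough draws that every attainable day shows up

# Upper bound equal to the last day: floor() can never return 28,
# because runif() draws values strictly below max.
without_plus_one <- floor(runif(n, min = 1, max = 28))

# Upper bound one past the last day: 28 becomes attainable.
with_plus_one <- floor(runif(n, min = 1, max = 28 + 1))

range(without_plus_one)
range(with_plus_one)
```

With the extra 1, each day from 1 to 28 corresponds to an interval of equal width, so all days are equally likely.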
