我有一個關于 data.table 的小問題。因為我不太擅長它,所以我不太確定如何在 data.table 中做到這一點。
基本上我有 3 列,想按前兩列(鍵和日期)分組,然后對于每個鍵和每個日期,找到第三列(票價)中出現的最大值和最小值
我試過這樣做,但它給了我一個錯誤
flights[, c("max_day", "min_day") := unlist(lapply(gross_fare, findr)), by = c("key", "created_date")]
Error in `[.data.table`(flights, , `:=`(c("max_day", "min_day"), unlist(lapply(gross_fare, :
Supplied 18 items to be assigned to group 1 of size 9 in column 'max_day'. The RHS length must either be 1 (single values are ok) or match the LHS length exactly. If you wish to 'recycle' the RHS please use rep() explicitly to make this intent clear to readers of your code.
findr 是一個函式,它只找到最大值和最小值,即
findr <- function(x) {list(max = max(x), min = min(x)}
我已經在 dplyr 中完成了我想做的事情,我會附上代碼,但是由于我有數百萬行,dplyr 吃光了我的記憶體,因此 data.table 會有所幫助
test <- flights %>%
select(key, created_date, gross_fare) %>%
group_by(key, created_date) %>%
summarise(
max_day = max(gross_fare),
min_day = min(gross_fare),
diff = max_day - min_day) %>%
arrange(created_date)
如果有人想使用它,我已經把 dput 輸出放在那里了,如果有人可以提供幫助,那就太好了,謝謝:)
data.table::setDT(structure(list(key = c("LHE_KHI_LHE KHI_PA-405_15.0_1", "KHI_ISB_KHI ISB_PK-370_20.0_0",
"LHE_KHI_LHE KHI_PK-307_20.0_0", "ISB_KHI_ISB KHI_PF-124_20.0_1",
"LHE_KHI_LHE KHI_PK-307_20.0_0", "LHE_KHI_LHE KHI_PA-405_15.0_1",
"KHI_LHE_KHI LHE_PK-304_20.0_0", "KHI_ISB_KHI ISB_PA-204_15.0_1",
"ISB_KHI_ISB KHI_PA-207_15.0_1", "KHI_ISB_KHI ISB_PA-200_20.0_1",
"KHI_LHE_KHI LHE_PK-304_40.0_0", "ISB_KHI_ISB KHI_PA-201_35.0_1",
"ISB_KHI_ISB KHI_ER-501_20.0_1", "KHI_LHE_KHI LHE_PF-145_20.0_2",
"KHI_ISB_KHI ISB_PA-204_20.0_1", "LHE_KHI_LHE KHI_PA-401_0.0_0",
"ISB_KHI_ISB KHI_PK-309_40.0_0", "KHI_ISB_KHI ISB_PF-123_20.0_2",
"ISB_KHI_ISB KHI_PA-205_15.0_1", "LHE_KHI_LHE KHI_PF-142_0.0_0",
"ISB_KHI_ISB KHI_PA-223_15.0_1", "ISB_KHI_ISB KHI_PF-126_20.0_2",
"ISB_KHI_ISB KHI_PK-309_20.0_0", "KHI_ISB_KHI ISB_PF-121_20.0_2",
"ISB_KHI_ISB KHI_PK-373_20.0_0", "KHI_LHE_KHI LHE_PF-145_20.0_2",
"KHI_LHE_KHI LHE_PA-402_15.0_1", "LHE_KHI_LHE KHI_PA-407_20.0_1",
"KHI_ISB_KHI ISB_PK-308_40.0_0", "KHI_LHE_KHI LHE_PF-145_20.0_2",
"LHE_KHI_LHE KHI_PF-144_0.0_0", "ISB_KHI_ISB KHI_PK-369_40.0_0",
"ISB_KHI_ISB KHI_PF-124_20.0_2", "KHI_ISB_KHI ISB_PA-204_15.0_1",
"KHI_ISB_KHI ISB_PA-200_15.0_1", "ISB_KHI_ISB KHI_PF-124_20.0_1",
"KHI_ISB_KHI ISB_PK-300_20.0_0", "ISB_KHI_ISB KHI_PF-122_20.0_2",
"KHI_ISB_KHI ISB_PK-368_20.0_0", "KHI_ISB_KHI ISB_PA-204_15.0_1",
"ISB_KHI_ISB KHI_ER-503_20.0_1", "ISB_KHI_ISB KHI_PA-209_15.0_1",
"KHI_ISB_KHI ISB_PK-308_40.0_0", "ISB_KHI_ISB KHI_PF-124_20.0_1",
"ISB_KHI_ISB KHI_PK-301_40.0_0", "KHI_LHE_KHI LHE_PA-408_35.0_1",
"LHE_KHI_LHE KHI_PF-144_20.0_2", "KHI_ISB_KHI ISB_PF-121_20.0_2",
"KHI_ISB_KHI ISB_PA-204_35.0_1", "ISB_KHI_ISB KHI_PK-309_40.0_0",
"ISB_KHI_ISB KHI_PA-223_20.0_1", "KHI_ISB_KHI ISB_PA-206_35.0_1",
"LHE_KHI_LHE KHI_PF-142_32.0_1", "LHE_KHI_LHE KHI_PF-142_20.0_1",
"KHI_ISB_KHI ISB_PF-123_20.0_2", "ISB_KHI_ISB KHI_PA-209_15.0_1",
"KHI_ISB_KHI ISB_PA-204_35.0_1", "ISB_KHI_ISB KHI_PA-201_20.0_1",
"KHI_ISB_KHI ISB_PK-368_20.0_0", "ISB_KHI_ISB KHI_PA-205_20.0_1",
"KHI_ISB_KHI ISB_PF-121_20.0_1", "ISB_KHI_ISB KHI_PF-124_20.0_1",
"ISB_KHI_ISB KHI_PA-205_15.0_1", "KHI_LHE_KHI LHE_PF-145_20.0_2",
"KHI_LHE_KHI LHE_PA-406_35.0_1", "KHI_ISB_KHI ISB_PK-308_20.0_0",
"LHE_KHI_LHE KHI_PA-401_20.0_1", "LHE_KHI_LHE KHI_PA-401_15.0_1",
"KHI_ISB_KHI ISB_PA-204_35.0_1", "KHI_LHE_KHI LHE_PA-406_35.0_1",
"KHI_ISB_KHI ISB_PA-206_35.0_1", "KHI_ISB_KHI ISB_PF-121_20.0_1",
"ISB_KHI_ISB KHI_PA-205_20.0_1", "LHE_KHI_LHE KHI_PF-142_20.0_1",
"LHE_KHI_LHE KHI_PF-146_20.0_2", "LHE_KHI_LHE KHI_PA-401_35.0_1",
"ISB_KHI_ISB KHI_PA-209_15.0_1", "ISB_KHI_ISB KHI_PK-301_40.0_0",
"ISB_KHI_ISB KHI_PA-205_35.0_1", "KHI_LHE_KHI LHE_PA-406_15.0_1",
"KHI_ISB_KHI ISB_PF-123_20.0_1", "ISB_KHI_ISB KHI_PA-201_35.0_1",
"KHI_ISB_KHI ISB_PK-300_40.0_0", "KHI_LHE_KHI LHE_PA-402_35.0_1",
"ISB_KHI_ISB KHI_ER-505_20.0_1", "ISB_KHI_ISB KHI_PF-122_20.0_2",
"ISB_KHI_ISB KHI_PA-207_15.0_1", "KHI_LHE_KHI LHE_PA-404_35.0_1",
"KHI_ISB_KHI ISB_PF-123_20.0_1", "ISB_KHI_ISB KHI_ER-503_20.0_1",
"ISB_GIL_ISB GIL_PK-605_20.0_0", "KHI_ISB_KHI ISB_PF-123_20.0_1",
"KHI_ISB_KHI ISB_PA-200_15.0_1", "ISB_KHI_ISB KHI_PF-122_20.0_2",
"KHI_LHE_KHI LHE_PA-404_35.0_1", "ISB_KHI_ISB KHI_PF-122_20.0_2",
"PEW_KHI_PEW KHI_PF-152_20.0_1", "LHE_KHI_LHE KHI_PK-303_20.0_0",
"KHI_ISB_KHI ISB_PA-222_35.0_1", "ISB_KHI_ISB KHI_PF-124_20.0_1"
), created_date = c("2021-04-20", "2021-05-27", "2021-02-13",
"2021-08-14", "2021-08-11", "2021-08-21", "2021-01-26", "2021-08-21",
"2021-05-24", "2021-09-15", "2021-06-05", "2021-07-19", "2021-09-29",
"2021-07-02", "2021-08-10", "2021-01-04", "2021-07-15", "2021-07-14",
"2021-08-13", "2021-01-11", "2021-09-13", "2021-09-20", "2021-05-27",
"2021-02-20", "2021-08-15", "2021-07-27", "2021-08-26", "2021-09-15",
"2021-08-02", "2021-06-25", "2021-05-15", "2021-08-26", "2021-07-30",
"2021-06-27", "2021-08-07", "2021-03-19", "2021-03-02", "2021-06-06",
"2021-08-15", "2021-06-27", "2021-09-19", "2021-07-28", "2021-08-09",
"2021-08-16", "2021-09-09", "2021-06-04", "2021-08-12", "2021-05-15",
"2021-07-26", "2021-05-27", "2021-08-12", "2021-08-02", "2021-01-26",
"2021-04-20", "2021-08-26", "2021-08-26", "2021-03-21", "2021-01-09",
"2021-04-23", "2021-01-04", "2021-08-13", "2021-06-22", "2021-05-31",
"2021-08-18", "2021-06-16", "2021-08-14", "2021-08-10", "2021-06-16",
"2021-04-08", "2021-05-20", "2021-06-22", "2021-04-20", "2021-01-05",
"2021-02-27", "2021-07-07", "2021-03-26", "2021-08-16", "2021-05-01",
"2021-07-31", "2021-06-14", "2021-06-16", "2021-03-25", "2021-09-14",
"2021-06-06", "2021-09-02", "2021-08-06", "2021-07-18", "2021-02-28",
"2021-04-28", "2021-09-19", "2021-08-25", "2021-06-17", "2021-06-07",
"2021-06-17", "2021-07-07", "2021-08-23", "2021-07-09", "2021-07-19",
"2021-07-14", "2021-05-21"), gross_fare = c(7796, 7427, 11504,
6870, 6580, 14945, 8697, 7524, 7124, 6785, 11858, 7524, 11500,
9525, 6785, 8739, 8200, 13560, 9045, 7400, 7524, 12500, 7458,
14000, 6570, 9525, 6220, 10545, 8310, 7900, 7820, 8410, 11285,
19892, 6810, 9800, 11441, 11900, 6570, 13592, 11500, 8300, 20380,
8525, 7340, 9707, 7870, 10655, 10545, 11798, 14645, 10545, 8650,
8650, 7870, 12945, 10799, 10227, 6765, 10227, 20120, 11045, 9403,
7870, 7124, 6570, 6810, 6531, 8605, 7124, 11072, 7390, 10227,
13435, 10530, 12280, 18945, 11147, 10545, 6531, 6620, 10799,
18480, 32702, 5606, 13560, 23895, 8027, 9655, 11500, 11990, 6620,
9403, 7620, 14645, 19105, 9000, 6440, 12645, 8025)), row.names = c(NA,
-100L), class = c("data.table", "data.frame")))
uj5u.com熱心網友回復:
我想這行代碼應該可以完成這項作業:
library(data.table)
flights[, .(min_day = min(gross_fare), max_day = max(gross_fare), diff = max(gross_fare) - min(gross_fare)), by = .(key, created_date)][]
uj5u.com熱心網友回復:
由于該函式findr回傳一個串列,因此無需將事情復雜化:
findr <- function(x) {list(max = max(x), min = min(x))}
flights[, c("max_day", "min_day") := findr(gross_fare), by = list(key, created_date)][]
為了還回的區別max和min,使用
findr2 <- function(x) {
list(max = max(x), min = min(x), diff = diff(range(x)))
}
flights[, c("max_day", "min_day", "diff_day") := findr2(gross_fare), by = list(key, created_date)][]
轉載請註明出處,本文鏈接:https://www.uj5u.com/net/338652.html
