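I want to subset a data.table by a condition on Year. Basically, for each group I want the rows of dt that match a given year. However, some groups do not have a complete timeline covering every year, so I want to return each group's row for the year closest to the chosen one. That way every group has a row for the chosen year, whether or not it is exactly that year.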
library(data.table)
# make dummy data
dt <- data.table(Group = c(rep("A", 5), rep("B", 3), rep("C", 5), rep("D", 2)),
x = sample(1:10, 15, replace = TRUE), Year = c(2011:2015, 2013:2015, 2011:2015, 2014:2015))
# subset by, e.g., Year == 2015 is fine, but I want a full result for ANY
# year chosen, such as 2012, by using the closest entry in time, per group.
# Attempt;
y <- 2012
dt[Year == which.min(abs(Year - y)), .SD, by = Group]
Empty data.table (0 rows and 3 cols): Group,x,Year
The result for this example should be:
Group x Year
1: A 4 2012
2: B 7 2013
3: C 2 2012
4: D 3 2014
Answer from a uj5u.com user:
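You're close: using which.min(abs(Year - y)) is fine, but it needs to go inside a subset of .SD in the j part. which.min() returns a row position rather than a year, so comparing Year against it in i matches nothing; used as a per-group index into .SD, it picks the closest row in each group.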
dt[, .SD[which.min(abs(Year - y)),], Group]
# Group x Year
# <char> <int> <int>
# 1: A 5 2012
# 2: B 4 2013
# 3: C 8 2012
# 4: D 5 2014
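As a rough illustration of why the original attempt came back empty, and of one alternative way to write the same per-group lookup, here is a minimal sketch. It rebuilds the same dummy data; the names idx and res are made up for this example, and the .I variant is offered as an alternative idiom, not as the method used in the answer above.

```r
library(data.table)

set.seed(42)
dt <- data.table(
  Group = c(rep("A", 5), rep("B", 3), rep("C", 5), rep("D", 2)),
  x     = sample(1:10, 15, replace = TRUE),
  Year  = c(2011:2015, 2013:2015, 2011:2015, 2014:2015)
)
y <- 2012

# which.min() returns a row position, not a year. Over the whole table it
# gives the position of the first row whose Year is closest to y, so
# `Year == which.min(...)` compares years against a small integer.
pos <- dt[, which.min(abs(Year - y))]
empty <- dt[Year == pos]          # no Year equals that position: 0 rows

# Alternative sketch: collect the global row number (.I) of the closest
# row within each group, then subset the table once with those numbers.
idx <- dt[, .I[which.min(abs(Year - y))], by = Group]$V1
res <- dt[idx]
res
```

Note that which.min() keeps only the first minimum, so if a group has two years equally distant from y, the row that appears earlier in the group is returned.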
Reproducible data
set.seed(42)
dt <- data.table(Group = c(rep("A", 5), rep("B", 3), rep("C", 5), rep("D", 2)), x = sample(1:10, 15, replace = TRUE), Year = c(2011:2015, 2013:2015, 2011:2015, 2014:2015))
dt
# Group x Year
# <char> <int> <int>
# 1: A 1 2011
# 2: A 5 2012
# 3: A 1 2013
# 4: A 9 2014
# 5: A 10 2015
# 6: B 4 2013
# 7: B 2 2014
# 8: B 10 2015
# 9: C 1 2011
# 10: C 8 2012
# 11: C 7 2013
# 12: C 4 2014
# 13: C 9 2015
# 14: D 5 2014
# 15: D 4 2015
y <- 2012
