這可能是一個長期的嘗試,但我試圖在某些條件下將 df1 中的值添加到 df2 中。讓我舉例說明我的資料框:
df1
Date Time TimeSlot Behavior BehaviorTime
10.30 1030 Morning Visitors Startle 142
10.30 1030 Morning Visitors Retreat 155
10.30 1030 Morning Visitors Chase 187
10.31 830 Keeper Feeding Startle 133
10.31 830 Keeper Feeding Chase 139
df2
SessionStart ScanTime Val1 Val2 Temp Weather
10/30/21 10:33:42 10:34:42 A 60-70 68 Partly Cloudy
10/30/21 10:33:42 10:35:42 B 70-80 68 Partly Cloudy
10/30/21 10:33:42 10:36:42 A 70-80 68 Partly Cloudy
10/30/21 10:33:42 10:37:42 B 70-80 68 Partly Cloudy
10/30/21 10:33:42 10:38:42 C 70-80 68 Partly Cloudy
10/31/21 08:35:23 08:36:23 A 40-50 77 Sunny
10/31/21 08:35:23 08:37:23 C 90-100 77 Sunny
10/31/21 08:35:23 08:38:23 C 90-100 77 Sunny
10/31/21 08:35:23 08:39:23 C 90-100 77 Sunny
10/31/21 08:35:23 08:40:23 C 90-100 77 Sunny
進一步解釋一下,df1 是一個記錄動物行為的資料框。會議于 10/30/21 (10.30) 大約上午 10:30 (1030) 開始。每次行為發生時,觀察者都會記錄下來。“BehaviorTime”列是觀察者記錄行為的時間(以秒為單位)。因此,df1 的第一次觀察是在觀察會話開始 142 秒時記錄的。
第二個資料框 df2 記錄觀察會話的環境。每個會話大約 30 分鐘,并且每分鐘記錄某些值(每個會話 30 次觀察)。有些值改變了,有些值保持不變。
我想找到一種方法,將 df1 中的“行為”列在其發生期間整合到 df2 中。df1 中的時間是近似值,而 df2 中的時間是準確的。就像我說的,“BehaviorTime”列代表進入會話的秒數。因此,在 10:33:42(從 df2 的觀察開始時間)之后的 142 秒(df1 中的第一次觀察)將是 10:36:24,這意味著“驚嚇”值將包含在 10:36 的 df2 行中:42 因為那個時間最接近計算的時間。本質上,它看起來像這樣:
df3
SessionStart ScanTime Val1 Val2 Temp Weather Behavior
10/30/21 10:33:42 10:34:42 A 60-70 68 Partly Cloudy
10/30/21 10:33:42 10:35:42 B 70-80 68 Partly Cloudy
10/30/21 10:33:42 10:36:42 A 70-80 68 Partly Cloudy Startle
10/30/21 10:33:42 10:36:42 A 70-80 68 Partly Cloudy Retreat
10/30/21 10:33:42 10:37:42 B 70-80 68 Partly Cloudy Chase
10/30/21 10:33:42 10:38:42 C 70-80 68 Partly Cloudy
10/31/21 08:35:23 08:36:23 A 40-50 77 Sunny
10/31/21 08:35:23 08:37:23 C 90-100 77 Sunny Startle
10/31/21 08:35:23 08:37:23 C 90-100 77 Sunny Chase
10/31/21 08:35:23 08:38:23 C 90-100 77 Sunny
10/31/21 08:35:23 08:39:23 C 90-100 77 Sunny
10/31/21 08:35:23 08:40:23 C 90-100 77 Sunny
我真的希望這是有道理的。我在 dplyr 中嘗試了不同的功能,但老實說我什至不知道從哪里開始。任何幫助表示贊賞!
uj5u.com熱心網友回復:
您想要的輸出似乎關閉了。10:30:00 142 秒 == 10:32:12,而不是 10:33:42。
這是一種data.table使用滾動連接的方法
樣本資料
library(data.table)
df1 <- fread('Date Time TimeSlot Behavior BehaviorTime
10.30 1030 "Morning Visitors" Startle 142
10.30 1030 "Morning Visitors" Retreat 155
10.30 1030 "Morning Visitors" Chase 187
10.31 830 "Keeper Feeding" Startle 133
10.31 830 "Keeper Feeding" Chase 139',
colClasses = "character")
df2 <- fread('SessionStart ScanTime Val1 Val2 Temp Weather
"10/30/21 10:33:42" 10:34:42 A 60-70 68 "Partly Cloudy"
"10/30/21 10:33:42" 10:35:42 B 70-80 68 "Partly Cloudy"
"10/30/21 10:33:42" 10:36:42 A 70-80 68 "Partly Cloudy"
"10/30/21 10:33:42" 10:37:42 B 70-80 68 "Partly Cloudy"
"10/30/21 10:33:42" 10:38:42 C 70-80 68 "Partly Cloudy"
"10/31/21 08:35:23" 08:36:23 A 40-50 77 Sunny
"10/31/21 08:35:23" 08:37:23 C 90-100 77 Sunny
"10/31/21 08:35:23" 08:38:23 C 90-100 77 Sunny
"10/31/21 08:35:23" 08:39:23 C 90-100 77 Sunny
"10/31/21 08:35:23" 08:40:23 C 90-100 77 Sunny',
colClasses = "character")
代碼
# Create some actual timestamps to calculate with
df1[, timestamp :=
as.POSIXct(
paste0(Date, ".2021 ", sprintf("d", as.numeric(Time))),
format = "%m.%d.%Y %H%M") as.numeric(BehaviorTime)]
df2[, timestamp :=
as.POSIXct(
paste0(gsub("(.*) .*", "\\1", SessionStart), " ", ScanTime),
format = "%m/%d/%y %H:%M:%S")][]
# Create rowid's for joining later in the process
df2[, rowid := .I][]
# Perform rolling joi to nearest timestamp, then left join
df2[df1, .(timestamp = x.timestamp, Behavior), on = .(timestamp), roll = -Inf][df2, on = .(timestamp)]
# timestamp Behavior SessionStart ScanTime Val1 Val2 Temp Weather
# 1: 2021-10-30 10:34:42 Startle 10/30/21 10:33:42 10:34:42 A 60-70 68 Partly Cloudy
# 2: 2021-10-30 10:34:42 Retreat 10/30/21 10:33:42 10:34:42 A 60-70 68 Partly Cloudy
# 3: 2021-10-30 10:34:42 Chase 10/30/21 10:33:42 10:34:42 A 60-70 68 Partly Cloudy
# 4: 2021-10-30 10:35:42 <NA> 10/30/21 10:33:42 10:35:42 B 70-80 68 Partly Cloudy
# 5: 2021-10-30 10:36:42 <NA> 10/30/21 10:33:42 10:36:42 A 70-80 68 Partly Cloudy
# 6: 2021-10-30 10:37:42 <NA> 10/30/21 10:33:42 10:37:42 B 70-80 68 Partly Cloudy
# 7: 2021-10-30 10:38:42 <NA> 10/30/21 10:33:42 10:38:42 C 70-80 68 Partly Cloudy
# 8: 2021-10-31 08:36:23 Startle 10/31/21 08:35:23 08:36:23 A 40-50 77 Sunny
# 9: 2021-10-31 08:36:23 Chase 10/31/21 08:35:23 08:36:23 A 40-50 77 Sunny
#10: 2021-10-31 08:37:23 <NA> 10/31/21 08:35:23 08:37:23 C 90-100 77 Sunny
#11: 2021-10-31 08:38:23 <NA> 10/31/21 08:35:23 08:38:23 C 90-100 77 Sunny
#12: 2021-10-31 08:39:23 <NA> 10/31/21 08:35:23 08:39:23 C 90-100 77 Sunny
#13: 2021-10-31 08:40:23 <NA> 10/31/21 08:35:23 08:40:23 C 90-100 77 Sunny
轉載請註明出處,本文鏈接:https://www.uj5u.com/qukuanlian/346314.html
上一篇:如何根據多個變數制作條件變數?
