I have a dataset that looks like this:
Date,Open,High,Low,Close,Adjusted_close,Volume
2020-10-28,1384,1384,1384,1384,1384,0
2020-10-29,1297,1297,1297,1297,1297,0
2020-10-30,1283,1283,1283,1283,1283,0
2020-11-02,1284,1284,1284,1284,1284,0
2020-11-03,1263,1263,1263,1263,1263,0
2020-11-04,1224,1224,1224,1224,1224,0
2020-11-05,1194,1194,1194,1194,1194,0
2020-11-06,1196,1196,1196,1196,1196,0
2020-11-09,1207,1207,1207,1207,1207,0
2020-11-10,1200,1200,1200,1200,1200,0
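I want to fill in rows for 10-31 and 11-01 that carry the values of the previous trading day (10-30). How can this be done easily in R? I have a feeling that library(tidyr) fits the bill here.
The expected output would be: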
Date,Open,High,Low,Close,Adjusted_close,Volume
2020-10-28,1384,1384,1384,1384,1384,0
2020-10-29,1297,1297,1297,1297,1297,0
2020-10-30,1283,1283,1283,1283,1283,0
2020-10-31,1283,1283,1283,1283,1283,0
2020-11-01,1283,1283,1283,1283,1283,0
2020-11-02,1284,1284,1284,1284,1284,0
2020-11-03,1263,1263,1263,1263,1263,0
2020-11-04,1224,1224,1224,1224,1224,0
2020-11-05,1194,1194,1194,1194,1194,0
2020-11-06,1196,1196,1196,1196,1196,0
2020-11-07,1196,1196,1196,1196,1196,0
2020-11-08,1196,1196,1196,1196,1196,0
2020-11-09,1207,1207,1207,1207,1207,0
2020-11-10,1200,1200,1200,1200,1200,0
The requested dput output:
structure(list(Date = c("2020-10-28", "2020-10-29", "2020-10-30",
"2020-11-02", "2020-11-03", "2020-11-04", "2020-11-05", "2020-11-06",
"2020-11-09", "2020-11-10"), Open = c(1384L, 1297L, 1283L, 1284L,
1263L, 1224L, 1194L, 1196L, 1207L, 1200L), High = c(1384L, 1297L,
1283L, 1284L, 1263L, 1224L, 1194L, 1196L, 1207L, 1200L), Low = c(1384L,
1297L, 1283L, 1284L, 1263L, 1224L, 1194L, 1196L, 1207L, 1200L
), Close = c(1384L, 1297L, 1283L, 1284L, 1263L, 1224L, 1194L,
1196L, 1207L, 1200L), Adjusted_close = c(1384L, 1297L, 1283L,
1284L, 1263L, 1224L, 1194L, 1196L, 1207L, 1200L), Volume = c(0L,
0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)), row.names = c(NA, 10L), class = "data.frame")
A uj5u.com user replied:
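First, the Date column must be of class Date:
library(dplyr)
library(tidyr)
df$Date <- as.Date(df$Date)
df %>%
  full_join(data.frame(Date = seq(min(df$Date), max(df$Date), by = "days")), by = "Date") %>%
  arrange(Date) %>%
  fill(everything())
The data is joined with a data frame that holds the complete sequence of dates in its range, the result is sorted by date, and fill() then carries the previous values into the new rows.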
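Since the question mentions tidyr, the same idea can also be sketched with tidyr::complete(), which adds the missing rows without an explicit join. This is an alternative technique, not the code from the answer above, and the three-row data frame `prices` is a made-up excerpt of the question's data.

```r
library(dplyr)
library(tidyr)

# Hypothetical excerpt of the question's data (two columns only).
prices <- data.frame(
  Date  = c("2020-10-30", "2020-11-02", "2020-11-03"),
  Close = c(1283L, 1284L, 1263L)
)

filled <- prices %>%
  mutate(Date = as.Date(Date)) %>%
  # Add one row per calendar day in the observed range; new rows hold NA.
  complete(Date = seq(min(Date), max(Date), by = "day")) %>%
  # Carry the last observed values forward into the NA rows.
  fill(everything(), .direction = "down")

print(filled)
```

The weekend rows (10-31 and 11-01) should come out with the 10-30 close, as requested in the question.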
A uj5u.com user replied:
Solution
Here is a solution in the tidyverse, which uses the tidyr::fill() function to fill in values from earlier rows:
library(tidyverse)
# ...
# Code to generate 'my_data'.
# ...
my_data %>%
  # Ensure 'Date' column is proper datatype.
  mutate(Date = as.Date(Date)) %>%
  # Link to full range of dates, with blank rows for missing dates.
  right_join(
    # A temporary dataset with the full range of 'Date's.
    tibble(Date = seq(from = min(.$Date), to = max(.$Date), by = "days")),
    by = "Date"
  ) %>%
  # Sort for filling: earlier above later.
  arrange(Date) %>%
  # Fill blank rows with values above.
  fill(everything(), .direction = "down")
Result
Given a my_data data.frame like the one reproduced here:
my_data <- structure(
  list(
    Date = c(
      "2020-10-28", "2020-10-29", "2020-10-30", "2020-11-02", "2020-11-03",
      "2020-11-04", "2020-11-05", "2020-11-06", "2020-11-09", "2020-11-10"
    ),
    Open = c(
      1384L, 1297L, 1283L, 1284L, 1263L, 1224L, 1194L, 1196L, 1207L, 1200L
    ),
    High = c(
      1384L, 1297L, 1283L, 1284L, 1263L, 1224L, 1194L, 1196L, 1207L, 1200L
    ),
    Low = c(
      1384L, 1297L, 1283L, 1284L, 1263L, 1224L, 1194L, 1196L, 1207L, 1200L
    ),
    Close = c(
      1384L, 1297L, 1283L, 1284L, 1263L, 1224L, 1194L, 1196L, 1207L, 1200L
    ),
    Adjusted_close = c(
      1384L, 1297L, 1283L, 1284L, 1263L, 1224L, 1194L, 1196L, 1207L, 1200L
    ),
    Volume = c(
      0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L
    )
  ),
  row.names = c(NA, 10L),
  class = "data.frame"
)
this solution should produce a data.frame like this:
         Date Open High  Low Close Adjusted_close Volume
1  2020-10-28 1384 1384 1384  1384           1384      0
2  2020-10-29 1297 1297 1297  1297           1297      0
3  2020-10-30 1283 1283 1283  1283           1283      0
4  2020-10-31 1283 1283 1283  1283           1283      0
5  2020-11-01 1283 1283 1283  1283           1283      0
6  2020-11-02 1284 1284 1284  1284           1284      0
7  2020-11-03 1263 1263 1263  1263           1263      0
8  2020-11-04 1224 1224 1224  1224           1224      0
9  2020-11-05 1194 1194 1194  1194           1194      0
10 2020-11-06 1196 1196 1196  1196           1196      0
11 2020-11-07 1196 1196 1196  1196           1196      0
12 2020-11-08 1196 1196 1196  1196           1196      0
13 2020-11-09 1207 1207 1207  1207           1207      0
14 2020-11-10 1200 1200 1200  1200           1200      0
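If adding packages is not an option, the same last-observation-carried-forward result can be sketched in base R. This is a different technique from the tidyverse pipeline above: findInterval() looks up, for every calendar day, the row of the most recent trading day. The sketch assumes the rows are already sorted by Date; `prices`, `all_days`, and `idx` are names made up for this example.

```r
# Hypothetical excerpt of the question's data, already sorted by Date.
prices <- data.frame(
  Date  = as.Date(c("2020-10-30", "2020-11-02", "2020-11-03")),
  Close = c(1283L, 1284L, 1263L)
)

# Every calendar day between the first and last trading day.
all_days <- seq(min(prices$Date), max(prices$Date), by = "day")

# For each calendar day, the index of the latest row dated on or before it.
idx <- findInterval(as.numeric(all_days), as.numeric(prices$Date))

# Repeat those rows, then overwrite Date with the calendar day itself.
filled <- prices[idx, ]
filled$Date <- all_days
rownames(filled) <- NULL

print(filled)
```

Because whole rows are repeated, every column is filled in one step, and no NA values are created along the way.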
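A uj5u.com user replied:
1) Convert to a zoo class series z using read.zoo (which also converts Date to Date class) and then merge all dates with z. Fill in the missing values using na.locf and finally convert back to a data frame using fortify.zoo. If a zoo object is acceptable as the result, omit the fortify.zoo part.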
library(zoo)
z <- read.zoo(dat)
out1 <- merge(z, zoo(, seq(start(z), end(z), "day"))) |>
  na.locf() |>
  fortify.zoo(name = "Date")
# check - target is defined in Note at the end
identical(out1, transform(target, Date = as.Date(Date)))
## [1] TRUE
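2) In this alternative we use the following pipeline. Rather than using merge.zoo, as above, it converts to ts class and back in order to expand the dates.
- Convert dat to zoo class, which also converts the index to Date class.
- Then convert it to ts class. Since that class only supports regularly spaced series, the conversion fills the values corresponding to the missing dates with NA.
- na.locf then fills in those NAs.
- Use fortify.zoo to convert it back to a data frame.
- Since the ts class does not support a Date index, the Date column is just numbers at this point, so convert it back to Date class.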
library(zoo)
out2 <- dat |>
  read.zoo() |>
  as.ts() |>
  na.locf() |>
  fortify.zoo(name = "Date") |>
  transform(Date = as.Date(Date))
# check - target is defined in Note at the end
identical(out2, transform(target, Date = as.Date(Date)))
## [1] TRUE
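To see what na.locf() contributes on its own in both pipelines, here is a minimal sketch on a plain numeric vector rather than a zoo or ts object. The vectors are made up for illustration.

```r
library(zoo)

# Hypothetical closing prices with a two-day weekend gap marked as NA.
x <- c(1283, NA, NA, 1284)

# Each NA is replaced by the most recent non-NA value before it.
filled <- na.locf(x)
print(filled)

# A leading NA has nothing to carry forward; by default it is dropped,
# and na.rm = FALSE keeps it in place instead.
kept <- na.locf(c(NA, 1283, NA), na.rm = FALSE)
print(kept)
```

In the question's data the first date is always a trading day, so the leading-NA case should not arise there.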
Note
The input dat and the expected output target, in reproducible form, are assumed to be:
Lines <- "Date,Open,High,Low,Close,Adjusted_close,Volume
2020-10-28,1384,1384,1384,1384,1384,0
2020-10-29,1297,1297,1297,1297,1297,0
2020-10-30,1283,1283,1283,1283,1283,0
2020-11-02,1284,1284,1284,1284,1284,0
2020-11-03,1263,1263,1263,1263,1263,0
2020-11-04,1224,1224,1224,1224,1224,0
2020-11-05,1194,1194,1194,1194,1194,0
2020-11-06,1196,1196,1196,1196,1196,0
2020-11-09,1207,1207,1207,1207,1207,0
2020-11-10,1200,1200,1200,1200,1200,0"
dat <- read.csv(text = Lines, strip.white = TRUE)
Lines2 <- "Date,Open,High,Low,Close,Adjusted_close,Volume
2020-10-28,1384,1384,1384,1384,1384,0
2020-10-29,1297,1297,1297,1297,1297,0
2020-10-30,1283,1283,1283,1283,1283,0
2020-10-31,1283,1283,1283,1283,1283,0
2020-11-01,1283,1283,1283,1283,1283,0
2020-11-02,1284,1284,1284,1284,1284,0
2020-11-03,1263,1263,1263,1263,1263,0
2020-11-04,1224,1224,1224,1224,1224,0
2020-11-05,1194,1194,1194,1194,1194,0
2020-11-06,1196,1196,1196,1196,1196,0
2020-11-07,1196,1196,1196,1196,1196,0
2020-11-08,1196,1196,1196,1196,1196,0
2020-11-09,1207,1207,1207,1207,1207,0
2020-11-10,1200,1200,1200,1200,1200,0"
target <- read.csv(text = Lines2, strip.white = TRUE)
