I have a data.table as follows:
data = structure(list(date = c("2021-11-24", "2021-11-24", "2021-11-26",
"2021-11-24", "2021-11-26", "2021-11-24", "2021-11-24", "2021-11-26",
"2021-11-26", "2021-11-26", "2021-11-26"), open = c("NaN", "NaN",
"0.43", "0.17", "0.19", "0.15", "NaN", "NaN", "NaN", "NaN", "NaN"
), high = c("NaN", "NaN", "0.43", "0.17", "0.19", "0.15", "NaN",
"NaN", "NaN", "NaN", "NaN"), low = c("NaN", "NaN", "0.43", "0.17",
"0.19", "0.15", "NaN", "NaN", "NaN", "NaN", "NaN"), close = c("NaN",
"NaN", "0.43", "0.17", "0.19", "0.15", "NaN", "NaN", "NaN", "NaN",
"NaN"), volume = c(0L, 0L, 2L, 10L, 75L, 1L, 0L, 0L, 0L, 0L,
0L)), row.names = c(NA, -11L), class = c("data.table", "data.frame"
))
I want to remove all NaN and Inf values from this data.table.
date open high low close volume
1: 2021-11-24 NaN NaN NaN NaN 0
2: 2021-11-24 NaN NaN NaN NaN 0
3: 2021-11-26 0.43 0.43 0.43 0.43 2
4: 2021-11-24 0.17 0.17 0.17 0.17 10
5: 2021-11-26 0.19 0.19 0.19 0.19 75
6: 2021-11-24 0.15 0.15 0.15 0.15 1
7: 2021-11-24 NaN NaN NaN NaN 0
8: 2021-11-26 NaN NaN NaN NaN 0
9: 2021-11-26 NaN NaN NaN NaN 0
10: 2021-11-26 NaN NaN NaN NaN 0
11: 2021-11-26 NaN NaN NaN NaN 0
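Because of the NaN values, the columns open, high, low, and close are all of type character.
Is there a quick way to remove the NaN rows directly in data.table?
uj5u.com user reply:
You can use as.numeric to convert the columns: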
result = na.omit(cbind(data[, .(date,volume)], data[, lapply(.SD, as.numeric), .SDcols = 2:5]))
Output:
date volume open high low close
1: 2021-11-26 2 0.43 0.43 0.43 0.43
2: 2021-11-24 10 0.17 0.17 0.17 0.17
3: 2021-11-26 75 0.19 0.19 0.19 0.19
4: 2021-11-24 1 0.15 0.15 0.15 0.15
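One caveat for the Inf part of the question: na.omit() treats NaN as missing, but an infinite value is not missing, so a row containing Inf would survive the call above. The following is a minimal sketch on hypothetical toy data (the values and the names dt, converted, after_na_omit and after_finite are illustrative, not from the question) that marks non-finite values as NA before calling na.omit().

```r
library(data.table)

# Toy data (hypothetical values) with one "Inf" entry
dt <- data.table(
  date  = c("2021-11-24", "2021-11-26", "2021-11-26"),
  open  = c("NaN", "Inf", "0.43"),
  close = c("NaN", "0.50", "0.43")
)

price_cols <- c("open", "close")

# Convert the character price columns to numeric, as in the answer above
converted <- cbind(dt[, .(date)],
                   dt[, lapply(.SD, as.numeric), .SDcols = price_cols])

# na.omit() drops the NaN row but keeps the Inf row
after_na_omit <- na.omit(converted)

# Replacing every non-finite value with NA first lets na.omit() drop both
for (col in price_cols) {
  set(converted, i = which(!is.finite(converted[[col]])), j = col,
      value = NA_real_)
}
after_finite <- na.omit(converted)
```

Here after_na_omit still contains the row whose open is Inf, while after_finite keeps only the fully finite row.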
uj5u.com user reply:
With dplyr, the best strategy is to get the indices of the rows containing NaN and then filter those rows out.
library(dplyr)
data$Row <- row.names(data)
rm_rw <- data[apply(data, 1,
function(X) any(X== "NaN"|X== "Inf")),] %>% pull(Row)
data[!row.names(data) %in% rm_rw ,] %>% select(-Row)
date open high low close volume
1: 2021-11-26 0.43 0.43 0.43 0.43 2
2: 2021-11-24 0.17 0.17 0.17 0.17 10
3: 2021-11-26 0.19 0.19 0.19 0.19 75
4: 2021-11-24 0.15 0.15 0.15 0.15 1
Update 1
Changed any(X == "NaN") to any(X == "NaN" | X == "Inf") so that Inf is filtered out as well.
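The same condition can probably be written without the helper Row column. The sketch below assumes dplyr 1.0.4 or later (for if_all()) and uses a small hypothetical data frame; the names df, bad_values and clean are illustrative only, and "-Inf" is included as an assumption about what should count as a bad value.

```r
library(dplyr)

# Toy data (hypothetical values), stored as character like in the question
df <- data.frame(
  date  = c("2021-11-24", "2021-11-26", "2021-11-26"),
  open  = c("NaN", "0.43", "Inf"),
  close = c("NaN", "0.43", "0.19"),
  stringsAsFactors = FALSE
)

# Strings that should cause a row to be dropped
bad_values <- c("NaN", "Inf", "-Inf")

# Keep a row only if none of the selected columns holds a bad value
clean <- df %>%
  filter(if_all(c(open, close), ~ !(.x %in% bad_values)))
```

Only the second row remains in clean, since the first contains "NaN" and the third contains "Inf".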
uj5u.com user reply:
One approach is to find the indices of the rows that contain NaN:
unique(which(data == NaN, arr.ind=T)[,1])
[1] 1 2 7 8 9 10 11
Then negate those indices to drop the rows:
data[!unique(which(data == NaN, arr.ind=T)[,1])]
date open high low close volume
1: 2021-11-26 0.43 0.43 0.43 0.43 2
2: 2021-11-24 0.17 0.17 0.17 0.17 10
3: 2021-11-26 0.19 0.19 0.19 0.19 75
4: 2021-11-24 0.15 0.15 0.15 0.15 1
uj5u.com user reply:
A solution based on dtplyr:
library(dtplyr)
library(dplyr)
library(data.table)
data <- structure(
list(date=c("2021-11-24","2021-11-24","2021-11-26",
"2021-11-24","2021-11-26","2021-11-24",
"2021-11-24","2021-11-26","2021-11-26",
"2021-11-26","2021-11-26"),
open=c("NaN","NaN","0.43","0.17","0.19","0.15",
"NaN","NaN","NaN","NaN","NaN"),
high=c("NaN","NaN","0.43","0.17","0.19","0.15",
"NaN","NaN","NaN","NaN","NaN"),
low=c("NaN","NaN","0.43","0.17","0.19","0.15",
"NaN","NaN","NaN","NaN","NaN"),
close=c("NaN","NaN","0.43","0.17","0.19","0.15",
"NaN","NaN","NaN","NaN","NaN"),
volume=c(0L,0L,2L,10L,75L,1L,0L,0L,0L,0L,0L)),
row.names=c(NA,-11L),
class=c("data.table","data.frame"))
data %>%
lazy_dt %>%
filter(across(2:5, ~ .x != "NaN")) %>%
as.data.table
#> date open high low close volume
#> 1: 2021-11-26 0.43 0.43 0.43 0.43 2
#> 2: 2021-11-24 0.17 0.17 0.17 0.17 10
#> 3: 2021-11-26 0.19 0.19 0.19 0.19 75
#> 4: 2021-11-24 0.15 0.15 0.15 0.15 1
uj5u.com user reply:
The NaN values were quoted when the data was created, so the column types were unnecessarily changed to character.
> str(data)
Classes ‘data.table’ and 'data.frame': 11 obs. of 6 variables:
$ date : chr "2021-11-24" "2021-11-24" "2021-11-26" "2021-11-24" ...
$ open : chr "NaN" "NaN" "0.43" "0.17" ...
$ high : chr "NaN" "NaN" "0.43" "0.17" ...
$ low : chr "NaN" "NaN" "0.43" "0.17" ...
$ close : chr "NaN" "NaN" "0.43" "0.17" ...
$ volume: int 0 0 2 10 75 1 0 0 0 0 ...
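We can convert the types automatically with type.convert and then use data.table methods: loop over the columns other than 'date' by specifying .SDcols, build a logical expression requiring that the column values are not NaN (!is.nan) and (&) are finite (is.finite), Reduce the logical vectors to a single vector with &, and use it to subset the rows.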
library(data.table)
data <- type.convert(data, as.is = TRUE)
out <- data[data[, Reduce(`&`, lapply(.SD, function(x)
!is.nan(x) & is.finite(x))), .SDcols = -1]]
out
date open high low close volume
1: 2021-11-26 0.43 0.43 0.43 0.43 2
2: 2021-11-24 0.17 0.17 0.17 0.17 10
3: 2021-11-26 0.19 0.19 0.19 0.19 75
4: 2021-11-24 0.15 0.15 0.15 0.15 1
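As a side note, is.finite() already returns FALSE for NA, NaN, Inf and -Inf, so the row filter can likely be reduced to is.finite alone. The following is a minimal sketch of that variant on hypothetical toy data; the names dt, price_cols, keep and clean are illustrative, and the columns are converted in place with := instead of type.convert.

```r
library(data.table)

# Toy data (hypothetical values) mixing "NaN", "Inf" and "-Inf" strings
dt <- data.table(
  date   = c("2021-11-24", "2021-11-26", "2021-11-26", "2021-11-24"),
  open   = c("NaN", "0.43", "Inf", "0.17"),
  close  = c("NaN", "0.43", "0.19", "-Inf"),
  volume = c(0L, 2L, 75L, 10L)
)

price_cols <- c("open", "close")

# Convert the character price columns to numeric by reference
dt[, (price_cols) := lapply(.SD, as.numeric), .SDcols = price_cols]

# One logical per row: TRUE only when every price column is finite
keep <- dt[, Reduce(`&`, lapply(.SD, is.finite)), .SDcols = price_cols]

# Subset the rows with the combined logical vector
clean <- dt[keep]
```

Restricting .SDcols to the price columns keeps the character date column out of the is.finite() check.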
