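I'm new to R and can't understand why a very basic script fails to perform one-hot encoding in a Windows environment while it works fine in a Linux environment. Since I have to work in the Windows environment where it fails, I'd like to get the script to one-hot encode there.
This happens on Windows (one-hot encoding fails):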
> sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19043)
Matrix products: default
locale:
[1] LC_COLLATE=German_Germany.1252 LC_CTYPE=German_Germany.1252 LC_MONETARY=German_Germany.1252
[4] LC_NUMERIC=C LC_TIME=German_Germany.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] data.table_1.14.2 mltools_0.3.5
loaded via a namespace (and not attached):
[1] compiler_4.1.1 Matrix_1.3-4 tools_4.1.1 grid_4.1.1 lattice_0.20-44
>
> customers <- data.frame(
id=c(10, 20, 30, 40, 50),
gender=c('male', 'female', 'female', 'male', 'female'),
mood=c('happy', 'sad', 'happy', 'sad','happy'),
outcome=c(1, 1, 0, 0, 0))
>
> customers
id gender mood outcome
1 10 male happy 1
2 20 female sad 1
3 30 female happy 0
4 40 male sad 0
5 50 female happy 0
>
> library(data.table)
> library(mltools)
>
> customers_1h <- one_hot(as.data.table(customers))
>
> customers_1h
id gender mood outcome
1: 10 male happy 1
2: 20 female sad 1
3: 30 female happy 0
4: 40 male sad 0
5: 50 female happy 0
Whereas this is what I expect to happen (one-hot encoding works):
> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-suse-linux-gnu (64-bit)
Running under: openSUSE Leap 15.3
Matrix products: default
BLAS: /usr/lib64/R/lib/libRblas.so
LAPACK: /usr/lib64/R/lib/libRlapack.so
locale:
[1] LC_CTYPE=de_DE.UTF-8 LC_NUMERIC=C LC_TIME=de_DE.UTF-8
[4] LC_COLLATE=de_DE.UTF-8 LC_MONETARY=de_DE.UTF-8 LC_MESSAGES=de_DE.UTF-8
[7] LC_PAPER=de_DE.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C
attached base packages:
[1] stats graphics grDevices utils datasets methods base
loaded via a namespace (and not attached):
[1] compiler_3.5.0 tools_3.5.0
>
> customers <- data.frame(
id=c(10, 20, 30, 40, 50),
gender=c('male', 'female', 'female', 'male', 'female'),
mood=c('happy', 'sad', 'happy', 'sad','happy'),
outcome=c(1, 1, 0, 0, 0))
>
> customers
id gender mood outcome
1 10 male happy 1
2 20 female sad 1
3 30 female happy 0
4 40 male sad 0
5 50 female happy 0
>
> library(data.table)
data.table 1.14.2 using 8 threads (see ?getDTthreads). Latest news: r-datatable.com
> library(mltools)
>
> customers_1h <- one_hot(as.data.table(customers))
>
> customers_1h
id gender_female gender_male mood_happy mood_sad outcome
1: 10 0 1 1 0 1
2: 20 1 0 0 1 1
3: 30 1 0 1 0 0
4: 40 0 1 0 1 0
5: 50 1 0 1 0 0
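At least the same packages seem to be installed. So why does the one-hot encoding silently not happen, without even an error or warning? Can anyone give me a hint on how to get this working on Windows?
Thanks in advance,
Chris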
Answer:
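I think this is related to your R version rather than the platform. One of the key defaults for creating data.frames, stringsAsFactors, got a new default (FALSE) in R 4.0, after years of tripping up unsuspecting new users. However, some packages, apparently including mltools, expect the kind of data frame created with the old default, stringsAsFactors = TRUE. More info: https://developer.r-project.org/Blog/public/2020/02/16/stringsasfactors/index.html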
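A quick way to check whether this applies to a given session is to look at the column classes. The following is a minimal sketch using only base R and the `customers` data from the question; the variable names `col_classes` and `customers_f` are just illustrative.

```r
# Build the data frame without specifying stringsAsFactors, as in the question.
customers <- data.frame(
  id = c(10, 20, 30, 40, 50),
  gender = c('male', 'female', 'female', 'male', 'female'),
  mood = c('happy', 'sad', 'happy', 'sad', 'happy'),
  outcome = c(1, 1, 0, 0, 0))

# On R >= 4.0 the string columns come out as "character";
# on R < 4.0 they come out as "factor".
col_classes <- sapply(customers, class)
print(col_classes)

# Making the old default explicit turns string columns into factors
# regardless of the R version.
customers_f <- data.frame(
  id = c(10, 20),
  gender = c('male', 'female'),
  stringsAsFactors = TRUE)
print(class(customers_f$gender))
```

If `gender` and `mood` show up as "character" in the first printout, the session is using the newer default.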
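I was able to reproduce the problem and could fix it by setting stringsAsFactors = TRUE. (By the way, it looks like mltools::one_hot requires a data.table as input, so I'm not sure there's a way to avoid using that package.)
Doesn't work: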
customers <- data.frame(
id=c(10, 20, 30, 40, 50),
gender=c('male', 'female', 'female', 'male', 'female'),
mood=c('happy', 'sad', 'happy', 'sad','happy'),
outcome=c(1, 1, 0, 0, 0))
mltools::one_hot(data.table::as.data.table(customers))
id gender mood outcome
1: 10 male happy 1
2: 20 female sad 1
3: 30 female happy 0
4: 40 male sad 0
5: 50 female happy 0
Works:
customers <- data.frame(
id=c(10, 20, 30, 40, 50),
gender=c('male', 'female', 'female', 'male', 'female'),
mood=c('happy', 'sad', 'happy', 'sad','happy'),
outcome=c(1, 1, 0, 0, 0), stringsAsFactors = TRUE)
mltools::one_hot(data.table::as.data.table(customers))
id gender_female gender_male mood_happy mood_sad outcome
1: 10 0 1 1 0 1
2: 20 1 0 0 1 1
3: 30 1 0 1 0 0
4: 40 0 1 0 1 0
5: 50 1 0 1 0 0
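If the data frame already exists (for example, it was read from a file) and cannot easily be rebuilt with stringsAsFactors = TRUE, converting the character columns to factors afterwards should have the same effect. This is a sketch under the assumption, based on my reading of the mltools documentation, that one_hot with its default column selection only encodes unordered factor columns and leaves everything else untouched, which would also explain why no error is raised. The names `dt` and `char_cols` are illustrative.

```r
library(data.table)

# Same data as in the question, with the R >= 4.0 default made explicit.
customers <- data.frame(
  id = c(10, 20, 30, 40, 50),
  gender = c('male', 'female', 'female', 'male', 'female'),
  mood = c('happy', 'sad', 'happy', 'sad', 'happy'),
  outcome = c(1, 1, 0, 0, 0),
  stringsAsFactors = FALSE)

dt <- as.data.table(customers)

# Find the character columns and convert them to factors by reference.
char_cols <- names(dt)[sapply(dt, is.character)]
dt[, (char_cols) := lapply(.SD, factor), .SDcols = char_cols]

print(sapply(dt, class))

# Only attempt the encoding if mltools is installed.
if (requireNamespace("mltools", quietly = TRUE)) {
  print(mltools::one_hot(dt))
}
```

Converting only the character columns keeps numeric columns such as id and outcome as they are.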
