R data.table: sort by group, with "other" at the bottom of each group

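I can't quite get the syntax right. I have a data.table that I want to sort first by a grouping column g1 (a factor with a defined level order), then by another column n in descending order. The only catch is that I want rows labelled "other" in a third column, g2, to appear at the bottom of each group, regardless of their n.

Example: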

library(data.table)

dt <- data.table(g1 = factor(rep(c('Australia', 'Mexico', 'Canada'), 3), levels = c('Australia', 'Canada', 'Mexico')),
                 g2 = rep(c('stuff', 'things', 'other'), each = 3),
                 n = c(1000, 2000, 3000, 5000, 100, 3500, 10000, 10000, 0))

Here is the expected output: within each g1, rows are sorted by n in descending order, except that rows with g2 == 'other' always sit at the bottom:

         g1     g2     n
1: Australia things  5000
2: Australia  stuff  1000
3: Australia  other 10000
4:    Canada things  3500
5:    Canada  stuff  3000
6:    Canada  other     0
7:    Mexico  stuff  2000
8:    Mexico things   100
9:    Mexico  other 10000

Answer:

Take advantage of order() as used inside a data.table, and its - prefix for reverse sorting:

dt[order(g1, g2 == "other", -n), ]
#           g1     g2     n
#       <fctr> <char> <num>
# 1: Australia things  5000
# 2: Australia  stuff  1000
# 3: Australia  other 10000
# 4:    Canada things  3500
# 5:    Canada  stuff  3000
# 6:    Canada  other     0
# 7:    Mexico  stuff  2000
# 8:    Mexico things   100
# 9:    Mexico  other 10000

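We add g2 == "other" because you said "other" should always come last. For example, if "stuff" were "abc" instead, the difference in behavior shows up: without the condition, "other" lands wherever the remaining sort keys put it, while with the condition it stays at the bottom of each group (the second call below sorts by g2 itself in descending order):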

dt[ g2 == "stuff", g2 := "abc" ]
dt[order(g1, -n), ]
#           g1     g2     n
#       <fctr> <char> <num>
# 1: Australia  other 10000
# 2: Australia things  5000
# 3: Australia    abc  1000
# 4:    Canada things  3500
# 5:    Canada    abc  3000
# 6:    Canada  other     0
# 7:    Mexico  other 10000
# 8:    Mexico    abc  2000
# 9:    Mexico things   100

dt[order(g1, g2 == "other", -g2), ]
#           g1     g2     n
#       <fctr> <char> <num>
# 1: Australia things  5000
# 2: Australia    abc  1000
# 3: Australia  other 10000
# 4:    Canada things  3500
# 5:    Canada    abc  3000
# 6:    Canada  other     0
# 7:    Mexico things   100
# 8:    Mexico    abc  2000
# 9:    Mexico  other 10000

One drawback is that setorder does not work directly with this expression:

setorder(dt, g1, g2 == "other", -n)
# Error in setorderv(x, cols, order, na.last) : 
#   some columns are not in the data.table: ==,other

So we need to reorder and assign the result back to dt.
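
A minimal sketch of two ways to do that, using the dt from the question. The first sorts a copy, which can then be assigned back to dt. The second is one possible workaround for sorting in place: it materializes the condition as a temporary column so that setorder has a real column to work with. The names sorted_copy and is_other are hypothetical, introduced only for illustration.

```r
library(data.table)

dt <- data.table(
  g1 = factor(rep(c("Australia", "Mexico", "Canada"), 3),
              levels = c("Australia", "Canada", "Mexico")),
  g2 = rep(c("stuff", "things", "other"), each = 3),
  n  = c(1000, 2000, 3000, 5000, 100, 3500, 10000, 10000, 0)
)

# Option 1: sort a copy. To overwrite the original instead,
# assign straight back: dt <- dt[order(g1, g2 == "other", -n)]
sorted_copy <- dt[order(g1, g2 == "other", -n)]

# Option 2: sort in place through a temporary logical column
# (is_other is a hypothetical helper name, not part of the original data).
dt[, is_other := g2 == "other"]   # add the helper column by reference
setorder(dt, g1, is_other, -n)    # setorder only accepts column names
dt[, is_other := NULL]            # drop the helper column again
```

Option 2 avoids copying the table, which may matter for large data; for a table this small, Option 1 is simpler.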

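As an aside: this works because g2 == "other" evaluates to a logical vector, and logical values are treated as 0 (FALSE) and 1 (TRUE) when sorted, so rows where the condition is false come before rows where it is true.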
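
A small base R illustration of that sorting behavior, independent of data.table. The vector flags is made up for this example.

```r
# flags is a made-up logical vector for illustration
flags <- c(TRUE, FALSE, TRUE, FALSE)

as_numbers <- as.integer(flags)  # TRUE becomes 1, FALSE becomes 0
sorted     <- sort(flags)        # FALSE values come before TRUE values
positions  <- order(flags)       # indices of the FALSE entries come first

print(as_numbers)  # 1 0 1 0
print(sorted)      # FALSE FALSE TRUE TRUE
print(positions)   # 2 4 1 3
```

The same ordering is what pushes the rows with g2 == "other" (TRUE) below all the other rows (FALSE) within each g1.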
