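Programmatically sorting an R data frame by a character vector of column names with mixed descending and ascending order, including date variables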

Given an R data frame with two columns:

dfc <-  data.frame(col1=c(1,4,3,3,2),col2=c(3,5,11,10,4))

dfc
  col1 col2
     1    3
     4    5
     3   11
     3   10
     2    4

I want to sort by col1 first and then by col2.

I store the column names in cols, a character vector:

cols=c("col1","col2")

However, I'd like a way to sometimes sort col1 in descending order and col2 in ascending order, for example.

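Most guides (here and here) don't use a character vector of column names. This was answered for a similar question, but that answer doesn't address descending/ascending order, nor does it offer an easy way to switch between them or to add another column.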

I got tired of not finding an answer, and since I need this often, I wrote a function using eval(parse(text="EXPRESSION_HERE")):

order_dataframe_by_cols <- function(dfc,cols=c("col1","col2"),dec_ace=NA) {
  # takes a data frame "dfc", and sorts it by each column in "cols",
  # in a descending or ascending order, defined by dec_ace by each col


  if (length(dec_ace) == 1) {
    if (is.na(dec_ace)) {dec_ace <- rep("",length(cols))}
  }

  str_eval <- "dfc <- dfc[order("
  ix2 <- ""

  for (ix in 1:length(cols)){
    if (ix > 1) {ix2 <- ","}
    str_eval <- paste(str_eval,ix2,dec_ace[ix],"dfc[,'",cols[ix],"']",sep="")
  }
  str_eval <- paste(str_eval,"),]")

  eval(parse(text=str_eval))


  return(dfc)
}
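To see what the function above actually evaluates, this sketch repeats its string-building loop in a hypothetical helper, build_order_string(), and returns the string instead of passing it to eval(). The "-" from dec_ace is pasted literally in front of the column expression, so the approach only works for column types that support unary minus.

```r
# Hypothetical helper: builds the same string as order_dataframe_by_cols(),
# but returns it instead of evaluating it with eval(parse(...))
build_order_string <- function(cols, dec_ace) {
  str_eval <- "dfc <- dfc[order("
  ix2 <- ""
  for (ix in seq_along(cols)) {
    if (ix > 1) ix2 <- ","
    # dec_ace[ix] ("" or "-") is pasted directly before the column expression
    str_eval <- paste(str_eval, ix2, dec_ace[ix], "dfc[,'", cols[ix], "']", sep = "")
  }
  paste(str_eval, "),]")  # default sep = " " puts a space before "),]"
}

s <- build_order_string(c("col1", "col2"), c("", "-"))
print(s)
# [1] "dfc <- dfc[order(dfc[,'col1'],-dfc[,'col2'] ),]"
```

With a Date column marked "-", the same loop produces -dfc[,'col3'], which is exactly the expression that fails further down.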

So, col1 ascending and col2 descending:

order_dataframe_by_cols(dfc,cols=c("col1","col2"),dec_ace=c("","-"))
col1 col2
  1    3
  2    4
  3   11
  3   10
  4    5

Then col1 ascending and col2 ascending:

order_dataframe_by_cols(dfc,cols=c("col1","col2"),dec_ace=c("",""))
col1 col2
  1    3
  2    4
  3   10
  3   11
  4    5

Note that 10 and 11 have swapped positions.

Question:

What if I also have an as.Date or as.POSIXct variable that I want to sort by?

dfc2 <- data.frame(col1=c(1,4,3,3,2),col2=c(3,5,11,10,4),col3=c(as.Date(c("2015-04-11","2016-04-11","2017-04-11","2018-04-11","2019-04-11"))))
dfc2 
  col1 col2       col3
     1    3 2015-04-11
     4    5 2016-04-11
     3   11 2017-04-11
     3   10 2018-04-11
     2    4 2019-04-11


order_dataframe_by_cols(dfc2,cols=c("col1","col3","col2"),dec_ace=c("","-",""))
Error in `-.Date`(dfc[, "col3"]) : unary - is not defined for "Date" objects

This doesn't work: I can't sort a date this way while also sorting by the other variables.
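The error comes from the literal unary minus, which is defined for numbers but not for Date objects. A base-R sketch that sidesteps it: xtfrm() converts a Date (or POSIXct) column into a numeric sort key, and that key can be negated. Here col1 is ascending, col3 descending and col2 ascending, so the rows should come out as 1, 5, 4, 3, 2.

```r
dfc2 <- data.frame(
  col1 = c(1, 4, 3, 3, 2),
  col2 = c(3, 5, 11, 10, 4),
  col3 = as.Date(c("2015-04-11", "2016-04-11", "2017-04-11", "2018-04-11", "2019-04-11"))
)

# -dfc2$col3 fails ("unary - is not defined for Date objects"),
# but -xtfrm(dfc2$col3) negates the numeric sort key instead
res <- dfc2[order(dfc2$col1, -xtfrm(dfc2$col3), dfc2$col2), ]
print(res)
```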

I can sort by the date, but only by that one column:

dfc2[order(dfc2[,"col3"]),]

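So I need a way to sort a data frame by a character vector of column names, with a way to set ascending or descending order separately for each column that also works for date variables. Thanks for reading.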

Answer:

One possible way to solve your problem:

dfc[do.call(order, c(dfc[cols], decreasing=list(c(FALSE, TRUE)), method="radix")), ]

  col1 col2
1    1    3
5    2    4
3    3   11
4    3   10
2    4    5
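With method = "radix", order() accepts one decreasing flag per sort key, and no negation is involved, so the same one-liner should also handle the Date column from the question. A sketch on dfc2, with col1 ascending, col3 descending and col2 ascending (rows should come out as 1, 5, 4, 3, 2):

```r
dfc2 <- data.frame(
  col1 = c(1, 4, 3, 3, 2),
  col2 = c(3, 5, 11, 10, 4),
  col3 = as.Date(c("2015-04-11", "2016-04-11", "2017-04-11", "2018-04-11", "2019-04-11"))
)
cols <- c("col1", "col3", "col2")

# one decreasing flag per entry of cols; radix is the only method that allows this
res <- dfc2[do.call(order, c(dfc2[cols],
                             decreasing = list(c(FALSE, TRUE, FALSE)),
                             method = "radix")), ]
print(res)
```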

A simplified version of your function:

order_dataframe_by_cols = function(df, cols=c("col1","col2"), dec_ace=FALSE) {
  # recycle a single TRUE/FALSE to one flag per column
  dec_ace = rep_len(dec_ace, length(cols))
  # unname() stops column names from being matched to order()'s own arguments
  df[do.call(order, c(unname(df[cols]), decreasing=list(dec_ace), method="radix")),]
}

cols=c("col1","col2")

order_dataframe_by_cols(dfc, cols)                         # all ascending
order_dataframe_by_cols(dfc, cols, dec_ace=FALSE)          # same
order_dataframe_by_cols(dfc, cols, dec_ace=TRUE)           # all descending
order_dataframe_by_cols(dfc, cols, dec_ace=c(FALSE, TRUE)) # first ascending, second descending
order_dataframe_by_cols(dfc, cols, dec_ace=c(TRUE, FALSE)) # first descending, second ascending
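The question also asks about as.POSIXct. A sketch, assuming a made-up data frame events with an id column and a POSIXct time column (both hypothetical), using a copy of the simplified function with id ascending and time descending. The rows should come out as 4, 2, 3, 1.

```r
order_dataframe_by_cols = function(df, cols=c("col1","col2"), dec_ace=FALSE) {
  dec_ace = rep_len(dec_ace, length(cols))  # one flag per column
  df[do.call(order, c(unname(df[cols]), decreasing=list(dec_ace), method="radix")),]
}

# hypothetical example data with a POSIXct column
events <- data.frame(
  id   = c(2, 1, 2, 1),
  time = as.POSIXct(c("2020-01-01 10:00:00", "2020-01-01 09:00:00",
                      "2020-01-01 12:00:00", "2020-01-01 11:00:00"), tz = "UTC")
)

res <- order_dataframe_by_cols(events, c("id", "time"), dec_ace = c(FALSE, TRUE))
print(res)
```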

Answer:

Generalizing my solution from another question to accept an ascending/descending argument:

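sort_df_by_cols <- function (df, sort_key, sort_order) {
  # "asc" leaves a column as-is; "desc" negates its numeric sort key,
  # since xtfrm() alone returns keys in ascending order
  order_tf <- list(asc = identity, desc = \(x) -xtfrm(x))
  # df[sort_key] (not df[, sort_key]) keeps a list even for a single key
  df[do.call("order", unname(Map(\(k, a) a(k), df[sort_key], order_tf[sort_order]))), ]
}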

Call it like this:

df <- data.frame(
  var1 = c("b","a","b","a"),
  var2 = c("l","l","k","k"),
  var3 = c("t","w","x","t")
)

sort_df_by_cols(df, c("var1", "var2"), c("desc", "asc"))
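A self-contained sketch for checking the call above. It writes the "desc" transform as function(x) -xtfrm(x), because xtfrm() by itself maps values to ascending numeric keys, and uses function() rather than \() so it also runs on R older than 4.1. With var1 descending and var2 ascending, the rows should come out as 3, 1, 4, 2.

```r
sort_df_by_cols <- function(df, sort_key, sort_order) {
  # "asc" keeps the column as-is; "desc" negates its numeric sort key
  order_tf <- list(asc = identity, desc = function(x) -xtfrm(x))
  keys <- Map(function(k, f) f(k), df[sort_key], order_tf[sort_order])
  df[do.call(order, unname(keys)), ]
}

df <- data.frame(
  var1 = c("b", "a", "b", "a"),
  var2 = c("l", "l", "k", "k"),
  var3 = c("t", "w", "x", "t")
)

res <- sort_df_by_cols(df, c("var1", "var2"), c("desc", "asc"))
print(res)
```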

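…of course, existing packages already offer other, better solutions to this problem, e.g. 'dplyr', which provides the arrange() function. I would generally recommend using that function instead of writing your own.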

Answer:

I like all of the answers above, since they answer the OP using the dec_ace argument, although I find that approach a bit clunky myself.

I found a way to do this by putting a - in front of a column name to mark it as descending, similar to the original way of using order:

dfc[order(-dfc$col1, dfc$col2),]

Using this function:

order_dataframe_by_cols <- function(dfc, cols=NA, defaultToDecending=FALSE) {
  # with no cols given, sort by every column in the data frame
  if (length(cols) == 1 && is.na(cols)) {cols <- names(dfc)}

  # a leading "-" marks a column as descending
  dec_ace <- rep(defaultToDecending, length(cols))
  dec_ace[startsWith(cols, "-")] <- TRUE
  # strip only a leading "-" or space, so names containing "-" stay intact
  cols <- sub("^[ -]", "", cols)

  # radix accepts one decreasing flag per column and handles Date/POSIXct
  dfc <- dfc[do.call(order, c(unname(dfc[cols]), decreasing=list(dec_ace), method="radix")),]

  return(dfc)
}

dfc <- data.frame(col1=c(1,4,3,3,2),col2=c(3,5,11,10,4),col3=c(as.Date(c("2015-04-11","2016-04-11","2017-04-11","2018-04-11","2019-04-11"))))

col1 descending, col3 ascending and col2 ascending:

order_dataframe_by_cols(dfc,cols=c("-col1","col3"," col2"))

col1 ascending, col3 descending and col2 ascending:

order_dataframe_by_cols(dfc,cols=c("col1","-col3"," col2"))
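A self-contained sketch of the two calls above. It parses the prefixes with base R (startsWith() and a sub() anchored to the first character) instead of stringr, which is this sketch's own choice rather than the answer's exact code. The first call should return rows 2, 3, 4, 5, 1 and the second rows 1, 5, 4, 3, 2.

```r
order_dataframe_by_cols <- function(dfc, cols = NA, defaultToDecending = FALSE) {
  if (length(cols) == 1 && is.na(cols)) cols <- names(dfc)
  dec_ace <- rep(defaultToDecending, length(cols))
  dec_ace[startsWith(cols, "-")] <- TRUE   # "-col" means descending
  cols <- sub("^[ -]", "", cols)           # drop the leading "-" or space
  dfc[do.call(order, c(unname(dfc[cols]), decreasing = list(dec_ace), method = "radix")), ]
}

dfc <- data.frame(
  col1 = c(1, 4, 3, 3, 2),
  col2 = c(3, 5, 11, 10, 4),
  col3 = as.Date(c("2015-04-11", "2016-04-11", "2017-04-11", "2018-04-11", "2019-04-11"))
)

r1 <- order_dataframe_by_cols(dfc, cols = c("-col1", "col3", " col2"))
r2 <- order_dataframe_by_cols(dfc, cols = c("col1", "-col3", " col2"))
print(r1)
print(r2)
```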

I guess it's just a matter of taste.

Source (please credit when reposting): https://www.uj5u.com/qiye/522664.html

Tags: r, sorting
