I have this data frame:
#   Name Comp1 Con2 Vis3 Tra4 Pred5 Adap6
# 1   A1     x <NA> <NA> <NA>  <NA>  <NA>
# 2   A2  <NA>    x <NA> <NA>  <NA>  <NA>
# 3   B1  <NA> <NA>    x <NA>  <NA>  <NA>
# 4   B2  <NA> <NA> <NA> <NA>     x  <NA>
# 5   B3  <NA> <NA> <NA>    x  <NA>  <NA>
# 6   D2  <NA> <NA> <NA> <NA>  <NA>     x
# 7   F6  <NA> <NA> <NA> <NA>     x  <NA>
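I want to add a column to databackend that holds a value from 1 to 6, depending on which column of databackend contains the "x". The data frame with the additional column would look like this: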
#   Name Comp1 Con2 Vis3 Tra4 Pred5 Adap6 stage
# 1   A1     x <NA> <NA> <NA>  <NA>  <NA>     1
# 2   A2  <NA>    x <NA> <NA>  <NA>  <NA>     2
# 3   B1  <NA> <NA>    x <NA>  <NA>  <NA>     3
# 4   B2  <NA> <NA> <NA> <NA>     x  <NA>     5
# 5   B3  <NA> <NA> <NA>    x  <NA>  <NA>     4
# 6   D2  <NA> <NA> <NA> <NA>  <NA>     x     6
# 7   F6  <NA> <NA> <NA> <NA>     x  <NA>     5
Because the data frame in my original script is very large, I am looking for the fastest (automated) way to do this. I tried a for loop, but it takes too long.
Data
databackend <- structure(list(
  Name  = c("A1", "A2", "B1", "B2", "B3", "D2", "F6"),
  Comp1 = c("x", NA, NA, NA, NA, NA, NA),
  Con2  = c(NA, "x", NA, NA, NA, NA, NA),
  Vis3  = c(NA, NA, "x", NA, NA, NA, NA),
  Tra4  = c(NA, NA, NA, NA, "x", NA, NA),
  Pred5 = c(NA, NA, NA, "x", NA, NA, "x"),
  Adap6 = c(NA, NA, NA, NA, NA, "x", NA),
  stage = c(1, 2, 3, 5, 4, 6, 5)
), row.names = c(NA, -7L), class = "data.frame")
# The dput above already contains the desired stage column. Drop it so that
# databackend[-1] covers only the six indicator columns.
databackend$stage <- NULL
uj5u.com user reply:
You can do this (assuming, as in your example, that every row contains exactly one "x"):
max.col(!is.na(databackend[-1]))
[1] 1 2 3 5 4 6 5
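A minimal sketch of how that result could be attached as the new column, assuming the data frame holds only Name and the six indicator columns. The name `ind` and the choice of `ties.method = "first"` are additions for illustration and not part of the answer above. By default `max.col` breaks ties at random, so a row with more than one filled cell would otherwise give a result that changes from run to run. A row with no "x" at all would still receive an index, so such rows would need separate handling.

```r
# Rebuild the question's input (without the stage column).
databackend <- data.frame(
  Name  = c("A1", "A2", "B1", "B2", "B3", "D2", "F6"),
  Comp1 = c("x", NA, NA, NA, NA, NA, NA),
  Con2  = c(NA, "x", NA, NA, NA, NA, NA),
  Vis3  = c(NA, NA, "x", NA, NA, NA, NA),
  Tra4  = c(NA, NA, NA, NA, "x", NA, NA),
  Pred5 = c(NA, NA, NA, "x", NA, NA, "x"),
  Adap6 = c(NA, NA, NA, NA, NA, "x", NA)
)

# Logical matrix: TRUE where a cell is filled, FALSE where it is NA.
ind <- !is.na(databackend[-1])

# max.col() returns, for each row, the column index of the largest value
# (TRUE counts as 1, FALSE as 0). ties.method = "first" is an assumption
# added here to keep the result deterministic.
databackend$stage <- max.col(ind, ties.method = "first")

print(databackend$stage)
```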
uj5u.com user reply:
A fairly simple approach:
> tmp <- which(databackend[, -1] == "x", arr.ind = TRUE)
> tmp[order(tmp[, "row"]), "col"]
[1] 1 2 3 5 4 6 5
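To show why the reordering step is needed, here is a small sketch on a hypothetical 3 x 3 matrix (`m` and `hits` are illustrative names, not from the thread). `which(..., arr.ind = TRUE)` scans the matrix column by column, so the hits come back in column order and have to be sorted by row before the `col` values line up with the rows.

```r
# Hypothetical toy matrix: one "x" per row, in columns 1, 3 and 2.
m <- matrix(c("x", NA,  NA,
              NA,  NA,  "x",
              NA,  "x", NA), nrow = 3, byrow = TRUE)

# One row per hit, with columns "row" and "col". NA == "x" is NA,
# and which() skips NA, so empty cells are ignored.
hits <- which(m == "x", arr.ind = TRUE)
print(hits)  # hits are listed in column-major order

# Sort the hits by row so that element i belongs to row i.
stage <- hits[order(hits[, "row"]), "col"]
print(stage)
```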
uj5u.com user reply:
Using which and apply:
apply(databackend[-1], 1, \(x) which(x == "x"))
#[1] 1 2 3 5 4 6 5
A benchmark; max.col is the fastest:
microbenchmark::microbenchmark(
  apply = apply(databackend[-1], 1, \(x) which(x == "x")),
  which = {
    tmp <- which(databackend[, -1] == "x", arr.ind = TRUE)
    tmp[order(tmp[, "row"]), "col"]
  },
  max.col = max.col(!is.na(databackend[-1]))
)
Unit: microseconds
    expr   min     lq    mean median     uq    max neval
   apply 149.4 165.95 232.308 196.20 216.95 2882.4   100
   which 118.9 144.35 184.684 158.10 190.45  907.0   100
 max.col  51.5  73.00  88.302  79.45  94.40  326.1   100
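These timings come from the 7-row example, so they may not carry over directly to a very large data frame. A rough sketch for building a larger test frame and confirming that the approaches agree on it before re-running the benchmark; the names `big`, `pos` and `n`, the size of 10,000 rows, and `ties.method = "first"` are assumptions made for illustration.

```r
set.seed(1)
n   <- 1e4                                # hypothetical size
pos <- sample(6, n, replace = TRUE)       # true column of the "x" in each row

# Character matrix of NA with exactly one "x" per row.
m <- matrix(NA_character_, nrow = n, ncol = 6,
            dimnames = list(NULL, c("Comp1", "Con2", "Vis3",
                                    "Tra4", "Pred5", "Adap6")))
m[cbind(seq_len(n), pos)] <- "x"
big <- data.frame(Name = paste0("N", seq_len(n)), m)

# The approaches from this thread, applied to the larger frame.
res_maxcol <- max.col(!is.na(big[-1]), ties.method = "first")
res_apply  <- apply(big[-1], 1, function(x) which(x == "x"))
tmp        <- which(big[, -1] == "x", arr.ind = TRUE)
res_which  <- tmp[order(tmp[, "row"]), "col"]
res_sums   <- rowSums(col(big[-1]) * (!is.na(big[-1])))

# All of them should recover the positions used to build the data.
agree <- all(res_maxcol == pos) && all(res_apply == pos) &&
         all(res_which == pos)  && all(res_sums == pos)
print(agree)
```

Passing `big` instead of `databackend` to the microbenchmark call above would then give timings at a more realistic size.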
uj5u.com user reply:
We can try:
> rowSums(col(databackend[-1])*(!is.na(databackend[-1])))
[1] 1 2 3 5 4 6 5
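A short sketch of why this works, using a hypothetical toy matrix (`m`, `idx`, `mask` and `two_hits` are illustrative names). `col()` gives a matrix of column indices, multiplying by the logical mask zeroes out the NA cells, and the row sum then leaves only the index of the filled cell. This relies on each row having exactly one "x"; with two hits the indices are added together.

```r
# Hypothetical toy matrix with one "x" per row (columns 1 and 3).
m <- matrix(c("x", NA, NA,
              NA,  NA, "x"), nrow = 2, byrow = TRUE)

idx  <- col(m)       # every cell holds its own column number
mask <- !is.na(m)    # TRUE where a cell is filled

# TRUE/FALSE act as 1/0, so only the filled cell's index survives.
stage <- rowSums(idx * mask)
print(stage)

# Caveat: a row with two hits (columns 1 and 2) sums to 1 + 2 = 3,
# which is not the position of either hit.
two_hits <- matrix(c("x", "x", NA), nrow = 1)
bad <- rowSums(col(two_hits) * (!is.na(two_hits)))
print(bad)
```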
