Performing multiple operations on multiple data.tables

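I have created 30 tables. Their names follow this structure:
mdl_(race)_(wage quartile).
(race) is one of: whites, blacks, hispanics, asians, others, or all.
(wage quartile) is one of: Q1, Q2, Q3, Q4, or allQ.
Since I have 6 race categories and 5 wage quartiles, I have 6*5 = 30 objects.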

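Example: a linear model including only Hispanics in the 1st quartile of the wage distribution => mdl_hispanics_Q1
Example: a linear model including all races and all wage quartiles => mdl_all_allQ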
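
As a rough sketch of the naming scheme above, the 30 object names could be enumerated in R as follows. The exact spelling of the race labels (whites, blacks, asians, others) is an assumption inferred from the two example names, and races, quartiles, and model_names are illustrative variable names only.

```r
# Assumed labels; only "hispanics", "all", "Q1" and "allQ" appear in the examples
races     <- c("whites", "blacks", "hispanics", "asians", "others", "all")
quartiles <- c("Q1", "Q2", "Q3", "Q4", "allQ")

# Every race/quartile combination, one row per combination
grid <- expand.grid(quartile = quartiles, race = races,
                    stringsAsFactors = FALSE)

# Build names of the form mdl_(race)_(wage quartile)
model_names <- paste("mdl", grid$race, grid$quartile, sep = "_")
length(model_names)
```

Having the names in a character vector makes it possible to fetch the tables explicitly (for example with mget(model_names)) rather than relying on a pattern match.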

All tables have the same format, though of course the values differ:

          Variables     Estimate   Std. Error    t value      Pr(>|t|)
 1:       Intercept 37.231178895 9.486380e-02 392.469814  0.000000e+00
 2:         forborn -0.612941167 5.174224e-02 -11.846051  2.300944e-32
 3:          female -3.238655089 4.797890e-02 -67.501655  0.000000e+00
 4:        numchild  0.583390602 2.239027e-02  26.055543 1.841656e-149
 5: numchild_female  0.371351058 9.086739e-02   4.086736  4.376191e-05
 6:              hs  0.173864095 9.180975e-02   1.893743  5.826025e-02
 7:         somecol  0.595612050 9.407851e-02   6.331011  2.439689e-10
 8:         college  1.593917949 9.929766e-02  16.051918  5.923264e-58
 9:        advanced  0.171443556 1.983952e-03  86.415175  0.000000e+00
10:              rw -0.001207904 1.460021e-05 -82.731964  0.000000e+00
11:      rw_squared -0.954029880 3.252520e-02 -29.332024 8.456547e-189

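What I want is a numeric vector of 30 values, where each value is the estimate of the variable "forborn" if it is statistically significant, i.e. Pr(>|t|) < 0.1, and zero otherwise. I am a beginner in R and only know how to do this table by table, which is very tedious and takes a lot of code. Is there a way to exploit the fact that the table names are similar and loop this operation over all of them in one pass?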

Answer from a uj5u.com user:

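You can try mget to collect the data frames into a list, then iterate over them with sapply.

Edit: changed the data frame names to match your description.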

ls()
#[1] "mdl_hispanics_..."  "mdl_blacks_..." etc.

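as.vector( sapply( mget( 
  # ( | ) is alternation; square brackets would only define a character class
  grep("^mdl_(whites|blacks|hispanics|asians|others|all)_", 
  ls(), value = TRUE) ), function(x) 
  # returns the p-value when it is below 0.1, otherwise 0;
  # put "Estimate" in the second argument to return the estimate instead
  ifelse( x[x$Variables == "forborn","Pr(>|t|)"] < 0.1,
          x[x$Variables == "forborn","Pr(>|t|)"], 0) ) )
#[1] 2.300944e-32 2.300944e-32 0.000000e+00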
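
A minimal sketch of the same mget idea that returns the Estimate, as the question asks, and keeps the table names on the result. The two toy tables and their values are made up for illustration, and forborn_est is a hypothetical name; vapply is used in place of sapply so that each table must yield exactly one number.

```r
# Two toy tables with the same layout as in the question (values are made up)
mdl_whites_Q1 <- data.frame(
  Variables  = c("Intercept", "forborn"),
  Estimate   = c(37.2, -0.61),
  `Pr(>|t|)` = c(0, 0.03),
  check.names = FALSE
)
mdl_blacks_Q1 <- data.frame(
  Variables  = c("Intercept", "forborn"),
  Estimate   = c(35.0, 0.20),
  `Pr(>|t|)` = c(0, 0.45),
  check.names = FALSE
)

# Collect every object whose name starts with "mdl_" into a named list
tables <- mget(ls(pattern = "^mdl_"))

# Estimate of forborn if its p-value is below 0.1, otherwise 0
forborn_est <- vapply(tables, function(x) {
  row <- x[x$Variables == "forborn", ]
  if (row[["Pr(>|t|)"]] < 0.1) row[["Estimate"]] else 0
}, numeric(1))
forborn_est
```

Note that ls(pattern = "^mdl_") picks up any object whose name starts with mdl_, so the workspace should not contain unrelated objects with that prefix.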

Answer from a uj5u.com user:

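This may be considered a better approach. It returns a vector containing the Estimate for forborn if the p-value is < 0.1, or 0 otherwise [not the p-value itself].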

rbindlist(lapply(ls(pattern="mdl_"),get))[
  Variables=="forborn",fifelse(`Pr(>|t|)`<0.1,Estimate,0)
  ]

Note: if you need to select the objects more specifically, just adjust the pattern argument of ls().
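
One possible variation, sketched under assumptions: passing a named list to rbindlist together with idcol keeps track of which table each value came from. The toy tables, their values, and the names stacked, res, and model are illustrative and not part of the original answer.

```r
library(data.table)

# Toy tables (made-up values), named following the question's scheme
mdl_asians_Q2 <- data.table(
  Variables  = c("Intercept", "forborn"),
  Estimate   = c(30.1, 1.25),
  `Pr(>|t|)` = c(0, 0.02)
)
mdl_others_Q2 <- data.table(
  Variables  = c("Intercept", "forborn"),
  Estimate   = c(28.4, -0.40),
  `Pr(>|t|)` = c(0, 0.30)
)

# Stack the tables; idcol stores each list element's name in a "model" column
stacked <- rbindlist(mget(c("mdl_asians_Q2", "mdl_others_Q2")),
                     idcol = "model")

# One row per model: the estimate if significant at 0.1, otherwise 0
res <- stacked[Variables == "forborn",
               .(model, forborn = fifelse(`Pr(>|t|)` < 0.1, Estimate, 0))]

# Named numeric vector, one value per model
setNames(res$forborn, res$model)
```

Fetching the tables by explicit name with mget avoids accidentally matching other objects, at the cost of having to list or generate the names first.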

Answer from a uj5u.com user:

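Write a function that extracts the Estimate conditional on the p-value, then apply it to every table in the list (sapply is used here so that the result is a vector).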

library(data.table)

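fextrac <- function(x){
  # Compute the conditional estimate in j without `:=`,
  # so the input table is not modified by reference
  y <- x[, ifelse(`Pr(>|t|)` < 0.1, Estimate, 0)]
  # Keep only the value for the forborn row
  y[x$Variables == "forborn"]
}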

Estimates_list <- sapply(dt_list, fextrac)
Estimates_list
#[1] -0.6129412 -0.6129412
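
A caveat worth illustrating: data.table's := operator updates a table by reference, so using it inside an extraction function also changes the caller's table. The sketch below contrasts the two behaviours on a made-up table; by_ref and no_side_effect are hypothetical helper names.

```r
library(data.table)

# Made-up table: forborn is significant at 0.1, hs is not
dt <- data.table(
  Variables  = c("forborn", "hs"),
  Estimate   = c(-0.61, 0.17),
  `Pr(>|t|)` = c(0.01, 0.50)
)

# Version using := : the caller's table is changed as a side effect
by_ref <- function(x) {
  x[, Estimate := ifelse(`Pr(>|t|)` < 0.1, Estimate, 0)]
  x[Variables == "forborn", Estimate]
}

# Version computing the value in j without assignment: input is left untouched
no_side_effect <- function(x) {
  x[Variables == "forborn", ifelse(`Pr(>|t|)` < 0.1, Estimate, 0)]
}

before <- copy(dt)

val_a <- no_side_effect(dt)
unchanged <- isTRUE(all.equal(dt, before))   # dt still equals its copy

val_b <- by_ref(dt)
changed <- !isTRUE(all.equal(dt, before))    # the hs estimate was overwritten
```

Both helpers return the same value for forborn; they differ only in whether the original estimates survive for later use.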

Test data

dt1 <- read.table(text = "
         Variables     Estimate   'Std. Error'    't value'      'Pr(>|t|)'
 1:       Intercept 37.231178895 9.486380e-02 392.469814  0.000000e+00
 2:         forborn -0.612941167 5.174224e-02 -11.846051  2.300944e-32
 3:          female -3.238655089 4.797890e-02 -67.501655  0.000000e+00
 4:        numchild  0.583390602 2.239027e-02  26.055543 1.841656e-149
 5: numchild_female  0.371351058 9.086739e-02   4.086736  4.376191e-05
 6:              hs  0.173864095 9.180975e-02   1.893743  5.826025e-02
 7:         somecol  0.595612050 9.407851e-02   6.331011  2.439689e-10
 8:         college  1.593917949 9.929766e-02  16.051918  5.923264e-58
 9:        advanced  0.171443556 1.983952e-03  86.415175  0.000000e+00
10:              rw -0.001207904 1.460021e-05 -82.731964  0.000000e+00
11:      rw_squared -0.954029880 3.252520e-02 -29.332024 8.456547e-189
", header = TRUE, check.names = FALSE)

set.seed(2021)
dt2 <- dt1
dt2$`Pr(>|t|)`[sample(nrow(dt2), nrow(dt2)/3)] <- 0.1

setDT(dt1)
setDT(dt2)
dt_list <- list(dt1, dt2)

Please credit the source when reposting. Link to this article: https://www.uj5u.com/net/373001.html
