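I couldn't find any previous question that covers these steps for a nested list, and my own attempts got me nowhere!
I have a nested list, df.
- I want to rename the first 3 columns in all data.frames to c("one", "two", "three").
- In each data frame, keep the first 3 columns plus the column whose name matches the data frame's name in the list.
- Each data frame now has 4 columns. In each data frame, I want to keep the values in the second column where the value in the fourth column is greater than 3.
- Return a nested list containing the name of each data frame and the values selected from the second column in the previous step.
purrr and dplyr approaches are preferred, but anything else is much appreciated!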
> dput(map_depth(df,1, head))
list(`CD8_C01-LEF1` = structure(list(...1 = c("1236", "6194",
"51176", "6402", "6137", "1937"), ...2 = c("CCR7", "RPS6", "LEF1",
"SELL", "RPL13", "EEF1G"), ...3 = c(448.275813024615, 114.565282822255,
405.993571415472, 352.462886197845, 152.430598462657, 73.5226212775651
), `P-value*` = c(0, 2.35914832807463e-150, 0, 0, 1.03146807397557e-195,
3.00681346250943e-98), `CD8_C01-LEF1` = c(6.3388353508401, 1.36075129906401,
5.11667843995657, 5.22902495053118, 1.35703181746742, 1.72815687302818
), `CD8_C02-GPR183` = c(2.71993044636725, 0.755445092850178,
2.26029822474036, 3.57732840656951, 0.757664532314421, 0.732003573596204
), `CD8_C03-CX3CR1` = c(-2.50016459757821, 0.0430813598361915,
-1.47763877045973, -1.31104077043168, -0.118054173396857, -0.217984797372657
), `CD8_C04-GZMK` = c(-0.639352384551204, -0.304854019068466,
-1.400271288872, -1.56965980479594, -0.128422617265835, -0.701864111617954
), `CD8_C05-CD6` = c(-2.35873754058284, -0.115888861319928, -2.08628173736428,
-3.32630706764402, -0.177640817498698, -0.215754243123614), `CD8_C06-CD160` = c(-2.85558322130952,
-0.29530343951866, -2.20232116143474, -3.274807762691, -0.440783845861116,
-0.56207661416919), `CD8_C07-LAYN` = c(-2.75671138163062, -0.887003245107014,
-2.40845402752497, -3.47698326675668, -1.03656381624963, -1.46468960616135
), `CD8_C08-SLC4A10` = c(-2.68199272253543, 0.0292368512820967,
-2.1581654239029, -2.99895134853712, 0.0615744908900675, 0.192173783941343
)), row.names = c(NA, 6L), class = "data.frame"), `CD8_C02-GPR183` = structure(list(
...1 = c("3575", "4050", "1901", "6653", "1880", "10628"),
...2 = c("IL7R", "LTB", "S1PR1", "SORL1", "GPR183", "TXNIP"
), ...3 = c(268.347035159053, 151.397715576146, 423.815475272167,
154.131971403975, 161.502687932662, 138.188069200824), `P-value*` = c(0,
1.63481853000449e-194, 0, 1.09616441981898e-197, 3.47999420200636e-206,
5.87606326954945e-179), `CD8_C01-LEF1` = c(2.25872137515665,
1.06433926285014, 2.06890434595653, 1.77222927526522, -2.32256398023726,
1.17445992511194), `CD8_C02-GPR183` = c(3.58534594694992,
2.33774626980998, 3.1044712936119, 3.00075778716827, 1.54874669286004,
2.11053414857411), `CD8_C03-CX3CR1` = c(-2.73122665345433,
-3.23251051546321, 2.76359001828421, 0.899851788567591, -3.4595583469893,
1.9924219816788), `CD8_C04-GZMK` = c(-1.20359289904198, -2.27859013855459,
-0.289843306560729, 0.0930099548084882, 0.293766916539111,
-1.05998934689132), `CD8_C05-CD6` = c(0.771026257612103,
-1.84446654315228, -1.92859019625536, -0.993527571866541,
-0.517242518264243, -1.05505195656161), `CD8_C06-CD160` = c(-1.26433565787961,
-3.62072638085859, -1.99838091859197, -2.66224984657089,
-3.84677781455005, -0.741084525734145), `CD8_C07-LAYN` = c(-4.85420539962432,
-3.79535857695107, -2.07599716553024, -2.41001692585172,
-3.66993376805675, -1.90910214659534), `CD8_C08-SLC4A10` = c(1.79563839118781,
0.431971358693421, 0.24665792844753, 0.820564247625701, -0.941462395796914,
0.224912511574641)), row.names = c(NA, 6L), class = "data.frame"),
`CD8_C03-CX3CR1` = structure(list(...1 = c("5341", "1524",
"83888", "2214", "343413", "10219"), ...2 = c("PLEK", "CX3CR1",
"FGFBP2", "FCGR3A", "FCRL6", "KLRG1"), ...3 = c(372.816216710618,
713.554708746553, 575.834099328186, 419.996034284325, 215.715234731706,
281.827177706662), `P-value*` = c("0", "0", "0", "0", "3.5450627744914998E-266",
"0"), `CD8_C01-LEF1` = c(-1.34745098111019, -0.39476162886016,
-0.248194028712413, -0.326944139043036, -0.833877751680806,
-0.822668603983214), `CD8_C02-GPR183` = c(0.50737446056126,
-0.495638146054913, -0.484905896571723, -0.125753818325312,
0.0263098770399738, 0.894340812937189), `CD8_C03-CX3CR1` = c(6.36825282208761,
5.38301238794739, 5.26196506464758, 5.6197563760267, 5.8532850807879,
5.36851683724817), `CD8_C04-GZMK` = c(1.44463895049283, -0.513803138075432,
-0.125340966094923, 0.2447981258131, 1.34537977512099, 2.10784813093189
), `CD8_C05-CD6` = c(-0.718776566594413, -0.795121492384525,
-0.681892196238474, -0.421395883952147, 0.0987360993173341,
-1.35585804120358), `CD8_C06-CD160` = c(-0.550964233191398,
-0.794078725052049, -0.707741972359531, -0.156207202527366,
2.24842830259497, -1.28977809817504), `CD8_C07-LAYN` = c(0.0641870785667258,
-0.785201010640904, -0.631939964779986, -0.340799120353511,
0.271892089522186, 0.236064375692484), `CD8_C08-SLC4A10` = c(1.40102283829925,
-0.158585496249154, -0.056110756095033, 0.00915832466806331,
-0.085141865592199, 3.78847417230501)), row.names = c(NA,
6L), class = "data.frame"))
Answer:
Here is a purrr and dplyr solution:
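library(tidyverse)

# df_list is the question's list of data frames (called df there)
df_list <- df

map2(df_list, names(df_list),
     \(dat, name) {
       dat |>
         # keep and rename the first three columns, plus the column
         # named after the list element
         select(one = ...1,
                two = ...2,
                three = ...3,
                all_of(name)) |>
         # keep the rows whose fourth column is greater than 3
         (\(d) filter(d, d[,4] > 3))() |>
         pull(two)
     }
)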
#> $`CD8_C01-LEF1`
#> [1] "CCR7" "LEF1" "SELL"
#>
#> $`CD8_C02-GPR183`
#> [1] "IL7R" "S1PR1" "SORL1"
#>
#> $`CD8_C03-CX3CR1`
#> [1] "PLEK" "CX3CR1" "FGFBP2" "FCGR3A" "FCRL6" "KLRG1"
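Edit: explanation
map2 = I use it here because you have a list of data frames, and map works well with lists. I use the "2" variant because you also want to select a column based on the list names.
\(dat, name) = creates an anonymous function that takes the two inputs from map2; I call the data dat and the name of the list element name.
select(one = ...1, two = ...2, three = ...3, all_of(name)) = here I select and rename the first three columns as requested in the question, and I also select the column named after the list element with all_of(name). Remember that name is the variable defined in the anonymous function for the list name.
(\(d) filter(d, d[,4] > 3))() = this is somewhat funky syntax, because I prefer the native pipe operator (|>) over the magrittr pipe operator (%>%). It means I create another anonymous function (\(d)) that defines the current data as d. Then I filter d on the 4th column being greater than 3 (i.e. d[,4] > 3). With the magrittr pipe this could be simplified to filter(.[,4] > 3). Better still would be to use non-standard evaluation to avoid the anonymous function altogether, but I had trouble working out the correct use of {{}}, quo, enquo and !! with a referenced column name.
pull(two) = finally, we extract only the values from the column named two.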
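As a side note on the map2(df_list, names(df_list), ...) pattern explained above: purrr's imap() is shorthand for exactly that call. The sketch below is only an illustration under assumptions: toy is a made-up stand-in for the question's list, pick_genes is a hypothetical helper name, and the columns are renamed by position (one = 1, and so on) rather than by the ...1 names in the real data.

```r
library(dplyr)
library(purrr)

# Made-up stand-in for the question's list: three leading columns,
# then one score column per list element.
toy <- list(
  A = data.frame(id = c("1", "2", "3"), gene = c("G1", "G2", "G3"),
                 stat = c(10, 20, 30), A = c(5, 1, 4), B = c(0, 9, 0)),
  B = data.frame(id = c("4", "5", "6"), gene = c("G4", "G5", "G6"),
                 stat = c(11, 21, 31), A = c(9, 9, 9), B = c(2, 7, 1))
)

# Hypothetical helper: same steps as the answer, renaming by position.
pick_genes <- function(dat, name) {
  dat |>
    select(one = 1, two = 2, three = 3, all_of(name)) |>
    (\(d) filter(d, d[, 4] > 3))() |>
    pull(two)
}

via_map2 <- map2(toy, names(toy), pick_genes)
via_imap <- imap(toy, pick_genes)  # passes each element and its name

identical(via_map2, via_imap)
```

Both calls return a list that keeps the names of toy, so imap() only saves writing names(toy) by hand.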
Edit 2: cleaning up the code.
I figured out the non-standard evaluation needed to clean up the odd syntax.
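map2(df_list, names(df_list),
     \(dat, name) {
       dat |>
         select(one = ...1,
                two = ...2,
                three = ...3,
                all_of(name)) |>
         # all_of() is a selection helper, so it is not needed inside filter();
         # sym() turns the string in name into a column reference
         filter(!!sym(name) > 3) |>
         pull(two)
     }
)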
#> $`CD8_C01-LEF1`
#> [1] "CCR7" "LEF1" "SELL"
#>
#> $`CD8_C02-GPR183`
#> [1] "IL7R" "S1PR1" "SORL1"
#>
#> $`CD8_C03-CX3CR1`
#> [1] "PLEK" "CX3CR1" "FGFBP2" "FCGR3A" "FCRL6" "KLRG1"
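Another way to refer to a column whose name is stored in a string, without building a symbol by hand, is dplyr's .data pronoun. The following is a minimal sketch on made-up data (dat and name are illustrative here, not the question's objects), comparing it with the symbol-based form:

```r
library(dplyr)

# Made-up data: one label column and one score column.
dat <- data.frame(two = c("G1", "G2", "G3"), A = c(5, 1, 4))
name <- "A"  # the column name is only known as a string

# Build a symbol from the string and inject it with !!
with_sym <- dat |>
  filter(!!rlang::sym(name) > 3) |>
  pull(two)

# Look the column up by name through the .data pronoun
with_data <- dat |>
  filter(.data[[name]] > 3) |>
  pull(two)

identical(with_sym, with_data)
```

Either form could replace the filter() step inside the map2() call; which one to use is mostly a matter of taste.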
Answer:
One solution is:
res <- lapply(setNames(nm = names(df)), function(dfname) {
  dff <- df[[dfname]]
  # only renaming column 2 as columns 1 and 3 are not used later on
  colnames(dff)[2] <- "two"
  # not 'keeping' the column with the same name as the dataframe, just using the dataframe straightaway
  dff$two[dff[,dfname] > 3]
})
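Note the setNames(...) call passed as the first argument to lapply. If you pass a named list (here, a named character vector) to lapply, it uses the names of the elements as the names of the elements it returns.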
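The naming behaviour can be checked on a small made-up list (x below is illustrative only, not the question's data):

```r
x <- list(a = 1:3, b = 4:6)

# Iterating over a plain character vector gives an unnamed result.
unnamed <- lapply(names(x), function(nm) sum(x[[nm]]))

# setNames(nm = v) returns v named by its own values, so the result is named.
named <- lapply(setNames(nm = names(x)), function(nm) sum(x[[nm]]))

names(unnamed)  # NULL
names(named)    # "a" "b"
```

Without setNames(), the result would have to be named afterwards, for example with names(res) <- names(df).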
