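I am trying to calculate population parameters for multiple species within their respective sample sites. A sample of my df is structured as follows:
Data frame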
df<- structure(list(waterbody = c("Homer", "Homer", "Homer", "Homer",
"Homer", "Homer", "Homer", "Homer", "Homer", "Homer", "Homer",
"Homer", "Homer", "Homer", "Homer", "Homer", "Homer", "Homer",
"Homer", "Homer", "Homer", "Homer", "Homer", "Homer", "Homer",
"Homer", "Homer", "Homer", "Homer", "Homer", "Homer", "Homer",
"Homer", "Homer", "Homer", "Homer", "Homer", "Homer", "Homer",
"Homer", "Homer", "Homer", "Homer", "Homer", "Homer", "Homer",
"Homer", "Homer", "Homer", "Homer", "Homer"), sample_site = c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L), species = c("LMB", "LMB", "BLG", "LMB", "BLG", "BLG",
"BLG", "BLG", "BLG", "LMB", "LMB", "LMB", "LMB", "LMB", "BLG",
"BLG", "LMB", "LMB", "BLG", "BLG", "LMB", "LMB", "LMB", "BLG",
"BLG", "BLG", "BLG", "BLG", "BLG", "BLG", "BLG", "BLG", "LMB",
"LMB", "LMB", "BLG", "LMB", "LMB", "LMB", "BLG", "LMB", "LMB",
"LMB", "BLG", "LMB", "BLG", "LMB", "LMB", "BLG", "LMB", "BLG"
), length_mm = c(430L, 430L, 165L, 345L, 128L, 117L, 93L, 135L,
161L, 402L, 347L, 450L, 477L, 255L, 115L, 91L, 445L, 335L, 119L,
124L, 249L, 135L, 361L, 160L, 115L, 130L, 155L, 116L, 158L, 130L,
126L, 158L, 500L, 330L, 150L, 90L, 333L, 404L, 343L, 150L, 285L,
303L, 340L, 120L, 420L, 115L, 295L, 322L, 85L, 145L, 185L), stock = c(1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 0, 1), quality = c(1, 1, 1, 1, 0, 0, 0, 0,
1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 1, 0, 1,
0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0,
1)), row.names = c(NA, -51L), class = "data.frame")
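This sample is filtered down to just 2 species at two sample sites; my full data frame has hundreds of sample sites and 20+ species. I want to write a function that sums the number of quality individuals (indicated by a "1" in the quality column) and divides it by the number of stock individuals (again, indicated by a "1" in the stock column). Done manually, this looks like: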
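library(dplyr)
# quality-length LMB at sample site 1
a <- filter(df, waterbody == "Homer", sample_site == 1, species == "LMB", quality == 1)
# stock-length LMB at sample site 1
b <- filter(df, waterbody == "Homer", sample_site == 1, species == "LMB", stock == 1)
count(a) / count(b) * 100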
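The result is 83.333 ((10 quality / 12 stock) * 100). However, I want to do this for every species at every sample site, so that LMB and BLG each get a value between 0 and 100 at sample sites 1 and 2.
I would like the final result to be a data frame structured like this: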
results<- structure(list(waterbody = c("Homer", "Homer", "Homer", "Homer",
"Homer", "Homer"), transect = c(1L, 1L, 1L, 2L, 2L, 2L), species = c("BLC",
"BLG", "LMB", "BLC", "BLG", "GSF"), psd = c(50, 31.58, 83.33,
100, 33.33, 0)), row.names = c(NA, 6L), class = "data.frame")
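The math that goes into the function is obviously very simple. The problem I am having is how to apply it to filtered data so that I do not, for example, count quality individuals across multiple sample sites.
Any help/insight would be greatly appreciated.
Answer from a uj5u.com user:
Here is a dplyr solution: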
library(dplyr)
df %>%
group_by(waterbody, sample_site, species) %>%
summarise(psd = (sum(quality==1)/sum(stock == 1))*100)
waterbody sample_site species psd
<chr> <int> <chr> <dbl>
1 Homer 1 BLG 31.6
2 Homer 1 LMB 83.3
3 Homer 2 BLG 33.3
4 Homer 2 LMB 81.8
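Since the question asks for a function, the pipeline above could be wrapped in one. The following is a minimal sketch, not part of the original answer: calc_psd(), the n_stock and n_quality helper columns, and the toy data frame are names made up for illustration, and returning NA for groups with no stock-length fish is an assumption about the desired behaviour.

```r
library(dplyr)

# Hypothetical helper (not from the original answer): PSD per
# waterbody / sample site / species, expressed in percent.
calc_psd <- function(data) {
  data %>%
    group_by(waterbody, sample_site, species) %>%
    summarise(
      n_stock   = sum(stock == 1),
      n_quality = sum(quality == 1),
      # Assumption: a group without stock-length fish gets NA instead of NaN (0/0).
      psd = if_else(n_stock > 0, 100 * n_quality / n_stock, NA_real_),
      .groups = "drop"
    )
}

# Small made-up data set, only to exercise the function.
toy <- data.frame(
  waterbody   = "Homer",
  sample_site = c(1L, 1L, 1L, 1L, 2L, 2L),
  species     = c("LMB", "LMB", "LMB", "BLG", "LMB", "LMB"),
  stock       = c(1, 1, 0, 1, 1, 1),
  quality     = c(1, 0, 0, 1, 1, 1)
)

calc_psd(toy)
```

Because the counts are taken inside each group, individuals from different sample sites are never pooled, which was the concern raised in the question.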
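Answer from a uj5u.com user:
Can you confirm that
- transect (in the expected output) is the same as sample_site (in the input dataset), and
- the expected dataset (which has values for the "BLC" species) was not produced from the input dataset (which has none)?
If so, dplyr's group_by() and summarize() are all you need.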
df |>
dplyr::group_by(waterbody, sample_site, species) |>
dplyr::summarize(
psd = sum(quality) / sum(stock)
) |>
dplyr::ungroup()
which produces
# A tibble: 4 x 4
waterbody sample_site species psd
<chr> <int> <chr> <dbl>
1 Homer 1 BLG 0.316
2 Homer 1 LMB 0.833
3 Homer 2 BLG 0.333
4 Homer 2 LMB 0.818
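The question asked for values on a 0-100 scale in a plain data frame with a transect column, whereas the code above returns proportions in a tibble. One possible adaptation is sketched below. It assumes that transect really is sample_site and that rounding to two decimals is wanted (both are assumptions), and it uses a small made-up toy data frame rather than the question's df.

```r
library(dplyr)

# Made-up example data; the column names follow the question's df.
toy <- data.frame(
  waterbody   = "Homer",
  sample_site = c(1L, 1L, 1L, 2L, 2L),
  species     = c("LMB", "LMB", "LMB", "BLG", "BLG"),
  stock       = c(1, 1, 1, 1, 1),
  quality     = c(1, 0, 0, 1, 0)
)

res <- toy |>
  group_by(waterbody, sample_site, species) |>
  # Multiply by 100 to get a percentage, then round for display.
  summarize(psd = round(100 * sum(quality) / sum(stock), 2), .groups = "drop") |>
  # Assumption: "transect" in the expected output means "sample_site".
  rename(transect = sample_site) |>
  as.data.frame()

res
```

Rounding inside the summary is convenient for display, but if psd feeds into further calculations it may be safer to keep full precision and round only when printing.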
Before you run it, I recommend verifying that all values of stock and quality are non-missing and either 0 or 1. checkmate::assert_integerish() is well suited to this.
checkmate::assert_integerish(df$stock , any.missing = FALSE, lower = 0, upper = 1)
checkmate::assert_integerish(df$quality, any.missing = FALSE, lower = 0, upper = 1)
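The check matters because sum(quality) only counts quality individuals when the column holds nothing but 0 and 1. If checkmate is not installed, a rough base R equivalent could look like the sketch below; is_binary_flag() and the two example vectors are hypothetical names, not part of any package.

```r
# Hypothetical helper: TRUE when x has no missing values and only 0/1 entries.
is_binary_flag <- function(x) {
  !anyNA(x) && all(x %in% c(0, 1))
}

flags_ok  <- c(1, 0, 1, 1)  # valid 0/1 flags
flags_bad <- c(1, 2, NA)    # contains a 2 and a missing value

is_binary_flag(flags_ok)
is_binary_flag(flags_bad)

# With the question's data frame, this would stop with an error on bad input:
# stopifnot(is_binary_flag(df$stock), is_binary_flag(df$quality))
```

Unlike the checkmate assertions, this version does not report which values failed; it only says whether the column passes.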