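My question is more about how to improve code that I suspect is inefficient.
I have two data frames: one containing county-level disaster data and the other containing county-level per capita income data. As a first step, I want to identify the counties for which we are missing per capita income data. Here is what sample data frames look like: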
counties <- data.frame(polyname = c("alabama,autauga","alabama,autauga",
"alabama,baldwin","alabama,baldwin",
"alabama,barbour","alabama,barbour",
"alabama,bibb", "alabama,bibb"),
indAnyDisaster_frequency = c("1-2", "1-2", "0", "0",
"3-5", "3-5", "1-2","1-2"))
counties_persinc_1980 <- data.frame(polyname = c("alabama,autauga","alabama,autauga",
"alabama,baldwin","alabama,baldwin",
"alabama,barbour","alabama,barbour",
"alabama,bibb", "alabama,bibb"),
persinc_1980 = c(NA, NA, NA, NA, 25, 30, 32, 28))
no_persinc_1980 <- unique(counties_persinc_1980$polyname[is.na(counties_persinc_1980$persinc_1980)])
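Now I want to use this vector of missing county names to set counties$indAnyDisaster_frequency to NA wherever counties$polyname at the same index matches an element of the vector. I believe I have managed this with a for loop, but I don't think it is very efficient. However, I haven't been able to figure out how to get the same result with lapply. I have included the loop code and one of my lapply attempts.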
for(i in 1:length(no_persinc_1980)){
counties$indAnyDisaster_frequency[counties$polyname==no_persinc_1980[i]] <- NA
}
lapply(1:length(no_persinc_1980), function(x) counties$indAnyDisaster_frequency[counties$polyname==no_persinc_1980[x]] <- NA)
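The lapply attempt does not change counties because the assignment inside the anonymous function creates and modifies a local copy of counties in the function's own environment. lapply then returns a list of the assigned values (each just NA), and the data frame in the calling environment is untouched. A minimal sketch demonstrating this, rebuilding the sample data with rep() for brevity:

```r
counties <- data.frame(
  polyname = rep(c("alabama,autauga", "alabama,baldwin",
                   "alabama,barbour", "alabama,bibb"), each = 2),
  indAnyDisaster_frequency = c("1-2", "1-2", "0", "0",
                               "3-5", "3-5", "1-2", "1-2")
)
no_persinc_1980 <- c("alabama,autauga", "alabama,baldwin")

# Each call assigns into a local copy of `counties` that is discarded on return
res <- lapply(seq_along(no_persinc_1980), function(x) {
  counties$indAnyDisaster_frequency[counties$polyname == no_persinc_1980[x]] <- NA
})

# The value of an assignment is the assigned value, so res is a list of NAs
str(res)
# The data frame in the calling environment still has no NAs
sum(is.na(counties$indAnyDisaster_frequency))
```

This is why an apply-family function is the wrong tool here: the task is a single in-place update, not a computation that returns one result per element.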
Any guidance on how to improve this approach would be greatly appreciated.
Answer:
No loop is needed. I would look into %in%:
counties$indAnyDisaster_frequency[counties$polyname %in% no_persinc_1980] <- NA
counties
#> polyname indAnyDisaster_frequency
#> 1 alabama,autauga <NA>
#> 2 alabama,autauga <NA>
#> 3 alabama,baldwin <NA>
#> 4 alabama,baldwin <NA>
#> 5 alabama,barbour 3-5
#> 6 alabama,barbour 3-5
#> 7 alabama,bibb 1-2
#> 8 alabama,bibb 1-2
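x %in% table returns a logical vector the same length as x, TRUE wherever an element of x appears anywhere in table (base R defines it as match(x, table, nomatch = 0L) > 0L). That single set-membership test replaces the whole loop. A small sketch of the logical vector it builds, contrasted with ==, which recycles the shorter vector element by element instead of testing membership:

```r
polyname <- rep(c("alabama,autauga", "alabama,baldwin",
                  "alabama,barbour", "alabama,bibb"), each = 2)
no_persinc_1980 <- c("alabama,autauga", "alabama,baldwin")

# One logical value per row: TRUE where the county is in the missing set
hit <- polyname %in% no_persinc_1980
print(hit)

# Same result via the documented definition of %in%
identical(hit, match(polyname, no_persinc_1980, nomatch = 0L) > 0L)

# == compares position by position, recycling no_persinc_1980 to length 8,
# so it is NOT a membership test and flags the wrong rows
eq <- polyname == no_persinc_1980
print(eq)
```

Here hit is TRUE for the first four rows, while eq is TRUE only for rows 1 and 4, which is why the loop version had to compare against one element at a time.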
Answer:
idx <- which(counties$polyname %in% no_persinc_1980)
counties[ idx, 'indAnyDisaster_frequency' ] <- NA
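This variant converts the logical vector to row positions with which() before indexing. Because %in% never returns NA, the logical vector can be used directly and which() is optional; both forms update the same rows. A minimal sketch comparing them on fresh copies of the sample data (make_counties is a hypothetical helper introduced only for this comparison):

```r
# Hypothetical helper: builds a fresh copy of the sample data
make_counties <- function() {
  data.frame(
    polyname = rep(c("alabama,autauga", "alabama,baldwin",
                     "alabama,barbour", "alabama,bibb"), each = 2),
    indAnyDisaster_frequency = c("1-2", "1-2", "0", "0",
                                 "3-5", "3-5", "1-2", "1-2")
  )
}
no_persinc_1980 <- c("alabama,autauga", "alabama,baldwin")

# Logical indexing
a <- make_counties()
a$indAnyDisaster_frequency[a$polyname %in% no_persinc_1980] <- NA

# Integer indexing: which() returns the positions of the TRUE values
b <- make_counties()
idx <- which(b$polyname %in% no_persinc_1980)
b[idx, "indAnyDisaster_frequency"] <- NA

print(idx)
identical(a$indAnyDisaster_frequency, b$indAnyDisaster_frequency)
```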
Answer:
library(tidyverse)
Extract the names of the counties that have NA in the persinc_1980 column:
counties_nas <- counties_persinc_1980 %>%
filter(is.na(persinc_1980)) %>%
unique() %>%
pull(polyname)
If polyname is in that vector, change indAnyDisaster_frequency to NA:
counties %>%
mutate(indAnyDisaster_frequency = case_when(polyname %in% counties_nas ~ NA_character_,
TRUE ~ indAnyDisaster_frequency))
#> polyname indAnyDisaster_frequency
#> <chr> <chr>
#> 1 alabama,autauga NA
#> 2 alabama,autauga NA
#> 3 alabama,baldwin NA
#> 4 alabama,baldwin NA
#> 5 alabama,barbour 3-5
#> 6 alabama,barbour 3-5
#> 7 alabama,bibb 1-2
#> 8 alabama,bibb 1-2
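With only one condition, case_when() is more machinery than the job needs. Base R's replace(x, list, values) returns a copy of x with the selected positions replaced, and it works the same way inside mutate(), for example mutate(indAnyDisaster_frequency = replace(indAnyDisaster_frequency, polyname %in% counties_nas, NA)). A minimal base R sketch of the same step, assuming counties_nas holds the two names extracted above:

```r
counties <- data.frame(
  polyname = rep(c("alabama,autauga", "alabama,baldwin",
                   "alabama,barbour", "alabama,bibb"), each = 2),
  indAnyDisaster_frequency = c("1-2", "1-2", "0", "0",
                               "3-5", "3-5", "1-2", "1-2")
)
# Assumed result of the extraction step above
counties_nas <- c("alabama,autauga", "alabama,baldwin")

# replace() returns a copy of the vector with the flagged positions set to NA;
# the logical NA is coerced to NA_character_, so the column stays character
counties$indAnyDisaster_frequency <- replace(
  counties$indAnyDisaster_frequency,
  counties$polyname %in% counties_nas,
  NA
)
print(counties)
```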
Answer:
I suggest joining the two data frames together. That is almost always the best way to handle this kind of problem.
library(tidyverse)
counties <- data.frame(polyname = c("alabama,autauga","alabama,autauga",
"alabama,baldwin","alabama,baldwin",
"alabama,barbour","alabama,barbour",
"alabama,bibb", "alabama,bibb"),
indAnyDisaster_frequency = c("1-2", "1-2", "0", "0",
"3-5", "3-5", "1-2","1-2"))
counties_persinc_1980 <- data.frame(polyname = c("alabama,autauga","alabama,autauga",
"alabama,baldwin","alabama,baldwin",
"alabama,barbour","alabama,barbour",
"alabama,bibb", "alabama,bibb"),
persinc_1980 = c(NA, NA, NA, NA, 25, 30, 32, 28))
# join
disasters <- left_join(counties, counties_persinc_1980, by = "polyname")
print(disasters)
#> polyname indAnyDisaster_frequency persinc_1980
#> 1 alabama,autauga 1-2 NA
#> 2 alabama,autauga 1-2 NA
#> 3 alabama,autauga 1-2 NA
#> 4 alabama,autauga 1-2 NA
#> 5 alabama,baldwin 0 NA
#> 6 alabama,baldwin 0 NA
#> 7 alabama,baldwin 0 NA
#> 8 alabama,baldwin 0 NA
#> 9 alabama,barbour 3-5 25
#> 10 alabama,barbour 3-5 30
#> 11 alabama,barbour 3-5 25
#> 12 alabama,barbour 3-5 30
#> 13 alabama,bibb 1-2 32
#> 14 alabama,bibb 1-2 28
#> 15 alabama,bibb 1-2 32
#> 16 alabama,bibb 1-2 28
# which missing
disasters %>%
filter(is.na(persinc_1980)) %>%
pull(polyname) %>%
unique()
#> [1] "alabama,autauga" "alabama,baldwin"
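Because both tables have two rows per county, the join above matches each row with every row of the same name and returns 16 rows instead of 8. One way to keep the join approach without multiplying rows is to collapse the income table to a single flag per county first. A minimal base R sketch, assuming the goal is still to set indAnyDisaster_frequency to NA for affected counties; any_missing is a hypothetical helper column, and treating counties absent from the income table as missing is an assumption:

```r
counties <- data.frame(
  polyname = rep(c("alabama,autauga", "alabama,baldwin",
                   "alabama,barbour", "alabama,bibb"), each = 2),
  indAnyDisaster_frequency = c("1-2", "1-2", "0", "0",
                               "3-5", "3-5", "1-2", "1-2")
)
counties_persinc_1980 <- data.frame(
  polyname = rep(c("alabama,autauga", "alabama,baldwin",
                   "alabama,barbour", "alabama,bibb"), each = 2),
  persinc_1980 = c(NA, NA, NA, NA, 25, 30, 32, 28)
)

# One value per county: TRUE if any of its income values is NA
any_missing <- tapply(is.na(counties_persinc_1980$persinc_1980),
                      counties_persinc_1980$polyname, any)
flags <- data.frame(polyname = names(any_missing),
                    any_missing = as.vector(any_missing))

# Left join; flags has one row per key, so no rows are duplicated.
# Note that merge() sorts the result by the key column.
joined <- merge(counties, flags, by = "polyname", all.x = TRUE)

# Assumption: counties absent from the income table (NA flag) also count as missing
missing_rows <- is.na(joined$any_missing) | joined$any_missing
joined$indAnyDisaster_frequency[missing_rows] <- NA
joined$any_missing <- NULL

print(joined)
```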
Created on 2022-10-26 with reprex v2.0.2
Article link: https://www.uj5u.com/gongcheng/520909.html
Tags: r
