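I want to pass all values in a data frame as conditions to dplyr::case_when() and stringr::str_detect(), using the corresponding column header as the replacement value.
I have these two data frames: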
> print(city_stack)
# A tibble: 11 × 1
city
<chr>
1 Britz
2 Berlin-Reinickendorf
3 Berlin-Kladow
4 Berlin-Spindlersfeld
5 Berlin-Mahlsdorf
6 Berlin-Lichterfelde
7 Berlin-Spandau
8 Berlin-Biesdorf
9 Berlin-Niederschöneweide
10 Rüdersdorf bei Berlin
11 Berlin-Nordend
> print(districts_stack)
# A tibble: 10 × 2
Berlin Köln
<chr> <chr>
1 Adlershof Rodenkirchen
2 Altglienicke Chorweiler
3 Baumschulenweg Ehrenfeld
4 Biesdorf Kalk
5 Blankenburg Lindenthal
6 Blankenfelde Mülheim
7 Bohnsdorf Nippes
8 Britz Porz
9 Buch Kölner Zoo
10 Buckow Universität zu Köln
I tried using a nested for loop:
for (i in colnames(districts_stack)) {
  for (j in districts_stack[[i]]) {
    # Replace any city containing district j with its column header i
    city_stack <- mutate(city_stack, city = case_when(
      str_detect(city, j) ~ i,
      TRUE ~ city
    ))
  }
}
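While this works, it is extremely inefficient and causes problems with the huge data frame I am actually using. I feel there should be a more efficient solution using purrr::map(), but I could not come up with anything workable.
dput() of the data frames: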
dput(city_stack[1:11,])
structure(list(city = c("Britz", "Berlin-Reinickendorf", "Berlin-Kladow",
"Berlin-Spindlersfeld", "Berlin-Mahlsdorf", "Berlin-Lichterfelde",
"Berlin-Spandau", "Berlin-Biesdorf", "Berlin-Niederschöneweide",
"Rüdersdorf bei Berlin", "Berlin-Nordend")), row.names = c(NA,
-11L), class = c("tbl_df", "tbl", "data.frame"))
> dput(districts_stack[1:10,1:2])
structure(list(Berlin = c("Adlershof", "Altglienicke", "Baumschulenweg",
"Biesdorf", "Blankenburg", "Blankenfelde", "Bohnsdorf", "Britz",
"Buch", "Buckow"), Köln = c("Rodenkirchen", "Chorweiler", "Ehrenfeld",
"Kalk", "Lindenthal", "Mülheim", "Nippes", "Porz", "Kölner Zoo",
"Universität zu Köln")), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame"))
A uj5u.com user replied:
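I am not 100% sure what output you are looking for, but I believe this is a step in the right direction. Rather than looping over the district values and checking each one for a match, I suggest melting the districts_stack data frame into long format and joining it to the city names with a fuzzy (regex) string match.
As I understand it, that is what the loop is doing. You then have a single data frame in which the values in city can be replaced more easily with if_else.
I took inspiration from this thread: dplyr: inner_join with a partial string match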
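To make the melting step concrete, here is a minimal sketch on a cut-down stand-in for the districts table. The name districts_wide and the two-row sample are assumptions for illustration, not the full data from the question. It only shows how pivot_longer() turns each column header into a value in a city column, which is the shape a join against the city names needs.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical two-row stand-in for the wide districts table
districts_wide <- tibble(
  Berlin = c("Biesdorf", "Britz"),
  `Köln` = c("Nippes", "Porz")
)

# Long format: one row per (column header, district) pair
districts_long <- districts_wide %>%
  pivot_longer(cols = everything(), names_to = "city", values_to = "district") %>%
  arrange(city)

print(districts_long)
```

Each district now sits next to the header it came from, so a single join can look up the header for every matching city.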
library(tidyverse)
library(fuzzyjoin) # to join the data based on fuzzy matches to get results in one dataframe for easier manipulation
city_stack <- structure(list(city = c("Britz", "Berlin-Reinickendorf", "Berlin-Kladow",
"Berlin-Spindlersfeld", "Berlin-Mahlsdorf", "Berlin-Lichterfelde",
"Berlin-Spandau", "Berlin-Biesdorf", "Berlin-Niederschöneweide",
"Rüdersdorf bei Berlin", "Berlin-Nordend")), row.names = c(NA,
-11L), class = c("tbl_df", "tbl", "data.frame"))
districts_stack <- structure(list(Berlin = c("Adlershof", "Altglienicke", "Baumschulenweg",
"Biesdorf", "Blankenburg", "Blankenfelde", "Bohnsdorf", "Britz",
"Buch", "Buckow"), Köln = c("Rodenkirchen", "Chorweiler", "Ehrenfeld",
"Kalk", "Lindenthal", "Mülheim", "Nippes", "Porz", "Kölner Zoo",
"Universität zu Köln")), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame")) %>%
pivot_longer(., cols = everything(), names_to='city', values_to='district') %>%
arrange(city)
city_stack %>% # left join to get all potential string matches, then mutate
regex_left_join(districts_stack, by = c(city = "district")) %>%
mutate(city.x = if_else(!is.na(city.y), district, city.x))
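Note that, as written, the mutate() above puts the matched district into city.x. If the column header is what should end up in the column, as in the question's loop, one possible alternative without fuzzyjoin is sketched below. It collapses each column of districts into a single alternation regex, so only one str_detect() call is needed per column header rather than one per district. The names districts_wide and relabel() and the small sample data (including the "Nippes" row) are hypothetical, and the sketch assumes the district names contain no regex metacharacters.

```r
library(dplyr)
library(purrr)
library(stringr)
library(tibble)

# Hypothetical cut-down stand-ins for the two data frames
city_stack <- tibble(city = c("Britz", "Berlin-Biesdorf", "Berlin-Spandau", "Nippes"))
districts_wide <- tibble(
  Berlin = c("Biesdorf", "Britz", "Buch"),
  `Köln` = c("Nippes", "Porz", "Kalk")
)

# One alternation pattern per column header, e.g. "Biesdorf|Britz|Buch"
# (assumes district names contain no regex metacharacters)
patterns <- map_chr(districts_wide, ~ str_c(.x, collapse = "|"))

# Replace each city with the first header whose pattern matches it;
# cities that match no pattern are left unchanged.
relabel <- function(city, patterns) {
  reduce2(
    names(patterns), patterns,
    function(out, header, pattern) {
      # Only rows not yet replaced are eligible, so earlier headers win
      if_else(out == city & str_detect(city, pattern), header, out)
    },
    .init = city
  )
}

result <- mutate(city_stack, city = relabel(city, patterns))
print(result)
```

Like the original loop, this matches substrings, so a district such as "Buch" would also match a longer name that contains it. When more than one header matches a city, the header from the earlier column is kept.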
