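I have a character column that encodes locations. A location can take the form "city, state, country", "state, country", or just "country". I want to split it into three columns, while making sure that when there are fewer than three elements, each value still ends up in the correct state or country column.
Here is my data and what I have tried: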
library(tidyverse)  # provides tribble(), separate(), mutate(), str_extract() and %>%
tib <- tribble(~obs, ~location,
1, "Miami, Florida, United States",
2, "Astrakhan Oblast, Russia",
3, "Mozambique")
separate(tib, location, c("city", "state", "country"), ", ")
Result:
# A tibble: 3 × 4
obs city state country
<dbl> <chr> <chr> <chr>
1 1 Miami Florida United States
2 2 Astrakhan Oblast Russia NA
3 3 Mozambique NA NA
In a sense, I want separate to work in the opposite direction, filling the columns from the right, so that the result looks like this:
# A tibble: 3 × 4
obs city state country
<dbl> <chr> <chr> <chr>
1 1 Miami Florida United States
2 2 NA Astrakhan Oblast Russia
3 3 NA NA Mozambique
Update:
Here is an option that works, but I am hoping there is a simpler way:
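# country: last piece; state: piece followed by exactly one more; city: piece followed by two more
tib %>% mutate(country = str_trim(str_extract(location, "[A-Za-z ]+$")),
               state = str_trim(str_extract(location, "[A-Za-z ]+(?=,[A-Za-z ]+$)")),
               city = str_extract(location, "^[A-Za-z ]+(?=,[A-Za-z ]+,[A-Za-z ]+$)"))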
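A uj5u.com user replied:
For your specific example, you can use the fill argument of separate() so that missing values are filled in from the left instead of from the right.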
tidyr::separate(tib, location, c("city", "state", "country"), ", ", fill = "left")
# A tibble: 3 x 4
obs city state country
<dbl> <chr> <chr> <chr>
1 1 Miami Florida United States
2 2 NA Astrakhan Oblast Russia
3 3 NA NA Mozambique
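As a possible alternative, newer versions of tidyr (1.3.0 and later) mark separate() as superseded and offer separate_wider_delim(), whose too_few argument plays a similar role to fill. The sketch below assumes tidyr >= 1.3.0 is installed and rebuilds the example data so that it runs on its own; the name res is only illustrative.

```r
library(tidyr)
library(tibble)

# Same example data as in the question
tib <- tribble(~obs, ~location,
               1, "Miami, Florida, United States",
               2, "Astrakhan Oblast, Russia",
               3, "Mozambique")

# too_few = "align_end" pads short rows with NA on the left,
# so the last piece always lands in the last column
res <- separate_wider_delim(tib, location, delim = ", ",
                            names = c("city", "state", "country"),
                            too_few = "align_end")
res
```

Unlike separate(), this function raises an error by default when a row has too few or too many pieces, so rows with more than three pieces would need the too_many argument set explicitly.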
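A uj5u.com user replied:
Here is another approach, using separate_rows. It was written with the help of @akrun, starting from the idea of padding each group with NA so that it is as long as the largest group; a first attempt used complete. The code below splits each location into one row per piece, labels the pieces within each obs from the right (the last piece is the country), and then pivots back to wide format: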
library(dplyr)
library(tidyr)
tib %>%
separate_rows("location", sep = ", ") %>%
group_by(obs) %>%
mutate(new = rev(c("country", "state", "city")[row_number()])) %>%
ungroup %>%
pivot_wider(names_from = new, values_from = location)
Output:
obs city state country
<dbl> <chr> <chr> <chr>
1 1 Miami Florida United States
2 2 NA Astrakhan Oblast Russia
3 3 NA NA Mozambique
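The idea of padding each group with NA up to the length of the largest group can also be sketched in base R, without tidyr. This is only an illustration, not the answerer's code: it assumes no location has more than three pieces, and the names cols, parts, padded, out and res are made up for the example.

```r
# Same example data, as a plain data frame
tib <- data.frame(obs = 1:3,
                  location = c("Miami, Florida, United States",
                               "Astrakhan Oblast, Russia",
                               "Mozambique"),
                  stringsAsFactors = FALSE)

cols <- c("city", "state", "country")

# Split each location on ", " into a character vector of pieces
parts <- strsplit(tib$location, ", ", fixed = TRUE)

# Prepend NAs so every vector has length(cols) entries
# (assumes at most length(cols) pieces per location)
padded <- lapply(parts, function(p) {
  c(rep(NA_character_, length(cols) - length(p)), p)
})

# Stack the padded vectors into rows and attach the column names
out <- as.data.frame(do.call(rbind, padded), stringsAsFactors = FALSE)
names(out) <- cols
res <- cbind(tib["obs"], out)
res
```

Because the padding goes on the left, the last piece of every location lands in the country column regardless of how many pieces there are.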
