I have a single-column data frame of addresses like this:
ADDRESS
123 Main Street Unit A
456 Main Street Apt 3
789 Main Street Floor 2
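I want to parse the addresses so that the Unit/Apt/Floor information is separated from the rest of the street address. Is there a simple way to do this, given that I know up front that the delimiters are "Unit", "Apt", and "Floor"?
The desired end result is a two-column data frame like this: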
ADDRESS          UNIT
123 Main Street  Unit A
456 Main Street  Apt 3
789 Main Street  Floor 2
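I tried using separate from the tidyr package, but (as far as I know) it accepts only a single separator argument. The task could therefore be done by calling separate several times, but that seems silly.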
df <- df %>% tidyr::separate(ADDRESS, into = c("ADDRESS","UNIT"), sep = ' Apt')
# This would need to be repeated using ' Unit' and ' Floor'.
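Similarly, it seems that stringr::str_split_fixed() should be able to handle this task, but again I can't figure out how to do it in a single call (i.e., specifying all three delimiters at once).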
stringr::str_split_fixed(df$Address, c(' Unit', ' Apt', ' Floor'), 2)
# Does not work! Additionally does not result in additional column in dataframe as desired.
Here is the code to create the example data frame:
library(dplyr) # for piping
library(tidyr)
library(stringr)
df <- data.frame(ADDRESS = c("123 Main Street Unit A", "456 Main Street Apt 3", "789 Main Street Floor 2"))
A uj5u.com user replied:
Does this work for you?
Using base R:
gsub('(\\d+\\sMain Street\\s)(.*)', '\\2', df$ADDRESS)
[1] "Unit A" "Apt 3" "Floor 2"
Using dplyr and stringr:
library(dplyr)
library(stringr)
df %>% mutate(UNIT = str_extract(ADDRESS, '(?<=Main Street ).*'))
                  ADDRESS    UNIT
1  123 Main Street Unit A  Unit A
2   456 Main Street Apt 3   Apt 3
3 789 Main Street Floor 2 Floor 2
A uj5u.com user replied:
With tidyr::separate you can pass a regular expression as the separator:
library(tidyr)
df <- data.frame(ADDRESS = c("123 Main Street Unit A", "456 Main Street Apt 3", "789 Main Street Floor 2"))
df %>%
  separate(ADDRESS, sep = "\\s(?=Unit|Apt|Floor)", into = c("address", "unit"))
#>           address    unit
#> 1 123 Main Street  Unit A
#> 2 456 Main Street   Apt 3
#> 3 789 Main Street Floor 2
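The separator above works because the lookahead (?=Unit|Apt|Floor) only checks that a keyword follows; the match itself consumes just the whitespace, so the keyword stays in the second piece. The same idea can be tried without tidyr. This is only a rough base R sketch, and the names addresses, parts, and split_df are made up for illustration. It assumes every address contains exactly one of the three keywords.

```r
# Example data from the question.
addresses <- c("123 Main Street Unit A",
               "456 Main Street Apt 3",
               "789 Main Street Floor 2")

# Split on a single whitespace character that is followed by one of the
# keywords. The lookahead (?=...) requires perl = TRUE in base R.
parts <- strsplit(addresses, "\\s(?=Unit|Apt|Floor)", perl = TRUE)

# Assumption: each address contains exactly one keyword, so every element
# of `parts` has two pieces and rbind() yields a two-column matrix.
split_df <- as.data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)
names(split_df) <- c("ADDRESS", "UNIT")

print(split_df)
```

If some addresses have no unit at all, the pieces will have different lengths and the rbind() step is no longer safe, so this sketch only fits data shaped like the example.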
A uj5u.com user replied:
This base R approach might also help:
df$UNIT <- trimws(regmatches(df$ADDRESS, regexpr("\\d+\\s+Main\\s+Street\\K(.*)", df$ADDRESS, perl = TRUE)))
                  ADDRESS    UNIT
1  123 Main Street Unit A  Unit A
2   456 Main Street Apt 3   Apt 3
3 789 Main Street Floor 2 Floor 2
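Note that patterns containing "Main Street" only work when every address is on that street. Since the question says the delimiters are known in advance, a variation that keys on "Unit", "Apt", and "Floor" instead may generalize better. The following is a minimal base R sketch, not a tested solution for real address data: unit_pattern, found, and unit are illustrative names, and the list of keywords is assumed to be complete.

```r
# Recreate the example data from the question.
df <- data.frame(ADDRESS = c("123 Main Street Unit A",
                             "456 Main Street Apt 3",
                             "789 Main Street Floor 2"),
                 stringsAsFactors = FALSE)

# Whitespace, then one of the known keywords, then the rest of the string.
# Assumption: "Unit", "Apt" and "Floor" are the only delimiters that occur.
unit_pattern <- "\\s(Unit|Apt|Floor)\\b.*$"

# regexpr() returns -1 where nothing matches; regmatches() returns only
# the matched pieces, so fill them into an NA-initialised vector.
m <- regexpr(unit_pattern, df$ADDRESS, perl = TRUE)
found <- as.integer(m) > 0
unit <- rep(NA_character_, nrow(df))
unit[found] <- trimws(regmatches(df$ADDRESS, m))

# Strip the unit part from the address and attach the new column.
df$ADDRESS <- sub(unit_pattern, "", df$ADDRESS, perl = TRUE)
df$UNIT <- unit

print(df)
```

Addresses without any of the keywords keep their full text in ADDRESS and get NA in UNIT, which is a design choice of this sketch rather than something the original answers specify.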
