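I have a data frame whose columns contain strings. How can I extract only the uppercase substring that comes right before the numbers and put it in another column? DE is just an example; there are more country abbreviations, and they always appear immediately before the numbers.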
TD <- data.frame(a = c("WHATEVERDE 11111", "", "Whatever DE 11111", "DE 11111", ""),
                 b = c("", "What DE EverDE 1111", "", "", ""),
                 c = c("Whatever", "", "", "", "WhateverDE 11111"))
I want to create another column, `result`, like this:
> TD
                  a                   b                c result
1  WHATEVERDE 11111                             Whatever     DE
2                   What DE EverDE 1111                      DE
3 Whatever DE 11111                                          DE
4          DE 11111                                          DE
5                                       WhateverDE 11111     DE
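I tried this approach:
sub("^([[:alpha:]]*).*", "\\1", "DE 11111")
but it is not general: it only captures the letters at the very start of the string, so it works for "DE 11111" but not when other text precedes the code.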
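To see the problem concretely, here is the same `sub()` call applied to the non-empty sample strings from `TD` (base R only). Because the pattern is anchored at `^` and captures the leading run of letters, it returns the first word, not the code in front of the digits:

```r
# Non-empty sample strings copied from the question's TD data frame
x <- c("WHATEVERDE 11111", "What DE EverDE 1111",
       "Whatever DE 11111", "DE 11111", "WhateverDE 11111")

# The attempted pattern: keep the leading run of letters, drop the rest
attempt <- sub("^([[:alpha:]]*).*", "\\1", x)
print(attempt)
# → "WHATEVERDE" "What" "Whatever" "DE" "WhateverDE"
```

Only "DE 11111" gives the intended "DE"; every other string needs a pattern that looks for the code where it sits right before the digits.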
The vector of abbreviations:
names<-c('AT','BE','DE','BG','CZ','DK','FR','GR','ES','NL','HU','GB','IT')
Answer:
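We loop `across` the columns and extract the two-letter uppercase country code that is followed by zero or more spaces and one or more digits (the pattern is built from all ISO 3166-1 alpha-2 codes in `countrycode::codelist`). Then we `coalesce` the per-column results so that each row gets its first non-NA extracted element.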
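The key piece of the pattern is the lookahead `(?=...)`: it requires optional spaces and digits after the code without including them in the match, so a code that is not followed by digits (the first "DE" in "What DE EverDE 1111") is skipped. A minimal illustration, assuming stringr is installed and using a hypothetical four-code list in place of the full country list:

```r
library(stringr)

# Hypothetical short code list for illustration only
codes <- c("AT", "BE", "DE", "FR")

# "(AT|BE|DE|FR)" followed by a lookahead: optional spaces, then digits.
# The lookahead must succeed, but the spaces/digits are not part of the match.
pat_demo <- sprintf("(%s)(?=\\s*\\d+)", str_c(codes, collapse = "|"))

x <- c("What DE EverDE 1111", "WhateverDE 11111", "Whatever", "")
demo <- str_extract(x, pat_demo)
print(demo)
# → "DE" "DE" NA NA
```

Matching is case-sensitive, so lowercase letters such as the "at" in "What" never count as codes.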
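library(dplyr)
library(stringr)
library(countrycode)

# Pattern of the form "(AT|BE|DE|...)(?=\\s*\\d+)": any ISO2 code,
# followed (lookahead, not consumed) by optional spaces and digits
pat <- countrycode::codelist %>%
  pull(iso2c) %>%
  na.omit() %>%
  str_c(collapse = "|") %>%
  sprintf(fmt = "(%s)(?=\\s*\\d+)")

# Extract the code from each column, then keep the first non-NA per row.
# do.call() replaces purrr::invoke(), which is deprecated since purrr 1.0.0.
TD %>%
  mutate(result = do.call(coalesce,
                          across(everything(), ~ str_extract(.x, pat))))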
-output
                  a                   b                c result
1  WHATEVERDE 11111                             Whatever     DE
2                   What DE EverDE 1111                      DE
3 Whatever DE 11111                                          DE
4          DE 11111                                          DE
5                                       WhateverDE 11111     DE
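The answer above matches any ISO2 code known to countrycode. If you only want the codes from the question's own vector, and would rather avoid the dplyr/stringr/countrycode dependencies, a base-R sketch of the same idea (lookahead pattern applied per column, then the first non-NA value per row) could look like this. It renames the question's `names` vector to `country_codes` to avoid confusion with `base::names()`, and `extract_code()` is a hypothetical helper written for this example:

```r
# Sample data and code vector from the question
TD <- data.frame(a = c("WHATEVERDE 11111", "", "Whatever DE 11111", "DE 11111", ""),
                 b = c("", "What DE EverDE 1111", "", "", ""),
                 c = c("Whatever", "", "", "", "WhateverDE 11111"),
                 stringsAsFactors = FALSE)
country_codes <- c('AT','BE','DE','BG','CZ','DK','FR','GR','ES','NL','HU','GB','IT')

# Lookahead pattern restricted to these codes; perl = TRUE enables (?=...)
pat <- sprintf("(%s)(?=\\s*\\d+)", paste(country_codes, collapse = "|"))

# Hypothetical helper: first matching code in each element, NA if none
extract_code <- function(col) {
  m <- regexpr(pat, col, perl = TRUE)
  out <- rep(NA_character_, length(col))
  out[m != -1] <- regmatches(col, m)
  out
}

# Apply per column, then fold left-to-right keeping the first non-NA value
codes_by_col <- lapply(TD, extract_code)
TD$result <- Reduce(function(x, y) ifelse(is.na(x), y, x), codes_by_col)
TD$result
# → "DE" "DE" "DE" "DE" "DE"
```

The `Reduce()` fold plays the role of `coalesce()`: earlier columns win, and later columns only fill rows that are still NA.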
Please credit the source when reposting. Link to this article: https://www.uj5u.com/houduan/371339.html
