Hi, I have a table, for example:
House,Name1,Email1@xyz.com
Flat,Name2;Name3,Email2@xyz.com;Email3@xyz.com
Mobile Home,Name4,Email4@xyz.com
Camper-Van,Name5;Name6;Name7;Name8,Email5@xyz.com;Email6@xyz.com;Email7@xyz.com;Email8@xyz.com
What I actually need is:
House,Name1,Email1@xyz.com
Flat,Name2,Email2@xyz.com
Flat,Name3,Email3@xyz.com
Mobile Home,Name4,Email4@xyz.com
Camper-Van,Name5,Email5@xyz.com
Camper-Van,Name6,Email6@xyz.com
Camper-Van,Name7,Email7@xyz.com
Camper-Van,Name8,Email8@xyz.com
The problem is that the number of names and emails per housing type is unknown. I did manage to generate three lists:
Housing:
House
Flat
Campervan
Names:
Name1
Name2
Name3
Name4
Name5
Name6
Name7
Name8
Email:
Email1@xyz.com
Email2@xyz.com
...
Email8@xyz.com
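But I am stuck on how to repeat House, Flat and Campervan so that each category in column 1 appears once per name or email (the same number of names and emails is always extracted). That would make all three lists the same length. If I could do that, I could produce the output I actually need. Any help is appreciated.
Note: the names and email addresses are not related to each other; for example, Name1 might be hans while his email might be [email protected]. By numbering the names and emails I wanted to show that they are ordered and must not be paired at random.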
Answer from a uj5u.com user:
library(tidyverse)

example_text <- "House,Name1,Email@1
Flat,Name2;Name3,Email@2;Email@3
Mobile Home,Name4,Email@4
Camper-Van,Name5;Name6;Name7;Name8,Email@5;Email@6;Email@7;Email@8
"

example_text %>%
  read_lines() %>%
  map(~ {
    # the first words until a delimiter
    house <- .x %>% str_extract("^[^;,]+")
    elements <- .x %>% str_remove(house) %>% str_split("[,;]") %>% simplify() %>% discard(~ .x == "")
    # Everything with an @ symbol between two delimiters (, or ;)
    Emails <- elements %>% keep(~ .x %>% str_detect("@"))
    # Everything which is not one of the above
    Names <- elements %>% setdiff(Emails)
    tibble(
      House = house,
      Emails = Emails,
      Names = Names
    )
  }) %>%
  reduce(bind_rows)
#> # A tibble: 8 x 3
#>   House       Emails  Names
#>   <chr>       <chr>   <chr>
#> 1 House       Email@1 Name1
#> 2 Flat        Email@2 Name2
#> 3 Flat        Email@3 Name3
#> 4 Mobile Home Email@4 Name4
#> 5 Camper-Van  Email@5 Name5
#> 6 Camper-Van  Email@6 Name6
#> 7 Camper-Van  Email@7 Name7
#> 8 Camper-Van  Email@8 Name8
Created on 2021-11-24 by the reprex package (v2.0.1)
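If the table is already in a data frame rather than raw text, the same reshaping can probably be done more directly. The following is only a sketch, not the answerer's method: it assumes tidyr is installed and uses hypothetical column names (housing, name, email) that do not appear in the question. tidyr's separate_rows() splits several delimited columns in parallel, so the names and emails stay paired in their original order.

```r
library(tidyr)

# Hypothetical data frame mirroring the table in the question;
# the column names are assumptions made for this sketch.
df <- data.frame(
  housing = c("House", "Flat", "Mobile Home", "Camper-Van"),
  name    = c("Name1", "Name2;Name3", "Name4", "Name5;Name6;Name7;Name8"),
  email   = c("Email1@xyz.com",
              "Email2@xyz.com;Email3@xyz.com",
              "Email4@xyz.com",
              "Email5@xyz.com;Email6@xyz.com;Email7@xyz.com;Email8@xyz.com"),
  stringsAsFactors = FALSE
)

# Split both columns on ";" at the same time; each row is repeated
# once per name/email pair, keeping the pairs aligned.
long <- separate_rows(df, name, email, sep = ";")
print(long)
```

This relies on every row having the same number of names and emails, which the question states is always the case.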
Answer from a uj5u.com user:
With your data in a data.table (convert it using setDT()), use a data.table join and data.table's tstrsplit() function:
library(data.table)
# Data for the demo (please provide this yourself in future questions)
dt1 <-
  data.table(type = c("House", "Flat", "Mobile", "Camper-van"),
             name = c("Name1", "Name2;Name3", "Name4", "Name5;Name6;Name7;Name8"),
             mail = c("Email1", "Email2;Email3", "Email4", "Email5;Email6;Email7;Email8"))
# solution
dt1[, c("type" = list(type), tstrsplit(name, ";"))][, melt(.SD, id.vars="type")][!is.na(value), .(.I, type, "name" = value)][
  dt1[, c("type" = list(type), tstrsplit(mail, ";"))][, melt(.SD, id.vars="type")][!is.na(value), .(.I, "mail" = value)], on="I"][, -c("I")]
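The repetition step the question asks about (repeating each housing type once per name) can also be sketched in base R without any packages. This is a minimal illustration under assumptions: the data frame and its column names (housing, name, email) are hypothetical, and it relies on the question's statement that each row always holds the same number of names and emails.

```r
# Hypothetical data frame mirroring the table in the question;
# the column names are assumptions made for this sketch.
df <- data.frame(
  housing = c("House", "Flat", "Mobile Home", "Camper-Van"),
  name    = c("Name1", "Name2;Name3", "Name4", "Name5;Name6;Name7;Name8"),
  email   = c("Email1@xyz.com",
              "Email2@xyz.com;Email3@xyz.com",
              "Email4@xyz.com",
              "Email5@xyz.com;Email6@xyz.com;Email7@xyz.com;Email8@xyz.com"),
  stringsAsFactors = FALSE
)

# Split each cell on ";" into a list with one character vector per row
names_split  <- strsplit(df$name,  ";", fixed = TRUE)
emails_split <- strsplit(df$email, ";", fixed = TRUE)

# Guard: every row must have as many names as emails
stopifnot(identical(lengths(names_split), lengths(emails_split)))

# Repeat each housing type once per name, then flatten the lists;
# unlist() keeps the original order, so names and emails stay paired.
result <- data.frame(
  housing = rep(df$housing, times = lengths(names_split)),
  name    = unlist(names_split),
  email   = unlist(emails_split),
  stringsAsFactors = FALSE
)
print(result)
```

The key call is rep(x, times = lengths(...)), which makes the housing vector the same length as the flattened name and email vectors.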
