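I have two datasets: one holds the migration inflow into county A from other counties, and the other holds the migration outflow from county A to other counties. I want to combine the two datasets into the following layout: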
County State Key Inflow Outflow Year
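The columns the two datasets have in common are Origin_Place, Origin_StateName and Year in the migration inflow data, and Dest_place, Dest_StateName and Year in the migration outflow data.
Part of the problem is that the common columns do not have the same number of rows. The other problem is that different counties can belong to the same state, as the dummy data below shows. So my idea is to create a unique key by concatenating FIPS (unique to each county) and Year. That way I can combine each county with its state and the values of the other relevant columns in a single row.
How can I combine the two into one dataset without hard-coding every common county and state name, FIPS and year? Missing values should be NA.
My original migration outflow data has 517 observations and the migration inflow data has 441, so the number of counties differs between the two datasets.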
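A minimal sketch of such a key, assuming numeric FIPS codes like those in the dummy data (the vectors `fips` and `year` are hypothetical names used only for illustration):

```r
# Hypothetical FIPS codes and years, mirroring the style of the dummy data
fips <- c(1, 2, 4)
year <- c(2019, 2019, 2020)

# Plain concatenation, matching the Key column of the expected output
key <- paste0(fips, year)

# Variant with a separator, which keeps the two parts easy to split again
key_sep <- paste(fips, year, sep = "_")

print(key)
print(key_sep)
```

The separator variant is only a suggestion; it may help if the FIPS codes do not all have the same number of digits.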
Expected output:
County State Key Inflow Outflow Year
A FL 12019 111 223 2019
A FL 12019 8888 224 2019
A FL 12019 NA 2333 2019
A FL 12019 NA 4444 2019
A FL 12019 NA 5555 2019
A FL 12019 NA 6666 2019
A FL 12019 NA 7777 2019
A FL 12020 9999 NA 2020
B BB 22019 223 NA 2019
C CC 32019 224 NA 2019
D FL 42019 2333 111 2019
E FL 52019 4444 8888 2019
F FL 62019 5555 9999 2020
G GG 72019 6666 NA 2019
H HH 82019 7777 NA 2019
Dummy data:
# People moving out of county A to other counties
Origin_Place = c("A", "A", "A", "A", "A", "A", "A")
FIPS_Origin_County = c(1, 1, 1, 1, 1, 1, 1)
Origin_StateName = c("FL", "FL", "FL", "FL", "FL", "FL", "FL")
Individuals = c(223, 224, 2333, 4444, 5555, 6666, 7777)
Dest_place = c("B", "C", "D", "E", "F", "G", "H")
FIPS_Dest_County = c(2, 3, 4, 5, 6, 7, 8)
Dest_StateName = c("BB", "CC", "FL", "FL", "FL", "GG", "HH")
Year = c(2019, 2019, 2019, 2019, 2020, 2020, 2020)
Outflow_df = data.frame(Origin_Place, FIPS_Origin_County, Origin_StateName, Individuals, Dest_place, FIPS_Dest_County, Dest_StateName, Year)
# People moving into county A from other counties
Origin_Place = c("D", "E", "F")
FIPS_Origin_County = c(4, 5, 6)
Origin_StateName = c("FL", "FL", "FL")
Individuals = c(111, 8888, 9999)
Dest_place = c("A", "A", "A")
FIPS_Dest_County = c(1, 1, 1)
Dest_StateName = c("FL", "FL", "FL")
Year = c(2019, 2019, 2020)
Inflow_df = data.frame(Origin_Place, FIPS_Origin_County, Origin_StateName, Individuals, Dest_place, FIPS_Dest_County, Dest_StateName, Year)
Answer from a uj5u.com user:
Maybe this helps:
library(dplyr)
library(tidyr)
library(stringr)
library(data.table)
# Stack both data frames, recording the source frame name in `datname`
bind_rows(lst(Inflow_df, Outflow_df), .id = 'datname') %>%
  pivot_longer(cols = contains("_"), names_to = ".value",
               names_pattern = ".*_([^_]+$)") %>%
  # rowid() (from data.table) numbers repeated Key/datname pairs
  mutate(Key = str_c(County, Year), rn = rowid(Key, datname)) %>%
  pivot_wider(names_from = datname, values_from = Individuals) %>%
  arrange(rn) %>%
  select(-rn)
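As an alternative, here is a minimal sketch based on `dplyr::full_join()`, not the approach used above. It makes several assumptions: the two small data frames are stand-ins with FIPS codes that agree across both frames, the helper names `out_long`, `in_long` and `combined` are hypothetical, and Inflow/Outflow are labelled from county A's point of view (Outflow = people leaving A for that county). Swap the two labels if the opposite convention is wanted.

```r
library(dplyr)

# Stand-in data (hypothetical values; FIPS codes agree across both frames)
Outflow_df <- data.frame(
  Dest_place       = c("B", "D", "E"),
  FIPS_Dest_County = c(2, 4, 5),
  Dest_StateName   = c("BB", "FL", "FL"),
  Individuals      = c(223, 2333, 4444),
  Year             = c(2019, 2019, 2019)
)
Inflow_df <- data.frame(
  Origin_Place       = c("D", "E", "F"),
  FIPS_Origin_County = c(4, 5, 6),
  Origin_StateName   = c("FL", "FL", "FL"),
  Individuals        = c(111, 8888, 9999),
  Year               = c(2019, 2019, 2020)
)

# Rename so that both frames describe the *other* county with the same columns
out_long <- Outflow_df %>%
  select(County = Dest_place, State = Dest_StateName,
         FIPS = FIPS_Dest_County, Year, Outflow = Individuals)
in_long <- Inflow_df %>%
  select(County = Origin_Place, State = Origin_StateName,
         FIPS = FIPS_Origin_County, Year, Inflow = Individuals)

# full_join keeps counties that appear in only one frame and fills the gap with NA
combined <- full_join(in_long, out_long,
                      by = c("County", "State", "FIPS", "Year")) %>%
  mutate(Key = paste0(FIPS, Year)) %>%
  select(County, State, Key, Inflow, Outflow, Year) %>%
  arrange(County, Year)

print(combined)
```

Because the join is keyed on the county columns and Year rather than on hard-coded names, the same code should carry over to frames with different numbers of rows, provided each county appears at most once per year in each frame.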
