Combining migration inflow and outflow data through different common columns

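I have two datasets: one holds migration inflow from other counties into county A, and the other holds migration outflow from county A to other counties. I want to combine the two datasets into: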

County State Key Inflow Outflow Year

The common columns between the two datasets are Origin_Place, Origin_StateName and Year in the migration inflow data, and Dest_place, Dest_StateName and Year in the migration outflow data.

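Part of the problem is that the common columns do not have the same number of rows. Another problem is that different counties can belong to the same state, as the dummy data below shows. So my idea is to build a key by concatenating FIPS (unique to each county) and Year. That way, I can combine each county with its state and the values of the remaining relevant columns in a single row.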
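A minimal sketch of the key idea, assuming the FIPS code and the year are stored as plain numbers, as in the dummy data further down. The data frame `flows` and its values are hypothetical and exist only for illustration.

```r
# Hypothetical mini data frame; column names follow the question's dummy data
flows <- data.frame(
  FIPS_Dest_County = c(2, 3, 4),
  Year = c(2019, 2019, 2020)
)

# Build the key by pasting FIPS and Year together without a separator
flows$Key <- paste0(flows$FIPS_Dest_County, flows$Year)

print(flows$Key)
```

If the real FIPS codes carry leading zeros, storing them as character strings before pasting would preserve them. That is an assumption about the real data, not something shown in the question.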

How can I combine the two into one dataset without hard-coding every common county and state name, FIPS code, and year? Missing values should be NA.

My original outflow data has 517 observations and the inflow data has 441, so the number of counties differs between the two datasets.
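Unequal row counts are not an obstacle for a full outer join, which keeps every key found on either side and fills the gaps with NA. A minimal base-R sketch, assuming each dataset has already been reduced to one row per key; `inflow`, `outflow`, and their values are hypothetical:

```r
# Hypothetical per-key summaries of the two datasets
inflow  <- data.frame(Key = c("42019", "52019"), Inflow  = c(111, 8888))
outflow <- data.frame(Key = c("22019", "42019"), Outflow = c(223, 2333))

# all = TRUE requests a full outer join: keys missing on one side get NA
combined <- merge(inflow, outflow, by = "Key", all = TRUE)

print(combined)
```

Only key "42019" appears in both inputs, so it is the only row with both an Inflow and an Outflow value; the other two rows carry NA on one side.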

Expected output:

County  State   Key     Inflow      Outflow Year
A       FL      12019   111         223     2019
A       FL      12019   8888        224     2019
A       FL      12019   NA          2333    2019
A       FL      12019   NA          4444    2019
A       FL      12019   NA          5555    2019
A       FL      12019   NA          6666    2019
A       FL      12019   NA          7777    2019
A       FL      12020   9999        NA      2020
B       BB      22019   223         NA      2019
C       CC      32019   224         NA      2019
D       FL      42019   2333        111     2019
E       FL      52019   4444        8888    2019
F       FL      62019   5555        9999    2020
G       GG      72019   6666        NA      2019
H       HH      82019   7777        NA      2019

Dummy data:

# People moving out of county A to other counties
Origin_Place = c("A", "A", "A", "A", "A", "A", "A")

FIPS_Origin_County = c(1, 1, 1, 1, 1, 1, 1)

Origin_StateName = c("FL", "FL", "FL", "FL", "FL", "FL", "FL")

Individuals = c(223, 224, 2333, 4444, 5555, 6666, 7777)

Dest_place = c("B", "C", "D", "E", "F", "G", "H")

FIPS_Dest_County = c(2, 3, 4, 5, 6, 7, 8)

Dest_StateName = c("BB", "CC", "FL", "FL", "FL", "GG", "HH")

Year = c(2019, 2019, 2019, 2019, 2020, 2020, 2020)

Outflow_df = data.frame(Origin_Place, FIPS_Origin_County,  Origin_StateName,  Individuals, Dest_place, FIPS_Dest_County, Dest_StateName, Year)

# People moving into county A from other counties
Origin_Place = c("D", "E", "F")

FIPS_Origin_County = c(4, 5, 6)  # D, E, F: same codes as FIPS_Dest_County in the outflow data

Origin_StateName = c("FL", "FL", "FL")    

Individuals = c(111, 8888, 9999)

Dest_place = c("A", "A", "A")

FIPS_Dest_County = c(1, 1, 1)

Dest_StateName = c("FL", "FL", "FL")

Year = c(2019, 2019, 2020)

Inflow_df = data.frame(Origin_Place, FIPS_Origin_County,  Origin_StateName,  Individuals, Dest_place, FIPS_Dest_County,  Dest_StateName, Year)

A uj5u.com user replied:

Maybe this helps:

library(dplyr)
library(tidyr)
library(stringr)
library(data.table)
bind_rows(lst(Inflow_df, Outflow_df), .id = 'datname') %>% 
  pivot_longer(cols = contains("_"), names_to = ".value",
   names_pattern = ".*_([^_]+$)") %>% 
  mutate(Key = str_c(County, Year), rn = rowid(Key, datname))  %>% 
  pivot_wider(names_from = datname, values_from = Individuals) %>% 
  arrange(rn) %>% 
  select(-rn)
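To see how `pivot_longer()` would split the column names, the sketch below applies a pattern of the form `.*_([^_]+$)` to the dummy-data column names with base R's `sub()`. It assumes that this is the intended `names_pattern`, and it illustrates only the regex, not the full pipeline.

```r
# Column names containing an underscore, taken from the dummy data
cols <- c("Origin_Place", "FIPS_Origin_County", "Origin_StateName",
          "Dest_place", "FIPS_Dest_County", "Dest_StateName")

# Keep only the text after the last underscore
suffix <- sub(".*_([^_]+$)", "\\1", cols)

print(suffix)
```

Note that `Place` and `place` differ in case, so they would end up as separate columns. Harmonizing the column names before reshaping may be worth considering.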

