Combining migration inflow and outflow data

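I have two datasets: one with migration inflow from other counties into county A, and one with migration outflow from county A to other counties. I want to combine the two into a single dataset.

Expected output: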

Key         County          State   FIPS    Inflow  Outflow FiscalYear  Year
510012012   Accomack County VA      51001   NA      27      2011 - 2012 2012
160012012   Ada County      ID      16001   12      18      2011 - 2012 2012
80012012    Adams County    CO      8001    22      39      2011 - 2012 2012
80012011    Adams County    CO      8001    42      31      2010 - 2011 2011
450032012   Aiken County    SC      45003   NA      21      2011 - 2012 2012
120012012   Alachua County  FL      12001   433     NA      2011 - 2012 2012

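How can I combine the two into one dataset without hard-coding every common county name, state name, FIPS code, and year? Missing values should be NA.

The value common to both datasets is Key.

My original outflow data has 517 observations and the inflow data has 441, so the two datasets do not cover the same set of counties.

Sample data: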

# People moving out of county A to other counties
        inflow_df =  structure(list(Origin_FIPS = c(12001L, 8001L, 16001L, 12001L, 
8001L, 16001L), Origin_StateName = c("FL", "CO", "ID", "FL", 
"CO", "ID"), Origin_Place = c("Alachua County", "Adams County", 
"Ada County", "Alachua County", "Adams County", "Ada County"), 
    InIndividuals = c(433L, 30L, 16L, 381L, 42L, 21L), FiscalYear = c("2011 - 2012", 
    "2011 - 2012", "2011 - 2012", "2010 - 2011", "2010 - 2011", 
    "2010 - 2011"), Year = c(2012L, 2012L, 2012L, 2011L, 2011L, 
    2011L), Key = c(120012012L, 80012012L, 160012012L, 120012011L, 
    80012011L, 160012011L)), class = "data.frame", row.names = c(NA, 
-6L))
        
# People moving in county A from other counties
  outflow_df =  structure(list(Dest_FIPS = c(51001L, 16001L, 8001L, 8001L, 45003L
    ), Dest_StateName = c("VA", "ID", "CO", "CO", "SC"), Dest_Place = c("Accomack County", 
    "Ada County", "Adams County", "Adams County", "Aiken County"), 
        OutIndividuals = c(27L, 16L, 39L, 31L, 21L), FiscalYear = c("2011 - 2012", 
        "2011 - 2012", "2011 - 2012", "2010 - 2011", "2011 - 2012"
        ), Year = c(2012L, 2012L, 2012L, 2011L, 2012L), Key = c(510012012L, 
        160012012L, 80012012L, 80012011L, 450032012L)), class = "data.frame", row.names = c(NA, 
    -5L))

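Answer from a uj5u.com user:

We can tidy the two tables by giving their columns consistent names (presumably Origin_Place in one should correspond to Dest_Place in the other) and then performing a join. full_join keeps every row found in either table and matches rows on all the columns the two tables share, in this case c("Key", "County", "State", "FIPS", "FiscalYear", "Year"). Where a row has no match in the other table, the missing columns are filled with NA.

I would have expected inflow_df to reflect the counties that see an inflow (i.e. destinations) and outflow_df to reflect the counties that have an outflow (i.e. origins), so it seems the table names may have been swapped in the question.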
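To make the full_join behaviour concrete before applying it to the real tables, here is a minimal sketch on two toy data frames. The names left, right, toy, and id are made up for illustration and do not come from the question.

```r
library(dplyr)

# Hypothetical toy tables: id 2 appears in both, ids 1 and 3 in only one
left  <- data.frame(id = c(1L, 2L), inflow  = c(10L, 20L))
right <- data.frame(id = c(2L, 3L), outflow = c(5L, 7L))

# full_join keeps all three ids; the side with no match is filled with NA
toy <- full_join(left, right, by = "id")
print(toy)
#   id inflow outflow
# 1  1     10      NA
# 2  2     20       5
# 3  3     NA       7
```

The same thing happens below: counties that appear in only one of the two tables still get a row, with NA in the column that comes from the other table.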

library(dplyr)

inflow2 <- 
  inflow_df %>%
  transmute(Key,
            County = Origin_Place,
            State  = Origin_StateName,
            FIPS   = Origin_FIPS,
            Inflow = InIndividuals,
            FiscalYear,
            Year)

outflow2 <- 
  outflow_df %>%
  transmute(Key,
            County  = Dest_Place,
            State   = Dest_StateName,
            FIPS    = Dest_FIPS,
            Outflow = OutIndividuals,
            FiscalYear,
            Year)

inflow2 %>%
  full_join(outflow2)

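Result (incidentally, the desired output does not seem to be consistent with the sample data given, but I hope this is what you are looking for):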

Joining, by = c("Key", "County", "State", "FIPS", "FiscalYear", "Year")
        Key          County State  FIPS Inflow  FiscalYear Year Outflow
1 120012012  Alachua County    FL 12001    433 2011 - 2012 2012      NA
2  80012012    Adams County    CO  8001     30 2011 - 2012 2012      39
3 160012012      Ada County    ID 16001     16 2011 - 2012 2012      16
4 120012011  Alachua County    FL 12001    381 2010 - 2011 2011      NA
5  80012011    Adams County    CO  8001     42 2010 - 2011 2011      31
6 160012011      Ada County    ID 16001     21 2010 - 2011 2011      NA
7 510012012 Accomack County    VA 51001     NA 2011 - 2012 2012      27
8 450032012    Aiken County    SC 45003     NA 2011 - 2012 2012      21
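If the "Joining, by = ..." message is unwanted, or the columns should appear in the order shown in the question, one possible variation is to pass the join columns explicitly and reorder afterwards. The sketch below is not part of the original answer. To keep it self-contained it rebuilds a small subset of the renamed tables by hand, and the names join_cols and combined are introduced only for illustration.

```r
library(dplyr)

# Stand-ins for inflow2 / outflow2: a hand-typed subset of the sample rows
inflow2 <- data.frame(
  Key = c(120012012L, 80012012L, 160012012L),
  County = c("Alachua County", "Adams County", "Ada County"),
  State = c("FL", "CO", "ID"),
  FIPS = c(12001L, 8001L, 16001L),
  Inflow = c(433L, 30L, 16L),
  FiscalYear = "2011 - 2012",
  Year = 2012L
)
outflow2 <- data.frame(
  Key = c(510012012L, 160012012L, 80012012L),
  County = c("Accomack County", "Ada County", "Adams County"),
  State = c("VA", "ID", "CO"),
  FIPS = c(51001L, 16001L, 8001L),
  Outflow = c(27L, 16L, 39L),
  FiscalYear = "2011 - 2012",
  Year = 2012L
)

# Columns shared by both tables; naming them explicitly silences the message
join_cols <- c("Key", "County", "State", "FIPS", "FiscalYear", "Year")

combined <- inflow2 %>%
  full_join(outflow2, by = join_cols) %>%
  # Put the columns in the order used in the question's expected output
  select(Key, County, State, FIPS, Inflow, Outflow, FiscalYear, Year) %>%
  arrange(County, desc(Year))

print(combined)
```

Joining on all six shared columns, rather than on Key alone, avoids duplicated County.x / County.y style columns, at the cost of assuming the county, state, and fiscal-year labels are spelled identically in both tables.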
