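I have two datasets: one holds migration inflows from other counties into County A, and the other holds migration outflows from County A to other counties. I would like to combine the two datasets into the following.
Desired output: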
      Key          County State  FIPS Inflow Outflow  FiscalYear Year
510012012 Accomack County    VA 51001     NA      27 2011 - 2012 2012
160012012      Ada County    ID 16001     12      18 2011 - 2012 2012
 80012012    Adams County    CO  8001     22      39 2011 - 2012 2012
 80012011    Adams County    CO  8001     42      31 2010 - 2011 2011
450032012    Aiken County    SC 45003     NA      21 2011 - 2012 2012
120012012  Alachua County    FL 12001    433      NA 2011 - 2012 2012
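How can I combine the two into a single dataset without hard-coding every shared county name, state name, FIPS code, and year? Missing values should be NA.
The common value between the two datasets is Key.
My original outflow data has 517 observations and the inflow data has 441, so the two datasets do not contain the same number of counties.
Sample data: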
# People moving out of county A to other counties
inflow_df = structure(list(Origin_FIPS = c(12001L, 8001L, 16001L, 12001L,
8001L, 16001L), Origin_StateName = c("FL", "CO", "ID", "FL",
"CO", "ID"), Origin_Place = c("Alachua County", "Adams County",
"Ada County", "Alachua County", "Adams County", "Ada County"),
InIndividuals = c(433L, 30L, 16L, 381L, 42L, 21L), FiscalYear = c("2011 - 2012",
"2011 - 2012", "2011 - 2012", "2010 - 2011", "2010 - 2011",
"2010 - 2011"), Year = c(2012L, 2012L, 2012L, 2011L, 2011L,
2011L), Key = c(120012012L, 80012012L, 160012012L, 120012011L,
80012011L, 160012011L)), class = "data.frame", row.names = c(NA,
-6L))
# People moving in county A from other counties
outflow_df = structure(list(Dest_FIPS = c(51001L, 16001L, 8001L, 8001L, 45003L
), Dest_StateName = c("VA", "ID", "CO", "CO", "SC"), Dest_Place = c("Accomack County",
"Ada County", "Adams County", "Adams County", "Aiken County"),
OutIndividuals = c(27L, 16L, 39L, 31L, 21L), FiscalYear = c("2011 - 2012",
"2011 - 2012", "2011 - 2012", "2010 - 2011", "2011 - 2012"
), Year = c(2012L, 2012L, 2012L, 2011L, 2012L), Key = c(510012012L,
160012012L, 80012012L, 80012011L, 450032012L)), class = "data.frame", row.names = c(NA,
-5L))
Answer from a uj5u.com user:
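We can line the two tables up by giving their columns consistent names (presumably Origin_Place in one table should match Dest_Place in the other) and then performing a join. full_join keeps every key combination found in either table; here the join columns are c("Key", "County", "State", "FIPS", "FiscalYear", "Year"), and a row that exists in only one table gets NA in the other table's column.
I would have expected inflow_df to describe the counties that see an inflow (that is, destinations) and outflow_df to describe the counties that see an outflow (that is, origins), so the table names may have been swapped in the question.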
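library(dplyr)

# Give the inflow table the shared column names
inflow2 <-
  inflow_df %>%
  transmute(Key,
            County = Origin_Place,
            State = Origin_StateName,
            FIPS = Origin_FIPS,
            Inflow = InIndividuals,
            FiscalYear,
            Year)

# Give the outflow table the same shared column names
outflow2 <-
  outflow_df %>%
  transmute(Key,
            County = Dest_Place,
            State = Dest_StateName,
            FIPS = Dest_FIPS,
            Outflow = OutIndividuals,
            FiscalYear,
            Year)

# Join on all commonly named columns; unmatched rows are kept with NA
inflow2 %>%
  full_join(outflow2)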
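Result (as an aside, the desired output does not seem consistent with the sample data provided, but I hope this is what you are looking for):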
Joining, by = c("Key", "County", "State", "FIPS", "FiscalYear", "Year")
        Key          County State  FIPS Inflow  FiscalYear Year Outflow
1 120012012  Alachua County    FL 12001    433 2011 - 2012 2012      NA
2  80012012    Adams County    CO  8001     30 2011 - 2012 2012      39
3 160012012      Ada County    ID 16001     16 2011 - 2012 2012      16
4 120012011  Alachua County    FL 12001    381 2010 - 2011 2011      NA
5  80012011    Adams County    CO  8001     42 2010 - 2011 2011      31
6 160012011      Ada County    ID 16001     21 2010 - 2011 2011      NA
7 510012012 Accomack County    VA 51001     NA 2011 - 2012 2012      27
8 450032012    Aiken County    SC 45003     NA 2011 - 2012 2012      21
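If the column order and row order of the desired output matter, the same join can be followed by select() and arrange(). The sketch below is one possible variant, not the answer's exact code: it re-creates the question's sample data in compact form so that it runs on its own, uses rename() instead of transmute(), and passes the join columns explicitly. The names shared_cols and combined are illustrative, and sorting by county and then by descending year is an assumption read off the desired output.

```r
library(dplyr)

# Sample data re-created from the question (same values, compact form)
inflow_df <- data.frame(
  Origin_FIPS      = c(12001L, 8001L, 16001L, 12001L, 8001L, 16001L),
  Origin_StateName = c("FL", "CO", "ID", "FL", "CO", "ID"),
  Origin_Place     = c("Alachua County", "Adams County", "Ada County",
                       "Alachua County", "Adams County", "Ada County"),
  InIndividuals    = c(433L, 30L, 16L, 381L, 42L, 21L),
  FiscalYear       = rep(c("2011 - 2012", "2010 - 2011"), each = 3),
  Year             = rep(c(2012L, 2011L), each = 3),
  Key              = c(120012012L, 80012012L, 160012012L,
                       120012011L, 80012011L, 160012011L)
)
outflow_df <- data.frame(
  Dest_FIPS      = c(51001L, 16001L, 8001L, 8001L, 45003L),
  Dest_StateName = c("VA", "ID", "CO", "CO", "SC"),
  Dest_Place     = c("Accomack County", "Ada County", "Adams County",
                     "Adams County", "Aiken County"),
  OutIndividuals = c(27L, 16L, 39L, 31L, 21L),
  FiscalYear     = c("2011 - 2012", "2011 - 2012", "2011 - 2012",
                     "2010 - 2011", "2011 - 2012"),
  Year           = c(2012L, 2012L, 2012L, 2011L, 2012L),
  Key            = c(510012012L, 160012012L, 80012012L, 80012011L, 450032012L)
)

# Columns that identify a county-year in both tables (illustrative name)
shared_cols <- c("Key", "County", "State", "FIPS", "FiscalYear", "Year")

# Rename only the columns that differ; the rest are already shared
inflow2 <- inflow_df %>%
  rename(County = Origin_Place, State = Origin_StateName,
         FIPS = Origin_FIPS, Inflow = InIndividuals)
outflow2 <- outflow_df %>%
  rename(County = Dest_Place, State = Dest_StateName,
         FIPS = Dest_FIPS, Outflow = OutIndividuals)

# Full join keeps rows found in either table, filling the gaps with NA;
# select() fixes the column order, arrange() sorts by county, newest year first
combined <- full_join(inflow2, outflow2, by = shared_cols) %>%
  select(Key, County, State, FIPS, Inflow, Outflow, FiscalYear, Year) %>%
  arrange(County, desc(Year))

combined
```

Passing by explicitly suppresses the "Joining, by" message and guards against an accidental join on a column that merely happens to share a name in both tables.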
