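I have two very large data frames. The first has a list of county names with their associated FIPS codes. The second has only FIPS codes.
I want to add two columns to the second data frame holding the county names associated with those codes.
Suppose this is df1: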
df1 = data.frame(countyname = c("Archuleta County, CO","Baca County, CO","Cheyenne County, CO","Kiowa County, CO","Cimarron County, OK","Rio Arriba County, NM","Conejos County, CO"),
fipscounty = c(8007,8009,8017,8061,35039,40025,8021))
             countyname fipscounty
1  Archuleta County, CO       8007
2       Baca County, CO       8009
3   Cheyenne County, CO       8017
4      Kiowa County, CO       8061
5   Cimarron County, OK      35039
6 Rio Arriba County, NM      40025
7    Conejos County, CO       8021
Edit: this is df2:
df2 = data.frame(county1=c(8007,8007,8009,8017),
distance=c(4,3,2,1),
county2=c(35039,8021,40025,8061))
  county1 distance county2
1    8007        4   35039
2    8007        3    8021
3    8009        2   40025
4    8017        1    8061
Edit: I would like the final result to look like this:
            countyname fipscounty distance        countyneighbor fipscounty2
1 Archuleta County, CO       8007        4   Cimarron County, OK       35039
2 Archuleta County, CO       8007        3    Conejos County, CO        8021
3      Baca County, CO       8009        2 Rio Arriba County, NM       40025
4  Cheyenne County, CO       8017        1      Kiowa County, CO        8061
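I want to use the FIPS codes in df1 and df2 to carry the county names over from df1 to df2. Since the key columns do not share a name, I may have to refer to them by index. However, I do not want to bring over whole rows, or I will end up with duplicated FIPS columns.
I tried this, but of course it throws an error: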
df2 <- left_join(df1,df2, by= df1[2])
How can I do this?
Reply from a uj5u.com user:
Use match to look up the row in df1 that holds each code.
m <- match(df2$county1, df1$fipscounty)
res <- cbind(df1[m, ], df1[match(df2$county2, df1$fipscounty), ])
names(res)[c(2, 4)] <- c("county1", "county2")  # spelled out so this also works when df2 has no distance column
res
#               countyname county1            countyname county2
# 1   Archuleta County, CO    8007   Cimarron County, OK   35039
# 1.1 Archuleta County, CO    8007    Conejos County, CO    8021
# 2        Baca County, CO    8009 Rio Arriba County, NM   40025
# 3    Cheyenne County, CO    8017      Kiowa County, CO    8061
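A minimal sketch of the same match idea applied one column at a time, so that only the names are copied into df2 and no FIPS column is duplicated. The sample data is copied from the question; the name out is introduced here purely for illustration.

```r
# Sample data copied from the question
df1 <- data.frame(
  countyname = c("Archuleta County, CO", "Baca County, CO", "Cheyenne County, CO",
                 "Kiowa County, CO", "Cimarron County, OK", "Rio Arriba County, NM",
                 "Conejos County, CO"),
  fipscounty = c(8007, 8009, 8017, 8061, 35039, 40025, 8021)
)
df2 <- data.frame(
  county1  = c(8007, 8007, 8009, 8017),
  distance = c(4, 3, 2, 1),
  county2  = c(35039, 8021, 40025, 8061)
)

# match() gives, for each code in df2, the row position of that code in df1
# (NA if the code is absent), so indexing countyname by it yields the names.
out <- df2  # 'out' is an illustrative name, not from the original post
out$countyname     <- df1$countyname[match(df2$county1, df1$fipscounty)]
out$countyneighbor <- df1$countyname[match(df2$county2, df1$fipscounty)]

# Reorder the columns to the layout requested in the question
out <- out[c("countyname", "county1", "distance", "countyneighbor", "county2")]
out
```

Codes in df2 that do not occur in df1 would come back as NA names rather than dropping the row, which may or may not be what you want on the full data.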
Edit
Following your edit, you can use merge and append as well.
m1 <- merge(df1a, df2a, by.x='fipscounty', by.y='county1')[c(2, 1, 3:4)]
append(m1,
list(countyneighbor=df1a[match(m1$county2, df1a$fipscounty),
'countyname']), 3) |>
as.data.frame()
#             countyname fipscounty distance        countyneighbor county2
# 1 Archuleta County, CO       8007        4   Cimarron County, OK   35039
# 2 Archuleta County, CO       8007        3    Conejos County, CO    8021
# 3      Baca County, CO       8009        2 Rio Arriba County, NM   40025
# 4  Cheyenne County, CO       8017        1      Kiowa County, CO    8061
Note: this requires R >= 4.1, because it uses the native pipe |>.
Data:
df1 <- structure(list(countyname = c("Archuleta County, CO", "Baca County, CO",
"Cheyenne County, CO", "Kiowa County, CO", "Cimarron County, OK",
"Rio Arriba County, NM", "Conejos County, CO"), fipscounty = c(8007,
8009, 8017, 8061, 35039, 40025, 8021)), class = "data.frame", row.names = c(NA,
-7L))
df2 <- structure(list(county1 = c(8007, 8007, 8009, 8017), county2 = c(35039,
8021, 40025, 8061)), class = "data.frame", row.names = c(NA,
-4L))
df1a <- structure(list(countyname = c("Archuleta County, CO", "Baca County, CO",
"Cheyenne County, CO", "Kiowa County, CO", "Cimarron County, OK",
"Rio Arriba County, NM", "Conejos County, CO"), fipscounty = c(8007,
8009, 8017, 8061, 35039, 40025, 8021)), class = "data.frame", row.names = c(NA,
-7L))
df2a <- structure(list(county1 = c(8007, 8007, 8009, 8017), distance = c(4,
3, 2, 1), county2 = c(35039, 8021, 40025, 8061)), class = "data.frame", row.names = c(NA,
-4L))
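Since the question started from left_join, here is a hedged sketch of that route, assuming the dplyr package is installed. The by argument expects column names rather than a data frame, which is likely why by = df1[2] fails; a named vector such as c("county1" = "fipscounty") pairs a key in the left table with a differently named key in the right one. The name out is illustrative only.

```r
library(dplyr)  # assumption: dplyr is installed

# Sample data copied from the question
df1 <- data.frame(
  countyname = c("Archuleta County, CO", "Baca County, CO", "Cheyenne County, CO",
                 "Kiowa County, CO", "Cimarron County, OK", "Rio Arriba County, NM",
                 "Conejos County, CO"),
  fipscounty = c(8007, 8009, 8017, 8061, 35039, 40025, 8021)
)
df2 <- data.frame(
  county1  = c(8007, 8007, 8009, 8017),
  distance = c(4, 3, 2, 1),
  county2  = c(35039, 8021, 40025, 8061)
)

out <- df2 %>%
  # First lookup: name of the county in county1
  left_join(df1, by = c("county1" = "fipscounty")) %>%
  # Second lookup: rename the name column first so the two do not clash
  left_join(rename(df1, countyneighbor = countyname),
            by = c("county2" = "fipscounty")) %>%
  # Reorder and rename to the layout requested in the question
  select(countyname, fipscounty = county1, distance,
         countyneighbor, fipscounty2 = county2)
out
```

Joining df2 on the left keeps its rows and their order, and the key column from df1 is dropped by default, so no duplicate FIPS column appears.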
