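Rows 1 and 4 carry the same information. The only difference is that the columns the values appear in are swapped.
I already know from row 1 that Yuma County and Cheyenne County are neighbors. I don't need row 4 to repeat this.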
countyname fipscounty neighborname fipsneighbor
1 Yuma County, CO 8125 Cheyenne County, KS 20023
2 Yuma County, CO 8125 Chase County, NE 31029
3 Cheyenne County, KS 20023 Kit Carson County, CO 8063
4 Cheyenne County, KS 20023 Yuma County, CO 8125
5 Cheyenne County, KS 20023 Dundy County, NE 31057
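I don't mind counties appearing more than once; I only care that the overall information in each row differs from the rows before it. I want to keep row 1 and remove row 4, so that the result looks like this: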
countyname fipscounty neighborname fipsneighbor
1 Yuma County, CO 8125 Cheyenne County, KS 20023
2 Yuma County, CO 8125 Chase County, NE 31029
3 Cheyenne County, KS 20023 Kit Carson County, CO 8063
5 Cheyenne County, KS 20023 Dundy County, NE 31057
How can I remove rows that carry duplicate information from the dataset?
uj5u.com user reply:
You can also do this:
idx <- duplicated(t(apply(CountyList[c('fipscounty', 'fipsneighbor')], 1, sort)))
CountyList[!idx, ]
countyname fipscounty neighborname fipsneighbor
1 Yuma County, CO 8125 Cheyenne County, KS 20023
2 Yuma County, CO 8125 Chase County, NE 31029
3 Cheyenne County, KS 20023 Kit Carson County, CO 8063
5 Cheyenne County, KS 20023 Dundy County, NE 31057
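To make the one-liner above easier to follow, here is a sketch that splits it into separate steps. The data frame is rebuilt from the sample in the question; the intermediate names codes, sorted_codes and result are only illustrative.

```r
# Sample data copied from the question
CountyList <- data.frame(
  countyname   = c("Yuma County, CO", "Yuma County, CO", "Cheyenne County, KS",
                   "Cheyenne County, KS", "Cheyenne County, KS"),
  fipscounty   = c(8125L, 8125L, 20023L, 20023L, 20023L),
  neighborname = c("Cheyenne County, KS", "Chase County, NE", "Kit Carson County, CO",
                   "Yuma County, CO", "Dundy County, NE"),
  fipsneighbor = c(20023L, 31029L, 8063L, 8125L, 31057L)
)

# Step 1: sort the two FIPS codes within each row, so (a, b) and (b, a) look the same.
# apply() returns one column per input row, hence the transpose with t().
codes <- as.matrix(CountyList[c("fipscounty", "fipsneighbor")])
sorted_codes <- t(apply(codes, 1, sort))

# Step 2: flag every row whose sorted pair already appeared further up
idx <- duplicated(sorted_codes)

# Step 3: keep only the first occurrence of each pair
result <- CountyList[!idx, ]
result
```

Because only the two code columns are compared, this relies on each FIPS code identifying exactly one county.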
uj5u.com user reply:
Here is another possible base R option:
df[!duplicated(t(apply(df, 1, sort))),]
Output
countyname fipscounty neighborname fipsneighbor
1 Yuma County, CO 8125 Cheyenne County, KS 20023
2 Yuma County, CO 8125 Chase County, NE 31029
3 Cheyenne County, KS 20023 Kit Carson County, CO 8063
5 Cheyenne County, KS 20023 Dundy County, NE 31057
Data
df <- structure(list(countyname = c("Yuma County, CO", "Yuma County, CO",
"Cheyenne County, KS", "Cheyenne County, KS", "Cheyenne County, KS"
), fipscounty = c(8125L, 8125L, 20023L, 20023L, 20023L), neighborname = c("Cheyenne County, KS",
"Chase County, NE", "Kit Carson County, CO", "Yuma County, CO",
"Dundy County, NE"), fipsneighbor = c(20023L, 31029L, 8063L,
8125L, 31057L)), class = "data.frame", row.names = c(NA, -5L))
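One caveat that I believe applies to sorting whole rows: apply() first converts the data frame to a character matrix, and numeric columns may be padded to a common width column by column, so the same code could in principle end up formatted differently in the two code columns. The sketch below sidesteps this by converting every value with as.character before sorting. row_key is a hypothetical helper name, and the "|" separator assumes that character never occurs in the data.

```r
# Sample data copied from the answer above
df <- data.frame(
  countyname   = c("Yuma County, CO", "Yuma County, CO", "Cheyenne County, KS",
                   "Cheyenne County, KS", "Cheyenne County, KS"),
  fipscounty   = c(8125L, 8125L, 20023L, 20023L, 20023L),
  neighborname = c("Cheyenne County, KS", "Chase County, NE", "Kit Carson County, CO",
                   "Yuma County, CO", "Dundy County, NE"),
  fipsneighbor = c(20023L, 31029L, 8063L, 8125L, 31057L)
)

# Hypothetical helper: build an order-independent text key for row i.
# Each value is converted with as.character(), so no padding is introduced.
row_key <- function(i) {
  values <- unlist(lapply(df[i, ], as.character))
  paste(sort(values), collapse = "|")
}

keys <- vapply(seq_len(nrow(df)), row_key, character(1))

# Rows 1 and 4 produce the same key, so row 4 is dropped
result <- df[!duplicated(keys), ]
result
```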
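uj5u.com user reply:
After finding the "smaller" (i.e. alphabetically first) and the "larger" name in each row, we can use interaction to generate a unique factor for each pair. We can then filter the data.frame based on it: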
CountyList <- read.table(text="countyname fipscounty neighborname fipsneighbor
1 'Yuma County, CO' 8125 'Cheyenne County, KS' 20023
2 'Yuma County, CO' 8125 'Chase County, NE' 31029
3 'Cheyenne County, KS' 20023 'Kit Carson County, CO' 8063
4 'Cheyenne County, KS' 20023 'Yuma County, CO' 8125
5 'Cheyenne County, KS' 20023 'Dundy County, NE' 31057")
fname <- pmin(CountyList$countyname, CountyList$neighborname) # alphabetically first name in each row
lname <- pmax(CountyList$countyname, CountyList$neighborname) # alphabetically last name in each row
duplicate.key <- as.numeric(interaction(fname, lname)) # one factor level per name pair, converted to a numeric key
CountyList[match(unique(duplicate.key), duplicate.key), ] # keep only the first occurrence of each key
countyname fipscounty neighborname fipsneighbor
1 Yuma County, CO 8125 Cheyenne County, KS 20023
2 Yuma County, CO 8125 Chase County, NE 31029
3 Cheyenne County, KS 20023 Kit Carson County, CO 8063
5 Cheyenne County, KS 20023 Dundy County, NE 31057
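The same smaller/larger idea can also be applied to the numeric FIPS codes instead of the names. This is only a sketch under the assumption that each code identifies exactly one county; lo, hi and pair_key are illustrative names that do not appear in the answer above.

```r
# Sample data copied from the question
CountyList <- data.frame(
  countyname   = c("Yuma County, CO", "Yuma County, CO", "Cheyenne County, KS",
                   "Cheyenne County, KS", "Cheyenne County, KS"),
  fipscounty   = c(8125L, 8125L, 20023L, 20023L, 20023L),
  neighborname = c("Cheyenne County, KS", "Chase County, NE", "Kit Carson County, CO",
                   "Yuma County, CO", "Dundy County, NE"),
  fipsneighbor = c(20023L, 31029L, 8063L, 8125L, 31057L)
)

# Smaller and larger code of each row, regardless of which column they came from
lo <- pmin(CountyList$fipscounty, CountyList$fipsneighbor)
hi <- pmax(CountyList$fipscounty, CountyList$fipsneighbor)

# One text key per unordered pair, e.g. "8125_20023" for rows 1 and 4
pair_key <- paste(lo, hi, sep = "_")

# Keep the first row for each pair
result <- CountyList[!duplicated(pair_key), ]
result
```

Pasting the two codes into a key avoids building a factor over every possible combination of names, which may matter on a large table.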
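uj5u.com user reply:
Here is a tidyverse approach.
First, unite pastes all columns together into new_col. Then new_col is split back into its separate parts, which are sorted, and the result is saved to new_col2. Next we keep only the distinct rows of new_col2. Finally, the newly created columns are removed.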
library(tidyverse)
df %>%
  unite("new_col", everything(), sep = "_", remove = FALSE) %>%
  rowwise() %>%
  mutate(new_col2 = paste(sort(str_split(new_col, "_", simplify = TRUE)), collapse = "")) %>%
  ungroup() %>%
  distinct(new_col2, .keep_all = TRUE) %>%
  select(-starts_with("new_col"))
# A tibble: 4 × 4
countyname fipscounty neighborname fipsneighbor
<chr> <int> <chr> <int>
1 Yuma County, CO 8125 Cheyenne County, KS 20023
2 Yuma County, CO 8125 Chase County, NE 31029
3 Cheyenne County, KS 20023 Kit Carson County, CO 8063
4 Cheyenne County, KS 20023 Dundy County, NE 31057
Data
df <- structure(list(countyname = c("Yuma County, CO", "Yuma County, CO",
"Cheyenne County, KS", "Cheyenne County, KS", "Cheyenne County, KS"
), fipscounty = c(8125L, 8125L, 20023L, 20023L, 20023L), neighborname = c("Cheyenne County, KS",
"Chase County, NE", "Kit Carson County, CO", "Yuma County, CO",
"Dundy County, NE"), fipsneighbor = c(20023L, 31029L, 8063L,
8125L, 31057L)), class = "data.frame", row.names = c(NA, -5L))
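If comparing the two FIPS codes is enough to identify a pair, a shorter dplyr variant is possible. This is a sketch rather than the answer's own method: it assumes the dplyr package is installed, and lo and hi are temporary column names made up for the example.

```r
library(dplyr)

# Sample data copied from the answer above
df <- data.frame(
  countyname   = c("Yuma County, CO", "Yuma County, CO", "Cheyenne County, KS",
                   "Cheyenne County, KS", "Cheyenne County, KS"),
  fipscounty   = c(8125L, 8125L, 20023L, 20023L, 20023L),
  neighborname = c("Cheyenne County, KS", "Chase County, NE", "Kit Carson County, CO",
                   "Yuma County, CO", "Dundy County, NE"),
  fipsneighbor = c(20023L, 31029L, 8063L, 8125L, 31057L)
)

result <- df %>%
  # Smaller and larger code per row, so swapped pairs get identical lo/hi values
  mutate(lo = pmin(fipscounty, fipsneighbor),
         hi = pmax(fipscounty, fipsneighbor)) %>%
  # Keep the first row for each (lo, hi) pair along with all other columns
  distinct(lo, hi, .keep_all = TRUE) %>%
  # Drop the temporary helper columns again
  select(-lo, -hi)

result
```

Working on the codes directly avoids splitting on "_" and joining the pieces without a separator, which could in principle make two different rows collide.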
