R: Removing duplicate rows whose columns are flipped

Rows 1 and 4 contain the same information. The only difference is that the columns in which the values appear have been swapped.

I already know from row 1 that Yuma County and Cheyenne County are neighbors. I don't need that information repeated in row 4.

           countyname fipscounty          neighborname fipsneighbor
1     Yuma County, CO       8125   Cheyenne County, KS        20023
2     Yuma County, CO       8125      Chase County, NE        31029
3 Cheyenne County, KS      20023 Kit Carson County, CO         8063
4 Cheyenne County, KS      20023       Yuma County, CO         8125
5 Cheyenne County, KS      20023      Dundy County, NE        31057

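I don't mind the counties appearing more than once; I only care that the overall information in each row differs from the rows before it. I want to keep row 1 and remove row 4, so that the final result looks like this: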

           countyname fipscounty          neighborname fipsneighbor
1     Yuma County, CO       8125   Cheyenne County, KS        20023
2     Yuma County, CO       8125      Chase County, NE        31029
3 Cheyenne County, KS      20023 Kit Carson County, CO         8063
5 Cheyenne County, KS      20023      Dundy County, NE        31057

How can I remove the rows in the dataset that carry duplicated information?

A uj5u.com user replied:

You can also do it like this:

idx <- duplicated(t(apply(CountyList[c('fipscounty', 'fipsneighbor')], 1, sort)))
CountyList[!idx, ]

          countyname fipscounty          neighborname fipsneighbor
1     Yuma County, CO       8125   Cheyenne County, KS        20023
2     Yuma County, CO       8125      Chase County, NE        31029
3 Cheyenne County, KS      20023 Kit Carson County, CO         8063
5 Cheyenne County, KS      20023      Dundy County, NE        31057
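
To see why this works, it may help to look at the intermediate key. The sketch below rebuilds the sample data (the data.frame() construction is an assumption based on the table in the question) and prints the sorted FIPS pairs before passing them to duplicated. The names key and result are illustrative only.

```r
# Sample data rebuilt from the question's table (assumed construction)
CountyList <- data.frame(
  countyname   = c("Yuma County, CO", "Yuma County, CO", "Cheyenne County, KS",
                   "Cheyenne County, KS", "Cheyenne County, KS"),
  fipscounty   = c(8125L, 8125L, 20023L, 20023L, 20023L),
  neighborname = c("Cheyenne County, KS", "Chase County, NE", "Kit Carson County, CO",
                   "Yuma County, CO", "Dundy County, NE"),
  fipsneighbor = c(20023L, 31029L, 8063L, 8125L, 31057L)
)

# Sort each FIPS pair so that (a, b) and (b, a) produce the same key.
# apply() returns one column per input row, hence the transpose.
key <- t(apply(CountyList[c("fipscounty", "fipsneighbor")], 1, sort))
print(key)

# duplicated() on a matrix compares whole rows; TRUE marks a repeated pair
idx <- duplicated(key)
result <- CountyList[!idx, ]
print(result)
```

Because only the two FIPS columns are sorted, the key stays numeric and the county names play no part in the comparison.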

A uj5u.com user replied:

Here is another possible base R option:

df[!duplicated(t(apply(df, 1, sort))),]

Output

         countyname fipscounty          neighborname fipsneighbor
1     Yuma County, CO       8125   Cheyenne County, KS        20023
2     Yuma County, CO       8125      Chase County, NE        31029
3 Cheyenne County, KS      20023 Kit Carson County, CO         8063
5 Cheyenne County, KS      20023      Dundy County, NE        31057

Data

df <- structure(list(countyname = c("Yuma County, CO", "Yuma County, CO", 
"Cheyenne County, KS", "Cheyenne County, KS", "Cheyenne County, KS"
), fipscounty = c(8125L, 8125L, 20023L, 20023L, 20023L), neighborname = c("Cheyenne County, KS", 
"Chase County, NE", "Kit Carson County, CO", "Yuma County, CO", 
"Dundy County, NE"), fipsneighbor = c(20023L, 31029L, 8063L, 
8125L, 31057L)), class = "data.frame", row.names = c(NA, -5L))
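
Since apply works on a matrix, apply(df, 1, sort) first coerces the whole data frame, names and codes alike, to character. A possible alternative that stays with the numeric FIPS codes is sketched below: pmin and pmax build an order-independent pair without any row-wise loop. The helper names lo, hi and dedup are assumptions introduced for illustration, and the sketch assumes the FIPS codes alone identify a county pair.

```r
# Same sample data as above, rebuilt so the sketch is self-contained
df <- data.frame(
  countyname   = c("Yuma County, CO", "Yuma County, CO", "Cheyenne County, KS",
                   "Cheyenne County, KS", "Cheyenne County, KS"),
  fipscounty   = c(8125L, 8125L, 20023L, 20023L, 20023L),
  neighborname = c("Cheyenne County, KS", "Chase County, NE", "Kit Carson County, CO",
                   "Yuma County, CO", "Dundy County, NE"),
  fipsneighbor = c(20023L, 31029L, 8063L, 8125L, 31057L)
)

# Element-wise smaller and larger FIPS code of each row (hypothetical helper names)
lo <- pmin(df$fipscounty, df$fipsneighbor)
hi <- pmax(df$fipscounty, df$fipsneighbor)

# Rows sharing the same (lo, hi) pair describe the same two counties;
# duplicated() on a data frame flags every repeat after the first
dedup <- df[!duplicated(data.frame(lo, hi)), ]
print(dedup)
```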

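A uj5u.com user replied:

After finding the "smaller" name (i.e., the one that comes first alphabetically) and the "larger" name in each row, we can use interaction to generate a unique factor. We can then filter the data.frame based on it: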

CountyList <- read.table(text="countyname fipscounty          neighborname fipsneighbor
1     'Yuma County, CO'       8125   'Cheyenne County, KS'        20023
2     'Yuma County, CO'       8125      'Chase County, NE'        31029
3 'Cheyenne County, KS'      20023 'Kit Carson County, CO'         8063
4 'Cheyenne County, KS'      20023       'Yuma County, CO'         8125
5 'Cheyenne County, KS'      20023      'Dundy County, NE'        31057")


fname <- pmin(CountyList$countyname, CountyList$neighborname) # Alphabetically first name in each row
lname <- pmax(CountyList$countyname, CountyList$neighborname) # Alphabetically last name in each row

duplicate.key <- as.numeric(interaction(fname, lname)) # Build a factor from the name pair and convert it to numeric

CountyList[match(unique(duplicate.key), duplicate.key), ] # Keep only the first occurrence of each key


           countyname fipscounty          neighborname fipsneighbor
1     Yuma County, CO       8125   Cheyenne County, KS        20023
2     Yuma County, CO       8125      Chase County, NE        31029
3 Cheyenne County, KS      20023 Kit Carson County, CO         8063
5 Cheyenne County, KS      20023      Dundy County, NE        31057
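
A plain string key is another way to express the same idea without building a factor. The following is a minimal sketch, assuming the separator "|" never occurs in a county name; first_name, last_name, pair_key and unique_pairs are hypothetical names chosen for this example.

```r
# Sample data rebuilt from the question's table (assumed construction)
CountyList <- data.frame(
  countyname   = c("Yuma County, CO", "Yuma County, CO", "Cheyenne County, KS",
                   "Cheyenne County, KS", "Cheyenne County, KS"),
  fipscounty   = c(8125L, 8125L, 20023L, 20023L, 20023L),
  neighborname = c("Cheyenne County, KS", "Chase County, NE", "Kit Carson County, CO",
                   "Yuma County, CO", "Dundy County, NE"),
  fipsneighbor = c(20023L, 31029L, 8063L, 8125L, 31057L)
)

# Order the two names within each row so a flipped pair yields the same key
first_name <- pmin(CountyList$countyname, CountyList$neighborname)
last_name  <- pmax(CountyList$countyname, CountyList$neighborname)

# Join the ordered names into a single string key per row
pair_key <- paste(first_name, last_name, sep = "|")

# Drop every row whose key has already been seen
unique_pairs <- CountyList[!duplicated(pair_key), ]
print(unique_pairs)
```

This variant relies on the names alone, so it assumes that a county name (including its state suffix) is unique.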

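A uj5u.com user replied:

Here is a tidyverse approach.

First, unite pastes all columns together into new_col. Then new_col is split back into its separate parts, which are sorted and pasted together again; the result is saved as new_col2. Next, we keep only the rows with distinct values of new_col2. Finally, the newly created columns are removed.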

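library(tidyverse)

df %>% 
  # Paste all columns into one string, keeping the original columns
  unite("new_col", everything(), sep = "_", remove = FALSE) %>% 
  rowwise() %>% 
  # Split the string back into its parts, sort them, and re-join them as an order-independent key
  mutate(new_col2 = paste(sort(str_split(new_col, "_", simplify = TRUE)), collapse = "")) %>% 
  ungroup() %>% 
  # Keep the first row for each key, then drop the helper columns
  distinct(new_col2, .keep_all = TRUE) %>% 
  select(-starts_with("new_col"))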

# A tibble: 4 × 4
  countyname          fipscounty neighborname          fipsneighbor
  <chr>                    <int> <chr>                        <int>
1 Yuma County, CO           8125 Cheyenne County, KS          20023
2 Yuma County, CO           8125 Chase County, NE             31029
3 Cheyenne County, KS      20023 Kit Carson County, CO         8063
4 Cheyenne County, KS      20023 Dundy County, NE             31057

Data

df <- structure(list(countyname = c("Yuma County, CO", "Yuma County, CO", 
"Cheyenne County, KS", "Cheyenne County, KS", "Cheyenne County, KS"
), fipscounty = c(8125L, 8125L, 20023L, 20023L, 20023L), neighborname = c("Cheyenne County, KS", 
"Chase County, NE", "Kit Carson County, CO", "Yuma County, CO", 
"Dundy County, NE"), fipsneighbor = c(20023L, 31029L, 8063L, 
8125L, 31057L)), class = "data.frame", row.names = c(NA, -5L))
