I'm trying to write R code for the following problem. df1 and df2 are two data frames:
> df1 <- read.csv(file = 'Indx.csv')
> df1
St_Name I1 I2 I3 I4
1 TN 0.10 0.15 0.20 0.25
2 AZ 0.30 0.35 0.40 0.45
3 TX 0.50 0.55 0.60 0.65
4 KS 0.70 0.75 0.80 0.85
5 KY 0.90 0.95 0.11 0.12
6 MN 0.13 0.14 0.16 0.17
> df2 <- as.data.frame(fromJSON(file = "NewIndx.json"))
> df2
St_Name I1 I3
1 KS 100 200
# The output should be
> df1
St_Name I1 I2 I3 I4
1 TN 0.10 0.15 0.20 0.25
2 AZ 0.30 0.35 0.40 0.45
3 TX 0.50 0.55 0.60 0.65
4 KS 100 0.75 200 0.85
5 KY 0.90 0.95 0.11 0.12
6 MN 0.13 0.14 0.16 0.17
>
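What is the best code to achieve this?
Answer from a uj5u.com user:
This isn't the best option, but it is one way to get what you need.
If you already have the data.table package installed and don't mind installing one more lightweight package: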
install.packages("kim")
library(kim)
df3 <- merge_data_tables(df2, df1, "St_Name")
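# df1 comes from read.csv() and is a plain data.frame, so extract the column
# with [[ ]]; df1[, St_Name] only works if df1 is a data.table
df3 <- order_rows_specifically_in_dt(df3, "St_Name", df1[["St_Name"]])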
data.table::setcolorder(df3, names(df1))
df1 <- df3
df1
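For comparison, the same update can be sketched in base R with no extra packages. This is a minimal sketch and not part of the original answer: it rebuilds df1 and df2 inline from the values shown in the question (instead of reading Indx.csv and NewIndx.json), the helper names key, value_cols, row_idx and found are chosen here for illustration, and it assumes St_Name is unique in df1.

```r
# Sample data copied from the question (built inline instead of read from files)
df1 <- data.frame(
  St_Name = c("TN", "AZ", "TX", "KS", "KY", "MN"),
  I1 = c(0.10, 0.30, 0.50, 0.70, 0.90, 0.13),
  I2 = c(0.15, 0.35, 0.55, 0.75, 0.95, 0.14),
  I3 = c(0.20, 0.40, 0.60, 0.80, 0.11, 0.16),
  I4 = c(0.25, 0.45, 0.65, 0.85, 0.12, 0.17),
  stringsAsFactors = FALSE
)
df2 <- data.frame(St_Name = "KS", I1 = 100, I3 = 200, stringsAsFactors = FALSE)

key <- "St_Name"                            # assumed join column
value_cols <- setdiff(names(df2), key)      # columns to overwrite: I1, I3
row_idx <- match(df2[[key]], df1[[key]])    # row of df1 for each row of df2
found <- !is.na(row_idx)                    # ignore df2 keys that are absent from df1

# Overwrite only the matched cells; row and column order of df1 are unchanged
df1[row_idx[found], value_cols] <- df2[found, value_cols, drop = FALSE]
df1
```

Because the assignment targets specific cells, columns that df2 does not mention (I2 and I4 here) are left untouched.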
Answer from a uj5u.com user:
We can use this slightly modified version of the coalesce_join function by Edward Visel:
library(tidyverse)
# the function:
coalesce_join <- function(x, y,
                          by = NULL, suffix = c(".y", ".x"),
                          join = dplyr::full_join, ...) {
  joined <- join(y, x, by = by, suffix = suffix, ...)
  # names of desired output
  cols <- union(names(y), names(x))
  to_coalesce <- names(joined)[!names(joined) %in% cols]
  suffix_used <- suffix[ifelse(endsWith(to_coalesce, suffix[1]), 1, 2)]
  # remove suffixes and deduplicate
  to_coalesce <- unique(substr(
    to_coalesce,
    1,
    nchar(to_coalesce) - nchar(suffix_used)
  ))
  coalesced <- purrr::map_dfc(to_coalesce, ~dplyr::coalesce(
    joined[[paste0(.x, suffix[1])]],
    joined[[paste0(.x, suffix[2])]]
  ))
  names(coalesced) <- to_coalesce
  dplyr::bind_cols(joined, coalesced)[cols]
}
# apply
coalesce_join(df1, df2, by = 'St_Name')
St_Name I1 I3 I2 I4
1 KS 100.00 200.00 0.75 0.85
2 TN 0.10 0.20 0.15 0.25
3 AZ 0.30 0.40 0.35 0.45
4 TX 0.50 0.60 0.55 0.65
5 KY 0.90 0.11 0.95 0.12
6 MN 0.13 0.16 0.14 0.17
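Note that the output above puts the KS row first and orders the columns I1, I3, I2, I4, which differs from the layout requested in the question. As a hedged alternative that is not from the original answer, dplyr (version 1.0.0 or later) provides rows_update(), which overwrites matching rows and keeps the row and column order of the first data frame. The sketch below assumes that df2 contains only columns that also exist in df1 and that every St_Name in df2 occurs in df1; the sample data is rebuilt inline from the question, and the result name updated is chosen here for illustration.

```r
library(dplyr)

# Sample data copied from the question (built inline instead of read from files)
df1 <- data.frame(
  St_Name = c("TN", "AZ", "TX", "KS", "KY", "MN"),
  I1 = c(0.10, 0.30, 0.50, 0.70, 0.90, 0.13),
  I2 = c(0.15, 0.35, 0.55, 0.75, 0.95, 0.14),
  I3 = c(0.20, 0.40, 0.60, 0.80, 0.11, 0.16),
  I4 = c(0.25, 0.45, 0.65, 0.85, 0.12, 0.17),
  stringsAsFactors = FALSE
)
df2 <- data.frame(St_Name = "KS", I1 = 100, I3 = 200, stringsAsFactors = FALSE)

# For each key in df2, replace the values of the columns df2 provides;
# all other cells of df1 are carried over unchanged
updated <- rows_update(df1, df2, by = "St_Name")
updated
```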
Answer from a uj5u.com user:
Please let me know if this is what you expected.
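library(reshape2)  # provides melt()
library(tidyr)     # provides spread()
id <- "St_Name"
# reshape both data frames to long format: one row per (St_Name, variable) pair
df_1 <- melt(df1, id.vars = id, measure.vars = setdiff(colnames(df1), id))
df_2 <- melt(df2, id.vars = id, measure.vars = setdiff(colnames(df2), id))
# left join keeps every cell of df1; value.y holds the replacement from df2, if any
result <- merge(df_1, df_2, by = c("St_Name", "variable"), no.dups = TRUE, all.x = TRUE)
# overwrite the df1 value wherever df2 supplies one
result$value.x[which(!is.na(result$value.y))] <- result$value.y[which(!is.na(result$value.y))]
result <- result[, -4]  # drop value.y
# reshape back to wide format
result <- spread(result, variable, value.x)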
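The merge() and spread() steps can leave the rows sorted by St_Name rather than in the order of the original df1. If the original layout matters, it can be restored with match(). The following is a minimal base R sketch under stated assumptions: original and result are hypothetical stand-ins built inline (with a single value column to keep it short), where original plays the role of the untouched df1 and result the updated, alphabetically sorted table.

```r
# Hypothetical stand-ins for the untouched df1 and the updated, re-sorted table
original <- data.frame(
  St_Name = c("TN", "AZ", "TX", "KS", "KY", "MN"),
  I1 = c(0.10, 0.30, 0.50, 0.70, 0.90, 0.13),
  stringsAsFactors = FALSE
)
result <- data.frame(
  St_Name = c("AZ", "KS", "KY", "MN", "TN", "TX"),
  I1 = c(0.30, 100, 0.90, 0.13, 0.10, 0.50),
  stringsAsFactors = FALSE
)

# match() returns, for each state in the original order, its row in `result`;
# indexing by names(original) restores the original column order
restored <- result[match(original$St_Name, result$St_Name), names(original)]
rownames(restored) <- NULL   # renumber the rows 1..n
restored
```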
