How do I replace a column in R with a modified column based on a filter value? (Removing outliers from panel data)

I have a panel dataset like this:

year	id	treatment_year	time_to_treatment	outcome
2000	1	2011	-11	2
2002	1	2011	-10	3
2004	2	2015	-9	22

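And so on. I am trying to handle outliers by winsorizing. The end goal is a scatter plot with time_to_treatment on the X axis and outcome on the Y axis.

I want to replace the outcome within each time_to_treatment value with its winsorized version, i.e. replace all extreme values with the 5% and 95% quantile values. This is what I have tried so far, but it does not work.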

for(i in range(dataset$time_to_treatment)){
    dplyr::filter(dataset, time_to_treatment == i)$outcome <-  DescTools::Winsorize(dplyr::filter(dataset,time_to_treatment==i)$outcome)
}

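I get the error: Error in filter(dataset, time_to_treatment == i) <- *vtmp* : could not find function "filter<-"

Can anyone suggest a better approach? Thanks.

My actual data, where conflicts = outcome, commission = treatment year, and CD_MUN = id.

The time-period indicator of interest is time_to_t.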

# Groups: year, CD_MUN, type [6]

type	CD_MUN	year	time_to_t	conflicts	commission
<chr>	<dbl>	<dbl>	<dbl>	<int>	<dbl>
manif	1100023	2000	-11	1	2011
manif	1100189	2000	-3	2	2003
manif	1100205	2000	-9	5	2009
manif	1500602	2000	-4	1	2004
manif	3111002	2000	-11	2	2011
manif	3147006	2000	-10	1	2010

A uj5u.com user replied:

Assuming that by "time period" you mean the 'commission' column, you can use ave.

transform(dat, conflicts_w=ave(conflicts, commission, FUN=DescTools::Winsorize))
#    type  CD_MUN year time_to_t conflicts commission conflicts_w
# 1 manif 1100023 2000       -11         1       2011        1.05
# 2 manif 1100189 2000        -3         2       2003        2.00
# 3 manif 1100205 2000        -9         5       2009        5.00
# 4 manif 1500602 2000        -4         1       2004        1.00
# 5 manif 3111002 2000       -11         2       2011        1.95
# 6 manif 3147006 2000       -10         1       2010        1.00

Data:

dat <- structure(list(type = c("manif", "manif", "manif", "manif", "manif", 
"manif"), CD_MUN = c(1100023L, 1100189L, 1100205L, 1500602L, 
3111002L, 3147006L), year = c(2000L, 2000L, 2000L, 2000L, 2000L, 
2000L), time_to_t = c(-11L, -3L, -9L, -4L, -11L, -10L), conflicts = c(1L, 
2L, 5L, 1L, 2L, 1L), commission = c(2011L, 2003L, 2009L, 2004L, 
2011L, 2010L)), class = "data.frame", row.names = c(NA, -6L))
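
If the grouping variable should be time_to_t rather than commission, the same ave() pattern applies. The sketch below is one possible base-R variant that avoids the DescTools dependency; winsorize_q is a hypothetical helper written for this illustration (not a DescTools function), and it assumes the 5% and 95% cut-offs from the question. It also sidesteps the two problems in the original loop: range() returns only the minimum and maximum rather than every distinct value, and a filtered copy returned by filter() cannot be assigned into.

```r
# Hypothetical helper: clamp x to its own 5% and 95% quantiles.
winsorize_q <- function(x, probs = c(0.05, 0.95)) {
  q <- quantile(x, probs = probs, na.rm = TRUE, names = FALSE)
  pmin(pmax(x, q[1]), q[2])
}

# Two columns of the sample data above, repeated so the block runs on its own.
dat <- data.frame(
  time_to_t = c(-11L, -3L, -9L, -4L, -11L, -10L),
  conflicts = c(1L, 2L, 5L, 1L, 2L, 1L)
)

# ave() applies FUN within each time_to_t group and returns the
# results in the original row order.
dat$conflicts_w <- ave(as.numeric(dat$conflicts), dat$time_to_t,
                       FUN = winsorize_q)
dat
```

With only one or two observations per group, as in this small sample, winsorizing changes very little; it becomes meaningful once each time_to_t value has many rows.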

A uj5u.com user replied:

To start with, you can use this:

# The data
set.seed(123)
df <- data.frame(
  time_to_treatment = seq(-15, 0, 1),
  outcome = sample(1:30, 16, replace = TRUE)
)

# A solution without Winsorize based solely on dplyr
library(dplyr)
df %>% 
  mutate(outcome05 = quantile(outcome, probs = 0.05), # 5% quantile
         outcome95 = quantile(outcome, probs = 0.95), # 95% quantile
         outcome = ifelse(outcome <= outcome05, outcome05, outcome), # replace
         outcome = ifelse(outcome >= outcome95, outcome95, outcome)) %>% 
  select(-c(outcome05, outcome95))

You can adapt it to your exact problem.
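
One way such an adaptation might look: the snippet above computes the quantiles over the whole column, whereas the question asks for winsorizing within each time_to_treatment value. A minimal sketch, assuming a made-up panel with 20 observations per period (the data frame panel and the column outcome_w are illustrative names, not from the question), adds group_by() and clamps with pmin()/pmax():

```r
library(dplyr)

set.seed(123)
# Hypothetical panel: 20 observations at each of 4 relative periods.
panel <- data.frame(
  time_to_treatment = rep(-3:0, each = 20),
  outcome = sample(1:100, 80, replace = TRUE)
)

panel_w <- panel %>%
  group_by(time_to_treatment) %>%                 # quantiles are per period
  mutate(
    lo = quantile(outcome, 0.05, names = FALSE),  # 5% quantile in the group
    hi = quantile(outcome, 0.95, names = FALSE),  # 95% quantile in the group
    outcome_w = pmin(pmax(outcome, lo), hi)       # clamp to [lo, hi]
  ) %>%
  ungroup() %>%
  select(-lo, -hi)

head(panel_w)
```

Keeping the winsorized values in a separate column (outcome_w) leaves the raw outcome available for comparison, for example when plotting both against time_to_treatment.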
