How do I create a dummy variable in a data frame based on values in another data frame with a different number of observations in R?

Thanks in advance for reading my question.

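In R, I have the following two data frames (Df1 and Df2), which are examples I just made up, and I want to create a dummy variable according to the following rule: for each id in Df1, if the year of the observation is greater than or equal to the year recorded for that id in Df2, the dummy takes the value 1, otherwise 0. Data frame Df3 is the result I am hoping to get. How can I do this?

DF1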

id	year	x1
1	2017	0.3
1	2018	0.5
1	2019	0.45
1	2020	0.5
1	2021	0.6
2	2018	0.2
2	2019	0.3
2	2020	0.4
2	2021	0.5

DF2

id	year
1	2019
2	2020

DF3

id	year	x1	dummy
1	2017	0.3	0
1	2018	0.5	0
1	2019	0.45	1
1	2020	0.5	1
1	2021	0.6	1
2	2018	0.2	0
2	2019	0.3	0
2	2020	0.4	1
2	2021	0.5	1

Some context:

I've tried writing two nested loops for the real data frame I am working on; the code is below. The data frame is called data_school, and my id, year, and dummy variables are id_escola, ano, and internet_fixa, respectively. I did a full join between my two initial data frames, which gave me data_school. Since it was a many-to-one join, I created the dummy variable so that only the exact matches have the value 1 and everything else is NA. The outer loop iterates over every id and gets the reference year for the dummy; the inner loop iterates over each unique year of that id and replaces the value according to the rule. It works for the first rows, but after some rows it fails with the following error: "Error in if (data_school[data_school$id_escola == id & data_school$ano == : argument is of length zero". What should I do?

for (id in unique(data_school$id_escola)) {
  current_subset <- subset(data_school, id_escola == id & is.na(internet_fixa) == F)
  year_implementation <- current_subset$ano
  current <- subset(data_school, id_escola == id)
  for (i in unique(current$ano)){
    if (data_school[data_school$id_escola == id & data_school$ano == i,]$ano < year_implementation) {
      data_school[data_school$id_escola == id & data_school$ano == i, "internet_fixa"] <- 0
    } else {
      data_school[data_school$id_escola == id & data_school$ano == i, "internet_fixa"] <- 1
    }
  }
}

PS: Feel free to ignore the last part (the context) if it is not clear enough.
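
A note on the error quoted above: one plausible cause, assuming some id_escola values have no matched row after the full join, is that year_implementation ends up as a zero-length vector. The sketch below uses a hypothetical empty vector in place of the real data to show how that leads to the same kind of failure in if(), and one possible guard. It is an illustration under that assumption, not a diagnosis of the actual data_school.

```r
# Hypothetical stand-in: an id with no matched row leaves this empty
year_implementation <- integer(0)

# Comparing against a zero-length vector yields logical(0), not TRUE/FALSE
cond <- 2017 < year_implementation
length(cond)  # 0

# if() needs a length-one condition, so a zero-length one raises an error;
# tryCatch() captures the error message instead of stopping the script
msg <- tryCatch(
  if (cond) "before" else "after",
  error = function(e) conditionMessage(e)
)
msg

# One possible guard: detect ids that have no reference year before comparing
if (length(year_implementation) == 0) {
  message("no reference year for this id")
}
```

Inside the loop from the question, a check like this could be paired with next to skip such ids.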

Answer from a uj5u.com user:

Does this work:

library(dplyr)
df2 %>% rename('df2_year' = year) %>% left_join(df1, by = 'id') %>%
  group_by(id) %>% mutate(dummy = if_else(year >= df2_year, 1, 0)) %>%
  select(-df2_year)
# A tibble: 6 x 4
# Groups:   id [2]
     id  year    x1 dummy
  <int> <int> <dbl> <dbl>
1     1  2017  0.3      0
2     1  2018  0.5      0
3     1  2019  0.45     1
4     1  2020  0.5      1
5     1  2021  0.6      1
6     2    NA NA       NA

Data used:

df1
  id year   x1
1  1 2017 0.30
2  1 2018 0.50
3  1 2019 0.45
4  1 2020 0.50
5  1 2021 0.60
df2
  id year
1  1 2019
2  2 2020

id = 2 is missing from df1 in the sample data.
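
For comparison, here is a minimal base R sketch run on the question's full Df1, including the id = 2 rows. It is an alternative to the dplyr pipeline above, not the answerer's code. The names df1_full, df2_ref, and ref_year are made up for this illustration, and the match() lookup assumes each id appears at most once in the reference frame.

```r
# Full example data from the question (both ids); names are illustrative
df1_full <- data.frame(
  id   = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  year = c(2017L, 2018L, 2019L, 2020L, 2021L, 2018L, 2019L, 2020L, 2021L),
  x1   = c(0.3, 0.5, 0.45, 0.5, 0.6, 0.2, 0.3, 0.4, 0.5)
)
df2_ref <- data.frame(id = 1:2, year = 2019:2020)

# Look up each row's reference year by id
# (assumes one row per id in df2_ref)
ref_year <- df2_ref$year[match(df1_full$id, df2_ref$id)]

# 1 when the observation year is at or after the reference year, else 0;
# an id absent from df2_ref would get NA here
df1_full$dummy <- as.integer(df1_full$year >= ref_year)
df1_full
```

Because the lookup starts from the rows of df1_full, every observation is kept, which avoids the all-NA row for id = 2 seen in the output above.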

Answer from a uj5u.com user:

We can use a join with data.table

library(data.table)
setDT(df1)[df2, dummy := +(year >= i.year), on = .(id)]

-output

> df1
   id year   x1 dummy
1:  1 2017 0.30     0
2:  1 2018 0.50     0
3:  1 2019 0.45     1
4:  1 2020 0.50     1
5:  1 2021 0.60     1

data

df1 <- structure(list(id = c(1L, 1L, 1L, 1L, 1L), year = 2017:2021, 
    x1 = c(0.3, 0.5, 0.45, 0.5, 0.6)), class = "data.frame", row.names = c(NA, 
-5L))

df2 <- structure(list(id = 1:2, year = 2019:2020),
 class = "data.frame", row.names = c(NA, 
-2L))
