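Thank you in advance for reading my question.
In R, I have the following two data frames (Df1 and Df2), which are examples I just wrote up, and I want to create a dummy variable according to the following rule: for each id in Df1, if the year of the observation is greater than or equal to the year listed for that id in Df2, the dummy takes the value 1, otherwise 0. The data frame Df3 is the result I want to achieve. How can I do this?
DF1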
| id | year | x1 |
|---|---|---|
| 1 | 2017 | 0.3 |
| 1 | 2018 | 0.5 |
| 1 | 2019 | 0.45 |
| 1 | 2020 | 0.5 |
| 1 | 2021 | 0.6 |
| 2 | 2018 | 0.2 |
| 2 | 2019 | 0.3 |
| 2 | 2020 | 0.4 |
| 2 | 2021 | 0.5 |
DF2
| id | year |
|---|---|
| 1 | 2019 |
| 2 | 2020 |
DF3
| id | year | x1 | dummy |
|---|---|---|---|
| 1 | 2017 | 0.3 | 0 |
| 1 | 2018 | 0.5 | 0 |
| 1 | 2019 | 0.45 | 1 |
| 1 | 2020 | 0.5 | 1 |
| 1 | 2021 | 0.6 | 1 |
| 2 | 2018 | 0.5 | 0 |
| 2 | 2019 | 0.45 | 0 |
| 2 | 2020 | 0.5 | 1 |
| 2 | 2021 | 0.6 | 1 |
Some background:
I've tried writing two nested loops for the real data frame I am working on; the code is below. The data frame is called data_school, and my id, year, and dummy variables are id_escola, ano, and internet_fixa, respectively. I did a full join between my two initial data frames, which gave me data_school. Since it was a many-to-one join, I created the dummy variable so that only the exact matches have a value of 1 and everything else is NA. The loop then iterates over every id, takes that id's reference year for the dummy, and, for each unique year of that id, replaces the dummy according to the rule. It works for the first rows, but after some rows it fails with the following error: "Error in if (data_school[data_school$id_escola == id & data_school$ano == : argument is of length zero". What should I do?
for (id in unique(data_school$id_escola)) {
  current_subset <- subset(data_school, id_escola == id & is.na(internet_fixa) == F)
  year_implementation <- current_subset$ano
  current <- subset(data_school, id_escola == id)
  for (i in unique(current$ano)){
    if (data_school[data_school$id_escola == id & data_school$ano == i,]$ano < year_implementation) {
      data_school[data_school$id_escola == id & data_school$ano == i, "internet_fixa"] <- 0
    } else {
      data_school[data_school$id_escola == id & data_school$ano == i, "internet_fixa"] <- 1
    }
  }
}
PS: Feel free to ignore the last part (the background) if it is not clear enough.
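One plausible reading of the error, offered as an assumption because the real data_school is not shown: some id has no row with a non-NA internet_fixa, so year_implementation is a zero-length vector and the comparison inside `if` has nothing to evaluate. The toy data frame below is hypothetical and only mimics that situation.

```r
# Hypothetical toy data: id 2 has no non-NA internet_fixa row,
# mimicking an id with no exact match after the full join.
data_school <- data.frame(
  id_escola     = c(1L, 1L, 2L, 2L),
  ano           = c(2018L, 2019L, 2018L, 2019L),
  internet_fixa = c(NA, 1, NA, NA)
)

current_subset <- subset(data_school, id_escola == 2 & !is.na(internet_fixa))
year_implementation <- current_subset$ano
length(year_implementation)   # 0: there is no reference year for id 2

# Comparing against a zero-length vector gives a zero-length logical,
# which `if` cannot evaluate, so it raises an error.
cond <- 2018L < year_implementation
msg <- tryCatch(
  { if (cond) "before" else "after" },
  error = function(e) conditionMessage(e)
)
msg   # in an English locale: "argument is of length zero"
```

If this is indeed the cause, a guard such as `if (length(year_implementation) == 0) next` at the top of the outer loop would skip those ids, although a vectorised join avoids the loop altogether.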
Answer from a uj5u.com user:
Does this work:
library(dplyr)
df2 %>%
  rename('df2_year' = year) %>%
  left_join(df1, by = 'id') %>%
  group_by(id) %>%
  mutate(dummy = if_else(year >= df2_year, 1, 0)) %>%
  select(-df2_year)
# A tibble: 6 x 4
# Groups:   id [2]
     id  year    x1 dummy
  <int> <int> <dbl> <dbl>
1     1  2017  0.3      0
2     1  2018  0.5      0
3     1  2019  0.45     1
4     1  2020  0.5      1
5     1  2021  0.6      1
6     2    NA NA       NA
Data used:
df1
  id year   x1
1  1 2017 0.30
2  1 2018 0.50
3  1 2019 0.45
4  1 2020 0.50
5  1 2021 0.60
df2
  id year
1  1 2019
2  2 2020
- Note that id = 2 is missing from df1 in this sample data.
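For comparison, here is a minimal sketch of the same dplyr idea run on the complete sample data from the question, so that id = 2 is included. It starts the join from df1 rather than df2, which keeps exactly the rows of df1; the column name ref_year is a hypothetical helper introduced only for this illustration.

```r
library(dplyr)

# Sample data as given in the question (Df1 and Df2).
df1 <- data.frame(
  id   = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  year = c(2017:2021, 2018:2021),
  x1   = c(0.3, 0.5, 0.45, 0.5, 0.6, 0.2, 0.3, 0.4, 0.5)
)
df2 <- data.frame(id = 1:2, year = 2019:2020)

df3 <- df1 %>%
  # Attach each id's reference year; ref_year is a hypothetical helper name.
  left_join(rename(df2, ref_year = year), by = "id") %>%
  # 1 when the observation year is on or after the reference year, else 0.
  mutate(dummy = as.integer(year >= ref_year)) %>%
  select(-ref_year)

df3$dummy
```

With this data the dummy switches to 1 from 2019 onward for id 1 and from 2020 onward for id 2. An id that appears in df1 but not in df2 would get NA rather than 0.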
Answer from a uj5u.com user:
We can use a join with data.table
library(data.table)
setDT(df1)[df2, dummy := as.integer(year >= i.year), on = .(id)]
-output
> df1
   id year   x1 dummy
1:  1 2017 0.30     0
2:  1 2018 0.50     0
3:  1 2019 0.45     1
4:  1 2020 0.50     1
5:  1 2021 0.60     1
data
df1 <- structure(list(id = c(1L, 1L, 1L, 1L, 1L), year = 2017:2021,
x1 = c(0.3, 0.5, 0.45, 0.5, 0.6)), class = "data.frame", row.names = c(NA,
-5L))
df2 <- structure(list(id = 1:2, year = 2019:2020),
class = "data.frame", row.names = c(NA,
-2L))
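If neither package is available, the same rule can be sketched in base R with `match()`. This is an alternative to the data.table join above, not a restatement of it, and it uses the complete sample data from the question; ref_year is a hypothetical helper name.

```r
# Sample data as given in the question (Df1 and Df2).
df1 <- data.frame(
  id   = c(1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  year = c(2017:2021, 2018:2021),
  x1   = c(0.3, 0.5, 0.45, 0.5, 0.6, 0.2, 0.3, 0.4, 0.5)
)
df2 <- data.frame(id = 1:2, year = 2019:2020)

# For every row of df1, look up the reference year of its id in df2.
ref_year <- df2$year[match(df1$id, df2$id)]

# 1 when the row's year is on or after the reference year, 0 otherwise;
# ids that do not appear in df2 get NA.
df1$dummy <- as.integer(df1$year >= ref_year)

df1
```

This assumes df2 holds one row per id, as in the example; `match()` only returns the first match, so duplicated ids in df2 would need to be resolved first.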
