I have the following data frame:
library(dplyr)
library(tidyverse)
library(concordance)
Year <- c(2016,2016,2017,2019,2020,2020,2020,2013,2010,2010)
Pf <- c("HS4","HS4","HS4","HS5","HS5","HS5","HS5","HS4","HS3","HS3")
Code <- c("391890","440929","851660","732399","720839","050510","830241","321590","010210","010210")
Slen <- c("6","6","6","6","6","6","6","6","6","6")
df <- data.frame(Year,Pf,Code,Slen)
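The 'Pf' column contains three different values: "HS3", "HS4" and "HS5". I want to apply the concord() function to the 'Code' column in a vectorized way, but for that to work 'Pf' must hold a single unique value in each call. That is why I first split the data into subsets in which 'Pf' is unique: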
# Subset data where Pf column is unique
df.H5 <- subset(df, Pf == "HS5")
df.H4 <- subset(df, Pf == "HS4")
df.H3 <- subset(df, Pf == "HS3")
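Now I apply a function to each subset. Here concord() takes the 'Code' column and converts the codes to a different classification. However, it does not work when the destination argument and the value in 'Pf' are the same: for example, with Pf = "HS3" (in df) and destination = "HS3" the code will not run, which is why I do not apply it to df.H3.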
# Apply function to df.H5
df.H5 <- df.H5 %>%
  group_by(Pf, Slen) %>%
  mutate(
    Code2 = concord(Code, origin = unique(Pf), dest.digit = unique(Slen), destination = "HS3", all = FALSE)
  ) %>%
  ungroup()
# Apply function to df.H4
df.H4 <- df.H4 %>%
  group_by(Pf, Slen) %>%
  mutate(
    Code2 = concord(Code, origin = unique(Pf), dest.digit = unique(Slen), destination = "HS3", all = FALSE)
  ) %>%
  ungroup()
# Add a Code2 column to df.H3 so that the three data frames can be merged
df.H3$Code2 <- df.H3$Code
# Merge
df2 <- rbind(df.H4, df.H5, df.H3)
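My goal is to automate this process somehow. For example, with destination = "HS3", the code should run over the whole data frame without subsetting it first, and for rows where destination matches the value in 'Pf' it should skip the conversion and simply copy the value from 'Code' into the 'Code2' column.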
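Answer from a uj5u.com user:
You can put the logic in a function and use it with by(), which splits the data and applies the function to each piece. Inside the function, handle the case Pf == 'HS3', which should not be converted. Finally, unsplit() the result.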
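cf <- \(x) {
  # Convert only when this piece is not already in the destination classification
  Code2 <- if (!any(x$Pf == 'HS3')) {
    concordance::concord(x$Code, x$Pf[1], x$Slen[1],
                         destination="HS3", all=FALSE)
  } else {
    # Origin equals destination: copy the codes unchanged
    x$Code
  }
  cbind(x, Code2)
}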
by(df, df$Pf, cf) |>
  unsplit(df$Pf)
#    Year  Pf   Code Slen  Code2
# 1  2016 HS4 391890    6 391890
# 2  2016 HS4 440929    6 440929
# 3  2017 HS4 851660    6 851660
# 4  2019 HS5 732399    6 732399
# 5  2020 HS5 720839    6 720839
# 6  2020 HS5 050510    6 050510
# 7  2020 HS5 830241    6 830241
# 8  2013 HS4 321590    6 321590
# 9  2010 HS3 010210    6 010210
# 10 2010 HS3 010210    6 010210
Data:
df <- structure(list(Year = c(2016, 2016, 2017, 2019, 2020, 2020, 2020,
2013, 2010, 2010), Pf = c("HS4", "HS4", "HS4", "HS5", "HS5",
"HS5", "HS5", "HS4", "HS3", "HS3"), Code = c("391890", "440929",
"851660", "732399", "720839", "050510", "830241", "321590", "010210",
"010210"), Slen = c("6", "6", "6", "6", "6", "6", "6", "6", "6",
"6")), class = "data.frame", row.names = c(NA, -10L))
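If the destination should be a parameter rather than hard-coded as "HS3", the same split, apply and combine idea can be wrapped in a small helper. The following is only a sketch under stated assumptions: convert_codes() and fake_concord() are hypothetical names introduced for illustration, and fake_concord() is a stand-in that mimics the argument names of concord() so the example runs without the concordance package installed.

```r
# Hypothetical stand-in for concordance::concord(): it does no real lookup,
# it only tags each code with the requested origin and destination.
fake_concord <- function(sourcevar, origin, destination, dest.digit = 6, all = FALSE) {
  paste0(sourcevar, "_", origin, "to", destination)
}

# Hypothetical helper: convert df$Code to `destination`, one Pf group at a time.
convert_codes <- function(df, destination, converter = fake_concord) {
  # Row indices per classification, so each converter call sees one origin
  idx <- split(seq_len(nrow(df)), df$Pf)
  Code2 <- character(nrow(df))
  for (origin in names(idx)) {
    rows <- idx[[origin]]
    Code2[rows] <- if (origin == destination) {
      # Origin already equals destination: copy the codes unchanged
      df$Code[rows]
    } else {
      converter(df$Code[rows], origin = origin, destination = destination,
                dest.digit = as.numeric(df$Slen[rows][1]), all = FALSE)
    }
  }
  df$Code2 <- Code2
  df
}

df <- data.frame(
  Year = c(2016, 2016, 2017, 2019, 2020, 2020, 2020, 2013, 2010, 2010),
  Pf   = c("HS4", "HS4", "HS4", "HS5", "HS5", "HS5", "HS5", "HS4", "HS3", "HS3"),
  Code = c("391890", "440929", "851660", "732399", "720839", "050510",
           "830241", "321590", "010210", "010210"),
  Slen = rep("6", 10)
)

res <- convert_codes(df, destination = "HS3")
```

Because the results are written back by row index, the original row order is kept without a separate unsplit step. With the package installed, passing converter = concordance::concord should perform the real conversion, assuming its arguments are named as in the calls above.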
