I am trying to troubleshoot my data and check whether a given name appears in two different columns of the same row (the same observation):
df1 <- data.frame(
text1 = c("John Jay Jakson",
"John Jay Jakson",
"John Jay Jakson",
"John Jack Jakson"),
text2 = c("Jerry Jack Jameson",
"Jerry Jack Jameson",
"Jerry Jack Jameson",
"Jerry Jack Jameson"))
df2 <- data.frame(
names = c("John", "Jay", "Jackson", "Jerry", "Jack", "Jameson"))
The code I came up with is as follows:
data.check = sapply(df2$names, function(x) (grepl(x, df1$text1) & grepl(x, df1$text2))==TRUE)
Or:
which(sapply(df2$names, function(x) (grepl(x, df1$text1) & grepl(x, df1$text2))==TRUE))
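But neither of these is a good way to screen the data. Instead, I would like to create a new column in df1, df1$check, that holds 1/0 depending on whether df1$text1 and df1$text2 contain the same name in that row.
I know that assigning this code to a new column will not work: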
df1$check = sapply(df2$names, function(x) (grepl(x, df1$text1) & grepl(x, df1$text2))==TRUE)
It gives me FALSE for row 4, which should be TRUE.
Any help is appreciated, thank you.
A uj5u.com user replied:
The sapply call in the OP's code returns a logical matrix.
> sapply(df2$names, function(x) (grepl(x, df1$text1) & grepl(x, df1$text2)))
      John   Jay Jackson Jerry  Jack Jameson
[1,] FALSE FALSE   FALSE FALSE FALSE   FALSE
[2,] FALSE FALSE   FALSE FALSE FALSE   FALSE
[3,] FALSE FALSE   FALSE FALSE FALSE   FALSE
[4,] FALSE FALSE   FALSE FALSE  TRUE   FALSE
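Each row of the matrix needs to be collapsed into a single logical value to produce a vector. We can wrap the logical matrix in rowSums, convert the row-wise sums to a logical vector (> 0), and coerce that back to binary with a unary + (TRUE -> 1, FALSE -> 0):
df1$check <- +(rowSums(sapply(df2$names, function(x)
  (grepl(x, df1$text1) & grepl(x, df1$text2)))) > 0)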
df1$check
[1] 0 0 0 1
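To see the row-wise collapse in isolation, here is a minimal self-contained sketch using the question's data. The intermediate name m is introduced only for illustration, and vapply is swapped in for sapply so that the result is guaranteed to be a logical matrix; treat it as one possible way to write this, not the only one.

```r
# Data from the question
df1 <- data.frame(
  text1 = c("John Jay Jakson", "John Jay Jakson",
            "John Jay Jakson", "John Jack Jakson"),
  text2 = c("Jerry Jack Jameson", "Jerry Jack Jameson",
            "Jerry Jack Jameson", "Jerry Jack Jameson"))
df2 <- data.frame(
  names = c("John", "Jay", "Jackson", "Jerry", "Jack", "Jameson"))

# One column per name, one row per row of df1 (m is an illustrative name)
m <- vapply(as.character(df2$names),
            function(x) grepl(x, df1$text1) & grepl(x, df1$text2),
            logical(nrow(df1)))

# A row is flagged when at least one name matched in both columns
df1$check <- as.integer(rowSums(m) > 0)
df1$check
```

Only row 4 is flagged, because "Jack" is the only name found in both columns of the same row. With a single-row data frame vapply would return a plain vector instead of a matrix, so the rowSums step would need adjusting in that case.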
Another option is to loop with lapply, which returns a list, and use Reduce with | to collapse it into a single vector:
df1$check <- +(Reduce(`|`, lapply(df2$names, function(x)
  (grepl(x, df1$text1) & grepl(x, df1$text2)))))
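One caveat applies to either version: grepl matches substrings, so a name such as "Jack" also matches inside "Jackson". If whole-word matches are wanted, one option is to wrap each name in \\b word boundaries, as in the sketch below. The helper has_name, the vector name_list, and the sample strings are hypothetical, and the approach assumes the names contain no regex metacharacters.

```r
# Hypothetical helper: TRUE where `name` occurs as a whole word in `text`
has_name <- function(name, text) {
  grepl(paste0("\\b", name, "\\b"), text)
}

# Illustrative data, not taken from the question
text1 <- c("John Jackson", "Mary Jack")
text2 <- c("Jack Jameson", "Jack Jameson")
name_list <- c("John", "Jack")

# Same Reduce/lapply pattern, but with whole-word matching
check <- +(Reduce(`|`, lapply(name_list, function(x)
  has_name(x, text1) & has_name(x, text2))))
check
```

Here the first row is not flagged, because "Jack" only occurs inside "Jackson" in text1, whereas a plain substring match would have flagged it.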
A uj5u.com user replied:
Here is a dplyr approach:
# import required libraries
library(dplyr)
library(stringr)
# create your data (I added two more rows)
df1 <- data.frame(
text1 = c("John Jay Jakson",
"John Jay Jakson",
"John Jay Jakson",
"John Jack Jakson","Peter","John Snow"),
text2 = c("Jerry Jack Jameson",
"Jerry Jack Jameson",
"Jerry Jack Jameson",
"Jerry Jack Jameson","Peter", "Clay Snow"))
df2 <- data.frame(
names = c("John", "Jay", "Jackson", "Jerry", "Jack", "Jameson"))
# optionally convert df2 to vector or list
v2<-as.vector(df2$names)
#use of str_detect() to look for the string
# use of case_when() that works like if/else
# by including the | operator between the different names
# create a new column called check to store 1s and 0s
df1 <- df1 %>%
  mutate(check = case_when(
    str_detect(text1, paste(v2, collapse = "|")) &
      str_detect(text2, paste(v2, collapse = "|")) ~ "1",
    TRUE ~ "0"))
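Note that a single combined pattern matches whenever any name occurs in text1 and any name, possibly a different one, occurs in text2, so rows 1 to 4 of this data all get "1". If the same name has to appear in both columns, as the question asks, the check can be done per name and then combined. The following is a sketch under that assumption; it reuses the data above, and the use of fixed() for literal matching and of an integer result are choices made for illustration.

```r
library(dplyr)
library(stringr)

# Same data as above, including the two extra rows
df1 <- data.frame(
  text1 = c("John Jay Jakson", "John Jay Jakson", "John Jay Jakson",
            "John Jack Jakson", "Peter", "John Snow"),
  text2 = c("Jerry Jack Jameson", "Jerry Jack Jameson", "Jerry Jack Jameson",
            "Jerry Jack Jameson", "Peter", "Clay Snow"))
v2 <- c("John", "Jay", "Jackson", "Jerry", "Jack", "Jameson")

# For each name, test both columns; a row is flagged if any single
# name is found in text1 and text2 at the same time
df1 <- df1 %>%
  mutate(check = as.integer(Reduce(`|`, lapply(v2, function(nm)
    str_detect(text1, fixed(nm)) & str_detect(text2, fixed(nm))))))
df1$check
```

With this version only row 4 is flagged: "Peter" is not in the name list, and "John" appears in just one column of the last row.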
