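I am trying to pick out specific keywords from cricket commentary. Some of the keywords I am looking for are combinations of 2 to 3 words from a list.
This is the list of keywords I am looking for in the commentary: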
region <- c("third man", "deep fine leg", "long leg", "deep square leg", "Deep mid wicket",
"cow corner", "long on", "Deep extra cover", "Deep Cover", "Deep point",
"Deep backword point", "fly slip", "backword point", "point", "cover", "Extra covers",
"mid off", "mid on", "mid wicket", "square leg", "backword square leg", "fine leg",
"slips", "gully", "silly point", "silly mid off", "silly mid on", "short leg",
"leg gully", "leg slip")
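*Pretorius to Umesh Yadav, 1 run, pitched up by Pretorius, touch slower as it has been driven along the ground to long-off
Pretorius to Chahar, SIX, that's a great shot. Pitched up by Pretorius outside off, a slower one and Chahar goes down on his knee and plays a fantastic lofted shot to clear the boundary at deep extra cover
Pretorius to Umesh Yadav, 1 run, a touch fuller on off, Umesh Yadav drills it to long-off for a single*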
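How can I match the keywords in the commentary when the keyword for a particular ball is a combination of 2 or more words? I am trying to work out which entry from the list above matches the commentary.
I am using R version 4.2.1 and RStudio.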
Answer from a uj5u.com user:
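It is best to preprocess both the sentences and the keywords before matching (i.e. convert to lower case, remove punctuation, and so on).
For example, your sentence
Pretorius to Chahar, SIX, that's a great shot. Pitched up by Pretorius outside off, a slower one and Chahar goes down on his knee and plays a fantastic lofted shot to clear the boundary at deep extra cover
does not match anything in your region vector as it stands, because the corresponding value ("Deep extra cover") is not all lower case.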
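The case mismatch described above can be reproduced in a few lines. This is only a minimal sketch: the helper `normalise()` is a hypothetical name that does not appear in the original answer, and replacing punctuation with spaces (rather than deleting it) is an assumption made here so that a hyphenated term such as "long-off" becomes "long off".

```r
library(stringr)

# Hypothetical helper (not from the original answer): lower-case the text,
# turn runs of punctuation into a single space, then collapse whitespace.
normalise <- function(x) {
  x <- str_to_lower(x)
  x <- str_replace_all(x, "[[:punct:]]+", " ")
  str_squish(x)
}

sens <- "Chahar plays a fantastic lofted shot to clear the boundary at deep extra cover"
keyword <- "Deep extra cover"

# Case-sensitive literal match: fails because of the capital "D" in the keyword
raw_hit <- str_detect(sens, fixed(keyword))

# Same comparison after normalising both sides: the keyword is found
clean_hit <- str_detect(normalise(sens), fixed(normalise(keyword)))
```

Here `raw_hit` is `FALSE` and `clean_hit` is `TRUE`. Note that the keyword list in the question contains "long on" but no "long off", so the "long-off" in the first sentence has nothing to match even after normalisation.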
I am not sure what output you want, but to return the matches for each sentence I would use dplyr and stringr and do something like this:
library(stringr)
library(dplyr)
sentence <- data.frame(sens = c("Pretorius to Umesh Yadav, 1 run, pitched up by Pretorius, touch slower as it has been driven along the ground to long-off",
"Pretorius to Chahar, SIX, that's a great shot. Pitched up by Pretorius outside off, a slower one and Chahar goes down on his knee and plays a fantastic lofted shot to clear the boundary at deep extra cover"))
region <- c("third man", "deep fine leg", "long leg", "deep square leg", "Deep mid wicket",
"cow corner", "long on", "Deep extra cover", "Deep Cover", "Deep point",
"Deep backword point", "fly slip", "backword point", "point", "cover", "Extra covers",
"mid off", "mid on", "mid wicket", "square leg", "backword square leg", "fine leg",
"slips", "gully", "silly point", "silly mid off", "silly mid on", "short leg",
"leg gully", "leg slip")
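# For each row, extract every keyword found in the lower-cased sentence
# and join the hits into a single string separated by "|"
sentence %>%
  rowwise() %>%
  mutate(match = paste0(
    str_extract_all(tolower(sens),
                    paste0(tolower(region), collapse = "|"),
                    simplify = TRUE),
    collapse = "|"
  ))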
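One possible refinement, offered as a sketch rather than as part of the original answer: the pattern above has no word boundaries, so a short keyword such as "cover" or "point" can also match inside longer words like "recovered" or "pointed". The variant below wraps the alternation in `\\b` and sorts the keywords longest first. The shortened `region` vector and the second sentence are made up for illustration, and the code assumes the keywords contain no regex metacharacters.

```r
library(stringr)

# Hypothetical subset of the keyword list, for illustration only
region <- c("Deep extra cover", "Extra covers", "cover", "point", "silly point")

# Lower-case, de-duplicate, and put longer keywords first so that a
# multi-word keyword is tried before any shorter alternative
keys <- unique(tolower(region))
keys <- keys[order(nchar(keys), decreasing = TRUE)]

# \\b on both sides restricts matches to whole words
pattern <- paste0("\\b(?:", paste(keys, collapse = "|"), ")\\b")

sens <- c("lofted shot to clear the boundary at deep extra cover",
          "he recovered well and pointed to the dressing room")

# One character vector of matches per sentence
matches <- str_extract_all(tolower(sens), pattern)
```

With these inputs the first sentence yields "deep extra cover" and the second yields no matches. The longest-first ordering is mostly a safeguard, since it only matters when one keyword is a prefix of another.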
