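I need to extract the first sentence from each paragraph of a written text. I also need to preserve the paragraph structure, so that each first sentence becomes its own paragraph.
I need to do this in R.
I know I have to add some kind of loop, but I don't know how.
Thanks a lot, guys.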
Answer from a uj5u.com user:
Assume every sentence is delimited by "." and every paragraph by "\n". For example,
dummy <- c("first sentence. blablabla.
first sentence2. blablablabblah.")
Then, using stringr::str_split,
library(stringr)
sapply(str_split(dummy, "\n", simplify = TRUE), function(x) str_split(x, "\\.", simplify = TRUE)[1])
you get
first sentence. blablabla. first sentence2. blablablabblah.
"first sentence" " first sentence2"
If your input is a vector of paragraphs,
dummy <- c("first sentence. blablabla.", "first sentence2. blablablabblah.")
sapply(dummy, function(x) str_split(x, "\\.", simplify = TRUE)[1])
first sentence. blablabla. first sentence2. blablablabblah.
"first sentence" "first sentence2"
Code for your text:
dummy <- c("Now, I truly understand that because it's an election season expectations for what we will achieve this year is really low. But, Mister Speaker, I appreciate the very constructive approach that you and other leaders took at the end of last year to pass a budget and make tax cuts permanent for working families." , "So I hope we can work together this year on some priorities like criminal justice reform.So, who knows, we might surprise the cynics again.")
lapply(dummy, function(x) str_split(x, "\\.", simplify = TRUE)[1])
[[1]]
[1] "Now, I truly understand that because it's an election season expectations for what we will achieve this year is really low"
[[2]]
[1] "So I hope we can work together this year on some priorities like criminal justice reform"
unlist(lapply(dummy, function(x) str_split(x, "\\.", simplify = TRUE)[1]))
[1] "Now, I truly understand that because it's an election season expectations for what we will achieve this year is really low"
[2] "So I hope we can work together this year on some priorities like criminal justice reform"
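The question also asks for each first sentence to end up as its own paragraph, and mentions a loop. Because strsplit and sub are vectorised, no explicit loop is needed. The following is a minimal base R sketch of one possible variant, not the method used in the answer above: it keeps the terminating punctuation and joins the results with blank lines. It assumes that paragraphs are separated by "\n" and that a sentence ends at the first ".", "!" or "?", so it will cut too early on abbreviations such as "Mr.". The names first_sentence_paragraphs and speech are hypothetical.

```r
# Hypothetical helper (not from the original answer).
# Assumptions: paragraphs are separated by "\n"; a sentence ends at the
# first ".", "!" or "?" (abbreviations like "Mr." are not handled).
first_sentence_paragraphs <- function(text) {
  # Split the single input string into paragraphs
  paragraphs <- trimws(strsplit(text, "\n", fixed = TRUE)[[1]])
  # Drop empty lines left over from blank-line separators
  paragraphs <- paragraphs[nzchar(paragraphs)]
  # Keep everything up to and including the first terminator;
  # a paragraph without any terminator is returned unchanged
  first <- sub("^([^.!?]*[.!?]).*$", "\\1", paragraphs)
  # Join with a blank line so each sentence is its own paragraph
  paste(first, collapse = "\n\n")
}

speech <- "First one. Second one.\nIs this kept? Yes it is.\n\nNo terminator here"
result <- first_sentence_paragraphs(speech)
writeLines(result)
```

If the input is already a character vector of paragraphs, as in the second example above, the sub() call can be applied to that vector directly and the strsplit() step skipped.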
