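My data consist of many dyadic (paired) texts that have been split into sentences, one per row. I want to concatenate the rows by speaker within each dyad, essentially converting the data into speaking turns. Here is an example data set: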
dyad <- c(1,1,1,1,1,2,2,2,2)
speaker <- c("John", "John", "John", "Paul","John", "George", "Ringo", "Ringo", "George")
text <- c("Let's play",
"We're wasting time",
"Let's make a record!",
"Let's work it out first",
"Why?",
"It goes like this",
"Hold on",
"Have to tighten my snare",
"Ready?")
dat <- data.frame(dyad, speaker, text)
This is what I would like the data to look like:
dyad speaker text
1 1 John Let's play. We're wasting time. Let's make a record!
2 1 Paul Let's work it out first
3 1 John Why?
4 2 George It goes like this
5 2 Ringo Hold on. Have to tighten my snare
6 2 George Ready?
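I have tried grouping by sender and pasting/collapsing with dplyr, but the concatenation combines all of a sender's text without preserving the speaking order. For example, John's last statement ("Why?") ends up with his other text in the output instead of after Paul's comment. I also tried checking whether the next speaker (using lead(sender)) is the same as the current one and then merging, but that only compares adjacent rows, so in this example it misses John's third comment. It seems like it should be simple, but I can't get it to work. The solution should also be flexible enough to combine any run of consecutive rows from a given speaker.
Thanks in advance.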
A uj5u.com user replied:
Create another grouping variable with rleid (from data.table) and paste the rows together in summarise
library(dplyr)
library(data.table)
library(stringr)
dat %>%
group_by(dyad, grp = rleid(speaker), speaker) %>%
summarise(text = str_c(text, collapse = ' '), .groups = 'drop') %>%
select(-grp)
-output
# A tibble: 6 × 3
dyad speaker text
<dbl> <chr> <chr>
1 1 John Let's play We're wasting time Let's make a record!
2 1 Paul Let's work it out first
3 1 John Why?
4 2 George It goes like this
5 2 Ringo Hold on Have to tighten my snare
6 2 George Ready?
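For readers unfamiliar with rleid, the grouping it produces for this column can be reproduced in base R. This is only an illustrative sketch built on rle(); the name run_id is made up here and is not part of the answer above.

```r
# Speaker column from the example data in the question
speaker <- c("John", "John", "John", "Paul", "John",
             "George", "Ringo", "Ringo", "George")

# rle() finds runs of identical consecutive values
runs <- rle(speaker)

# Number the runs 1, 2, 3, ... and repeat each number for the length of its run.
# A speaker who comes back later (John in row 5) starts a new run, so he gets
# a new id instead of being merged with his earlier rows.
run_id <- rep(seq_along(runs$lengths), times = runs$lengths)

print(data.frame(speaker, run_id))
```

Grouping by this id (together with dyad) is what keeps John's "Why?" separate from his first three sentences.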
A uj5u.com user replied:
Not as elegant as dear akrun's solution. Here helper does the same job as rleid, without needing an additional package:
library(dplyr)
dat %>%
mutate(helper = (speaker != lag(speaker, 1, default = "xyz")),
helper = cumsum(helper)) %>%
# group by helper before speaker so the summarised rows stay in speaking order
group_by(dyad, helper, speaker) %>%
summarise(text = paste0(text, collapse = " "), .groups = 'drop') %>%
select(-helper)
dyad speaker text
<dbl> <chr> <chr>
1 1 John Let's play We're wasting time Let's make a record!
2 1 Paul Let's work it out first
3 1 John Why?
4 2 George It goes like this
5 2 Ringo Hold on Have to tighten my snare
6 2 George Ready?
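Neither answer reproduces the full stops shown in the desired output in the question ("Let's play. We're wasting time. ..."). Below is a minimal base R sketch of one way to get them. It assumes that a sentence needs a full stop only when it is not the last one in its turn and does not already end in ".", "!" or "?". The names join_sentences, change, turn and turns are hypothetical and introduced only for this illustration.

```r
dyad <- c(1, 1, 1, 1, 1, 2, 2, 2, 2)
speaker <- c("John", "John", "John", "Paul", "John",
             "George", "Ringo", "Ringo", "George")
text <- c("Let's play", "We're wasting time", "Let's make a record!",
          "Let's work it out first", "Why?", "It goes like this",
          "Hold on", "Have to tighten my snare", "Ready?")
dat <- data.frame(dyad, speaker, text, stringsAsFactors = FALSE)

# Hypothetical helper: join the sentences of one turn, adding a full stop
# to every sentence except the last that lacks end punctuation
join_sentences <- function(x) {
  needs_stop <- !grepl("[.!?]$", x)
  needs_stop[length(x)] <- FALSE  # leave the final sentence untouched
  paste(ifelse(needs_stop, paste0(x, "."), x), collapse = " ")
}

# A new turn starts on the first row and whenever the speaker or dyad changes
n <- nrow(dat)
change <- c(TRUE, dat$speaker[-1] != dat$speaker[-n] |
                  dat$dyad[-1] != dat$dyad[-n])
turn <- cumsum(change)

# One row per turn, in speaking order
first <- !duplicated(turn)
turns <- data.frame(
  dyad = dat$dyad[first],
  speaker = dat$speaker[first],
  text = vapply(split(dat$text, turn), join_sentences, character(1),
                USE.NAMES = FALSE),
  stringsAsFactors = FALSE
)

print(turns)
```

The same join_sentences function could be dropped into either summarise call above in place of str_c or paste0, if the dplyr versions are preferred.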
