R: concatenate across rows within groups while preserving sequence

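My data consists of text from many dyads, split into sentences with one sentence per row. I want to concatenate the data by speaker within each dyad, essentially converting it into speaking turns. Here is an example data set: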

dyad <- c(1,1,1,1,1,2,2,2,2)
speaker <- c("John", "John", "John", "Paul","John", "George", "Ringo", "Ringo", "George")
text <- c("Let's play",
          "We're wasting time",
          "Let's make a record!",
          "Let's work it out first",
          "Why?",
          "It goes like this",
          "Hold on",
          "Have to tighten my snare",
          "Ready?")

dat <- data.frame(dyad, speaker, text)

This is how I would like the data to look:

  dyad speaker                                                text
1      1    John Let's play. We're wasting time. Let's make a record!
2      1    Paul                              Let's work it out first
3      1    John                                                 Why?
4      2  George                                    It goes like this
5      2   Ringo                    Hold on. Have to tighten my snare
6      2  George                                               Ready?

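I have tried grouping by sender and pasting/collapsing with dplyr, but the concatenation combines all of a sender's text without preserving the speaking order. For example, John's last statement ("Why?") ends up with his other text in the output rather than coming after Paul's comment. I also tried checking whether the next speaker (using lead(sender)) is the same as the current one and then merging, but that only looks at adjacent rows, so it misses John's third comment in the example. It seems like it should be simple, but I can't make it work. The solution should also be flexible enough to combine any run of consecutive rows from a given speaker.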

Thanks in advance

Answer from a uj5u.com user:

Use rleid (from data.table) to create an additional grouping variable, then paste the rows together in summarise

library(dplyr)
library(data.table)
library(stringr)
dat %>% 
   group_by(dyad, grp = rleid(speaker), speaker) %>% 
   summarise(text = str_c(text, collapse = ' '), .groups = 'drop') %>% 
   select(-grp)

-output

# A tibble: 6 × 3
   dyad speaker text                                              
  <dbl> <chr>   <chr>                                             
1     1 John    Let's play We're wasting time Let's make a record!
2     1 Paul    Let's work it out first                           
3     1 John    Why?                                              
4     2 George  It goes like this                                 
5     2 Ringo   Hold on Have to tighten my snare                  
6     2 George  Ready?
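
To see why this works, it can help to look at the run ids on their own. The following is only a minimal sketch, assuming the data.table package is installed; run_id is a name introduced here for illustration and is not part of the answer above. Passing both dyad and speaker to rleid is an optional variation that starts a new run whenever either column changes.

```r
library(data.table)  # used only for rleid()

dyad <- c(1, 1, 1, 1, 1, 2, 2, 2, 2)
speaker <- c("John", "John", "John", "Paul", "John",
             "George", "Ringo", "Ringo", "George")

# rleid() gives every run of consecutive identical values its own id,
# so John's later turn (row 5) gets a different id from his first three rows.
run_id <- rleid(dyad, speaker)
print(run_id)
# [1] 1 1 1 2 3 4 5 5 6
```

Grouping on this id rather than on speaker alone is what keeps John's two turns apart.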

Answer from a uj5u.com user:

Not as elegant as dear akrun's solution. Here helper does the same job as rleid, without needing an additional package:

library(dplyr)
dat %>% 
  # helper increases by one each time the speaker differs from the previous row
  mutate(helper = (speaker != lag(speaker, 1, default = "xyz")),
         helper = cumsum(helper)) %>% 
  # group on helper before speaker so the summarised rows stay in speaking order
  group_by(dyad, helper, speaker) %>% 
  summarise(text = paste0(text, collapse = " "), .groups = 'drop') %>% 
  select(-helper)

# A tibble: 6 × 3
   dyad speaker text                                              
  <dbl> <chr>   <chr>                                             
1     1 John    Let's play We're wasting time Let's make a record!
2     1 Paul    Let's work it out first                           
3     1 John    Why?                                              
4     2 George  It goes like this                                 
5     2 Ringo   Hold on Have to tighten my snare                  
6     2 George  Ready?
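
The same idea can also be written with no packages at all. This is only a rough base R sketch: new_turn, turn and turns are names made up for illustration, and collapsing with ". " is an assumption made to match the separators in the desired output from the question.

```r
dat <- data.frame(
  dyad = c(1, 1, 1, 1, 1, 2, 2, 2, 2),
  speaker = c("John", "John", "John", "Paul", "John",
              "George", "Ringo", "Ringo", "George"),
  text = c("Let's play", "We're wasting time", "Let's make a record!",
           "Let's work it out first", "Why?", "It goes like this",
           "Hold on", "Have to tighten my snare", "Ready?"),
  stringsAsFactors = FALSE
)

n <- nrow(dat)

# A new turn starts on the first row and whenever dyad or speaker changes.
new_turn <- c(TRUE,
              dat$dyad[-1] != dat$dyad[-n] | dat$speaker[-1] != dat$speaker[-n])

# Running count of turn starts: one id per speaking turn, in row order.
turn <- cumsum(new_turn)

# Keep dyad and speaker from the first row of each turn and
# paste that turn's sentences together.
turns <- data.frame(
  dyad = dat$dyad[new_turn],
  speaker = dat$speaker[new_turn],
  text = as.vector(tapply(dat$text, turn, paste, collapse = ". ")),
  stringsAsFactors = FALSE
)
print(turns)
```

Because turn is built from row order, the result keeps the speaking sequence without any sorting step.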

Please credit the source when reposting. Link to this article: https://www.uj5u.com/qita/350441.html
