Reshaping a data frame from long to wide when each row has a different number of columns

I have a data frame in R in the following format:

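question_id	answer_id	question_text	answer_text
10000	0001	how many people are alive?	20000
10000	0002	how many people are alive?	50000
10000	0003	how many people are alive?	60000
10000	0004	how many people are alive?	900000
20000	0021	what is the meaning of life?	yes
20000	0072	what is the meaning of life?	no
20000	0083	what is the meaning of life?	maybe
20000	0094	what is the meaning of life?	ok
20000	0097	what is the meaning of life?	indifference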

I would like it in the following format:

question_id	question_text	answer_text_1	answer_text_2	answer_text_3	answer_text_4	...	answer_text_n
10000	how many people are alive?	20000	50000	60000	900000	...	NA
20000	what is the meaning of life?	yes	no	maybe	ok	...	indifference

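So, as you can see, I want the question_id, then the question_text itself, then a set of answer columns whose count equals the largest number of answers any question has. Some questions are true/false, so only 2 of those columns would be filled in. But there can also be multiple-choice questions with perhaps 7 different options to choose from. I would like the result to adapt to that.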

The only thing I can think of is tidyr::pivot_wider(), but I can't seem to get it to work. Any help would be greatly appreciated. Thanks!!

========== EDIT ============

The code I tried:

qa_columns <- function(qa_df, question_list){
  df <- qa_df %>%
    dplyr::filter(question_id %in% question_list) %>%
    tidyr::pivot_wider(values_from = answer_text)
  return(df)
}

qa <- qa_columns(qa_text, question_list = question_list)

Answer:

Try adding a row number within each group, then keeping question_id and question_text as the id columns:

library(dplyr)
library(tidyr) # pivot_wider
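dat %>%
  group_by(question_id) %>%
  mutate(rn = row_number()) %>%  # number the answers within each question
  ungroup() %>%
  pivot_wider(
    id_cols = c(question_id, question_text),  # named: positional id_cols is deprecated in recent tidyr
    names_from = rn,
    names_prefix = "answer_text_",
    values_from = answer_text
  )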
# # A tibble: 2 x 7
#   question_id question_text                answer_text_1 answer_text_2 answer_text_3 answer_text_4 answer_text_5
#         <int> <chr>                        <chr>         <chr>         <chr>         <chr>         <chr>        
# 1       10000 how many people are alive?   20000         50000         60000         900000        <NA>         
# 2       20000 what is the meaning of life? yes           no            maybe         ok            indifference

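Note that this coerces the numbers to character in your answer_text_# columns, which is generally unavoidable because each column also has to hold text answers.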
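To illustrate the coercion mentioned above, here is a minimal base R sketch. The variable names `mixed` and `as_num` are made up for this example and do not come from the original code. It only shows that a single vector, and therefore a single data frame column, cannot mix numbers and text.

```r
# A vector (and so a data frame column) holds exactly one type,
# so combining a number with text turns everything into character.
mixed <- c(20000, "yes")
class(mixed)

# Values can be converted back one by one; text that is not a number
# becomes NA (suppressWarnings hides the coercion warning).
as_num <- suppressWarnings(as.numeric(mixed))
as_num
```

If the numeric answers are needed as numbers later, converting them per value like this, or keeping numeric and text questions in separate tables, are two possible options.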

Answer:

Using dcast from data.table:

library(data.table)
dcast(setDT(dat), question_id + question_text ~ 
   paste0("answer_text_", rowid(question_id)), value.var = "answer_text")

-output

 question_id                question_text answer_text_1 answer_text_2 answer_text_3 answer_text_4 answer_text_5
1:       10000   how many people are alive?         20000         50000         60000        900000          <NA>
2:       20000 what is the meaning of life?           yes            no         maybe            ok  indifference
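Both answers rely on the same idea: a counter that restarts at 1 for every question_id (row_number() within a group in dplyr, rowid() in data.table). As a rough sketch of what that counter looks like, here is a base R version using ave(). The vector `question_id` below is typed in by hand to mirror the example data, and `rn` and `n_answer_cols` are illustrative names only.

```r
# question_id values as in the example data: 4 answers, then 5 answers
question_id <- c(10000L, 10000L, 10000L, 10000L,
                 20000L, 20000L, 20000L, 20000L, 20000L)

# Per-group counter: restarts at 1 for each distinct question_id
rn <- ave(seq_along(question_id), question_id, FUN = seq_along)
rn

# The widest question decides how many answer_text_* columns appear
n_answer_cols <- max(rn)
n_answer_cols
```

Because the column names are built from this counter, the number of answer columns follows the data without being fixed in advance, and shorter questions are padded with NA.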

data

dat <- structure(list(question_id = c(10000L, 10000L, 10000L, 10000L, 
20000L, 20000L, 20000L, 20000L, 20000L), answer_id = c(1L, 2L, 
3L, 4L, 21L, 72L, 83L, 94L, 97L), question_text = c("how many people are alive?", 
"how many people are alive?", "how many people are alive?", "how many people are alive?", 
"what is the meaning of life?", "what is the meaning of life?", 
"what is the meaning of life?", "what is the meaning of life?", 
"what is the meaning of life?"), answer_text = c("20000", "50000", 
"60000", "900000", "yes", "no", "maybe", "ok", "indifference"
)), class = "data.frame", row.names = c(NA, -9L))
