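I have a data frame in R in the following format:
| question_id | answer_id | question_text | answer_text |
|---|---|---|---|
| 10000 | 0001 | how many people are alive? | 20000 |
| 10000 | 0002 | how many people are alive? | 50000 |
| 10000 | 0003 | how many people are alive? | 60000 |
| 10000 | 0004 | how many people are alive? | 900000 |
| 20000 | 0021 | what is the meaning of life? | yes |
| 20000 | 0072 | what is the meaning of life? | no |
| 20000 | 0083 | what is the meaning of life? | maybe |
| 20000 | 0094 | what is the meaning of life? | ok |
| 20000 | 0097 | what is the meaning of life? | indifference |
I would like it in the following format:
| question_id | question_text | answer_text_1 | answer_text_2 | answer_text_3 | answer_text_4 | ... | answer_text_n |
|---|---|---|---|---|---|---|---|
| 10000 | how many people are alive? | 20000 | 50000 | 60000 | 900000 | ... | NA |
| 20000 | what is the meaning of life? | yes | no | maybe | ok | ... | indifference |
So, as you can see, I want the question_id, then the question_text itself, then a set of answer columns whose count equals the largest number of answers any question has. Some questions are true/false, so only 2 columns would be filled in, but there can also be multiple-choice questions with perhaps 7 different options. I want it to be adaptable.
The only thing I can think of is tidyr::pivot_wider(), but I can't seem to get it to work. Any help would be greatly appreciated. Thanks!!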
========== EDIT ============
The code I tried:
qa_columns <- function(qa_df, question_list){
  df <- qa_df %>%
    dplyr::filter(question_id %in% question_list) %>%
    tidyr::pivot_wider(values_from = answer_text)
  return(df)
}
qa <- qa_columns(qa_text, question_list = question_list)
A uj5u.com user replied:
Try adding a row number within each group, then keep question_id and question_text as the id columns:
library(dplyr)
library(tidyr) # pivot_wider
dat %>%
  group_by(question_id) %>%
  mutate(rn = row_number()) %>%   # 1, 2, 3, ... within each question
  ungroup() %>%
  # id_cols is passed by name so the call does not depend on argument position
  pivot_wider(id_cols = c(question_id, question_text), names_from = "rn",
              names_prefix = "answer_text_", values_from = "answer_text")
# # A tibble: 2 x 7
# question_id question_text answer_text_1 answer_text_2 answer_text_3 answer_text_4 answer_text_5
# <int> <chr> <chr> <chr> <chr> <chr> <chr>
# 1 10000 how many people are alive? 20000 50000 60000 900000 <NA>
# 2 20000 what is the meaning of life? yes no maybe ok indifference
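Note that this forces numeric answers to character in your answer_text_# columns. Because a column can hold only one type, this is generally unavoidable when numeric and text answers share a column.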
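To tie this back to the qa_columns() helper from the question, a minimal sketch might look like the following. It assumes the column names question_id, question_text and answer_text from the sample data; the small qa_text data frame and the question list passed in are made up for illustration.

```r
library(dplyr)
library(tidyr)

# Illustrative data (an assumption, modeled on the sample in the question)
qa_text <- data.frame(
  question_id   = c(10000L, 10000L, 20000L, 20000L, 20000L),
  answer_id     = c(1L, 2L, 21L, 72L, 83L),
  question_text = c(rep("how many people are alive?", 2),
                    rep("what is the meaning of life?", 3)),
  answer_text   = c("20000", "50000", "yes", "no", "maybe")
)

qa_columns <- function(qa_df, question_list) {
  qa_df %>%
    filter(question_id %in% question_list) %>%
    group_by(question_id) %>%
    mutate(rn = row_number()) %>%   # answer position within each question
    ungroup() %>%
    pivot_wider(
      id_cols      = c(question_id, question_text),
      names_from   = rn,
      names_prefix = "answer_text_",
      values_from  = answer_text
    )
}

qa <- qa_columns(qa_text, question_list = c(10000L, 20000L))
```

The number of answer_text_ columns follows the question with the most answers among those selected; questions with fewer answers are padded with NA.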
A uj5u.com user replied:
Use dcast from data.table:
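library(data.table)
# question_id and question_text stay as row identifiers (joined with +);
# rowid() numbers the answers within each question_id
dcast(setDT(dat), question_id + question_text ~
    paste0("answer_text_", rowid(question_id)), value.var = "answer_text")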
Output
question_id question_text answer_text_1 answer_text_2 answer_text_3 answer_text_4 answer_text_5
1: 10000 how many people are alive? 20000 50000 60000 900000 <NA>
2: 20000 what is the meaning of life? yes no maybe ok indifference
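The new column names in the dcast() call come from rowid(), which numbers rows within each group of identical values. A small sketch of that step on its own, using a hypothetical vector of question ids rather than the full data:

```r
library(data.table)

# Hypothetical question ids: two answers for the first question, three for the second
question_id <- c(10000L, 10000L, 20000L, 20000L, 20000L)

# rowid() restarts the count for each distinct id
idx <- rowid(question_id)
idx
# [1] 1 2 1 2 3

# These labels become the column names on the right-hand side of the dcast() formula
labels <- paste0("answer_text_", idx)
```

Because the labels only run as high as the largest group, the widened table gets exactly as many answer columns as the question with the most answers.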
Data
dat <- structure(list(question_id = c(10000L, 10000L, 10000L, 10000L,
20000L, 20000L, 20000L, 20000L, 20000L), answer_id = c(1L, 2L,
3L, 4L, 21L, 72L, 83L, 94L, 97L), question_text = c("how many people are alive?",
"how many people are alive?", "how many people are alive?", "how many people are alive?",
"what is the meaning of life?", "what is the meaning of life?",
"what is the meaning of life?", "what is the meaning of life?",
"what is the meaning of life?"), answer_text = c("20000", "50000",
"60000", "900000", "yes", "no", "maybe", "ok", "indifference"
)), class = "data.frame", row.names = c(NA, -9L))
