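I'm trying to use t.test() to compare multiple columns in a data frame, where the values to compare sit within each column. Each row has several columns of metadata (Date, Assay, Timing). My data look like df below; the observations are unpaired, and meas1 and meas2 are unrelated, distinct measurements. The comparison I'm after is meas1[Timing == "Start"] versus meas1[Timing == "End"], for each date, each assay, and each test. My real data have about 10 measurement columns, which affects the syntax I can use for some of the subsetting.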
library(tidyverse)
df <- data.frame(Date = rep(c("2022-01-01","2022-01-02"), each = 18),
                 Assay = rep(c("Gly", "Asp", "Con"), each = 3, times = 4),
                 Timing = c(rep("Start",9), rep("End",9)),
                 meas1 = round(rnorm(36,5,3),0),
                 meas2 = round(rnorm(36,8,9),0))
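I've tried a few different approaches. One was to build a separate data frame of metadata and use inner_join() to combine it with the data after pivot_longer(), but I didn't get the result I expected.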
comp <- list(Assay = c("Gly","Asp","Con"),
             first = "Start",
             last = "End",
             test = names(df %>% select(-Date,-Assay,-Timing))) %>%
  cross_df()
df_pivot <- df %>%
  pivot_longer(c(-Date,-Assay,-Timing), names_to = "test")
t_tests <- comp %>%
  inner_join(df_pivot, by = c("Assay", "test", "first"="Timing")) %>%
  rename(initial = value) %>%
  inner_join(df_pivot, by = c("Date", "Assay", "test", "last"="Timing")) %>%
  rename(final = value)
t_tests
# A tibble: 108 × 7
   Assay first last  test  Date       initial final
   <chr> <chr> <chr> <chr> <chr>        <dbl> <dbl>
 1 Gly   Start End   meas1 2022-01-01       8     8
 2 Gly   Start End   meas1 2022-01-01       8     9
 3 Gly   Start End   meas1 2022-01-01       8     4
 4 Gly   Start End   meas1 2022-01-01       4     8
 5 Gly   Start End   meas1 2022-01-01       4     9
 6 Gly   Start End   meas1 2022-01-01       4     4
 7 Gly   Start End   meas1 2022-01-01      -1     8
 8 Gly   Start End   meas1 2022-01-01      -1     9
 9 Gly   Start End   meas1 2022-01-01      -1     4
10 Gly   Start End   meas1 2022-01-02       6     1
# … with 98 more rows
# ℹ Use `print(n = ...)` to see more rows
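Each initial value is repeated for every distinct final value, which is not what I want because the data are unpaired. I'm trying to get only 36 rows: 2 dates, 3 assays, 2 tests, and 6 values per test (3 values by 2 columns). In other words, rows 1:9 should collapse to 3 rows (rows 1, 5, and 9) that contain only the unique initial and final values. This is where I need help. The 1, 5, 9 pattern should repeat, but I'd like to avoid slicing the data after the fact.
Assuming that part were done correctly, I would proceed as follows, which gives me the summary of t.test() results that I want: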
t_tests <- t_tests %>%
  mutate(first = NULL, last = NULL) %>%
  group_by(Date,Assay,test) %>%
  group_modify(~broom::tidy(t.test(.x$initial,.x$final))) %>% ungroup()
t_tests
# A tibble: 12 × 13
   Date       Assay test    estimate estimate1 estimate2 statistic   p.value parameter conf.low conf.high method                  alternative
   <chr>      <chr> <chr>      <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>    <dbl>     <dbl> <chr>                   <chr>
 1 2022-01-01 Asp   values1    -2.33      5.67         8     -1.79    0.0989      11.7    -5.18     0.511 Welch Two Sample t-test two.sided
 2 2022-01-01 Asp   values2     9.33      6.67     -2.67      2.14    0.0643      8.16   -0.698      19.4 Welch Two Sample t-test two.sided
 3 2022-01-01 Con   values1     3.67      6.67         3      2.17    0.0552      10.1  -0.0984      7.43 Welch Two Sample t-test two.sided
 4 2022-01-01 Con   values2    -8.33      2.67        11     -2.93    0.0110      14.1    -14.4     -2.23 Welch Two Sample t-test two.sided
 5 2022-01-01 Gly   values1    0.333         5      4.67     0.343     0.737      13.1    -1.76      2.43 Welch Two Sample t-test two.sided
 6 2022-01-01 Gly   values2   -0.333      5.67         6    -0.100     0.922      11.5    -7.63      6.96 Welch Two Sample t-test two.sided
 7 2022-01-02 Asp   values1        2         6         4      1.36     0.193        16    -1.12      5.12 Welch Two Sample t-test two.sided
 8 2022-01-02 Asp   values2       11      11.7     0.667      2.02    0.0731      9.27    -1.26      23.3 Welch Two Sample t-test two.sided
 9 2022-01-02 Con   values1       -2      4.33      6.33     -1.75    0.0999      15.4    -4.43     0.429 Welch Two Sample t-test two.sided
10 2022-01-02 Con   values2       11      11.3     0.333      5.64 0.0000761      13.2     6.79      15.2 Welch Two Sample t-test two.sided
11 2022-01-02 Gly   values1    -2.33         3      5.33     -4.43  0.000594      13.8    -3.47     -1.20 Welch Two Sample t-test two.sided
12 2022-01-02 Gly   values2        1         6         5     0.267     0.793      14.5    -7.00      9.00 Welch Two Sample t-test two.sided
Thanks in advance!
Answer:
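You need to add a run_id within each Date/Assay/Timing group so that you can use it as a join key and avoid the duplication.
There's a clue in what you wrote:
"I'm trying to get only 36 rows: 2 dates, 3 assays, 2 tests, and 6 values per test (3 values by 2 columns)"
You have a Date column with 2 unique dates, an Assay column with 3 unique assays, and a test column with 2 unique tests... so you also need a column with 3 unique values for the "3 values by 2 columns". I'll call that column run_id.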
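To make the role of that extra key concrete, here is a minimal sketch on a toy example. The `start` and `end` tibbles are hypothetical names introduced for illustration; their values are the three initial and three final values from rows 1 to 9 of the question's output. Joining on Assay alone matches every Start row to every End row, while numbering the rows within each group and joining on that number as well gives one row per run.

```r
library(dplyr)

# Hypothetical toy data: one assay, three unpaired runs per timing
start <- tibble(Assay = "Gly", initial = c(8, 4, -1))
end   <- tibble(Assay = "Gly", final   = c(8, 9, 4))

# Joining on Assay alone is many-to-many: 3 x 3 = 9 rows
# (recent dplyr versions warn about this kind of match)
dup <- inner_join(start, end, by = "Assay")
nrow(dup)          # 9

# Number the rows within each group, then use that number as a join key
start_id <- start %>% group_by(Assay) %>% mutate(run_id = row_number()) %>% ungroup()
end_id   <- end   %>% group_by(Assay) %>% mutate(run_id = row_number()) %>% ungroup()

one_per_run <- inner_join(start_id, end_id, by = c("Assay", "run_id"))
nrow(one_per_run)  # 3
```

The three rows that remain pair 8 with 8, 4 with 9, and -1 with 4, which is the "rows 1, 5, and 9" pattern the question asks for.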
I'll also skip the comp data frame and essentially do a self-join:
pivot2 = df %>%
  group_by(Date, Assay, Timing) %>%
  mutate(run_id = row_number()) %>%
  ungroup() %>%
  pivot_longer(starts_with("meas"), names_to = "test")
t_tests =
  full_join(
    filter(pivot2, Timing == "Start") %>% select(-Timing, initial = value),
    filter(pivot2, Timing == "End") %>% select(-Timing, final = value),
    by = c("Date", "Assay", "run_id", "test")
  )
# # A tibble: 36 × 6
#    Date       Assay run_id test  initial final
#    <chr>      <chr>  <int> <chr>   <dbl> <dbl>
#  1 2022-01-01 Gly        1 meas1       1    -1
#  2 2022-01-01 Gly        1 meas2       4     7
#  3 2022-01-01 Gly        2 meas1       0     1
#  4 2022-01-01 Gly        2 meas2      10     8
#  5 2022-01-01 Gly        3 meas1       8     5
#  6 2022-01-01 Gly        3 meas2     -16     4
#  7 2022-01-01 Asp        1 meas1       6     7
#  8 2022-01-01 Asp        1 meas2      28    -5
#  9 2022-01-01 Asp        2 meas1       4     6
# 10 2022-01-01 Asp        2 meas2       9     9
# # … with 26 more rows
# # ℹ Use `print(n = ...)` to see more rows
I use a full_join so that everything is still included even if a Date/Assay/Timing combination has a different number of runs.
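As a rough check of that last point, the sketch below rebuilds the example data and runs the same run_id self-join on a copy with one Start row removed. The `set.seed(1)` call, the `join_runs()` helper, and the dropped row are assumptions added for illustration and are not part of the original post. With full_join the unmatched End run is kept with NA in `initial`; with inner_join it would be dropped.

```r
library(dplyr)
library(tidyr)

set.seed(1)  # assumption: fixed only so the example is reproducible
df <- data.frame(Date = rep(c("2022-01-01", "2022-01-02"), each = 18),
                 Assay = rep(c("Gly", "Asp", "Con"), each = 3, times = 4),
                 Timing = c(rep("Start", 9), rep("End", 9)),
                 meas1 = round(rnorm(36, 5, 3), 0),
                 meas2 = round(rnorm(36, 8, 9), 0))

# Hypothetical helper wrapping the run_id + self-join steps shown above,
# with the join function passed in so the two joins can be compared
join_runs <- function(data, join_fun) {
  long <- data %>%
    group_by(Date, Assay, Timing) %>%
    mutate(run_id = row_number()) %>%
    ungroup() %>%
    pivot_longer(starts_with("meas"), names_to = "test")
  join_fun(
    filter(long, Timing == "Start") %>% select(-Timing, initial = value),
    filter(long, Timing == "End") %>% select(-Timing, final = value),
    by = c("Date", "Assay", "run_id", "test")
  )
}

balanced <- join_runs(df, full_join)
nrow(balanced)               # 36

# Remove one Start run: that group now has 2 Start rows but 3 End rows
df_short <- df[-1, ]
kept    <- join_runs(df_short, full_join)
dropped <- join_runs(df_short, inner_join)

nrow(kept)                   # 36
sum(is.na(kept$initial))     # 2 (one unmatched End run for each of meas1 and meas2)
nrow(dropped)                # 34
```

For an unpaired test, t.test() discards missing values from each sample separately, so the NA entries kept by full_join should not prevent the grouped t.test() step from running on the remaining values.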
