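Apologies if this has been asked before; I couldn't find it on the forum because I don't even know what to search for. Here is my problem. I have this data frame in R: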
Area Item Year Unit Value
<chr> <chr> <chr> <chr> <chr>
1 Afghanistan Average dietary energy supply adequacy (percent) (3-year average) 2000-2002 % 87
2 Afghanistan Average dietary energy supply adequacy (percent) (3-year average) 2001-2003 % 88
3 Afghanistan Average dietary energy supply adequacy (percent) (3-year average) 2002-2004 % 91
4 Afghanistan Average dietary energy supply adequacy (percent) (3-year average) 2003-2005 % 92
5 Afghanistan Average dietary energy supply adequacy (percent) (3-year average) 2004-2006 % 92
6 Afghanistan Average dietary energy supply adequacy (percent) (3-year average) 2005-2007 % 94
7 Afghanistan Average dietary energy supply adequacy (percent) (3-year average) 2006-2008 % 95
8 Afghanistan Average dietary energy supply adequacy (percent) (3-year average) 2007-2009 % 97
9 Afghanistan Average dietary energy supply adequacy (percent) (3-year average) 2008-2010 % 100
10 Afghanistan Average dietary energy supply adequacy (percent) (3-year average) 2009-2011 % 102
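The data frame contains 51 items, but some items are missing for some areas and some years. I would like to get a result like the following, so that I can use it for correlation matrices, heat maps, data visualization and so on, but I don't know how:
Area Year Item1 Item2 ... Item51
Afghanistan 2000-2002 87 NA ... NA
Afghanistan 2001-2003 NA* 88 ... NA
* Afghanistan may well have an Item1 value for 2001-2003, but I left it out for the sake of the example.
Here Item-i stands for the names of the 51 different items, and the data frame is filled with NA wherever an item was not measured for that area and year.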
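Thanks!
Answer from a uj5u.com user:
Given your explanation, I assume the data is sorted, i.e. the 51 items appear in the same order for every area and year, with unmeasured ones present as NA.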
# Simulated long-format data: 2 areas x 51 items x 12 three-year periods
df <- data.frame(Area = c(rep("Afghanistan", 51*12),
                          rep("Pakistan", 51*12)),
                 Item = paste("Average dietary item", rep(1:51, each = 12)),
                 Year = rep(paste(2000:2011, 2002:2013, sep = "-"), 51),
                 Value = c(87,88,91,92,92,94,95,97,100,102,200,300, sample(100, 51*2*12-12, TRUE)))
# For each Year/Area group, lay the 51 values out as a single row
result <- do.call(rbind, by(df, list(df$Year, df$Area), function(x) {
  data <- data.frame(Area = unique(x$Area), Year = unique(x$Year), t(x$Value))
  colnames(data)[3:53] <- paste("Item", 1:51)
  data
}))
# Show Area, Year and a handful of the item columns
print(head(result[c(1,2,3:5,50:51)]))
#> Area Year Item 1 Item 2 Item 3 Item 48 Item 49
#> 1 Afghanistan 2000-2002 87 91 50 52 10
#> 2 Afghanistan 2001-2003 88 20 91 46 67
#> 3 Afghanistan 2002-2004 91 30 15 88 83
#> 4 Afghanistan 2003-2005 92 74 21 29 17
#> 5 Afghanistan 2004-2006 92 87 65 71 66
#> 6 Afghanistan 2005-2007 94 58 41 46 49
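The approach above relies on every area and year having its 51 items in the same order. If some rows are simply absent, as the question describes, a reshape keyed on the item name may be safer. The following is only a minimal sketch using base R's `reshape()`; the toy data frame `long` and its item names are hypothetical stand-ins, not the asker's data.

```r
# Hypothetical toy data in the question's long layout;
# "Item B" is deliberately absent for 2001-2003.
long <- data.frame(
  Area  = c("Afghanistan", "Afghanistan", "Afghanistan"),
  Item  = c("Item A", "Item B", "Item A"),
  Year  = c("2000-2002", "2000-2002", "2001-2003"),
  Value = c(87, 5, 88),
  stringsAsFactors = FALSE
)

# One row per Area/Year, one column per Item; absent combinations become NA
wide <- reshape(long, idvar = c("Area", "Year"), timevar = "Item",
                direction = "wide")

# reshape() names the new columns "Value.<Item>"; strip that prefix
names(wide) <- sub("^Value\\.", "", names(wide))
print(wide)
```

Because the columns are matched by item name rather than by position, the row order of the input does not matter here.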
Another way, with dplyr/tidyr:
library(dplyr)
library(tidyr)
result2 <- df %>% group_by(Year, Area) %>% mutate(id = 1:n()) %>%
  select(Area, id, Year, Value) %>%
  pivot_wider(id_cols = c(Area, Year), names_from = id, names_prefix = "Item", values_from = Value)
print(head(result2[c(1,2,3:5,50:51)]))
#> # A tibble: 6 × 7
#> # Groups: Year, Area [6]
#> Area Year Item1 Item2 Item3 Item48 Item49
#> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Afghanistan 2000-2002 87 91 50 52 10
#> 2 Afghanistan 2001-2003 88 20 91 46 67
#> 3 Afghanistan 2002-2004 91 30 15 88 83
#> 4 Afghanistan 2003-2005 92 74 21 29 17
#> 5 Afghanistan 2004-2006 92 87 65 71 66
#> 6 Afghanistan 2005-2007 94 58 41 46 49
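Numbering the rows within each group only lines the columns up if no item is missing. As a hedged alternative, `pivot_wider()` can take its column names straight from `Item`, in which case combinations that were never measured come out as NA. The sketch below assumes a small hypothetical data frame `long`; it also converts `Value` to numeric, since the question's printout shows that column as `<chr>`.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data in the question's long layout;
# "Item B" is deliberately absent for 2001-2003.
long <- data.frame(
  Area  = c("Afghanistan", "Afghanistan", "Afghanistan"),
  Item  = c("Item A", "Item B", "Item A"),
  Year  = c("2000-2002", "2000-2002", "2001-2003"),
  Unit  = c("%", "%", "%"),
  Value = c("87", "5", "88"),  # character, as in the question's printout
  stringsAsFactors = FALSE
)

wide <- long %>%
  mutate(Value = as.numeric(Value)) %>%  # convert before any numeric analysis
  pivot_wider(id_cols = c(Area, Year),   # one row per Area/Year
              names_from = Item,         # one column per item name
              values_from = Value)       # absent combinations are filled with NA

print(wide)
```

Columns not listed in `id_cols`, `names_from` or `values_from` (here `Unit`) are dropped from the result.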
Created on 2022-11-07 with reprex v2.0.2
