I have a data set and a task: "the average number of major credit cards held by the 10 people with the highest income".
dput(head(creditcard))
structure(list(card = structure(c(2L, 2L, 2L, 2L, 2L, 2L), levels = c("no","yes"), class = "factor"), reports = c(0L, 0L, 0L, 0L, 0L, 0L), age = c(37.66667, 33.25, 33.66667, 30.5, 32.16667, 23.25), income = c(4.52, 2.42, 4.5, 2.54, 9.7867, 2.5), share = c(0.03326991, 0.005216942, 0.004155556, 0.06521378, 0.06705059, 0.0444384), expenditure = c(124.9833, 9.854167, 15, 137.8692, 546.5033, 91.99667), owner = structure(c(2L, 1L, 2L, 1L, 2L, 1L), levels = c("no", "yes"), class = "factor"), selfemp = structure(c(1L, 1L, 1L, 1L, 1L, 1L), levels = c("no", "yes"), class = "factor"),
dependents = c(3L, 3L, 4L, 0L, 2L, 0L), days = c(54L, 34L,58L, 25L, 64L, 54L), majorcards = c(1L, 1L, 1L, 1L, 1L, 1L), active = c(12L, 13L, 5L, 7L, 5L, 1L), income_fam = c(1.13, 0.605, 0.9, 2.54, 3.26223333333333, 2.5)), row.names = c("1","2", "3", "4", "5", "6"), class = "data.frame")
I tried to solve the task like this:
round(mean(creditcard[order(creditcard$income, decreasing = TRUE),]$majorcards[1:10]))
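The question does not say what "not ideal" means, so the following is only a guess. Assuming the expected answer is a fractional average, one likely culprit is round(), which rounds the mean to the nearest whole number. A second pitfall is [1:10]: on a data frame with fewer than 10 rows (such as the 6-row dput sample above), it pads the vector with NA and mean() returns NA. The sketch below uses a hypothetical toy data frame, toy, not the real creditcard data.

```r
# Hypothetical toy data (not the real creditcard set): 12 people with
# incomes 1..12; the three lowest earners hold no major card.
toy <- data.frame(
  income     = 1:12,
  majorcards = c(0L, 0L, 0L, rep(1L, 9))
)

# Same approach as in the question, but without round():
# sort rows by income (highest first) and average the first 10 majorcards.
sorted     <- toy[order(toy$income, decreasing = TRUE), ]
top10_mean <- mean(sorted$majorcards[1:10])
top10_mean         # 0.9
round(top10_mean)  # 1: round() rounds to the nearest whole number

# With fewer than 10 rows, [1:10] pads the vector with NA, so mean() is NA;
# head(x, 10) only returns the elements that actually exist.
small   <- sorted[sorted$income <= 6, ]
padded  <- mean(small$majorcards[1:10])
guarded <- mean(head(small$majorcards, 10))
padded   # NA
guarded  # 0.5
```

Dropping round() and using head(..., 10) instead of [1:10] keeps the rest of the original approach unchanged.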
But the result of my solution is not what I expected, and I don't understand how to correct it.
Answer:
You can get the 10 observations with the highest income using slice_max, and then summarise them into a new data set containing the mean of majorcards:
library(dplyr)
creditcard %>%
slice_max(income, n = 10) %>%
summarise(mean(majorcards))
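One detail worth checking with this approach: slice_max() keeps ties by default (with_ties = TRUE), so if several people share the 10th-highest income, more than 10 rows end up in the average. A minimal sketch on a hypothetical toy data frame, where the 10th and 11th highest incomes are tied, compares the default with with_ties = FALSE:

```r
library(dplyr)

# Hypothetical toy data: the 10th and 11th highest incomes are tied at 5.
toy <- data.frame(
  income     = c(20:12, 5, 5, 1),
  majorcards = c(rep(1L, 9), 0L, 0L, 1L)
)

# Default with_ties = TRUE keeps both tied rows, so 11 rows are averaged.
with_ties <- toy %>%
  slice_max(income, n = 10) %>%
  summarise(rows = n(), mean_cards = mean(majorcards))

# with_ties = FALSE returns exactly n rows.
no_ties <- toy %>%
  slice_max(income, n = 10, with_ties = FALSE) %>%
  summarise(rows = n(), mean_cards = mean(majorcards))

with_ties  # rows = 11, mean_cards about 0.818
no_ties    # rows = 10, mean_cards = 0.9
```

Whether ties should be kept depends on how "top 10 people" is defined; with_ties = FALSE matches a strict count of 10.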
Answer:
If your data set has one row per person, you can do:
library(dplyr)
creditcard %>%
arrange(desc(income)) %>%
slice_head(n=10) %>%
summarize(mean_cards = mean(majorcards, na.rm = TRUE))
Answer:
Perhaps something like this:
mean(creditcard$majorcards[which(creditcard$income %in% sort(creditcard$income, decreasing = TRUE)[1:10])])
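Because this answer selects every row whose income equals one of the 10 largest values, a tie at the cutoff can pull in more than 10 rows. Selecting positions with order() and head() keeps exactly 10. A minimal base R sketch on a hypothetical toy data frame, with the 10th and 11th highest incomes tied:

```r
# Hypothetical toy data: the 10th and 11th highest incomes are tied at 5.
toy <- data.frame(
  income     = c(20:12, 5, 5, 1),
  majorcards = c(rep(1L, 9), 0L, 0L, 1L)
)

# %in% approach: every row whose income matches one of the 10 largest values.
top_values <- sort(toy$income, decreasing = TRUE)[1:10]
rows_in    <- which(toy$income %in% top_values)
length(rows_in)                 # 11: both tied rows are matched
mean(toy$majorcards[rows_in])   # 9/11, about 0.818

# order() approach: exactly the first 10 positions after sorting by income.
rows_ord <- head(order(-toy$income), 10)
length(rows_ord)                # 10
mean(toy$majorcards[rows_ord])  # 0.9
```

On data without ties at the cutoff, both selections give the same rows and the same mean.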
Answer:
Using base R:
with(creditcard, mean(head(majorcards[order(-income)], 10)))
Or with data.table:
library(data.table)
setDT(creditcard)[order(-income), mean(head(majorcards, 10))]