I have a data frame containing user data:
age = c(45, 21, 32, 33, 46)
gender = c('female', 'female', 'male', 'male', 'female')
income = c('low', 'low', 'medium', 'high', 'low')
education = c('high', 'high', 'high', 'medium', 'medium')
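df = data.frame(age, gender, income, education)
From this, I would like to get a clean list containing the count and share of each attribute, which I would then append to a table/CSV. The list is meant to be easy to read for further use rather than to serve as a functional data frame. For a single attribute, something like this: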
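library(dplyr)
nusers = nrow(df)                   # total number of users
stat = count(df, gender)            # one row per gender, count in column `n`
stat['sot'] = stat['n'] / nusers    # share of total
write.table(stat, 'stat.csv', sep = ';', row.names = FALSE, append = TRUE)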
For several attributes, I need the following result:
gender,n,sot
female,10,0.526315789
male,9,0.473684211
income,Freq,sot
low,4,0.210526316
medium,10,0.526315789
high,5,0.263157895
education,Freq,sot
low,8,0.421052632
medium,1,0.052631579
high,10,0.526315789
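I tried (not being very proficient) to put this into a loop and failed. What would be the best way to solve this?
A uj5u.com user replied:
You can use sink() for this: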
library(dplyr)
# Count and share of total (sot) for each attribute
n_gen <- df %>% group_by(gender) %>% summarise(Freq = n(), sot = n()/nrow(df))
n_inc <- df %>% group_by(income) %>% summarise(Freq = n(), sot = n()/nrow(df))
n_edu <- df %>% group_by(education) %>% summarise(Freq = n(), sot = n()/nrow(df))
# Divert console output to the file, print each summary as CSV, then restore output
sink('export.csv')
write.csv(n_gen, row.names = FALSE)
write.csv(n_inc, row.names = FALSE)
write.csv(n_edu, row.names = FALSE)
sink()
You can shorten this by writing it as a for loop, which may be preferable depending on how many columns df has.
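The loop version could look roughly like the sketch below. It is an illustration rather than the answerer's code: export_counts is a hypothetical helper name, the column list and the tempdir() output path are assumptions, and !!sym(v) is used to turn each column name (a string) into a column reference for group_by().

```r
library(dplyr)

# Sample data from the question.
df <- data.frame(
  age       = c(45, 21, 32, 33, 46),
  gender    = c("female", "female", "male", "male", "female"),
  income    = c("low", "low", "medium", "high", "low"),
  education = c("high", "high", "high", "medium", "medium")
)

# Hypothetical helper: write one count/share block per column into `file`.
export_counts <- function(data, cols, file) {
  sink(file)        # divert printed output into `file`
  on.exit(sink())   # restore normal output, even if an error occurs
  for (v in cols) {
    block <- data %>%
      group_by(!!sym(v)) %>%
      summarise(Freq = n(), sot = n() / nrow(data))
    write.csv(block, row.names = FALSE)
  }
  invisible(file)
}

# Assumed output location; replace with 'export.csv' to match the answer.
out_file <- file.path(tempdir(), "export.csv")
export_counts(df, c("gender", "income", "education"), out_file)
```

Passing the column names as an argument keeps age out of the export without touching the loop body.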
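A uj5u.com user replied:
You should use 'count_()' instead of 'count()'. It is the same function, but it takes the column name as a string stored in a variable rather than as a bare column name.
library(dplyr)
# Names of the columns to summarise (an assumption; adjust to your data)
vars <- c('gender', 'income', 'education')
for (i in vars) {
  # count_() takes the column name as a string (deprecated since dplyr 0.7.0)
  stat = count_(df, i)
  write.csv(stat, row.names = TRUE, file = paste0('Title_', i, '.txt'))
}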
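Since count_() has been deprecated since dplyr 0.7.0 and may not be available in newer releases, the same per-variable export can be sketched with count() and !!sym(), which converts the string into a column reference. The vars vector, the tempdir() output directory, and the choice to drop row names are assumptions made for this illustration.

```r
library(dplyr)

# Sample data from the question.
df <- data.frame(
  age       = c(45, 21, 32, 33, 46),
  gender    = c("female", "female", "male", "male", "female"),
  income    = c("low", "low", "medium", "high", "low"),
  education = c("high", "high", "high", "medium", "medium")
)

vars    <- c("gender", "income", "education")  # assumed columns of interest
out_dir <- tempdir()                           # assumed output directory

for (i in vars) {
  # `i` holds the column name as a string; sym() + !! make it a column reference.
  stat <- count(df, !!sym(i))
  write.csv(stat,
            file = file.path(out_dir, paste0("Title_", i, ".txt")),
            row.names = FALSE)
}
```

This writes one file per variable, as in the answer above, rather than a single combined file.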
A uj5u.com user replied:
Here is a solution using the dplyr package.
In theory, the actual code can be reduced to a single line:
library(dplyr)
# ...
for(nom in names(df)) write.table(df %>% count(!!sym(nom)) %>% mutate(sot = n/sum(n)), 'stat.csv', sep = ';', row.names = FALSE, append = TRUE)
which produces the output file stat.csv:
"age";"n";"sot"
21;1;0.2
32;1;0.2
33;1;0.2
45;1;0.2
46;1;0.2
"gender";"n";"sot"
"female";3;0.6
"male";2;0.4
"income";"n";"sot"
"high";1;0.2
"low";3;0.6
"medium";1;0.2
"education";"n";"sot"
"high";3;0.6
"medium";2;0.4
But for clarity, I chose to break up the workflow and annotate it with comments:
library(dplyr)
# ...
# Code to generate `df`
# ...
# Create list to accumulate the summaries
results <- list()
# For each variable (by name) in `df`...
for(nom in names(df)) {
  # ...append to the list the results of summarizing by that variable.
  results <- c(
    results,
    # Wrap summary in a `list` to append properly:
    list(
      df %>%
        # Interpret the variable name as the variable itself, within the context
        # of `df`; and count the occurrences of each of the values that variable
        # takes on within `df`.
        count(!!sym(nom)) %>%
        # Sum up the counts to reconstruct the total amount; then divide the
        # count `n` by that total, to obtain `sot`.
        mutate(sot = n/sum(n))
    ) %>%
      # Name that summary after the variable.
      setNames(nm = nom)
  )
}
# View results
results
Given your sample df, reproduced here:
structure(
  list(
    age       = c(45      , 21      , 32      , 33      , 46      ),
    gender    = c("female", "female", "male"  , "male"  , "female"),
    income    = c("low"   , "low"   , "medium", "high"  , "low"   ),
    education = c("high"  , "high"  , "high"  , "medium", "medium")
  ),
  class = "data.frame",
  row.names = c(NA, -5L)
)
this workflow should produce the following list as results:
$age
age n sot
1 21 1 0.2
2 32 1 0.2
3 33 1 0.2
4 45 1 0.2
5 46 1 0.2
$gender
gender n sot
1 female 3 0.6
2 male 2 0.4
$income
income n sot
1 high 1 0.2
2 low 3 0.6
3 medium 1 0.2
$education
education n sot
1 high 3 0.6
2 medium 2 0.4
My solution covers every variable in df, but feel free to exclude variables like age by modifying the for-loop.
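As one possible way to exclude a variable, the loop can iterate over setdiff(names(df), "age") instead of names(df). The sketch below also assigns each summary with results[[nom]], which is a slightly shorter alternative to wrapping it in list() and setNames(); the name keep is an assumption introduced for illustration.

```r
library(dplyr)

# Sample data from the question.
df <- data.frame(
  age       = c(45, 21, 32, 33, 46),
  gender    = c("female", "female", "male", "male", "female"),
  income    = c("low", "low", "medium", "high", "low"),
  education = c("high", "high", "high", "medium", "medium")
)

# Every column except `age` (hypothetical variable name `keep`).
keep <- setdiff(names(df), "age")

results <- list()
for (nom in keep) {
  # Store each summary under the name of the variable it describes.
  results[[nom]] <- df %>%
    count(!!sym(nom)) %>%
    mutate(sot = n / sum(n))
}

names(results)
```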
To write all this as the file stat.csv, delimited by ; as in your code, simply finish with:
for(summr in results) {
  write.table(
    x = summr,
    file = 'stat.csv',
    sep = ';',
    row.names = FALSE,
    append = TRUE
  )
}
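Because append = TRUE adds to whatever stat.csv already contains, rerunning the loop duplicates the blocks, and write.table() warns that it is appending column names to the file. A minimal sketch of an alternative, assuming it is acceptable to overwrite the file on each run: open a single connection and write every summary to it. The tempdir() path is an assumption for illustration.

```r
library(dplyr)

# Sample data from the question.
df <- data.frame(
  age       = c(45, 21, 32, 33, 46),
  gender    = c("female", "female", "male", "male", "female"),
  income    = c("low", "low", "medium", "high", "low"),
  education = c("high", "high", "high", "medium", "medium")
)

# Build the named list of summaries, one per column.
results <- list()
for (nom in names(df)) {
  results[[nom]] <- df %>% count(!!sym(nom)) %>% mutate(sot = n / sum(n))
}

# Assumed output location; replace with 'stat.csv' to match the answer.
out_file <- file.path(tempdir(), "stat.csv")

# Opening with "w" truncates any existing file; successive writes to the
# open connection follow one another, so `append` is not needed.
con <- file(out_file, open = "w")
for (summr in results) {
  write.table(summr, file = con, sep = ";", row.names = FALSE)
}
close(con)

readLines(out_file)
```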
