R: Count by group in a loop and write the output to CSV

I have a data frame containing user data:

age = c(45, 21, 32, 33, 46)
gender = c('female', 'female', 'male', 'male', 'female')
income = c('low', 'low', 'medium', 'high', 'low')
education = c('high', 'high', 'high', 'medium', 'medium')

df = data.frame(age, gender, income, education)

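From this, I would like to get a clean list with the count and share of each attribute, which I then append to a table/CSV. The list is meant to be easy to read for further use rather than to be a functioning data frame. For a single attribute, I do this: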

library(dplyr)

nusers = nrow(users)
# Count of each gender value, then its share of all users
df = count(users, gender)
df['sot'] = df['n']/nusers
write.table(df, 'stat.csv', sep = ';', row.names = FALSE, append = TRUE)

For multiple attributes, I need the following result:

gender,n,sot
female,10,0.526315789
male,9,0.473684211
income,Freq,sot
low,4,0.210526316
medium,10,0.526315789
high,5,0.263157895
education,Freq,sot
low,8,0.421052632
medium,1,0.052631579
high,10,0.526315789

I tried (not being very proficient) to put this into a loop and failed. How would I best solve this?

Answer:

You can use sink() for this:

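library(dplyr)

# Count and share of each value, one summary per attribute
n_gen <- df %>% group_by(gender) %>% summarise(Freq = n(), sot = n()/nrow(df))
n_inc <- df %>% group_by(income) %>% summarise(Freq = n(), sot = n()/nrow(df))
n_edu <- df %>% group_by(education) %>% summarise(Freq = n(), sot = n()/nrow(df))

# Redirect console output into export.csv
sink('export.csv')

# Without a file argument, write.csv() prints to the console, which sink() captures
write.csv(n_gen, row.names = FALSE)
write.csv(n_inc, row.names = FALSE)
write.csv(n_edu, row.names = FALSE)

# Restore normal console output
sink()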

You can shorten this by writing it as a for loop. Depending on how many columns df has, that may be preferable.
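A minimal sketch of that loop, assuming the five-row sample df from the question. The vector cols is a name chosen here for illustration; !!sym(col) (sym() is re-exported by dplyr from rlang) turns each string into a column reference for group_by(), and sink() again captures what write.csv() prints:

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  age       = c(45, 21, 32, 33, 46),
  gender    = c("female", "female", "male", "male", "female"),
  income    = c("low", "low", "medium", "high", "low"),
  education = c("high", "high", "high", "medium", "medium")
)

# Hypothetical choice of columns to summarise (everything except age)
cols <- c("gender", "income", "education")

# Redirect console output into the file
sink("export.csv")
for (col in cols) {
  # !!sym(col) makes group_by() use the column whose name is stored in col
  tab <- df %>%
    group_by(!!sym(col)) %>%
    summarise(Freq = n(), sot = n() / nrow(df))
  # Printed to the console, so sink() writes it to export.csv
  write.csv(tab, row.names = FALSE)
}
# Restore normal console output
sink()
```

Listing the columns explicitly keeps age out of the export; looping over names(df) instead would include every column.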

Answer:

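You can use count_() instead of count(): it is the same function, but it takes the variable name as a string, which is what the loop variable i holds. In current dplyr versions count_() is deprecated, and count(users, .data[[i]]) is the equivalent:

library(dplyr)

# Names of the columns to count (looping over `class` would iterate over R's base function, not column names)
vars <- c("gender", "income", "education")

for (i in vars) {
   # .data[[i]] refers to the column named by the string i (older dplyr: count_(users, i))
   counts = count(users, .data[[i]])
   # One file per variable, e.g. Title_gender.txt
   write.csv(counts, row.names = FALSE, file = paste0('Title_', i, '.txt'))
}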

Answer:

Here is a solution using the dplyr package.

The actual code could, in principle, fit on a single line:

library(dplyr)

# ...

for(nom in names(df)) write.table(df %>% count(!!sym(nom)) %>% mutate(sot = n/sum(n)), 'stat.csv', sep = ';', row.names = FALSE, append = TRUE)

This produces the output file stat.csv:

"age";"n";"sot"
21;1;0.2
32;1;0.2
33;1;0.2
45;1;0.2
46;1;0.2
"gender";"n";"sot"
"female";3;0.6
"male";2;0.4
"income";"n";"sot"
"high";1;0.2
"low";3;0.6
"medium";1;0.2
"education";"n";"sot"
"high";3;0.6
"medium";2;0.4
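The one-liner relies on count(!!sym(nom)): sym() turns the string stored in nom into a symbol, and !! injects it into the call, so it behaves as if the column name had been typed directly. A small sketch checking that equivalence on part of the sample data (the names direct, injected and shares are chosen for illustration):

```r
library(dplyr)

# Two columns of the sample data from the question
df <- data.frame(
  gender = c("female", "female", "male", "male", "female"),
  income = c("low", "low", "medium", "high", "low")
)

nom <- "gender"  # column name held as a string, as in the loop

direct   <- count(df, gender)      # column name typed directly
injected <- count(df, !!sym(nom))  # same column, taken from the string

# Both calls build the same table: one row per value, with its count n
print(identical(direct, injected))

# Share of each value, computed as in the one-liner
shares <- injected %>% mutate(sot = n / sum(n))
print(shares)
```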

For clarity, however, I chose to break the workflow into steps and annotate it:

library(dplyr)


# ...
# Code to generate `df`
# ...


# Create list to accumulate the summaries
results <- list()

# For each variable (by name) in `df`...
for(nom in names(df)) {
  # ...append to the list the results of summarizing by that variable.
  results <- c(
    results,
    # Wrap summary in a `list` to append properly:
    list(
      df %>%
        # Interpret the variable name as the variable itself, within the context
        # of `df`; and count the occurrences of each of the values that variable
        # takes on within `df`.
        count(!!sym(nom)) %>%
        # Sum up the counts to reconstruct the total amount; then divide the
        # count `n` by that total, to obtain `sot`.
        mutate(sot = n/sum(n))
    ) %>%
      # Name that summary after the variable.
      setNames(nm = nom)
  )
}


# View results
results

Given your sample df, reproduced here:

structure(
  list(
    age       = c(45      , 21      , 32      , 33      , 46      ),
    gender    = c("female", "female", "male"  , "male"  , "female"),
    income    = c("low"   , "low"   , "medium", "high"  , "low"   ),
    education = c("high"  , "high"  , "high"  , "medium", "medium")
  ),
  class = "data.frame",
  row.names = c(NA, -5L)
)

this workflow should produce the following contents in the list results:

$age
  age n sot
1  21 1 0.2
2  32 1 0.2
3  33 1 0.2
4  45 1 0.2
5  46 1 0.2

$gender
  gender n sot
1 female 3 0.6
2   male 2 0.4

$income
  income n sot
1   high 1 0.2
2    low 3 0.6
3 medium 1 0.2

$education
  education n sot
1      high 3 0.6
2    medium 2 0.4

My solution covers every variable in df, but feel free to exclude variables like age by modifying the for-loop.
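For instance, a sketch that drops age by looping over setdiff(names(df), "age") instead of names(df), again assuming the sample df. It stores each summary with results[[nom]], which yields the same kind of named list as the c()/list()/setNames() construction above:

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  age       = c(45, 21, 32, 33, 46),
  gender    = c("female", "female", "male", "male", "female"),
  income    = c("low", "low", "medium", "high", "low"),
  education = c("high", "high", "high", "medium", "medium")
)

results <- list()

# Loop over every column except age
for (nom in setdiff(names(df), "age")) {
  # Count each value of the column, then add its share of the total
  results[[nom]] <- df %>%
    count(!!sym(nom)) %>%
    mutate(sot = n / sum(n))
}

# Names of the stored summaries
print(names(results))
```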

To write all this as the file stat.csv, delimited by ; as in your code, simply finish with:

for(summr in results) {
  write.table(
    x = summr, 
    file = 'stat.csv',
    sep = ';',
    row.names = FALSE,
    append = TRUE
  )
}
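One caveat: with append = TRUE, write.table() warns "appending column names to file" on each call, and rerunning the loop keeps appending to a stat.csv left over from an earlier run. A sketch that avoids both, assuming the sample df and rebuilding results the same way: open the file once as a connection (mode "w" truncates any old file), write each summary to it, then close it.

```r
library(dplyr)

# Sample data from the question
df <- data.frame(
  age       = c(45, 21, 32, 33, 46),
  gender    = c("female", "female", "male", "male", "female"),
  income    = c("low", "low", "medium", "high", "low"),
  education = c("high", "high", "high", "medium", "medium")
)

# Same per-variable summaries as above
results <- list()
for (nom in names(df)) {
  results[[nom]] <- df %>% count(!!sym(nom)) %>% mutate(sot = n / sum(n))
}

# Open the file once; "w" starts from an empty file on every run
con <- file("stat.csv", open = "w")
for (summr in results) {
  # An already open connection stays open, so each table follows the previous one
  # without append = TRUE
  write.table(summr, file = con, sep = ";", row.names = FALSE)
}
close(con)
```

The resulting file has the same layout as the stat.csv shown earlier in this answer.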

Source: https://www.uj5u.com/net/312506.html
