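I am writing a function, Prop.Histogram(), that plots data as a histogram showing proportions, with a normal distribution curve added. Adding the curve was hard for me to implement, but I managed it (see the code below)!
Note: I personally prefer to use the pipe operator %>% from the magrittr package in my code. Since not everyone is familiar with this operator and/or the package (or likes to use it), I also provide the same code without magrittr below.
Code using magrittr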
Prop.Histogram <- function(data,
                           xlim_min, xlim_max, x_BreakSize,
                           ylim_max, y_steps) {
  # Load packages
  library(magrittr)
  # Make histogram of data without y-axis
  hist(data, freq = FALSE, ylab = "Proportion",
       xlim = c(xlim_min, xlim_max),
       breaks = seq(from = xlim_min, to = xlim_max, by = x_BreakSize),
       ylim = c(0, ylim_max %>% divide_by(., x_BreakSize)), yaxt = "n")
  # I divided ylim_max by x_BreakSize, as I want ylim_max to be equal to the max proportion shown on the y_axis (and not to the max density)
  # Add y-axis that shows proportion and not density
  axis(side = 2,
       at = seq(from = 0, to = ylim_max %>% divide_by(., x_BreakSize), by = y_steps %>% divide_by(., x_BreakSize)),
       labels = seq(from = 0, to = ylim_max, by = y_steps))
  box()
  # Add curve to histogram
  curve(dnorm(x, mean = mean(data), sd = sd(data)), lwd = 5, add = TRUE, yaxt = "n")
}
The same code without magrittr
Prop.Histogram <- function(data,
                           xlim_min, xlim_max, x_BreakSize,
                           ylim_max, y_steps) {
  # No packages are needed in this version
  # Make histogram of data without y-axis
  hist(data, freq = FALSE, ylab = "Proportion",
       xlim = c(xlim_min, xlim_max),
       breaks = seq(from = xlim_min, to = xlim_max, by = x_BreakSize),
       ylim = c(0, ylim_max/x_BreakSize), yaxt = "n")
  # I divided ylim_max by x_BreakSize, as I want ylim_max to be equal to the max proportion shown on the y_axis (and not to the max density)
  # Add y-axis that shows proportion and not density
  axis(side = 2,
       at = seq(from = 0, to = ylim_max/x_BreakSize, by = y_steps/x_BreakSize),
       labels = seq(from = 0, to = ylim_max, by = y_steps))
  box()
  # Add curve to histogram
  curve(dnorm(x, mean = mean(data), sd = sd(data)), lwd = 5, add = TRUE, yaxt = "n")
}
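This code does exactly what I want: it plots proportions and adds a normal distribution curve to the plot. However, I really struggle to understand why adding the curve actually works.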
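The rescaling used in the function relies on the fact that, with equal-width bins, a bar's proportion equals its density times the bin width. Here is a minimal sketch that checks this numerically. The seed, the variable names (bin_width, h, proportions) and the data-driven break range are assumptions made for illustration and are not part of the original function.

```r
set.seed(42)                                   # assumed seed, for reproducibility only
data <- rnorm(724, mean = 84, sd = 33)
bin_width <- 10                                # plays the role of x_BreakSize

# Assumption: derive the break range from the data so that every value is covered
lower <- floor(min(data) / bin_width) * bin_width
upper <- ceiling(max(data) / bin_width) * bin_width

# Compute the histogram without drawing it
h <- hist(data, breaks = seq(lower, upper, by = bin_width), plot = FALSE)

# Proportion of observations in each bin
proportions <- h$counts / sum(h$counts)

# Density times bin width gives the same numbers
print(all.equal(h$density * bin_width, proportions))
```

This is why dividing ylim_max and y_steps by x_BreakSize places the proportion labels at the right heights on a density-scaled axis.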
Main question (1): I have to put x as the first argument in dnorm(), and even though I have not defined x, it works! So my first and main question is: what is x, what does it do, and why does it work in my function?
Second question (2): My second question is whether it is possible (and, if so, how) to use magrittr pipe-operators (%>%) in the line of code that adds the curve to the plot. (Even if using operators is not the best way to do so in this case, I am still interested in the answer as I am eager to learn!)
First of all, for those who want to try out my code: here is some data that is representative of data that I want to plot:
data <- rnorm(724, mean = 84, sd = 33)
Prop.Histogram(data,
               xlim_min = -50, xlim_max = 200, x_BreakSize = 10,
               ylim_max = 0.15, y_steps = 0.05)
Main question (1): role of x in dnorm()/curve()
I started by using data instead of x as the first argument of dnorm(), but this didn't work as it resulted in the following error message:
Error in curve(dnorm(data, mean = mean(data), sd = sd(data)), lwd = 5, :
'expr' must be a function, or a call or an expression containing 'x'
But then, when I take dnorm(data, mean = mean(data), sd = sd(data)) and run it on its own (not as an argument of curve()), it gives me 724 values (I don't know what they mean, but at least it's not an error message). That is weird, since using data as the first argument when dnorm() is part of curve() in my function results in an error message, as we saw previously.
Then, when I change data for x and run dnorm(x, mean = mean(data), sd = sd(data)) (again not as an argument of curve()), it gives me another error message:
Error in dnorm(x, mean = mean(data), sd = sd(data)) :
object 'x' not found
This I can understand, as I've not defined x anywhere in my code. But that raises the question: why do I not get this same error message when I run my (working) function?
In short, I observed that x must be the first argument in dnorm() when dnorm() is used as an argument in curve(), but x cannot be used as the first argument when dnorm() is used individually. Conclusion: I am lost.
Of course, when I am lost in R, I always look at the help page of R. The help page of dnorm() states that x is a vector of quantiles... that's it. I know those words individually, but have no idea what it means in my code (as I've not defined x, so what vector or what quantiles is the R help page talking about?).
Second question (2): using magrittr in the code
I tried to write curve(dnorm(x, mean = mean(data), sd = sd(data)), lwd = 5, add = TRUE, yaxt = "n") using magrittr, but it did not work. Here are some of the things I tried:
data %>% dnorm(x, mean = mean(.), sd = sd(.)) %>% curve(., lwd = 5, add = TRUE, yaxt = "n")
data %>% dnorm(x, mean = mean(.), sd = sd(.)) %>% curve(lwd = 5, add = TRUE, yaxt = "n")
dnorm(x, mean = mean(data), sd = sd(data)) %>% curve(., lwd = 5, add = TRUE, yaxt = "n")
They all result in the same error message:
Error in dnorm(x, mean = mean(data), sd = sd(data)) :
object 'x' not found
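I would like to know whether the magrittr operator %>% can be used in this case (even if it is not the best choice).
P.S. This is my first post, so feel free to give feedback or ask me for more information if needed. Thanks in advance!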
Answer (from a uj5u.com user):
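The curve() function uses non-standard evaluation. x is just a placeholder in the expression that it will plot. See ?curve for details.
In fact, x does not need to be the first argument; it can appear anywhere in the expression. But you want it attached to the first argument of dnorm(), so putting it first works well. If you wanted to see the effect of the sd argument on the density at 0, you could use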
curve(dnorm(0, sd = x))
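When you put it first, the dummy x object that curve() looks for is bound to the first argument of dnorm(), which also happens to be named x, as you can see on its help page. It is the location at which you want to compute the density.
When you call dnorm(data, mean = mean(data), sd = sd(data)), you are asking it to compute the density of a normal distribution with mean mean(data) and standard deviation sd(data) at each location in data. That is why you get a long vector back.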
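To make the placeholder idea concrete, here is a minimal sketch that reproduces by hand roughly what curve() does: it builds a grid of x values and evaluates the expression with x bound to that grid. The seed and the names x_grid, y_manual and pts are assumptions for illustration. The comparison uses the list of points that curve() returns invisibly.

```r
set.seed(1)                                    # assumed seed, for reproducibility only
data <- rnorm(724, mean = 84, sd = 33)

# Step 1: a grid of x values (101 points is the default n of curve())
x_grid <- seq(-50, 200, length.out = 101)

# Step 2: evaluate the expression with x replaced by the grid
y_manual <- dnorm(x_grid, mean = mean(data), sd = sd(data))

# curve() does both steps itself and invisibly returns the points it drew
pts <- curve(dnorm(x, mean = mean(data), sd = sd(data)),
             from = -50, to = 200, n = 101)

print(all.equal(pts$x, x_grid))
print(all.equal(pts$y, y_manual))
```

Because curve() captures the expression unevaluated and supplies x itself, no x has to exist in your workspace. Calling dnorm(x, ...) directly evaluates x immediately, which is why that call fails with "object 'x' not found".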
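For the second question: magrittr passes the result of the left-hand side of the pipe to the function call on the right-hand side. There are some complicated rules about where that result ends up:
If you don't use . in the function call, the value is used as the first argument.
If you do use ., the value appears there, but it may also appear as the first argument. I forget the exact rules; see ?pipe for details.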
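The placement rules can be checked with a few small calls. This is a sketch based on my understanding of magrittr's documented behaviour, assuming the magrittr package is installed; the values and the names r1 to r4 are made up for illustration.

```r
library(magrittr)

# No dot: the left-hand side becomes the first argument, i.e. sqrt(4)
r1 <- 4 %>% sqrt()

# Dot used directly as an argument: the value goes only there,
# i.e. seq(1, 10, by = 2)
r2 <- 2 %>% seq(1, 10, by = .)

# Dot used only inside a nested call: the value is STILL inserted first,
# i.e. rep(c(1, 2, 3), times = length(c(1, 2, 3)))
r3 <- c(1, 2, 3) %>% rep(times = length(.))

# Braces: nothing is inserted automatically, the dot is used only where written
r4 <- c(1, 2, 3) %>% { length(.) + 1 }

print(list(r1 = r1, r2 = r2, r3 = r3, r4 = r4))
```

The third case matches the failed attempts in the question: in data %>% dnorm(x, mean = mean(.), sd = sd(.)), the dots appear only inside nested calls, so data is also inserted as the first argument and x is then evaluated as an ordinary, undefined object.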
So to get what you want, you can do this:
data %>% {curve(dnorm(x, mean = mean(.), sd = sd(.)), lwd = 5, add = TRUE, yaxt = "n")}
I had to use braces so that magrittr handles . correctly.
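If you would rather avoid non-standard evaluation altogether, a different technique is to compute the curve's points yourself and draw them with lines(). This is a swapped-in alternative rather than a way of piping into curve(). The seed and the names x_grid and y_vals are assumptions for illustration.

```r
library(magrittr)

set.seed(1)                                    # assumed seed, for reproducibility only
data <- rnorm(724, mean = 84, sd = 33)

# Histogram on the density scale
hist(data, freq = FALSE)

# The grid is an ordinary vector, so it pipes into dnorm() as its first argument
x_grid <- seq(min(data), max(data), length.out = 101)
y_vals <- x_grid %>% dnorm(mean = mean(data), sd = sd(data))

# Draw the curve on top of the existing histogram
lines(x_grid, y_vals, lwd = 5)
```

Here the pipe works without braces because the piped value is a real object, not a placeholder that only curve() knows how to fill in.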
