I have a ggplot of the logarithmic relationship between the variables growth rate and tenure:
pdata %>%
  ggplot(aes(x = log(TENURE), y = GROWTH_RATE)) +
  geom_point(color = 'gray', alpha = 0.3) +
  geom_smooth(method = 'lm', formula = 'y ~ x')

But geom_smooth seems to fit better with:
pdata %>%
  ggplot(aes(x = log(TENURE), y = GROWTH_RATE)) +
  geom_point(color = 'gray', alpha = 0.3) +
  geom_smooth(method = 'lm', formula = 'y ~ log(x)')

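Which plot is correct? Which plot shows a smoothed fit line based on a linear model with the formula y ~ log(TENURE)?
Answer from a uj5u.com user:
It looks like your underlying growth rate varies with the log of the log of tenure. Here is some example data with a "log log" relationship: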
library(tidyverse)  # provides tibble(), %>% and ggplot2
tibble(TENURE = runif(1E4, min = 7, max = 1000),
       GROWTH_RATE = rnorm(1E4, mean = 1, sd = 0.1) * log(log(TENURE))) %>%
  ggplot(aes(log(TENURE), GROWTH_RATE)) +
  geom_point(alpha = 0.3, color = "gray50") +
  geom_smooth(method = 'lm', formula = 'y ~ x')
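Plotting growth against the log of tenure gives a loose fit like your first plot. Note that lm uses the transformed values from your x and y mappings, so it is using log(TENURE) for x. (See the bottom to confirm this.)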

But modeling against the log of the log of tenure fits better. Here, when we use y ~ log(x), it means y ~ log(log(TENURE)), because x is mapped globally in ggplot(aes(...)) to the log of TENURE.
... geom_smooth(method = 'lm', formula = 'y ~ log(x)')
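
To put a number on "fits better", the two formulas can be fitted directly with lm() outside of ggplot. This is only a sketch on simulated data like the example above; the seed, the data frame name sim, and the use of R-squared as the comparison are assumptions made for illustration, not part of the original question.

```r
set.seed(42)  # assumed seed, only so the sketch is reproducible

n <- 1e4
TENURE <- runif(n, min = 7, max = 1000)
sim <- data.frame(
  TENURE = TENURE,
  # same "log log" relationship as the simulated example above
  GROWTH_RATE = rnorm(n, mean = 1, sd = 0.1) * log(log(TENURE))
)

# With x mapped to log(TENURE), the two geom_smooth formulas correspond to:
fit_log    <- lm(GROWTH_RATE ~ log(TENURE), data = sim)       # formula = 'y ~ x'
fit_loglog <- lm(GROWTH_RATE ~ log(log(TENURE)), data = sim)  # formula = 'y ~ log(x)'

# Compare how much variance each model explains
r2 <- c(log    = summary(fit_log)$r.squared,
        loglog = summary(fit_loglog)$r.squared)
print(r2)
```

On data generated this way, the log(log(TENURE)) model should explain somewhat more variance, which matches what the second smooth shows visually. On real data the comparison could go either way.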

If instead the original relationship had been a good fit for y ~ log(TENURE), as with the differently generated data here, your first lm would have matched better:
tibble(TENURE = runif(1E4, min = 7, max = 1000),
       GROWTH_RATE = rnorm(1E4, mean = 1, sd = 0.1) * log(TENURE)) %>%
  ggplot(aes(log(TENURE), GROWTH_RATE)) +
  geom_point(alpha = 0.3, color = "gray50") +
  geom_smooth(method = 'lm', formula = 'y ~ x')
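
One way to confirm that x in the geom_smooth formula refers to the mapped value log(TENURE), not to raw TENURE, is to pull the fitted line out of the plot and compare it with a hand-fitted lm. The sketch below uses ggplot2's layer_data() for that; the simulated data, the seed, and the variable names are assumptions for illustration.

```r
library(ggplot2)

set.seed(1)  # assumed seed, only so the sketch is reproducible
sim <- data.frame(TENURE = runif(1000, min = 7, max = 1000))
sim$GROWTH_RATE <- rnorm(1000, mean = 1, sd = 0.1) * log(sim$TENURE)

p <- ggplot(sim, aes(log(TENURE), GROWTH_RATE)) +
  geom_smooth(method = "lm", formula = y ~ x)

# layer_data() returns what ggplot2 computed for the layer:
# here, the fitted line evaluated on a grid of x values
smooth <- layer_data(p, 1)

# Fit the same model by hand and predict on that grid,
# treating the layer's x as log(TENURE)
fit <- lm(GROWTH_RATE ~ log(TENURE), data = sim)
by_hand <- predict(fit, newdata = data.frame(TENURE = exp(smooth$x)))

# Do the hand-made predictions agree with the smooth line?
same_line <- isTRUE(all.equal(unname(by_hand), smooth$y))
print(same_line)
```

If the two sets of fitted values agree, then formula = y ~ x inside geom_smooth is equivalent to lm(GROWTH_RATE ~ log(TENURE)), and y ~ log(x) is equivalent to regressing on log(log(TENURE)).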

