Thursday 15 February 2018

Regression lines in ggplot

Based on a question at today's code-club, here's how to draw a scatter plot with an overlaid regression line in ggplot2:

Set up some data and load the necessary packages:

library(tibble)
library(ggplot2)

set.seed(1)

x <- rnorm(30)
y <- rnorm(30, x)
my_df <- tibble(x = x, y = y)

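Here, x is sampled from a standard normal distribution, and each y is sampled from a normal distribution whose mean is the corresponding x (with standard deviation 1). So y = x + noise, and the expected slope of the regression line of y on x is 1 (with an expected intercept of 0).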
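A quick sanity check of that expected slope, as a sketch: simulate the same model with many more points and fit it. The sample size n = 1e5 and the names x_big, y_big and big_fit are arbitrary illustrative choices. With that many points the fitted intercept should sit very close to 0 and the slope very close to 1.

```r
# Simulate the same model as above, but with many more points,
# so the fitted coefficients sit close to their true values.
set.seed(1)
n <- 1e5                          # arbitrary, illustrative sample size
x_big <- rnorm(n)                 # x ~ N(0, 1)
y_big <- rnorm(n, mean = x_big)   # y | x ~ N(x, 1), i.e. y = x + noise

big_fit <- lm(y_big ~ x_big)
coef(big_fit)                     # intercept near 0, slope near 1
```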

A scatter plot of y against x is as follows:

gg <- ggplot(my_df, aes(x = x, y = y)) +
  geom_point()

gg




If you're happy for ggplot2 to fit the regression line for you, add geom_smooth() with method = "lm" (se = FALSE hides the confidence band):

gg + geom_smooth(method = "lm", se = FALSE)
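To confirm that geom_smooth(method = "lm") draws the same line as an explicit lm() fit, here is a sketch (the names p, smooth_df and lm_pred are our own). It uses layer_data(), which returns the data ggplot2 computed for a layer; the smoother's y values should match predict() on the same grid of x values. It also passes formula = y ~ x explicitly, which stops recent ggplot2 versions from printing a message about the formula they chose.

```r
library(tibble)
library(ggplot2)

set.seed(1)
x <- rnorm(30)
y <- rnorm(30, x)
my_df <- tibble(x = x, y = y)

# Stating the formula explicitly documents the model being fitted.
p <- ggplot(my_df, aes(x = x, y = y)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)

# Layer 2 is the smoother, evaluated on a grid of x values.
smooth_df <- layer_data(p, 2)

# Predictions from an ordinary lm() fit at the same x values.
fit <- lm(y ~ x, data = my_df)
lm_pred <- predict(fit, newdata = data.frame(x = smooth_df$x))

# The two lines agree up to floating-point error.
all.equal(unname(lm_pred), smooth_df$y)
```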




Otherwise, you can compute the intercept and slope of any line you want to include, and then overlay it using geom_abline().

To do this, we first calculate the coefficients for the regression of y on x:

fit <- lm(y ~ x, data = my_df)
coef(fit)
# (Intercept)         x
#    0.129...   1.04...

...and then overlay the corresponding line (we've used col = "#3366FF" to match the default line colour of geom_smooth()):

gg +
  geom_abline(
    intercept = coef(fit)["(Intercept)"],
    slope = coef(fit)["x"],
    col = "#3366FF"
  )
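For a single predictor, the coefficients that coef(fit) reports have a closed form (ordinary least squares): slope = cov(x, y) / var(x) and intercept = mean(y) - slope * mean(x). The base-R sketch below recomputes them by hand as a cross-check on lm(), not as a replacement for it.

```r
set.seed(1)
x <- rnorm(30)
y <- rnorm(30, x)

# Ordinary least squares with one predictor:
#   slope     = cov(x, y) / var(x)
#   intercept = mean(y) - slope * mean(x)
slope     <- cov(x, y) / var(x)
intercept <- mean(y) - slope * mean(x)

fit <- lm(y ~ x)
c(intercept = intercept, slope = slope)  # hand-computed
coef(fit)                                # from lm(); should match
```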



You can supply any slope and intercept you like to geom_abline() in the same way.
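geom_abline() also accepts intercept and slope as aesthetics, so several lines can be drawn from a small data frame. As an illustration, the sketch below draws the fitted line alongside the true line y = x (intercept 0, slope 1) from the simulation; the lines_df data frame and its column names are our own choices. geom_abline() does not inherit the plot's x/y mapping by default, so lines_df needs no x or y columns.

```r
library(tibble)
library(ggplot2)

set.seed(1)
x <- rnorm(30)
y <- rnorm(30, x)
my_df <- tibble(x = x, y = y)
fit <- lm(y ~ x, data = my_df)

# One row per line to draw; 'label' is mapped to colour to get a legend.
lines_df <- tibble(
  label     = c("fitted", "true (y = x)"),
  intercept = c(coef(fit)[["(Intercept)"]], 0),
  slope     = c(coef(fit)[["x"]], 1)
)

p <- ggplot(my_df, aes(x = x, y = y)) +
  geom_point() +
  geom_abline(
    data = lines_df,
    aes(intercept = intercept, slope = slope, colour = label)
  )

p
```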
