Simple linear regression (2/5)- slope and intercept of linear regression model
Previous post: Simple linear regression (1/5) - correlation and covariance
In my previous post, I explained correlation and covariance. Now I’ll explain the slope (β1) and intercept (β0) of a linear regression model. The slope is calculated as β1 = r * Sy / Sx, where r is the correlation between x and y, and Sx and Sy are the standard deviations of x and y.
We already know how to calculate the correlation (r), so we only need the ratio of the standard deviation of y to the standard deviation of x. Let’s go back to the data, in which I investigated how yield changes with the amount of nitrogen fertilizer.
This is the simple linear regression model.
y = β0 + β1x + ε
β0 is the intercept (the value of y when x is 0), β1 is the slope, and ε is the error.
Why is correlation relevant to the slope?
First, let’s think about that!!
If x and y are the same, the scatter plot would look like this.
Now, I’d like to draw lines at the means of x and y (x̄ and ȳ). I also drew lines at mean ± standard deviation, as below.
Did you find an interesting thing?
The trend line goes through the means of x and y (30, 30), and it also goes through the points where the x̄ ± Sx and ȳ ± Sy lines cross (green arrow). In this case, the slope is 1 (and the correlation is also 1).
β1 = r * Sy/Sx = 1 * 15.8/15.8 = 1
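The arithmetic above can be reproduced in R. This is only a minimal sketch: it assumes the identical data set is x = y = 10, 20, 30, 40, 50, which matches the mean of 30 and the standard deviation of 15.8 quoted here, and the variable names are mine.

```r
# Assumed data: x and y are identical (mean 30, standard deviation about 15.8)
x <- c(10, 20, 30, 40, 50)
y <- x

r  <- cor(x, y)  # Pearson correlation
sx <- sd(x)      # sample standard deviation of x
sy <- sd(y)      # sample standard deviation of y

# Slope from the formula: correlation times the ratio Sy / Sx
slope_identical <- r * sy / sx
round(c(r = r, Sx = sx, Sy = sy, slope = slope_identical), 1)
```

Because Sy and Sx are equal here, the ratio is 1 and the slope is simply the correlation.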
However, if x and y are the same, we don’t need a regression model. A regression model describes the relationship between two different variables so that one can be predicted from the other.
Here is another case.
Now x and y are not the same. In the same way, let’s draw the lines. The trend line goes through the means of x and y (30, 18), but it does not go through the points where the x̄ ± Sx and ȳ ± Sy lines cross (yellow arrow).
When x and y are the same (correlation r = 1), the trend line goes through (x̄ - Sx, ȳ - Sy) and (x̄ + Sx, ȳ + Sy). However, when y is different from x, the trend line goes through (x̄ - Sx, ȳ - r*Sy) and (x̄ + Sx, ȳ + r*Sy).
In other words, when x increases by Sx, the predicted y increases by r*Sy. That’s why correlation is relevant to the slope of a linear regression.
β1 = r*Sy/Sx
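The claim that the trend line passes through (x̄ ± Sx, ȳ ± r*Sy) can be checked numerically. The sketch below uses the x and y values from the R check later in this post (means 30 and 18); the object names are my own.

```r
# Data for the second case (means: x = 30, y = 18)
x <- c(10, 20, 30, 40, 50)
y <- c(5, 9, 8, 18, 50)

fit <- lm(y ~ x)
r   <- cor(x, y)

# x positions one standard deviation below and above the mean of x
x_pts <- mean(x) + c(-1, 1) * sd(x)

# Height of the fitted trend line at those x positions
y_line <- predict(fit, newdata = data.frame(x = x_pts))

# Height expected from the rule: mean(y) -/+ r * Sy
y_rule <- mean(y) + c(-1, 1) * r * sd(y)

# Compare the two (unname drops the labels that predict() attaches)
all.equal(unname(y_line), y_rule)
```

If the two vectors agree, the rise over a run of Sx is r*Sy, which is exactly the slope formula.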
In this case, the slope is β1 = 0.845 * 18.5/15.8 = 0.99
Then we can also calculate the intercept, β0. Because the trend line goes through the means of x and y, the model equation can be rearranged as
β0 = ȳ - β1 * x̄
Therefore, β0 = 18.0 - 0.99 * 30.0 = -11.7
So, the model equation is y = -11.7 + 0.99x
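The hand calculation can also be written out in R, step by step. This is a small sketch using the same x and y values as the check further down; b1 and b0 are names I chose for β1 and β0.

```r
# Data for the second case
x <- c(10, 20, 30, 40, 50)
y <- c(5, 9, 8, 18, 50)

r  <- cor(x, y)                # correlation, about 0.845
b1 <- r * sd(y) / sd(x)        # slope: r * Sy / Sx
b0 <- mean(y) - b1 * mean(x)   # intercept: mean(y) - slope * mean(x)

round(c(r = r, Sx = sd(x), Sy = sd(y), b1 = b1, b0 = b0), 3)
```

Working with unrounded values avoids the small rounding differences that appear when 0.845, 18.5 and 15.8 are typed in by hand.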
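Let’s verify that our calculation is correct.
# x and y values for this example
x <- c(10, 20, 30, 40, 50)
y <- c(5, 9, 8, 18, 50)
dataA <- data.frame(x, y)
# Fit y on x; the Estimate column shows the intercept and slope
summary(lm(y ~ x, data = dataA))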
It’s the same. Now we fully understand how correlation relates to the slope and intercept in simple linear regression.
Let’s go back to our data!!
We can calculate β0 and β1. In the previous post, I already explained how to calculate the correlation; for this data it was 0.985. The standard deviations are Sx = 15.8 and Sy = 24.1, and the means are x̄ = 30.0 and ȳ = 134.0.
β1 = 0.985 * 24.1/15.8 = 1.50
β0 = 134.0 - 1.50 * 30.0 = 89.0
Therefore, the model equation is y = 89.0 + 1.5x
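Again, let’s check in R!!
# Nitrogen fertilizer amount (x) and yield (y)
x <- c(10, 20, 30, 40, 50)
y <- c(100, 120, 140, 150, 160)
dataA <- data.frame(x, y)
# Fit yield on nitrogen; compare the estimates with 89.0 and 1.5
summary(lm(y ~ x, data = dataA))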
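The comparison between the hand calculation and lm() can also be done programmatically rather than by eye. The helper below, slope_intercept, is a hypothetical function I made up for illustration; it is not part of R or of any package.

```r
# Hypothetical helper: slope and intercept from the correlation,
# the standard deviations and the means
slope_intercept <- function(x, y) {
  b1 <- cor(x, y) * sd(y) / sd(x)   # slope: r * Sy / Sx
  b0 <- mean(y) - b1 * mean(x)      # intercept: mean(y) - slope * mean(x)
  c(intercept = b0, slope = b1)
}

x <- c(10, 20, 30, 40, 50)        # nitrogen fertilizer amount
y <- c(100, 120, 140, 150, 160)   # yield

by_hand <- slope_intercept(x, y)
by_lm   <- unname(coef(lm(y ~ x)))  # coefficients estimated by lm()

by_hand
all.equal(unname(by_hand), by_lm)
```

coef() returns the intercept first and the slope second, so the helper returns its values in the same order.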
The next step is to understand the standard errors of the slope and intercept. How are the standard errors in the R output calculated?
The answer will be in the next post!!