What is ANCOVA (3/3)? The common slope and adjusted mean
Previous post
□ What is ANCOVA (1/3)? The basic concept
□ What is ANCOVA (2/3)? How to interpret Parameter Estimates
In the previous posts, I explained the basic concept of ANCOVA and how to interpret the statistical results. Now I will discuss the most important concept, one that is rarely explained.
The statistical program provided the following model in the previous posts.
Control: y= 9.53 + 0.0558 x
Fast: y= 6.39 + 0.0558 x
Slow: y= 13.10 + 0.0558 x
Then, if we plug the mean height, 49.9, into the model, we can estimate the yield for each fertilizer type.
Control: y= 9.53 + 0.0558 * 49.9 ≈ 12.31
Fast: y= 6.39 + 0.0558 * 49.9 ≈ 9.17
Slow: y= 13.10 + 0.0558 * 49.9 ≈ 15.89
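The arithmetic above can be checked with a short Python sketch. It uses only the rounded coefficients quoted in this post; the names `intercepts`, `common_slope`, `mean_height`, and `predict_yield` are chosen here for illustration and do not come from any statistical program.

```python
# Coefficients as reported in the post (rounded to the digits shown there).
intercepts = {"Control": 9.53, "Fast": 6.39, "Slow": 13.10}
common_slope = 0.0558
mean_height = 49.9  # overall mean of the covariate (crop height)


def predict_yield(group, height):
    """Predicted yield for one fertilizer group at a given crop height."""
    return intercepts[group] + common_slope * height


# Evaluate every group's line at the overall mean height.
predictions = {group: predict_yield(group, mean_height) for group in intercepts}

for group, value in predictions.items():
    print(f"{group}: {value:.3f}")
```

Because the coefficients are rounded, the results agree with the values quoted above only to within about 0.01.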
However, there is a slight difference between the actual mean yields I collected and the values provided by the statistical program. I will now explain why this difference occurs.
In the output above, the values inside the blue box change depending on which statistical program we use or which level is chosen as the reference. However, even though the individual values differ, their sum remains the same. For example, in the dataset below, SPSS and JMP report different intercepts, but the values still add up to the same estimated yield for each fertilizer type.
However, one value stays the same no matter which statistical program is used: the common slope, 0.0558.
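To illustrate why the reference level does not matter, here is a minimal Python sketch. It assumes simple reference coding (one base intercept plus an offset for each group), which is an assumption made for illustration and not a description of how any particular program parameterizes the model. The helper names `parameterize` and `rebuild` are hypothetical.

```python
# Intercepts of the three fitted lines, as reported in the post.
lines = {"Control": 9.53, "Fast": 6.39, "Slow": 13.10}
common_slope = 0.0558  # does not depend on the reference level


def parameterize(reference):
    """Rewrite the three intercepts as a base intercept plus group offsets."""
    base = lines[reference]
    offsets = {group: value - base for group, value in lines.items()}
    return base, offsets


def rebuild(base, offsets):
    """Recover each group's intercept as base + offset."""
    return {group: base + offset for group, offset in offsets.items()}


for reference in lines:
    base, offsets = parameterize(reference)
    rebuilt = rebuild(base, offsets)
    offsets_shown = {g: round(o, 2) for g, o in offsets.items()}
    rebuilt_shown = {g: round(v, 2) for g, v in rebuilt.items()}
    print(f"reference={reference}: base={base}, offsets={offsets_shown}")
    print(f"  intercepts rebuilt from base + offset: {rebuilt_shown}")
```

Whichever group is the reference, the base and the offsets change, but base + offset returns the same three intercepts, so the three fitted lines are unchanged.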
How was this slope calculated? This is the most important concept of ANCOVA, but it is not easy to find information about it. Let’s go back to the beginning. In the post What is ANCOVA (1/3)? The basic concept, I performed data partitioning to calculate the effect of fertilizer and the residual on yield. Similarly, let’s perform data partitioning to calculate the effect of fertilizer and the residual on crop height.
After fitting yield against height for each fertilizer treatment, a researcher realized that height should not be ignored and decided to include it as a covariate. In this case, the model needs one slope (called the common slope) instead of three. How can we obtain a single slope? What if the three lines had no intercept? If all lines pass through “zero”, only one slope can exist. The way to make all lines pass through “zero” is to calculate residuals: subtracting each group's own mean from its height and yield values centers every group at zero.
We already calculated the residuals of yield and height. Now let’s plot them, with the height residuals on the x-axis and the yield residuals on the y-axis, and fit a line.
Now, we can see the slope of the line is 0.0558. This value is the same as what statistical programs provided.
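The residual approach can be sketched in Python with NumPy. The data below are hypothetical (they are NOT the dataset from this post, so the slope will not be 0.0558); the point is only to show that a line through the origin fitted to the within-group residuals gives the same slope as a least-squares model with one intercept per group and a single shared slope. All variable and function names are illustrative.

```python
import numpy as np

# Hypothetical data (NOT the post's dataset): 3 fertilizer groups x 4 plots.
groups = np.array(["Control"] * 4 + ["Fast"] * 4 + ["Slow"] * 4)
height = np.array([44.0, 46.0, 48.0, 50.0,
                   52.0, 54.0, 55.0, 57.0,
                   47.0, 48.0, 50.0, 51.0])
yield_ = np.array([11.8, 12.0, 12.3, 12.4,
                   9.1, 9.3, 9.5, 9.7,
                   15.6, 15.8, 15.9, 16.1])


def within_group_residuals(values, labels):
    """Subtract each group's own mean, so every group is centered at zero."""
    resid = values.astype(float)
    for g in np.unique(labels):
        mask = labels == g
        resid[mask] -= values[mask].mean()
    return resid


rx = within_group_residuals(height, groups)   # height residuals
ry = within_group_residuals(yield_, groups)   # yield residuals

# Slope of a line through the origin fitted to the residuals.
slope_from_residuals = (rx * ry).sum() / (rx ** 2).sum()

# Same model fitted directly: one intercept column per group + one shared slope.
dummies = (groups[:, None] == np.unique(groups)[None, :]).astype(float)
design = np.column_stack([dummies, height])
coef, *_ = np.linalg.lstsq(design, yield_, rcond=None)
slope_from_model = coef[-1]

print("slope from residuals:", slope_from_residuals)
print("slope from full model:", slope_from_model)
```

The two printed slopes agree up to floating-point error, which is why the slope of the residual plot matches the common slope reported by statistical programs.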
Once the common slope has been calculated, yield can be adjusted according to the covariate.
That is, if height is a factor affecting yield, we need to adjust for that noise. This is ANCOVA.
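The mean that is adjusted by the covariate is called the “adjusted mean.” For fertilizer group i, the adjusted mean is calculated as ȳi. – β̂(x̄i. – x̄..), where ȳi. is the observed mean yield of the group. The symbol β̂ indicates “the common slope”, and (x̄i. – x̄..), the group's mean height minus the overall mean height, indicates the effect of the covariate.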
The effect of the covariate (x̄i. – x̄..) was already calculated:
Control: -3.3
Fast: 4.3
Slow: -1.0
and the common slope β̂ was 0.0558.
Then, the adjusted mean of yield is:
Control= 12.13 – 0.0558 * (-3.3) = 12.31
Fast= 9.41 – 0.0558 * (4.3) = 9.17
Slow= 15.83 – 0.0558 * (-1.0) = 15.89
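The same calculation as a minimal Python sketch, using only the numbers quoted in this post (the dictionary and variable names are illustrative):

```python
# Observed mean yield per fertilizer group, as quoted in the post.
raw_means = {"Control": 12.13, "Fast": 9.41, "Slow": 15.83}
# Effect of the covariate: group mean height minus overall mean height.
covariate_effect = {"Control": -3.3, "Fast": 4.3, "Slow": -1.0}
common_slope = 0.0558

# adjusted mean = observed mean - common slope * covariate effect
adjusted_means = {
    group: raw_means[group] - common_slope * covariate_effect[group]
    for group in raw_means
}

for group, value in adjusted_means.items():
    print(f"{group}: {value:.2f}")
# Control: 12.31
# Fast: 9.17
# Slow: 15.89
```

Groups that were shorter than average (Control, Slow) are adjusted upward, and the taller group (Fast) is adjusted downward, so all three are compared at the same height.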
Statistical programs report the ‘adjusted mean’ in ANCOVA. This is the reason for the difference between the actual means I collected and the values provided by statistical programs. The difference is the effect of the covariate, β̂(x̄i. – x̄..). The yields estimated at the mean height, shown below, are these adjusted means.
Control: y= 9.53 + 0.0558 * 49.9 ≈ 12.31
Fast: y= 6.39 + 0.0558 * 49.9 ≈ 9.17
Slow: y= 13.10 + 0.0558 * 49.9 ≈ 15.89
Therefore, the model is well estimated as:
Control: y= 9.53 + 0.0558 * x
Fast: y= 6.39 + 0.0558 * x
Slow: y= 13.10 + 0.0558 * x
Fertilizer [Slow] had the highest adjusted yield, and for every 1 cm increase in crop height, the final yield is estimated to increase by 0.0558 (in the units of yield).