What is ANCOVA (2/3)? How to interpret Parameter Estimates
Previous post
□ What is ANCOVA (1/3)? The basic concept
In the previous post, I explained how to interpret the ANCOVA table (red box in the tables below). In this post, I’ll explain how to interpret the Parameter Estimates (blue box in the table below) in an ANCOVA analysis.
Let’s check the ‘Parameter Estimates’ table. Many statistical programs set one level of an experimental factor to “zero” and express the other levels relative to it. This is usually called reference (dummy) coding, and it is how a categorical factor enters a General Linear Model (GLM). If you look at the Parameter Estimates table, you will notice that there is no entry for ‘Fertilizer [Slow]’. My first reading was that ‘Fertilizer [Slow]’ had been set as the reference level, or zero, and that the yields for the other fertilizer types were estimated relative to it. The cross-check with JMP later in this post shows that this reading needs a correction for ‘Slow’, but let’s follow it for now.
Control: 9.6717502 – 0.142494 + 0.0558x ≈ 9.53 + 0.0558x
Fast: 9.6717502 – 3.286649 + 0.0558x ≈ 6.39 + 0.0558x
Slow: 9.6717502 + 0.0558x ≈ 9.67 + 0.0558x
In the Fertilizer [Control] group, the estimated model is y = 9.53 + 0.0558x. This means that for every 1 cm increase in crop height, yield increases by 0.0558. So far, we have performed an Analysis of Variance (ANOVA), but now we also have a linear regression model. When ANOVA and regression are combined in one model, it is called a General Linear Model (GLM).
Let’s take a look at the actual yield for each different type of fertilizer.
In my previous post, I calculated the data above using Excel. The actual mean yield is 12.13 for Fertilizer [Control], 15.83 for Fertilizer [Slow], and 9.41 for Fertilizer [Fast], and the mean crop height is 49.9.
However, we can also estimate yield from the model by plugging in the mean crop height (x = 49.9):
Control: 9.6717502 – 0.142494 + 0.0558x ≈ 9.53 + 0.0558 * 49.9 ≈ 12.31
Fast: 9.6717502 – 3.286649 + 0.0558x ≈ 6.39 + 0.0558 * 49.9 ≈ 9.17
Slow: 9.6717502 + 0.0558x ≈ 9.67 + 0.0558 * 49.9 ≈ 12.45
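The arithmetic above can be sketched in a few lines of Python. The coefficient values are copied from the Parameter Estimates table in this post; the dictionary layout and the function name `predict_yield` are my own choices for illustration, and Slow is given an offset of 0 only because that is the reading used so far.

```python
# Coefficients copied from the Parameter Estimates table in this post.
INTERCEPT = 9.6717502
SLOPE = 0.0558        # change in yield per 1 cm of crop height (rounded as in the post)
MEAN_HEIGHT = 49.9    # mean crop height from the previous post

# Offsets added to the intercept for each fertilizer level.
# Assumption for this sketch: Slow has no row in the table, so it is treated as 0.
offsets = {"Control": -0.142494, "Fast": -3.286649, "Slow": 0.0}


def predict_yield(level, height):
    """Return intercept + level offset + slope * height."""
    return INTERCEPT + offsets[level] + SLOPE * height


for level in offsets:
    print(level, round(predict_yield(level, MEAN_HEIGHT), 2))
```

Because the sketch keeps the unrounded intercept, the Slow estimate comes out a hair above the hand calculation, which rounds the intercept to 9.67 first.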
Let’s cross check with JMP.
The values for ‘control’ and ‘fast’ are the same, but the value for ‘slow’ in JMP is different from my calculation. I had assumed that Fertilizer [Slow] was set to 0 because it has no row in the Parameter Estimates table, yet the value JMP actually uses for ‘slow’ is different. I was curious as to why and asked JMP for an explanation. As it turns out, when estimating yield in ANCOVA, we should use ‘Expanded Estimates’ rather than ‘Parameter Estimates’.
When we select ‘Expanded Estimates’ in JMP, we obtain a table that lists a coefficient for every level of the factor, including the level that has no row in the Parameter Estimates table, alongside the coefficient for the covariate (in this case, crop height). Here the table shows 3.4291433 for Fertilizer [Slow]. In other words, Slow was not zero: its coefficient is the negative of the sum of the Control and Fast coefficients (0.142494 + 3.286649 ≈ 3.429143), so the three fertilizer coefficients sum to zero and 9.6717502 is not the intercept of the Slow group. Comparing the resulting estimates with the actual mean for each level then shows how closely the model tracks the data.
Now, let’s estimate yield again in the same way as before, this time using the Slow coefficient from Expanded Estimates.
Control: 9.6717502 – 0.142494 + 0.0558x ≈ 9.53 + 0.0558 * 49.9 ≈ 12.31
Fast: 9.6717502 – 3.286649 + 0.0558x ≈ 6.39 + 0.0558 * 49.9 ≈ 9.17
Slow: 9.6717502 + 3.4291433 + 0.0558x ≈ 13.10 + 0.0558 * 49.9 ≈ 15.89
Those values match the results from JMP. The estimated yield for control is 12.31, for fast fertilizer it’s 9.17, and for slow fertilizer it’s 15.89.
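The Slow coefficient is not an independent number: with the values in this post it equals the negative of the sum of the Control and Fast coefficients, which is what a sum-to-zero (effect) coding of the factor produces. The following is a minimal sketch of that check; the variable names are mine, and the numbers are copied from the tables above.

```python
import math

INTERCEPT = 9.6717502
SLOPE = 0.0558
MEAN_HEIGHT = 49.9

# Coefficients listed in Parameter Estimates (Slow is not listed there).
listed = {"Control": -0.142494, "Fast": -3.286649}

# Under sum-to-zero coding, the omitted level is minus the sum of the listed ones.
slow_effect = -sum(listed.values())
effects = {**listed, "Slow": slow_effect}

# Compare with the Slow value used in the Expanded Estimates calculation above.
assert math.isclose(slow_effect, 3.4291433, abs_tol=1e-6)

for level, effect in effects.items():
    estimate = INTERCEPT + effect + SLOPE * MEAN_HEIGHT
    print(f"{level}: {estimate:.2f}")
```

Under this coding, the intercept 9.6717502 sits between the three groups rather than belonging to any one of them, which is why treating Slow as 0 gave an estimate that did not match JMP.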
To view the prediction expression in JMP, we can go to ‘Estimates’ and click on ‘Show Prediction Expression’.
This gives the equation below.
This equation is the same as the one I used to estimate yield for the different fertilizers. Fertilizer [Slow] shows the greatest yield, and for every 1 cm increase in crop height, yield increases by 0.0558. Each statistical program calculates different intercepts according to its own coding convention, but the final estimates are the same. For example, the equation below is from SPSS, in which Fertilizer [Fast] was set to 0. The intercepts therefore differ from JMP’s, but the final estimates are the same: 12.31 for Control, 9.17 for Fast and 15.89 for Slow.
Control: 6.385 + 3.144 + 0.056 * 49.9 ≈ 12.31
Fast: 6.385 + 0.056 * 49.9 ≈ 9.17
Slow: 6.385 + 6.716 + 0.056 * 49.9 ≈ 15.89
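The SPSS coefficients can be recovered from the JMP ones by shifting every coefficient so that Fast becomes 0. Here is a rough sketch, assuming the SPSS output simply uses Fast as the reference level as described above; the function name `to_reference_coding` is hypothetical.

```python
# JMP-style coefficients used earlier in this post.
INTERCEPT = 9.6717502
effects = {"Control": -0.142494, "Fast": -3.286649, "Slow": 3.4291433}


def to_reference_coding(intercept, effects, reference):
    """Shift coefficients so that `reference` has offset 0.

    The per-level intercepts (intercept + offset) are left unchanged.
    """
    shift = effects[reference]
    new_intercept = intercept + shift
    new_offsets = {level: value - shift for level, value in effects.items()}
    return new_intercept, new_offsets


ref_intercept, ref_offsets = to_reference_coding(INTERCEPT, effects, "Fast")
print("Intercept", round(ref_intercept, 3))
for level, offset in ref_offsets.items():
    print(level, round(offset, 3))
```

Rounded to three decimals, the shifted values line up with the 6.385, 3.144 and 6.716 in the SPSS equation, so the two programs describe the same three lines with different bookkeeping.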
That is, the intercept by itself has no fixed meaning: its value changes depending on which level is set to 0. The important thing is to interpret the pattern. In this case, we can estimate the yield as follows:
Control: y= 9.53 + 0.0558 x
Fast: y= 6.39 + 0.0558 x
Slow: y= 13.10 + 0.0558 x
Because the slope is the same (0.0558) for all three fertilizers, Fertilizer [Slow], which has the greatest intercept, will show the greatest yield at any given crop height.
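Since the three lines share one slope, the gap between any two fertilizers is the same at every crop height. A small sketch to illustrate this, using the rounded intercepts above; the heights 40, 50 and 60 cm are arbitrary values picked for the example, not data from the experiment.

```python
SLOPE = 0.0558
intercepts = {"Control": 9.53, "Fast": 6.39, "Slow": 13.10}


def predict_yield(level, height):
    """Estimated yield for one fertilizer level at a given crop height."""
    return intercepts[level] + SLOPE * height


for height in (40, 50, 60):  # arbitrary example heights in cm
    predictions = {level: predict_yield(level, height) for level in intercepts}
    best = max(predictions, key=predictions.get)
    gap = predictions["Slow"] - predictions["Control"]
    print(height, best, round(gap, 2))
```

The highest-yielding fertilizer and the Slow minus Control gap do not change with height, which is the parallel-lines pattern that ANCOVA assumes here.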
If you have followed along this far, one question remains.
Why do the yield values estimated by the statistical programs differ from the actual observations? To answer this, we need to understand the slope of the covariate and the adjusted mean. I’ll explain these in the next post.