What is ANCOVA (1/3)? The basic concept


Today, I will explain Analysis of Covariance (ANCOVA). ANCOVA is a statistical technique that involves including covariates, which are additional variables that may impact the dependent variable (y) in addition to the independent variable (x).


I have a dataset as shown below, and I would like to analyze crop yield based on different fertilizer types (Control, Slow-release, and Fast-release). The experimental design is a Completely Randomized Design (CRD) with 10 replicates.

Rep   Yield (Control)   Yield (Slow)   Yield (Fast)
1     12.2              16.6           9.5
2     12.4              15.8           9.5
3     11.9              16.5           9.6
4     11.3              15.0           8.8
5     11.8              15.4           9.5
6     12.1              15.6           9.8
7     13.1              15.8           9.1
8     12.7              15.8           10.3
9     12.4              16.0           9.5
10    11.4              15.8           8.5

Since we only have one experimental factor (fertilizer), One-Way ANOVA is an appropriate method to analyze the results. Let’s calculate the results manually for One-Way ANOVA.


After working through the calculations, we obtain the sum of squares (SS) for each source of variation. We can also verify that the total sum of squares splits exactly into a treatment part and an error part:

SSTotal = SSTreatment + SSError

214.79 = 207.68 + 7.11
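The manual calculation can be sketched in R as follows. This is a minimal illustration rather than the exact worksheet used for this post: the variable names (grand_mean, group_means, ss_total, and so on) are my own, and the yields are copied from the table above.

```r
# Yields copied from the table above, in the order Control, Slow, Fast
Fertilizer <- rep(c("Control", "Slow", "Fast"), each = 10)
Yield <- c(12.2, 12.4, 11.9, 11.3, 11.8, 12.1, 13.1, 12.7, 12.4, 11.4,
           16.6, 15.8, 16.5, 15.0, 15.4, 15.6, 15.8, 15.8, 16.0, 15.8,
           9.5, 9.5, 9.6, 8.8, 9.5, 9.8, 9.1, 10.3, 9.5, 8.5)

grand_mean  <- mean(Yield)
group_means <- ave(Yield, Fertilizer)  # each observation paired with its own group mean

# Total: every observation against the grand mean
ss_total <- sum((Yield - grand_mean)^2)
# Treatment: each group mean against the grand mean (counted once per observation)
ss_treatment <- sum((group_means - grand_mean)^2)
# Error: every observation against its own group mean
ss_error <- sum((Yield - group_means)^2)

round(c(Total = ss_total, Treatment = ss_treatment, Error = ss_error), 2)
```

Rounded to two decimals, the three values should agree with the figures quoted above.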


Let’s create an ANOVA table. The typical format for a One-Way ANOVA table is as follows. Since we used a Completely Randomized Design (CRD) without blocks, we can omit the column for “blocks.”

The total number of observations is denoted by N, and the number of levels in the experimental factor is denoted by k. In this case, N is equal to 30 and k is equal to 3. Therefore, the degrees of freedom for Treatment is 3-1 = 2, the degrees of freedom for Error is 30-3 = 27, and the degrees of freedom for Total is 30-1 = 29. The degrees of freedom add up in the same way as the sums of squares: dfTotal = dfTreatment + dfError, that is, 29 = 2 + 27.

Since we have already calculated the sum of squares (SS), we can simply input each value in the appropriate place in the ANOVA table.

To calculate the mean sum of squares (MSS), we divide the sum of squares (SS) by the corresponding degrees of freedom (df). This is why it is called the mean sum of squares. In this case, the MSS for fertilizer is 207.68 / 2 = 103.84, and the MSS for error is 7.11 / 27 = 0.2633 (about 0.26). Next, we can calculate the F-ratio, which is the ratio of MSS Treatment to MSS Error. Using the unrounded MSS for error, the F-ratio is 103.84 / 0.2633 = 394.33.
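As a rough sketch, the ANOVA table can be assembled in R from the numbers already given in the text. The object and column names here (anova_table, F_ratio, and so on) are only illustrative.

```r
# Values taken from the text: 30 observations, 3 fertilizer levels
N <- 30
k <- 3
ss_treatment <- 207.68
ss_error     <- 7.11

# Degrees of freedom for each source of variation
df_treatment <- k - 1
df_error     <- N - k
df_total     <- N - 1

# Mean sum of squares = SS / df, and F = MSS Treatment / MSS Error
ms_treatment <- ss_treatment / df_treatment
ms_error     <- ss_error / df_error
f_ratio      <- ms_treatment / ms_error

anova_table <- data.frame(
  Source  = c("Fertilizer", "Error", "Total"),
  df      = c(df_treatment, df_error, df_total),
  SS      = c(ss_treatment, ss_error, ss_treatment + ss_error),
  MSS     = c(ms_treatment, ms_error, NA),
  F_ratio = c(f_ratio, NA, NA)
)
print(anova_table, digits = 5)
```

Keeping the MSS for error unrounded matters here: dividing by the rounded 0.26 would give an F-ratio close to 399 instead of 394.33.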

We then need to check the p-value. The p-value is the area under the F-distribution curve to the right of the observed F-value of 394.33, given 2 degrees of freedom for Treatment and 27 degrees of freedom for Error. That area is almost 0.
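In R, this upper-tail area can be obtained with the base function pf(). A minimal sketch, using the F-ratio and degrees of freedom from the text (the variable names are illustrative):

```r
f_ratio <- 394.33   # F-ratio from the ANOVA table
df1 <- 2            # degrees of freedom for Treatment
df2 <- 27           # degrees of freedom for Error

# lower.tail = FALSE returns the area to the right of f_ratio, i.e. the p-value
p_value <- pf(f_ratio, df1 = df1, df2 = df2, lower.tail = FALSE)

p_value
p_value < 0.001
```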

Therefore, the p-value is almost 0, indicating a highly significant result (p-value<.001). This means that there are significant differences in crop yield among the fertilizer types. To cross-check our result, let’s use a statistical program. In this case, I will use R.

# Fertilizer labels, in the same order as the yield values below
Fertilizer = rep(c("Control","Slow","Fast"), each=10)
Yield = c(12.2,12.4,11.9,11.3,11.8,12.1,13.1,12.7,12.4,11.4,16.6,15.8,16.5,15.0,15.4,15.6,15.8,15.8,16.0,15.8,9.5,9.5,9.6,8.8,9.5,9.8,9.1,10.3,9.5,8.5)
DataA = data.frame(Fertilizer, Yield)
# One-Way ANOVA of yield on fertilizer type
summary(aov(Yield ~ Fertilizer, data=DataA))

The result is the same.



However!!!

A new problem has arisen for the researcher in this field. It has been discovered that the crops have different heights, and this difference seems to be quite significant. This raises the question:

“Does the difference in crop yield between the different fertilizer types truly come from the effect of the fertilizer, or is it simply due to the height differences among the crops?”

Although the experiment was designed around the fertilizer factor alone, this additional source of noise (height differences) has become a concern for the researcher. The researcher could choose to ignore crop height, but the differences in height among the crops are too clear to dismiss. To address this issue, the researcher has decided to include crop height as a covariate and has measured the height of the crops before harvesting.

In fact, performing ANCOVA using statistical programs is a relatively simple process – we can simply add “Height” as a covariate to the model, and the program will calculate the result automatically. However, it is crucial to have the ability to interpret the data beyond just using programs.

I’ll use JMP software to run the ANCOVA.

To analyze the effect of crop fertilizers on yield while accounting for the effect of crop height, I included “Yield” as the dependent variable and “Fertilizer” and “Height” as independent variables in the ANCOVA model.



Let’s be careful!!

A covariate must be numeric, while experimental factors are categorical. When a model contains both a categorical factor and a numeric variable, statistical programs treat it as ANCOVA.

Many people mistakenly enter a covariate as a categorical variable, for example by coding its values as A, B, C, etc. The program then runs a Two-Way ANOVA instead of ANCOVA. Similarly, if block data is entered as numeric values (1, 2, 3, etc.), it can be mistakenly treated as a covariate. Therefore, it is important to enter block data as a categorical variable and covariate data in a numeric format.
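The effect of the data type is easy to see in R. The sketch below is only an illustration: the Block column is hypothetical (it simply reuses the replicate numbers 1 to 10, whereas the experiment in this post has no blocks), and the object names are my own. The point is the degrees of freedom, not the p-values.

```r
Fertilizer <- rep(c("Control", "Slow", "Fast"), each = 10)
Yield <- c(12.2, 12.4, 11.9, 11.3, 11.8, 12.1, 13.1, 12.7, 12.4, 11.4,
           16.6, 15.8, 16.5, 15.0, 15.4, 15.6, 15.8, 15.8, 16.0, 15.8,
           9.5, 9.5, 9.6, 8.8, 9.5, 9.8, 9.1, 10.3, 9.5, 8.5)

# HYPOTHETICAL block labels stored as numbers, for illustration only
Block <- rep(1:10, times = 3)

# Numeric Block: R fits a single slope, i.e. treats it as a covariate (ANCOVA)
as_covariate <- summary(aov(Yield ~ Fertilizer + Block))[[1]]

# Categorical Block: R fits one effect per block level (Two-Way ANOVA)
as_block <- summary(aov(Yield ~ Fertilizer + factor(Block)))[[1]]

as_covariate$Df   # Fertilizer, Block, Residuals
as_block$Df       # Fertilizer, factor(Block), Residuals
```

With a numeric Block the term uses 1 degree of freedom, while factor(Block) uses 9 (ten levels minus one). Checking the degrees of freedom in the output is a quick way to confirm that the program read each variable as intended.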



Let’s review the results obtained from JMP. In ANCOVA, height (covariate) is highly significant (p<.001), indicating that crop yield is affected by crop height. However, even when height is included as a covariate, fertilizer remains highly significant (p<.001). This suggests that there are differences in crop yield among fertilizer types beyond those attributable to crop height.

Some might ask why it is necessary to perform ANCOVA when fertilizer is already highly significant in ANOVA (p<.001). Researchers are generally aware of the factors that may affect their experiments, and they should consider running an ANCOVA if they suspect a source of noise that can be measured.

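Let’s compare the F-ratio for fertilizer between ANOVA and ANCOVA. The F-ratio is considerably larger in ANCOVA than in ANOVA. Recall that the F-ratio is the ratio of MSS treatment to MSS error, and a higher F-ratio leads to greater significance (lower p-value). The increase typically arises because the covariate accounts for part of the variation that ANOVA left in the error term, which makes MSS error smaller.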

So far, we have covered the basic concept of ANCOVA. In the next post, I’ll discuss how to interpret the additional statistical tables that programs provide for ANCOVA.



Following post
What is ANCOVA (2/3)? How to interpret Parameter Estimates


Extra tip! Here’s how to run an Analysis of Covariance (ANCOVA) using SAS Studio.


Extra tip! Here’s how to run an Analysis of Covariance (ANCOVA) using R Studio.

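library(readr)
# Load the dataset from GitHub
github = "https://raw.githubusercontent.com/agronomy4future/raw_data_practice/main/Fertilizer%20(One%20Way%20ANOVA).csv"
dataA = data.frame(read_csv(url(github), show_col_types=FALSE))
dataA

library(car)
# ANCOVA: Fertilizer is the categorical factor, Height is the numeric covariate
ancova_model = aov(Yield ~ Fertilizer + Height, data=dataA)
# Type III sums of squares from car::Anova
Anova(ancova_model, type="III")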
