R-Squared in ANOVA: A Practical Approach to Calculation and Interpretation
Whenever we discuss R², we usually associate it with regression models. However, R² also plays a significant role in ANOVA. Less information is available on how to calculate and interpret R² in ANOVA, so today's topic focuses on how to calculate and interpret this measure in the context of ANOVA.
Let's consider an example dataset. Suppose we measured final yield at four nitrogen levels (N0, N1, N2, N3). Each level was replicated three times, and each replicate was treated as a block (I, II, III). Consequently, the appropriate model is a one-way ANOVA with a block.
This is the statistical model of a one-way ANOVA with a block:
$$y_{ij} = \mu + \tau_i + \beta_j + \varepsilon_{ij}$$
where
$y_{ij}$ = yield of nitrogen treatment i in block (replicate) j
$\mu$ = grand mean of yield
$\tau_i$ = effect of nitrogen level i
$\beta_j$ = effect of block j
$\varepsilon_{ij}$ = residual (error)
Let’s partition the data according to this model. If you follow each step, it will be as simple as middle school mathematics.
[Step 1] Grand mean
First, we calculate the grand mean of all 12 data points: 1521 / 12 = 126.75, which we round to 126.8.
[Step 2] Means of treatments and blocks
Second, we’ll calculate the mean for each nitrogen treatment (N0, N1, N2, N3) as well as for each block (I, II, III).
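If you want to reproduce Steps 1 and 2 outside Excel, here is a minimal R sketch. It assumes the yield values from the R example later in this post; the data frame name `dat` is only an illustrative choice.

```r
# Yield data assumed from the R example in this post:
# four nitrogen levels (N0-N3), each measured once in blocks I-III
dat <- data.frame(
  Nitrogen = factor(rep(c("N0", "N1", "N2", "N3"), each = 3)),
  Block    = factor(rep(c("I", "II", "III"), times = 4)),
  Yield    = c(99, 109, 89, 115, 142, 133, 121, 157, 142, 125, 150, 139)
)

# Step 1: grand mean of all 12 observations
grand_mean <- mean(dat$Yield)                     # 126.75 (rounded to 126.8 in the text)

# Step 2: mean of each nitrogen level and of each block
n_means <- tapply(dat$Yield, dat$Nitrogen, mean)  # N0 99, N1 130, N2 140, N3 138
b_means <- tapply(dat$Yield, dat$Block, mean)     # I 115, II 139.5, III 125.75

print(grand_mean)
print(n_means)
print(b_means)
```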
[Step 3] The effects of treatment and block
Third, let’s calculate the difference between the ‘mean of each treatment’ and the ‘grand mean.’ For instance, the mean of N0 is 99.0 and the grand mean is 126.8. Therefore, the effect of N0 is -27.8 (calculated as 99.0 – 126.8). For Block I, its mean is 115.0 while the grand mean is 126.8. Therefore, the effect of Block I is -11.8 (calculated as 115.0 – 126.8).
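The effects are simply group means minus the grand mean. This minimal R sketch re-creates the assumed data so it runs on its own; the last line shows that the effects add up to zero, a point that matters again in the sum-of-squares section.

```r
dat <- data.frame(
  Nitrogen = factor(rep(c("N0", "N1", "N2", "N3"), each = 3)),
  Block    = factor(rep(c("I", "II", "III"), times = 4)),
  Yield    = c(99, 109, 89, 115, 142, 133, 121, 157, 142, 125, 150, 139)
)
grand_mean <- mean(dat$Yield)

# Step 3: effect = group mean - grand mean
n_effect <- tapply(dat$Yield, dat$Nitrogen, mean) - grand_mean
b_effect <- tapply(dat$Yield, dat$Block, mean) - grand_mean

print(n_effect)  # N0 -27.75, N1 3.25, N2 13.25, N3 11.25
print(b_effect)  # I -11.75, II 12.75, III -1

# Deviations from the grand mean sum to zero (up to floating-point rounding)
print(c(sum(n_effect), sum(b_effect)))
```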
[Step 4] Error
Fourth, we’ll calculate the errors, also known as residuals.
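What is error in ANOVA? It is the part of each observation that neither the nitrogen treatment nor the block explains. Plots that received the same nitrogen level can still differ in yield beyond what their block accounts for; this leftover variation comes from unknown factors (environmental variation, lack of repeatability, and so on) and is treated as error. The error (residual) is calculated as: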
Each value - mean of treatment - mean of block + grand mean
For instance, the grain yield at N0 in Block I is 99, so its residual is: 99 (grain yield) - 99.0 (mean of N0) - 115.0 (mean of Block I) + 126.8 (grand mean) = 11.8 (11.75 before rounding).
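Step 4's formula can be applied to all 12 observations at once. This is a minimal R sketch assuming the same yield data as the R example in this post; `ave()` is the base R helper that returns, for each row, the mean of the group that row belongs to.

```r
dat <- data.frame(
  Nitrogen = factor(rep(c("N0", "N1", "N2", "N3"), each = 3)),
  Block    = factor(rep(c("I", "II", "III"), times = 4)),
  Yield    = c(99, 109, 89, 115, 142, 133, 121, 157, 142, 125, 150, 139)
)
grand_mean <- mean(dat$Yield)

# Treatment mean and block mean looked up row by row
n_mean_row <- ave(dat$Yield, dat$Nitrogen)
b_mean_row <- ave(dat$Yield, dat$Block)

# Step 4: residual = value - treatment mean - block mean + grand mean
dat$Residual <- dat$Yield - n_mean_row - b_mean_row + grand_mean

print(dat)
# Row 1 (N0, Block I): 99 - 99 - 115 + 126.75 = 11.75
```

Within each nitrogen level (and within each block) the residuals also sum to zero.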
[Step 5] Total difference
The total difference is calculated as the difference between each individual value and the grand mean (126.8). With that, we’ve successfully completed data partitioning!
Data partitioning
Let’s wrap up each step!!
1) Level of nitrogen treatment (N0, N1, N2, N3)
2) Level of block (I, II, III)
3) Each yield value
4) Grand mean of all yield values
5) Mean of each nitrogen level
6) Mean of each block
7) Effect of nitrogen (difference between 5 and 4)
8) Effect of block (difference between 6 and 4)
9) Error (3 - 5 - 6 + 4)
10) Total difference (difference between 3 and 4)
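The ten columns listed above can be assembled in a single data frame. This minimal R sketch assumes the same yield data as the R example in this post; the column names (`GrandMean`, `NMean`, and so on) are hypothetical, and each comment gives the column number from the list.

```r
dat <- data.frame(
  Nitrogen = factor(rep(c("N0", "N1", "N2", "N3"), each = 3)),        # 1
  Block    = factor(rep(c("I", "II", "III"), times = 4)),              # 2
  Yield    = c(99, 109, 89, 115, 142, 133, 121, 157, 142, 125, 150, 139)  # 3
)
dat$GrandMean <- mean(dat$Yield)                                       # 4
dat$NMean     <- ave(dat$Yield, dat$Nitrogen)                          # 5
dat$BMean     <- ave(dat$Yield, dat$Block)                             # 6
dat$NEffect   <- dat$NMean - dat$GrandMean                             # 7 = 5 - 4
dat$BEffect   <- dat$BMean - dat$GrandMean                             # 8 = 6 - 4
dat$Error     <- dat$Yield - dat$NMean - dat$BMean + dat$GrandMean     # 9 = 3 - 5 - 6 + 4
dat$TotalDiff <- dat$Yield - dat$GrandMean                             # 10 = 3 - 4

print(dat, digits = 4)
```

The last column always equals the sum of columns 7, 8 and 9, which is exactly the model written in deviation form.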
So, how can we partition y11, the yield of 99 at the first nitrogen level (i = 1, N0) in the first block (j = 1, Block I)? Recall the model introduced above: $y_{ij} = \mu + \tau_i + \beta_j + \varepsilon_{ij}$.
According to this model, the value 99 can be partitioned as follows:
99 = 126.8 (grand mean) - 27.8 (effect of N0) - 11.8 (effect of Block I) + 11.8 (residual)
What about y43? (y43 refers to the yield value of 139 for N3 in Block III). Using our model, we can partition this value as follows:
139 = 126.8 (grand mean) + 11.3 (effect of N3) - 1.0 (effect of Block III) + 2.0 (residual)
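The same decomposition holds for every observation, not just y11 and y43. In this minimal R sketch (assuming the yield data from the R example in this post; the names `mu`, `tau`, `beta`, `eps` and `parts` are illustrative), the residual is computed with Step 4's formula and the four parts are added back together.

```r
dat <- data.frame(
  Nitrogen = factor(rep(c("N0", "N1", "N2", "N3"), each = 3)),
  Block    = factor(rep(c("I", "II", "III"), times = 4)),
  Yield    = c(99, 109, 89, 115, 142, 133, 121, 157, 142, 125, 150, 139)
)
mu   <- mean(dat$Yield)                           # grand mean
tau  <- ave(dat$Yield, dat$Nitrogen) - mu         # effect of nitrogen level i
beta <- ave(dat$Yield, dat$Block) - mu            # effect of block j
eps  <- dat$Yield - ave(dat$Yield, dat$Nitrogen) -
        ave(dat$Yield, dat$Block) + mu            # residual (Step 4 formula)

parts <- data.frame(dat[, c("Nitrogen", "Block")], mu, tau, beta, eps,
                    rebuilt = mu + tau + beta + eps, Yield = dat$Yield)

# Row 1 is y11 (N0, Block I); row 12 is y43 (N3, Block III)
print(parts[c(1, 12), ])
```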
Sum of squares
Why do we suddenly need to calculate the sum of squares?
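If we add up all the values of the treatment effect or the block effect, the sum is zero, because deviations from a mean always sum to zero. That makes the raw sum useless for measuring spread, so we cannot calculate the variance from it. To get around this, we square each value and add the squares; this total is the sum of squares (SS). Note that the squares are summed over all 12 rows of the partitioned data, so each nitrogen effect is counted three times (once per block) and each block effect four times (once per nitrogen level). Dividing a sum of squares by its degrees of freedom then gives the variance. In Excel, the sum of squares of a column can be calculated easily with the SUMSQ() function.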
Sum of squares (SS) of nitrogen = 3248.3
Sum of squares (SS) of block = 1206.5
Sum of squares (SS) of residual (error) = 339.5
Sum of squares (SS) of total = 4794.3
The sum of squares can be partitioned as follows:
$$SS_{\text{Total}} = SS_{\text{Treatment}} + SS_{\text{Block}} + SS_{\text{Error}}$$
In this case, it would look like this: 4794.3 (total) = 3248.3 (treatment) + 1206.5 (block) + 339.5 (error)
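We can also group nitrogen and block together as the 'model' ($SS_{\text{Model}} = SS_{\text{Treatment}} + SS_{\text{Block}}$). Therefore, we can express the equation as:
$$SS_{\text{Total}} = SS_{\text{Model}} + SS_{\text{Error}}$$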
In numbers, this would translate to: 4794.3 (total) = 4454.8 (model) + 339.5 (error)
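The sums of squares can be checked with a short R sketch that mimics Excel's SUMSQ(): each squared column is summed over all 12 rows. It assumes the yield data from the R example in this post; the object names are illustrative. The unrounded values are 3248.25, 1206.5, 339.5 and 4794.25.

```r
dat <- data.frame(
  Nitrogen = factor(rep(c("N0", "N1", "N2", "N3"), each = 3)),
  Block    = factor(rep(c("I", "II", "III"), times = 4)),
  Yield    = c(99, 109, 89, 115, 142, 133, 121, 157, 142, 125, 150, 139)
)
mu   <- mean(dat$Yield)
tau  <- ave(dat$Yield, dat$Nitrogen) - mu   # nitrogen effect, repeated on each row
beta <- ave(dat$Yield, dat$Block) - mu      # block effect, repeated on each row
eps  <- dat$Yield - ave(dat$Yield, dat$Nitrogen) - ave(dat$Yield, dat$Block) + mu

# SUMSQ() equivalent: sum of squared values over all 12 rows
ss <- c(Nitrogen = sum(tau^2),
        Block    = sum(beta^2),
        Error    = sum(eps^2),
        Total    = sum((dat$Yield - mu)^2))
print(ss)

# Model = Nitrogen + Block, and Total = Model + Error
ss_model <- ss[["Nitrogen"]] + ss[["Block"]]
print(all.equal(ss[["Total"]], ss_model + ss[["Error"]]))  # TRUE
```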
Variance
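As mentioned earlier, dividing a sum of squares by its degrees of freedom produces a variance. For nitrogen, block and total, the degrees of freedom are the number of levels (or observations) minus one; the error receives whatever is left over. Let's calculate them.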
Degrees of freedom of nitrogen = 4 - 1 = 3
Degrees of freedom of block = 3 - 1 = 2
Degrees of freedom of total = 12 - 1 = 11
Degrees of freedom of error = 11 - (3 + 2) = 6 (∵ Total - Model = Error)
or
Degrees of freedom of error = (t - 1) × (b - 1), where t = number of treatments and b = number of blocks (replicates)
∴ (4 - 1) × (3 - 1) = 6
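The degrees-of-freedom bookkeeping fits in a few lines of R. This is a sketch for this balanced design only (one observation per nitrogen-block combination); the variable names are illustrative.

```r
# Degrees of freedom for a one-way ANOVA with blocks
n_trt <- 4                 # number of nitrogen treatments (t)
n_blk <- 3                 # number of blocks / replicates (b)
n_obs <- n_trt * n_blk     # 12 observations

dof <- c(Nitrogen = n_trt - 1,
         Block    = n_blk - 1,
         Error    = (n_trt - 1) * (n_blk - 1),
         Total    = n_obs - 1)
print(dof)  # Nitrogen 3, Block 2, Error 6, Total 11

# Both routes to the error df agree: Total - Model equals (t - 1)(b - 1)
stopifnot(dof[["Total"]] - (dof[["Nitrogen"]] + dof[["Block"]]) == dof[["Error"]])
```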
If we divide each sum of squares (SS) by its degrees of freedom, we obtain the mean square (MS). In other words, a mean square is a variance estimate.
The mean square of the model (which includes the treatment and block) is calculated as 4454.8 (which is the sum of 3248.3 and 1206.5) divided by 5 (3 for treatment + 2 for block), which equals 890.96. This represents the variance of the model.
The mean square of the error is calculated as 339.5 (the sum of squares of the error) divided by 6 (degrees of freedom for the error), which equals 56.58. This is the variance of the error.
Finally, the mean square of the total is calculated as 4794.3 (total sum of squares) divided by 11 (total degrees of freedom), which is about 435.84. This represents the total variance.
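The mean squares are just element-wise divisions. This R sketch uses the unrounded sums of squares (3248.25 + 1206.5 = 4454.75, 339.5, 4794.25) rather than the one-decimal values in the text, which is why the model MS comes out as 890.95 instead of 890.96.

```r
# Unrounded sums of squares and their degrees of freedom
ss  <- c(Model = 3248.25 + 1206.5, Error = 339.5, Total = 4794.25)
dof <- c(Model = 3 + 2,            Error = 6,     Total = 11)

# Mean square (variance estimate) = SS / df
ms <- ss / dof
print(round(ms, 2))  # Model 890.95, Error 56.58, Total 435.84
```

The total mean square is shown for completeness; only the model and error mean squares enter the F-ratio.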
Verifying with statistical software
We have finished calculating everything. Now, let’s verify that our calculations are correct. First, I will use JMP.
Please compare the values in the red box of the JMP output with our own calculations: they are the same. If we then calculate the F-ratio (MS model / MS error = 890.95 / 56.58 ≈ 15.75) and find its p-value with 5 degrees of freedom for the model and 6 for the error, the analysis of variance (ANOVA) is complete.
The PQRS program is a great help for understanding the F- and t-distributions visually. You can download it from the website below.
https://pqrs.software.informer.com/
With 5 degrees of freedom for the model and 6 for the error, the area under the F-distribution to the right of F = 15.7458 is 0.0021. In other words, the p-value is 0.0021.
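In R, the upper-tail area that PQRS displays is given by `pf()` with `lower.tail = FALSE`. This is a minimal sketch using the unrounded mean squares; the `lm()` cross-check assumes the same yield data as the R example in this post and reports the overall model F together with its 5 and 6 degrees of freedom.

```r
ms_model <- 4454.75 / 5            # model mean square
ms_error <- 339.5 / 6              # error mean square
f_ratio  <- ms_model / ms_error    # about 15.7458

# p-value = area of F(5, 6) to the right of the observed F
p_value <- pf(f_ratio, df1 = 5, df2 = 6, lower.tail = FALSE)
print(c(F = f_ratio, p = p_value))

# Cross-check: overall model F from a linear model with nitrogen and block
dat <- data.frame(
  Nitrogen = factor(rep(c("N0", "N1", "N2", "N3"), each = 3)),
  Block    = factor(rep(c("I", "II", "III"), times = 4)),
  Yield    = c(99, 109, 89, 115, 142, 133, 121, 157, 142, 125, 150, 139)
)
fit <- lm(Yield ~ Nitrogen + Block, data = dat)
print(summary(fit)$fstatistic)     # value, numdf, dendf
```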
Finally R-squared!!
It's been a long journey! Now, let's calculate R².
In ANOVA, R² is the ratio of SS model to SS total: the proportion of the total variation that the model (nitrogen + block) accounts for. A high ratio means the error sum of squares is small relative to the total.
For this data,
R² = 4454.8 / 4794.3 = 0.93
In other words, the error accounts for only about 7% of the total sum of squares (339.5 / 4794.3 ≈ 0.07). (Error can arise from environmental factors, lack of repeatability, etc.) Therefore, R² can also be calculated as:
R² = 1 - (339.5 / 4794.3) = 0.93
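Both formulas give the same number, and R reports it directly as `r.squared` in the summary of a linear model. This is a minimal sketch with the unrounded sums of squares, assuming the same yield data as the R example in this post.

```r
ss_model <- 4454.75
ss_error <- 339.5
ss_total <- 4794.25

r2_ratio <- ss_model / ss_total        # SS model / SS total
r2_error <- 1 - ss_error / ss_total    # 1 - SS error / SS total
print(round(c(r2_ratio, r2_error), 3)) # 0.929 0.929

# Cross-check with lm(): summary()$r.squared uses the same definition
dat <- data.frame(
  Nitrogen = factor(rep(c("N0", "N1", "N2", "N3"), each = 3)),
  Block    = factor(rep(c("I", "II", "III"), times = 4)),
  Yield    = c(99, 109, 89, 115, 142, 133, 121, 157, 142, 125, 150, 139)
)
fit <- lm(Yield ~ Nitrogen + Block, data = dat)
print(summary(fit)$r.squared)
```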
Tip!! variance of error
I analyze the same data using R.
# Data generation
Nitrogen <- rep(c("N0", "N1", "N2", "N3"), each = 3)
Block <- rep(c("I", "II", "III"), times = 4)
Yield <- c(99, 109, 89, 115, 142, 133, 121, 157, 142, 125, 150, 139)
DataA <- data.frame(Nitrogen = factor(Nitrogen), Block = factor(Block), Yield)
# One-way ANOVA with block
summary(aov(Yield ~ Nitrogen + Block, data = DataA))
When you examine the ANOVA table, which value do you usually check first? Most people look at the p-value, but I believe the variance of the error, which is 56.6 in this case, is also an important value to check. Variance of error is commonly referred to as the Mean Square Error (MSE), or pooled variance. Different software presents this value in various forms in the ANOVA table.
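If you want the MSE as a number rather than reading it off the table, it can be pulled from the fitted `aov` object. This is a minimal sketch assuming the same data as the R code above; `deviance()` returns the residual sum of squares and `df.residual()` its degrees of freedom.

```r
dat <- data.frame(
  Nitrogen = factor(rep(c("N0", "N1", "N2", "N3"), each = 3)),
  Block    = factor(rep(c("I", "II", "III"), times = 4)),
  Yield    = c(99, 109, 89, 115, 142, 133, 121, 157, 142, 125, 150, 139)
)
fit <- aov(Yield ~ Nitrogen + Block, data = dat)

# MSE (pooled variance) = residual SS / residual df = 339.5 / 6
mse <- deviance(fit) / df.residual(fit)
print(mse)             # about 56.58

# Same quantity: the squared residual standard error
print(sigma(fit)^2)
```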
Here is the same result using SAS.
proc glm data=DATABASE.DATA;
class Nitrogen Block;
model Yield = Nitrogen Block / ss1;
lsmeans Nitrogen / adjust=tukey pdiff=all alpha=0.05 cl;
run;
quit;
Now I'll compare the ANOVA tables from the three programs.
Each program displays a slightly different form of the ANOVA table, but if we fully understand how to calculate the sum of squares, mean square, F-ratio, etc., we can find what we need in the ANOVA table.
Did you locate the MSE, 56.6, in the ANOVA table?
Now, I’ll introduce a different approach to this value. Please see the post below.