Understanding Mean Absolute Error (MAE) in ANOVA: A Step-by-Step Guide to Calculation in Excel

Understanding Mean Absolute Error (MAE) in ANOVA: A Step-by-Step Guide to Calculation in Excel






Mean Absolute Error (MAE) is a metric used to measure the accuracy of a model’s predictions. It calculates the average magnitude of the errors in a set of predictions, without considering their direction. In other words, MAE measures the average absolute difference between the actual values and the predicted values. MAE is typically used in the context of regression analysis and prediction error evaluation, rather than in ANOVA (Analysis of Variance), which focuses on comparing the means of different groups. However, the concept of measuring error can still be relevant when assessing the fit of a model in ANOVA contexts.

Today, I’ll explain how to calculate MAE when conducting ANOVA.

Here is one dataset. Let’s say this data represents yield in response to different fertilizer types.

FertilizerYield
Control12.2
Control12.4
Control11.9
Control11.3
Control11.8
Control12.1
Control13.1
Control12.7
Control12.4
Control11.4
Slow16.6
Slow15.8
Slow16.5
Slow15.0
Slow15.4
Slow15.6
Slow15.8
Slow15.8
Slow16.0
Slow15.8
Fast9.5
Fast9.5
Fast9.6
Fast8.8
Fast9.5
Fast9.8
Fast9.1
Fast10.3
Fast9.5
Fast8.5

Fertilizer is an experimental factor, and yield is the final outcome from that factor. Therefore, fertilizer is the independent variable (x), while yield is the dependent variable (y). Each yield value is indicated as yij where i is fertilizers and j is replicates. It means y11 will be the first replicate at control, which is 12.2

xyij
FertilizerYield
Control12.2
Control12.4
Control11.9
Control11.3
Control11.8
Control12.1
Control13.1
Control12.7
Control12.4
Control11.4
Slow16.6
Slow15.8
Slow16.5
Slow15.0
Slow15.4
Slow15.6
Slow15.8
Slow15.8
Slow16.0
Slow15.8
Fast9.5
Fast9.5
Fast9.6
Fast8.8
Fast9.5
Fast9.8
Fast9.1
Fast10.3
Fast9.5
Fast8.5

Next, I’ll calculate the grain mean (..) across all fertilizers.

xyij ȳ..
FertilizerYieldGrand mean
Control12.212.46
Control12.412.46
Control11.912.46
Control11.312.46
Control11.812.46
Control12.112.46
Control13.112.46
Control12.712.46
Control12.412.46
Control11.412.46
Slow16.612.46
Slow15.812.46
Slow16.512.46
Slow15.012.46
Slow15.412.46
Slow15.612.46
Slow15.812.46
Slow15.812.46
Slow16.012.46
Slow15.812.46
Fast9.512.46
Fast9.512.46
Fast9.612.46
Fast8.812.46
Fast9.512.46
Fast9.812.46
Fast9.112.46
Fast10.312.46
Fast9.512.46
Fast8.512.46

Next, I’ll calculate the mean of each fertilizer, and it will be written as i.

xyij ȳ..i.
FertilizerYieldGrand meanFertilizer mean
Control12.212.4612.13
Control12.412.4612.13
Control11.912.4612.13
Control11.312.4612.13
Control11.812.4612.13
Control12.112.4612.13
Control13.112.4612.13
Control12.712.4612.13
Control12.412.4612.13
Control11.412.4612.13
Slow16.612.4615.83
Slow15.812.4615.83
Slow16.512.4615.83
Slow15.012.4615.83
Slow15.412.4615.83
Slow15.612.4615.83
Slow15.812.4615.83
Slow15.812.4615.83
Slow16.012.4615.83
Slow15.812.4615.83
Fast9.512.469.4
Fast9.512.469.4
Fast9.612.469.4
Fast8.812.469.4
Fast9.512.469.4
Fast9.812.469.4
Fast9.112.469.4
Fast10.312.469.4
Fast9.512.469.4
Fast8.512.469.4

Then, I will subtract the grand mean (ȳ..) from the fertilizer mean (i.), and this is called the fertilizer effect (or treatment effect).

xyij ȳ..i.ȳi.ȳ..
FertilizerYieldGrand meanFertilizer meanFertilizer effect
Control12.212.4612.13-0.33
(=12.13 – 12.46)
Control12.412.4612.13-0.33
Control11.912.4612.13-0.33
Control11.312.4612.13-0.33
Control11.812.4612.13-0.33
Control12.112.4612.13-0.33
Control13.112.4612.13-0.33
Control12.712.4612.13-0.33
Control12.412.4612.13-0.33
Control11.412.4612.13-0.33
Slow16.612.4615.833.37
Slow15.812.4615.833.37
Slow16.512.4615.833.37
Slow15.012.4615.833.37
Slow15.412.4615.833.37
Slow15.612.4615.833.37
Slow15.812.4615.833.37
Slow15.812.4615.833.37
Slow16.012.4615.833.37
Slow15.812.4615.833.37
Fast9.512.469.4-3.05
Fast9.512.469.4-3.05
Fast9.612.469.4-3.05
Fast8.812.469.4-3.05
Fast9.512.469.4-3.05
Fast9.812.469.4-3.05
Fast9.112.469.4-3.05
Fast10.312.469.4-3.05
Fast9.512.469.4-3.05
Fast8.512.469.4-3.05

Next, I will subtract the fertilizer mean (i.) from each yield value (yij), and this is called residuals (or errors).

xyij ȳ..i.ȳi.ȳ..yiji.
FertilizerYieldGrand meanFertilizer meanFertilizer effectResiduals
Control12.212.4612.13-0.33 0.07
(=12.2 – 12.13)
Control12.412.4612.13-0.330.27
Control11.912.4612.13-0.33-0.23
Control11.312.4612.13-0.33-0.83
Control11.812.4612.13-0.33-0.33
Control12.112.4612.13-0.33-0.03
Control13.112.4612.13-0.330.97
Control12.712.4612.13-0.330.57
Control12.412.4612.13-0.330.27
Control11.412.4612.13-0.33-0.73
Slow16.612.4615.833.370.77
Slow15.812.4615.833.37-0.03
Slow16.512.4615.833.370.67
Slow15.012.4615.833.37-0.83
Slow15.412.4615.833.37-0.43
Slow15.612.4615.833.37-0.23
Slow15.812.4615.833.37-0.03
Slow15.812.4615.833.37-0.03
Slow16.012.4615.833.370.17
Slow15.812.4615.833.37-0.03
Fast9.512.469.4-3.050.09
Fast9.512.469.4-3.050.09
Fast9.612.469.4-3.050.19
Fast8.812.469.4-3.05-0.61
Fast9.512.469.4-3.050.09
Fast9.812.469.4-3.050.39
Fast9.112.469.4-3.05-0.31
Fast10.312.469.4-3.050.89
Fast9.512.469.4-3.050.09
Fast8.512.469.4-3.05-0.91

Finally, I’ll calculate the total variation. I will subtract the grand mean (ȳ..) from each yield value (yij), and this is called Total.

xyij ȳ..i.ȳi.ȳ..yiji.yij..
FertilizerYieldGrand meanFertilizer meanFertilizer effectResidualsTotal
Control12.212.4612.13-0.33 0.07-0.26
Control12.412.4612.13-0.330.27-0.06
Control11.912.4612.13-0.33-0.23-0.56
Control11.312.4612.13-0.33-0.83-1.16
Control11.812.4612.13-0.33-0.33-0.66
Control12.112.4612.13-0.33-0.03-0.36
Control13.112.4612.13-0.330.970.64
Control12.712.4612.13-0.330.570.24
Control12.412.4612.13-0.330.27-0.06
Control11.412.4612.13-0.33-0.73-1.06
Slow16.612.4615.833.370.774.14
Slow15.812.4615.833.37-0.033.34
Slow16.512.4615.833.370.674.04
Slow15.012.4615.833.37-0.832.54
Slow15.412.4615.833.37-0.432.94
Slow15.612.4615.833.37-0.233.14
Slow15.812.4615.833.37-0.033.34
Slow15.812.4615.833.37-0.033.34
Slow16.012.4615.833.370.173.54
Slow15.812.4615.833.37-0.033.34
Fast9.512.469.4-3.050.09-2.96
Fast9.512.469.4-3.050.09-2.96
Fast9.612.469.4-3.050.19-2.86
Fast8.812.469.4-3.05-0.61-3.66
Fast9.512.469.4-3.050.09-2.96
Fast9.812.469.4-3.050.39-2.66
Fast9.112.469.4-3.05-0.31-3.36
Fast10.312.469.4-3.050.89-2.16
Fast9.512.469.4-3.050.09-2.96
Fast8.512.469.4-3.05-0.91-3.96

I calculated each variation, and we can prove Total variation = Treatment effect + Error.

  • Total Variation refers to the overall variability in the data.
  • Treatment Effect captures the variability attributable to the differences between treatment groups or conditions.
  • Error (or residual variation) accounts for the variability within each treatment group that is not explained by the treatment itself.

For example, in the first yield value in the table, -0.26 = – 0.33 + 0.07

Also, yield can be partitioned into yield= grand mean + Treatment effect + Error.

For example, in the first yield value in the table, 12.2 = 12.46 – 0.33 + 0.07

and it could be written as yij = μ + τi+ εij where μ is the grand mean, τi is the treatment effect, and εij is residuals.






From now on, I’m interested in the magnitude of each variation. When summing them, you’ll find that the total is always 0. The sum of the deviations (each value – the mean) is always 0. To avoid this result, I’ll square each deviation.

xyij ȳ..i.ȳi.ȳ..yiji.yij..(i...)2(yiji.)2(yij..)2
FertilizerYieldGrand
mean
Fertilizer
mean
Fertilizer
effect
ResidualsTotal Fertilizer
effect^2
Residuals^2Total^2
Control12.212.4612.13-0.33 0.07-0.260.110.000.07
Control12.412.4612.13-0.330.27-0.060.110.070.00
Control11.912.4612.13-0.33-0.23-0.560.110.050.31
Control11.312.4612.13-0.33-0.83-1.160.110.691.34
Control11.812.4612.13-0.33-0.33-0.660.110.110.43
Control12.112.4612.13-0.33-0.03-0.360.110.000.13
Control13.112.4612.13-0.330.970.640.110.940.41
Control12.712.4612.13-0.330.570.240.110.320.06
Control12.412.4612.13-0.330.27-0.060.110.070.00
Control11.412.4612.13-0.33-0.73-1.060.110.531.12
Slow16.612.4615.833.370.774.1411.380.5917.17
Slow15.812.4615.833.37-0.033.3411.380.0011.18
Slow16.512.4615.833.370.674.0411.380.4516.35
Slow15.012.4615.833.37-0.832.5411.380.696.47
Slow15.412.4615.833.37-0.432.9411.380.188.66
Slow15.612.4615.833.37-0.233.1411.380.059.88
Slow15.812.4615.833.37-0.033.3411.380.0011.18
Slow15.812.4615.833.37-0.033.3411.380.0011.18
Slow16.012.4615.833.370.173.5411.380.0312.56
Slow15.812.4615.833.37-0.033.3411.380.0011.18
Fast9.512.469.4-3.050.09-2.969.280.018.74
Fast9.512.469.4-3.050.09-2.969.280.018.74
Fast9.612.469.4-3.050.19-2.869.280.048.16
Fast8.812.469.4-3.05-0.61-3.669.280.3713.37
Fast9.512.469.4-3.050.09-2.969.280.018.74
Fast9.812.469.4-3.050.39-2.669.280.157.06
Fast9.112.469.4-3.05-0.31-3.369.280.1011.27
Fast10.312.469.4-3.050.89-2.169.280.794.65
Fast9.512.469.4-3.050.09-2.969.280.018.74
Fast8.512.469.4-3.05-0.91-3.969.280.8315.66
Σ 207.687.11214.79

From now on, we can sum each squared variation. Therefore, I will refer to them as the sum of squared Treatment (Fertilizer), the Sum of Squared Errors (residuals), and the Sum of Squared Total. As abbreviations, I’ll write them as SSTreatment , SSError, and SSTotal and we can also prove the below formula.

SSTotal = SSTreatment + SSError

which is 214.79 = 207.68 + 7.11

and mathematical formula, it can be written as

Σ(yij - ȳ..)2 = Σ(i. - ȳ..)2 + Σ(yij - ȳi.)2

As summarized in the figure below.






What is Mean Squared Error (MSE)?

Do you know the equation of variance?

If we know the population mean, we can calculate population variance (σ2) as follows.

However, in most cases, we do not know the population mean, and therefore, we take some samples (n) from the population and will use the mean of samples to calculate sample variance as follows.

Let’s connect the concept of the variance equation above with the concepts of SSTreatment , SSError, and SSTotal

Variance is calculated as the Sum of Squared (SS) divided by the degree of freedom (n-1).

We already calculated SSTreatment , SSError, and SSTotal and degree of freedom of each case is calculated as

df SSTotal = N - 1 (where N is total number of samples)
df SSTreatment = t - 1 (where t is number of levels of the treatment)
df SSError = N- t

So, we can calculate the variance of Total, Treatment, and Error as follows.

▪ Variance of Total = 214.79 / 30 = 7.1597
▪ Variance of Treatment = 207.69 / (3 – 1) = 103.845
▪ Variance of Error = 7.11 / (30 – 3) = 0.263

Also, we can refer to them as the Mean Squared of Total, Treatment, and Error since we divide the Sum of Squares by the sample size (specifically n-1 for sampling data). Therefore, the Mean Squared Treatment, and the Mean Squared Error are equivalent to the Variance of Treatment and Error, respectively. In statistics, the Mean Squared Error is often referred to as MSE for simplicity.

Therefore, in the dataset, the MSE is 0.236, which represents the variance of the error in the data.

Let’s check this calculation is correct using a statistical program. I use R.

First, I’ll upload the dataset.

if(!require(readr)) install.packages("readr")
library(readr)
github="https://raw.githubusercontent.com/agronomy4future/raw_data_practice/refs/heads/main/Fertilizer%20(One%20Way%20ANOVA).csv"
df= data.frame(read_csv(url(github), show_col_types=FALSE))

head(df,5)
  Fertilizer Yield Height
1    Control  12.2     45
2    Control  12.4     52
3    Control  11.9     42
4    Control  11.3     35
5    Control  11.8     40
.
.
.

I will conduct a statistical analysis using One-Way ANOVA.

model= aov(Yield ~ Fertilizer, data=df)
summary (model)

            Df Sum Sq Mean Sq F value Pr(>F)    
Fertilizer   2 207.68  103.84   394.3 <2e-16 ***
Residuals   27   7.11    0.26      

MSE is the same as what I calculate manually.






How about Mean Absolute Error (MAE)?

This is the equation for MAE. It’s simple: instead of squaring the residuals, you take the absolute value and divide by the number of samples (n).

This is the difference between MSE and MAE.

Let’s calculate MAE according to the equation.

xyij ȳ..i.ȳi.ȳ..yiji.yij..(i...)2(yiji.)2(yij..)2|yiji.|
FertilizerYieldGrand
mean
Fertilizer
mean
Fertilizer
effect
ResidualsTotal Fertilizer
effect^2
Residuals^2Total^2Absolute
Error
Control12.212.4612.13-0.33 0.07-0.260.110.000.070.07
Control12.412.4612.13-0.330.27-0.060.110.070.000.27
Control11.912.4612.13-0.33-0.23-0.560.110.050.310.23
Control11.312.4612.13-0.33-0.83-1.160.110.691.340.83
Control11.812.4612.13-0.33-0.33-0.660.110.110.430.33
Control12.112.4612.13-0.33-0.03-0.360.110.000.130.03
Control13.112.4612.13-0.330.970.640.110.940.410.97
Control12.712.4612.13-0.330.570.240.110.320.060.57
Control12.412.4612.13-0.330.27-0.060.110.070.000.27
Control11.412.4612.13-0.33-0.73-1.060.110.531.120.73
Slow16.612.4615.833.370.774.1411.380.5917.170.77
Slow15.812.4615.833.37-0.033.3411.380.0011.180.03
Slow16.512.4615.833.370.674.0411.380.4516.350.67
Slow15.012.4615.833.37-0.832.5411.380.696.470.83
Slow15.412.4615.833.37-0.432.9411.380.188.660.43
Slow15.612.4615.833.37-0.233.1411.380.059.880.23
Slow15.812.4615.833.37-0.033.3411.380.0011.180.03
Slow15.812.4615.833.37-0.033.3411.380.0011.180.03
Slow16.012.4615.833.370.173.5411.380.0312.560.17
Slow15.812.4615.833.37-0.033.3411.380.0011.180.03
Fast9.512.469.4-3.050.09-2.969.280.018.740.09
Fast9.512.469.4-3.050.09-2.969.280.018.740.09
Fast9.612.469.4-3.050.19-2.869.280.048.160.19
Fast8.812.469.4-3.05-0.61-3.669.280.3713.370.61
Fast9.512.469.4-3.050.09-2.969.280.018.740.09
Fast9.812.469.4-3.050.39-2.669.280.157.060.39
Fast9.112.469.4-3.05-0.31-3.369.280.1011.270.31
Fast10.312.469.4-3.050.89-2.169.280.794.650.89
Fast9.512.469.4-3.050.09-2.969.280.018.740.09
Fast8.512.469.4-3.05-0.91-3.969.280.8315.660.91
Σ 207.687.11214.7911.18

MAE = 11.18 / 30 = 0.373

Mean Absolute Error (MAE) is calculated as 0.373. Let’s check this calculation is correct.

if(!require(Metrics)) install.packages("Metrics")
library(Metrics)

model= aov(Yield ~ Fertilizer, data=df)

mae(df$Yield, predict(model))
0.3726667

The value is the same.










Comments are closed.