Augment Models: How to Calculate Contrasts and Analyze Your Data with Excel and R?

Augment Models: How to Calculate Contrasts and Analyze Your Data with Excel and R?






I have the following data.

NitrogenSulphurRepYield
0011.0
0020.9
0030.8
N1S111.0
N1S121.2
N1S131.3
N1S212.1
N1S222.2
N1S232.3
N2S111.4
N2S121.6
N2S131.7
N2S212.5
N2S222.6
N2S232.8

Let’s assume that this data is the result of investigating how the yield responds to two different nitrogen fertilizer treatments (N1; 150 and N2; 200 kg ha-1) and two different sulfuric acid fertilizer treatments (S1; 24 and S2; 48 kg ha-1). After finalizing the experimental design in this way, I later realized that I should include a control group (= without any nitrogen or sulfuric acid fertilizer treatments).

If it were a basic factorial experiment, the data structure would probably be as shown below.

Treatment No. = 3 leves (0, 150, 200 kg ha-1) *  3 lelves (0, 24, 48 kg ha-1) = 9
Experimental Unit = 9 * 3 reps = 27 

However, the field layout had already been completed and there was no space available to include the control plots. Therefore, we forced the placement of only one control plot in each rep. As a result, the data structure became as follows.

This data structure is not a full factorial experiment. There are several terms used to describe this type of data structure. It is sometimes referred to as ‘factorial plus added control,’ which is the method used to analyze this type of data structure in the Genstat program. (Please see page 46 at https://genstat.kb.vsni.co.uk/wp-content/uploads/sites/2/AnovaGuide.pdf). The broadest term used to describe this is augmented design, and the method used to analyze augmented design is called Contrasts.

Note!!
Augmented models are not only used for unbalanced designs, but also for balanced designs. The use of contrasts is a general statistical approach that is applicable to various experimental designs and research questions. Contrasts are used to analyze treatment effects while controlling for other factors in the experiment, which is important for obtaining reliable and accurate statistical results. The augmented model is a type of linear model that uses contrasts to analyze the data and make statistical inferences about the treatment effects.





How to calculate Contrasts?

Contrasts can be used to perform significance tests for structured qualitative factors. They can help determine if there are significant differences between the levels of a factor by comparing the mean values of different factor levels.

There are five treatments in total (Control, N1S1, N1S2, N2S1, N2S2), and we are interested in answering the following questions:

  1. Is there a response to nitrogen application?
  2. Do varying levels of nitrogen application have an impact on the final yield?
  3. Is there a difference in response to sulphur application (S1 and S2)?
  4. Is there an interaction between nitrogen application (N1 and N2) and sulphur application (S1 and S2)?

To answer these questions statistically, the best approach is to use contrasts of significance rather than LSD or Tukey’s HSD tests. Contrasts are linear combinations of the different treatments and are a more powerful way to compare treatment means.

That is, the null hypothesis µ1 = µ2 represents a contrast of significance, as it can be written as a linear combination of µ1 = µ2 = 0 in which the sum of the two coefficients is equal to zero. We will see that every contrast has an associated sum of squares with a single degree of freedom, and we will use planned F-tests to evaluate them.






Therefore, if we want to get an answer to the first question, we can assess the following contrast:

H0:  µcontrol – (µN1S1 + µN1S2 + µN2S1 + µN2S2) / 4 = 0
Ha:  µcontrol – (µN1S1 + µN1S2 + µN2S1 + µN2S2) / 4 ≠ 0

This will be equivalent to

H0:  4µcontrol – µN1S1 – µN1S2 – µN2S1 – µN2S2 = 0
Ha:  4µcontrol – µN1S1 – µN1S2 – µN2S1 – µN2S2 ≠ 0

The coefficients of the linear combination defining the first contrast are 4, -1, -1, -1, -1 and their sum is naturally equal to zero.






The second contrast deals with a possible amount effect of nitrogen application. Therefore, we have to compare the means with low (N1) vs high (N2) nitrogen application amounts. It can be written as:

H0:  (µN1S1 + µN1S2) / 2 - (µN2S1 + µN2S2) / 2 = 0
Ha:  (µN1S1 + µN1S2) / 2 - (µN2S1 + µN2S2) / 2 ≠ 0

This will be equivalent to

H0:  µN1S1 + µN1S2 - µN2S1 - µN2S2 = 0
Ha:  µN1S1 + µN1S2 - µN2S1 - µN2S2 ≠ 0

The coefficients of this second contrast are: 0, 1, 1, -1, -1






Third, the difference in response to sulphur application (S1 and S2) can be written as:

H0:  (µN1S1 + µN2S1) / 2 - (µN1S2 + µN2S2) / 2 = 0
H0:  (µN1S1 + µN2S1) / 2 - (µN1S2 + µN2S2) / 2 ≠ 0

This will be equivalent to

H0:  µN1S1 + µN2S1 - µN1S2 - µN2S2 = 0
H0:  µN1S1 + µN2S1 - µN1S2 - µN2S2 ≠ 0

and re-arranged as

H0:  µN1S1 - µN1S2 + µN2S1 - µN2S2 = 0
H0:  µN1S1 - µN1S2 + µN2S1 - µN2S2 ≠ 0

The coefficients of this second contrast are: 0, 1, -1, 1, -1






Fourth, the interaction between nitrogen application and sulphur application would be simply calcuated by multiflying 2nd and 3rd Contrasts.

Finally, the five Contrasts will be defined by the following values.

ContrastsControlµN1S1µN1S2µN2S1µN2S2
(1) Control vs N treatment4-1-1-1-1
(2) N1 vs N2 011-1-1
(3) S1 vs S201-11-1
(2) * (3) 01-1-11





How to calculate sum of squares of Contrasts?

It could be complicated, but I’ll explain it in an easy way.

1) to calculate the sum of yield of each treatment.

First, I calculated the sum of yields of each treatment.

ContrastsControlµN1S1µN1S2µN2S1µN2S2
Total of 3 reps (as block)2.73.56.04.77.9

2) to calcualte the sum of squares of each Contrast.

Now, let’s calculate the sum of squares of each contrast.

ContrastsControlµN1S1µN1S2µN2S1µN2S2
Total of 3 reps (as block)2.73.56.04.77.9
(1) Control vs N treatment4-1-1-1-1

Where Ti is the total of the ith treatment in the n=3 reps. li is the coefficient of the linear combination for treatment it .

Using this, we can calculate the sum of squares of each contrast as shown in the table below.

ContrastsControlµN1S1µN1S2µN2S1µN2S2n∑li2∑(Tili)2SS
Total of 3 reps (as block)0.91.22.21.62.6???
(1) Control vs N treatment4-1-1-1-160141.612.36
(2) N1 vs N2 011-1-1120.640.52
(3) S1 vs S201-11-1124.03.31
(2) * (3) 01-1-11120.00.00
SS: Sum of squares

Then, we need to calculate the sum of squares of the total. Simply add all of the sum of squares of each contrast.

2.36 + 0.52 + 3.31 + 0.00 = 6.19

Is that correct? I’ll run ANOVA using R.

Treatment=c("Cotrol","Cotrol","Cotrol","N1_S1","N1_S1","N1_S1","N1_S2","N1_S2","N1_S2","N2_S1","N2_S1","N2_S1","N2_S2","N2_S2","N2_S2")
Rep=rep(c(1,2,3),time=5)
Yield=c(1,0.9,0.8,1,1.2,1.3,2.1,2.2,2.3,1.4,1.6,1.7,2.5,2.6,2.8)
dataA=data.frame(Treatment, Rep,Yield)

summary(aov(Yield~Treatment + factor(Rep), data=dataA))

            Df   Sum Sq   Mean Sq   F value     Pr(>F)    
Treatment    4   6.189    1.5473    125.459     2.99e-07 ***
factor(Rep)  2   0.081    0.0407    3.297       0.0903 .  
Residuals    8   0.099    0.0123                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

In this example, we have a set of t-1=4 orthogonal contrasts. If we add up the sum of squares for each contrast, their sum will exactly equal the Treatment sum of squares (6.189 ≈ 6.19) for the original experiment.

ContrastsControlµN1S1µN1S2µN2S1µN2S2n∑li2∑(Tili)2SS
Total of 3 reps (as block)0.91.22.21.62.66.189
(1) Control vs N treatment4-1-1-1-160141.612.36
(2) N1 vs N2 011-1-1120.640.52
(3) S1 vs S201-11-1124.03.31
(2) * (3) 01-1-11120.00.00
SS: Sum of squares

This means that an experiment can be partitioned into (t – 1) separate, independent experiments, one for each contrast.

Source of variationdfSSMSF-ratiop-value
Total146.369
Reps (as block)20.0810.04073.297n.s.
Treatment46.1891.5473125.459***
(1) Control vs N treatment12.362.36191.883***
(2) N1 vs N210.520.5242.344***
(3) S1 vs S2 13.313.31268.902***
(2) * (3) 10.000.000.068n.s.
Error80.0990.0123
i.e., 2.36/0.0123 ≈ 192

We can easily obtain the p-value in Excel if we know the F-ratio and degrees of freedom for treatment and error.

=FDIST(191.883,1,8)   ***
=FDIST(42.344,1,8)    ***
=FDIST(268.902,1,8)   *** 
=FDIST(0.068,1,8)     n.s.

The following post explains how to calculate F-ratio.


What is the F-ratio in statistics?






How to analyze an augment model using R?

Of course, we don’t have to manually calculate what I showed above. Now, I’ll explain how to analyze an augment model using R. I uploaded the data in R in the format shown below.

Treatment=c("Cotrol","Cotrol","Cotrol","N1_S1","N1_S1","N1_S1","N1_S2","N1_S2","N1_S2","N2_S1","N2_S1","N2_S1","N2_S2","N2_S2","N2_S2")
Rep=rep(c(1,2,3),time=5)
Yield=c(1,0.9,0.8,1,1.2,1.3,2.1,2.2,2.3,1.4,1.6,1.7,2.5,2.6,2.8)
dataA=data.frame(Treatment, Rep,Yield)

Let’s quickly check the data. Roughly, I can estimate the treatment effects, but let’s analyze step by step.

boxplot(Yield~Treatment,
        data=dataA,
        ylab="Yield",
        xlab="Treatment")

1) to decide on the order of the factor levels to determine the contrasts

If you followed my current post well, you can understand why the order of the factor levels is important. Depending on the order, the contrast will be completely changed.

Let’s check the order of the factor levels. It seems that the order is the same as what I manually

library(base)
levels(factor(dataA$Treatment))

[1] "Cotrol" "N1_S1"  "N1_S2"  "N2_S1"  "N2_S2" 

2) to set up Contrasts

I used the code below to provide the contrasts.

Contrasts=list(Ctrl_N=c(4,-1,-1,-1,-1),
               N1_N2=c(0,1,1,-1,-1),
               S1_S2=c(0,1,-1,1,-1),
               N_S=c(0,1,-1,-1,1))

The most tricky part of the augment model is from here because statistical programs do not automatically calculate contrasts. At least, we should calculate the contrasts according to our experimental design and what we want to analyze.

For example, the above picture is about setting up contrasts using JMP. Even though it seems different, the principle is the same. In this case, it would be the same as 1, 1, 1, -4, 1

3) to define the linear model

Now, we need to define the linear model. My model will be a linear model between yield and treatment with a block. Let’s see the statistical output. This is the same as what I showed before.

model= lm(Yield~Treatment + factor(Rep), data=dataA))

library(car)
Anova(model, type="II")

Anova Table (Type II tests)
Response: Yield
             Sum Sq   Df    F value       Pr(>F)    
Treatment    6.1893    4    125.4595      2.993e-07 ***
factor(Rep)  0.0813    2    3.2973        0.09028 .  
Residuals    0.0987    8                       
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

4) to check p-values for partitioned Contrasts

I’ll run the code below. This result is the same as what I calculated above.

library(emmeans)
marginal=emmeans(model, ~Treatment)
marginal

Treatment    emmean   SE       df   lower.CL   upper.CL
 Cotrol      0.90     0.0775   10    0.727     1.07
 N1_S1       1.17     0.0775   10    0.994     1.34
 N1_S2       2.20     0.0775   10    2.027     2.37
 N2_S1       1.57     0.0775   10    1.394     1.74
 N2_S2       2.63     0.0775   10    2.461     2.81
Confidence level used: 0.95 

##
contrast(marginal, Contrasts, adjust="sidak")

contrast  estimate    SE df      t.ratio    p.value
Ctrl_N    -3.9667     0.346 10   -11.451    <.0001
N1_N2     -0.8333     0.155 10   -5.379     0.0012
S1_S2     -2.1000     0.155 10   -13.555    <.0001
N_S        0.0333     0.155 10    0.215     0.9992
P value adjustment: sidak method for 4 tests

###

Contrasts matching
Ctrl_N = (1) control vs N treatment
N1_N2 = (2) N1 vs N2
S1_S2 = (3) S1 vs S2
N_S = 2*3

To answer the questions!!!

  1. Is there a response to nitrogen application?
  2. Do varying levels of nitrogen application have an impact on the final yield?
  3. Is there a difference in response to sulphur application (S1 and S2)?
  4. Is there an interaction between nitrogen application (N1 and N2) and sulphur application (S1 and S2)?

1) Is there a response to nitrogen application?
In this example, based on the previous ANOVA table and the means of each treatment, we can conclude that the application of nitrogen increases yield (based on the contrast; (1) control vs N treatment).

2) Do varying levels of nitrogen application have an impact on the final yield?
There is an effect of nitrogen amount (based on the contrast; (2) N1 vs N2).

3) Is there a difference in response to sulphur application (S1 and S2)?
It is better to localize nitrogen in the S2 treatment rather than the S1 treatment (based on the contrast; (3) S1 vs S2).

4) Is there an interaction between nitrogen application (N1 and N2) and sulphur application (S1 and S2)?
There is no interaction between the nitrogen and sulphur treatments (based on the contrast; (2)*(3)).






How about using the simple ANOVA model?

Someone might ask whether we can simply use the ANOVA model or not. I would like to say it’s up to you!! Let’s analyze the same data with a two-way ANOVA with a block.

# Data generation
Nitrogen=c("Cotrol","Cotrol","Cotrol","N1","N1","N1","N1","N1","N1","N2","N2","N2","N2","N2","N2")
Sulphur=c("Cotrol","Cotrol","Cotrol","S1","S1","S1","S2","S2","S2","S1","S1","S1","S2","S2","S2")
Rep=rep(c(1,2,3),time=5)
Yield=c(1,0.9,0.8,1,1.2,1.3,2.1,2.2,2.3,1.4,1.6,1.7,2.5,2.6,2.8)
dataA=data.frame(Nitrogen, Sulphur, Rep, Yield)

# two-way ANOVA
Model=(aov(Yield~Nitrogen+Sulphur+Nitrogen:Sulphur + factor(Rep), data=dataA)
summary(Model)

                 Df   Sum Sq  Mean Sq  F value    Pr(>F)    
Nitrogen          2   2.881   1.440    116.797    1.20e-06 ***
Sulphur           1   3.307   3.307    268.176    1.95e-07 ***
factor(Rep)       2   0.081   0.041    3.297      0.0903 .  
Nitrogen:Sulphur  1   0.001   0.001    0.068      0.8015    
Residuals         8   0.099   0.012                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

# LSD test
library(agricolae)
LSD_Nitrogen=LSD.test(Model,"Nitrogen")
LSD_Nitrogen

          Yield         groups
N2        2.100000      a
N1        1.683333      b
Cotrol    0.900000      c


LSD_Sulphur=LSD.test(Model,"Sulphur")
LSD_Sulphur

          Yield         groups
S2        2.416667      a
S1        1.366667      b
Cotrol    0.900000      c

The interpretation would be the same as what I did using Contrasts. However, the data I used are very simple. The actual field data are much more complicated, and therefore we can’t say the simple ANOVA model will be the same as Contrasts.

# Full code
# to generate data
Treatment=c("Cotrol","Cotrol","Cotrol","N1_S1","N1_S1","N1_S1","N1_S2","N1_S2","N1_S2","N2_S1","N2_S1","N2_S1","N2_S2","N2_S2","N2_S2")
Rep=rep(c(1,2,3),time=5)
Yield=c(1,0.9,0.8,1,1.2,1.3,2.1,2.2,2.3,1.4,1.6,1.7,2.5,2.6,2.8)
dataA=data.frame(Treatment, Rep,Yield)

# to set up Contrasts
Contrasts=list(Ctrl_N=c(4,-1,-1,-1,-1),
               N1_N2=c(0,1,1,-1,-1),
               S1_S2=c(0,1,-1,1,-1),
               N_S=c(0,1,-1,-1,1))

# to define model
model= lm(Yield~Treatment + factor(Rep), data=dataA)

# Output
library(emmeans)
marginal=emmeans(model, ~Treatment)
contrast(marginal, Contrasts, adjust="sidak")





Another approach: Treatments as members of a group

I mentioned that augmented models are used not only for unbalanced designs but also for balanced designs. A common use is when there are several treatments that can be thought of as members of a group.

FertilizerCode OrganicBiodegradationCharacteristics
Control no NControlNo N
Calcium NitrateCa(NO3)2Inorganicnot persistentN, Inorganic, not persistent
Sodium NitrateNaNO3Inorganicnot persistentN, Inorganic, not persistent
Amonium Sulfate(NH4)2SO4InorganicpersistentN, Inorganic, persistent
UreaCO(NH2)2OrganicN, Organic

For example, different types of fertilizer were used to investigate their effects on crop yield. Fertilizer type would be categorized as different groups, as described above.

My interests are below.

  1. Is there a response to N application?
  2. Are there differences between organic and inorganic nitrogen application?
  3. Is there a difference in response when applying persistent versus non-persistent inorganic N?
  4. Are there differences between the two types of non-persistent inorganic N?

Those questions are relevant groups. In this case, contrasts are a useful way to compare ‘as a group’ or ‘within a group’.


Now we can set up contrasts for each case. For example, for the first question, we can assess the following contrast:

H0: µcontrol – (µCalcium Nitrate + µSodium Nitrate + µAmonium Sulfate + µUrea) / 4 = 0
Ha: µcontrol – (µCalcium Nitrate + µSodium Nitrate + µAmonium Sulfate + µUrea) / 4 ≠ 0

therefore, the contrast will be 4, -1, -1, -1, -1


For second queation?

H0: µUrea – (µCalcium Nitrate + µSodium Nitrate + µAmonium Sulfate) / 3 = 0

and re-arranged as

H0: – µCalcium Nitrate – µSodium Nitrate – µAmonium Sulfate + 3 µUrea = 0
Ha: – µCalcium Nitrate – µSodium Nitrate – µAmonium Sulfate + 3 µUrea ≠ 0

therefore, the contrast will be 0, -1, -1, -1, 3


For third question [NH4 vs NO3]

H0: (µCalcium Nitrate + µSodium Nitrate)/ 2 – µAmonium Sulfate = 0
Ha: (µCalcium Nitrate + µSodium Nitrate)/ 2 – µAmonium Sulfate ≠ 0

and re-arranged as

H0: µCalcium Nitrate + µSodium Nitrate – 2 µAmonium Sulfate = 0
Ha: µCalcium Nitrate + µSodium Nitrate - 2 µAmonium Sulfate ≠ 0

therefore, the contrast will be 0, 1, 1, -2, 0


For the last question,

µCalcium Nitrate – µSodium Nitrate = 0
µCalcium Nitrate – µSodium Nitrate ≠ 0

therefore, the contrast will be 0, 1, -1, 0, 0

To summarize, the contrasts will be as follows:

FertilizerControlCa(NO3)2
Calcium Nitrate
NaNO3
Sodium Nitrate
(NH4)2SO4
Amonium Sulfate
CO(NH2)2
Urea
Q1 [N0_N]4-1-1-1-1
Q2 [Org_Inorg]0-1-1-13
Q3 [NH4_NO3]01-1-20
Q4 [Within NO3]01-100

We know how to analyze an augmented model. Here is the full R code.

Fertilizer=rep(c("Control","Calcium Nitrate","Sodium Nitrate","Amonium Sulfate","Urea"),each=4)
Block=rep(c("I","II","III","IV"),time=5)
Yield=c(265,250,278,220,358,366,424,361,365,330,330,305,402,392,427,377,365,370,383,350)
dataA=data.frame(Fertilizer,Block,Yield)

       Fertilizer      Block  Yield
1          Control     I      265
2          Control     II     250
3          Control     III    278
4          Control     IV     220
5  Calcium Nitrate     I      358
6  Calcium Nitrate     II     366
7  Calcium Nitrate     III    424
8  Calcium Nitrate     IV     361
9   Sodium Nitrate     I      365
10  Sodium Nitrate     II     330
11  Sodium Nitrate     III    330
12  Sodium Nitrate     IV     305
13 Amonium Sulfate     I      402
14 Amonium Sulfate     II     392
15 Amonium Sulfate     III    427
16 Amonium Sulfate     IV     377
17            Urea     I      365
18            Urea     II     370
19            Urea     III    383
20            Urea     IV     350

# to set up Contrasts
Contrasts=list(N0_N=c(4,-1,-1,-1,-1),
               Org_Inorg=c(0,-1,-1,-1,3),
               NH4_NO3=c(0,1,-1,-2,0),
               WithinNO3=c(0,1,-1,0,0))

# to set up the order of levels
dataA$Fertilizer=factor(dataA$Fertilizer, levels=c("Control", "Calcium Nitrate","Sodium Nitrate","Amonium Sulfate","Urea")) 

# to define model
model= lm(Yield~Fertilizer + Block, data=dataA)

# Output
library(emmeans)
marginal=emmeans(model, ~Fertilizer)
contrast(marginal, Contrasts, adjust="sidak")

 contrast  estimate    SE     df   t.ratio    p.value
 N0_N        -463.2    35.7   12   -12.988    <.0001
 Org_Inorg   -146.2    27.6   12   -5.293     0.0008
 NH4_NO3     -711.8    19.5   12   -36.432    <.0001
 WithinNO3     22.2    11.3   12    1.973     0.2584
 Results are averaged over the levels of: Block 
 P value adjustment: sidak method for 4 tests 

Using ANOVA, we can only analyze the difference in yield according to types of fertilizer. However, when we analyze using contrasts, we can compare data either ‘as a group’ or ‘within a group’.

ANOVA=aov(Yield~Fertilizer+Block, data=dataA)

            Df   Sum Sq   Mean Sq   F value    Pr(>F)    
Fertilizer   4   52258    13065     51.346     1.89e-07 ***
Block        3   5468     1823      7.164      0.00516 ** 
Residuals   12   3053     254                     
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

library(agricolae)
LSD_Fertilizer=LSD.test(ANOVA,"Fertilizer")
LSD_Fertilizer

                 Yield groups
Calcium Nitrate 399.50      a
Sodium Nitrate  377.25     ab
Amonium Sulfate 367.00      b
Urea            332.50      c
Control         253.25      d

Reference
□ Design and Analysis of Experimentsin Forestry and Agri Food Sciences: A Doctoral Coursework at the University of Lleida, 2018
An R Companion for the Handbook of Biological Statistics






Leave a Reply

If you include a website address in the comment section, I cannot see your comment as it will be automatically deleted and will not be posted. Please refrain from including website addresses.