What is split-split-plot design in agronomy research (feat. using R and SAS)?

March 12, 2023 JK

In my previous post, I explained what the split-plot design and its statistical model are, and how it differs from RCBD.

What is split-plot design in agronomy research?

I explained that the main difference between the split-plot design and RCBD is that the split-plot design divides the error into two (error a and error b), which increases the significance of the interaction between the main plot and sub-plot.

Now our interest lies in cases where we have three factors. A split-plot design typically has two factors (such as genotype and treatment, as I mentioned in my previous post), with genotypes set up as the main plot and treatments as the sub-plot. When we have three factors, we can use an experimental design called a split-split plot design.

What is split-split plot design?

Now, I’d like to conduct a field experiment to elucidate how yield responds to planting date, herbicide application and nitrogen amount. Each factor has two levels, and there are three replicates.

▪ Planting date: 
    Early and Late sowing
▪ Herbicide application: 
    H0 and H1, where H0 = no herbicide and H1 = herbicide
▪ Nitrogen amount: 
    N0 and N1, where N0 = no nitrogen and N1 = 200 kg/ha N
▪ Replicates:
    Three (as blocks)

Therefore, the number of treatments is 8 (= 2 * 2 * 2), and the number of experimental units is 24 (= 8 * 3). If we set up the experiment as a Randomized Complete Block Design (RCBD), the plot layout would look like the one below.
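As a quick check of these counts, here is a minimal R sketch that lists the treatment combinations and randomizes all eight of them within each block, as an RCBD would. The object names (treatments, rcbd_layout) and the seed are my own illustrative assumptions, not part of the original analysis.

```r
# Minimal sketch: enumerate the 2 x 2 x 2 treatments and randomize them per block (RCBD).
# Object names and the seed are illustrative assumptions.
set.seed(123)

treatments <- expand.grid(
  planting_date = c("Early", "Late"),
  herbicide     = c("H0", "H1"),
  nitrogen      = c("N0", "N1"),
  stringsAsFactors = FALSE
)

n_treatments <- nrow(treatments)          # 2 * 2 * 2 treatment combinations
n_blocks     <- 3                         # replicates, used as blocks
n_units      <- n_treatments * n_blocks   # experimental units

# In an RCBD, every block receives all treatments in an independent random order.
rcbd_layout <- do.call(rbind, lapply(seq_len(n_blocks), function(b) {
  shuffled <- treatments[sample(n_treatments), ]
  cbind(block = paste("Block", b), plot = seq_len(n_treatments), shuffled)
}))
rownames(rcbd_layout) <- NULL

print(n_treatments)
print(n_units)
print(head(rcbd_layout, 8))   # layout of the first block
```

Because every plot can receive any of the eight combinations, early and late sowing plots end up mixed within a block.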

All treatment combinations are randomized within each block. However, this design has some practical issues. For instance, when we plant the late-sowing plots, the tractor cannot pass through the plots (indicated by the arrow direction) because crops are already growing in the early-sowing plots. It would also be difficult to apply herbicide only to specific plots. If we divide the plots, these problems are solved.

How to divide plots?

First, I divide each block by planting date. These are the main plots.

Second, I divide each main plot into secondary plots and randomly assign herbicide application to them. These are the secondary plots.

Third, I divide each secondary plot into tertiary plots and randomly assign nitrogen application to them. These are the tertiary plots.
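The three steps above can be sketched in base R as nested randomization. This is only an illustration under my own assumptions (the function name randomize_block, the seed, and the column names are hypothetical); it is not the layout used in the actual field.

```r
# Minimal sketch of split-split plot randomization (hypothetical helper, base R only).
set.seed(123)

randomize_block <- function(block_id) {
  rows <- list()
  for (pd in sample(c("Early", "Late"))) {    # step 1: main plots (planting date)
    for (h in sample(c("H0", "H1"))) {        # step 2: secondary plots within each main plot
      for (n in sample(c("N0", "N1"))) {      # step 3: tertiary plots within each secondary plot
        rows[[length(rows) + 1]] <- data.frame(
          block = block_id, planting_date = pd, herbicide = h, nitrogen = n
        )
      }
    }
  }
  do.call(rbind, rows)
}

ssp_layout <- do.call(rbind, lapply(paste("Block", 1:3), randomize_block))
ssp_layout$position <- rep(1:8, times = 3)   # physical order of plots within a block

print(ssp_layout[ssp_layout$block == "Block 1", ])
```

Within each block, positions 1 to 4 always share one planting date and positions 5 to 8 the other, which is what lets the tractor work on one half of the block at a time.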

Then, let’s compare the plot layout between RCBD and the split-split plot design. In the 1st block, the number of plots is 8 in both designs (as there are 8 treatments), but the layout differs, as shown below.

In the split-split plot design, a tractor can now go through the late-sowing plots because the early and late sowing plots are completely separated. It is also much easier to apply herbicide (H1).

What is a statistical model for split-split plot design?

The statistical model for 3-way ANOVA is below.

y_ijkl = μ + α_i + β_j + γ_k + (αβ)_ij + (αγ)_ik + (βγ)_jk + (αβγ)_ijk + ρ_l + ε_ijkl

where
▪ y_ijkl = yield for treatment combination ijk (i = planting date, j = herbicide, k = nitrogen) in replicate (block) l
▪ μ = grand mean of yield
▪ α_i = the effect of planting date (i)
▪ β_j = the effect of herbicide (j)
▪ γ_k = the effect of nitrogen (k)
▪ (αβ)_ij = the interaction between planting date (i) and herbicide (j)
▪ (αγ)_ik = the interaction between planting date (i) and nitrogen (k)
▪ (βγ)_jk = the interaction between herbicide (j) and nitrogen (k)
▪ (αβγ)_ijk = the interaction among planting date (i), herbicide (j) and nitrogen (k)
▪ ρ_l = block effect (l)
▪ ε_ijkl = error

and the statistical model for split-split plot design is below.

y_ijkl = μ + ρ_i + α_j + (ρα)_ij + β_k + (αβ)_jk + (ραβ)_ijk + γ_l + (αγ)_jl + (βγ)_kl + (αβγ)_jkl + ε_ijkl

where
▪ y_ijkl = yield for treatment combination jkl (j = planting date, k = herbicide, l = nitrogen) in replicate (block) i
▪ μ = grand mean of yield
▪ ρ_i = block effect (i) 
▪ α_j = the effect of main plot; planting date (j)
▪ (ρα)_ij = error at main plot (error a)
▪ β_k = the effect of secondary plot; herbicide (k)
▪ (αβ)_jk = the interaction between main (planting date) and secondary plot (herbicide)
▪ (ραβ)_ijk = error at secondary plot (error b)
▪ γ_l = the effect of tertiary plot; nitrogen (l)
▪ (αγ)_jl = the interaction between main (planting date) and tertiary plot (nitrogen)
▪ (βγ)_kl = the interaction between secondary (herbicide) and tertiary plot (nitrogen)
▪ (αβγ)_jkl = the interaction among main (planting date), secondary (herbicide) and tertiary plot (nitrogen)
▪ ε_ijkl = error at tertiary plot (error c)
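To see what this model implies for the ANOVA table, here is a small R sketch of the degrees of freedom for each term with 3 blocks and 2 levels per factor. The symbols r, a, b and g (number of blocks and of levels of each factor) are my own shorthand, and the formulas are the usual ones for a balanced split-split plot design.

```r
# Degrees of freedom for a balanced split-split plot design (sketch; r, a, b, g are my shorthand).
r <- 3  # blocks
a <- 2  # planting date levels (main plot)
b <- 2  # herbicide levels (secondary plot)
g <- 2  # nitrogen levels (tertiary plot)

df <- c(
  "block"                            = r - 1,
  "planting_date"                    = a - 1,
  "error a"                          = (r - 1) * (a - 1),
  "herbicide"                        = b - 1,
  "planting_date:herbicide"          = (a - 1) * (b - 1),
  "error b"                          = a * (r - 1) * (b - 1),
  "nitrogen"                         = g - 1,
  "planting_date:nitrogen"           = (a - 1) * (g - 1),
  "herbicide:nitrogen"               = (b - 1) * (g - 1),
  "planting_date:herbicide:nitrogen" = (a - 1) * (b - 1) * (g - 1),
  "error c"                          = a * b * (r - 1) * (g - 1)
)

# In RCBD there is a single error term with (r - 1) * (a * b * g - 1) degrees of freedom.
df_error_rcbd <- (r - 1) * (a * b * g - 1)

print(df)
print(sum(df))          # total df = number of plots - 1
print(df_error_rcbd)    # equals error a + error b + error c
```

The main-plot effect is tested against error a, which has only 2 degrees of freedom here, while the RCBD error has 14.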

Let’s see how these two statistical models differ!

I’ll generate a data set.

# two sowing dates (the second level is labelled "Normal" in this data set)
sowing_date=rep(c("Early","Normal"), each=12)
herbicide=rep(rep(c("H0","H1"), each=6), 2)
# nitrogen levels are N0 (no nitrogen) and N1 (200 kg/ha N)
nitrogen=rep(rep(c("N0","N1"), each=3), 4)
block=rep(c("Block 1","Block 2","Block 3"), times=8)
yield=c(30,27,25,40,41,42,37,38,40,48,47,46,25,27,26,41,41,42,38,39,42,57,59,60)
dataA=data.frame(sowing_date,herbicide,nitrogen,block,yield)

You can download this data from my GitHub.
Download>> planting_data_herbicide_nitrogen.csv

First, I’ll run a 3-way ANOVA.

anova3way=aov (yield ~ sowing_date + herbicide + nitrogen +
              sowing_date:herbicide + sowing_date:nitrogen +
              herbicide:nitrogen + sowing_date:herbicide:nitrogen +
              factor(block), data=dataA)
summary(anova3way)

Second, I’ll analyze the data as a split-split plot design. The R function I’ll use is ssp.plot() from the agricolae package.

install.packages('agricolae')
library(agricolae)

The code for split-split plot design is below.

model= with(dataA,ssp.plot(block,sowing_date,herbicide,nitrogen,yield))

As in the split-plot design, the error is divided, but in the split-split plot design it is divided into three (error a, b and c).

Here is another way to analyze the split-split plot design.

ANOVA=aov(yield~block+sowing_date*herbicide*nitrogen+
      Error(block/sowing_date/herbicide), data=dataA)
summary(ANOVA)

Code summary>> split-split plot design

The result is the same. Using ssp.plot() from agricolae is much easier, but on its own it won’t help you understand the principle of the split-split plot design.

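Please look at the second code I suggested:
aov(yield~block+sowing_date*herbicide*nitrogen+Error(block/sowing_date/herbicide), data=dataA)

Why are block, sowing_date and herbicide placed in Error(), but not nitrogen? Error(block/sowing_date/herbicide) expands to block, block:sowing_date and block:sowing_date:herbicide, that is, the block, the main-plot error (error a) and the secondary-plot error (error b), which are all treated as random factors. The tertiary-plot error (error c) is simply the residual, so nitrogen does not need its own error term.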

If you analyze a split-split plot design in JMP, the model construct would look like the one below.

This means that ssp.plot(block,sowing_date,herbicide,nitrogen,yield) or aov(yield~block+sowing_date*herbicide*nitrogen+Error(block/sowing_date/herbicide), data=dataA) is equivalent to this model construct.

How is split-split plot design different from 3-way ANOVA?

In my previous post, What is split-plot design in agronomy research?, I explained two problems with RCBD:

▪ When the number of treatments increases (i.e., more factors or more levels per factor), it becomes difficult to keep conditions homogeneous within a block.
▪ Experimental factors can have biological or physical barriers. For example, if a factor involves virus inoculation, a fully randomized layout would be dangerous because the virus could spread to the non-inoculated plots. Also, if a treatment involves planting date, a fully randomized layout does not allow us to plant crops with a tractor at different times.

I also explained the basic principles for choosing the main plot and sub-plot in a split-plot design:

▪ If detecting the significance of a factor is more important, set it up as the sub-plot; if the effect of a factor is already known, set it up as the main plot.
▪ If the main effect of one factor (A) is larger than that of another factor (B), set up factor A as the main plot.
▪ A factor that cannot easily be applied to small, separate plots in the field is better set up as the main plot. For example, planting date, nitrogen amount, water amount, etc.

I also described the benefit of the split-plot design as follows:

“In split-plot design, compared to RCBD, the significance of the main plot would decrease, while the significance of the sub-plot would increase. Additionally, the significance of the interaction between the main and sub-plot would increase.”

The split-split plot design works on basically the same principle as the split-plot design.

Let’s compare RCBD (3-way ANOVA) and split-split plot design.

I thought the nitrogen effect should be analyzed most precisely, so I set up nitrogen as the tertiary plot. The effect of sowing date is already fairly clear, so I set up sowing date as the main plot.

In RCBD (3-way ANOVA), sowing date is significant (F-ratio: 22.073, p-value < 0.001), but it is not significant in the split-split plot design (F-ratio: 13.935, p-value = 0.06). Please compare the F-ratio for sowing date in both tables. In RCBD, it is 22.073 (≈ 54.0 / 2.4), but in the split-split plot design it is 13.935 (≈ 54.0 / 3.88). The mean square is the same (54.0), but the error term changed (RCBD: 2.4, split-split plot design: 3.88), so the significance of the main plot decreases because of the larger error (2.4 → 3.88) in the split-split plot design.
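These numbers can be re-checked in R from the reported F-ratios and the mean square alone. In the sketch below, the error degrees of freedom (14 for RCBD, 2 for error a) are not quoted from the tables; I derived them from the design (3 blocks, 2 × 2 × 2 treatments), so treat them as my assumption.

```r
# Re-check the sowing date test from the reported values (sketch).
ms_sowing <- 54.0     # mean square for sowing date (same in both analyses)
f_rcbd    <- 22.073   # reported F-ratio in RCBD
f_ssp     <- 13.935   # reported F-ratio in split-split plot design

# Error mean squares implied by the F-ratios
ms_error_rcbd <- ms_sowing / f_rcbd   # about 2.4
ms_error_a    <- ms_sowing / f_ssp    # about 3.88

# p-values; error df are derived from the design (assumption): 14 for RCBD, 2 for error a
p_rcbd <- pf(f_rcbd, df1 = 1, df2 = 14, lower.tail = FALSE)
p_ssp  <- pf(f_ssp,  df1 = 1, df2 = 2,  lower.tail = FALSE)

print(round(c(ms_error_rcbd = ms_error_rcbd, ms_error_a = ms_error_a), 3))
print(signif(c(p_rcbd = p_rcbd, p_ssp = p_ssp), 3))
```

Note that the loss of significance comes from two sources: the larger error mean square, and the drop from 14 to 2 error degrees of freedom.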

On the other hand, the significance of the secondary plot (herbicide) increased in the split-split plot design because of the smaller error (2.4 → 1.38). The significance of the interaction between factors also increased. Please look at the interaction between sowing date and herbicide. In RCBD, the F-ratio is 30.044 (≈ 73.5 / 2.4), but in the split-split plot design it is 53.455 (≈ 73.5 / 1.38). Again, this is due to the smaller error (2.4 → 1.38). In principle, significance also increases for the tertiary plot, but that was not the case in this data set. When a factor is already highly significant, the result does not differ much between RCBD and the split-split plot design. For example, because the nitrogen effect is highly significant in RCBD, it is not very different in the split-split plot design.

However, when the significance of a factor is less clear-cut, setting it up as the secondary or tertiary plot can give you greater statistical significance for that factor and for its interactions with other factors.

Errors are the key to analyzing data

Eventually, it’s all about errors.

In the split-split plot design, the error is divided into three, which tends to increase significance in the secondary and tertiary plots.
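To make this division concrete, here is a minimal base R sketch that computes the three error terms by hand from the data set used in this post and shows that they add up to the single RCBD error. The helper name ss_cells is my own; the calculation follows the usual sums-of-squares formulas for balanced data and does not use ssp.plot().

```r
# Hand calculation of the error terms (sketch; ss_cells is a hypothetical helper).
sowing_date <- rep(c("Early", "Normal"), each = 12)
herbicide   <- rep(rep(c("H0", "H1"), each = 6), 2)
nitrogen    <- rep(rep(c("N0", "N1"), each = 3), 4)
block       <- rep(c("Block 1", "Block 2", "Block 3"), times = 8)
yield <- c(30,27,25,40,41,42,37,38,40,48,47,46,25,27,26,41,41,42,38,39,42,57,59,60)

cf <- sum(yield)^2 / length(yield)   # correction factor

# Sum of squares among the totals of a grouping, each total based on n observations
ss_cells <- function(groups, n) sum(tapply(yield, groups, sum)^2) / n - cf

ss_total <- sum(yield^2) - cf
ss_block <- ss_cells(list(block), 8)
ss_treat <- ss_cells(list(sowing_date, herbicide, nitrogen), 3)
ss_a     <- ss_cells(list(sowing_date), 12)
ss_b     <- ss_cells(list(herbicide), 12)
ss_ab    <- ss_cells(list(sowing_date, herbicide), 6) - ss_a - ss_b
ss_main  <- ss_cells(list(block, sowing_date), 4)              # main plot totals
ss_sub   <- ss_cells(list(block, sowing_date, herbicide), 2)   # secondary plot totals

ss_error_rcbd <- ss_total - ss_block - ss_treat                      # 34.25
ss_error_a    <- ss_main - ss_block - ss_a                           # 7.75
ss_error_b    <- ss_sub - ss_main - ss_b - ss_ab                     # 5.5
ss_error_c    <- ss_total - ss_sub - (ss_treat - ss_a - ss_b - ss_ab)  # 21

errors <- data.frame(
  term = c("RCBD error", "error a", "error b", "error c"),
  SS   = c(ss_error_rcbd, ss_error_a, ss_error_b, ss_error_c),
  df   = c(14, 2, 4, 8)
)
errors$MS <- errors$SS / errors$df
print(errors)

# F-ratios discussed above
f_sowing      <- ss_a / (ss_error_a / 2)    # sowing date tested against error a
f_interaction <- ss_ab / (ss_error_b / 4)   # sowing date x herbicide tested against error b
print(round(c(f_sowing = f_sowing, f_interaction = f_interaction), 3))
```

The RCBD error (SS 34.25, 14 df) is split into error a (7.75, 2 df), error b (5.5, 4 df) and error c (21, 8 df), so each effect is tested against the error of its own plot level.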

Split-split plot design using SAS

Here is how to analyze the split-split plot design using SAS. First, let’s select Mixed models.

In Model, select the factors shown above as fixed effects and the factors shown below as random effects.

In the Options tab, select Type 3 as the estimation method.

Then when you run the program, you’ll obtain the result below.

If you compare this table with the table R provides, you can see it’s the same.

This means that the R code ssp.plot(block,sowing_date,herbicide,nitrogen,yield) or aov(yield~block+sowing_date*herbicide*nitrogen+Error(block/sowing_date/herbicide), data=dataA) represents the same model construct as in SAS. That’s why I said that if you simply run the R code, you won’t understand the principle of the split-split plot design.

This is SAS code for split-split plot design.

proc mixed data=WORK.YIELD method=type3 plots=(residualPanel) alpha=0.05;
	class sowing_date herbicide nitrogen block;
	model yield=sowing_date herbicide nitrogen sowing_date*herbicide 
	sowing_date*nitrogen herbicide*nitrogen sowing_date*herbicide*nitrogen /;
	random block sowing_date*block herbicide*block(sowing_date) /;
run;


Agronomy4future

Stories about cereals and statistics (plus coding). We aim to develop open-source code for agronomy.