What is split-plot design in agronomy research?
Split-plot designs have been widely used, particularly in agronomy research. In a split-plot design, the experimental units (whole plots) are divided into smaller units (subplots). Split-plot designs are useful when some factors are difficult or expensive to change, or when the levels of a factor cannot be randomized to individual small plots (I'll explain this in detail later). A split-plot design therefore has two sizes of experimental unit and two corresponding factors: a whole-plot (main-plot) factor and a subplot factor. The whole-plot factor is randomly assigned to the whole plots, while the subplot factor is randomly assigned to the subplots within each whole plot, so subplots are nested within whole plots. Today I'll explain the definition of the split-plot design and its statistical model.
1) What is split-plot design?
Here is an example dataset: grain yield of 4 genotypes (CV1 to CV4) under 4 treatments. There are two factors (genotype and treatment), with 4 replicates arranged as blocks.
| Genotype | Block | Treatment 1 | Treatment 2 | Treatment 3 | Treatment 4 |
|---|---|---|---|---|---|
| CV1 | I | 42.9 | 53.8 | 49.5 | 44.4 |
| CV1 | II | 41.6 | 58.5 | 53.8 | 41.8 |
| CV1 | III | 28.9 | 43.9 | 40.7 | 28.3 |
| CV1 | IV | 30.8 | 46.3 | 39.4 | 34.7 |
| CV2 | I | 53.3 | 57.6 | 59.8 | 64.1 |
| CV2 | II | 69.6 | 69.6 | 65.8 | 57.4 |
| CV2 | III | 45.4 | 42.4 | 41.4 | 44.1 |
| CV2 | IV | 35.1 | 51.9 | 45.4 | 51.6 |
| CV3 | I | 62.3 | 63.4 | 64.5 | 63.6 |
| CV3 | II | 58.5 | 50.4 | 46.1 | 56.1 |
| CV3 | III | 44.6 | 45.0 | 62.6 | 52.7 |
| CV3 | IV | 50.3 | 46.7 | 50.3 | 51.8 |
| CV4 | I | 75.4 | 70.3 | 68.8 | 71.6 |
| CV4 | II | 65.6 | 67.3 | 65.3 | 69.4 |
| CV4 | III | 54.0 | 57.6 | 45.6 | 56.6 |
| CV4 | IV | 52.7 | 58.5 | 51.0 | 47.4 |
Download data: 4_treatments_4_genotypes_with_4_blocks.csv
There are 16 treatment combinations [= 4 genotypes × 4 treatments] and 64 experimental units in total [= 4 genotypes × 4 treatments × 4 blocks].
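The table above is in wide format (one column per treatment), whereas the R formulas used in this post (e.g., `Yield ~ Genotype + Treatment + ...`) expect one row per plot. As a minimal sketch, base R's `reshape()` can stack the CV1 rows of the table into that long format. The column names `T1` to `T4` are hypothetical stand-ins for Treatment 1 to 4, and the downloadable CSV may already be in long format.

```r
# CV1 rows of the table above, in wide format
# (T1-T4 are hypothetical column names for Treatment 1-4)
wide <- data.frame(
  Genotype = "CV1",
  Block    = c("I", "II", "III", "IV"),
  T1 = c(42.9, 41.6, 28.9, 30.8),
  T2 = c(53.8, 58.5, 43.9, 46.3),
  T3 = c(49.5, 53.8, 40.7, 39.4),
  T4 = c(44.4, 41.8, 28.3, 34.7)
)

# Stack the four treatment columns into a single Yield column,
# recording which column each value came from in Treatment
long <- reshape(wide,
                direction = "long",
                varying   = c("T1", "T2", "T3", "T4"),
                v.names   = "Yield",
                timevar   = "Treatment",
                times     = c("T1", "T2", "T3", "T4"),
                idvar     = c("Genotype", "Block"))
rownames(long) <- NULL

long   # 16 rows: one per Block x Treatment plot of CV1
```

Applying the same call to all 16 rows gives the 64-row layout that `aov()` needs.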
If you were the researcher, how would you set up this experiment? The most common approach is a two-way ANOVA with blocks, i.e., a Randomized Complete Block Design (RCBD), in which all 16 genotype × treatment combinations are randomized within each block. The layout would look like the figure below.
This is the most common way to set up an experiment, but it runs into problems when:
- the number of treatments increases (i.e., more factors or more levels of a factor), which makes it difficult to keep conditions homogeneous within a block.
- the experimental factors involve biological or physical barriers. For example, if CV1 were inoculated with a virus, this fully randomized layout would be risky because the virus could spread to neighboring non-inoculated genotypes. Similarly, if one factor is planting date, a fully randomized layout makes it impractical to plant with a tractor at different times.
To overcome such barriers, a split-plot design is recommended. In this example, if we assign genotype to the main plots and treatment to the sub-plots within each block, the layout would look like the figure below.
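The difference between the two layouts lies in where randomization happens. The sketch below, assuming the same 4 genotypes, 4 treatments, and 4 blocks, generates one random RCBD layout (all 16 combinations shuffled within each block) and one split-plot layout (genotypes shuffled across main plots, then treatments shuffled within each main plot). The seed and the labels `T1` to `T4` are illustrative assumptions, not the author's field plan.

```r
set.seed(1)  # arbitrary seed so the example layout is reproducible

genotypes  <- paste0("CV", 1:4)
treatments <- paste0("T", 1:4)   # hypothetical labels for Treatment 1-4
blocks     <- c("I", "II", "III", "IV")

# RCBD: all 16 genotype x treatment combinations are shuffled within each block
rcbd <- do.call(rbind, lapply(blocks, function(b) {
  combos <- expand.grid(Genotype = genotypes, Treatment = treatments,
                        stringsAsFactors = FALSE)
  combos <- combos[sample(nrow(combos)), ]
  rownames(combos) <- NULL
  data.frame(Block = b, Plot = seq_len(nrow(combos)), combos)
}))

# Split-plot: genotypes are shuffled across the 4 main plots of each block,
# then treatments are shuffled across the 4 sub-plots inside each main plot
split_plot <- do.call(rbind, lapply(blocks, function(b) {
  main_order <- sample(genotypes)
  do.call(rbind, lapply(seq_along(main_order), function(m) {
    data.frame(Block = b, MainPlot = m, Genotype = main_order[m],
               SubPlot = 1:4, Treatment = sample(treatments))
  }))
}))

head(split_plot, 8)  # first two main plots of block I
```

In the split-plot layout a single genotype occupies a whole main plot, which is what allows a tractor pass or an isolated virus treatment per main plot.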
Compared with the RCBD, a split-plot design generally lowers the significance of the main-plot factor, while it raises the significance of the sub-plot factor and of the interaction between the main-plot and sub-plot factors. I'll explain why in section 3) F-ratio.
The key points are:
1) If detecting the effect of a factor is more important to you, assign that factor to the sub-plot; if a factor's effect is already well known, assign it to the main plot.
2) If one factor (A) is expected to have a larger main effect than another factor (B), assign factor A to the main plot.
3) In the field, a factor whose levels are hard to apply separately to small plots, such as nitrogen rate or irrigation amount, is better assigned to the main plot.
2) Sources of variation
If you analyze the same data as an RCBD and as a split-plot design, you'll obtain different ANOVA tables.
1) RCBD
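```r
# to upload data
library(readr)
github = "https://raw.githubusercontent.com/agronomy4future/raw_data_practice/main/4_treatments_4_genotypes_with_4_blocks.csv"
dataA = data.frame(read_csv(url(github), show_col_types = FALSE))
# treat the design variables as factors (they may be read as text or numbers)
dataA[c("Genotype", "Block", "Treatment")] = lapply(dataA[c("Genotype", "Block", "Treatment")], factor)
# 2-way ANOVA with blocks (RCBD)
anova_rcbd = aov(Yield ~ Genotype + Treatment + Genotype:Treatment + Block, data = dataA)
summary(anova_rcbd)
```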
2) Split-plot design
```r
# to upload data
library(readr)
github = "https://raw.githubusercontent.com/agronomy4future/raw_data_practice/main/4_treatments_4_genotypes_with_4_blocks.csv"
dataA = data.frame(read_csv(url(github), show_col_types = FALSE))
# split-plot design
install.packages("doebioresearch")   # only needed the first time
library(doebioresearch)
# arguments: response column(s), block, main-plot factor, sub-plot factor,
# and the mean-comparison option (see ?splitplot)
dataA_split_plot = splitplot(dataA["Yield"], dataA$Block, dataA$Genotype, dataA$Treatment, 1)
dataA_split_plot
```
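If the doebioresearch package is not available, the same comparison can be sketched with base R's `aov()`. The data below are typed in from the table at the top of this post, so no download is needed; `T1` to `T4` are hypothetical labels for Treatment 1 to 4. In the split-plot fit, `Error(Block / Genotype)` declares each Block × Genotype cell as a whole plot, so Genotype is tested against the Block:Genotype stratum (error a) and the sub-plot terms against the Within stratum (error b). This is one common way to write a split-plot ANOVA in R, not necessarily what doebioresearch does internally.

```r
# Grain yield typed in from the table above, in table order:
# CV1 to CV4, blocks I to IV within each genotype, treatments 1 to 4 within each block
yield <- c(
  42.9, 53.8, 49.5, 44.4,  41.6, 58.5, 53.8, 41.8,   # CV1
  28.9, 43.9, 40.7, 28.3,  30.8, 46.3, 39.4, 34.7,
  53.3, 57.6, 59.8, 64.1,  69.6, 69.6, 65.8, 57.4,   # CV2
  45.4, 42.4, 41.4, 44.1,  35.1, 51.9, 45.4, 51.6,
  62.3, 63.4, 64.5, 63.6,  58.5, 50.4, 46.1, 56.1,   # CV3
  44.6, 45.0, 62.6, 52.7,  50.3, 46.7, 50.3, 51.8,
  75.4, 70.3, 68.8, 71.6,  65.6, 67.3, 65.3, 69.4,   # CV4
  54.0, 57.6, 45.6, 56.6,  52.7, 58.5, 51.0, 47.4
)

# expand.grid varies its first argument fastest, matching the order above;
# T1-T4 are hypothetical labels for Treatment 1-4
dat <- expand.grid(Treatment = paste0("T", 1:4),
                   Block     = c("I", "II", "III", "IV"),
                   Genotype  = paste0("CV", 1:4))
dat$Yield <- yield

# RCBD: a single pooled residual
fit_rcbd <- aov(Yield ~ Block + Genotype * Treatment, data = dat)

# Split-plot: each Block x Genotype cell is a whole plot, so Genotype is tested
# in the Block:Genotype stratum (error a), the sub-plot terms in Within (error b)
fit_split <- aov(Yield ~ Genotype * Treatment + Error(Block / Genotype), data = dat)

summary(fit_rcbd)
summary(fit_split)

# Helper to read one cell of an ANOVA table (summary() pads row names with spaces)
cell <- function(tab, term, col) tab[trimws(rownames(tab)) == term, col]
rcbd_tab   <- summary(fit_rcbd)[[1]]
main_tab   <- summary(fit_split)[["Error: Block:Genotype"]][[1]]
within_tab <- summary(fit_split)[["Error: Within"]][[1]]
```

Because the design is balanced, the error a and error b sums of squares add up exactly to the RCBD residual sum of squares; the split-plot analysis only divides that residual into two parts.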
The complete ANOVA table is shown below.
Let's compare the degrees of freedom of the RCBD and split-plot analyses.
What is different? In the RCBD, the residual (error) had 45 degrees of freedom (df), but in the split-plot design it is split into error a, the error for the main plot (9 df), and error b, the error for the sub-plot (36 df).
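The degrees of freedom follow directly from the layout. Writing $g = 4$ genotypes, $t = 4$ treatments, and $r = 4$ blocks (symbols introduced here for convenience), the residual of the RCBD is exactly the sum of the two split-plot error terms:

$$
\begin{aligned}
df_{\text{error a}} &= (r-1)(g-1) = 3 \times 3 = 9 \\
df_{\text{error b}} &= g\,(r-1)(t-1) = 4 \times 3 \times 3 = 36 \\
df_{\text{residual, RCBD}} &= (r-1)(gt-1) = 3 \times 15 = 45 = 9 + 36
\end{aligned}
$$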
3) F-ratio
From now on, let's focus on the F-ratio in each ANOVA table. The F-ratio of a term is its mean square divided by the mean square error (MSE) used to test it.
For example, the F-ratio of genotype in the RCBD is 949.3/30.0 ≈ 31.65. What about the F-ratio of genotype in the split-plot design? It is 949.3/68.70 ≈ 13.81.
The mean square of genotype is the same (949.3); only the MSE changed.
The MSE is the residual sum of squares divided by its degrees of freedom. In the RCBD, MSE = 1349.5/45 ≈ 30.0. In the split-plot design, however, the MSE for the main plot (error a) is 618/9 ≈ 68.7.
As noted, the F-ratio is a term's mean square divided by the MSE. In the split-plot design, the MSE used to test genotype (the main-plot error) increased (RCBD: 30.0, split-plot design: 68.70). When the denominator (MSE) increases, the F-ratio decreases, and a lower F-ratio means that the genotype effect is less significant. In addition, the denominator degrees of freedom drop from 45 to 9, which also makes the test less powerful. Intuitively, when the error variance (MSE) is larger, the genotype effect becomes harder to distinguish: we can no longer tell whether differences in yield are due to genotype per se or just to environmental error.
As mentioned earlier, compared with the RCBD, the main-plot factor becomes less significant in a split-plot design, while the sub-plot factor and the interaction between the main-plot and sub-plot factors become more significant.
The decrease for the main plot is explained above: it is mainly due to the larger error variance (MSE) in the main plot.
Now, let's look at the sub-plot. The F-ratio of treatment in the split-plot design increased to 56.85/20.31 ≈ 2.79, and the F-ratio of the genotype × treatment interaction also increased to 65.16/20.31 ≈ 3.20, compared with the RCBD (1.89 and 2.17, respectively). This is because of the smaller MSE in the sub-plot (split-plot: 20.31, RCBD: 30.0).
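The arithmetic in this section can be reproduced in a few lines of R. This sketch uses only the rounded mean squares, sums of squares, and degrees of freedom quoted above, and adds the corresponding p-values with `pf()`. Because the inputs are rounded, the last digit may differ slightly from the values quoted in the text.

```r
# Mean squares and numerator df taken from the ANOVA tables discussed above
ms     <- c("Genotype" = 949.3, "Treatment" = 56.85, "Genotype:Treatment" = 65.16)
num_df <- c(3, 3, 9)

# Error terms: the RCBD has one pooled residual; the split-plot has error a and b
mse_rcbd    <- 1349.5 / 45   # about 30.0, 45 df
mse_error_a <- 618 / 9       # about 68.7, 9 df (main plot)
mse_error_b <- 20.31         # 36 df (sub-plot)

# RCBD: every term is tested against the pooled residual
f_rcbd <- ms / mse_rcbd
p_rcbd <- pf(f_rcbd, num_df, 45, lower.tail = FALSE)

# Split-plot: genotype uses error a; treatment and interaction use error b
f_split <- ms / c(mse_error_a, mse_error_b, mse_error_b)
p_split <- pf(f_split, num_df, c(9, 36, 36), lower.tail = FALSE)

round(cbind(F_RCBD = f_rcbd, p_RCBD = p_rcbd,
            F_split = f_split, p_split = p_split), 4)
```

The p-values make the trade-off explicit: genotype loses significance in the split-plot analysis (smaller F and only 9 denominator df), while treatment and the interaction gain it.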
The main benefit of a split-plot design is that it splits the error variance into error a and error b. Because error b is smaller, the sub-plot factor and the interaction between the main-plot and sub-plot factors are tested against a smaller MSE, which gives larger F-ratios and therefore greater significance.
4) Statistical model
This is the statistical model for the split-plot design used in this experiment.
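Split-plot design

$$y_{ijk} = \mu + \alpha_i + \gamma_k + \eta_{ik} + \beta_j + \delta_{ij} + \varepsilon_{ijk}$$

- $y_{ijk}$ = yield of genotype $i$ and treatment $j$ in block (replicate) $k$
- $\mu$ = grand mean of yield
- $\alpha_i$ = effect of genotype $i$
- $\gamma_k$ = effect of block $k$
- $\eta_{ik}$ = main-plot error (error a), i.e., the block × genotype interaction
- $\beta_j$ = effect of treatment $j$
- $\delta_{ij}$ = interaction between genotype $i$ and treatment $j$
- $\varepsilon_{ijk}$ = sub-plot error (error b)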
If the experiment is analyzed as an RCBD, the statistical model is:

Two-way ANOVA (RCBD)

$$y_{ijk} = \mu + \alpha_i + \beta_j + \delta_{ij} + \gamma_k + \varepsilon_{ijk}$$

- $y_{ijk}$ = yield of genotype $i$ and treatment $j$ in block (replicate) $k$
- $\mu$ = grand mean of yield
- $\alpha_i$ = effect of genotype $i$
- $\beta_j$ = effect of treatment $j$
- $\delta_{ij}$ = interaction between genotype $i$ and treatment $j$
- $\gamma_k$ = effect of block $k$
- $\varepsilon_{ijk}$ = residual error
Can you spot the difference between the two models?
The only difference is that the error term is divided into two parts: error a ($\eta_{ik}$, main plot) and error b ($\varepsilon_{ijk}$, sub-plot).
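Written out, the RCBD residual simply lumps the two split-plot error terms together. Under the usual assumption that $\eta_{ik}$ and $\varepsilon_{ijk}$ are independent random errors with variances $\sigma^2_\eta$ and $\sigma^2_\varepsilon$ (notation introduced for this sketch), and with $t = 4$ treatments per main plot, the expected mean squares show why error a is larger than error b:

$$
\begin{aligned}
\varepsilon^{\text{RCBD}}_{ijk} &= \eta_{ik} + \varepsilon_{ijk} \\
SS_{\text{residual, RCBD}} &= SS_{\text{error a}} + SS_{\text{error b}} \approx 618 + 36 \times 20.31 \approx 1349 \\
E[MS_{\text{error a}}] &= \sigma^2_\varepsilon + t\,\sigma^2_\eta \\
E[MS_{\text{error b}}] &= \sigma^2_\varepsilon \\
\hat\sigma^2_\eta &\approx \frac{68.7 - 20.31}{4} \approx 12.1
\end{aligned}
$$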
5) Practice: setting up the statistical model
Now, let's practice setting up the statistical model in a statistical program. Most programs let us specify a split-plot model in a single step. However, if we rely only on that shortcut, we may never understand how the statistical model is built.
In JMP, we have to build the model step by step, so I'll use JMP to show how the split-plot model is specified. It doesn't matter whether you use JMP or not; just follow how the model is composed step by step.
Open the downloaded data file (4_treatments_4_genotypes_with_4_blocks.csv) in JMP.
A) Select Analyze > Fit Model from the menu.
In Y, select 'Yield'. In Construct Model Effects, first add 'Block', then select 'Random Effect' under Attributes.
Block is now shown as a random effect; this is how a factor is declared random in JMP. The complete model construction is shown below.
In the main plot, Block and the Genotype × Block interaction are random effects, and Genotype is a fixed effect. In the sub-plot, Treatment and the Genotype × Treatment interaction are fixed effects. Select EMS (Traditional) as the Method.
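For readers working in R, the JMP effect list above can be mirrored with a minimal sketch: fit all five effects with `aov()` as if they were fixed, then form the F-ratios the way the EMS (Traditional) method does once Block and Genotype × Block are declared random, i.e., Genotype over MS(Block:Genotype) and the sub-plot terms over the residual. The data are typed in from the table at the top of the post, and `T1` to `T4` are hypothetical treatment labels. This illustrates the choice of denominators; it is not a reproduction of JMP's own output.

```r
# Grain yield from the table above: CV1 to CV4 > blocks I to IV > treatments 1 to 4
yield <- c(
  42.9, 53.8, 49.5, 44.4,  41.6, 58.5, 53.8, 41.8,  28.9, 43.9, 40.7, 28.3,
  30.8, 46.3, 39.4, 34.7,  53.3, 57.6, 59.8, 64.1,  69.6, 69.6, 65.8, 57.4,
  45.4, 42.4, 41.4, 44.1,  35.1, 51.9, 45.4, 51.6,  62.3, 63.4, 64.5, 63.6,
  58.5, 50.4, 46.1, 56.1,  44.6, 45.0, 62.6, 52.7,  50.3, 46.7, 50.3, 51.8,
  75.4, 70.3, 68.8, 71.6,  65.6, 67.3, 65.3, 69.4,  54.0, 57.6, 45.6, 56.6,
  52.7, 58.5, 51.0, 47.4
)
dat <- expand.grid(Treatment = paste0("T", 1:4),   # hypothetical labels
                   Block     = c("I", "II", "III", "IV"),
                   Genotype  = paste0("CV", 1:4))
dat$Yield <- yield

# Same effects as the JMP model, all fitted as fixed for now
fit <- aov(Yield ~ Block + Genotype + Block:Genotype + Treatment + Genotype:Treatment,
           data = dat)
tab <- summary(fit)[[1]]
rownames(tab) <- trimws(rownames(tab))   # summary() pads row names with spaces

# EMS denominators once Block and Block:Genotype are treated as random
terms_tested <- c("Genotype", "Treatment", "Genotype:Treatment")
denominator  <- c("Block:Genotype", "Residuals", "Residuals")

ems_F <- tab[terms_tested, "Mean Sq"] / tab[denominator, "Mean Sq"]
ems_p <- pf(ems_F, tab[terms_tested, "Df"], tab[denominator, "Df"], lower.tail = FALSE)
names(ems_F) <- names(ems_p) <- terms_tested

round(cbind(F = ems_F, p = ems_p), 4)
```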
Now we can see the interaction between genotype and treatment. In the split-plot design, this interaction is more significant than in the RCBD; as explained above, this is because the error is split (the MSE for error b is smaller than the MSE in the RCBD).
If there are three factors, how should the plots be divided? The answer is below.