What is a nested model in statistics?

August 18, 2022 JK

A tomato farmer is growing tomato seedlings and wants to investigate the amount of calcium in their leaves. He selected four seedlings, randomly chose three leaves from each seedling, and measured the calcium content of each leaf twice.

This experimental design can be summarized in the table below.

y₁₁₁ means the amount of calcium in the 1st seedling – 1st leaf – 1st replicate. Likewise, y₄₃₂ means the outcome of the 4th seedling – 3rd leaf – 2nd replicate.
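As a small sketch of this indexing, the R snippet below builds the 24 labels y_ijk for 4 seedlings, 3 leaves and 2 replicates. The object names `design` and `layout` are only illustrative, and arranging one row per seedling is an assumption about how the layout looks.

```r
# 4 seedlings (i) x 3 leaves (j) x 2 replicates (k) = 24 observations.
# expand.grid varies its first argument fastest, so k changes first, then j, then i.
design <- expand.grid(k = 1:2, j = 1:3, i = 1:4)
design$label <- with(design, paste0("y", i, j, k))

# One row per seedling, one column per leaf-replicate combination
layout <- matrix(design$label, nrow = 4, byrow = TRUE,
                 dimnames = list(paste("Seedling", 1:4), NULL))
print(layout)
```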

Now, the question is whether this experimental design can be treated as a factorial experiment or not.

The farmer did not follow any rule when selecting leaves in each seedling; he just picked 3 leaves at random. If he had selected leaves by a fixed rule (e.g. the lowest leaf, the 3rd leaf from the bottom, and the topmost leaf) and applied that rule to every seedling, the design would have two crossed factors (seedlings and leaves) and we could regard it as a factorial experiment. In this case, however, leaf 1 of one seedling has nothing in common with leaf 1 of another seedling, so leaves cannot be a fixed factor.

Instead, we say the leaf factor is nested within the seedling factor.

This is the linear model for a nested design.

y_ijk = μ + τ_i + β_(i)j + ε_(ij)k

where y_ijk represents the calcium content of replicate k, leaf j and seedling i. The subscript (i)j indicates that level j of factor B (here, leaf j) is nested within level i of factor A (here, seedling i). That is, it refers to leaf j of seedling i.

In terms of sample means, the model can be written as the equation below.

y_ijk = ȳ_... + (ȳ_i.. – ȳ_...) + (ȳ_ij. – ȳ_i..) + (y_ijk – ȳ_ij.)

It seems tricky, but if you follow me step by step, it’s just simple elementary-school arithmetic.

# Create the data set in R
Plant <- rep(c("P1","P2","P3","P4"), each=6)          # 4 seedlings, 6 measurements each
Leaves <- rep(rep(c("L1","L2","L3"), each=2), 4)      # 3 leaves per seedling, 2 replicates each
Rep <- rep(c(1,2), 12)                                # replicate number within each leaf
Ca <- c(3.28, 3.09, 3.52, 3.48, 2.88, 2.80, 2.46, 2.44, 1.87, 1.92, 2.19, 2.19, 2.77, 2.66, 3.74, 3.44, 2.55, 2.55, 3.78, 3.87, 4.07, 4.12, 3.31, 3.31)
tomato <- data.frame(Plant, Leaves, Rep, Ca)

# Optional: export the data to Excel (adjust the path for your computer)
library(writexl)
write_xlsx(tomato, "C:/Users/Usuari/Desktop/tomato2.xlsx")

Let’s say this is the data the farmer collected. For example, y₁₁₁ = 3.28 (1st seedling, 1st leaf, 1st rep) and y₄₃₂ = 3.31 (4th seedling, 3rd leaf, 2nd rep).

Now, I’ll summarize the means of the data.

ȳ_... is the grand mean of all the data. It’s 3.01.
ȳ_i.. is the mean of treatment i, which is the plant (seedling). The plant means are 3.18, 2.18, 2.95 and 3.74.
ȳ_ij. is the mean of the two replicate values of each leaf. It is calculated within each seedling because the leaf factor (j) is nested within the seedling factor (i).
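The three kinds of means can be computed in a few lines of base R. This is a minimal sketch that re-creates the data so it runs on its own; the names `grand_mean`, `plant_means` and `leaf_means` are just illustrative.

```r
Plant  <- rep(c("P1","P2","P3","P4"), each = 6)
Leaves <- rep(rep(c("L1","L2","L3"), each = 2), 4)
Ca <- c(3.28, 3.09, 3.52, 3.48, 2.88, 2.80, 2.46, 2.44, 1.87, 1.92, 2.19, 2.19,
        2.77, 2.66, 3.74, 3.44, 2.55, 2.55, 3.78, 3.87, 4.07, 4.12, 3.31, 3.31)
tomato <- data.frame(Plant, Leaves, Ca)

# Grand mean of all 24 values
grand_mean <- mean(tomato$Ca)

# Mean of each plant (6 values per plant)
plant_means <- tapply(tomato$Ca, tomato$Plant, mean)

# Mean of each leaf within each plant (2 replicates per leaf);
# rows are plants, columns are leaves
leaf_means <- tapply(tomato$Ca, list(tomato$Plant, tomato$Leaves), mean)

round(grand_mean, 2)
round(plant_means, 2)
round(leaf_means, 2)
```

Note that the column label L1 in `leaf_means` is only a position within a plant: L1 of P1 and L1 of P2 are different leaves, which is exactly what nesting means.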

Once the means are calculated, the next step is to partition the data.

Let’s go back to the equation above and look at each term.

ȳ_... is the grand mean, which is 3.01. It’s μ.
(ȳ_i.. – ȳ_...) is the difference between the plant mean and the grand mean, which indicates the effect of the plant. It’s written as τ_i. For example, for y₁₁₁ (calcium content 3.28), the effect of the plant (τ₁) is 0.16 (≈ 3.18 – 3.01).
(ȳ_ij. – ȳ_i..) is the difference between the leaf mean over its two replicates and the plant mean. It’s β_(i)j. For example, for y₁₁₁ the leaf mean is (3.28 + 3.09) / 2 = 3.19, so β_(1)1 = 3.19 – 3.18 = 0.01.
(y_ijk – ȳ_ij.) is the error, ε_(ij)k: the difference between each calcium value and the mean of its leaf over the two replicates. For y₁₁₁, the error is 3.28 – 3.19 = 0.10 (0.095 before rounding).

Therefore, y₁₁₁ is partitioned as

3.28 = 3.01 (μ) + 0.16 (τ_i) + 0.01 (β_(i)j) + 0.10 (ε_(ij)k)
y_ijk = μ + τ_i + β_(i)j + ε_(ij)k
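The same partition can be applied to all 24 observations at once. The sketch below uses base R's `ave()` to repeat each group mean alongside every observation; the column names `tau`, `beta` and `eps` are chosen here to mirror the symbols in the model.

```r
Plant  <- rep(c("P1","P2","P3","P4"), each = 6)
Leaves <- rep(rep(c("L1","L2","L3"), each = 2), 4)
Ca <- c(3.28, 3.09, 3.52, 3.48, 2.88, 2.80, 2.46, 2.44, 1.87, 1.92, 2.19, 2.19,
        2.77, 2.66, 3.74, 3.44, 2.55, 2.55, 3.78, 3.87, 4.07, 4.12, 3.31, 3.31)

mu      <- mean(Ca)                 # grand mean
ybar_i  <- ave(Ca, Plant)           # plant mean, repeated for each observation
ybar_ij <- ave(Ca, Plant, Leaves)   # leaf-within-plant mean, repeated likewise

tau  <- ybar_i  - mu        # effect of plant
beta <- ybar_ij - ybar_i    # effect of leaf nested within plant
eps  <- Ca      - ybar_ij   # error (replicate within leaf)

parts <- data.frame(Plant, Leaves, Ca, mu, tau, beta, eps)
head(parts, 2)

# The four components add back up to the observed value in every row
all.equal(Ca, mu + tau + beta + eps)
```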

Now we need to square each component of the partition, because otherwise the components would simply sum to zero (remember, the sum of deviations from a mean is always zero). In Excel, =SUMSQ() calculates the sum of squares for each component.
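For readers who prefer R to Excel, the sums of squares can be sketched as follows. It rebuilds the components from the data so that it runs on its own; the names `ss_plant`, `ss_leaf` and `ss_error` are illustrative.

```r
Plant  <- rep(c("P1","P2","P3","P4"), each = 6)
Leaves <- rep(rep(c("L1","L2","L3"), each = 2), 4)
Ca <- c(3.28, 3.09, 3.52, 3.48, 2.88, 2.80, 2.46, 2.44, 1.87, 1.92, 2.19, 2.19,
        2.77, 2.66, 3.74, 3.44, 2.55, 2.55, 3.78, 3.87, 4.07, 4.12, 3.31, 3.31)

mu   <- mean(Ca)
tau  <- ave(Ca, Plant) - mu                     # plant effect per observation
beta <- ave(Ca, Plant, Leaves) - ave(Ca, Plant) # leaf-within-plant effect
eps  <- Ca - ave(Ca, Plant, Leaves)             # error

# Deviations sum to (numerically) zero, which is why they are squared first
sum(tau); sum(beta); sum(eps)

ss_plant <- sum(tau^2)        # same idea as =SUMSQ() over the tau column
ss_leaf  <- sum(beta^2)
ss_error <- sum(eps^2)
ss_total <- sum((Ca - mu)^2)

round(c(plant = ss_plant, leaf_in_plant = ss_leaf, error = ss_error), 2)

# The three parts add up to the total sum of squares
all.equal(ss_total, ss_plant + ss_leaf + ss_error)
```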

How to analyze a nested model in R?

Now, I’ll analyze the above data using R.

Plant<- rep(c("P1","P2","P3","P4"), each=6)
Leaves<- rep(rep(c("L1","L2","L3"), each=2),4)
Rep<-rep(c(1,2),12)
Ca<- c(3.28, 3.09, 3.52, 3.48, 2.88, 2.80, 2.46, 2.44, 1.87, 1.92, 2.19, 2.19, 2.77, 2.66, 3.74, 3.44, 2.55, 2.55, 3.78, 3.87, 4.07, 4.12, 3.31, 3.31)
tomato<- data.frame(Plant,Leaves,Rep,Ca)

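library(lme4)
# Random intercept for each plant, plus a random intercept for each leaf
# within a plant (Plant:Leaves identifies every plant-leaf combination)
result <- lmer(Ca ~ (1|Plant) + (1|Plant:Leaves), data=tomato)
summary(result)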

Let’s check the variances of the random effects. The total variance is ≈ 0.532 (= 0.161170 + 0.364603 + 0.006654). The proportion of the total variance in each case is

plant ≈ 0.68 (0.364603 / 0.532)
plant:leaves ≈ 0.30 (0.161170 / 0.532)
residual ≈ 0.01 (0.006654 / 0.532)

It means that most of the variability in the data is due to plants (seedlings), ≈ 68%, while the residual (environmental) error accounts for only about 1%. Therefore, based on this result, if we have to increase replication in the experiment, it would be better to increase the number of plants rather than the number of repeated measurements.
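The proportions can be checked with a few lines of R. In this sketch the variance components are typed in from the model summary reported above rather than extracted from the fitted object, so it runs without lme4; with the model fitted, `as.data.frame(VarCorr(result))` should return the same values in its `vcov` column.

```r
# Variance components copied from the summary(result) output
vc <- c("Plant"        = 0.364603,
        "Plant:Leaves" = 0.161170,
        "Residual"     = 0.006654)

total <- sum(vc)      # total variance
share <- vc / total   # proportion of the total for each component

round(total, 3)
round(100 * share, 1)  # as percentages
```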

For example, in this experiment the farmer measured calcium content twice per leaf in 4 plants (seedlings). Based on this statistical result, if the farmer wants more replication, it would be better to increase the number of plants (e.g. 4 -> 8 plants) rather than the number of measurements per leaf (e.g. 2 -> 4 times).
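One way to see why is to look at the variance of the overall mean. For a balanced nested design with a plants, b leaves per plant and n measurements per leaf, the standard result is Var(ȳ_...) = σ²_plant / a + σ²_leaf / (a·b) + σ²_error / (a·b·n). The sketch below plugs in the variance components from the R output; the function name `var_grand_mean` is hypothetical, and treating the estimates as if they were the true variances is a simplifying assumption.

```r
# Variance components copied from the lmer summary
vc <- c("Plant" = 0.364603, "Plant:Leaves" = 0.161170, "Residual" = 0.006654)

# Variance of the grand mean for a plants, b leaves per plant, n measurements per leaf
var_grand_mean <- function(a, b, n, vc) {
  vc[["Plant"]] / a +
    vc[["Plant:Leaves"]] / (a * b) +
    vc[["Residual"]] / (a * b * n)
}

current     <- var_grand_mean(a = 4, b = 3, n = 2, vc)  # the farmer's design
more_plants <- var_grand_mean(a = 8, b = 3, n = 2, vc)  # 4 -> 8 plants
more_reps   <- var_grand_mean(a = 4, b = 3, n = 4, vc)  # 2 -> 4 measurements

round(c(current = current, more_plants = more_plants, more_reps = more_reps), 4)
```

Doubling the number of plants halves the variance of the mean, because every term is divided by a, whereas doubling the measurements per leaf only shrinks the already tiny residual term.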

How to analyze a nested model in SAS?

In R, the REML fit from lmer() does not show significance tests (F and p values). Instead, it shows the variance components. Now, I’ll introduce how to analyze a nested model in SAS. I uploaded the Excel file to SAS and named the data set WORK.TOMATO.

First, I’ll choose restricted maximum likelihood (REML) as the estimation method.

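/* Intercept-only model with two random effects, estimated by REML */
proc mixed data=WORK.TOMATO method=reml alpha=0.05;
	class Plant Leaves;
	model Ca= /;
	/* Plant(Leaves) is the nested term: one level per plant-leaf combination
	   (more commonly written Leaves(Plant)) */
	random Plant Plant(Leaves) /;
run;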

It shows the variance components, which are the same as in R.

Second, I’ll choose Type 1 as the estimation method.

/* Same model, estimated from Type 1 sums of squares (ANOVA table) */
proc mixed data=WORK.TOMATO method=type1 alpha=0.05;
	class Plant Leaves;
	model Ca= /;
	random Plant Plant(Leaves) /;
run;

With Type 1, we can see the statistical outcomes (the ANOVA table).

Let’s check the sum of squares.

SS of plant ≈ 7.56
SS of plant (leaves) ≈ 2.63
SS residual ≈ 0.08

These values are the same as those we calculated in Excel. SAS shows that both plants and leaves (nested within plants) are significant.
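For comparison, a similar ANOVA table can be sketched in base R. This is not the SAS procedure itself: it fits the nested terms with `lm()` and then forms the F ratios by hand, on the assumption that with random leaves the plant mean square is tested against the leaf-within-plant mean square, and the leaf-within-plant mean square against the residual.

```r
Plant  <- rep(c("P1","P2","P3","P4"), each = 6)
Leaves <- rep(rep(c("L1","L2","L3"), each = 2), 4)
Ca <- c(3.28, 3.09, 3.52, 3.48, 2.88, 2.80, 2.46, 2.44, 1.87, 1.92, 2.19, 2.19,
        2.77, 2.66, 3.74, 3.44, 2.55, 2.55, 3.78, 3.87, 4.07, 4.12, 3.31, 3.31)
tomato <- data.frame(Plant, Leaves, Ca)

# Sequential (Type 1) sums of squares: Plant first, then leaves within plants
tab <- anova(lm(Ca ~ Plant + Plant:Leaves, data = tomato))
ss <- setNames(tab[["Sum Sq"]],  rownames(tab))
ms <- setNames(tab[["Mean Sq"]], rownames(tab))
df <- setNames(tab[["Df"]],      rownames(tab))
round(ss, 2)

# F tests with the error terms of the nested random model
F_plant <- ms[["Plant"]] / ms[["Plant:Leaves"]]
p_plant <- pf(F_plant, df[["Plant"]], df[["Plant:Leaves"]], lower.tail = FALSE)

F_leaf <- ms[["Plant:Leaves"]] / ms[["Residuals"]]
p_leaf <- pf(F_leaf, df[["Plant:Leaves"]], df[["Residuals"]], lower.tail = FALSE)

c(F_plant = F_plant, p_plant = p_plant, F_leaf = F_leaf, p_leaf = p_leaf)
```

The F and p values printed directly by `anova()` use the residual as the error term for every row, so for the plant effect the hand-computed ratio is the one to read.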
