What is Finlay-Wilkinson Regression Model?


Genotype performance depends on the environment. One genotype may respond strongly to certain environmental conditions, while another genotype may respond weakly to the same conditions. Genotypes that respond strongly to better conditions are considered adaptable to the environment.

Adaptability refers to the flexibility of a genotype in its response to improved environments. 

If a certain genotype exhibits high performance across a wide range of environmental conditions, it would be considered to have broad adaptation.

To meet this definition, two prerequisites are required.

  1. Greater performance overall
  2. Less variation across the range of environments


I’ll create one dataset.

Env= c("High_inoc","High_NO_inoc","Low_inoc", "Low_NO_inoc")
CV1= c(30,150,20,100)
CV2= c(74,99,49,73)
CV3= c(78,106,56,69)
CV4= c(86,92,66,70)
CV5= c(74,98,57,79)
Data= data.frame(Env,CV1,CV2,CV3,CV4,CV5)
Data

Env	        CV1	CV2	CV3	CV4	CV5
High_inoc	30	74	78	86	74
High_NO_inoc	150	99	106	92	98
Low_inoc	20	49	56	66	57
Low_NO_inoc	100	73	69	70	79

There are five different genotypes (CV1 – CV5), and each genotype was subjected to four different treatments. Let’s assume that the combination of organic matter and virus inoculation was applied to each genotype to observe the differences in final yield.

High_inoc: High organic matter + virus inoculation
High_NO_inoc: High organic matter + virus free
Low_inoc: Low organic matter + virus inoculation
Low_NO_inoc : Low organic matter + virus free

Which treatment would reduce yield the most? Presumably the worst condition, low organic matter with virus inoculation, would give the lowest yield.

Let’s check whether this is true.

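library(dplyr)
# mean of each environment across genotypes
Data$Mean= rowMeans (Data %>% select(-Env))
# mean of each genotype across environments, appended as a new row;
# building the row as a data frame keeps the yield columns numeric
Data= rbind(Data, data.frame(Env="Mean", t(colMeans(Data %>% select(-Env)))))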

Data
Env	        CV1	CV2	CV3	CV4	CV5	Mean
High_inoc	30	74	78	86	74	68.4
High_NO_inoc	150	99	106	92	98	109.0
Low_inoc	20	49	56	66	57	49.6
Low_NO_inoc	100	73	69	70	79	78.2
Mean	        75.0	73.75	77.25	78.5	77.0	76.3

We can calculate two different averages: each genotype across treatments, and each treatment (or environment) across genotypes. As we assumed, the condition of low organic matter and inoculation (Low_inoc) showed the lowest yield (49.6). For genotype, CV4 showed the greatest yield across treatments (78.5).

So, is it okay to say that CV4 is the best cultivar for yield? Judged by mean yield alone, yes: it shows the greatest yield averaged across environmental conditions. However, a mean hides how each genotype responds to individual environments. In the worst condition (Low_inoc, environment mean 49.6), CV4 yields 66, the highest of the five genotypes, but in the best condition (High_NO_inoc, environment mean 109.0) it yields 92, the lowest of the five, while CV1 reaches 150. The ranking of genotypes changes with the environment. Therefore, we should consider not only yield but also adaptability when comparing genotypes across different environmental conditions.
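
The two prerequisites listed earlier (high overall performance, low variation across environments) can be checked directly from the yield table. The following is a minimal sketch in base R. The object names `yield` and `summary_tbl` are my own, and using the standard deviation as the measure of variation is an assumption for illustration, not part of the Finlay-Wilkinson method itself.

```r
# Yield matrix: rows = environments, columns = genotypes (values from the table above)
yield <- cbind(
  CV1 = c(30, 150, 20, 100),
  CV2 = c(74, 99, 49, 73),
  CV3 = c(78, 106, 56, 69),
  CV4 = c(86, 92, 66, 70),
  CV5 = c(74, 98, 57, 79)
)
rownames(yield) <- c("High_inoc", "High_NO_inoc", "Low_inoc", "Low_NO_inoc")

# Mean and standard deviation of each genotype across the four environments
summary_tbl <- data.frame(
  Genotype = colnames(yield),
  Mean     = colMeans(yield),
  SD       = apply(yield, 2, sd)
)

# Sort from the least to the most variable genotype
summary_tbl[order(summary_tbl$SD), ]
```

In this dataset CV4 has both the highest mean and the smallest standard deviation, while CV1 (ranging from 20 to 150) is by far the most variable.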



Environmental Index will explain adaptability.

Now, I’ll calculate the environmental index. This is the mean of each environment minus the grand mean (X.j - X..).

68.4 - 76.3 = -7.9
109 - 76.3 = 32.7
49.6 - 76.3 = -26.7
78.2 - 76.3 = 1.9

library(dplyr)
Data$Mean= as.numeric(Data$Mean)
Data$Env_index= Data$Mean - Data$Mean[nrow(Data)]

Data
Env       	CV1	CV2	CV3	CV4	CV5	Mean	Env_index
High_inoc	30	74	78	86	74	68.4	-7.9
High_NO_inoc	150	99	106	92	98	109.0	32.7
Low_inoc	20	49	56	66	57	49.6	-26.7
Low_NO_inoc	100	73	69	70	79	78.2	1.9
Mean	        75.0	73.75	77.25	78.5	77.0	76.3	0.0

I calculated the environmental index for four different environmental conditions. As expected, at Low_inoc (low organic matter + virus inoculation), it shows the lowest environmental index (-26.7). Conversely, at High_NO_inoc (high organic matter + virus-free), it shows the highest environmental index (32.7).
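
As a compact check of the arithmetic above, the environmental index can also be computed in one step from a yield matrix. This is a sketch in base R only; the names `yield`, `env_mean`, `grand_mean`, and `env_index` are illustrative. Because each index is a deviation from the grand mean and every environment contains the same number of genotypes, the indices sum to zero.

```r
# Yield matrix: rows = environments, columns = genotypes
yield <- cbind(
  CV1 = c(30, 150, 20, 100),
  CV2 = c(74, 99, 49, 73),
  CV3 = c(78, 106, 56, 69),
  CV4 = c(86, 92, 66, 70),
  CV5 = c(74, 98, 57, 79)
)
rownames(yield) <- c("High_inoc", "High_NO_inoc", "Low_inoc", "Low_NO_inoc")

env_mean   <- rowMeans(yield)       # mean of each environment across genotypes
grand_mean <- mean(yield)           # mean of all 20 observations
env_index  <- env_mean - grand_mean # environment mean minus grand mean

round(env_index, 1)
sum(env_index)  # zero, up to floating-point rounding
```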



Environmental Index will be the independent variable in linear regression.

Now, each genotype will be fitted by linear regression. Of course, yield will be the dependent variable (y). Then, what would be the independent variable (x) in the regression model? In the Finlay-Wilkinson Regression Model, the environmental index becomes the independent variable (x). Therefore, the environmental index should be stacked in rows.

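First, I’ll delete the means I calculated (the Mean row and the Mean column). The Env_index column is kept.

Data= Data [-5,-7]  # delete 5th row (genotype means) and 7th column (Mean)

Data
        Env	        CV1	CV2	CV3	CV4	CV5	Env_index
1	High_inoc	30	74	78	86	74	-7.9
2	High_NO_inoc	150	99	106	92	98	32.7
3	Low_inoc	20	49	56	66	57	-26.7
4	Low_NO_inoc	100	73	69	70	79	1.9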

Then, I’ll stack the data in rows using the code below.

library(tidyr)
df= data.frame(
               Data %>%
               pivot_longer(
               cols=c(CV1, CV2, CV3, CV4, CV5),
               names_to="Genotype", values_to="Yield") %>%
               mutate(Yield=as.numeric(Yield))  # make sure yield is numeric for lm()
               )

df
Env	     Env_index	Genotype Yield
High_inoc	-7.9	CV1	 30
High_inoc	-7.9	CV2	 74
High_inoc	-7.9	CV3	 78
High_inoc	-7.9	CV4	 86
High_inoc	-7.9	CV5	 74
High_NO_inoc	32.7	CV1	 150
.
.
.

Now I’ll fit each genotype with an environmental index using the code below.

library(dplyr)
summary(lm (Yield ~ Env_index, data=df %>% filter (Genotype=="CV1")))
summary(lm (Yield ~ Env_index, data=df %>% filter (Genotype=="CV2")))
summary(lm (Yield ~ Env_index, data=df %>% filter (Genotype=="CV3")))
summary(lm (Yield ~ Env_index, data=df %>% filter (Genotype=="CV4")))
summary(lm (Yield ~ Env_index, data=df %>% filter (Genotype=="CV5")))

or

# one model per genotype, stored in a list column
regression= df %>% group_by(Genotype) %>% do(model= lm(Yield~Env_index, data=.))

regression$model

[[1]]

Call:
lm(formula = Yield ~ Env_index, data = .)

Coefficients:
(Intercept)    Env_index  
      75.00         2.34  

[[2]]

Call:
lm(formula = Yield ~ Env_index, data = .)

Coefficients:
(Intercept)    Env_index  
    73.7500       0.8025  

[[3]]

Call:
lm(formula = Yield ~ Env_index, data = .)

Coefficients:
(Intercept)    Env_index  
     77.250        0.804  

[[4]]

Call:
lm(formula = Yield ~ Env_index, data = .)

Coefficients:
(Intercept)    Env_index  
    78.5000       0.3786  

[[5]]

Call:
lm(formula = Yield ~ Env_index, data = .)

Coefficients:
(Intercept)    Env_index  
    77.0000       0.6754  

However, I recommend using the following code, which returns all estimates in a single table.

library(dplyr)
library(broom)  # provides tidy()
regression1= df %>% group_by(Genotype) %>% do(tidy(lm(Yield ~ Env_index, data=.)))

regression1
Genotype	term	estimate	std.error	statistic	p.value
CV1	(Intercept)	75.0000000	12.1638994	6.165786	0.025309723
CV1	Env_index	2.3395736	0.5658852	4.134361	0.053823885
CV2	(Intercept)	73.7500000	2.7528814	26.790112	0.001390415
CV2	Env_index	0.8024564	0.1280687	6.265828	0.024537235
CV3	(Intercept)	77.2500000	4.3607461	17.714859	0.003171427
CV3	Env_index	0.8039714	0.2028693	3.963002	0.058171978
CV4	(Intercept)	78.5000000	5.0252942	15.620976	0.004073090
CV4	Env_index	0.3786387	0.2337852	1.619601	0.246746749
CV5	(Intercept)	77.0000000	1.1734503	65.618456	0.000232165
CV5	Env_index	0.6753598	0.0545909	12.371290	0.006470516
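
As an alternative sketch, the same intercepts and slopes can be obtained from a single model that gives every genotype its own intercept and its own slope. This uses base R only. The object names `yield`, `long`, and `fit` are my own, and this single-model form is a swapped-in technique, not the code used above. The estimates match the per-genotype fits, but the standard errors will differ because one pooled residual variance is used for all genotypes.

```r
# Yield matrix: rows = environments, columns = genotypes
yield <- cbind(
  CV1 = c(30, 150, 20, 100),
  CV2 = c(74, 99, 49, 73),
  CV3 = c(78, 106, 56, 69),
  CV4 = c(86, 92, 66, 70),
  CV5 = c(74, 98, 57, 79)
)
env_index <- rowMeans(yield) - mean(yield)

# Long format: one row per genotype x environment combination
# (as.vector() reads the matrix column by column, i.e. genotype by genotype)
long <- data.frame(
  Genotype  = rep(colnames(yield), each = nrow(yield)),
  Env_index = rep(env_index, times = ncol(yield)),
  Yield     = as.vector(yield)
)

# "0 + Genotype" gives one intercept per genotype;
# "Genotype:Env_index" gives one slope per genotype
fit <- lm(Yield ~ 0 + Genotype + Genotype:Env_index, data = long)
round(coef(fit), 4)
```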

Now, let’s focus on the slope of the regression in each genotype. Which genotype shows the steepest slope? It’s CV1 (2.34).

The model equation of CV1 is y = 75.0 + 2.34 * Env_Index. This means that when the environmental index increases by 1 unit, the yield of CV1 increases by 2.34 units.
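
To see where the CV1 slope comes from, it can be reproduced by hand with the ordinary least-squares formula (sum of cross-products divided by the sum of squares of x). This is a minimal sketch in base R; the variable names are illustrative, and the environmental index values are the rounded ones from the table above.

```r
env_index <- c(-7.9, 32.7, -26.7, 1.9)  # High_inoc, High_NO_inoc, Low_inoc, Low_NO_inoc
cv1_yield <- c(30, 150, 20, 100)        # CV1 yield in the same order

# Deviations from the means
x_dev <- env_index - mean(env_index)
y_dev <- cv1_yield - mean(cv1_yield)

# Least-squares slope and intercept
slope     <- sum(x_dev * y_dev) / sum(x_dev^2)
intercept <- mean(cv1_yield) - slope * mean(env_index)
round(c(intercept = intercept, slope = slope), 3)

# Yield predicted by the fitted line in each environment
fitted_yield <- intercept + slope * env_index
round(fitted_yield, 1)

# Cross-check against lm()
coef(lm(cv1_yield ~ env_index))
```

Because the environmental index averages to zero, the intercept is simply the mean yield of CV1 across environments (75.0).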

For a clearer visualization, I’ll draw a regression graph.

library(ggplot2)
ggplot(data=df, aes(x=as.numeric(Env_index), y=as.numeric(Yield), group=Genotype)) +
  geom_smooth(method = lm, level=0.95, se=FALSE, linetype=1, color="Dark red", linewidth=0.5, formula= y ~ x) +
  geom_point (aes(shape=Genotype, fill=Genotype), col="Black", size=5, stroke = 0.5) +
  scale_shape_manual(values = c(21, 22, 23, 24, 25)) +
  scale_fill_manual(values = c("Black","Red",'Blue',"Orange","Green")) +
  scale_x_continuous(breaks = seq(-40,40,10), limits = c(-40,40)) +
  scale_y_continuous(breaks = seq(0,200,20), limits = c(0,200)) +
  labs(x="Environmental Index", y="Yield") +
  theme_classic(base_size=18, base_family="serif")+
  theme(legend.position=c(0.89,0.17),
        legend.title=element_blank(),
        legend.key=element_rect(color="white", fill="white"),
        legend.text=element_text(family="serif", face="plain",
                                 size=15, color= "Black"),
        legend.background=element_rect(fill="white"),
        axis.line=element_line(linewidth=0.5, colour="black"))

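Now it’s much clearer. CV1 shows the steepest slope among genotypes. If we consider only mean yield across environmental conditions, CV4 showed the greatest yield, but its slope is much smaller than that of CV1.

CV1: y=75.0 + 2.340 * Env_Index
CV4: y=78.5 + 0.379 * Env_Index

CV1 responds strongly to improved environments but yields poorly in unfavorable ones, while CV4 combines the highest mean yield with the smallest variation across environments, the two prerequisites listed earlier. Therefore, in terms of broad adaptation, we might be able to say CV4 is the best cultivar. That is, in the Finlay-Wilkinson Regression Model, CV4 would have the broadest environmental adaptability.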

full code: https://github.com/agronomy4future/r_code/blob/main/What_is_Finlay_Wilkinson_Regression_Model.ipynb


The model equation of Finlay-Wilkinson regression

p = G + βE + e

where
p is phenotype (here, yield)
G is intercept (genotypic effect)
β is slope (sensitivity to environment; adaptability)
E is environmental index
e is error

I explained how to obtain β.
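
One way to read β is against a reference value of 1. When the environmental index is built from the same genotypes that are being regressed, as it is here, the slopes average to exactly 1, so β above 1 indicates above-average sensitivity to the environment and β below 1 indicates below-average sensitivity. The sketch below illustrates this in base R; the names `slopes` and `Response` and the two labels are my own, not standard terminology.

```r
# Yield matrix: rows = environments, columns = genotypes
yield <- cbind(
  CV1 = c(30, 150, 20, 100),
  CV2 = c(74, 99, 49, 73),
  CV3 = c(78, 106, 56, 69),
  CV4 = c(86, 92, 66, 70),
  CV5 = c(74, 98, 57, 79)
)
env_index <- rowMeans(yield) - mean(yield)

# Slope (beta) of each genotype regressed on the environmental index
slopes <- apply(yield, 2, function(y) unname(coef(lm(y ~ env_index))[2]))

mean(slopes)  # 1, up to floating-point rounding

# Label each genotype by its sensitivity relative to the average
data.frame(
  Genotype = names(slopes),
  Slope    = round(slopes, 3),
  Response = ifelse(slopes > 1, "above-average sensitivity",
                                "below-average sensitivity")
)
```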

When we analyze field data across environmental conditions, β provides additional insight into how our crops behave. In particular, when we analyze genotypes in multi-environment trials (METs), the Finlay-Wilkinson regression is very useful for understanding how genotypes respond to environments.


