[R package] Finlay-Wilkinson Regression model (feat. fwrmodel)

August 24, 2024 JK

■ What is Finlay-Wilkinson Regression Model?

In my previous post, I introduced what Finlay-Wilkinson Regression Model is and how to calculate adaptability (or stability). Actually, adaptability and stability are opposite concept with the same data. Have you ever heard heritability (h²)? Heritability is a key concept in genetics and breeding that measures how much of the variation in a trait within a population is due to genetic differences among individuals. In other words, it quantifies the proportion of phenotypic variation (observable traits) that can be attributed to genetic factors.

High Heritability means that a large portion of the trait’s variation is due to genetic differences, suggesting that the trait is more stable and less influenced by environmental factors. This implies that the trait is less plastic, or less responsive to changes in the environment. On the other hand, low heritability indicates that environmental factors play a larger role in the variation of the trait. Traits with low heritability are more plastic, meaning they show greater variability in response to different environmental conditions.

This is a ranking of #plasticity in wheat yield components. I discussed this topic during my guest lecture at University of Guelph. pic.twitter.com/qEDWzuXwih
— J.K Kim (@el_trigo_JK) March 23, 2023

Adaptability and stability are analogous to the inverse relationship between heritability and plasticity.

Adaptability refers to a trait’s ability to change or respond effectively to different environmental conditions. A trait with high adaptability will show significant variation in response to environmental changes. For instance, a crop variety with high adaptability may perform well across a wide range of environments, but its performance might vary significantly depending on the conditions. High adaptability generally means a trait has greater plasticity, or flexibility, to respond to environmental changes. This can be beneficial in environments that are highly variable but may come with trade-offs in terms of consistency.

On the other hand, stability reflects how consistent a trait’s performance is across varying environments. A trait with high stability will show little variation in response to different environmental conditions, indicating that it performs reliably regardless of changes in the environment. High stability indicates that a trait performs consistently across different environments, but it might not show as much variation or responsiveness to environmental changes. This can be advantageous for ensuring reliable performance but may not be as flexible in adapting to new or extreme conditions.

Therefore, the concepts of adaptability and stability are indeed complementary but inversely related: improving one can often come at the expense of the other. Understanding this trade-off is crucial in fields like crop breeding, where both adaptability and stability might be desired but need to be balanced according to specific goals and environmental contexts.

■ Quantifying Phenotypic Plasticity of Crops

To calculate adaptability or stability (hereafter referred to as stability), the first step is to calculate the environmental index.

if(!require(dplyr)) install.packages("dplyr")
library(dplyr)

Condition= c("High_inoc", "High_NO_inoc", "Low_inoc", "Low_NO_inoc")
CV= matrix(c(30, 74, 78, 86, 74, 150, 99, 106, 92, 98, 20, 49, 56, 66, 57, 100, 73, 69, 70, 79), ncol= 5, byrow= TRUE)
colnames(CV)= paste0("CV", 1:5)
df= data.frame(Condition, CV)

df$Mean= rowMeans (df %>% select(-Condition))
df= rbind(df, c("Mean", colMeans(df %>% select(-Condition))))

Env	        CV1	CV2	CV3	CV4	CV5	Mean
High_inoc	30	74	78	86	74	68.4
High_NO_inoc	150	99	106	92	98	109.0
Low_inoc	20	49	56	66	57	49.6
Low_NO_inoc	100	73	69	70	79	78.2
Mean	        75.0	73.75	77.25	78.5	77.0	76.3

df$Mean= as.numeric(df$Mean)
df$Env_index= df$Mean - df$Mean[nrow(df)]

Env       	CV1	CV2	CV3	CV4	CV5	Mean	Env_index
High_inoc	30	74	78	86	74	68.4	-7.9
High_NO_inoc	150	99	106	92	98	109.0	32.7
Low_inoc	20	49	56	66	57	49.6	-26.7
Low_NO_inoc	100	73	69	70	79	78.2	1.9
Mean	        75.0	73.75	77.25	78.5	77.0	76.3	0.0

The code provided outlines the general process for calculating the environmental index. However, most datasets we work with will not follow the structure shown above. I combined two variables—nitrogen (high or low) and virus (inoculated or non-inoculated)—into a single factor and displayed the genotypes (CV1 to CV5) horizontally. In an actual dataset, the structure would be more like the example below.

Nitrogen Virus   Genotype Yield
High     inoc    CV1      30
High     inoc    CV2      74  
High     inoc    CV3      78  
High     inoc    CV4      86  
High     inoc    CV5      74
High     noinoc  CV1      150
High     noinoc  CV2      99 
High     noinoc  CV3      106  
High     noinoc  CV4      92  
High     noinoc  CV5      98
Low      inoc    CV1      20
Low      inoc    CV2      49  
Low      inoc    CV3      56  
Low      inoc    CV4      66  
Low      inoc    CV5      57
Low      noinoc  CV1      100
Low      noinoc  CV2      73 
Low      noinoc  CV3      69  
Low      noinoc  CV4      70  
High     noinoc  CV5      79

In this case, we need to reorganize the dataset and calculate the environmental index step by step.

■ `fwrmodel()` package

To solve this program, recently I developed R package to calculate environmental index and stability simultaneously. It’s fwrmodel() package.

Github: https://github.com/agronomy4future/fwrmodel

Let’s upload one dataset.

if(!require(readr)) install.packages("readr")
library(readr)
github="https://raw.githubusercontent.com/agronomy4future/raw_data_practice/main/fwrm_package_data_practice.csv"
df= data.frame(read_csv(url(github), show_col_types=FALSE))

set.seed(100)
df[sample(nrow(df),5),]
    year variety nitrogen location   AGW   KN      GY
202 2023     cv2       N0 Nebraska 318.6 3764 11864.8
112 2024     cv2       N1     Iowa 313.6 4570 14177.8
206 2023     cv2       N0 Nebraska 338.9 3764 12619.8
4   2023     cv1       N1     Iowa 311.7 3476 10721.4
98  2024     cv1       N1     Iowa 263.9 5618 14669.2
.
.
.

Let’s assume this data represents grain weight (AGW), grain number per m² (KN), and grain yield (kg/ha, GY) of corn across different states in the U.S. for two different seasons. I want to calculate the stability of each yield component and grain yield to determine which genotypes show better stability for AGW, KN, and GY, respectively.

First, I’ll install the package, but before that, it’s necessary to install Rtools because my package was developed outside of CRAN.

https://cran.r-project.org/bin/windows/Rtools

Please install the version corresponding to your current R version, and install the fwrmodel() package as the following code.

if(!require(remotes)) install.packages("remotes")
remotes::install_github("agronomy4future/fwrmodel")

library(remotes)
library(fwrmodel)

and I’ll run the code as shown below to calculate the environmental index.

stability= fwrmodel(df, env_cols = c("year", "nitrogen", "location"), 
                    genotype_col= "variety", yield_cols= c("AGW","GY","KN"))

env_index_cal= data.frame(stability$env_index)

set.seed(100)
env_index_cal[sample(nrow(env_index_cal),5),]
      variety year nitrogen location     Environments Env_index_AGW   AGW Env_index_GY      GY Env_index_KN   KN
20170     cv2 2023       N0 Nebraska 2023_N0_Nebraska      3.443333 318.6    -435.7763 11810.8    -214.5600 3488
16887     cv1 2023       N0 Illinois 2023_N0_Illinois     -1.523333 307.0   -2300.4197  8130.4    -786.9267 2742
3430      cv2 2023       N1     Iowa     2023_N1_Iowa    -22.023333 337.4    -202.7063 11931.9     195.4400 3721
3696      cv2 2023       N1     Iowa     2023_N1_Iowa    -22.023333 289.0    -202.7063 10474.6     195.4400 3411
20474     cv2 2023       N0 Nebraska 2023_N0_Nebraska      3.443333 348.8    -435.7763 10947.9    -214.5600 3672

I want to calculate the environmental index for AGW, KN, and GY across year, nitrogen, and location, respectively. The code to do this is as shown above.

Note 1
Setting up environments to calculate environmental index is important because it directly affects stability (slope). For example, if you think the year is not an important factor for crop yield in this case, you might combine environments with only nitrogen and location. This could result in a completely different slope for each genotype. Therefore, the most important thing is to understand your own dataset and determine the best combination of environments to calculate stability.

Next, I’ll calculate stability.

coefficient_AGW= as.data.frame(stability$regression$AGW)
coefficient_KN= as.data.frame(stability$regression$KN)
coefficient_GY= as.data.frame(stability$regression$GY)

Let’s check the coefficient for AGW.

print(coefficient_AGW)
  variety          term    estimate  std.error  statistic  p.value
1     cv1   (Intercept) 297.9520000 3.95194803  75.393704  1.368777e-88
2     cv1 Env_index_AGW   0.7523172 0.14292608  5.263681   8.336224e-07
3     cv2   (Intercept) 318.8220000 2.58702388  123.238909 2.795793e-109
4     cv2 Env_index_AGW   0.8862399 0.09356226  9.472194   1.693111e-15
5     cv3   (Intercept) 312.6860000 2.64794053  118.086489 1.787034e-107
6     cv3 Env_index_AGW   1.3614429 0.09576537  14.216443  1.487695e-25

The slope represents stability: a flatter slope indicates greater stability across environments, while a steeper slope suggests less stability. In this data, cv1 show better stability.

Let’s visualize this!!

if(!require(dplyr)) install.packages("dplyr")
library(dplyr)

if(!require(ggplot2)) install.packages("ggplot2")
library(ggplot2)

data_summary= data.frame(env_index_cal %>%
                           group_by(Environments, variety) %>%
                           dplyr::summarize(across(c(AGW, Env_index_AGW),
                                                   .fns = list(Mean = mean,
                                                               n = length))))

ggplot(data=data_summary, aes(x=Env_index_AGW_Mean, y=AGW_Mean))+
  geom_smooth(aes(group=variety, color=variety), method='lm', linetype=1, se=FALSE,
              formula=y~x, linewidth=0.5)+
  geom_point(aes(fill=variety, shape=variety), color="black", size=4)+
  scale_color_manual(values=c("grey15","grey35","grey75"))+
  scale_fill_manual(values=c("grey15","grey35","grey75"))+
  scale_shape_manual(values=c(21,22,23))+
  scale_x_continuous(breaks=seq(-50,50,25), limits=c(-50,50))+
  scale_y_continuous(breaks=seq(200,400,50), limits=c(200,400))+
  labs(x="Environmental_Index", y="Average grain weight (mg)") +
  theme_classic(base_size=20, base_family="serif")+
  theme(legend.position=c(0.89,0.13),
        legend.title=element_blank(),
        legend.key=element_rect(color=alpha("white",.001),
                                fill=alpha("white",.001)),
        legend.background=element_rect(fill=alpha("white",.001)),
        panel.grid.major= element_line(color="grey90", linetype="dashed"),
        axis.line=element_line(linewidth=0.5, colour="black"))

The slope of cv1 is the flattest, while cv3 is the steepest among three genotypes, as indicated by the coefficient_AGW. Therefore, cv1 shows the most stability for grain weight, whereas cv3 is more variable or plastic across environments.

□ code summary: https://github.com/agronomy4future/r_code/blob/main/R_package__fwrmodel.ipynb

Agronomy4future

Stories about cereals and statistics (plus coding). We aim to develop open-source code for agronomy.

[R package] Finlay-Wilkinson Regression model (feat. fwrmodel)

August 24, 2024 JK

■ What is Finlay-Wilkinson Regression Model?

■ Quantifying Phenotypic Plasticity of Crops

■ `fwrmodel()` package

■ What is Finlay-Wilkinson Regression Model?

■ Quantifying Phenotypic Plasticity of Crops

■ fwrmodel() package

■ `fwrmodel()` package