Calculating Predicted Values for Each Group in Basic Modeling

November 5, 2023 JK

□ The Best Linear Unbiased Estimator (BLUE): Step-by-Step Guide using R (with AllInOne Package)

In my previous post, I explained how to estimate dependent values from fitting models. Now I’ll explain how to add this predicted value to the original data using R.

First, let’s upload data to R.

library(readr)
github="https://raw.githubusercontent.com/agronomy4future/raw_data_practice/main/responsiveness.csv"
dataA=data.frame(read_csv(url(github),show_col_types=FALSE))

dataA
    tissue     row plot treatment  yield responsiveness
1     Head outside    5   Control 159.30           <NA>
2     Head outside    5       Tr1  59.70         -62.5%
3     Head outside    5       Tr2  97.50         -38.8%
4     Stem outside    5   Control  74.65           <NA>
5     Stem outside    5       Tr1  34.05         -54.4%
6     Stem outside    5       Tr2  76.65           2.7%
7     Head outside    6   Control  99.30           <NA>
8     Head outside    6       Tr1 109.00           9.8%
9     Head outside    6       Tr2  82.30         -17.1%
10    Stem outside    6   Control  53.95           <NA>
11    Stem outside    6       Tr1  52.95          -1.9%
12    Stem outside    6       Tr2  65.45          21.3%
.
.
.

Now, I’ll predict yield using the model. I believe that ‘row’ represents a random factor for each treatment, so I’d like to adjust the residuals using BLUP (Best Linear Unbiased Predictor), with the model specified as y ~ 1 + (1|row).

If we consider ‘row’ to be a random factor that represents variability due to different rows, and we want to account for this random variation in our model, we can use BLUP (Best Linear Unbiased Predictor) to estimate random effects associated with ‘row.’ The model y ~ 1 + (1|row) indicates that we are modeling the fixed effect (intercept) and the random effect associated with ‘row.’ The random effect term (1|row) allows for individual variations among different ‘row’ levels, treating ‘row’ as a random factor. The BLUPs for the ‘row’ levels will be estimated to account for this random variation. This approach is commonly used in linear mixed models to handle situations where we have random factors or grouping variables that introduce variability in our data that should be accounted for in our model. Using BLUP in this context helps us obtain the best linear predictors for these random effects while considering the overall fixed effect (intercept) in our model.

So, I created code for each combination of ’tissue’ and ‘treatment’ groups as shown below.

library(lme4)
library(Matrix)

head_control=lmer(yield ~ 1 + (1|row), data=subset(dataA, tissue=="Head" & treatment=="Control"))

Stem_control=lmer(yield ~ 1 + (1|row), data=subset(dataA, tissue=="Stem" & treatment=="Control"))

head_TR1=lmer(yield ~ 1 + (1|row), data=subset(dataA, tissue=="Head" & treatment=="Tr1"))

Stem_TR2=lmer(yield ~ 1 + (1|row), data=subset(dataA, tissue=="Stem" & treatment=="Tr2"))

When we use a BLUP model to obtain predicted values, it means that we’re using the model to estimate the values of random effects while taking into account the information from the fixed effects and the covariance structure of the data. These predicted values are adjusted for the residuals, which is important for reducing the influence of noise in the data. In a nutshell, BLUP predictions are obtained by combining information from both fixed and random effects, which results in more accurate and unbiased predictions compared to traditional least squares estimates. The residuals are typically included in this process to ensure that the predictions account for any variability or noise in the data. This makes BLUP a powerful tool for making predictions and estimating random effects in various statistical modeling scenarios.

For example, when using the model with the conditions of tissue as ‘head’ and treatment as ‘Control,’ we can obtain the predicted values.

predict(head_control)
114.24174, 114.24174, 114.24174, 114.24174, 72.32756, 72.32756, 64.82558, 74.28878, 74.28878, 72.32756, 72.32756, 64.82558, 74.28878, 74.28878, 72.32756, 72.32756, 64.82558, 74.28878, 74.28878, 72.32756, 72.32756, 64.82558, 74.28878, 74.28878

predict(Stem_control)
predict(head_TR1)
predict(Stem_TR2)

I will then add these values to the original data. I’ll use the below code.

predicted_values1=predict(head_control, newdata = dataA[which(dataA$tissue == "Head" & dataA$treatment == "Control"),])

predicted_values2=predict(Stem_control, newdata=dataA[which(dataA$tissue == "Stem" & dataA$treatment == "Control"),])

predicted_values3=predict(head_TR1, newdata=dataA[which(dataA$tissue == "Head" & dataA$treatment == "Tr1"),])

predicted_values4=predict(Stem_TR2, newdata=dataA[which(dataA$tissue == "Stem" & dataA$treatment == "Tr2"),])

dataA$BLUP=rep(NA, nrow(dataA)) # Create a new column with NA values

dataA$BLUP[which(dataA$tissue=="Head" & dataA$treatment=="Control")]=predicted_values1

dataA$BLUP[which(dataA$tissue=="Stem" & dataA$treatment=="Control")]=predicted_values2

dataA$BLUP[which(dataA$tissue=="Head" & dataA$treatment=="Tr2")]=predicted_values3
dataA$BLUP[which(dataA$tissue=="Stem" & dataA$treatment=="Tr2")]=predicted_values4

dataA
    tissue     row plot treatment  yield responsiveness prediction      BLUP
1     Head outside    5   Control 159.30           <NA>   73.98139 114.24174
2     Head outside    5       Tr1  59.70         -62.5%   73.98139        NA
3     Head outside    5       Tr2  97.50         -38.8%   73.98139 119.52771
4     Stem outside    5   Control  74.65           <NA>   73.98139  64.37642
5     Stem outside    5       Tr1  34.05         -54.4%   55.43495        NA
6     Stem outside    5       Tr2  76.65           2.7%   55.43495  73.98139
7     Head outside    6   Control  99.30           <NA>   63.21595 114.24174
8     Head outside    6       Tr1 109.00           9.8%   56.49138        NA
9     Head outside    6       Tr2  82.30         -17.1%   56.49138 119.52771
10    Stem outside    6   Control  53.95           <NA>   55.43495  64.37642
11    Stem outside    6       Tr1  52.95          -1.9%   55.43495        NA
12    Stem outside    6       Tr2  65.45          21.3%   63.21595  73.98139
13    Head outside    7   Control 119.40           <NA>   56.49138 114.24174
14    Head outside    7       Tr1  90.30         -24.4%   56.49138        NA
15    Head outside    7       Tr2  95.60         -19.9%   55.43495 119.52771
.
.
.

Let’s add one more variable; plant number.

dataA$plant_no= c(9, 9, 9, 9, 9, 9, 18, 18, 18, 18, 18, 18, 13, 13, 13, 13, 13, 13, 6, 6, 6, 6, 6, 6, 14, 14, 14, 14, 14, 14, 13, 13, 13, 13, 13, 13, 6, 6, 6, 6, 6, 6, 12, 12, 12, 12, 12, 12, 16, 16, 16, 16, 16, 16, 11, 11, 11, 11, 11, 11, 14, 14, 14, 14, 14, 14, 25, 25, 25, 25, 25, 25, 13, 13, 13, 13, 13, 13, 16, 16, 16, 16, 16, 16, 9, 9, 9, 9, 9, 9, 19, 19, 19, 19, 19, 19, 18, 18, 18, 18, 18, 18, 19, 19, 19, 19, 19, 19, 10, 10, 10, 10, 10, 10, 14, 14, 14, 14, 14, 14, 12, 12, 12, 12, 12, 12, 23, 23, 23, 23, 23, 23, 14, 14, 14, 14, 14, 14, 18, 18, 18, 18, 18, 18)

Now, I think both planting ‘row and plant number is a random factor.

head_control=lmer(yield ~ 1 + (1|row) + (1|plant_no), data=subset (dataA, tissue=="Head" & treatment=="Control"))

stem_control=lmer(yield ~ 1 + (1|row) + (1|plant_no), data=subset (dataA, tissue=="Stem" & treatment=="Control"))

head_Tr1=lmer(yield ~ 1 + (1|row) + (1|plant_no), data=subset (dataA, tissue=="Head" & treatment=="Tr1"))

stem_Tr2=lmer(yield ~ 1 + (1|row) + (1|plant_no), data=subset (dataA, tissue=="Stem" & treatment=="Tr2"))

predicted_values1=predict(head_control, newdata=dataA[which(dataA$tissue=="Head" & dataA$treatment=="Control"),])

predicted_values2=predict(stem_control, newdata=dataA[which(dataA$tissue== "Stem" & dataA$treatment=="Control"),])
predicted_values3=predict(head_Tr1, newdata=dataA[which(dataA$tissue=="Head" & dataA$treatment=="Tr1"),])
predicted_values4=predict(stem_Tr2, newdata=dataA[which(dataA$tissue=="Stem" & dataA$treatment=="Tr2"),])

dataA$BLUP2=rep(NA, nrow(dataA))  # Create a new column with NA values

dataA$BLUP2[which(dataA$tissue=="Head" & dataA$treatment=="Control")]= predicted_values1
dataA$BLUP2[which(dataA$tissue=="Stem" & dataA$treatment=="Control")]= predicted_values2
dataA$BLUP2[which(dataA$tissue=="Head" & dataA$treatment=="Tr2")]= predicted_values3
dataA$BLUP2[which(dataA$tissue=="Stem" & dataA$treatment=="Tr2")]= predicted_values4

dataA
    tissue     row plot treatment  yield responsiveness      BLUP plant_no     BLUP2
1     Head outside    5   Control 159.30           <NA> 114.24174        9 129.61517
2     Head outside    5       Tr1  59.70         -62.5%        NA        9        NA
3     Head outside    5       Tr2  97.50         -38.8% 119.52771        9 111.87921
4     Stem outside    5   Control  74.65           <NA>  64.37642        9  69.45422
5     Stem outside    5       Tr1  34.05         -54.4%        NA        9        NA
6     Stem outside    5       Tr2  76.65           2.7%  73.98139        9  80.37009
7     Head outside    6   Control  99.30           <NA> 114.24174       18 107.84756
8     Head outside    6       Tr1 109.00           9.8%        NA       18        NA
9     Head outside    6       Tr2  82.30         -17.1% 119.52771       18 116.54366
10    Stem outside    6   Control  53.95           <NA>  64.37642       18  56.49917
11    Stem outside    6       Tr1  52.95          -1.9%        NA       18        NA
12    Stem outside    6       Tr2  65.45          21.3%  73.98139       18  66.13558
13    Head outside    7   Control 119.40           <NA> 114.24174       13 114.59756
14    Head outside    7       Tr1  90.30         -24.4%        NA       13        NA
15    Head outside    7       Tr2  95.60         -19.9% 119.52771       13 117.12605

Agronomy4future

Stories about cereals and statistics (plus coding). We aim to develop open-source code for agronomy.

Calculating Predicted Values for Each Group in Basic Modeling

November 5, 2023 JK

□ The Best Linear Unbiased Estimator (BLUE): Step-by-Step Guide using R (with AllInOne Package)