Calculating Predicted Values for Each Group in Basic Modeling
□ The Best Linear Unbiased Estimator (BLUE): Step-by-Step Guide using R (with AllInOne Package)
In my previous post, I explained how to estimate dependent values from fitting models. Now I’ll explain how to add this predicted value to the original data using R.
First, let’s upload data to R.
library(readr)
github="https://raw.githubusercontent.com/agronomy4future/raw_data_practice/main/responsiveness.csv"
dataA=data.frame(read_csv(url(github),show_col_types=FALSE))
dataA
tissue row plot treatment yield responsiveness
1 Head outside 5 Control 159.30 <NA>
2 Head outside 5 Tr1 59.70 -62.5%
3 Head outside 5 Tr2 97.50 -38.8%
4 Stem outside 5 Control 74.65 <NA>
5 Stem outside 5 Tr1 34.05 -54.4%
6 Stem outside 5 Tr2 76.65 2.7%
7 Head outside 6 Control 99.30 <NA>
8 Head outside 6 Tr1 109.00 9.8%
9 Head outside 6 Tr2 82.30 -17.1%
10 Stem outside 6 Control 53.95 <NA>
11 Stem outside 6 Tr1 52.95 -1.9%
12 Stem outside 6 Tr2 65.45 21.3%
.
.
.
Now, I’ll predict yield using the model. I believe that ‘row’ represents a random factor for each treatment, so I’d like to adjust the residuals using BLUP (Best Linear Unbiased Predictor), with the model specified as y ~ 1 + (1|row)
.
If we consider ‘row’ to be a random factor that represents variability due to different rows, and we want to account for this random variation in our model, we can use BLUP (Best Linear Unbiased Predictor) to estimate random effects associated with ‘row.’ The model y ~ 1 + (1|row)
indicates that we are modeling the fixed effect (intercept) and the random effect associated with ‘row.’ The random effect term (1|row)
allows for individual variations among different ‘row’ levels, treating ‘row’ as a random factor. The BLUPs for the ‘row’ levels will be estimated to account for this random variation. This approach is commonly used in linear mixed models to handle situations where we have random factors or grouping variables that introduce variability in our data that should be accounted for in our model. Using BLUP in this context helps us obtain the best linear predictors for these random effects while considering the overall fixed effect (intercept) in our model.
So, I created code for each combination of ’tissue’ and ‘treatment’ groups as shown below.
library(lme4)
library(Matrix)
head_control=lmer(yield ~ 1 + (1|row), data=subset(dataA, tissue=="Head" & treatment=="Control"))
Stem_control=lmer(yield ~ 1 + (1|row), data=subset(dataA, tissue=="Stem" & treatment=="Control"))
head_TR1=lmer(yield ~ 1 + (1|row), data=subset(dataA, tissue=="Head" & treatment=="Tr1"))
Stem_TR2=lmer(yield ~ 1 + (1|row), data=subset(dataA, tissue=="Stem" & treatment=="Tr2"))
When we use a BLUP model to obtain predicted values, it means that we’re using the model to estimate the values of random effects while taking into account the information from the fixed effects and the covariance structure of the data. These predicted values are adjusted for the residuals, which is important for reducing the influence of noise in the data. In a nutshell, BLUP predictions are obtained by combining information from both fixed and random effects, which results in more accurate and unbiased predictions compared to traditional least squares estimates. The residuals are typically included in this process to ensure that the predictions account for any variability or noise in the data. This makes BLUP a powerful tool for making predictions and estimating random effects in various statistical modeling scenarios.
For example, when using the model with the conditions of tissue as ‘head’ and treatment as ‘Control,’ we can obtain the predicted values.
predict(head_control)
114.24174, 114.24174, 114.24174, 114.24174, 72.32756, 72.32756, 64.82558, 74.28878, 74.28878, 72.32756, 72.32756, 64.82558, 74.28878, 74.28878, 72.32756, 72.32756, 64.82558, 74.28878, 74.28878, 72.32756, 72.32756, 64.82558, 74.28878, 74.28878
predict(Stem_control)
predict(head_TR1)
predict(Stem_TR2)
I will then add these values to the original data. I’ll use the below code.
predicted_values1=predict(head_control, newdata = dataA[which(dataA$tissue == "Head" & dataA$treatment == "Control"),])
predicted_values2=predict(Stem_control, newdata=dataA[which(dataA$tissue == "Stem" & dataA$treatment == "Control"),])
predicted_values3=predict(head_TR1, newdata=dataA[which(dataA$tissue == "Head" & dataA$treatment == "Tr1"),])
predicted_values4=predict(Stem_TR2, newdata=dataA[which(dataA$tissue == "Stem" & dataA$treatment == "Tr2"),])
dataA$BLUP=rep(NA, nrow(dataA)) # Create a new column with NA values
dataA$BLUP[which(dataA$tissue=="Head" & dataA$treatment=="Control")]=predicted_values1
dataA$BLUP[which(dataA$tissue=="Stem" & dataA$treatment=="Control")]=predicted_values2
dataA$BLUP[which(dataA$tissue=="Head" & dataA$treatment=="Tr2")]=predicted_values3
dataA$BLUP[which(dataA$tissue=="Stem" & dataA$treatment=="Tr2")]=predicted_values4
dataA
tissue row plot treatment yield responsiveness prediction BLUP
1 Head outside 5 Control 159.30 <NA> 73.98139 114.24174
2 Head outside 5 Tr1 59.70 -62.5% 73.98139 NA
3 Head outside 5 Tr2 97.50 -38.8% 73.98139 119.52771
4 Stem outside 5 Control 74.65 <NA> 73.98139 64.37642
5 Stem outside 5 Tr1 34.05 -54.4% 55.43495 NA
6 Stem outside 5 Tr2 76.65 2.7% 55.43495 73.98139
7 Head outside 6 Control 99.30 <NA> 63.21595 114.24174
8 Head outside 6 Tr1 109.00 9.8% 56.49138 NA
9 Head outside 6 Tr2 82.30 -17.1% 56.49138 119.52771
10 Stem outside 6 Control 53.95 <NA> 55.43495 64.37642
11 Stem outside 6 Tr1 52.95 -1.9% 55.43495 NA
12 Stem outside 6 Tr2 65.45 21.3% 63.21595 73.98139
13 Head outside 7 Control 119.40 <NA> 56.49138 114.24174
14 Head outside 7 Tr1 90.30 -24.4% 56.49138 NA
15 Head outside 7 Tr2 95.60 -19.9% 55.43495 119.52771
.
.
.
Let’s add one more variable; plant number.
dataA$plant_no= c(9, 9, 9, 9, 9, 9, 18, 18, 18, 18, 18, 18, 13, 13, 13, 13, 13, 13, 6, 6, 6, 6, 6, 6, 14, 14, 14, 14, 14, 14, 13, 13, 13, 13, 13, 13, 6, 6, 6, 6, 6, 6, 12, 12, 12, 12, 12, 12, 16, 16, 16, 16, 16, 16, 11, 11, 11, 11, 11, 11, 14, 14, 14, 14, 14, 14, 25, 25, 25, 25, 25, 25, 13, 13, 13, 13, 13, 13, 16, 16, 16, 16, 16, 16, 9, 9, 9, 9, 9, 9, 19, 19, 19, 19, 19, 19, 18, 18, 18, 18, 18, 18, 19, 19, 19, 19, 19, 19, 10, 10, 10, 10, 10, 10, 14, 14, 14, 14, 14, 14, 12, 12, 12, 12, 12, 12, 23, 23, 23, 23, 23, 23, 14, 14, 14, 14, 14, 14, 18, 18, 18, 18, 18, 18)
Now, I think both planting ‘row and plant number is a random factor.
head_control=lmer(yield ~ 1 + (1|row) + (1|plant_no), data=subset (dataA, tissue=="Head" & treatment=="Control"))
stem_control=lmer(yield ~ 1 + (1|row) + (1|plant_no), data=subset (dataA, tissue=="Stem" & treatment=="Control"))
head_Tr1=lmer(yield ~ 1 + (1|row) + (1|plant_no), data=subset (dataA, tissue=="Head" & treatment=="Tr1"))
stem_Tr2=lmer(yield ~ 1 + (1|row) + (1|plant_no), data=subset (dataA, tissue=="Stem" & treatment=="Tr2"))
predicted_values1=predict(head_control, newdata=dataA[which(dataA$tissue=="Head" & dataA$treatment=="Control"),])
predicted_values2=predict(stem_control, newdata=dataA[which(dataA$tissue== "Stem" & dataA$treatment=="Control"),])
predicted_values3=predict(head_Tr1, newdata=dataA[which(dataA$tissue=="Head" & dataA$treatment=="Tr1"),])
predicted_values4=predict(stem_Tr2, newdata=dataA[which(dataA$tissue=="Stem" & dataA$treatment=="Tr2"),])
dataA$BLUP2=rep(NA, nrow(dataA)) # Create a new column with NA values
dataA$BLUP2[which(dataA$tissue=="Head" & dataA$treatment=="Control")]= predicted_values1
dataA$BLUP2[which(dataA$tissue=="Stem" & dataA$treatment=="Control")]= predicted_values2
dataA$BLUP2[which(dataA$tissue=="Head" & dataA$treatment=="Tr2")]= predicted_values3
dataA$BLUP2[which(dataA$tissue=="Stem" & dataA$treatment=="Tr2")]= predicted_values4
dataA
tissue row plot treatment yield responsiveness BLUP plant_no BLUP2
1 Head outside 5 Control 159.30 <NA> 114.24174 9 129.61517
2 Head outside 5 Tr1 59.70 -62.5% NA 9 NA
3 Head outside 5 Tr2 97.50 -38.8% 119.52771 9 111.87921
4 Stem outside 5 Control 74.65 <NA> 64.37642 9 69.45422
5 Stem outside 5 Tr1 34.05 -54.4% NA 9 NA
6 Stem outside 5 Tr2 76.65 2.7% 73.98139 9 80.37009
7 Head outside 6 Control 99.30 <NA> 114.24174 18 107.84756
8 Head outside 6 Tr1 109.00 9.8% NA 18 NA
9 Head outside 6 Tr2 82.30 -17.1% 119.52771 18 116.54366
10 Stem outside 6 Control 53.95 <NA> 64.37642 18 56.49917
11 Stem outside 6 Tr1 52.95 -1.9% NA 18 NA
12 Stem outside 6 Tr2 65.45 21.3% 73.98139 18 66.13558
13 Head outside 7 Control 119.40 <NA> 114.24174 13 114.59756
14 Head outside 7 Tr1 90.30 -24.4% NA 13 NA
15 Head outside 7 Tr2 95.60 -19.9% 119.52771 13 117.12605