How to randomize treatments using R?

How to randomize treatments using R?


Setting up experimental design according to your experiment goal is the first step to achieve your experiment’s success. In Agronomy studies, experimental design involves the combination of treatments deployed in the field, and these treatments should be randomized. Randomization is important in experimental design as it helps our experiments avoid biases due to physical or biological factors. Of course, there are no specific, unconditional rules for randomization. In a very old-fashioned way, you can write treatment numbers on paper, and then choose each paper as if drawing lots. However, we’re researchers and we can use more advanced ways to randomize our treatments. Today, I’ll introduce how to randomize treatments using R according to each experimental design.


1) One factor without block (Completely Randomized Design)

In Completely Randomized Design (CRD), experimental units are randomly assigned to different treatment groups. Each treatment group represents a level of the independent variable being tested. In a CRD, replicates are not considered as blocks. If replicates are considered as blocks, it will be a Randomized Complete Block Design (RCBD). The differences between CRD and RCBD will be summarized below my post.


How to calculate pooled variance when including block in the experimental design?


Let’s say there is one factor, ‘nitrogen’, and the experimental design was Completely Randomized Design (CRD), and there were 4 replicates.

df = tibble::tibble(
     N_rate=c("0N", "50N", "100N", "150N"),
     `Rep 1`=seq(101, 104, by=1),
     `Rep 2`=seq(201, 204, by=1),
     `Rep 3`=seq(301, 304, by=1),
     `Rep 4`=seq(401, 404, by=1),
)
df
N_rate	Rep 1	Rep 2	Rep 3	Rep 4
0N	101	201	301	401
50N	102	202	302	402
100N	103	203	303	403
150N	104	204	304	404

I’ll randomize treatments using CRD.

library(dplyr)
df_randomized= df %>%
               pivot_longer(-N_rate, names_to= "Replicate", 
               values_to= "Value") %>%
               mutate(Value= sample(Value)) %>%
               pivot_wider(names_from= Replicate, values_from= Value)

df_randomized
N_rate	Rep 1	Rep 2	Rep 3	Rep 4
0N	203	201	402	403
50N	202	104	103	204
100N	101	301	303	404
150N	302	401	304	102

Based on this randomization, the experimental design will be as follows: all treatments will be randomized within the entire plot.



2) Two factors without block (Completely Randomized Design)

Now, while maintaining CRD, there are two factors to consider: variety and nitrogen.

df = tibble::tibble(
     Variety= rep(c("CV1", "CV2"), each= 4L),
     N_rate= rep(c("0N", "50N", "100N", "150N"), 2),
     `Rep 1`= seq(101, 108, by= 1),
     `Rep 2`= seq(201, 208, by= 1),
     `Rep 3`= seq(301, 308, by= 1),
     `Rep 4`= seq(401, 408, by= 1),
)
df
Variety	N rate	Rep 1	Rep 2	Rep 3	Rep 4
CV1	0N	101	201	301	401
CV1	50N	102	202	302	402
CV1	100N	103	203	303	403
CV1	150N	104	204	304	404
CV2	0N	105	205	305	405
CV2	50N	106	206	306	406
CV2	100N	107	207	307	407
CV2	150N	108	208	308	408

I’ll randomize treatment combination (i.e., CV1 N0) using CRD, employing the following code.

library(dplyr)
library(tidyr)
df_randomized= df %>%
               pivot_longer(-c(Variety, N_rate), names_to= "Replicate", 
               values_to= "Value") %>%
               mutate(Value= sample(Value)) %>%
               pivot_wider(names_from= Replicate, values_from= Value)

df_randomized
Variety	N_rate	Rep 1	Rep 2	Rep 3	Rep 4
CV1	0N	108	407	307	206
CV1	50N	103	308	208	408
CV1	100N	406	303	302	101
CV1	150N	207	202	204	306
CV2	0N	104	205	402	301
CV2	50N	401	404	105	403
CV2	100N	203	305	107	102
CV2	150N	304	405	106	201

Based on this randomization, the experimental design will be as follows: all treatment combinations will be randomized within the entire plot.



3) One factor with block (Randomized Complete Block Design)

Since we are transitioning to RCBD, the experimental design will involve replicates as blocks.

df = tibble::tibble(
     N_rate = c("0N", "50N", "100N", "150N"),
     `Rep 1` = seq(101, 104, by = 1),
     `Rep 2` = seq(201, 204, by = 1),
     `Rep 3` = seq(301, 304, by = 1),
     `Rep 4` = seq(401, 404, by = 1),
)
df
N_rate	Rep 1	Rep 2	Rep 3	Rep 4
0N	101	201	301	401
50N	102	202	302	402
100N	103	203	303	403
150N	104	204	304	404

In contrast to CRD, in RCBD, randomization occurs within each block, whereas previously it was randomized within the entire plot.

library(dplyr)
set.seed(100)
df_randomized=df %>%
              group_by() %>%
              mutate(across(starts_with("Rep"), ~sample(.))) %>%
ungroup()

df_randomized
N_rate	Rep 1	Rep 2	Rep 3	Rep 4
0N	102	203	304	402
50N	103	201	303	403
100N	104	202	302	404
150N	101	204	301	401


4) Two factors with block (Randomized Complete Block Design)

Now, while maintaining RCBD, there are two factors to consider: variety and nitrogen.

df = tibble::tibble(
     Variety= rep(c("CV1", "CV2"), each= 4L),
     N_rate= rep(c("0N", "50N", "100N", "150N"), 2),
     `Rep 1`= seq(101, 108, by= 1),
     `Rep 2`= seq(201, 208, by= 1),
     `Rep 3`= seq(301, 308, by= 1),
     `Rep 4`= seq(401, 408, by= 1),
)
df
Variety	N rate	Rep 1	Rep 2	Rep 3	Rep 4
CV1	0N	101	201	301	401
CV1	50N	102	202	302	402
CV1	100N	103	203	303	403
CV1	150N	104	204	304	404
CV2	0N	105	205	305	405
CV2	50N	106	206	306	406
CV2	100N	107	207	307	407
CV2	150N	108	208	308	408

I’ll randomize treatment combination (i.e., CV1 N0) using RCBD, employing the following code.

library(dplyr)
set.seed(100)
df_randomized=df %>%
              group_by() %>%
              mutate(across(starts_with("Rep"), ~sample(.))) %>%
              ungroup()

df_randomized
Variety	N rate	Rep 1	Rep 2	Rep 3	Rep 4
CV1	0N	102	207	307	402
CV1	50N	107	206	308	401
CV1	100N	106	208	303	408
CV1	150N	103	204	302	403
CV2	0N	101	203	301	404
CV2	50N	108	202	306	406
CV2	100N	105	205	304	405
CV2	150N	104	201	305	407

Based on this randomization, the experimental design will be as follows: all treatment combinations will be randomized within the block.



5) Split-plot design

df = tibble::tibble(
     Variety= rep(c("CV1", "CV2"), each= 4L),
     N_rate= rep(c("0N", "50N", "100N", "150N"), 2),
     `Rep 1`= seq(101, 108, by= 1),
     `Rep 2`= seq(201, 208, by= 1),
     `Rep 3`= seq(301, 308, by= 1),
     `Rep 4`= seq(401, 408, by= 1),
)
df
Variety	N rate	Rep 1	Rep 2	Rep 3	Rep 4
CV1	0N	101	201	301	401
CV1	50N	102	202	302	402
CV1	100N	103	203	303	403
CV1	150N	104	204	304	404
CV2	0N	105	205	305	405
CV2	50N	106	206	306	406
CV2	100N	107	207	307	407
CV2	150N	108	208	308	408
library(dplyr)
set.seed(100)
df_randomized= df %>%
               mutate(across(starts_with("Rep"),
               ~ ave(.x[order(match(Variety,sample(unique(Variety))))],
               Variety,FUN=sample)))

df_randomized
Variety	N_rate	Rep 1	Rep 2	Rep 3	Rep 4
CV1	0N	106	207	308	406
CV1	50N	107	206	307	405
CV1	100N	105	208	305	407
CV1	150N	108	205	306	408
CV2	0N	102	203	302	401
CV2	50N	103	204	301	403
CV2	100N	104	201	304	402
CV2	150N	101	202	303	404


Leave a Reply

If you include a website address in the comment section, I cannot see your comment as it will be automatically deleted and will not be posted. Please refrain from including website addresses.