Creating Visual Emphasis: Adding Dotted Boxes to Graphs in R Studio

In this post, I’ll explain how to add a box to a graph to highlight part of it. First, I’ll generate some data.

Genotype= c("A","B","C","D","E")
Yield= c(70, 75, 71, 88, 90)
se= c(2,3,2,3,4)
dataA= data.frame(Genotype,Yield,se)
dataA

  Genotype Yield se
1        A    70  2
2        B    75  3
3        C    71  2
4        D    88  3
5        E    90  4

These data are the yield and standard error (se) of five genotypes. I’ll visualize them with a bar chart.

library(ggplot2)
ggplot(data=dataA, aes(x=Genotype, y=Yield, fill=Genotype))+
  scale_fill_manual(values= c("azure4","darkolivegreen4","cadetblue","Dark red",
                               "Blue")) +
  geom_bar(stat="identity", position="dodge", width = 0.7, linewidth=1) +
  geom_errorbar(aes(ymin= Yield-se, ymax=Yield + se), position=position_dodge(0.7),
                width=0.2, color='Black') +
  scale_y_continuous(breaks = seq(0,150,10), limits = c(0,150)) +
  labs(x="Genotype", y="Yield (g/m2)") +
  theme(legend.position="none",
        legend.title=element_blank(),
        legend.key.size=unit(0.5,'cm'),
        legend.key=element_rect(color=alpha("white",.05), 
                                fill=alpha("white",.05)),
        legend.text=element_text(size=11),
        legend.background= element_rect(fill=alpha("white",.05)),
        panel.grid.major=element_line(colour="grey90", linewidth=0.5),
        axis.line=element_line(linewidth=0.5, colour="black")) +
  windows(width=6, height=5)
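The windows() call at the end of the chain opens a new plot window of the given size in inches, but that function exists only in R on Windows. As a more portable option, the sketch below stores the chart in an object and writes it to a 6 x 5 inch file with ggsave(). The file name yield_plot.png is a hypothetical example. The chart is also trimmed: geom_col() is ggplot2's shorthand for geom_bar(stat = "identity"), and the legend.* theme settings are dropped because legend.position = "none" hides the legend anyway.

```r
library(ggplot2)

# Same data as above (a character Genotype is fine for a plain bar chart)
dataA <- data.frame(
  Genotype = c("A", "B", "C", "D", "E"),
  Yield    = c(70, 75, 71, 88, 90),
  se       = c(2, 3, 2, 3, 4)
)

# Store the plot in an object instead of printing it straight to a window
p <- ggplot(dataA, aes(x = Genotype, y = Yield, fill = Genotype)) +
  geom_col(width = 0.7) +   # shorthand for geom_bar(stat = "identity")
  geom_errorbar(aes(ymin = Yield - se, ymax = Yield + se), width = 0.2) +
  scale_fill_manual(values = c("azure4", "darkolivegreen4", "cadetblue",
                               "darkred", "blue")) +
  scale_y_continuous(breaks = seq(0, 150, 10), limits = c(0, 150)) +
  labs(x = "Genotype", y = "Yield (g/m2)") +
  theme(legend.position = "none")

# Save a 6 x 5 inch PNG (hypothetical file name, written to a temp folder)
out_file <- file.path(tempdir(), "yield_plot.png")
ggsave(out_file, plot = p, width = 6, height = 5, units = "in", dpi = 300)
```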

In this graph, genotypes D and E have higher yields than the other genotypes. Now I want to emphasize D and E by drawing a dotted box around them. To do this, we can use geom_rect().
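geom_rect() draws a rectangle between xmin and xmax (and between ymin and ymax), so the box needs numeric x positions even though the axis shows letters. On a discrete axis, ggplot2 puts the first category at x = 1, the second at x = 2, and so on, with each category occupying a slot 1 unit wide. This short sketch works out the box edges for D and E under that assumption, using the default order A to E; the names positions, xmin and xmax are only for illustration.

```r
# Discrete categories sit at x = 1, 2, 3, ... in level order
genotypes <- c("A", "B", "C", "D", "E")
positions <- setNames(seq_along(genotypes), genotypes)

# Pad by half a slot on each side so the box covers both bars completely
xmin <- positions[["D"]] - 0.5   # 4 - 0.5
xmax <- positions[["E"]] + 0.5   # 5 + 0.5
c(xmin = xmin, xmax = xmax)      # 3.5 and 5.5
```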

ggplot(data=dataA, aes(x=Genotype, y=Yield, fill=Genotype))+
  scale_fill_manual(values= c("azure4","darkolivegreen4","cadetblue","Dark red",
                               "Blue")) +
  geom_bar(stat="identity", position="dodge", width = 0.7, linewidth=1) +
  geom_errorbar(aes(ymin= Yield-se, ymax=Yield + se), position=position_dodge(0.7),
                width=0.2, color='Black') +
  # Box around the 4th (D) and 5th (E) genotypes, padded by 0.5 on each side
  geom_rect(aes(xmin= as.numeric(dataA$Genotype[[4]]) - 0.5,
                xmax= as.numeric(dataA$Genotype[[5]]) + 0.5), ymin= -1, ymax= 100, 
                fill= NA, color= "Dark green", linetype= 8, linewidth=1) +
  scale_y_continuous(breaks = seq(0,150,10), limits = c(0,150)) +
  labs(x="Genotype", y="Yield (g/m2)") +
  theme(legend.position="none",
        legend.title=element_blank(),
        legend.key.size=unit(0.5,'cm'),
        legend.key=element_rect(color=alpha("white",.05), 
                                fill=alpha("white",.05)),
        legend.text=element_text(size=11),
        legend.background= element_rect(fill=alpha("white",.05)),
        panel.grid.major=element_line(colour="grey90", linewidth=0.5),
        axis.line=element_line(linewidth=0.5, colour="black")) +
  windows(width=6, height=5)

geom_rect() needs numeric positions for the box edges, but the x-axis is discrete (genotypes A to E). ggplot2 places discrete categories at x = 1, 2, 3, and so on, so I use as.numeric() to convert the 4th genotype (D) to its position and subtract 0.5 to get the left edge. Similarly, I convert the 5th genotype (E) with as.numeric() and add 0.5 to get the right edge, so the box spans both bars completely (x = 3.5 to 5.5). However, running this code produces the following warnings:

Warning messages:
1: In FUN(X[[i]], …) : NAs introduced by coercion
2: In FUN(X[[i]], …) : NAs introduced by coercion
3: Removed 4 rows containing missing values (geom_rect).

Let’s examine the structure of our data.

str(dataA)

'data.frame':	5 obs. of  3 variables:
 $ Genotype: chr  "A" "B" "C" "D" ...
 $ Yield   : num  70 75 71 88 90
 $ se      : num  2 3 2 3 4

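As you can see, Genotype is stored as character (chr). Since R 4.0.0, data.frame() no longer converts strings to factors by default, so the genotypes stay as text. as.numeric() cannot turn a letter such as "D" into a number, so it returns NA, which is what the "NAs introduced by coercion" warnings report. To fix this, convert Genotype to a factor first. A factor stores each value as an integer code for its level, so as.numeric() on a factor returns the level positions (A = 1, B = 2, ..., E = 5).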
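The difference is easy to check in the console. In this minimal sketch (the vector names geno_chr and geno_fct are only illustrative), as.numeric() turns every letter into NA, which is what produced the coercion warnings above, whereas the factor returns its integer level codes.

```r
geno_chr <- c("A", "B", "C", "D", "E")   # character, as in the original dataA
geno_fct <- factor(geno_chr)             # levels A, B, C, D, E

# Letters cannot be parsed as numbers; suppressWarnings() hides the
# "NAs introduced by coercion" warning seen above
chr_num <- suppressWarnings(as.numeric(geno_chr))
chr_num   # NA NA NA NA NA

# A factor stores integer codes for its levels, so these are the x positions
fct_num <- as.numeric(geno_fct)
fct_num   # 1 2 3 4 5

as.numeric(geno_fct[[4]])   # 4, the position of genotype D
```

One caution: for a factor whose levels look like numbers (say "10" and "20"), as.numeric() still returns the codes 1, 2, ..., not the values. Here the codes are exactly what geom_rect() needs.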

Genotype= as.factor(c("A","B","C","D","E"))

As in the code below, define Genotype as a factor and then rebuild the data frame.

Genotype= as.factor(c("A","B","C","D","E"))   # factor instead of character
Yield= c(70, 75, 71, 88, 90)
se= c(2,3,2,3,4)
dataA= data.frame(Genotype,Yield,se)
str(dataA)

'data.frame':	5 obs. of  3 variables:
 $ Genotype: Factor w/ 5 levels "A","B","C","D",..: 1 2 3 4 5
 $ Yield   : num  70 75 71 88 90
 $ se      : num  2 3 2 3 4
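If dataA already exists, an alternative to rebuilding it is to convert the column in place. The sketch below does that with factor() and an explicit levels argument. Because the level order decides where each genotype sits on the x-axis, it also decides the numbers as.numeric() returns; the reversed factor rev_levels is a hypothetical example that only illustrates this.

```r
# Start from the original data frame, where Genotype is character
dataA <- data.frame(
  Genotype = c("A", "B", "C", "D", "E"),
  Yield    = c(70, 75, 71, 88, 90),
  se       = c(2, 3, 2, 3, 4)
)

# Convert the existing column; `levels` fixes the order explicitly
dataA$Genotype <- factor(dataA$Genotype, levels = c("A", "B", "C", "D", "E"))
str(dataA)

# Level order, not the alphabet, drives the codes: with reversed levels,
# genotype D would be plotted at x = 2 instead of x = 4
rev_levels <- factor(dataA$Genotype, levels = c("E", "D", "C", "B", "A"))
as.numeric(dataA$Genotype[[4]])   # 4
as.numeric(rev_levels[[4]])       # 2
```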

Genotype is now a factor. With this change in place, let’s run the same plotting code again.

Look at that! The bar graph now has a dotted box around genotypes D and E. What if we want to emphasize them with color instead of a box? In that case, we can set fill= alpha("Color", opacity) in geom_rect(), where opacity ranges from 0 (fully transparent) to 1 (fully opaque), and use linetype= 0 to hide the outline.

ggplot(data=dataA, aes(x=Genotype, y=Yield, fill=Genotype))+
  scale_fill_manual(values= c("azure4","darkolivegreen4","cadetblue","Dark red",
                               "Blue")) +
  geom_bar(stat="identity", position="dodge", width = 0.7, linewidth=1) +
  geom_errorbar(aes(ymin= Yield-se, ymax=Yield + se), position=position_dodge(0.7),
                width=0.2, color='Black') +
  # Translucent green fill over D and E; linetype= 0 (blank) removes the outline
  geom_rect(aes(xmin= as.numeric(dataA$Genotype[[4]]) - 0.5,
                xmax= as.numeric(dataA$Genotype[[5]]) + 0.5), ymin= -1, ymax= 100, 
                fill= alpha("Green",.05), color= "Dark green", linetype= 0, linewidth=1) +
  scale_y_continuous(breaks = seq(0,150,10), limits = c(0,150)) +
  labs(x="Genotype", y="Yield (g/m2)") +
  theme(legend.position="none",
        legend.title=element_blank(),
        legend.key.size=unit(0.5,'cm'),
        legend.key=element_rect(color=alpha("white",.05), 
                                fill=alpha("white",.05)),
        legend.text=element_text(size=11),
        legend.background= element_rect(fill=alpha("white",.05)),
        panel.grid.major=element_line(colour="grey90", linewidth=0.5),
        axis.line=element_line(linewidth=0.5, colour="black")) +
  windows(width=6, height=5)
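One side effect is worth knowing about: geom_rect() inherits dataA from ggplot(), so it draws one rectangle per row of the data, which here means five identical rectangles stacked on top of each other. With fill= alpha("Green",.05), the stack therefore looks darker than a single 5%-opaque layer would, because n overlapping layers of opacity a combine to 1 - (1 - a)^n. The sketch below checks the row count with ggplot2's layer_data() on a cut-down version of the plot; the object names p, n_rects and eff_alpha are hypothetical.

```r
library(ggplot2)

dataA <- data.frame(
  Genotype = factor(c("A", "B", "C", "D", "E")),
  Yield    = c(70, 75, 71, 88, 90),
  se       = c(2, 3, 2, 3, 4)
)

# Cut-down plot: bars plus the shaded geom_rect() layer from above
p <- ggplot(data = dataA, aes(x = Genotype, y = Yield, fill = Genotype)) +
  geom_col(width = 0.7) +
  geom_rect(aes(xmin = as.numeric(dataA$Genotype[[4]]) - 0.5,
                xmax = as.numeric(dataA$Genotype[[5]]) + 0.5),
            ymin = -1, ymax = 100, fill = alpha("green", 0.05), linetype = 0)

# Layer 2 is geom_rect(): one rectangle per row of dataA
n_rects <- nrow(layer_data(p, 2))
n_rects   # 5

# Combined opacity of n_rects overlapping layers at alpha = 0.05
eff_alpha <- 1 - (1 - 0.05)^n_rects
round(eff_alpha, 3)   # 0.226
```

If you want exactly one rectangle, annotate("rect", ...) adds a single layer that does not depend on the rows of dataA.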

If you want to shade other genotypes as well, add another geom_rect() layer for each region. In the code below, a second geom_rect() shades genotypes A to C in red.

ggplot(data=dataA, aes(x=Genotype, y=Yield, fill=Genotype))+
  scale_fill_manual(values= c("azure4","darkolivegreen4","cadetblue","Dark red",
                               "Blue")) +
  geom_bar(stat="identity", position="dodge", width = 0.7, linewidth=1) +
  geom_errorbar(aes(ymin= Yield-se, ymax=Yield + se), position=position_dodge(0.7),
                width=0.2, color='Black') +
  # Green fill over D and E
  geom_rect(aes(xmin= as.numeric(dataA$Genotype[[4]]) - 0.5,
                xmax= as.numeric(dataA$Genotype[[5]]) + 0.5), ymin= -1, ymax= 100, 
                fill= alpha("Green",.05), color= "Dark green", linetype= 0, linewidth=1) +
  # Red fill over A to C (1st to 3rd genotypes)
  geom_rect(aes(xmin= as.numeric(dataA$Genotype[[1]]) - 0.5,
                xmax= as.numeric(dataA$Genotype[[3]]) + 0.5), ymin= -1, ymax= 100, 
                fill= alpha("Red",.05), color= "Dark green", linetype= 0, linewidth=1) +
  scale_y_continuous(breaks = seq(0,150,10), limits = c(0,150)) +
  labs(x="Genotype", y="Yield (g/m2)") +
  theme(legend.position="none",
        legend.title=element_blank(),
        legend.key.size=unit(0.5,'cm'),
        legend.key=element_rect(color=alpha("white",.05), 
                                fill=alpha("white",.05)),
        legend.text=element_text(size=11),
        legend.background= element_rect(fill=alpha("white",.05)),
        panel.grid.major=element_line(colour="grey90", linewidth=0.5),
        axis.line=element_line(linewidth=0.5, colour="black")) +
  windows(width=6, height=5)
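As an alternative to geom_rect(), ggplot2's annotate("rect", ...) adds exactly one rectangle per call and does not inherit dataA, so it also works while Genotype is still character. The sketch below, under those assumptions, reproduces the red and green shading plus a dotted outline around D and E with hard-coded edges (0.5 to 3.5 for A to C, 3.5 to 5.5 for D and E). The rectangles start at ymin = 0 rather than -1 because annotate() positions pass through the y scale, and a value outside limits = c(0, 150) would be dropped.

```r
library(ggplot2)

# Genotype can stay character here: annotate() uses numeric positions directly
dataA <- data.frame(
  Genotype = c("A", "B", "C", "D", "E"),
  Yield    = c(70, 75, 71, 88, 90),
  se       = c(2, 3, 2, 3, 4)
)

p <- ggplot(dataA, aes(x = Genotype, y = Yield, fill = Genotype)) +
  geom_col(width = 0.7) +
  geom_errorbar(aes(ymin = Yield - se, ymax = Yield + se), width = 0.2) +
  # Translucent backgrounds (alpha is opacity: 0 = transparent, 1 = opaque)
  annotate("rect", xmin = 0.5, xmax = 3.5, ymin = 0, ymax = 100,
           fill = "red", alpha = 0.2) +
  annotate("rect", xmin = 3.5, xmax = 5.5, ymin = 0, ymax = 100,
           fill = "green", alpha = 0.2) +
  # Dotted outline around D and E, with no fill
  annotate("rect", xmin = 3.5, xmax = 5.5, ymin = 0, ymax = 100,
           fill = NA, colour = "darkgreen", linetype = "dotted", linewidth = 1) +
  scale_fill_manual(values = c("azure4", "darkolivegreen4", "cadetblue",
                               "darkred", "blue")) +
  scale_y_continuous(breaks = seq(0, 150, 10), limits = c(0, 150)) +
  labs(x = "Genotype", y = "Yield (g/m2)") +
  theme(legend.position = "none")

p
```

The trade-off is that the edges are typed by hand, so they must be updated if the order or number of genotypes changes.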

