Defended to the Nines: 25 years of Resistance Gene Cloning Identifies Nine Mechanisms for R Protein Function. The R code and some additional graphs

Background

We recently published a review on the molecular mechanisms underlying the function of R genes in The Plant Cell. The graphs in the first figure were made in R. Here I will show the R code used to generate these graphs, and a couple of additional graphs that can be made with the supplemental dataset. The dataset used to make the graphs is Supplemental dataset 1.

This code is probably not particularly elegant, but it does the job. Feel free to play around with the data and generate more graphs!

Data

The code requires the tidyverse and readxl from the tidyverse to be loaded.

# Load in the required libraries
library(readxl)
library(tidyverse)

# Load in the supplemental data from The Plant Cell paper (make sure to remove the first row containing the article header)
Data <- read_excel("data/TPC2017-REV-00579R2_Supplemental_Data_Set_1.xlsx", sheet = 1, na = "-")

A little bit about the columns:

Table 1: explanation of some of the columns
Column Description Comment
Kingdom Pathogen type For the review the category auto-immunity and DAMP were dropped
Host Host species in which R gene is cloned For the review several species were grouped
DateReceptor Date R gene was cloned For the review one date was chosen in case of multiple components; Column Timeline
DateCoreceptor Date guard/decoy was cloned For the review one date was chosen in case of multiple components; Column Timeline
Status Whether or not the cloned genes have been shown to be involved in disease resistance For the review only the category Confirmed was kept, and the category ETIP was counted as loss-of-susceptibility
Proposed mechanism Proposed mechanism Not always the same mechanism as used in the review, more speculative
Review Mechanism used in the review Mechanism used in the review, duplicates were removed and only confirmed R genes were kept
Timeline The date the R gene/receptor or the guard/decoy was cloned was used The first date was chosen, duplicates were removed and only confirmed R genes were kept

Unpublished figures

There is a lot more that can be done with the supplemental dataset. For example is these two graphs that did not make the paper.

Host vs pathogen

Again we grouped the same species together as before, and then plot the type of pathogen against which a cloned R gene is present against the host plant. From this data we can see either research bias for certain pathosystems, or a reflection of actual pathogen pressure (or both).

# Create a new dataframe
Species_Kingdom <- Data %>%
  filter(!is.na(Review)) %>% # Remove everything that is duplicated
  group_by(Host, Kingdom) %>%
  summarize(counts = n()) 

# Rename the species to reduce the number of categories
Species_Kingdom$Host[which(Species_Kingdom$Host == "Nicotiana benthamiana")] <- "Nicotiana"
Species_Kingdom$Host[which(Species_Kingdom$Host == "Nicotiana glutinosa")] <- "Nicotiana"
Species_Kingdom$Host[which(Species_Kingdom$Host == "Tobacco")] <- "Nicotiana"
Species_Kingdom$Host[which(Species_Kingdom$Host == "Eggplant")] <- "Other"
Species_Kingdom$Host[which(Species_Kingdom$Host == "Solanum americanum")] <- "Other"
Species_Kingdom$Host[which(Species_Kingdom$Host == "Cabbage")] <- "Other"
Species_Kingdom$Host[which(Species_Kingdom$Host == "Oilseed rape")] <- "Other"
Species_Kingdom$Host[which(Species_Kingdom$Host == "Pigeonpea")] <- "Fabaceae"
Species_Kingdom$Host[which(Species_Kingdom$Host == "Pea")] <- "Fabaceae"
Species_Kingdom$Host[which(Species_Kingdom$Host == "Soybean")] <- "Fabaceae"
Species_Kingdom$Host[which(Species_Kingdom$Host == "Medicago truncatula")] <- "Fabaceae"
Species_Kingdom$Host[which(Species_Kingdom$Host == "French Bean")] <- "Fabaceae"
Species_Kingdom$Host[which(Species_Kingdom$Host == "Cowpea")] <- "Fabaceae"
Species_Kingdom$Host[which(Species_Kingdom$Host == "Apple")] <- "Other"
Species_Kingdom$Host[which(Species_Kingdom$Host == "Grapevine")] <- "Other"
Species_Kingdom$Host[which(Species_Kingdom$Host == "Hop")] <- "Other"
Species_Kingdom$Host[which(Species_Kingdom$Host == "Myrobalan Plum")] <- "Other"
Species_Kingdom$Host[which(Species_Kingdom$Host == "Black Pepper")] <- "Other"
Species_Kingdom$Host[which(Species_Kingdom$Host == "Norway spruce")] <- "Other"
Species_Kingdom$Host[which(Species_Kingdom$Host == "Sorghum")] <- "Other"

# Order the pathogen types
Species_Kingdom <- transform(Species_Kingdom, 
                      Kingdom  = factor(
                        Kingdom ,
                        levels=c("Viral", "Bacteria", "Oomycete", "Plant", "Fungi", "Nematode", "Insect", "Other"),
                        ordered =TRUE))

#Plot the data
ggplot(data = Species_Kingdom, aes(x = Host, y = counts, fill = Kingdom)) + 
  geom_bar(stat = "identity") +
  ggtitle("R genes / host / pathogen-type") +
  scale_x_discrete(limits = c("Arabidopsis", "Fabaceae", "Flax", "Melon", "Lettuce", "Sugar beet", "Nicotiana", "Pepper", "Potato","Tomato", "Barley", "Maize", "Rice", "Wheat", "Other")) + 
  scale_fill_manual(values = c("#1B9E77", "#D95F02", "#7570B3", "#E7298A", "#66A61E", "#E6AB02", "#A6761D", "#666666")) + 
  scale_y_continuous(breaks = c(0, 10, 20, 30, 40, 50, 60)) +
  guides(fill = guide_legend(title = NULL)) + 
  ylab("Cloned R genes") + 
  theme(panel.background = element_rect(fill = NA),
        panel.grid.major = element_line(colour = "#BFBFBF", size = 0.8),
        panel.grid.major.x = element_line(colour = NA),
        axis.line.y = element_line(color = "#BFBFBF", size = 0.8),
        axis.ticks.y = element_line(color = "#BFBFBF", size = 0.8),
        axis.ticks.x = element_line(color = "#BFBFBF", size = 0.8),
        axis.text.x = element_text(angle = 60, hjust = 1, size = 12),
        axis.title.x=element_blank())

Host vs pathogen timeline

Likewise, in this timeline of cloned R genes against different types of pathogens we may see a reflection of which pathogens have been important to study, or a research bias.

# Create a new dataframe
data.count <- count(Data, Timeline, Kingdom)

year <- c(1992:2017)
kingdom <- unique(data.count$Kingdom)[1:8]
year.list <- c()
kingdom.list <- c()
n.list <- c()


for (y in year){
  for (k in kingdom){
    year.list = c(year.list, y)
    kingdom.list = c(kingdom.list, k)
    if (length(data.count$n[which(data.count$Timeline==y&data.count$Kingdom==k)])==0){
      n.list = c(n.list,0)
    } else{
      n.list = c(n.list, data.count$n[which(data.count$Timeline==y&data.count$Kingdom==k)])
    }
    
  }
}

TimelineKingdom <- data.frame(Timeline = year.list, Kingdom = kingdom.list, count = n.list)

rm(data.count)
rm(n.list)
rm(k)
rm(kingdom)
rm(kingdom.list)
rm(y)
rm(year)
rm(year.list)

TimelineKingdom <- TimelineKingdom %>% 
  group_by(Kingdom) %>%
  mutate(running_total = cumsum(count))

# Order the pathogen types
TimelineKingdom <- transform(TimelineKingdom, 
                    Kingdom  = factor(
                      Kingdom ,
                      levels = c("Viral", "Bacteria", "Oomycete", "Plant", "Fungi", "Nematode", "Insect", "Other"),
                      ordered = TRUE))

#Plot the data
ggplot(data = TimelineKingdom, aes(x = Timeline, y = running_total, fill = Kingdom)) + 
  geom_area(position = "stack") +
  ggtitle("Timeline of cloned R genes") + 
  scale_x_continuous(breaks = c(1995, 2000, 2005, 2010, 2015), 
                     labels = c("1995", "2000", "2005", "2010", "2015")) + 
  guides(fill = guide_legend(title = NULL)) + 
  scale_fill_manual(values = c("#1B9E77", "#D95F02", "#7570B3", "#E7298A", "#66A61E", "#E6AB02", "#A6761D", "#666666")) + 
  ylab("Cloned R genes") + 
  scale_y_continuous(breaks = c(0, 50, 100, 150, 200, 250, 300)) + 
  theme(panel.background = element_rect(fill = NA),
        panel.grid.major = element_line(colour = "#BFBFBF", size = 0.8),
        panel.grid.major.x = element_line(colour = NA),
        axis.line.y = element_line(color = "#BFBFBF", size = 0.8),
        axis.ticks.y = element_line(color = "#BFBFBF", size = 0.8),
        axis.ticks.x = element_line(color = "#BFBFBF", size = 0.8),
        axis.text.x = element_text(size = 12),
        axis.title.x=element_blank())