This package provides a simple way to make positionally accurate plots of gene models, similar to those found on The Arabidopsis Information Resource (TAIR). These plots are suitable for presentations and publications where showing gene models helps to communicate genetic research.
To install the package (install_bitbucket is provided by the devtools package):
library(devtools)
install_bitbucket("greymonroe/genemodel")
library(genemodel)
Now let's look at an example gene that we are going to model. This gene is AT5G62640. The original gene model information can be found here: http://www.arabidopsis.org/servlets/TairObject?type=gene&id=1000654517. It is stored in the Gene Feature table on that page.
…
Once this table is extracted and saved as a .csv or .txt file, it can be loaded into R as a data.frame object. The genemodel package already includes this data, which can be loaded like this:
data("AT5G62640")
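For a table that you have extracted yourself, a minimal sketch of the loading step might look like the following. The file contents are a small hypothetical excerpt written to a temporary file for illustration, and the object name my_gene is an assumption, not something the package defines.

```r
# Hypothetical example: a two-column feature table saved as a .csv file.
# The rows below are a short excerpt typed in for illustration only.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "type,coordinates",
  "ORF,191-2958",
  "5' utr,1-190",
  "coding_region,191-271"
), csv_path)

# Read both columns as character so values such as "1-190" stay intact.
# read.csv only treats double quotes as quoting characters, so the
# apostrophe in "5' utr" is read literally.
my_gene <- read.csv(csv_path, stringsAsFactors = FALSE,
                    colClasses = "character")
str(my_gene)
```

If the table is read with read.table instead (for example from a tab-separated .txt file), note that its default quote argument includes the single quote, so passing quote = "" or quote = "\"" avoids problems with feature names such as 5' utr.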
When we look at the structure of the AT5G62640 data.frame,
head(AT5G62640, 15)
##             type coordinates
## 1            ORF    191-2958
## 2         5' utr       1-190
## 3  coding_region     191-271
## 4  coding_region     551-625
## 5  coding_region     689-782
## 6  coding_region    959-1029
## 7  coding_region   1155-1210
## 8  coding_region   1321-1372
## 9  coding_region   1449-1530
## 10 coding_region   1631-2004
## 11 coding_region   2124-2633
## 12 coding_region   2731-2958
## 13          exon       1-271
## 14        intron     272-550
## 15          exon     551-625
we see that it is a two-column data.frame: the first column, “type,” contains the name of the feature type, and the second, “coordinates,” specifies the coordinate positions within the gene model that the feature occupies.
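Because the coordinates are stored as "start-end" strings, it can be handy to split them into numbers when checking a table by hand. The following is a minimal base R sketch on a small excerpt of the table; the column names from, to and length are assumptions introduced for illustration, and genemodel itself does not require this step.

```r
# Small excerpt of the feature table shown above.
features <- data.frame(
  type = c("5' utr", "coding_region", "intron"),
  coordinates = c("1-190", "191-271", "272-550"),
  stringsAsFactors = FALSE
)

# Split each "start-end" string into its two parts.
parts <- strsplit(features$coordinates, "-", fixed = TRUE)
features$from <- as.numeric(vapply(parts, `[`, character(1), 1))
features$to   <- as.numeric(vapply(parts, `[`, character(1), 2))

# Feature length in base pairs, counting both end positions.
features$length <- features$to - features$from + 1
features
```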
The feature types that TAIR provides in the Gene Feature table for gene models include those used in this document: ORF, 5' utr, 3' utr, coding_region, exon and intron.
Before we can plot the gene, we need to extract some other information from the TAIR gene model description. First, we need the start and stop base pair positions for the gene. These can be found in the Map Locations section of the TAIR gene model page and correspond to the Coordinates of Map Type ‘nuc_sequence’. Next, we need the direction of transcription, which can also be found in the Map Locations section under Orientation. Again, we want the value corresponding to the Map Type that equals ‘nuc_sequence’. See image below from TAIR.
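To make the relationship between the within-gene coordinates and the chromosome positions concrete, here is a minimal sketch using the start and stop values for AT5G62640. The helper to_genomic is hypothetical and not part of genemodel, and the mapping it uses (positions counted from the stop coordinate for a reverse-oriented gene) is an assumed convention rather than the package's documented behaviour; the orientation value "forward" is likewise an assumption.

```r
# Start and stop positions taken from the Map Locations section.
gene_start <- 25149433
gene_stop  <- 25152541

# Length of the gene in base pairs, counting both end positions.
gene_length <- gene_stop - gene_start + 1

# Hypothetical helper (not part of genemodel): convert a position within
# the gene model to a chromosome position under the assumed convention.
to_genomic <- function(pos, start, bpstop,
                       orientation = c("forward", "reverse")) {
  orientation <- match.arg(orientation)
  if (orientation == "forward") {
    start + pos - 1
  } else {
    bpstop - pos + 1
  }
}

# Chromosome positions of gene-model positions 1 and 191 (start of the ORF).
to_genomic(c(1, 191), gene_start, gene_stop, "reverse")
```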
We now have the information necessary to plot the gene with genemodel.plot:
genemodel.plot(model=AT5G62640, start=25149433, bpstop=25152541, orientation="reverse", xaxis=TRUE)
genemodel.plot automatically recognizes the types of gene features found in TAIR gene models and plots them in accurate positions and orientation. By default, UTRs are colored light blue, exons are colored dark blue, and introns are indicated by the bent line. The direction of transcription is also marked, in a way consistent with TAIR notation, by the pointed end of the gene model. Because it plots only UTRs, coding regions and introns, genemodel.plot ignores the ‘ORF’ and ‘exon’ feature types, as they are redundant.
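To see which rows of a table actually contribute to the drawing, the redundant feature types can be filtered out by hand. This is only an illustrative base R sketch on a short excerpt of the table, not the package's internal code; the object names model and drawn are assumptions.

```r
# Short excerpt of a TAIR-style feature table.
model <- data.frame(
  type = c("ORF", "5' utr", "coding_region", "exon", "intron"),
  coordinates = c("191-2958", "1-190", "191-271", "1-271", "272-550"),
  stringsAsFactors = FALSE
)

# Keep only the rows whose type is not one of the redundant ones.
drawn <- model[!model$type %in% c("ORF", "exon"), ]
drawn$type
```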
With a little creativity, it is easy to imagine using genemodel to plot such things as alternative splicing. For example, an exon and its neighboring introns can be removed and replaced by a single intron to create a plot showing a different splice variant.
# Full model: three coding regions separated by two introns
spl1 <- data.frame(
  type=c("5' utr", "coding_region", "intron", "coding_region", "intron", "coding_region", "3' utr"),
  coordinates=c("1-50", "50-100", "100-150", "150-200", "200-250", "250-300", "300-350"))
# Splice variant: the middle coding region and its two introns become one intron
spl2 <- data.frame(
  type=c("5' utr", "coding_region", "intron", "coding_region", "3' utr"),
  coordinates=c("1-50", "50-100", "100-250", "250-300", "300-350"))
# Stack the two plots vertically
par(mfrow=c(2,1))
genemodel.plot(model=spl1, start=1, bpstop=350, orientation="reverse", xaxis=TRUE)
genemodel.plot(model=spl2, start=1, bpstop=350, orientation="reverse", xaxis=FALSE)
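Rather than typing the second table by hand, the splice variant can also be derived from the full model. The sketch below does this in base R; skip_exon is a hypothetical helper written for this example, not a genemodel function, and it assumes the skipped coding region has an intron on each side and is followed by at least one more feature.

```r
spl1 <- data.frame(
  type = c("5' utr", "coding_region", "intron", "coding_region",
           "intron", "coding_region", "3' utr"),
  coordinates = c("1-50", "50-100", "100-150", "150-200",
                  "200-250", "250-300", "300-350"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: remove the coding region in row `exon_row` together
# with its two flanking introns and insert one intron spanning all three.
skip_exon <- function(model, exon_row) {
  rows <- (exon_row - 1):(exon_row + 1)
  stopifnot(exon_row > 1, exon_row + 1 < nrow(model),
            identical(model$type[rows],
                      c("intron", "coding_region", "intron")))
  # Start of the upstream intron and end of the downstream intron.
  from <- strsplit(model$coordinates[exon_row - 1], "-", fixed = TRUE)[[1]][1]
  to   <- strsplit(model$coordinates[exon_row + 1], "-", fixed = TRUE)[[1]][2]
  merged <- data.frame(type = "intron",
                       coordinates = paste(from, to, sep = "-"),
                       stringsAsFactors = FALSE)
  out <- rbind(model[seq_len(exon_row - 2), ], merged,
               model[(exon_row + 2):nrow(model), ])
  rownames(out) <- NULL
  out
}

# Skip the middle coding region (row 4 of spl1).
spl2 <- skip_exon(spl1, 4)
spl2
```

The resulting table can be passed to genemodel.plot in the same way as the hand-written one.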
The next function we will look at is mutation.plot, which plots mutations at the correct positions on an already plotted gene model.
# Draw the gene model first, then add each mutation at its chromosome position
genemodel.plot(model=AT5G62640, start=25149433, bpstop=25152541, orientation="reverse", xaxis=TRUE)
mutation.plot(25150214, 25150214, text="P->S", col="black", drop=-.15, haplotypes=c("red", "blue"))
mutation.plot(25150659, 25150659, text="V->S", col="black", drop=-.15, haplotypes=c("red"))
# A larger drop keeps this label clear of the nearby mutation at 25150659
mutation.plot(25150639, 25150639, text="L->P", col="black", drop=-.35, haplotypes=c("blue"))
mutation.plot adds mutations to a preexisting gene model plot. In this example, amino acid substitutions are shown at their exact positions. The colored dots correspond to the haplotype groups that carry each mutation. The drop parameter can be used to offset the position of closely spaced mutations for easier visualization.
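When there are many mutations, choosing drop values by hand becomes tedious. One possible approach, sketched below in base R, is to lower the drop for any mutation that lies close to the previous one. The helper stagger_drops and its defaults (min_gap, base, step) are assumptions made for this illustration and are not part of genemodel.

```r
# Hypothetical helper: give each mutation a drop value, stepping further
# down whenever it lies within `min_gap` base pairs of the previous one.
stagger_drops <- function(positions, min_gap = 100,
                          base = -0.15, step = -0.20) {
  ord <- order(positions)
  drops <- numeric(length(positions))
  level <- 0
  for (k in seq_along(ord)) {
    i <- ord[k]
    if (k > 1 && positions[i] - positions[ord[k - 1]] < min_gap) {
      level <- level + 1   # too close to the previous mutation
    } else {
      level <- 0           # far enough away, return to the base drop
    }
    drops[i] <- base + level * step
  }
  drops
}

# Positions of the three mutations plotted above.
pos <- c(25150214, 25150659, 25150639)
drops <- stagger_drops(pos)
drops
```

Each value in drops could then be passed as the drop argument of the corresponding mutation.plot call.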