The effectR
package is an R package designed to call oomycete RxLR and CRN effectors by searching for the motifs of interest using regular expression searches and hidden markov models (HMM).
The effectR
packages searches for the motifs of interest (RxLR-EER motif for RxLR effectors and LFLAK motif for CRN effectors) using a regular expression search (REGEX
). These motifs used by the REGEX effectR
search have been reported in the literature (Haas et al., 2009, Stam et al., 2004).
The effectR
package aligns the REGEX search results using MAFFT
, and builds a HMM profile based on the multiple sequence alignment result using the hmmbuild
program from HMMER
. The HMM profile is used to search across ORF of the genome of interest using the hmmsearch
binary from HMMER
. The search step will retain sequences with significant hits to the profile of interest. effectR
also combines the redundant sequences found in both REGEX and HMM searches into a single dataset that can be easily exported. In addition, effectR
reads and returns the HMM profile to the user and allows for the creation of a MOTIF logo-like plot using ggplot2
.
The effectR
package uses MAFFT
and HMMER3
to perform the hidden markov model seach across the results from the REGEX step. These two packages should be installed before running any of the effectR
functions.
MAFFT is a multiple sequence alignment program that uses Fourier-transform algorithms to align multiple sequences. We recommend downloading and installing MAFFT by following the instructions and steps in the MAFFT installation web site.
Make sure that you remember the directory in which MAFFT
is installed, of if the installation is sucessful, make sure to obtain the path via bash/tsh/console:
which mafft
/usr/local/bin/mafft
For more information about MAFFT go to the MAFFT website: http://mafft.cbrc.jp/
MAFFT comes in two main distributions for windows:
Please, download and install the all-in-one version. We recommend that you download and save MAFFT in your Desktop, as it will make yyour path easily accesible.
HMMER is used for searching sequence databases for sequence homologs. It uses hidden Markov models (profile HMMs) to search for sequences with hits to similar patterns than the profile. We use three main HMMER tools:
hmmbuild
to create the HMM database from the sequences ontained in the REGEX step of effectR
hmmpress
converts the HMM database into a format usable by other HMMER
programshmmsearch
to excecute the HMM search in our sequence queries basde on the HMM profileThe effectR
package requires all of these tools. A correct HMMER
installation will install all three programs.
We recommend downloading and installing HMMER by following the instructions and steps in the HMMER installation web site. Make sure that you remember the directory in which HMMER
is installed, of if the installation is sucessful, make sure to obtain the path via bash/tsh/console:
which hmmbuild
which hmmpress
which hmmsearch
/usr/local/bin/hmmbuild
/usr/local/bin/hmmpress
/usr/local/bin/hmmsearch
For more information about HMMER go to the HMMER website: http://hmmer.org/
To use the effectR
package in Windows, the user must download the Windows binaries of HMMER. effectR
will not work with any other version of HMMER.
The effectR
package is designed to work with amino acid sequences in FASTA format representing the six-frame translation of every open reading frame (ORF) of an oomycete genome. Using the six-frame translation of all ORF’s in a genome is recommended in order to obtain as many effectors as possible from a proteome. To obtain the ORF for a genome, we recommend the use of EMBOSS’ getorf
.
effectR
uses a list of sequences of the class SeqFastadna
in order to perform the effector searches. The function read.fasta
from the seqinr
package reads the FASTA amino acid file into R, creating a list of SeqFastadna
objects that represent each of the translated ORF’s from the original FASTA file. We will begin our example using a subset of translated ORF’s from the P. infestans genome sequenced by Haas et al., (2009):
library(effectR)
pkg <- "effectR"
fasta.file <- system.file("extdata", "test_infestans.fasta", package = pkg)
library(seqinr)
ORF <- read.fasta(fasta.file)
head(ORF, n = 2)
## $PITG_23132_PexRD36_RxLR
## [1] "m" "r" "a" "n" "v" "i" "v" "v" "a" "l" "a" "l" "y" "i" "a" "c" "a"
## [18] "n" "v" "t" "l" "a" "a" "c" "a" "k" "s" "r" "h" "l" "r" "v" "n" "g"
## [35] "k" "d" "a" "l" "w" "n" "y" "d" "t" "s" "g" "g" "i" "n" "s" "i" "v"
## [52] "a" "d" "d" "e" "e" "r" "v" "v" "s" "f" "s" "g" "i" "k" "r" "w" "l"
## [69] "k" "e" "l" "f" "k" "n" "w" "s"
## attr(,"name")
## [1] "PITG_23132_PexRD36_RxLR"
## attr(,"Annot")
## [1] ">PITG_23132_PexRD36_RxLR"
## attr(,"class")
## [1] "SeqFastadna"
##
## $PITG_15287_PexRD1_RxLR
## [1] "m" "r" "a" "c" "n" "t" "l" "l" "p" "t" "a" "i" "v" "l" "t" "s" "c"
## [18] "d" "a" "l" "s" "a" "h" "r" "a" "q" "i" "m" "n" "v" "a" "t" "s" "d"
## [35] "l" "i" "s" "p" "i" "e" "s" "t" "v" "q" "d" "d" "n" "y" "d" "r" "q"
## [52] "l" "r" "g" "f" "y" "a" "t" "e" "n" "t" "d" "p" "v" "n" "n" "q" "d"
## [69] "t" "a" "h" "e" "d" "g" "e" "e" "r" "v" "n" "v" "a" "t" "v" "l" "g"
## [86] "k" "g" "d" "e" "a" "w" "d" "d" "a" "l" "m" "r" "l" "a" "y" "q" "h"
## [103] "w" "f" "d" "g" "g" "k" "t" "s" "d" "g" "m" "r" "l" "i" "m" "d" "l"
## [120] "p" "a" "k" "g" "e" "a" "l" "r" "h" "p" "n" "w" "g" "k" "y" "i" "k"
## [137] "y" "l" "e" "f" "v" "k" "e" "k" "k" "k" "e" "a" "a" "d" "a" "a" "a"
## [154] "v" "a" "a" "l" "k" "r" "r" "r" "t" "y" "r" "g" "w" "y" "v" "d" "g"
## [171] "k" "t" "e" "k" "d" "v" "r" "k" "i" "f" "g" "l" "p" "a" "t" "g" "k"
## [188] "a" "k" "n" "h" "p" "n" "w" "a" "d" "f" "q" "e" "y" "l" "n" "v" "v"
## [205] "r" "e" "y" "s" "k" "v" "v" "f" "k"
## attr(,"name")
## [1] "PITG_15287_PexRD1_RxLR"
## attr(,"Annot")
## [1] ">PITG_15287_PexRD1_RxLR"
## attr(,"class")
## [1] "SeqFastadna"
We have created a ORF
object that includes the list of translated ORF’s from the subset of XX ORF’s from the P. infestans genome. For more information on the SeqFastadna
objects please read the seqinr manual
.
To perform the effector search, effectR
searches for the motifs of interest found in RxLR and CRN motifs. We have created the function regex.search
to perform the seach of the motif of interest. The function regex.search
requires the list of SeqFastadna
objects and the gene family of interest. Here we show an example to search for sequences with RxLR-EER motifs from the 27 ORF subset of P. infestans. This ORF example data set contains 17 sequences with RxLR-EER motifs and 27 sequences with the LFLAK motifs found in CRN effectors. We expect to find, then, 17 sequences after using the regex.search
function with the motif='RxLR'
parameter:
REGEX <- regex.search(sequence = ORF, motif = "RxLR")
length(REGEX)
## [1] 15
We observe that the REGEX
object has 27 sequences with the RxLR motif. These sequences will be aligned using MAFFT
, and used to build a HMM profile to search for similar sequences.
In addition to the basic functionality of regex.search
to obtain both RxLR and CRN candidate genes, we have added the possibility of using a custom regex motif in order to search for non-canonical effectors or other protein motifs from different families of interest. The option motif = "custom"
is couple with the reg.pat
option, wich allows for the inclusion of any regular expression in the format specified by the regex
function in R. For example, if we want to obtain all candidates with a WV motif in the positions 60 to 90 of the aminoacid, we use the following commands:
reg.pat <- "^\\w{50,60}[w,v]"
REGEX <- regex.search(sequence = ORF, motif = "custom", reg.pat = reg.pat)
length(REGEX)
## [1] 17
To perform the HMM search and obtain all possible effector candidates from a proteome, effectR
uses the REGEX
results as a template to create a HMM profile and perform a search across the proteome of interest. We have created the hmm.search
function in order to perfomr this search. The hmm.search
function requires a local installation of MAFFT
and HMMER
in order to perform the searches. The absolute paths of the binaries must be specified in the mafft.path
and hmmer.path
options of the hmm.search
function.
Note for Windows users: Please use the ABSOLUTE PATH for HMMER and MAFFT or effectR will not work (e.g.
mafft.path ="C:/User/Banana/Desktop/mafft/"
)
In addition, the hmm.function
requires the path of the original FASTA file containing the translated ORF’s in the original.seq
parameter of the function. hmm.search
will use this file as a query in the hmmsearch
software from HMMER, and search for all sequences with hits against the HMM profile created with the REGEX results.
We will continue or example by performing a hmm.search
in our example data set. We will include the original example FASTA file location (stored in the fasta.file
object), the location of the MAFFT
binary and the location of the HMMER binaries:
candidate.rxlr <- hmm.search(original.seq = fasta.file, regex.seq = REGEX, mafft.path = "/usr/local/bin/", hmm.path = "/usr/local/bin/")
The hmm.search
function has resulted in 19 effector candidates. As a reminder, we used the REGEX
results of an RxLR motif search, so we can consider this hmm.search
results as RxLR candidate effectors. To obtain the CRN candidate effectors we should go back to the regex.search
step and modify the motif
parameter to motif="CRN
and perform the hmm.search
again.
The hmm.search
object returns a list of 3 elements:
SeqFastadna
classSeqFastadna
classhmmbuild
as a data frameWe can access each one of these elements by using the $
operator in the object obtained from hmm.search
:
head(candidate.rxlr$REGEX, n = 2)
head(candidate.rxlr$HMM, n = 2)
head(candidate.rxlr$HMM_Table)
We have included each of these elements to provide the user with the most complete information possible from each of the steps performed until here.
The user can extract all of the non-redundant sequences and a summary table with the information about the motifs using the effector.summary
function. This function uses the results from either hmm.seach
or regex.search
functions to generate a table that includes the name of the candidate effector sequence, the number of motifs of interest (RxLR-EER or LFLAK-HVLV) per sequence and its location within the sequence. In addition, when the effector.summary
function is used in an object that contains the results of hmm.search
, the user will obtain a list of the non-reduntant sequences. If the user provides the results from regex.search
, the function will return the motif summary table.
We will use the effector.summary
function with our hmm.search
results (the candidate.rxlr
object):
summary.list <- effector.summary(candidate.rxlr)
knitr::kable(summary.list$motif.table)
The motif table has a column called MOTIF. This column summarizes the candidate ORF into one of 4 categories:
head(summary.list$motif.table, n = 2)
length(summary.list$consensus.sequences)
The summary.list$consensus.sequences
has all 27 RxLR candidate genes found in our searches.
To export the non-redundant effector candidates that resulted from the hmm.search
or regex.search
functions, we use the write.fasta
function of the seqinr
package. We recomend the users to read the documentation of the seqinr
package Since the objects that result from the hmm.search
or regex.search
function are of the SeqFastadna
class, we can use any of the function of the seqinr
package that use this class as well.
To save the results from our example file, we would use the following command:
write.fasta(sequences = getSequence(summary.list$consensus.sequences), names = getName(summary.list$consensus.sequences), file.out = "RxLR_candidates.fasta")
To determine if the HMM profile includes the motifs of interest, we have created the function hmm.logo
. The function hmm.logo
reads the HMM profile (obtained from the hmm.search
step) and uses ggplot2
to create a bar-plot. The bar-plot will illustrate the bits (amino acid scores) of each amino acid used to construct the HMM profile according to its consensus position in the HMM profile. To learn more about sequence logo plots visit this wikipedia article.
The hmm.logo
is a wrapper that parses the HMM profile table and plots the parsed table results in ggplot2
.
To visualize the sequence logo-like plot in our example data set, we use our candidate.rxlr$HMM_Table
object in the hmm.search
function:
hmm.logo(hmm.table = candidate.rxlr$HMM_Table)