emast
Usage:
emast [options] mfile outfile
The outfile parameter is new to EMBASSY MEME: the output is always written to this file.
MAST: Motif Alignment and Search Tool
MAST is a tool for searching biological sequence databases for sequences
that contain one or more of a group of known motifs.
A motif is a sequence pattern that occurs repeatedly in a group of related
protein or DNA sequences. Motifs are represented as position-dependent
scoring matrices that describe the score of each possible letter at each
position in the pattern. Individual motifs may not contain gaps. Patterns with
variable-length gaps must be split into two or more separate motifs before
being submitted as input to MAST.
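To make "position-dependent scoring matrix" concrete, here is a minimal sketch of how such a matrix scores a sequence: each motif column assigns a score to every letter, a window's score is the sum over columns, and the motif slides along the sequence without gaps. The matrix values and the names `MOTIF`, `window_score` and `best_match` are hypothetical illustrations, not MAST's internals.

```python
# Hypothetical 3-column DNA motif: one dict per motif position giving the
# score of each possible letter at that position.
MOTIF = [
    {"A": 2, "C": -1, "G": -1, "T": -1},
    {"A": -1, "C": 2, "G": -1, "T": -1},
    {"A": -1, "C": -1, "G": 2, "T": -1},
]


def window_score(matrix, window):
    """Sum the per-position scores of one ungapped window."""
    return sum(col[letter] for col, letter in zip(matrix, window))


def best_match(matrix, sequence):
    """Slide the motif along the sequence (no gaps allowed) and return the
    (0-based start, score) of the highest-scoring window.
    Assumes len(sequence) >= len(matrix)."""
    w = len(matrix)
    scores = [(i, window_score(matrix, sequence[i:i + w]))
              for i in range(len(sequence) - w + 1)]
    return max(scores, key=lambda s: s[1])


if __name__ == "__main__":
    print(best_match(MOTIF, "TTACGTT"))  # (2, 6)
```

Because a motif has a fixed width, a pattern with a variable-length gap cannot be represented by one such matrix, which is why it has to be split into separate motifs.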
MAST takes as input a file containing the descriptions of one or more motifs
and searches a sequence database that you select for sequences that match
the motifs. The motif file can be the output of the MEME motif discovery tool
or any file in the appropriate format.
MAST outputs three things:
MAST works by calculating match scores for each sequence in the database
compared with each of the motifs in the group of motifs you provide. For each
sequence, the match scores are converted into various types of p-values and
these are used to determine the overall match of the sequence to the group of
motifs and the probable order and spacing of occurrences of the motifs in the
sequence.
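As a hedged illustration of turning several per-motif p-values into one overall match statistic, the sketch below uses a standard method, assumed here rather than taken from MAST's source: multiply the per-motif p-values into a product q, then compute the probability that n independent Uniform(0, 1) p-values would multiply to q or less, which is q · Σ_{i=0}^{n-1} (-ln q)^i / i!. It also assumes that an E-value is this combined p-value multiplied by the number of sequences in the database. The function names are hypothetical.

```python
import math


def product_pvalue(pvalues):
    """P-value of the product of n independent p-values.

    Returns q * sum_{i=0}^{n-1} (-ln q)^i / i!, where q is the product,
    i.e. the chance that n independent Uniform(0, 1) values multiply to <= q.
    """
    q = math.prod(pvalues)
    if q <= 0.0:
        return 0.0
    x = -math.log(q)
    term, total = 1.0, 1.0
    for i in range(1, len(pvalues)):
        term *= x / i          # builds x^i / i! incrementally
        total += term
    return q * total


def evalue(combined_p, n_sequences):
    """Expected number of database sequences matching this well by chance."""
    return combined_p * n_sequences
```

For example, three motifs each matching with p-value 1e-3 give a product of 1e-9 but a combined p-value of roughly 2.4e-7, because many combinations of three p-values reach a product that small.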
Please note the examples below are unedited excerpts of the original MEME documentation. Bear in mind the EMBASSY and original MEME options may differ in practice (see "1. Command-line arguments").
The following examples assume that file "meme.results" is the
output of a MEME run containing at least 3 motifs and file
SwissProt is a copy of the Swiss-Prot database on your local disk.
DNA_DB is a copy of a DNA database on your local disk.
1) Annotate the training set:
mast meme.results
2) Find sequences in the SwissProt database that match the motifs, and annotate them:
mast meme.results -d SwissProt
3) Show sequences with weaker combined matches to the motifs:
mast meme.results -d SwissProt -ev 200
4) Indicate weaker matches to single motifs in the annotation, so that sequences with weak matches to the motifs (but perhaps with the "correct" order and spacing) can be seen:
mast meme.results -d SwissProt -w
5) Include a nominal order and spacing of the first three motifs in the calculation of the sequence p-values, to increase the sensitivity of the search for matching sequences:
mast meme.results -d SwissProt -diag "9-[2]-61-[1]-62-[3]-91"
6) Use only the first and third motifs in the search:
mast meme.results -d SwissProt -m 1 -m 3
7) Use only the first two motifs in the search:
mast meme.results -d SwissProt -c 2
8) Search DNA sequences using protein motifs, adjusting p-values and E-values for each sequence by that sequence's composition:
mast meme.results -d DNA_DB -dna -comp
Most of the options of the original mast are defined in ACD as "advanced" or "additional" options. You are prompted for a value for "additional" options only if -options is given on the command line; "advanced" options are never prompted for.
Please note that only one of -stdin or -d should be specified. If you set both, -d is used. This behaviour could have been enforced in the ACD file with an ACD select: or list: type, but that would have been inconsistent with the original MEME, which has two separate options.
Algorithm
Please read the README file distributed with the original MEME.
Usage
Here is a sample session with emast:
% emast ex1.html ex1.out
Motif detection
Print results for sequences with E-value [10]:
Show motif matches with p-value < mt [0.0001]:
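The -diag string in example 5, "9-[2]-61-[1]-62-[3]-91", reads as alternating spacer lengths and bracketed motif numbers: 9 residues, motif 2, 61 residues, motif 1, and so on. The sketch below parses such a string and works out where each motif would start. It assumes that reading of the notation; it also assumes a signed number such as [-2] marks a reverse-strand hit. The motif widths and the names `parse_diagram` and `motif_starts` are hypothetical.

```python
import re

# A bracketed motif number such as [2] (optionally signed, e.g. [-2], assumed
# here to denote a reverse-strand hit) or a bare integer spacer length.
TOKEN = re.compile(r"\[([+-]?\d+)\]|(\d+)")


def parse_diagram(diag):
    """Split a diagram string into ('motif', n) and ('gap', length) items."""
    items = []
    for motif, gap in TOKEN.findall(diag):
        if motif:
            items.append(("motif", int(motif)))
        else:
            items.append(("gap", int(gap)))
    return items


def motif_starts(items, widths):
    """Return (motif, 0-based start) pairs, given each motif's width."""
    pos, starts = 0, []
    for kind, value in items:
        if kind == "gap":
            pos += value
        else:
            starts.append((value, pos))
            pos += widths[abs(value)]
    return starts


if __name__ == "__main__":
    layout = parse_diagram("9-[2]-61-[1]-62-[3]-91")
    # Hypothetical motif widths, for illustration only.
    print(motif_starts(layout, {1: 20, 2: 15, 3: 10}))
```

With those example widths, motif 2 starts at position 9, motif 1 at 85 and motif 3 at 167.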
Command line arguments
Where possible, the same command-line qualifier names and parameter order are used as in the original mast. There are, however, several unavoidable differences, and these are documented in the "Notes" section.
Standard (Mandatory) qualifiers

| Qualifier | Description | Allowed values | Default |
|---|---|---|---|
| [-mfile] (Parameter 1) | If -d … | Input file | Required |
| -ev | Print results for sequences with E-value | Any numeric value | 10 |
| -mt | Show motif matches with p-value < mt | Any numeric value | 0.0001 |
| [-outfile] (Parameter 2) | MAST program output file | Output file | |
Additional (Optional) qualifiers

| Qualifier | Description | Allowed values | Default |
|---|---|---|---|
| -d | If -d … | Input file | Required |
| -a | Input file | Input file | Required |
| -bfile | The random model uses the letter frequencies given in … | Input file | Required |
| -smax | Print results for no more than … | Any integer value | -1 |
| -stdin | The default is to read the database specified inside … | Boolean value Yes/No | No |
| -text | Default is hypertext (HTML) format | Boolean value Yes/No | No |
| -dna | Translate DNA sequences to protein | Boolean value Yes/No | No |
| -comp | The random model uses the letter frequencies in the current target sequence instead of the non-redundant database frequencies. This causes p-values and E-values to be compensated individually for the actual composition of each sequence in the database. This option can increase search time substantially, because a different score distribution must be computed for each high-scoring sequence. | Boolean value Yes/No | No |
| -rank | Print results starting with … | Any integer value | -1 |
| -best | Include only the best motif in diagrams | Boolean value Yes/No | No |
| -remcorr | Remove highly correlated motifs from query | Boolean value Yes/No | No |
| -brief | Brief output: do not print documentation | Boolean value Yes/No | No |
| -b | Print only sections I and II | Boolean value Yes/No | No |
| -nostatus | Do not print progress report | Boolean value Yes/No | No |
| -hitlist | If you specify the -hitlist switch to MAST, the motif 'diagram' takes the form of a comma-separated list of motif occurrences ('hits'). Each 'hit' has the format: … | Boolean value Yes/No | No |
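The -comp description says the random model switches from database-wide letter frequencies to the composition of each target sequence. The sketch below illustrates only that background swap. It assumes a motif given as letter probabilities and scored as log2-odds against a background. The add-one pseudocount and the names `composition` and `log_odds` are our own choices, and the sketch does not reproduce MAST's score-distribution or p-value calculation.

```python
import math
from collections import Counter


def composition(sequence, alphabet="ACGT"):
    """Letter frequencies of one sequence, with an add-one pseudocount
    so that no frequency is zero."""
    counts = Counter(c for c in sequence.upper() if c in alphabet)
    total = sum(counts.values()) + len(alphabet)
    return {a: (counts[a] + 1) / total for a in alphabet}


def log_odds(prob_matrix, background):
    """Convert a motif's letter-probability matrix into log2-odds scores
    against the given background frequencies."""
    return [{a: math.log2(col[a] / background[a]) for a in col}
            for col in prob_matrix]


if __name__ == "__main__":
    column = [{"A": 0.5, "C": 0.25, "G": 0.125, "T": 0.125}]
    uniform = {a: 0.25 for a in "ACGT"}
    print(log_odds(column, uniform))                    # database-style background
    print(log_odds(column, composition("AATTAATTAC")))  # A/T-rich target
```

Against the A/T-rich background an A scores lower than against the uniform one, so an A-rich motif match in an A-rich sequence counts for less; that is the kind of compensation -comp describes.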
Advanced (Unprompted) qualifiers

| Qualifier | Description | Allowed values | Default |
|---|---|---|---|
| -c | Only use the first … | Any integer value | -1 |
| -sep | Score reverse complement DNA strand as a separate sequence | Boolean value Yes/No | No |
| -norc | Do not score reverse complement DNA strand | Boolean value Yes/No | No |
| -w | Show weak matches (mt …) | Boolean value Yes/No | No |
| -seqp | The default is to use POSITION p-values. | Boolean value Yes/No | No |
| -mf | Print … | Any string is accepted | An empty string is accepted |
| -df | Print … | Any string is accepted | An empty string is accepted |
| -minseqs | Lower bound on number of sequences in db | Any integer value | -1 |
| -mev | Use only motifs with E-values less than … | Any numeric value | -1 |
| -m | Overrides value set by using -mev. | Any integer value | -1 |
| -diag | See on-line documentation for a valid example. | Any string is accepted | An empty string is accepted |
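The -seqp entry contrasts the default POSITION p-values with SEQUENCE p-values. As a hedged sketch of how the two relate: assume a position p-value is the chance that one ungapped window scores at least as well, and a sequence p-value is the chance that the best of the L - w + 1 windows in a random sequence of length L does. If the windows are treated as independent (an approximation), then p_seq = 1 - (1 - p_pos)^(L - w + 1). The function name `sequence_pvalue` is hypothetical, and reverse-strand scoring is ignored.

```python
import math


def sequence_pvalue(position_p, seq_len, motif_width):
    """Chance that the best of the (seq_len - motif_width + 1) ungapped
    windows in a random sequence scores at least as well as a match with
    the given position p-value, treating the windows as independent."""
    if position_p >= 1.0:
        return 1.0
    n_windows = max(seq_len - motif_width + 1, 1)
    # log1p/expm1 keep precision when position_p is tiny.
    return -math.expm1(n_windows * math.log1p(-position_p))


if __name__ == "__main__":
    # A position p-value of 1e-4 for a 21-wide motif in a 120-residue
    # sequence (100 windows) corresponds to a sequence p-value near 0.01.
    print(sequence_pvalue(1e-4, 120, 21))
```

The same per-position threshold is therefore much less stringent per sequence in long sequences than in short ones, which is the practical difference between the two kinds of threshold.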
Input file format
emast reads any normal sequence USAs.
Input files for usage example
File: ex1.html