scrm uses a syntax compatible with the popular program ms. There are, however, a few differences to ms:

- scrm does not support `-c` (gene conversion in ms) and `-s` (fixed number of segregating sites),
- `-L` produces a slightly different output, and
- scrm has the additional options `-l` (approximation), `-sr` (changing recombination rate), `-st` (changing mutation rate), `-eI` (sampling haplotypes at multiple time points) and `-oSFS` (generates frequency spectra).
- scrm uses a different syntax for `-ema`. Our version of the command is just `-ema <t> <M11> <M12> ...` instead of `-ema <t> <npop> <M11> <M12> ...`.

For all other options, you can also refer to ms’ manual for a detailed description of what the commands do. scrm should happily execute any ms command that does not contain `-c`, `-s` or `-ema`. Also, scrm has somewhat stricter requirements regarding the order of arguments if population admixture (`-es`) is involved.
The arguments for calling scrm are

`scrm <nhap> <nrep> [...]`

where nhap is the total number of haplotypes (in all populations and at all times) that are simulated at each locus, and nrep is the number of independent loci that will be produced. The `[...]` is an optional placeholder for an arbitrary number of command line flags described below.
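As a rough illustration of this calling convention, the following Python sketch assembles an argument list of the form `scrm <nhap> <nrep> [...]`. The helper name `build_scrm_command` is hypothetical and not part of scrm; it only builds the list and does not run anything. The flags `-r` and `-t` used in the example are the ones documented on this page.

```python
def build_scrm_command(nhap, nrep, flags=()):
    """Assemble an argument list of the form: scrm <nhap> <nrep> [...].

    Hypothetical helper for illustration only; it does not execute scrm.
    """
    if nhap < 1 or nrep < 1:
        raise ValueError("nhap and nrep must be positive integers")
    # Every token is converted to a string, as expected for an argv list.
    return ["scrm", str(nhap), str(nrep), *[str(f) for f in flags]]


# 5 haplotypes, 2 independent loci, R = 10 on a 1000 bp locus, theta = 5
cmd = build_scrm_command(5, 2, ["-r", 10, 1000, "-t", 5])
print(" ".join(cmd))
```

A list built this way could be handed to `subprocess.run`, assuming a `scrm` binary is available on the search path.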
- `-r <R> <L>`: Set the recombination rate to R = 4N0r and the length of all loci to L base pairs. Here, r is the expected number of recombinations on the locus per generation.
- `-l <l>`: Use an approximation rather than simulating the exact ARG. Within a sliding window of length l base pairs, all linkage information is considered when building the genealogy. For positions outside of this window, some linkage is ignored. Setting l=0 produces the SMC’ and l=-1 deactivates the approximation. Since v1.6.0, it is also possible to specify the window’s length in number of recombinations. To do so, use `-l <x>r`, where x is the number of recombinations (e.g. `-l 100r` for a window spanning 100 recombinations). Also starting with version 1.6.0, the approximation is turned on by default using a conservative window length of 500 recombinations. For most applications, it should be fine to reduce this value to 100-250 recombinations if runtime is a critical factor.

In all commands, migration rates are given as M = 4N0m, where m is the fraction of a population that is replaced with migrants from other populations each generation (looking forwards in time).
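The scalings R = 4N0r and M = 4N0m can be written down in a few lines. The sketch below is only an illustration: the function names and the example values for N0, r and m are assumptions and not defaults of scrm.

```python
def scaled_recombination_rate(n0, r):
    """R = 4 * N0 * r, with r the expected number of recombinations
    on the locus per generation."""
    return 4 * n0 * r


def scaled_migration_rate(n0, m):
    """M = 4 * N0 * m, with m the fraction of a population replaced
    by migrants each generation (looking forwards in time)."""
    return 4 * n0 * m


n0 = 10_000                                  # assumed reference size N0
R = scaled_recombination_rate(n0, r=0.001)   # assumed per-locus r
M = scaled_migration_rate(n0, m=0.00005)     # assumed migrant fraction m
print(R, M)
```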
- `-I <npop> <s1> ... <sn> [<M>]`: Use an island model with npop populations, where s1 to sn haplotypes are sampled from population 1 to n, respectively. Optionally assume a symmetric migration rate of M.
- `-M <M>`: Assume a symmetric migration rate of M/(npop-1).
- `-m <i> <j> <M>`: Set the migration rate from population j to population i to M (looking forward in time) [since v1.3.1].
- `-ma <M11> <M12> ... <M21> ...`: Set the migration matrix (dimension is npop x npop). Diagonal elements are ignored but required (you can use `x` or `0`).

For exponential growth/decline of a population, the parameter a changes the size of a population according to the formula N(s) = N(0)exp(-as), where N(0) is the population’s size at the time of the command (e.g. 0 for `-g <i> <a>` and `-G <a>`, and t for `-eg <t> <i> <a>` and `-eG <t> <a>`) and N(s) is the size of the population s time units in the past. Looking forwards in time, a positive a leads to population growth, while a negative one generates a decline in population sizes.
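To make the sign convention concrete, here is a minimal sketch that evaluates N(s) = N(0)exp(-as). The function name and the example values are assumptions chosen for illustration.

```python
import math


def population_size(n_at_command, a, s):
    """N(s) = N(0) * exp(-a * s): the size s time units before the
    time at which the growth command takes effect."""
    return n_at_command * math.exp(-a * s)


# Assumed example values: N(0) = 10000 and growth rate a = 2.
for s in (0.0, 0.5, 1.0):
    print(s, round(population_size(10_000, 2.0, s)))
```

With a positive a, the computed sizes shrink as s increases, i.e. the population was smaller in the past and has grown forwards in time. A negative a gives the opposite pattern.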
- `-n <i> <n>`: Set the present day size of population i to _n*N0_.
- `-G <a>`: Set the exponential growth rate of all populations to a.
- `-g <i> <a>`: Set the exponential growth rate of population i to a.
- `-t <θ>`: Set the mutation rate to \(\theta = 4N_0u\), where u is the neutral mutation rate per locus. If this option is given, scrm generates the segregating sites output.
- `-transpose-segsites` or `--transpose-segsites`: If given, the segregating sites are printed with each row representing a mutation and each column representing a haplotype, rather than the other way round. Additionally, the time at which a mutation occurred is reported (in units of 4 * N0 generations) [since v1.7.0].
- `-T`: Print the local genealogies in Newick format.
- `-O`: Print the local genealogies in the oriented forest format as described in Kelleher et al. (2014) [since v1.2].
- `-L`: Print the TMRCA and the local tree length for each segment (behaves differently from ms). Both values are scaled in coalescent time units, i.e. in 4 * N0 generations.
- `-oSFS`: Print the site frequency spectrum. Requires that the mutation rate \(\theta\) is given with the `-t` option.
- `-SC [ms|rel|abs]`: Scaling of sequence positions. Either relative to the locus length between 0 and 1 (`rel`), absolute in base pairs (`abs`), or ms’s scaling (`ms`, the default), where the positions in the segregating sites output are relative and the positions in the trees output are absolute [since v1.3.0].
- `-seed <SEED> [<SEED2> <SEED3>]`: Specifies a seed for the simulation. You can input up to three non-negative numbers. If no seed is given, scrm generates one using entropy provided by the operating system. To reproduce a previous simulation, use the single number in the second line of the output.
- `-print-model`, `--print-model`: Prints information about the model defined by the command line arguments, including calculated population sizes. Can be useful for debugging or verifying the model [since v1.5.0].
- `-p <digits>`: Sets the number of significant digits used in the output [since v1.4.0].
- `-h`, `--help`: Prints a help text.
- `-v`, `--version`: Prints version information.

The commands in this section all have a time t as first parameter. Changes made by the commands affect the time from t further back into the past. All times are in units of _4*N0_ generations.
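Because times are measured in units of 4*N0 generations, a time known in generations has to be rescaled before it is passed to a timed command. The following sketch shows the conversion; the function name and the values of N0 and the event time are assumptions for illustration.

```python
def generations_to_scrm_time(generations, n0):
    """Convert a time in generations into units of 4 * N0 generations."""
    return generations / (4 * n0)


n0 = 10_000                                # assumed reference size N0
t = generations_to_scrm_time(3_200, n0)    # 3200 / 40000
# Halve the size of all populations from time t pastwards.
size_change = ["-eN", str(t), "0.5"]
print(" ".join(size_change))
```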
- `-eI <t> <s1> ... <sn>`: Sample s1 to sn haplotypes from population 1 to n, respectively, at time t.
- `-eM <t> <M>`: Assume a symmetric migration rate of M/(npop-1) at time t.
- `-em <t> <i> <j> <M>`: Set the migration rate from population j to population i to M (looking forward in time) at time t [since v1.3.1].
- `-ema <t> <M11> <M12> ... <M21> ...`: Set the migration matrix at time t (dimension is npop x npop). Diagonal elements are ignored but required (use `x` or `0`). The rates apply pastwards from time t.
- `-eN <t> <n>`: Set the size of all populations to _n*N0_ at time t.
- `-en <t> <i> <n>`: Set the size of population i to _n*N0_ at time t.
- `-eg <t> <i> <a>`: Set the exponential growth rate of population i to a at time t.
- `-eG <t> <a>`: Set the exponential growth rate of all populations to a at time t.
- `-es <t> <i> <p>`: Population admixture. Replaces a fraction 1-p of population i with haplotypes from a population npop + 1. Technically (and looking backwards in time), a new population npop + 1 with size N0 is created at time t. Migration rates (to and from) and growth rates for this population are initially 0. Each line in population i is moved to the new population with probability 1-p. Please sort multiple `-es` arguments by their time to avoid confusion about the numbering of populations. Please give the arguments that affect all populations (`-M`, `-N`, `-G` and `-ma`) before giving the first `-es`. Also, their timed equivalents (`-eM`, `-eN`, `-eG`, `-eI` and `-ema`) must be sorted by time on the command line, at least relative to the `-es` argument. scrm throws an error if any of these conditions is not met. If in doubt, just sort all command line arguments by their time.
- `-eps <t> <i> <j> <p>`: Partial admixture. Similar to `-es`, but replaces a fraction 1-p of population i with haploids from population j at time t. Unlike with `-es`, population j is a normal population that continues to exist at times more recent than t. Viewed backwards in time, this moves a fraction 1-p of the lineages in population i to population j. This does not change the number of populations, population sizes, growth or migration rates in any way [since v1.5.0].
- `-ej <t> <j> <i>`: Adds a speciation event in population i that creates population j (forwards in time). Technically (and looking backwards in time), it moves all lines from population j into population i at time t. Migration rates into population j are set to 0 for the time further back into the past.

When multiple `-es`, `-eps` or `-ej` arguments are given for the same time t, the migrations are executed in the order in which the commands are given. For example, if we have `-es 0.08 2 .2 -ej 0.08 3 1`, first 80% of pop 2 move to a newly created pop 3 (viewed backwards in time), then everyone that just moved to pop 3 moves on to pop 1. This is equivalent to `-eps 0.08 2 1 .2`, except that the latter does not create the empty population 3.
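The advice to sort arguments by time can be followed mechanically. The sketch below assumes that each timed event is held as a list of tokens whose second element is the time t; the helper name `sort_timed_events` is hypothetical. It relies on Python's sort being stable, so events that share the same time keep the order in which they were given, which matters for `-es`, `-eps` and `-ej`. Only the timed part of a command is shown.

```python
def sort_timed_events(events):
    """Sort timed events by their time t (the second token), stably."""
    return sorted(events, key=lambda ev: float(ev[1]))


# Timed events in arbitrary order; the values are illustrative only.
events = [
    ["-ej", "0.5", "2", "1"],
    ["-es", "0.08", "2", ".2"],
    ["-ej", "0.08", "3", "1"],
    ["-eN", "0.01", "2"],
]
args = [token for ev in sort_timed_events(events) for token in ev]
print(" ".join(args))
```

The `-es` and `-ej` events at time 0.08 stay in their original relative order, so the admixture still happens before the join.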
The following commands change the model parameters starting at a sequence position s. You should still set the initial rate with `-r` or `-t`, respectively, and then use the commands prefixed with `s` for all changes. Note that `-r` also takes the total length of the sequence as second argument, while `-sr` just has the rate as argument.
- `-sr <s> <R>`: Set the recombination rate to R starting at position s.
- `-st <s> <θ>`: Set the mutation rate to \(\theta\) starting at position s.
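As an illustration of combining `-r` with `-sr`, the following sketch turns an initial rate and a list of change points into arguments. The helper name is hypothetical, the example values are made up, and sorting the change points by position is a choice made here for readability, not a documented requirement of scrm.

```python
def recombination_args(initial_rate, length, changes):
    """Build -r/-sr arguments from (position, rate) change points.

    -r takes the rate and the total length; -sr takes a position
    and the new rate that applies from that position onwards.
    """
    args = ["-r", str(initial_rate), str(length)]
    # Sorting by position is an assumption made for readability.
    for position, rate in sorted(changes):
        args += ["-sr", str(position), str(rate)]
    return args


rate_args = recombination_args(10, 10000, [(7000, 5), (3000, 20)])
print(" ".join(rate_args))
```

The same pattern would apply to `-t` and `-st`, with the difference that `-t` takes no length argument.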