Parallel Computing

Andrew Marx

2022-06-07

Introduction

As of v2.0.0, the samc package does support the use of multiple cores for parallelization. Currently, this feature is targeted towards the dispersal() function for the dispersal(samc, occ) option. This function is particularly useful, but has a severe performance limitation explained in the Performance vignette. Fortunately, this particular case is suitable for parallelization, which means that multiple cores can be used to drastically reduce the runtime of the analysis.

Until this feature has been well established, it is recommended that users compare the results of the non-parallel version with the parallel version. This can be done efficiently using a scaled-down version of input data. Once the results for the scaled-down data have been verified, then the parallel version can be run for the larger full dataset. When doing this testing, the samc object (create using samc()) will have to be recreated. If you do not, then the analysis will use cached results from the previous run.

It’s also important to be aware of the distinction between cores and threads. Roughly, cores are the physical units of a processor that perform computations. A thread can be thought of as a set of instructions that will be run on a core. In general, a core will always be able to run one thread at a time. In this case, for example, a 4-core processor can run 4 threads. Sometimes, depending on the processor features (e.g., Intel Hyperthreading), a core can execute two threads at once. In this case, a 4-core processor could be used to run 8 threads simultaneously. This means you can potentially break up a workload into more threads than cores. However, it only works well in certain types of workloads and may not scale well in other situations.

Enabling Parallelization

To enable parallelization, we just set the number of threads to a positive integer greater than one:

# Assume samc_obj was created using samc()

samc_obj$threads <- 4
#> Important: When setting the number of threads, make sure to use a reasonable number for the machine the code will run on.
#> Using the maximum number of threads supported by your hardware can make other programs non-responsive. Only do this if nothing else needs to run during the analysis.
#> Specifying more threads than a machine supports in hardware will likely lead to lost performance.

That’s it. If a function supports parallelization, it will now make use of multiple threads (4 in this example).

Limitations

In general, using more cores to solve a problem will not scale perfectly. In addition, there can be diminishing returns as the number of cores used increases. There are multiple factors that play into this. Users with very high core count machines (e.g., 32, 64, or 128 cores) may find that using all of their cores does not provide significant benefit over a more moderate amount of cores. The only way to know for sure is to benchmark the analysis at different core counts.

In general, using more cores means that additional memory will need to be used to solve the problem. The severity of the memory increase will vary depending on the problem. In the case of dispersal(), the memory requirements for additional cores used is fairly minimal. This means that as long as machines have a reasonable amount of memory per core, this should not be an issue.

If you use all the cores of a machine at once, it can negatively affect the performance of other software running. Therefore, if you need to use your machine for other tasks while the analysis is running, you will likely need to restrict how many cores you use.

For users with access to supercomputing resources, running a parallel analysis across different nodes requires more advanced features not available in the package. Therefore, samc analyses should only be run on a single node in supercomputers.