Benchmarking setup

The SpeedyMarkov and reference implementations have been benchmarked with a range of optimisations, including the use of Rcpp and parallelisation. Several parallelisation approaches have been explored, as has the use of varying numbers of cores. The benchmarking grid is implemented within SpeedyMarkov - see below for details.

  • Load required packages.
library(SpeedyMarkov)
library(microbenchmark)
library(parallel)
library(furrr)
#> Loading required package: future
library(ggplot2)
  • Code for the benchmarking grid.
SpeedyMarkov::benchmark_markov
#> function (markov_model = NULL, reference = NULL, duration = NULL, 
#>     samples = NULL, times = 1) 
#> {
#>     future::plan(future::multiprocess, workers = 4)
#>     benchmark <- microbenchmark::microbenchmark(Reference = {
#>         reference(cycles = duration, samples = samples)
#>     }, R = {
#>         markov_ce_pipeline(markov_model(), duration = duration, 
#>             samples = samples, sim_type = "base", sample_type = "base")
#>     }, `Rcpp inner loop` = {
#>         markov_ce_pipeline(markov_model(), duration = duration, 
#>             samples = samples, sim_type = "armadillo_inner", 
#>             sample_type = "base")
#>     }, `Rcpp simulation` = {
#>         markov_ce_pipeline(markov_model(), duration = duration, 
#>             samples = samples, sim_type = "armadillo_all", sample_type = "base")
#>     }, `Rcpp simulation + sampling` = {
#>         markov_ce_pipeline(markov_model(), duration = duration, 
#>             samples = samples, sim_type = "armadillo_all", sample_type = "rcpp")
#>     }, `Rcpp simulation + sampling - mclapply (2 cores)` = {
#>         markov_ce_pipeline(markov_model(), duration = duration, 
#>             samples = samples, sim_type = "armadillo_all", sample_type = "rcpp", 
#>             batches = 2, batch_fn = parallel::mclapply, mc.cores = 2)
#>     }, `Rcpp simulation + sampling - mclapply (4 cores)` = {
#>         markov_ce_pipeline(markov_model(), duration = duration, 
#>             samples = samples, sim_type = "armadillo_all", sample_type = "rcpp", 
#>             batches = 4, batch_fn = parallel::mclapply, mc.cores = 4)
#>     }, `Rcpp simulation + sampling - furrr::future_map (4 cores)` = {
#>         markov_ce_pipeline(markov_model(), duration = duration, 
#>             samples = samples, sim_type = "armadillo_all", sample_type = "rcpp", 
#>             batches = 4, batch_fn = furrr::future_map)
#>     }, times = times)
#>     return(benchmark)
#> }
#> <bytecode: 0x69e1e48>
#> <environment: namespace:SpeedyMarkov>
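
Each entry in this grid can also be run on its own. As a rough sketch (the example model function name, example_two_state_markov(), is an assumption), the fully parallel Rcpp variant could be called directly as follows.

## Set up a parallel backend, mirroring benchmark_markov() above (4 workers).
future::plan(future::multiprocess, workers = 4)

## Run the fully parallel Rcpp variant directly. example_two_state_markov()
## is assumed to be the packaged example two state model.
parallel_run <- markov_ce_pipeline(
  example_two_state_markov(),
  duration = 100,
  samples = 100000,
  sim_type = "armadillo_all",
  sample_type = "rcpp",
  batches = 4,
  batch_fn = furrr::future_map
)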

All benchmarks have been run elsewhere (inst/scripts/benchmarking.R) and the results saved (inst/benchmarks). They can be recreated locally by running the following in the package directory (warning: this is likely to take upwards of an hour and a half).
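
A minimal sketch, assuming the saved script can be sourced as-is from the package root:

## Re-run the full benchmarking grid (may take upwards of an hour and a half)
## and save the results to inst/benchmarks.
source("inst/scripts/benchmarking.R")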

Two state Markov model

The two state Markov model has been benchmarked over durations of both 100 and 200 cycles, with 100,000 samples each time, repeated 10 times per benchmark.
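
As a rough sketch of how the 100 cycle benchmark might be reproduced using the grid above (the example and reference model function names, example_two_state_markov() and reference_two_state_markov(), are assumptions):

## 10 repeats of the 100 cycle, 100,000 sample benchmark; the 200 cycle
## benchmark simply doubles duration.
two_state_bench <- SpeedyMarkov::benchmark_markov(
  markov_model = SpeedyMarkov::example_two_state_markov,
  reference = SpeedyMarkov::reference_two_state_markov,
  duration = 100,
  samples = 100000,
  times = 10
)

## benchmark_markov() returns a microbenchmark object, so results can be
## summarised or plotted (ggplot2 is loaded above).
summary(two_state_bench)
autoplot(two_state_bench)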

Benchmarking results show that both Rcpp implementations consistently outperform the other implementations, with the full Rcpp implementation being slightly faster overall but potentially somewhat unstable. Parallelisation appears to work well and to scale with the number of cores used. Both parallelisation approaches perform equally well, with furrr preferred because it also supports Windows. On a 4 core machine, a parallel approach takes around 10% of the reference approach's runtime. There was also a substantial, though less dramatic, speed up compared to the R SpeedyMarkov approach.

Increasing the number of cycles (duration) resulted in an approximate doubling of the runtime for both the reference and R SpeedyMarkov approaches. However, all C++ based approaches saw only small increases in runtime, highlighting the performance gains that C++ provides (though the difference between the partial and full C++ approaches did increase). In fact, the parallel C++ implementations showed no apparent increase in runtime when the duration was doubled. This indicates that this approach should scale well to more complex models and/or longer runtimes.