Basic usage, `atime_versions` function

In this vignette we show how to compare asymptotic timings of an R expression across different versions of a package. We begin by cloning the binsegRcpp package:

old.opt <- options(width=100)
tdir <- tempfile()
dir.create(tdir)
git2r::clone("https://github.com/tdhock/binsegRcpp", tdir)
#> cloning into 'C:\Users\th798\AppData\Local\Temp\RtmpAhGltV\file3cc46e267506'...
#> Receiving objects:   1% (13/1258),   63 kb
#> Receiving objects:  11% (139/1258),   63 kb
#> Receiving objects:  21% (265/1258),  127 kb
#> Receiving objects:  31% (390/1258),  127 kb
#> Receiving objects:  41% (516/1258),  183 kb
#> Receiving objects:  51% (642/1258),  183 kb
#> Receiving objects:  61% (768/1258),  183 kb
#> Receiving objects:  71% (894/1258),  239 kb
#> Receiving objects:  81% (1019/1258),  239 kb
#> Receiving objects:  91% (1145/1258),  239 kb
#> Receiving objects: 100% (1258/1258),  251 kb, done.
#> Local:    master C:/Users/th798/AppData/Local/Temp/RtmpAhGltV/file3cc46e267506
#> Remote:   master @ origin (https://github.com/tdhock/binsegRcpp)
#> Head:     [977f385] 2022-08-24: rm rcppdeepstate yaml action

Next, we define a helper function run.atime.versions that calls atime_versions, which is a simple way to compare different git versions of a function:

run.atime.versions <- function(TDIR){
  atime::atime_versions(
    pkg.path=TDIR,
    N=2^seq(2, 20),
    setup={
      max.segs <- as.integer(N/2)
      data.vec <- 1:N
    },
    expr=binsegRcpp::binseg_normal(data.vec, max.segs),
    cv="908b77c411bc7f4fcbcf53759245e738ae724c3e",
    "rm unord map"="dcd0808f52b0b9858352106cc7852e36d7f5b15d",
    "mvl_construct"="5942af606641428315b0e63c7da331c4cd44c091")
}

Here is an explanation of the arguments specified above:

pkg.path is the path to the git repository containing the R package,
N is a numeric vector of data sizes,
setup is an R expression which will be run to create data for each size N,
expr is an R expression which will be timed for each package version. Under the hood, a separate R package is installed for each version, with a name of the form Package.SHA, for example binsegRcpp.908b77c411bc7f4fcbcf53759245e738ae724c3e. The expr must reference the package with a double or triple colon prefix (:: or :::), like binsegRcpp::binseg_normal above, so that it can be translated into version-specific expressions such as binsegRcpp.908b77c411bc7f4fcbcf53759245e738ae724c3e::binseg_normal.
The remaining arguments specify the different package versions (names are used as labels, values are SHA version IDs).
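
The translation of expr described above can be illustrated with a minimal base R sketch. The helper rewrite_pkg_prefix below is hypothetical and is not atime's actual implementation; it only shows how a package prefix in an unevaluated call could be replaced by a version-specific package name.

```r
# Hypothetical helper (not part of atime): walk an unevaluated call and
# replace the package name in pkg::fun or pkg:::fun with pkg.sha.
rewrite_pkg_prefix <- function(e, pkg, sha){
  if(is.call(e)){
    is.colon <- is.name(e[[1]]) &&
      as.character(e[[1]]) %in% c("::", ":::")
    if(is.colon && identical(as.character(e[[2]]), pkg)){
      # e is pkg::fun, so only the package name (second element) changes.
      e[[2]] <- as.name(paste0(pkg, ".", sha))
    }else{
      # Otherwise recurse into the function position and every argument.
      # This sketch does not handle calls with empty arguments, like x[, 1].
      e <- as.call(lapply(as.list(e), rewrite_pkg_prefix, pkg, sha))
    }
  }
  e
}
out <- rewrite_pkg_prefix(
  quote(binsegRcpp::binseg_normal(data.vec, max.segs)),
  pkg="binsegRcpp",
  sha="908b77c411bc7f4fcbcf53759245e738ae724c3e")
print(out)
```

Working on the parsed call, rather than on text, means that only real package prefixes are rewritten, and not strings or variable names that happen to contain the package name.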

Note that your own code does not need a helper function like run.atime.versions. We use one in the package vignette so that the different versions can be run via callr::r, in a separate R process. This lets us avoid CRAN warnings about unexpected files found in the package check directory, by safely removing the installed packages after the example code has run. The code block below computes the timings:

atime.ver.list <- if(requireNamespace("callr")){
  requireNamespace("atime")
  callr::r(run.atime.versions, list(tdir))
}else{
  run.atime.versions(tdir)
}
#> Loading required namespace: callr
names(atime.ver.list$measurements)
#>  [1] "N"         "expr.name" "min"       "median"    "itr/sec"   "gc/sec"    "n_itr"     "n_gc"     
#>  [9] "result"    "memory"    "time"      "gc"        "kilobytes" "q25"       "q75"       "max"      
#> [17] "mean"      "sd"
atime.ver.list$measurements[, .(N, expr.name, min, median, max, kilobytes)]
#>         N     expr.name       min     median       max  kilobytes
#>     <num>        <char>     <num>      <num>     <num>      <num>
#>  1:     4            cv 0.0002638 0.00026955 0.0003486 9286.60938
#>  2:     4  rm unord map 0.0010513 0.00108270 0.0012737 2673.42969
#>  3:     4 mvl_construct 0.0007333 0.00076910 0.0032837  278.91406
#>  4:     8            cv 0.0002699 0.00027380 0.0003603   18.74219
#>  5:     8  rm unord map 0.0010847 0.00117040 0.0054281   70.00781
#>  6:     8 mvl_construct 0.0007157 0.00078580 0.0009568   67.48438
#>  7:    16            cv 0.0002705 0.00027460 0.0003687   18.92188
#>  8:    16  rm unord map 0.0010623 0.00111985 0.0012998   70.18750
#>  9:    16 mvl_construct 0.0007556 0.00081700 0.0010086   67.66406
#> 10:    32            cv 0.0002766 0.00030655 0.0004173   19.64062
#> 11:    32  rm unord map 0.0011544 0.00123000 0.0038467   72.82031
#> 12:    32 mvl_construct 0.0008006 0.00083545 0.0010113   70.29688
#> 13:    64            cv 0.0002944 0.00030205 0.0004217   25.38281
#> 14:    64  rm unord map 0.0011087 0.00116505 0.0015315   88.72656
#> 15:    64 mvl_construct 0.0008715 0.00093930 0.0044220   86.20312
#> 16:   128            cv 0.0003028 0.00030735 0.0004363   43.35156
#> 17:   128  rm unord map 0.0011066 0.00114855 0.0014970  120.67969
#> 18:   128 mvl_construct 0.0010173 0.00107275 0.0012461  117.26562
#> 19:   256            cv 0.0003259 0.00033780 0.0004260   64.85156
#> 20:   256  rm unord map 0.0011413 0.00118275 0.0013411  165.67969
#> 21:   256 mvl_construct 0.0013446 0.00141745 0.0015483  161.51562
#> 22:   512            cv 0.0003978 0.00041155 0.0004810  107.85156
#> 23:   512  rm unord map 0.0012517 0.00137185 0.0016203  255.67969
#> 24:   512 mvl_construct 0.0020066 0.00211960 0.0023550  250.01562
#> 25:  1024            cv 0.0004937 0.00051650 0.0007122  193.85156
#> 26:  1024  rm unord map 0.0014681 0.00171030 0.0046410  435.67969
#> 27:  1024 mvl_construct 0.0034753 0.00370460 0.0039303  427.01562
#> 28:  2048            cv 0.0006712 0.00070250 0.0007898  365.85156
#> 29:  2048  rm unord map 0.0019627 0.00223720 0.0028095  795.67969
#> 30:  2048 mvl_construct 0.0067450 0.00687395 0.0089058  781.01562
#> 31:  4096            cv 0.0011736 0.00119435 0.0013280  709.85156
#> 32:  4096  rm unord map 0.0033274 0.00381530 0.0048789 1515.67969
#> 33:  4096 mvl_construct 0.0134129 0.01366050 0.0156015 1489.01562
#> 34:  8192            cv 0.0022507 0.00248565 0.0027683 1397.85156
#> 35:  8192  rm unord map 0.0066052 0.00673415 0.0124579 2955.67969
#> 36: 16384            cv 0.0043061 0.00496710 0.0115577 2773.85156
#> 37: 16384  rm unord map 0.0111116 0.01150120 0.0167759 5835.67969
#> 38: 32768            cv 0.0098131 0.01013680 0.0144229 5525.85156
#>         N     expr.name       min     median       max  kilobytes

The result is a list with a measurements data table that contains measurements of time in seconds (min, median, max) and memory usage (kilobytes) for every version (expr.name) and data size (N). A more convenient version of the data for plotting can be obtained via the code below:

best.ver.list <- atime::references_best(atime.ver.list)
names(best.ver.list$measurements)
#>  [1] "unit"       "N"          "expr.name"  "min"        "median"     "itr/sec"    "gc/sec"    
#>  [8] "n_itr"      "n_gc"       "result"     "memory"     "time"       "gc"         "kilobytes" 
#> [15] "q25"        "q75"        "max"        "mean"       "sd"         "fun.name"   "fun.latex" 
#> [22] "expr.class" "expr.latex" "empirical"
best.ver.list$measurements[, .(N, expr.name, unit, empirical)]
#>         N     expr.name      unit    empirical
#>     <num>        <char>    <char>        <num>
#>  1:     4            cv kilobytes 9.286609e+03
#>  2:     8            cv kilobytes 1.874219e+01
#>  3:    16            cv kilobytes 1.892188e+01
#>  4:    32            cv kilobytes 1.964063e+01
#>  5:    64            cv kilobytes 2.538281e+01
#>  6:   128            cv kilobytes 4.335156e+01
#>  7:   256            cv kilobytes 6.485156e+01
#>  8:   512            cv kilobytes 1.078516e+02
#>  9:  1024            cv kilobytes 1.938516e+02
#> 10:  2048            cv kilobytes 3.658516e+02
#> 11:  4096            cv kilobytes 7.098516e+02
#> 12:  8192            cv kilobytes 1.397852e+03
#> 13: 16384            cv kilobytes 2.773852e+03
#> 14: 32768            cv kilobytes 5.525852e+03
#> 15:     4  rm unord map kilobytes 2.673430e+03
#> 16:     8  rm unord map kilobytes 7.000781e+01
#> 17:    16  rm unord map kilobytes 7.018750e+01
#> 18:    32  rm unord map kilobytes 7.282031e+01
#> 19:    64  rm unord map kilobytes 8.872656e+01
#> 20:   128  rm unord map kilobytes 1.206797e+02
#> 21:   256  rm unord map kilobytes 1.656797e+02
#> 22:   512  rm unord map kilobytes 2.556797e+02
#> 23:  1024  rm unord map kilobytes 4.356797e+02
#> 24:  2048  rm unord map kilobytes 7.956797e+02
#> 25:  4096  rm unord map kilobytes 1.515680e+03
#> 26:  8192  rm unord map kilobytes 2.955680e+03
#> 27: 16384  rm unord map kilobytes 5.835680e+03
#> 28:     4 mvl_construct kilobytes 2.789141e+02
#> 29:     8 mvl_construct kilobytes 6.748437e+01
#> 30:    16 mvl_construct kilobytes 6.766406e+01
#> 31:    32 mvl_construct kilobytes 7.029687e+01
#> 32:    64 mvl_construct kilobytes 8.620313e+01
#> 33:   128 mvl_construct kilobytes 1.172656e+02
#> 34:   256 mvl_construct kilobytes 1.615156e+02
#> 35:   512 mvl_construct kilobytes 2.500156e+02
#> 36:  1024 mvl_construct kilobytes 4.270156e+02
#> 37:  2048 mvl_construct kilobytes 7.810156e+02
#> 38:  4096 mvl_construct kilobytes 1.489016e+03
#> 39:     4            cv   seconds 2.695500e-04
#> 40:     8            cv   seconds 2.738000e-04
#> 41:    16            cv   seconds 2.746000e-04
#> 42:    32            cv   seconds 3.065500e-04
#> 43:    64            cv   seconds 3.020500e-04
#> 44:   128            cv   seconds 3.073500e-04
#> 45:   256            cv   seconds 3.378000e-04
#> 46:   512            cv   seconds 4.115500e-04
#> 47:  1024            cv   seconds 5.165000e-04
#> 48:  2048            cv   seconds 7.025000e-04
#> 49:  4096            cv   seconds 1.194350e-03
#> 50:  8192            cv   seconds 2.485650e-03
#> 51: 16384            cv   seconds 4.967100e-03
#> 52: 32768            cv   seconds 1.013680e-02
#> 53:     4  rm unord map   seconds 1.082700e-03
#> 54:     8  rm unord map   seconds 1.170400e-03
#> 55:    16  rm unord map   seconds 1.119850e-03
#> 56:    32  rm unord map   seconds 1.230000e-03
#> 57:    64  rm unord map   seconds 1.165050e-03
#> 58:   128  rm unord map   seconds 1.148550e-03
#> 59:   256  rm unord map   seconds 1.182750e-03
#> 60:   512  rm unord map   seconds 1.371850e-03
#> 61:  1024  rm unord map   seconds 1.710300e-03
#> 62:  2048  rm unord map   seconds 2.237200e-03
#> 63:  4096  rm unord map   seconds 3.815300e-03
#> 64:  8192  rm unord map   seconds 6.734150e-03
#> 65: 16384  rm unord map   seconds 1.150120e-02
#> 66:     4 mvl_construct   seconds 7.691000e-04
#> 67:     8 mvl_construct   seconds 7.858000e-04
#> 68:    16 mvl_construct   seconds 8.170000e-04
#> 69:    32 mvl_construct   seconds 8.354500e-04
#> 70:    64 mvl_construct   seconds 9.393000e-04
#> 71:   128 mvl_construct   seconds 1.072750e-03
#> 72:   256 mvl_construct   seconds 1.417450e-03
#> 73:   512 mvl_construct   seconds 2.119600e-03
#> 74:  1024 mvl_construct   seconds 3.704600e-03
#> 75:  2048 mvl_construct   seconds 6.873950e-03
#> 76:  4096 mvl_construct   seconds 1.366050e-02
#>         N     expr.name      unit    empirical

The data table above is a tall/long version of the same data, which can be plotted using the code below:

if(require(ggplot2)){
  # Horizontal reference line at the time limit, drawn in the seconds panel only.
  hline.df <- with(atime.ver.list, data.frame(seconds.limit, unit="seconds"))
  gg <- ggplot()+
    theme_bw()+
    facet_grid(unit ~ ., scales="free")+
    geom_hline(aes(
      yintercept=seconds.limit),
      color="grey",
      data=hline.df)+
    # One line per version: median seconds or kilobytes, depending on the panel.
    geom_line(aes(
      N, empirical, color=expr.name),
      data=best.ver.list$measurements)+
    # Min/max band, drawn for the timings only.
    geom_ribbon(aes(
      N, ymin=min, ymax=max, fill=expr.name),
      data=best.ver.list$measurements[unit=="seconds"],
      alpha=0.5)+
    scale_x_log10()+
    scale_y_log10("median line, min/max band")
  if(require(directlabels)){
    # Label each curve directly instead of using a legend.
    gg+
      directlabels::geom_dl(aes(
        N, empirical, color=expr.name, label=expr.name),
        method="right.polygons",
        data=best.ver.list$measurements)+
      theme(legend.position="none")+
      coord_cartesian(xlim=c(1,2e7))
  }else{
    gg
  }
}

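[Figure: empirical time (seconds) and memory (kilobytes) versus N on log-log axes, one curve per package version, with a min/max band around the median time.]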
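
The plot above uses log-log axes, on which a power-law growth rate appears as a straight line whose slope is the exponent. As a rough numerical check, the sketch below fits that slope with base R, using the median timings for the three largest N of each version, copied from the measurements table printed earlier. This is only an illustration over three sizes per version; it is not how atime::references_best chooses a reference function.

```r
# Median seconds copied from the measurements table above,
# for the three largest data sizes N of each version.
timing.df <- data.frame(
  expr.name=rep(c("cv", "rm unord map", "mvl_construct"), each=3),
  N=c(8192, 16384, 32768, 4096, 8192, 16384, 1024, 2048, 4096),
  median=c(
    0.00248565, 0.00496710, 0.01013680,
    0.00381530, 0.00673415, 0.01150120,
    0.00370460, 0.00687395, 0.01366050))
# Least squares slope of log(median) versus log(N), one fit per version.
slope.vec <- sapply(split(timing.df, timing.df$expr.name), function(d){
  coef(lm(log(median) ~ log(N), data=d))[["log(N)"]]
})
print(round(slope.vec, 2))
```

A slope near 1 would be consistent with roughly linear time over these sizes, and a slope near 2 with quadratic time. Constant overhead at small N pulls the estimate downward, which is why only the largest sizes are used here.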

Advanced usage, `atime_versions_exprs` with `atime`

What if you want to compare different versions of one R package with another R package? Continuing the example above, we can get a list of expressions, one for each version of the package, via the code below:

(ver.list <- atime::atime_versions_exprs(
  pkg.path=tdir,
  expr=binsegRcpp::binseg_normal(data.vec, max.segs),
  cv="908b77c411bc7f4fcbcf53759245e738ae724c3e",
  "rm unord map"="dcd0808f52b0b9858352106cc7852e36d7f5b15d",
  "mvl_construct"="5942af606641428315b0e63c7da331c4cd44c091"))
#> $cv
#> binsegRcpp.908b77c411bc7f4fcbcf53759245e738ae724c3e::binseg_normal(data.vec, 
#>     max.segs)
#> 
#> $`rm unord map`
#> binsegRcpp.dcd0808f52b0b9858352106cc7852e36d7f5b15d::binseg_normal(data.vec, 
#>     max.segs)
#> 
#> $mvl_construct
#> binsegRcpp.5942af606641428315b0e63c7da331c4cd44c091::binseg_normal(data.vec, 
#>     max.segs)

The ver.list created above can be augmented with other expressions, such as the following alternative implementation of binary segmentation from the changepoint package,

expr.list <- c(ver.list, if(requireNamespace("changepoint")){
  list(changepoint=substitute(changepoint::cpt.mean(
    data.vec, penalty="Manual", pen.value=0, method="BinSeg",
    Q=max.segs-1)))
})
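
The list built above is an ordinary named list of unevaluated calls, which is why c() can combine it with further expressions. The sketch below shows the same idea with base R only. The expressions and the name demo.exprs are hypothetical stand-ins, and the evaluation step is an illustration, not atime's internal loop.

```r
# Hypothetical stand-in expressions (not from atime or binsegRcpp),
# stored unevaluated with quote().
demo.exprs <- list(
  cumsum=quote(cumsum(data.vec)),
  Reduce=quote(Reduce(`+`, data.vec, accumulate=TRUE)))
# Add one more expression with c(), as done for ver.list above.
demo.exprs <- c(demo.exprs, list(loop=quote({
  out <- numeric(length(data.vec))
  total <- 0
  for(i in seq_along(data.vec)){
    total <- total + data.vec[i]
    out[i] <- total
  }
  out
})))
# Setup for one data size N, in its own environment.
env <- new.env()
env$N <- 100
env$data.vec <- as.numeric(seq_len(env$N))
# Evaluate every stored expression against the same data.
result.list <- lapply(demo.exprs, eval, envir=env)
str(result.list)
```

Because each expression only refers to names created by the setup code (here data.vec), the same list can be evaluated again for any other data size by changing the environment.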

The expr.list created above can be provided as an argument to the atime function as in the code below,

run.atime <- function(ELIST){
  atime::atime(
    N=2^seq(2, 20),
    setup={
      max.segs <- as.integer(N/2)
      data.vec <- 1:N
    },
    expr.list=ELIST)
}
atime.list <- if(requireNamespace("callr")){
  requireNamespace("atime")
  callr::r(run.atime, list(expr.list))
}else{
  run.atime(expr.list)
}
atime.list$measurements[, .(N, expr.name, median, kilobytes)]
#>         N     expr.name     median    kilobytes
#>     <num>        <char>      <num>        <num>
#>  1:     4            cv 0.00046950  9318.882812
#>  2:     4  rm unord map 0.00109700  2783.210938
#>  3:     4 mvl_construct 0.00077755   278.726562
#>  4:     4   changepoint 0.00037815  6462.851562
#>  5:     8            cv 0.00031745    18.742188
#>  6:     8  rm unord map 0.00112580    70.007812
#>  7:     8 mvl_construct 0.00080380    67.484375
#>  8:     8   changepoint 0.00044615    33.375000
#>  9:    16            cv 0.00030985    18.921875
#> 10:    16  rm unord map 0.00106800    70.187500
#> 11:    16 mvl_construct 0.00079845    67.664062
#> 12:    16   changepoint 0.00047945     3.523438
#> 13:    32            cv 0.00033295    19.640625
#> 14:    32  rm unord map 0.00117055    72.820312
#> 15:    32 mvl_construct 0.00081740    70.296875
#> 16:    32   changepoint 0.00055265    10.750000
#> 17:    64            cv 0.00029355    25.382812
#> 18:    64  rm unord map 0.00111875    88.726562
#> 19:    64 mvl_construct 0.00092890    86.203125
#> 20:    64   changepoint 0.00075975    52.734375
#> 21:   128            cv 0.00030855    43.351562
#> 22:   128  rm unord map 0.00121375   120.679688
#> 23:   128 mvl_construct 0.00108045   117.265625
#> 24:   128   changepoint 0.00114570   195.000000
#> 25:   256            cv 0.00033220    64.851562
#> 26:   256  rm unord map 0.00129940   165.679688
#> 27:   256 mvl_construct 0.00145150   161.515625
#> 28:   256   changepoint 0.00217260   704.359375
#> 29:   512            cv 0.00038840   107.851562
#> 30:   512  rm unord map 0.00128300   255.679688
#> 31:   512 mvl_construct 0.00205390   250.015625
#> 32:   512   changepoint 0.00653580  2633.656250
#> 33:  1024            cv 0.00050290   193.851562
#> 34:  1024  rm unord map 0.00150930   435.679688
#> 35:  1024 mvl_construct 0.00360890   429.195312
#> 36:  1024   changepoint 0.02989700 10144.984375
#> 37:  2048            cv 0.00081060   365.851562
#> 38:  2048  rm unord map 0.00193535   795.679688
#> 39:  2048 mvl_construct 0.00649460   781.015625
#> 40:  4096            cv 0.00123470   709.851562
#> 41:  4096  rm unord map 0.00373590  1515.679688
#> 42:  4096 mvl_construct 0.01372335  1489.015625
#> 43:  8192            cv 0.00240960  1397.851562
#> 44:  8192  rm unord map 0.00682480  2955.679688
#> 45: 16384            cv 0.00603525  2773.851562
#> 46: 16384  rm unord map 0.01149465  5835.679688
#> 47: 32768            cv 0.00980750  5525.851562
#> 48: 65536            cv 0.02045715 11029.851562
#>         N     expr.name     median    kilobytes
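
In the table above, each expression stops at a different largest N. This is consistent with a time limit (the seconds.limit element used in the plotting code earlier): once the median time of an expression exceeds the limit, that expression is not timed at larger sizes. The sketch below mimics this stopping rule in base R. It assumes a limit of 0.01 seconds, which we believe is atime's default, and it uses hypothetical cost models in place of real timings.

```r
# Assumed time limit in seconds (believed to be the atime default).
seconds.limit <- 0.01
# Hypothetical cost models giving seconds as a function of N,
# standing in for measured median timings.
cost.list <- list(
  linear=function(N) 3e-7 * N,
  quadratic=function(N) 3e-8 * N^2)
N.values <- 2^seq(2, 20)
# For each cost model, increase N until the time exceeds the limit.
largest.N <- sapply(cost.list, function(cost){
  done.N <- NA_real_
  for(N in N.values){
    done.N <- N # this size is still measured,
    if(cost(N) > seconds.limit) break # but larger sizes are skipped.
  }
  done.N
})
print(largest.N)
```

Under this rule a slower or faster-growing expression stops at a smaller N, so the largest N reached is itself a quick indicator of relative speed.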

The results above show that timings were computed for the three different versions of the binsegRcpp code, along with the changepoint code. These data can be plotted via the default method as in the code below,

refs.best <- atime::references_best(atime.list)
plot(refs.best)

[Figure: output of plot(refs.best), comparing the three binsegRcpp versions and changepoint.]

Cleanup

Below we remove the installed packages, in order to avoid CRAN warnings:

atime::atime_versions_remove("binsegRcpp")
#> [1] 0
options(old.opt)
