atime_versions function
In this vignette we show how to compare asymptotic timings of an R expression that uses different versions of a package. Let us begin by cloning the binsegRcpp package:
old.opt <- options(width=100)
tdir <- tempfile()
dir.create(tdir)
git2r::clone("https://github.com/tdhock/binsegRcpp", tdir)
#> cloning into 'C:\Users\th798\AppData\Local\Temp\RtmpAhGltV\file3cc46e267506'...
#> Receiving objects: 1% (13/1258), 63 kb
#> Receiving objects: 11% (139/1258), 63 kb
#> Receiving objects: 21% (265/1258), 127 kb
#> Receiving objects: 31% (390/1258), 127 kb
#> Receiving objects: 41% (516/1258), 183 kb
#> Receiving objects: 51% (642/1258), 183 kb
#> Receiving objects: 61% (768/1258), 183 kb
#> Receiving objects: 71% (894/1258), 239 kb
#> Receiving objects: 81% (1019/1258), 239 kb
#> Receiving objects: 91% (1145/1258), 239 kb
#> Receiving objects: 100% (1258/1258), 251 kb, done.
#> Local: master C:/Users/th798/AppData/Local/Temp/RtmpAhGltV/file3cc46e267506
#> Remote: master @ origin (https://github.com/tdhock/binsegRcpp)
#> Head: [977f385] 2022-08-24: rm rcppdeepstate yaml action
Next, we define a helper function, run.atime.versions, that runs atime_versions, which is a simple way to compare different GitHub versions of a function:
run.atime.versions <- function(TDIR){
  atime::atime_versions(
    pkg.path=TDIR,    # local clone of the binsegRcpp git repository
    N=2^seq(2, 20),   # data sizes
    setup={           # run for each data size N
      max.segs <- as.integer(N/2)
      data.vec <- 1:N
    },
    expr=binsegRcpp::binseg_normal(data.vec, max.segs),
    ## name=SHA pairs, one for each package version to compare
    cv="908b77c411bc7f4fcbcf53759245e738ae724c3e",
    "rm unord map"="dcd0808f52b0b9858352106cc7852e36d7f5b15d",
    "mvl_construct"="5942af606641428315b0e63c7da331c4cd44c091")
}
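Each SHA given to atime_versions is installed as a separate package named Package.SHA, and expr is rewritten to call that package instead of binsegRcpp. To make this rewrite concrete, here is a minimal base R sketch that walks a call and replaces the package name in every pkg::fun or pkg:::fun sub-expression. The helper rename_pkg_prefix() is hypothetical and written only for illustration; it is not atime's internal implementation.

```r
## Hypothetical helper (not part of atime): replace old.pkg by new.pkg in
## every pkg::fun or pkg:::fun sub-expression of a call.
rename_pkg_prefix <- function(expr, old.pkg, new.pkg){
  if(!is.call(expr)) return(expr)
  op <- expr[[1]]
  if(identical(op, as.name("::")) || identical(op, as.name(":::"))){
    if(identical(expr[[2]], as.name(old.pkg))) expr[[2]] <- as.name(new.pkg)
    return(expr)
  }
  for(i in seq_along(expr)){
    ## Only recurse into sub-calls; symbols and constants are kept as-is.
    if(is.call(expr[[i]])){
      expr[[i]] <- rename_pkg_prefix(expr[[i]], old.pkg, new.pkg)
    }
  }
  expr
}
sha <- "908b77c411bc7f4fcbcf53759245e738ae724c3e"
new.expr <- rename_pkg_prefix(
  quote(binsegRcpp::binseg_normal(data.vec, max.segs)),
  "binsegRcpp", paste0("binsegRcpp.", sha))
print(new.expr)
```

The arguments data.vec and max.segs are left untouched, which is why the setup code can define them once for all versions.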
Here is an explanation of the arguments passed to atime_versions above:
- pkg.path is the path to a local clone of the git repository containing the R package,
- N is a numeric vector of data sizes,
- setup is an R expression which is run to create data for each size N,
- expr is an R expression which is timed for each package version,
- the remaining named arguments (cv, "rm unord map" and mvl_construct) are git commit SHAs, one for each version to compare; each name is used as the expr.name label for that version in the results.

Under the hood, a different R package is created for each package version, with a package name of the form Package.SHA, such as binsegRcpp.908b77c411bc7f4fcbcf53759245e738ae724c3e. For this reason, expr must contain a double or triple colon package name prefix, like binsegRcpp::binseg_normal above, which is translated into several version-specific expressions, like binsegRcpp.908b77c411bc7f4fcbcf53759245e738ae724c3e::binseg_normal.

Note that in your own code you do not have to create a helper function like run.atime.versions. We do it in this vignette in order to run the different versions of the code in a separate R process, using callr::r. This lets us safely remove the installed packages after running the example code, which avoids CRAN warnings about unexpected files found in the package check directory. In the code block below we compute the timings:
atime.ver.list <- if(requireNamespace("callr")){
  requireNamespace("atime")
  callr::r(run.atime.versions, list(tdir))
}else{
  run.atime.versions(tdir)
}
#> Loading required namespace: callr
names(atime.ver.list$measurements)
#> [1] "N" "expr.name" "min" "median" "itr/sec" "gc/sec" "n_itr" "n_gc"
#> [9] "result" "memory" "time" "gc" "kilobytes" "q25" "q75" "max"
#> [17] "mean" "sd"
atime.ver.list$measurements[, .(N, expr.name, min, median, max, kilobytes)]
#> N expr.name min median max kilobytes
#> <num> <char> <num> <num> <num> <num>
#> 1: 4 cv 0.0002638 0.00026955 0.0003486 9286.60938
#> 2: 4 rm unord map 0.0010513 0.00108270 0.0012737 2673.42969
#> 3: 4 mvl_construct 0.0007333 0.00076910 0.0032837 278.91406
#> 4: 8 cv 0.0002699 0.00027380 0.0003603 18.74219
#> 5: 8 rm unord map 0.0010847 0.00117040 0.0054281 70.00781
#> 6: 8 mvl_construct 0.0007157 0.00078580 0.0009568 67.48438
#> 7: 16 cv 0.0002705 0.00027460 0.0003687 18.92188
#> 8: 16 rm unord map 0.0010623 0.00111985 0.0012998 70.18750
#> 9: 16 mvl_construct 0.0007556 0.00081700 0.0010086 67.66406
#> 10: 32 cv 0.0002766 0.00030655 0.0004173 19.64062
#> 11: 32 rm unord map 0.0011544 0.00123000 0.0038467 72.82031
#> 12: 32 mvl_construct 0.0008006 0.00083545 0.0010113 70.29688
#> 13: 64 cv 0.0002944 0.00030205 0.0004217 25.38281
#> 14: 64 rm unord map 0.0011087 0.00116505 0.0015315 88.72656
#> 15: 64 mvl_construct 0.0008715 0.00093930 0.0044220 86.20312
#> 16: 128 cv 0.0003028 0.00030735 0.0004363 43.35156
#> 17: 128 rm unord map 0.0011066 0.00114855 0.0014970 120.67969
#> 18: 128 mvl_construct 0.0010173 0.00107275 0.0012461 117.26562
#> 19: 256 cv 0.0003259 0.00033780 0.0004260 64.85156
#> 20: 256 rm unord map 0.0011413 0.00118275 0.0013411 165.67969
#> 21: 256 mvl_construct 0.0013446 0.00141745 0.0015483 161.51562
#> 22: 512 cv 0.0003978 0.00041155 0.0004810 107.85156
#> 23: 512 rm unord map 0.0012517 0.00137185 0.0016203 255.67969
#> 24: 512 mvl_construct 0.0020066 0.00211960 0.0023550 250.01562
#> 25: 1024 cv 0.0004937 0.00051650 0.0007122 193.85156
#> 26: 1024 rm unord map 0.0014681 0.00171030 0.0046410 435.67969
#> 27: 1024 mvl_construct 0.0034753 0.00370460 0.0039303 427.01562
#> 28: 2048 cv 0.0006712 0.00070250 0.0007898 365.85156
#> 29: 2048 rm unord map 0.0019627 0.00223720 0.0028095 795.67969
#> 30: 2048 mvl_construct 0.0067450 0.00687395 0.0089058 781.01562
#> 31: 4096 cv 0.0011736 0.00119435 0.0013280 709.85156
#> 32: 4096 rm unord map 0.0033274 0.00381530 0.0048789 1515.67969
#> 33: 4096 mvl_construct 0.0134129 0.01366050 0.0156015 1489.01562
#> 34: 8192 cv 0.0022507 0.00248565 0.0027683 1397.85156
#> 35: 8192 rm unord map 0.0066052 0.00673415 0.0124579 2955.67969
#> 36: 16384 cv 0.0043061 0.00496710 0.0115577 2773.85156
#> 37: 16384 rm unord map 0.0111116 0.01150120 0.0167759 5835.67969
#> 38: 32768 cv 0.0098131 0.01013680 0.0144229 5525.85156
#> N expr.name min median max kilobytes
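Asymptotic growth can be summarized as the slope of log(median time) versus log(N): a slope near 1 suggests roughly linear growth over the sizes considered, and a clearly larger slope suggests faster growth. As a rough sketch, the base R code below copies the four largest median timings of each version from the table above and fits that slope with lm(). This is an informal check of our own, not a computation performed by atime.

```r
## Median timings (seconds) copied from the four largest sizes of each
## version in the measurements table above.
med <- data.frame(
  expr.name=rep(c("cv", "rm unord map", "mvl_construct"), each=4),
  N=c(2^(12:15), 2^(11:14), 2^(9:12)),
  median=c(0.00119435, 0.00248565, 0.00496710, 0.01013680,
           0.00223720, 0.00381530, 0.00673415, 0.01150120,
           0.00211960, 0.00370460, 0.00687395, 0.01366050))
## Slope of log(median) versus log(N), fitted separately for each version.
slopes <- sapply(split(med, med$expr.name), function(d){
  unname(coef(lm(log(median) ~ log(N), data=d))[2])
})
print(round(slopes, 2))
```

With only four noisy timings per version, these slopes are best read as a rough summary of the measured range, not as a statement about the asymptotic complexity of each version.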
The result is a list whose measurements element is a data table containing time measurements in seconds (min, median, max) and memory usage (kilobytes) for each version (expr.name) and each data size (N) that was measured. A more convenient version of the data for plotting can be obtained via the code below:
best.ver.list <- atime::references_best(atime.ver.list)
names(best.ver.list$measurements)
#> [1] "unit" "N" "expr.name" "min" "median" "itr/sec" "gc/sec"
#> [8] "n_itr" "n_gc" "result" "memory" "time" "gc" "kilobytes"
#> [15] "q25" "q75" "max" "mean" "sd" "fun.name" "fun.latex"
#> [22] "expr.class" "expr.latex" "empirical"
best.ver.list$measurements[, .(N, expr.name, unit, empirical)]
#> N expr.name unit empirical
#> <num> <char> <char> <num>
#> 1: 4 cv kilobytes 9.286609e+03
#> 2: 8 cv kilobytes 1.874219e+01
#> 3: 16 cv kilobytes 1.892188e+01
#> 4: 32 cv kilobytes 1.964063e+01
#> 5: 64 cv kilobytes 2.538281e+01
#> 6: 128 cv kilobytes 4.335156e+01
#> 7: 256 cv kilobytes 6.485156e+01
#> 8: 512 cv kilobytes 1.078516e+02
#> 9: 1024 cv kilobytes 1.938516e+02
#> 10: 2048 cv kilobytes 3.658516e+02
#> 11: 4096 cv kilobytes 7.098516e+02
#> 12: 8192 cv kilobytes 1.397852e+03
#> 13: 16384 cv kilobytes 2.773852e+03
#> 14: 32768 cv kilobytes 5.525852e+03
#> 15: 4 rm unord map kilobytes 2.673430e+03
#> 16: 8 rm unord map kilobytes 7.000781e+01
#> 17: 16 rm unord map kilobytes 7.018750e+01
#> 18: 32 rm unord map kilobytes 7.282031e+01
#> 19: 64 rm unord map kilobytes 8.872656e+01
#> 20: 128 rm unord map kilobytes 1.206797e+02
#> 21: 256 rm unord map kilobytes 1.656797e+02
#> 22: 512 rm unord map kilobytes 2.556797e+02
#> 23: 1024 rm unord map kilobytes 4.356797e+02
#> 24: 2048 rm unord map kilobytes 7.956797e+02
#> 25: 4096 rm unord map kilobytes 1.515680e+03
#> 26: 8192 rm unord map kilobytes 2.955680e+03
#> 27: 16384 rm unord map kilobytes 5.835680e+03
#> 28: 4 mvl_construct kilobytes 2.789141e+02
#> 29: 8 mvl_construct kilobytes 6.748437e+01
#> 30: 16 mvl_construct kilobytes 6.766406e+01
#> 31: 32 mvl_construct kilobytes 7.029687e+01
#> 32: 64 mvl_construct kilobytes 8.620313e+01
#> 33: 128 mvl_construct kilobytes 1.172656e+02
#> 34: 256 mvl_construct kilobytes 1.615156e+02
#> 35: 512 mvl_construct kilobytes 2.500156e+02
#> 36: 1024 mvl_construct kilobytes 4.270156e+02
#> 37: 2048 mvl_construct kilobytes 7.810156e+02
#> 38: 4096 mvl_construct kilobytes 1.489016e+03
#> 39: 4 cv seconds 2.695500e-04
#> 40: 8 cv seconds 2.738000e-04
#> 41: 16 cv seconds 2.746000e-04
#> 42: 32 cv seconds 3.065500e-04
#> 43: 64 cv seconds 3.020500e-04
#> 44: 128 cv seconds 3.073500e-04
#> 45: 256 cv seconds 3.378000e-04
#> 46: 512 cv seconds 4.115500e-04
#> 47: 1024 cv seconds 5.165000e-04
#> 48: 2048 cv seconds 7.025000e-04
#> 49: 4096 cv seconds 1.194350e-03
#> 50: 8192 cv seconds 2.485650e-03
#> 51: 16384 cv seconds 4.967100e-03
#> 52: 32768 cv seconds 1.013680e-02
#> 53: 4 rm unord map seconds 1.082700e-03
#> 54: 8 rm unord map seconds 1.170400e-03
#> 55: 16 rm unord map seconds 1.119850e-03
#> 56: 32 rm unord map seconds 1.230000e-03
#> 57: 64 rm unord map seconds 1.165050e-03
#> 58: 128 rm unord map seconds 1.148550e-03
#> 59: 256 rm unord map seconds 1.182750e-03
#> 60: 512 rm unord map seconds 1.371850e-03
#> 61: 1024 rm unord map seconds 1.710300e-03
#> 62: 2048 rm unord map seconds 2.237200e-03
#> 63: 4096 rm unord map seconds 3.815300e-03
#> 64: 8192 rm unord map seconds 6.734150e-03
#> 65: 16384 rm unord map seconds 1.150120e-02
#> 66: 4 mvl_construct seconds 7.691000e-04
#> 67: 8 mvl_construct seconds 7.858000e-04
#> 68: 16 mvl_construct seconds 8.170000e-04
#> 69: 32 mvl_construct seconds 8.354500e-04
#> 70: 64 mvl_construct seconds 9.393000e-04
#> 71: 128 mvl_construct seconds 1.072750e-03
#> 72: 256 mvl_construct seconds 1.417450e-03
#> 73: 512 mvl_construct seconds 2.119600e-03
#> 74: 1024 mvl_construct seconds 3.704600e-03
#> 75: 2048 mvl_construct seconds 6.873950e-03
#> 76: 4096 mvl_construct seconds 1.366050e-02
#> N expr.name unit empirical
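The data table above is a tall/long version of the same data, with one row per version, data size and unit (seconds or kilobytes), and the corresponding value in the empirical column. It can be plotted using the code below: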
if(require(ggplot2)){
  hline.df <- with(atime.ver.list, data.frame(seconds.limit, unit="seconds"))
  gg <- ggplot()+
    theme_bw()+
    facet_grid(unit ~ ., scales="free")+
    geom_hline(aes(
      yintercept=seconds.limit),
      color="grey",
      data=hline.df)+
    geom_line(aes(
      N, empirical, color=expr.name),
      data=best.ver.list$measurements)+
    geom_ribbon(aes(
      N, ymin=min, ymax=max, fill=expr.name),
      data=best.ver.list$measurements[unit=="seconds"],
      alpha=0.5)+
    scale_x_log10()+
    scale_y_log10("median line, min/max band")
  if(require(directlabels)){
    gg+
      directlabels::geom_dl(aes(
        N, empirical, color=expr.name, label=expr.name),
        method="right.polygons",
        data=best.ver.list$measurements)+
      theme(legend.position="none")+
      coord_cartesian(xlim=c(1,2e7))
  }else{
    gg
  }
}
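In the tall/long table used for plotting, each version and data size appears once per unit, and the empirical column holds the median for seconds and the kilobytes value for memory, as the printed tables show. The following base R sketch reproduces that reshaping for two rows of the cv version, with values copied from the tables above. It only illustrates the layout; the actual references_best output contains additional columns.

```r
## Wide layout: one row per version and data size (values copied from above).
wide <- data.frame(
  N=c(4, 8),
  expr.name="cv",
  median=c(0.00026955, 0.00027380),
  kilobytes=c(9286.60938, 18.74219))
id.cols <- wide[, c("N", "expr.name")]
## Tall layout: one row per version, data size and unit.
tall <- rbind(
  data.frame(id.cols, unit="kilobytes", empirical=wide$kilobytes),
  data.frame(id.cols, unit="seconds", empirical=wide$median))
print(tall)
```

The unit column is what facet_grid(unit ~ .) uses in the plot code to draw separate panels for time and memory.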
atime_versions_exprs with atime
What if you want to compare different versions of one R package to another R package? Continuing the example above, we can get a list of expressions, one for each version of the package, via the code below:
(ver.list <- atime::atime_versions_exprs(
  pkg.path=tdir,
  expr=binsegRcpp::binseg_normal(data.vec, max.segs),
  cv="908b77c411bc7f4fcbcf53759245e738ae724c3e",
  "rm unord map"="dcd0808f52b0b9858352106cc7852e36d7f5b15d",
  "mvl_construct"="5942af606641428315b0e63c7da331c4cd44c091"))
#> $cv
#> binsegRcpp.908b77c411bc7f4fcbcf53759245e738ae724c3e::binseg_normal(data.vec,
#> max.segs)
#>
#> $`rm unord map`
#> binsegRcpp.dcd0808f52b0b9858352106cc7852e36d7f5b15d::binseg_normal(data.vec,
#> max.segs)
#>
#> $mvl_construct
#> binsegRcpp.5942af606641428315b0e63c7da331c4cd44c091::binseg_normal(data.vec,
#> max.segs)
The ver.list created above can be augmented with other expressions, such as the following alternative implementation of binary segmentation from the changepoint package:
expr.list <- c(ver.list, if(requireNamespace("changepoint")){
  list(changepoint=substitute(changepoint::cpt.mean(
    data.vec, penalty="Manual", pen.value=0, method="BinSeg",
    Q=max.segs-1)))
})
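The if() in the code above makes the changepoint expression optional: requireNamespace() returns FALSE when changepoint is not installed, and in that case expr.list is simply ver.list. The base R sketch below shows both cases of this c() idiom, using placeholder calls f(x), g(x) and h(x) that are hypothetical and never evaluated.

```r
## Stand-in list of quoted calls; f, g and h are placeholders, never evaluated.
demo.list <- list(a=quote(f(x)), b=quote(g(x)))
## if() without else gives NULL when its condition is FALSE, and c() drops
## NULL, so the list comes back unchanged.
unchanged <- c(demo.list, if(FALSE) list(extra=quote(h(x))))
## When the condition is TRUE, the extra named expression is appended.
extended <- c(demo.list, if(TRUE) list(extra=quote(h(x))))
print(names(extended))
```

The list names become the expr.name labels, which is why the changepoint entry is given a name.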
The expr.list created above can be provided as an argument to the atime function, as in the code below:
run.atime <- function(ELIST){
  atime::atime(
    N=2^seq(2, 20),
    setup={
      max.segs <- as.integer(N/2)
      data.vec <- 1:N
    },
    expr.list=ELIST)
}
atime.list <- if(requireNamespace("callr")){
  requireNamespace("atime")
  callr::r(run.atime, list(expr.list))
}else{
  run.atime(expr.list)
}
atime.list$measurements[, .(N, expr.name, median, kilobytes)]
#> N expr.name median kilobytes
#> <num> <char> <num> <num>
#> 1: 4 cv 0.00046950 9318.882812
#> 2: 4 rm unord map 0.00109700 2783.210938
#> 3: 4 mvl_construct 0.00077755 278.726562
#> 4: 4 changepoint 0.00037815 6462.851562
#> 5: 8 cv 0.00031745 18.742188
#> 6: 8 rm unord map 0.00112580 70.007812
#> 7: 8 mvl_construct 0.00080380 67.484375
#> 8: 8 changepoint 0.00044615 33.375000
#> 9: 16 cv 0.00030985 18.921875
#> 10: 16 rm unord map 0.00106800 70.187500
#> 11: 16 mvl_construct 0.00079845 67.664062
#> 12: 16 changepoint 0.00047945 3.523438
#> 13: 32 cv 0.00033295 19.640625
#> 14: 32 rm unord map 0.00117055 72.820312
#> 15: 32 mvl_construct 0.00081740 70.296875
#> 16: 32 changepoint 0.00055265 10.750000
#> 17: 64 cv 0.00029355 25.382812
#> 18: 64 rm unord map 0.00111875 88.726562
#> 19: 64 mvl_construct 0.00092890 86.203125
#> 20: 64 changepoint 0.00075975 52.734375
#> 21: 128 cv 0.00030855 43.351562
#> 22: 128 rm unord map 0.00121375 120.679688
#> 23: 128 mvl_construct 0.00108045 117.265625
#> 24: 128 changepoint 0.00114570 195.000000
#> 25: 256 cv 0.00033220 64.851562
#> 26: 256 rm unord map 0.00129940 165.679688
#> 27: 256 mvl_construct 0.00145150 161.515625
#> 28: 256 changepoint 0.00217260 704.359375
#> 29: 512 cv 0.00038840 107.851562
#> 30: 512 rm unord map 0.00128300 255.679688
#> 31: 512 mvl_construct 0.00205390 250.015625
#> 32: 512 changepoint 0.00653580 2633.656250
#> 33: 1024 cv 0.00050290 193.851562
#> 34: 1024 rm unord map 0.00150930 435.679688
#> 35: 1024 mvl_construct 0.00360890 429.195312
#> 36: 1024 changepoint 0.02989700 10144.984375
#> 37: 2048 cv 0.00081060 365.851562
#> 38: 2048 rm unord map 0.00193535 795.679688
#> 39: 2048 mvl_construct 0.00649460 781.015625
#> 40: 4096 cv 0.00123470 709.851562
#> 41: 4096 rm unord map 0.00373590 1515.679688
#> 42: 4096 mvl_construct 0.01372335 1489.015625
#> 43: 8192 cv 0.00240960 1397.851562
#> 44: 8192 rm unord map 0.00682480 2955.679688
#> 45: 16384 cv 0.00603525 2773.851562
#> 46: 16384 rm unord map 0.01149465 5835.679688
#> 47: 32768 cv 0.00980750 5525.851562
#> 48: 65536 cv 0.02045715 11029.851562
#> N expr.name median kilobytes
The results above show that timings were computed for the three different versions of the binsegRcpp code, along with the changepoint code. These data can be plotted with the default plot method, as in the code below:
refs.best <- atime::references_best(atime.list)
plot(refs.best)
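In both runs N goes up to 2^20, yet each expression was measured only up to a certain size. In the tables above, the last size measured for each expression is the first one whose median time exceeds 0.01 seconds, which suggests a time limit of 0.01 seconds (the seconds.limit drawn as a grey line in the earlier plot). The following base R sketch, assuming that stopping rule, shows how a loop over increasing N could stop. The function time_until_limit() is hypothetical, and a made-up linear cost model stands in for real timings so that the result is deterministic.

```r
## Hypothetical helper: evaluate time.fun(N) for increasing N, and stop after
## the first size whose median time exceeds seconds.limit.
time_until_limit <- function(N.vec, time.fun, seconds.limit=0.01){
  result <- data.frame(N=numeric(0), median=numeric(0))
  for(N in N.vec){
    med <- time.fun(N)
    result <- rbind(result, data.frame(N=N, median=med))
    if(med > seconds.limit) break
  }
  result
}
## Made-up cost model standing in for real timings: 1 microsecond per data point.
fake.median <- function(N) N * 1e-6
limited <- time_until_limit(2^seq(2, 20), fake.median)
print(tail(limited, 2))
```

With real code, time.fun would have to run the expression several times and return a median; this sketch does not attempt to reproduce how atime measures time and memory.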
Below we remove the installed packages, in order to avoid CRAN warnings:
atime::atime_versions_remove("binsegRcpp")
#> [1] 0
options(old.opt)