abess: Fast Best-Subset Selection in Python and R

Python Build R Build codecov docs R docs cran pypi pyversions License Codacy Badge CodeFactor

abess (Adaptive BEst Subset Selection) library aims to solve the general best subset selection problem, i.e., find a small subset of predictors such that the resulting model is expected to have the highest accuracy. The selection for best subset shows great value in scientific researches and practical application. For example, clinicians wants to know whether a patient is health or not based on the expression level of a few of important genes.

This library implements a generic algorithm framework to find the optimal solution in an extremely fast way. This framework now supports the detection of best subset under: linear regression, classification (binary or multi-class), counting-response modeling, censored-response modeling, multi-response modeling (multi-tasks learning), etc. It also supports the variants of best subset selection like group best subset selection, nuisance penalized regression, especially, the time complexity of the best (group) subset selection for linear regression is certifiably polynomial.

Installation

To install the abess R package from CRAN, just run:

install.packages("abess")

Alternative, you can install the newest version of abess by following this instruction.

Runtime Performance

To show the power of abess in computation, we assess its timings of the CPU execution (seconds) on synthetic datasets, and compare to state-of-the-art variable selection methods. The variable selection and estimation results are deferred to performance. All computations are conducted on a Ubuntu platform with Intel(R) Core(TM) i9-9940X CPU @ 3.30GHz and 48 RAM. We compare abess R package with three widely used R packages: glmnet, ncvreg, and L0Learn. We get the runtime comparison results:

Compared with the other packages, abess shows competitive computational efficiency, and achieves the best computational power when variables have a large correlation.

Conducting the following command in shell can reproduce the above results in R:

$ Rscript abess/docs/simulation/R/timings.R

Open source software

abess is a free software and its source code are publicly available in Github. The core framework is programmed in C++. You can redistribute it and/or modify it under the terms of the GPL-v3 License. We welcome contributions for abess, especially stretching abess to the other best subset selection problems.

What’s news?

New features supported by the latest version (0.4.5):

Citation

If you use abess or refer to our tutorials in a presentation or publication, we would appreciate citations of our library. < Jin Zhu, Liyuan Hu, Junhao Huang, Kangkang Jiang, Yanhang Zhang, Shiyun Lin, Junxian Zhu, Xueqin Wang (2021). “abess: A Fast Best Subset Selection Library in Python and R.” arXiv:2110.09697.>

The corresponding BibteX entry:

@article{zhu-abess-arxiv,
  author    = {Jin Zhu and Liyuan Hu and Junhao Huang and Kangkang Jiang and Yanhang Zhang and Shiyun Lin and Junxian Zhu and Xueqin Wang},
  title     = {abess: A Fast Best Subset Selection Library in Python and R},
  journal   = {arXiv:2110.09697},
  year      = {2021},
}

References