The precrec package provides accurate computations of ROC and Precision-Recall curves. The evalmod function calculates ROC and Precision-Recall curves and returns an S3 object.
library(precrec)
# Load a test dataset
data(P10N10)
# Calculate ROC and Precision-Recall curves
sscurves <- evalmod(scores = P10N10$scores, labels = P10N10$labels)
The R language specifies S3 objects and S3 generic functions as part of the most basic object-oriented system in R. The precrec package provides nine S3 generics for the S3 object created by the evalmod function.
S3 generic | Package | Description |
---|---|---|
print | base | Print the calculation results and the summary of the test data |
as.data.frame | base | Convert a precrec object to a data frame |
plot | graphics | Plot performance evaluation measures |
autoplot | ggplot2 | Plot performance evaluation measures with ggplot2 |
fortify | ggplot2 | Prepare a data frame for ggplot2 |
auc | precrec | Make a data frame with AUC scores |
part | precrec | Set partial curves and calculate AUC scores |
pauc | precrec | Make a data frame with pAUC scores |
auc_ci | precrec | Calculate confidence intervals of AUC scores |
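For instance, the print and fortify generics from the table above can be applied directly to the object returned by evalmod. The following is a minimal sketch; the object name sscurves.fort is only illustrative, and the exact columns of the fortified data frame may vary between versions.

# Print the calculation results and the summary of the test data
sscurves

# Prepare a data frame for ggplot2 (requires the ggplot2 package)
sscurves.fort <- ggplot2::fortify(sscurves)
head(sscurves.fort)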
The plot function outputs ROC and Precision-Recall curves.
# Show ROC and Precision-Recall plots
plot(sscurves)
# Show a Precision-Recall plot
plot(sscurves, "PRC")
The autoplot function outputs ROC and Precision-Recall curves by using the ggplot2 package.
# The ggplot2 package is required
library(ggplot2)
# Show ROC and Precision-Recall plots
autoplot(sscurves)
# Show a Precision-Recall plot
autoplot(sscurves, "PRC")
Reducing the number of supporting points makes plotting faster for large datasets.
# 5 data sets with 50000 positives and 50000 negatives
samp1 <- create_sim_samples(5, 50000, 50000)

# Calculate curves
eval1 <- evalmod(scores = samp1$scores, labels = samp1$labels)
# Reduced supporting points
system.time(autoplot(eval1))
##    user  system elapsed
##   0.456   0.069   0.587

# Full supporting points
system.time(autoplot(eval1, reduce_points = FALSE))
##    user  system elapsed
##   9.631   0.787  10.480
The auc function outputs a data frame with the AUC (Area Under the Curve) scores.
# Get a data frame with AUC scores
aucs <- auc(sscurves)

# Use knitr::kable to display the result in a table format
knitr::kable(aucs)
modnames | dsids | curvetypes | aucs |
---|---|---|---|
m1 | 1 | ROC | 0.7200000 |
m1 | 1 | PRC | 0.7397716 |
# Get AUCs of Precision-Recall
aucs_prc <- subset(aucs, curvetypes == "PRC")
knitr::kable(aucs_prc)
 | modnames | dsids | curvetypes | aucs |
---|---|---|---|---|
2 | m1 | 1 | PRC | 0.7397716 |
The as.data.frame function converts a precrec object to a data frame.
# Convert sscurves to a data frame
sscurves.df <- as.data.frame(sscurves)

# Use knitr::kable to display the result in a table format
knitr::kable(head(sscurves.df))
x | y | modname | dsid | type |
---|---|---|---|---|
0.000 | 0.0 | m1 | 1 | ROC |
0.000 | 0.1 | m1 | 1 | ROC |
0.000 | 0.2 | m1 | 1 | ROC |
0.001 | 0.2 | m1 | 1 | ROC |
0.002 | 0.2 | m1 | 1 | ROC |
0.003 | 0.2 | m1 | 1 | ROC |
The precrec package provides four functions for data preparation.
Function | Description |
---|---|
join_scores | Join scores of multiple models into a list |
join_labels | Join observed labels of multiple test datasets into a list |
mmdata | Reformat input data for performance evaluation calculation |
create_sim_samples | Create random samples for simulations |
The join_scores function combines multiple score datasets.
s1 <- c(1, 2, 3, 4)
s2 <- c(5, 6, 7, 8)
s3 <- matrix(1:8, 4, 2)

# Join two score vectors
scores1 <- join_scores(s1, s2)

# Join two vectors and a matrix
scores2 <- join_scores(s1, s2, s3)
The join_labels function combines multiple label datasets.
l1 <- c(1, 0, 1, 1)
l2 <- c(1, 0, 1, 1)
l3 <- c(1, 0, 1, 0)

# Join two label vectors
labels1 <- join_labels(l1, l2)
labels2 <- join_labels(l1, l3)
The mmdata function makes an input dataset for the evalmod function.
# Create an input dataset with two score vectors and one label vector
msmdat <- mmdata(scores1, labels1)

# Specify dataset IDs
smmdat <- mmdata(scores1, labels2, dsids = c(1, 2))

# Specify model names and dataset IDs
mmmdat <- mmdata(scores1, labels2, modnames = c("mod1", "mod2"), dsids = c(1, 2))
The create_sim_samples function is useful for making a random sample dataset with different performance levels.
Level name | Description |
---|---|
random | Random |
poor_er | Poor early retrieval |
good_er | Good early retrieval |
excel | Excellent |
perf | Perfect |
all | All of the above |
# A dataset with 10 positives and 10 negatives for the random performance level
samps1 <- create_sim_samples(1, 10, 10, "random")

# A dataset for five different performance levels
samps2 <- create_sim_samples(1, 10, 10, "all")

# A dataset with 20 samples for the good early retrieval performance level
samps3 <- create_sim_samples(20, 10, 10, "good_er")

# A dataset with 20 samples for five different performance levels
samps4 <- create_sim_samples(20, 10, 10, "all")
The evalmod function calculates performance evaluation for multiple models when multiple model names are specified with the mmdata or the evalmod function.
There are several ways to create a dataset with the mmdata function for multiple models.
# Use a list with multiple score vectors and a list with a single label vector
msmdat1 <- mmdata(scores1, labels1)

# Explicitly specify model names
msmdat2 <- mmdata(scores1, labels1, modnames = c("mod1", "mod2"))

# Use a sample dataset created by the create_sim_samples function
msmdat3 <- mmdata(samps2[["scores"]], samps2[["labels"]], modnames = samps2[["modnames"]])
The evalmod function automatically detects multiple models.
# Calculate ROC and Precision-Recall curves for multiple models
mscurves <- evalmod(msmdat3)
All the S3 generics are effective for the S3 object generated by this approach.
# Show ROC and Precision-Recall curves with the ggplot2 package
autoplot(mscurves)
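As another example of the generics listed above, auc returns a data frame with AUC scores for every model in the object; a minimal sketch:

# Get ROC and Precision-Recall AUCs for all models and display them in a table
knitr::kable(auc(mscurves))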
The as.data.frame function also works with this object.
# Convert mscurves to a data frame
mscurves.df <- as.data.frame(mscurves)

# Use knitr::kable to display the result in a table format
knitr::kable(head(mscurves.df))
x | y | modname | dsid | type |
---|---|---|---|---|
0.000 | 0.0 | random | 1 | ROC |
0.000 | 0.1 | random | 1 | ROC |
0.001 | 0.1 | random | 1 | ROC |
0.002 | 0.1 | random | 1 | ROC |
0.003 | 0.1 | random | 1 | ROC |
0.004 | 0.1 | random | 1 | ROC |
The evalmod function calculates performance evaluation for multiple test datasets when different test dataset IDs are specified with the mmdata or the evalmod function.
There are several ways to create a dataset with the mmdata function for multiple test datasets.
# Specify test dataset IDs
smmdat1 <- mmdata(scores1, labels2, dsids = c(1, 2))

# Use a sample dataset created by the create_sim_samples function
smmdat2 <- mmdata(samps3[["scores"]], samps3[["labels"]], dsids = samps3[["dsids"]])
The evalmod function automatically detects multiple test datasets.
# Calculate curves for multiple test datasets and keep all the curves
smcurves <- evalmod(smmdat2, raw_curves = TRUE)
All the S3 generics are effective for the S3 object generated by this approach.
# Show an average Precision-Recall curve with the 95% confidence bounds
autoplot(smcurves, "PRC", show_cb = TRUE)
# Show raw Precision-Recall curves
autoplot(smcurves, "PRC", show_cb = FALSE)
The as.data.frame function also works with this object.
# Convert smcurves to a data frame
smcurves.df <- as.data.frame(smcurves)

# Use knitr::kable to display the result in a table format
knitr::kable(head(smcurves.df))
x | y | modname | dsid | type |
---|---|---|---|---|
0.000 | 0.0 | m1 | 1 | ROC |
0.000 | 0.1 | m1 | 1 | ROC |
0.000 | 0.2 | m1 | 1 | ROC |
0.000 | 0.3 | m1 | 1 | ROC |
0.001 | 0.3 | m1 | 1 | ROC |
0.002 | 0.3 | m1 | 1 | ROC |
The evalmod function calculates performance evaluation for multiple models and multiple test datasets when different model names and test dataset IDs are specified with the mmdata or the evalmod function.
There are several ways to create a dataset with the mmdata function for multiple models and multiple datasets.
# Specify model names and test dataset IDs
mmmdat1 <- mmdata(scores1, labels2, modnames = c("mod1", "mod2"), dsids = c(1, 2))

# Use a sample dataset created by the create_sim_samples function
mmmdat2 <- mmdata(samps4[["scores"]], samps4[["labels"]],
                  modnames = samps4[["modnames"]], dsids = samps4[["dsids"]])
The evalmod function automatically detects multiple models and multiple test datasets.
# Calculate curves for multiple models and multiple test datasets
mmcurves <- evalmod(mmmdat2)
All the S3 generics are effective for the S3 object generated by this approach.
# Show average Precision-Recall curves
autoplot(mmcurves, "PRC")
# Show average Precision-Recall curves with the 95% confidence bounds
autoplot(mmcurves, "PRC", show_cb = TRUE)
The as.data.frame function also works with this object.
# Convert mmcurves to a data frame
mmcurves.df <- as.data.frame(mmcurves)

# Use knitr::kable to display the result in a table format
knitr::kable(head(mmcurves.df))
x | y | ymin | ymax | modname | type |
---|---|---|---|---|---|
0.000 | 0.0 | 0.0000000 | 0.0000000 | random | ROC |
0.000 | 0.1 | 0.0449298 | 0.1550702 | random | ROC |
0.001 | 0.1 | 0.0449298 | 0.1550702 | random | ROC |
0.002 | 0.1 | 0.0449298 | 0.1550702 | random | ROC |
0.003 | 0.1 | 0.0449298 | 0.1550702 | random | ROC |
0.004 | 0.1 | 0.0449298 | 0.1550702 | random | ROC |
The evalmod function automatically calculates confidence bands when a model contains multiple test datasets in the provided data. Confidence intervals are calculated for additional supporting points, which are specified by the x_bins option of the evalmod function.
The dataset smmdat2 contains 20 samples for a single model/classifier.
# Show all curves
smcurves_all <- evalmod(smmdat2, raw_curves = TRUE)
autoplot(smcurves_all)
Additional supporting points are calculated for x = (0, 0.5, 1.0) when x_bins is set to 2.
# x_bins: 2
smcurves_xb2 <- evalmod(smmdat2, x_bins = 2)
autoplot(smcurves_xb2)
Additional supporting points are calculated for x = (0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0) when x_bins is set to 10.
# x_bins: 10
smcurves_xb10 <- evalmod(smmdat2, x_bins = 10)
autoplot(smcurves_xb10)
The evalmod function accepts the cb_alpha option to specify the alpha value of the point-wise confidence bounds calculation. For instance, 95% confidence bands are calculated when cb_alpha is 0.05.
# cb_alpha: 0.1 for 90% confidence band
smcurves_cb1 <- evalmod(smmdat2, x_bins = 10, cb_alpha = 0.1)
autoplot(smcurves_cb1)

# cb_alpha: 0.01 for 99% confidence band
smcurves_cb2 <- evalmod(smmdat2, x_bins = 10, cb_alpha = 0.01)
autoplot(smcurves_cb2)
The format_nfold function takes a data frame with score, label, and n-fold columns and converts it to a list for evalmod and mmdata.
# Load data
data(M2N50F5)
# Use knitr::kable to display the result in a table format
knitr::kable(head(M2N50F5))
score1 | score2 | label | fold |
---|---|---|---|
2.0606025 | 1.0689227 | pos | 1 |
0.3066092 | 0.1745491 | pos | 3 |
1.5597733 | -1.5666375 | pos | 1 |
-0.6044989 | 1.1572727 | pos | 3 |
-0.2229031 | 0.6070042 | pos | 5 |
-0.7679551 | -1.7908147 | pos | 5 |
# Convert data frame to list
nfold_list1 <- format_nfold(nfold_df = M2N50F5, score_cols = c(1, 2),
                            lab_col = 3, fold_col = 4)

# Use column names
nfold_list2 <- format_nfold(nfold_df = M2N50F5, score_cols = c("score1", "score2"),
                            lab_col = "label", fold_col = "fold")

# Use the result for evalmod
cvcurves <- evalmod(scores = nfold_list2$scores, labels = nfold_list2$labels,
                    modnames = rep(c("m1", "m2"), each = 5), dsids = rep(1:5, 2))
autoplot(cvcurves)
Both the evalmod and mmdata functions can directly take the arguments of the format_nfold function.
# mmdata
cvcurves2 <- mmdata(nfold_df = M2N50F5, score_cols = c(1, 2), lab_col = 3, fold_col = 4,
                    modnames = c("m1", "m2"), dsids = 1:5)

# evalmod
cvcurves3 <- evalmod(nfold_df = M2N50F5, score_cols = c(1, 2), lab_col = 3, fold_col = 4,
                     modnames = c("m1", "m2"), dsids = 1:5)
autoplot(cvcurves3)
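Because cvcurves3 contains five test datasets (folds) per model, it can also be passed to functions that require multiple datasets, such as the auc_ci function described below; a minimal sketch:

# Confidence intervals of AUCs across the five cross-validation folds
knitr::kable(auc_ci(cvcurves3))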
The evalmod function also calculates basic evaluation measures, such as error rate, accuracy, specificity, sensitivity, and precision.
Measure | Description |
---|---|
error | Error rate |
accuracy | Accuracy |
specificity | Specificity, TNR, 1 - FPR |
sensitivity | Sensitivity, TPR, Recall |
precision | Precision, PPV |
mcc | Matthews correlation coefficient |
fscore | F-score |
The mode = "basic" option makes the evalmod function calculate the basic evaluation measures instead of performing ROC and Precision-Recall calculations.
# Calculate basic evaluation measures
mmpoins <- evalmod(mmmdat2, mode = "basic")
All the S3 generics except for auc, part, and pauc are effective for the S3 object generated by this approach.
# Show normalized ranks vs. error rate and accuracy
autoplot(mmpoins, c("error", "accuracy"))
# Show normalized ranks vs. specificity, sensitivity, and precision
autoplot(mmpoins, c("specificity", "sensitivity", "precision"))
# Show normalized ranks vs. Matthews correlation coefficient and F-score
autoplot(mmpoins, c("mcc", "fscore"))
In addition to the basic measures, the autoplot function can plot normalized ranks vs. scores and labels.
# Show normalized ranks vs. scores and labels
autoplot(mmpoins, c("score", "label"))
The as.data.frame function also works for the precrec objects of the basic measures.
# Convert mmpoins to a data frame
mmpoins.df <- as.data.frame(mmpoins)

# Use knitr::kable to display the result in a table format
knitr::kable(head(mmpoins.df))
x | y | ymin | ymax | modname | type |
---|---|---|---|---|---|
0.00 | NA | NA | NA | random | score |
0.05 | 1.8692210 | 1.6771141 | 2.0613279 | random | score |
0.10 | 1.3942640 | 1.2911874 | 1.4973405 | random | score |
0.15 | 1.1506842 | 1.0636751 | 1.2376933 | random | score |
0.20 | 0.9303034 | 0.8224658 | 1.0381411 | random | score |
0.25 | 0.7329731 | 0.6130149 | 0.8529312 | random | score |
The part function calculates partial AUCs and standardized partial AUCs of both ROC and Precision-Recall curves. Standardized pAUCs (spAUCs) are standardized to the range between 0 and 1.
It requires an S3 object produced by the evalmod function and uses xlim and ylim to specify the partial area of your choice. The pauc function outputs a data frame with the pAUC scores.
# Calculate ROC and Precision-Recall curves
curves <- evalmod(scores = P10N10$scores, labels = P10N10$labels)

# Calculate partial AUCs
curves.part <- part(curves, xlim = c(0.0, 0.25))

# Retrieve a data frame of pAUCs
paucs.df <- pauc(curves.part)

# Use knitr::kable to display the result in a table format
knitr::kable(paucs.df)
modnames | dsids | curvetypes | paucs | spaucs |
---|---|---|---|---|
m1 | 1 | ROC | 0.1006250 | 0.4025000 |
m1 | 1 | PRC | 0.2345849 | 0.9383396 |
All the S3 generics are effective for the S3 object generated by this approach.
# Show ROC and Precision-Recall curves
autoplot(curves.part)
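As noted above, the ylim option can be combined with xlim to restrict the partial area in both directions. A minimal sketch with an illustrative object name (curves.part2); the resulting pAUC values will differ from the table above:

# Calculate partial AUCs for a restricted x and y range
curves.part2 <- part(curves, xlim = c(0.0, 0.25), ylim = c(0.5, 1.0))
knitr::kable(pauc(curves.part2))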
The area under the ROC curve can be calculated from the U statistic, which is the test statistic of the Mann–Whitney U test.
The evalmod function calculates AUCs with the U statistic when mode = "aucroc".
# Calculate AUC (ROC)
aucs <- evalmod(scores = P10N10$scores, labels = P10N10$labels, mode = "aucroc")

# Convert to a data frame
aucs.df <- as.data.frame(aucs)

# Use knitr::kable to display the result in a table format
knitr::kable(aucs.df)
modnames | dsids | aucs | ustats |
---|---|---|---|
m1 | 1 | 0.72 | 72 |
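The AUC reported here follows directly from the U statistic: with the 10 positives and 10 negatives of P10N10, AUC = U / (number of positive-negative pairs) = 72 / (10 × 10) = 0.72. A quick check in R:

# AUC equals the U statistic divided by the number of positive-negative pairs
aucs.df$ustats / (10 * 10)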
The auc_ci function calculates confidence intervals of the AUCs calculated by the evalmod function.
The auc_ci function calculates CIs for both ROC and Precision-Recall AUCs. The specified data must contain multiple datasets, such as cross-validation data.
# Calculate CIs of AUCs with the normal distribution
auc_ci <- auc_ci(smcurves)

# Use knitr::kable to display the result in a table format
knitr::kable(auc_ci)
modnames | curvetypes | mean | error | lower_bound | upper_bound | n |
---|---|---|---|---|---|---|
m1 | ROC | 0.7570000 | 0.0427070 | 0.7142930 | 0.7997070 | 20 |
m1 | PRC | 0.7968657 | 0.0389531 | 0.7579126 | 0.8358188 | 20 |
The auc_ci function accepts a different significance level.
# Calculate CI of AUCs with alpha = 0.01
auc_ci_a <- auc_ci(smcurves, alpha = 0.01)

# Use knitr::kable to display the result in a table format
knitr::kable(auc_ci_a)
modnames | curvetypes | mean | error | lower_bound | upper_bound | n |
---|---|---|---|---|---|---|
m1 | ROC | 0.7570000 | 0.0561265 | 0.7008735 | 0.8131265 | 20 |
m1 | PRC | 0.7968657 | 0.0511931 | 0.7456726 | 0.8480587 | 20 |
The auc_ci function accepts either the normal or the t-distribution for the CI calculation.
# Calculate CIs of AUCs with the t-distribution
auc_ci_t <- auc_ci(smcurves, dtype = "t")

# Use knitr::kable to display the result in a table format
knitr::kable(auc_ci_t)
modnames | curvetypes | mean | error | lower_bound | upper_bound | n |
---|---|---|---|---|---|---|
m1 | ROC | 0.7570000 | 0.0456063 | 0.7113937 | 0.8026063 | 20 |
m1 | PRC | 0.7968657 | 0.0415976 | 0.7552681 | 0.8384633 | 20 |
It is easy to simulate various scenarios, such as balanced vs. imbalanced datasets, by using the evalmod and create_sim_samples functions.
# Balanced dataset
samps5 <- create_sim_samples(100, 100, 100, "all")
simmdat1 <- mmdata(samps5[["scores"]], samps5[["labels"]],
                   modnames = samps5[["modnames"]], dsids = samps5[["dsids"]])

# Imbalanced dataset
samps6 <- create_sim_samples(100, 25, 100, "all")
simmdat2 <- mmdata(samps6[["scores"]], samps6[["labels"]],
                   modnames = samps6[["modnames"]], dsids = samps6[["dsids"]])
The evalmod function automatically detects multiple models and multiple test datasets.
# Balanced dataset
simcurves1 <- evalmod(simmdat1)

# Imbalanced dataset
simcurves2 <- evalmod(simmdat2)
ROC plots are unchanged between balanced and imbalanced datasets, whereas Precision-Recall plots show a clear difference between them. See our article or website for the potential pitfalls of using ROC plots with imbalanced datasets.
# Balanced dataset
autoplot(simcurves1)
# Imbalanced dataset
autoplot(simcurves2)
- Saito T, Rehmsmeier M. Precrec: fast and accurate precision-recall and ROC curve calculations in R. Bioinformatics 2017; 33(1): 145-147.
- Classifier evaluation with imbalanced datasets - our website, which contains several pages with useful tips for performance evaluation on binary classifiers.
- The Precision-Recall Plot Is More Informative than the ROC Plot When Evaluating Binary Classifiers on Imbalanced Datasets - our paper, which summarizes the potential pitfalls of ROC plots with imbalanced datasets and the advantages of using precision-recall plots instead.