Package MKclass includes a collection of functions that I found useful for statistical classification.
We first load the package.
library(MKclass)
Given the incidence of the outcome of interest in the nonexposed (p0) and exposed (p1) group, several risk measures can be computed.
## Example from Wikipedia
risks(p0 = 0.4, p1 = 0.1)
## p0 p1 RR OR RRR ARR NNT
## 0.4000000 0.1000000 0.2500000 0.1666667 0.7500000 0.3000000 3.3333333
risks(p0 = 0.4, p1 = 0.5)
## p0 p1 RR OR RRI ARI NNH
## 0.40 0.50 1.25 1.50 0.25 0.10 10.00
Given p0 or p1 and OR, we can compute the respective RR.
or2rr(or = 1.5, p0 = 0.4)
## [1] 1.25
or2rr(or = 1/6, p1 = 0.1)
## [1] 0.25
There is also a function for computing an approximate confidence interval for the relative risk (RR).
## Example from Wikipedia
rrCI(a = 15, b = 135, c = 100, d = 150)
## $estimate
## relative risk
## 0.25
##
## $conf.int
## 2.5 % 97.5 %
## relative risk 0.1510993 0.4136353
## attr(,"conf.level")
## [1] 0.95
##
## $method
## [1] "asymptotic confidence interval"
##
## attr(,"class")
## [1] "confint"
rrCI(a = 75, b = 75, c = 100, d = 150)
## $estimate
## relative risk
## 1.25
##
## $conf.int
## 2.5 % 97.5 %
## relative risk 1.00256 1.55851
## attr(,"conf.level")
## [1] 0.95
##
## $method
## [1] "asymptotic confidence interval"
##
## attr(,"class")
## [1] "confint"
There are two functions that can be used to calculate and test AUC values. First function AUC, which computes the area under the receiver operating characteristic curve (AUC under ROC curve) using the connection of AUC to the Wilcoxon rank sum test. We use some random data and groups to demonstrate the use of this function.
x <- c(runif(50, max = 0.6), runif(50, min = 0.4))
g <- c(rep(0, 50), rep(1, 50))
AUC(x, group = g)
## area under curve (AUC)
## 0.9576
Sometimes the labels of the group should be switched to avoid an AUC smaller than 0.5, which represents a result worse than a pure random choice.
g <- c(rep(1, 50), rep(0, 50))
AUC(x, group = g)
## Warning in AUC(x, group = g): The computed AUC value 0.0424 will be replaced by
## 0.9576 which can be achieved be interchanging the sample labels!
## area under curve (AUC)
## 0.9576
## no switching
AUC(x, group = g, switchAUC = FALSE)
## area under curve (AUC)
## 0.0424
We can also perform statistical tests for AUC. First, the one-sample test which corresponds to the Wilcoxon signed rank test.
g <- c(rep(0, 50), rep(1, 50))
AUC.test(pred1 = x, lab1 = g)
## $Variable1
## AUC SE low CI up CI
## 0.95760000 0.02092592 0.91658595 0.99861405
##
## $Test
##
## Wilcoxon rank sum test with continuity correction
##
## data: 0 and 1
## W = 106, p-value = 3.194e-15
## alternative hypothesis: true AUC is not equal to 0.5
We can also compare two AUC using the test of Hanley and McNeil (1982).
x2 <- c(runif(50, max = 0.7), runif(50, min = 0.3))
g2 <- c(rep(0, 50), rep(1, 50))
AUC.test(pred1 = x, lab1 = g, pred2 = x2, lab2 = g2)
## $Variable1
## AUC SE low CI up CI
## 0.95760000 0.02092592 0.91658595 0.99861405
##
## $Variable2
## AUC SE low CI up CI
## 0.82080000 0.04238554 0.73772586 0.90387414
##
## $Test
##
## Hanley and McNeil test for two AUCs
##
## data: x and x2
## z = 2.894, p-value = 0.003803
## alternative hypothesis: true difference in AUC is not equal to 0
## 95 percent confidence interval:
## 0.04415301 0.22944699
## sample estimates:
## Difference in AUC
## 0.1368
There is also a function for pairwise comparison if there are more than two groups.
x3 <- c(x, x2)
g3 <- c(g, c(rep(2, 50), rep(3, 50)))
pairwise.auc(x = x3, g = g3)
## Warning in AUC(xi, xj): The computed AUC value 0.0424 will be replaced by 0.9576
## which can be achieved be interchanging the sample labels!
## Warning in AUC(xi, xj): The computed AUC value 0.4176 will be replaced by 0.5824
## which can be achieved be interchanging the sample labels!
## Warning in AUC(xi, xj): The computed AUC value 0.1272 will be replaced by 0.8728
## which can be achieved be interchanging the sample labels!
## Warning in AUC(xi, xj): The computed AUC value 0.1792 will be replaced by 0.8208
## which can be achieved be interchanging the sample labels!
## 0 vs 1 0 vs 2 0 vs 3 1 vs 2 1 vs 3 2 vs 3
## 0.9576 0.5824 0.8728 0.9028 0.6376 0.8208
In case of medical diagnostic tests, usually sensitivity and specificity of the tests are known and there is also at least a rough estimate of the prevalence of the tested disease. In the practival application, the positive predictive value (PPV) and the negative predictive value are of crucial importance.
## Example: HIV test
## 1. ELISA screening test (4th generation)
predValues(sens = 0.999, spec = 0.998, prev = 0.001)
## PPV NPV
## 0.3333333 0.9999990
## 2. Western-Plot confirmation test
predValues(sens = 0.998, spec = 0.999996, prev = 1/3)
## PPV NPV
## 0.999992 0.999001
In the development of diagnostic tests and more general in binary classification a variety of performance measures and scores can be found in literature. Functions perfMeasures and prefScores compute several of them.
## example from dataset infert
fit <- glm(case ~ spontaneous+induced, data = infert, family = binomial())
pred <- predict(fit, type = "response")
## with group numbers
perfMeasures(pred, truth = infert$case, namePos = 1)
##
## Performance Measure(s)
##
## Measure Value
## 1 accuracy (ACC) 0.71370968
## 2 probability of correct classification (PCC) 0.71370968
## 3 fraction correct (FC) 0.71370968
## 4 simple matching coefficient (SMC) 0.71370968
## 5 Rand (similarity) index (RSI) 0.71370968
## 6 probability of misclassification (PMC) 0.28629032
## 7 error rate (ER) 0.28629032
## 8 fraction incorrect (FIC) 0.28629032
## 9 sensitivity (SENS) 0.33734940
## 10 recall (REC) 0.33734940
## 11 true positive rate (TPR) 0.33734940
## 12 probability of detection (PD) 0.33734940
## 13 hit rate (HR) 0.33734940
## 14 specificity (SPEC) 0.90303030
## 15 true negative rate (TNR) 0.90303030
## 16 selectivity (SEL) 0.90303030
## 17 detection rate (DR) 0.11290323
## 18 false positive rate (FPR) 0.09696970
## 19 fall-out (FO) 0.09696970
## 20 false alarm (rate) (FAR) 0.09696970
## 21 probability of false alarm (PFA) 0.09696970
## 22 false negative rate (FNR) 0.66265060
## 23 miss rate (MR) 0.66265060
## 24 false discovery rate (FDR) 0.36363636
## 25 false omission rate (FOR) 0.26960784
## 26 prevalence (PREV) 0.33467742
## 27 (positive) pre-test probability (PREP) 0.33467742
## 28 (positive) pre-test odds (PREO) 0.50303030
## 29 detection prevalence (DPREV) 0.17741935
## 30 negative pre-test probability (NPREP) 0.66532258
## 31 negative pre-test odds (NPREO) 1.98795181
## 32 no information rate (NIR) 0.66532258
## 33 weighted accuracy (WACC) 0.62018985
## 34 balanced accuracy (BACC) 0.62018985
## 35 (bookmaker) informedness (INF) 0.24037970
## 36 Youden's J statistic (YJS) 0.24037970
## 37 deltap' (DPp) 0.24037970
## 38 positive likelihood ratio (PLR) 3.47891566
## 39 negative likelihood ratio (NLR) 0.73380771
## 40 weighted likelihood ratio (WLR) 2.10636169
## 41 balanced likelihood ratio (BLR) 2.10636169
## 42 diagnostic odds ratio (DOR) 4.74090909
## 43 positive predictive value (PPV) 0.63636364
## 44 precision (PREC) 0.63636364
## 45 (positive) post-test probability (POSTP) 0.63636364
## 46 (positive) post-test odds (POSTO) 1.75000000
## 47 Bayes factor G1 (BFG1) 3.47891566
## 48 negative predictive value (NPV) 0.73039216
## 49 negative post-test probability (NPOSTP) 0.73039216
## 50 negative post-test odds (NPOSTO) 2.70909091
## 51 Bayes factor G0 (BFG0) 1.36275482
## 52 markedness (MARK) 0.36675579
## 53 deltap (DP) 0.36675579
## 54 weighted predictive value (WPV) 0.68337790
## 55 balanced predictive value (BPV) 0.68337790
## 56 F1 score (F1S) 0.44094488
## 57 Dice similarity coefficient (DSC) 0.44094488
## 58 F beta score (FBS) 0.44094488
## 59 Jaccard similarity coefficient (JSC) 0.28282828
## 60 threat score (TS) 0.28282828
## 61 critical success index (CSI) 0.28282828
## 62 Matthews' correlation coefficient (MCC) 0.29691859
## 63 Pearson's correlation (r phi) (RPHI) 0.29691859
## 64 Phi coefficient (PHIC) 0.29691859
## 65 Cramer's V (CRV) 0.29691859
## 66 proportion of positive predictions (PPP) 0.17741935
## 67 expected accuracy (EACC) 0.60665973
## 68 Cohen's kappa coefficient (CKC) 0.27215608
## 69 mutual information in bits (MI2) 0.06014644
## 70 joint entropy in bits (JE2) 1.06320911
## 71 variation of information in bits (VI2) 1.47374006
## 72 Jaccard distance (JD) 0.96078821
## 73 information quality ratio (INFQR) 0.03921179
## 74 uncertainty coefficient (UC) 0.06540258
## 75 entropy coefficient (EC) 0.06540258
## 76 proficiency (metric) (PROF) 0.06540258
## 77 deficiency (metric) (DFM) 0.93459742
## 78 redundancy (RED) 0.03773224
## 79 symmetric uncertainty (SU) 0.07546449
## 80 normalized uncertainty (NU) 0.07637373
perfScores(pred, truth = infert$case, namePos = 1)
##
## Performance Score(s)
##
## Score Value
## 1 area under curve (AUC) 0.7285506
## 2 Gini index (GINI) 0.4571011
## 3 Brier score (BS) 0.1911729
## 4 positive Brier score (PBS) 0.3605069
## 5 negative Brier score (NBS) 0.1059928
## 6 weighted Brier score (WBS) 0.2332498
## 7 balanced Brier score (BBS) 0.2332498
## with group names
my.case <- factor(infert$case, labels = c("control", "case"))
perfMeasures(pred, truth = my.case, namePos = "case")
##
## Performance Measure(s)
##
## Measure Value
## 1 accuracy (ACC) 0.71370968
## 2 probability of correct classification (PCC) 0.71370968
## 3 fraction correct (FC) 0.71370968
## 4 simple matching coefficient (SMC) 0.71370968
## 5 Rand (similarity) index (RSI) 0.71370968
## 6 probability of misclassification (PMC) 0.28629032
## 7 error rate (ER) 0.28629032
## 8 fraction incorrect (FIC) 0.28629032
## 9 sensitivity (SENS) 0.33734940
## 10 recall (REC) 0.33734940
## 11 true positive rate (TPR) 0.33734940
## 12 probability of detection (PD) 0.33734940
## 13 hit rate (HR) 0.33734940
## 14 specificity (SPEC) 0.90303030
## 15 true negative rate (TNR) 0.90303030
## 16 selectivity (SEL) 0.90303030
## 17 detection rate (DR) 0.11290323
## 18 false positive rate (FPR) 0.09696970
## 19 fall-out (FO) 0.09696970
## 20 false alarm (rate) (FAR) 0.09696970
## 21 probability of false alarm (PFA) 0.09696970
## 22 false negative rate (FNR) 0.66265060
## 23 miss rate (MR) 0.66265060
## 24 false discovery rate (FDR) 0.36363636
## 25 false omission rate (FOR) 0.26960784
## 26 prevalence (PREV) 0.33467742
## 27 (positive) pre-test probability (PREP) 0.33467742
## 28 (positive) pre-test odds (PREO) 0.50303030
## 29 detection prevalence (DPREV) 0.17741935
## 30 negative pre-test probability (NPREP) 0.66532258
## 31 negative pre-test odds (NPREO) 1.98795181
## 32 no information rate (NIR) 0.66532258
## 33 weighted accuracy (WACC) 0.62018985
## 34 balanced accuracy (BACC) 0.62018985
## 35 (bookmaker) informedness (INF) 0.24037970
## 36 Youden's J statistic (YJS) 0.24037970
## 37 deltap' (DPp) 0.24037970
## 38 positive likelihood ratio (PLR) 3.47891566
## 39 negative likelihood ratio (NLR) 0.73380771
## 40 weighted likelihood ratio (WLR) 2.10636169
## 41 balanced likelihood ratio (BLR) 2.10636169
## 42 diagnostic odds ratio (DOR) 4.74090909
## 43 positive predictive value (PPV) 0.63636364
## 44 precision (PREC) 0.63636364
## 45 (positive) post-test probability (POSTP) 0.63636364
## 46 (positive) post-test odds (POSTO) 1.75000000
## 47 Bayes factor G1 (BFG1) 3.47891566
## 48 negative predictive value (NPV) 0.73039216
## 49 negative post-test probability (NPOSTP) 0.73039216
## 50 negative post-test odds (NPOSTO) 2.70909091
## 51 Bayes factor G0 (BFG0) 1.36275482
## 52 markedness (MARK) 0.36675579
## 53 deltap (DP) 0.36675579
## 54 weighted predictive value (WPV) 0.68337790
## 55 balanced predictive value (BPV) 0.68337790
## 56 F1 score (F1S) 0.44094488
## 57 Dice similarity coefficient (DSC) 0.44094488
## 58 F beta score (FBS) 0.44094488
## 59 Jaccard similarity coefficient (JSC) 0.28282828
## 60 threat score (TS) 0.28282828
## 61 critical success index (CSI) 0.28282828
## 62 Matthews' correlation coefficient (MCC) 0.29691859
## 63 Pearson's correlation (r phi) (RPHI) 0.29691859
## 64 Phi coefficient (PHIC) 0.29691859
## 65 Cramer's V (CRV) 0.29691859
## 66 proportion of positive predictions (PPP) 0.17741935
## 67 expected accuracy (EACC) 0.60665973
## 68 Cohen's kappa coefficient (CKC) 0.27215608
## 69 mutual information in bits (MI2) 0.06014644
## 70 joint entropy in bits (JE2) 1.06320911
## 71 variation of information in bits (VI2) 1.47374006
## 72 Jaccard distance (JD) 0.96078821
## 73 information quality ratio (INFQR) 0.03921179
## 74 uncertainty coefficient (UC) 0.06540258
## 75 entropy coefficient (EC) 0.06540258
## 76 proficiency (metric) (PROF) 0.06540258
## 77 deficiency (metric) (DFM) 0.93459742
## 78 redundancy (RED) 0.03773224
## 79 symmetric uncertainty (SU) 0.07546449
## 80 normalized uncertainty (NU) 0.07637373
perfScores(pred, truth = my.case, namePos = "case")
##
## Performance Score(s)
##
## Score Value
## 1 area under curve (AUC) 0.7285506
## 2 Gini index (GINI) 0.4571011
## 3 Brier score (BS) 0.1911729
## 4 positive Brier score (PBS) 0.3605069
## 5 negative Brier score (NBS) 0.1059928
## 6 weighted Brier score (WBS) 0.2332498
## 7 balanced Brier score (BBS) 0.2332498
## using weights
perfMeasures(pred, truth = infert$case, namePos = 1, weight = 0.3)
##
## Performance Measure(s)
##
## Measure Value
## 1 accuracy (ACC) 0.71370968
## 2 probability of correct classification (PCC) 0.71370968
## 3 fraction correct (FC) 0.71370968
## 4 simple matching coefficient (SMC) 0.71370968
## 5 Rand (similarity) index (RSI) 0.71370968
## 6 probability of misclassification (PMC) 0.28629032
## 7 error rate (ER) 0.28629032
## 8 fraction incorrect (FIC) 0.28629032
## 9 sensitivity (SENS) 0.33734940
## 10 recall (REC) 0.33734940
## 11 true positive rate (TPR) 0.33734940
## 12 probability of detection (PD) 0.33734940
## 13 hit rate (HR) 0.33734940
## 14 specificity (SPEC) 0.90303030
## 15 true negative rate (TNR) 0.90303030
## 16 selectivity (SEL) 0.90303030
## 17 detection rate (DR) 0.11290323
## 18 false positive rate (FPR) 0.09696970
## 19 fall-out (FO) 0.09696970
## 20 false alarm (rate) (FAR) 0.09696970
## 21 probability of false alarm (PFA) 0.09696970
## 22 false negative rate (FNR) 0.66265060
## 23 miss rate (MR) 0.66265060
## 24 false discovery rate (FDR) 0.36363636
## 25 false omission rate (FOR) 0.26960784
## 26 prevalence (PREV) 0.33467742
## 27 (positive) pre-test probability (PREP) 0.33467742
## 28 (positive) pre-test odds (PREO) 0.50303030
## 29 detection prevalence (DPREV) 0.17741935
## 30 negative pre-test probability (NPREP) 0.66532258
## 31 negative pre-test odds (NPREO) 1.98795181
## 32 no information rate (NIR) 0.66532258
## 33 weighted accuracy (WACC) 0.73332603
## 34 balanced accuracy (BACC) 0.62018985
## 35 (bookmaker) informedness (INF) 0.24037970
## 36 Youden's J statistic (YJS) 0.24037970
## 37 deltap' (DPp) 0.24037970
## 38 positive likelihood ratio (PLR) 3.47891566
## 39 negative likelihood ratio (NLR) 0.73380771
## 40 weighted likelihood ratio (WLR) 1.55734010
## 41 balanced likelihood ratio (BLR) 2.10636169
## 42 diagnostic odds ratio (DOR) 4.74090909
## 43 positive predictive value (PPV) 0.63636364
## 44 precision (PREC) 0.63636364
## 45 (positive) post-test probability (POSTP) 0.63636364
## 46 (positive) post-test odds (POSTO) 1.75000000
## 47 Bayes factor G1 (BFG1) 3.47891566
## 48 negative predictive value (NPV) 0.73039216
## 49 negative post-test probability (NPOSTP) 0.73039216
## 50 negative post-test odds (NPOSTO) 2.70909091
## 51 Bayes factor G0 (BFG0) 1.36275482
## 52 markedness (MARK) 0.36675579
## 53 deltap (DP) 0.36675579
## 54 weighted predictive value (WPV) 0.70218360
## 55 balanced predictive value (BPV) 0.68337790
## 56 F1 score (F1S) 0.44094488
## 57 Dice similarity coefficient (DSC) 0.44094488
## 58 F beta score (FBS) 0.44094488
## 59 Jaccard similarity coefficient (JSC) 0.28282828
## 60 threat score (TS) 0.28282828
## 61 critical success index (CSI) 0.28282828
## 62 Matthews' correlation coefficient (MCC) 0.29691859
## 63 Pearson's correlation (r phi) (RPHI) 0.29691859
## 64 Phi coefficient (PHIC) 0.29691859
## 65 Cramer's V (CRV) 0.29691859
## 66 proportion of positive predictions (PPP) 0.17741935
## 67 expected accuracy (EACC) 0.60665973
## 68 Cohen's kappa coefficient (CKC) 0.27215608
## 69 mutual information in bits (MI2) 0.06014644
## 70 joint entropy in bits (JE2) 1.06320911
## 71 variation of information in bits (VI2) 1.47374006
## 72 Jaccard distance (JD) 0.96078821
## 73 information quality ratio (INFQR) 0.03921179
## 74 uncertainty coefficient (UC) 0.06540258
## 75 entropy coefficient (EC) 0.06540258
## 76 proficiency (metric) (PROF) 0.06540258
## 77 deficiency (metric) (DFM) 0.93459742
## 78 redundancy (RED) 0.03773224
## 79 symmetric uncertainty (SU) 0.07546449
## 80 normalized uncertainty (NU) 0.07637373
perfScores(pred, truth = infert$case, namePos = 1, wBS = 0.3)
##
## Performance Score(s)
##
## Score Value
## 1 area under curve (AUC) 0.7285506
## 2 Gini index (GINI) 0.4571011
## 3 Brier score (BS) 0.1911729
## 4 positive Brier score (PBS) 0.3605069
## 5 negative Brier score (NBS) 0.1059928
## 6 weighted Brier score (WBS) 0.1823470
## 7 balanced Brier score (BBS) 0.2332498
The function optCutoff computes the optimal cutoff for various performance measures for binary classification. More precisely, all performance measures that are implemented in function perfMeasures.
## example from dataset infert
fit <- glm(case ~ spontaneous+induced, data = infert, family = binomial())
pred <- predict(fit, type = "response")
optCutoff(pred, truth = infert$case, namePos = 1)
## Optimal Cut-off YJS
## 0.3750400 0.3474991
The computation of an optimal cut-off doesn't make any sense for continuous scoring rules as their computation does not involve any cut-off (discretization/dichotomization).
These tests are used to investigate the goodness of fit in logistic regression.
## Hosmer-Lemeshow goodness of fit tests for C and H statistic
HLgof.test(fit = pred, obs = infert$case)
## Warning in HLgof.test(fit = pred, obs = infert$case): Found only 6 different
## groups for Hosmer-Lemesho C statistic.
## $C
##
## Hosmer-Lemeshow C statistic
##
## data: pred and infert$case
## X-squared = 4.4272, df = 4, p-value = 0.3513
##
##
## $H
##
## Hosmer-Lemeshow H statistic
##
## data: pred and infert$case
## X-squared = 6.3168, df = 8, p-value = 0.6118
## e Cessie-van Houwelingen-Copas-Hosmer global goodness of fit test
HLgof.test(fit = pred, obs = infert$case,
X = model.matrix(case ~ spontaneous+induced, data = infert))
## Warning in HLgof.test(fit = pred, obs = infert$case, X = model.matrix(case ~ :
## Found only 6 different groups for Hosmer-Lemesho C statistic.
## $C
##
## Hosmer-Lemeshow C statistic
##
## data: pred and infert$case
## X-squared = 4.4272, df = 4, p-value = 0.3513
##
##
## $H
##
## Hosmer-Lemeshow H statistic
##
## data: pred and infert$case
## X-squared = 6.3168, df = 8, p-value = 0.6118
##
##
## $gof
##
## le Cessie-van Houwelingen-Copas-Hosmer global goodness of fit test
##
## data: pred and infert$case
## z = 1.7533, p-value = 0.07954
sessionInfo()
## R version 4.0.3 RC (2020-10-02 r79291)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Linux Mint 19.3
##
## Matrix products: default
## BLAS: /usr/lib/x86_64-linux-gnu/libf77blas.so.3.10.3
## LAPACK: /home/kohlm/RTOP/Rbranch/lib/libRlapack.so
##
## locale:
## [1] LC_CTYPE=de_DE.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=de_DE.UTF-8 LC_COLLATE=C
## [5] LC_MONETARY=de_DE.UTF-8 LC_MESSAGES=de_DE.UTF-8
## [7] LC_PAPER=de_DE.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=de_DE.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] MKclass_0.3
##
## loaded via a namespace (and not attached):
## [1] compiler_4.0.3 magrittr_1.5 htmltools_0.5.0 tools_4.0.3
## [5] yaml_2.2.1 stringi_1.4.6 rmarkdown_2.3 knitr_1.29
## [9] stringr_1.4.0 digest_0.6.25 xfun_0.16 rlang_0.4.7
## [13] evaluate_0.14