When people evaluate the performance of a diagnostic test, it is important to control both True Positive Rate (TPR) and False Positive Rate (FPR). In the literature, most researchers propose the partial area under the ROC curve (pAUC) with restrictions on FPR to assess a binary classification system, which is named as FPR pAUC. It could be artificially designed to measure the area controlled by TPR, but is often misleading conceptually and practically. A new and intuitive method, named two-way pAUC, is provided in Yang et al. (2016), which focuses directly on the partial area under the ROC curve with both horizontal and vertical restrictions. This package solves two-way pAUC estimation based on a non-parametric method in Yang et al. (2016). Moreover, estimation and inference of FPR partial AUC and FNR parital ODC are included in this package, utilizing algorithms proposed in Yang et al. (2017) (see Methodology for details).
The ROC curve is a well-established graphical tool used to evaluate performance of a classifier in accurately discriminating between subjects from different populations (e.g., diseased and healthy individuals). Let \(F\) and \(G\) be distribution functions of random variables \(X\) and \(Y\) corresponding to independent populations. Let \(G^{-1}(t)=\inf \{y: G(y)\geq t\}\) be the quantile function of \(G\), \(0< t< 1\). Let \(S_F(t)\) and \(S_G(t)\) be the corresponding survival functions \(S_F(t)=1-F(t)\) and \(S_G(t)=1-G(t)\). For \(t\in (0,1)\), the ROC curve is defined as \(ROC(t) =1-F\{G^{-1}(1-t)\}\) or \(ROC(t)=S_{F}\{S_{G}^{-1}(t)\}\), where \(t\) is the value of FPR and \(S_G^{-1}(t) = G^{-1}(1-t)\). The ROC curve is not a convenient tool for comparisions, in particular when two ROC curves cross. A summary measure of an ROC curve can be found by integrating the ROC curve over the the range of FPR values to obtain the area under the ROC curve as \(AUC=\int^{1}_{0} ROC(t)\mathrm{d}t =\int^{-\infty}_{\infty}S_{F}(u) \mathrm{d}S_{G}(u)\). For economical and practical purposes, it is common to hold the FPR to a low level. When interest is restricted to a sub-region of the ROC space, the partial area under the ROC curve, \(pAUC(P_0)=\int^{P_0}_{0} ROC(p)\mathrm{d}p\) for the threshold value of FPR \(P_0 \in (0,1)\), can provide a useful summary measure.
Let \(\mathbf{X}=\{X_{i}, i= 1,...,m\}\) and \(\mathbf{Y}=\{Y_{i}, i= 1,...,n\}\) be random samples from the distribution functions \(F(x)\) and \(G(y)\), respectively. A Mann-Whitney nonparamteric method for pAUC is (method='WM'
) \[
\widehat{pAUC}(P_0 ) = \frac{1}{mn}\sum^{m}_{i=1}\sum^{n}_{j=1}I(X_i\ge Y_j)I\{Y_j\ge S^{-1}_{G,n}(P_0)\},
\] where \(S_{G,n}^{-1}(t)=\inf \left\{x\in R; t\geq S_{G,n}(x)\right\}\) and \(S_{F,m}(\cdot)\) and \(S_{G,n}(\cdot)\) are estimators of \(S_F\) and \(S_G\) based on empirical distributions.
Wang and Chang (2011) propose the following method (method = 'expect'
), \[\widetilde{pAUC}(P_0) = P_{0} - \frac{1}{m} \overset{m}{\underset{i=1}{\sum}} \min \{ S_{G, n}(X_{i}), P_{0}\}.\]
Yang et al. (2017) propose a jackknife method (method = 'jackknife'
) based on \(\widetilde{pAUC}(P_0)\), in particular, \[
\widetilde{pAUC}_{jack}(P_0)=\frac{1}{n+m}\sum^{n+m}_{h=1}{V}_h(P_0),
\] where \[
{V}_h(P_0)=(n+m)\widetilde{pAUC}(P_0 )-(n+m-1)\widetilde{pAUC}_h(P_0),
\] and \[
\widetilde{pAUC}_h(P_0)=\left\{\begin{array}{ll}
P_{0} - \frac{1}{m-1} \sum \limits_{ i \ne h }^m \min \{ S_{G, n}(X_{i}), P_{0}\} & ~\text{ $1 \leq h \leq m$}\\
P_{0} - \frac{1}{m} \sum \limits_{ i=1 }^m \min \{ S_{G,n-1, h-m}(X_{i}), P_{0}\} & ~ \text{$m+1 \leq h \leq m+n,$}
\end{array} \right.
\] where \[S_{G,n-1, h-m}(X_{i})=\frac{1}{n-1} \overset{n}{\underset{j=1, j \neq h-m}{\sum}}I(Y_j>X_{i}). \]
The ordinal dominance curve (ODC) introduced by Bamber (1973), describes the association between true negative rate (TNR) and false negative rate (FNR), \(ODC(t)= G\{F^{-1}(t)\}\) where \(t \in (0,1)\). The area under the ODC, \(\int^{1}_{0} ODC(t)\mathrm{d}t = \int^{\infty}_{-\infty}G(u)\mathrm{d}F(u)\), is a commonly used summary measure. A partial area under the ODC (pODC) from \(0\) to \(P_0\) is taken as \(pODC(P_0) = \int^{P_0}_{0} ODC(t)\mathrm{d}t\).
A Mann-Whitney nonparamteric method for pAUC is (method='WM'
) \[\widehat{pODC}(P_0)=\frac{1}{mn}\sum^{m}_{i=1}\sum^{n}_{j=1}I(Y_j\le X_i
)I\{X_i\le F^{-1}_{m}(P_0)\},\] where \(F_{m}^{-1}(P_0)\) is an empirical quantile estimate at \(P_0\) and \(F_m(\cdot)\) and \(G_{n}(\cdot)\) are the empirical distributions of \(F(\cdot)\) and \(G(\cdot)\).
Yang et al. (2017) propose the following method (method = 'expect'
), \[\widetilde{pODC}(P_0) = P_{0} - \frac{1}{n} \overset{n}{\underset{j=1}{\sum}} \min \{ F_m(Y_{j}), P_{0}\}.\] Yang et al. (2017) propose a jackknife method (method = 'jackknife'
) based on \(\widetilde{pODC}(P_0)\), in particular, \[
\widetilde{pODC}_{jack}(P_0)=\frac{1}{n+m}\sum^{n+m}_{h=1}\check{U}_h(P_0),
\] where \[
\check{U}_h(P_0)=(n+m)\widetilde{pODC}(P_0)-(n+m-1)\widetilde{pODC}_h(P_0)
\] and \[
\widetilde{pODC}_h(P_0)=\left\{\begin{array}{ll}
P_{0} - \frac{1}{n-1} \sum \limits_{j \ne h}^n \min \{ F_{m}(Y_{j}), P_{0}\} & ~\text{ $1 \leq h \leq n$}\\
P_{0} - \frac{1}{n} \sum \limits_{j = 1 }^n \min \{F_{m-1, h-n}(Y_{j}), P_{0}\} & ~ \text{$n+1 \leq h \leq m+n,$}
\end{array} \right.
\] where \[F_{m-1, h-n}(Y_{j})=\frac{1}{m-1} \overset{m}{\underset{i=1, i \neq h-n}{\sum}}I(X_{i} \leq Y_j). \]
The definition and estimation of two-way pAUC are proposed intuitively. Given bounds \(p_0\) and \(q_0\), two-way pAUC is formulated as \[ U(p_0,q_0) =\int^{p_0}_{S_G\{S^{-1}_F(q_0)\}}S_F\{S^{-1}_{G}(u)\}du-[p_0-S_G\{S^{-1}_F(q_0)\}]q_0 . \] Alternatively, from a probability perspective, \({U}(p_0,q_0)\) can be transformed as: \[ P\{ \mathbf{Y} < \mathbf{X}, \mathbf{X}\le S_F^{-1}(q_0), \mathbf{Y}\ge S_G^{-1}(p_0)\}. \] A trimmed Mann-Whitney U-statistics estimator directly following the above expression is \[ \frac{1}{mn}\sum^{m}_{i=1}\sum^{n}_{j=1}V_{i,j} (p_0 , q_0), \] where \(V_{i,j} (p_0 , q_0) = I \{ Y_j\le X_i , X_i\le S^{-1}_{F,m}(q_0), Y_j\ge S^{-1}_{G,n}(p_0) \}\).
Yang et al. (2017) prove that, under certain conditions, \[
\sqrt{m+n}\{\widehat{pAUC}(P_0)-pAUC(P_0)\}\stackrel{d}{\to}N\left\{0,\frac{\sigma^{2}_{1}(P_0)}{\lambda}+\frac{\sigma^{2}_{2}(P_0)}{1-\lambda}\right\}, m,n\to \infty, \nonumber
\] where \(\frac{m}{m+n}\to \lambda\), \[
\sigma^2_1(P_0)=\int^{S_G^{-1}(P_0)}_{+\infty}\{P_0-S_G(t)\}^2dS_F(t)-\left\{\int^{S_G^{-1}(P_0)}_{+\infty} S_F(t)dS_G(t)\right\}^2,
\] and \[
\sigma^2_2(P_0)=\int^{S_G^{-1}(P_0)}_{+\infty}[S_F(t)-S_F\{S_G^{-1}(P_0)\}]^2dS_G(t)-\left(\int^{S_G^{-1}(P_0)}_{+\infty}[S_F(t)-S_F\{S_G^{-1}(P_0)\}]dS_G(t)\right)^2.
\] Moveover, under same conditions, \[\sqrt{m+n}\{\widetilde{pAUC}_{}(P_0)-pAUC(P_0)\}\stackrel{d}{\to}N\left\{0,\frac{\sigma^{2}_{1}(P_0)}{\lambda}+\frac{\sigma^{2}_{2}(P_0)}{1-\lambda}\right\}, m,n\to \infty.\] and \[
\sqrt{m+n}\{\widetilde{pAUC}_{jack}(P_0)-pAUC(P_0)\}\stackrel{d}{\to}N\left\{0,\frac{\sigma^{2}_{1}(P_0)}{\lambda}+\frac{\sigma^{2}_{2}(P_0)}{1-\lambda}\right\}, m,n\to \infty.
\]
Consider the jackknife variance estimator \[S^2_{\widetilde{pAUC}}={(m+n)}^{-1}\sum^{m+n}_{h=1}\{{V}_h(P_0)-\widetilde{pAUC}_{jack}(P_0)\}^2.\] Yang et al. (2017) prove that \[
S^2_{\widetilde{pAUC}}(P_0)=\frac{\sigma^{2}_{1}(P_0)}{\lambda}+\frac{\sigma^{2}_{2}(P_0)}{1-\lambda}+o_p(1).
\] Therefore, \[
\frac{\sqrt{m+n}\{\widetilde{pAUC}_{jack}(P_0)-pAUC(P_0)\}}{\sqrt{S^2_{\widetilde{pAUC}}(P_0)}}\stackrel{d}{\to}N(0,1).
\]
In ODC cases, we have \[ \sqrt{m+n}\{\widehat{pODC}(P_0)-pODC(P_0)\}\stackrel{d}{\to}N\left(0,\frac{\sigma^{2}_{3}}{1-\lambda}+\frac{\sigma^{2}_{4}}{\lambda}\right), m,n\to \infty, \] where \[ \sigma^2_3=\int^{F^{-1}(P_0)}_{-\infty}\{P_0-F(t)\}^2dG(t)-\left\{\int^{F^{-1}(P_0)}_{-\infty} G(t)dF(t)\right\}^2, \] and \[ \sigma^2_4=\int^{F^{-1}(P_0)}_{-\infty}[G(t)-G\{F^{-1}(P_0)\}]^2dF(t)-\left(\int^{F^{-1}(P_0)}_{-\infty}[G(t)-G\{F^{-1}(P_0)\}]dF(t)\right)^2. \] Similarly, \[ \sqrt{m+n}\{\widetilde{pODC}_{}(P_0)-pODC(P_0)\}\stackrel{d}{\to}N\left\{0,\frac{\sigma^{2}_{3}(P_0)}{1- \lambda}+\frac{\sigma^{2}_{4}(P_0)}{\lambda}\right\}, \] and \[\sqrt{m+n}\{\widetilde{pODC}_{jack}(P_0)-pODC(P_0)\}\stackrel{d}{\to}N\left(0,\frac{\sigma^{2}_{3}}{1- \lambda}+\frac{\sigma^{2}_{4}}{\lambda}\right).\] Together with \[ \begin{align*} S^2_{\widetilde{pODC}}& ={(m+n)}^{-1}\sum^{m+n}_{h=1}\{\check{U}_h(P_0)-\widetilde{pODC}_{jack}(P_0)\}^2\\ &=\frac{\sigma^{2}_{3}(P_0)}{1-\lambda}+\frac{\sigma^{2}_{4}(P_0)}{\lambda}+o_p(1). \end{align*} \] and \[ \frac{\sqrt{m+n}\{\widetilde{pODC}_{jack}(P_0)-pODC(P_0)\}}{\sqrt{S^2_{\widetilde{pODC}}(P_0)}}\stackrel{d}{\to}N(0,1). \]
From Yang et al. (2016), we have, under certain conditions, \[ \sqrt{m+n}\{\hat{U}(p_0,q_0)-U(p_0,q_0)\}\stackrel{d}{\to}N\left\{0,\frac{\sigma^2_5}{\lambda}+\frac{\sigma^2_6}{1-\lambda}\right\}, \quad \text{as } \;\; m,n\to \infty, \] where \[ \begin{align} \sigma^2_5= &F\{G^{-1}(1-p_0)\}[G\{F^{-1}(1-q_0)\}-(1-p_0)]^2+\int^{F^{-1}(1-q_0)}_{G^{-1}(1-p_0)}[G\{F^{-1}(1-q_0)\}-G(t)]^2dF(t)\nonumber\\ &-\left\{\int^{F^{-1}(1-q_0)}_{G^{-1}(1-p_0)}F(t)dG(t)\right\}^2\nonumber, \end{align} \] and \[ \begin{align} \sigma^2_6=&[1-q_0-F\{G^{-1}(1-p_0)\}]^2(1-p_0)\nonumber+ \int^{F^{-1}(1-q_0)}_{G^{-1}(1-p_0)}\{1-q_0-F(t)\}^2dG(t) \\ \nonumber & -\left\{\int^{F^{-1}(1-q_0)}_{G^{-1}(1-p_0)}G(t)dF(t)\right\}^2\nonumber. \end{align} \]
This packages contains following functions:
tproc.est
proc
'WM'
, 'expect'
and 'jackknife'
.
proc.est
'WM'
, 'expect'
and 'jackknife'
.
proc.ci
'WM'
, 'expect'
and 'jackknife'
.
podc
'WM'
, 'expect'
and 'jackknife'
.
podc.est
'WM'
, 'expect'
and 'jackknife'
.
podc.ci
'WM'
, 'expect'
and 'jackknife'
.
The purpose of this section is to show users the basic usage of this package. We will briefly go through main functions, see what they can do and have a look at outputs. An detailed example of complete procedures of estimation and inference will be presented to give users a general sense of the pakcage.
First, we load tpAUC
package:
library(tpAUC)
Then, we estimate two-way partial AUC with date from package pROC
.
library('pROC')
data(aSAH)
tproc.est(aSAH$outcome, aSAH$s100b, threshold=c(0.8,0.2) )
## Warning in tproc.est(aSAH$outcome, aSAH$s100b, threshold = c(0.8, 0.2)):
## response levels are not 0/1, the first level is defaultly regarded as
## negative.
## [1] 0.4186992
#estimate two-way partial AUC
tproc.est
returns an estimate of two-way partial AUC.
Then, we turn to FPR partial AUC.
proc.est(aSAH$outcome, aSAH$s100b, method='expect',threshold=0.8 )
## Warning in proc.est(aSAH$outcome, aSAH$s100b, method = "expect", threshold
## = 0.8): response levels are not 0/1, the first level is defaultly regarded
## as negative.
## [1] 0.548103
# use method 'expect' to estimate partial AUC
proc.ci(aSAH$outcome,aSAH$s100b, cp=0.95 ,threshold=0.8,method='expect')
## Warning in proc.ci(aSAH$outcome, aSAH$s100b, cp = 0.95, threshold = 0.8, :
## response levels are not 0/1, the first level is defaultly regarded as
## negative.
## Warning in proc.est(response, predictor, threshold = threshold, method
## = "expect", : response levels are not 0/1, the first level is defaultly
## regarded as negative.
## 2.5 % 97.5 %
## [1,] 0.4528877 0.6433182
# use method 'expect' to infer partial AUC
Alternatively, we can use proc
to do both estimation and inference simultaneously.
proc(aSAH$outcome,aSAH$s100b,threshold=0.8, method='expect',ci=TRUE, cp=0.95 )
## Warning in proc.est(response, predictor, threshold = threshold, method
## = method, : response levels are not 0/1, the first level is defaultly
## regarded as negative.
## Warning in proc.ci(response, predictor, cp = cp, threshold = threshold, :
## response levels are not 0/1, the first level is defaultly regarded as
## negative.
## Warning in proc.est(response, predictor, threshold = threshold, method
## = "expect", : response levels are not 0/1, the first level is defaultly
## regarded as negative.
## $pauc
## [1] 0.548103
##
## $ci
## 2.5 % 97.5 %
## [1,] 0.4528877 0.6433182
# set ci=TRUE to get confidence interval
Similar procedures on FNR partial ODC are as follows.
podc.est(aSAH$outcome, aSAH$s100b, method='expect',threshold=0.8)
## Warning in podc.est(aSAH$outcome, aSAH$s100b, method = "expect", threshold
## = 0.8): response levels are not 0/1, the first level is defaultly regarded
## as negative.
## [1] 0.5195122
# estimate FNR partial ODC with method 'expect'
podc.ci(aSAH$outcome, aSAH$s100b, method='expect',threshold=0.8, cp=0.97)
## Warning in podc.ci(aSAH$outcome, aSAH$s100b, method = "expect", threshold
## = 0.8, : response levels are not 0/1, the first level is defaultly regarded
## as negative.
## Warning in podc.est(response, predictor, threshold = threshold, method
## = "expect", : response levels are not 0/1, the first level is defaultly
## regarded as negative.
## 1.5 % 98.5 %
## [1,] 0.403401 0.6356234
# infer FNR partial ODC with method 'expect'
podc
aggregates the functions of podc.est
and podc.ci
.
podc(aSAH$outcome, aSAH$s100b,threshold=0.8, method='expect',ci=TRUE, cp=0.97)
## Warning in podc.est(response, predictor, threshold = threshold, method
## = method, : response levels are not 0/1, the first level is defaultly
## regarded as negative.
## Warning in podc.ci(response, predictor, cp = cp, threshold = threshold, :
## response levels are not 0/1, the first level is defaultly regarded as
## negative.
## Warning in podc.est(response, predictor, threshold = threshold, method
## = "expect", : response levels are not 0/1, the first level is defaultly
## regarded as negative.
## $podc
## [1] 0.5195122
##
## $ci
## 1.5 % 98.5 %
## [1,] 0.403401 0.6356234
# inference and estimation