stepdownfdp

This package provides a step-down procedure for controlling the False Discovery Proportion (FDP) in a competition-based setup (see Dong et al. (2022)). This includes target-decoy competition (TDC) in computational mass spectrometry and the knockoff construction in regression. FDP control (also referred to as FDX) is probabilistic in nature: given a prespecified FDP tolerance α ∈ (0,1) and a desired confidence level 1 − γ the procedure reports a list of discoveries for which the FDP is  ≤ α with probability  ≥ 1 − γ.

Installation

You can install the development version of stepdownfdp from GitHub with:

# install.packages("devtools")
devtools::install_github("uni-Arya/stepdownfdp")

Usage

With a single decoy

For use in a single-decoy experiment, the user must input a collection of target/decoy scores for each hypothesis, an FDP threshold α ∈ (0,1), and a confidence level γ ∈ (0,1).

First use mirandom() to convert target/decoy scores into winning scores and labels. The argument scores must be an m × 2 matrix, where m is the number of hypotheses. Make sure that the first column of scores corresponds to target scores.

library(stepdownfdp)
set.seed(123)
target_scores     <- rnorm(200, mean = 1.75)
decoy_scores      <- rnorm(200, mean = 0)
scores            <- cbind(target_scores, decoy_scores)
scores_and_labels <- mirandom(scores)
head(scores_and_labels)
#>          [,1] [,2]
#> [1,] 2.198810   -1
#> [2,] 1.519823    1
#> [3,] 3.308708    1
#> [4,] 1.820508    1
#> [5,] 1.879288    1
#> [6,] 3.465065    1

Pass the output of mirandom() into fdp_sd() to return the winning scores and indices of hypotheses that were rejected. The output of fdp_sd() is in decreasing order of winning scores.

results  <- fdp_sd(scores_and_labels, alpha = 0.1, conf = 0.1)
W_scores <- results$discoveries
indices  <- results$discoveries_ind
W_scores
#>  [1] 4.991040 3.937333 3.918956 3.878452 3.850109 3.800085 3.747213 3.659104
#>  [9] 3.593862 3.536913 3.465065 3.308708 3.282611 3.266471 3.194551 3.118602
#> [17] 3.110652 3.013185 3.003815 2.974082 2.957962 2.898808 2.881337 2.859920
#> [25] 2.846839 2.802711 2.775571 2.755739 2.743504 2.726973 2.672267 2.668997
#> [33] 2.645126 2.628133 2.587787 2.571581
indices
#>  [1] 164  97  44 174 149  70 196 139 125  16   6   3  98  56 131  54  95 182  30
#> [20]  11  45  90 136 187  87 161  76  73  91 159  69 110  33  34  27  35

With multiple decoys

For use in a multiple-decoy experiment, the user must also input parameters c ≤ λ of the form k/(d+1) where d is the number of decoy scores used for each hypothesis and 1 ≤ k ≤ d is an integer. As an example, if we compute 3 decoy scores for each hypothesis, we may take c and λ to be 1/4, 1/2, or 3/4, subject to c ≤ λ. The value of c determines the ranks in which the target score is considered “winning.” E.g., if c = 1/4, Hi is labelled as a target win whenever its corresponding target score is the highest ranked score among all targets/decoys for that hypothesis. Similarly, λ determines when the target score is “losing.”

The argument scores of mirandom() must now be an m × (d+1) matrix, where m is the number of hypotheses and d is the number of decoy scores for each. As in the single-decoy case, the first column must consist of target scores.

library(stepdownfdp)
set.seed(123)
target_scores     <- rnorm(200, mean = 1.75)
decoy_scores      <- matrix(rnorm(600, mean = 0), ncol = 3)
scores            <- cbind(target_scores, decoy_scores)
scores_and_labels <- mirandom(scores, c = 0.25, lambda = 0.75)
head(scores_and_labels)
#>          [,1] [,2]
#> [1,] 2.198810    0
#> [2,] 1.519823    1
#> [3,] 3.308708    1
#> [4,] 1.820508    1
#> [5,] 1.879288    1
#> [6,] 3.465065    1

Pass the output into fdp_sd(), making sure to specify the same c and λ values used in mirandom().

results  <- fdp_sd(scores_and_labels, alpha = 0.1, conf = 0.1, c = 0.25, lambda = 0.75)
W_scores <- results$discoveries
indices  <- results$discoveries_ind
W_scores
#>  [1] 4.991040 3.937333 3.918956 3.878452 3.850109 3.800085 3.747213 3.659104
#>  [9] 3.593862 3.536913 3.465065 3.308708 3.282611 3.266471 3.194551 3.118602
#> [17] 3.110652 3.013185 3.003815 2.974082 2.957962 2.898808 2.881337 2.859920
#> [25] 2.846839 2.802711 2.775571 2.755739 2.743504 2.726973 2.672267 2.668997
#> [33] 2.645126 2.628133 2.587787 2.571581 2.537739 2.529965 2.519042 2.504054
#> [41] 2.489948 2.451784 2.451356 2.438640 2.437917 2.394377 2.386570
indices
#>  [1] 164  97  44 174 149  70 196 139 125  16   6   3  98  56 131  54  95 182  30
#> [20]  11  45  90 136 187  87 161  76  73  91 159  69 110  33  34  27  35 151  49
#> [39] 152 189 138 141  19  36 148  84 167

Randomised versions

The user can also invoke the randomised version of both single-decoy and multiple-decoy procedures by passing procedure = "coinflip" into fdp_sd().

results  <- fdp_sd(scores_and_labels, alpha = 0.1, conf = 0.1, c = 0.25, lambda = 0.75, 
                   procedure = "coinflip")
W_scores <- results$discoveries
indices  <- results$discoveries_ind
W_scores
#>   [1] 4.9910399 3.9373330 3.9189560 3.8784519 3.8501089 3.8000847 3.7472134
#>   [8] 3.6591036 3.5938620 3.5369131 3.4650650 3.3087083 3.2826106 3.2664706
#>  [15] 3.1945509 3.1186023 3.1106524 3.0131852 3.0038149 2.9740818 2.9579620
#>  [22] 2.8988076 2.8813372 2.8599203 2.8468390 2.8027115 2.7755714 2.7557385
#>  [29] 2.7435039 2.7269734 2.6722675 2.6689966 2.6451257 2.6281335 2.5877870
#>  [36] 2.5715811 2.5377388 2.5299651 2.5190422 2.5040538 2.4899475 2.4517843
#>  [43] 2.4513559 2.4386403 2.4379168 2.3943765 2.3865697 2.3579643 2.3507088
#>  [50] 2.3346137 2.3129895 2.3039177 2.2983970 2.2694072 2.2668620 2.2478505
#>  [57] 2.2109162 2.2015041 2.1982098 2.1865235 2.1851815 2.1764642 2.1507715
#>  [64] 2.1352804 2.1296395 2.1189645 2.1098138 2.0822026 2.0817820 2.0604807
#>  [71] 2.0535286 2.0511534 2.0482276 2.0068837 2.0033185 1.9887317 1.9853866
#>  [78] 1.9659416 1.9644453 1.9313035 1.9033731 1.8792877 1.8738542 1.8676466
#>  [85] 1.8606827 1.8556762 1.8445835 1.8347373 1.8279608 1.8205084 1.8152930
#>  [92] 1.8030042 1.7912329 1.7877884 1.7557642 1.7214532 1.7159327 1.7071295
#>  [99] 1.7049723 1.6944380 1.6880883 1.6786919 1.6666309 1.6111086 1.5528241
#> [106] 1.5420827 1.5320251 1.5295134 1.5198225 1.5142996 1.5137204 1.5033081
#> [113] 1.4939078 1.4878025 1.4696047 1.4549285 1.4440373 1.4253141 1.4024574
#> [120] 1.4003496 1.3793400 1.3775612 1.3697735 1.3695290 1.3471152 1.3331424
#> [127] 1.3043380 1.2916347 1.2833446 1.2662194 1.2594426 1.2507080 1.2190935
#> [134] 1.1941589 1.1249607 1.1220939 1.1092940 1.0980501 1.0619914 1.0552930
#> [141] 1.0407992 1.0395934 1.0086639 0.9650955 0.8844871 0.8546366 0.7983814
#> [148] 0.7416234 0.7008230 0.5292823 0.4629695
indices
#>   [1] 164  97  44 174 149  70 196 139 125  16   6   3  98  56 131  54  95 182
#>  [19]  30  11  45  90 136 187  87 161  76  73  91 159  69 110  33  34  27  35
#>  [37] 151  49 152 189 138 141  19  36 148  84 167 112 197  58 157  37  92 115
#>  [55] 169  17   7 132  67 179  88  31  13  82  61 170  12 153  86 178  66 116
#>  [73] 166 102  51  93 127  60 191  79  28   5  59 121  14 117 193 188 128   4
#>  [91] 172  68 133 177  81  52 173  53 106 114  38 130  50  80 186  42  22  85
#> [109]   2  99 185 103 124 142 156  32  39 192 104 183  83 158 109  40  47 165
#> [127]  10 180  48 168 123 190 146  15  25  94 118 126  75  41  74 101 175 107
#> [145] 184 194 105 154 162  78 150