robin

ROBustness In Network

Valeria Policastro

2022-05-16

robin

In network analysis, many community detection algorithms have been developed. However,their applications leave unaddressed one important question: the statistical validation of the results.

robin (ROBustness in Network) has a double aim: tests the robustness of a community detection algorithm to detect if the community structure found is statistically significant and compares two detection algorithms to choose the one that better fits the network of interest.

Reference in Policastro V., Righelli D., Carissimo A., Cutillo L., De Feis I. (2021) https://journal.r-project.org/archive/2021/RJ-2021-040/index.html.

It provides:
1) a procedure to examine the robustness of a community detection algorithm against a random graph;
2) a procedure to choose among different community detection algorithms the one that better fits the network of interest;
3) two tests to determine the statistical difference between the curves;
4) a graphical interactive representation.

Installation

#install.packages("robin")

If there are problems with the installation try:

# if (!requireNamespace("BiocManager", quietly = TRUE))
#     install.packages("BiocManager")
# BiocManager::install("gprege")
# 
# install.packages("robin")

Loading package

library("robin")

Input

prepGraph function creates an igraph object from the input file. This step is necessary for robin execution

The unweighted graph can be read from different format: edgelist, pajek, graphml, gml, ncol, lgl, dimacs, graphdb and igraph graphs.

my_network <- system.file("example/football.gml", package="robin")
# downloaded from: http://www-personal.umich.edu/~mejn/netdata/
graph <- prepGraph(file=my_network, file.format="gml")
graph
## IGRAPH 3253399 U--- 115 613 -- 
## + attr: id (v/n), label (v/c), value (v/n)
## + edges from 3253399:
##  [1] 1--  2 1--  5 1-- 10 1-- 17 1-- 24 1-- 34 1-- 36 1-- 42 1-- 66 1-- 91
## [11] 1-- 94 1--105 2-- 26 2-- 28 2-- 34 2-- 38 2-- 46 2-- 58 2-- 90 2--102
## [21] 2--104 2--106 2--110 3--  4 3--  7 3-- 14 3-- 15 3-- 16 3-- 48 3-- 61
## [31] 3-- 65 3-- 73 3-- 75 3--101 3--107 4--  6 4-- 12 4-- 27 4-- 41 4-- 53
## [41] 4-- 59 4-- 73 4-- 75 4-- 82 4-- 85 4--103 5--  6 5-- 10 5-- 17 5-- 24
## [51] 5-- 29 5-- 42 5-- 70 5-- 94 5--105 5--109 6-- 11 6-- 12 6-- 53 6-- 75
## [61] 6-- 82 6-- 85 6-- 91 6-- 98 6-- 99 6--108 7--  8 7-- 33 7-- 40 7-- 48
## [71] 7-- 56 7-- 59 7-- 61 7-- 65 7-- 86 7--101 7--107 8--  9 8-- 22 8-- 23
## + ... omitted several edges

Network visualization

plotGraph function offers a graphical representation of the network with the aid of networkD3 package.

plotGraph(graph)

Community detection

methodCommunity function detects communities using all the algorithms implemented in igraph package: “walktrap”, “edgeBetweenness”, “fastGreedy”, “spinglass”, “leadingEigen”, “labelProp”, “infomap”, “optimal”, “other”.

methodCommunity(graph=graph, method="fastGreedy") 
## IGRAPH clustering fast greedy, groups: 6, mod: 0.55
## + groups:
##   $`1`
##    [1]   7  14  16  33  40  48  61  65 101 107
##   
##   $`2`
##    [1]   8   9  10  17  22  23  24  42  47  50  52  54  68  69  74  78  79  89
##   [19] 105 109 111 112 115
##   
##   $`3`
##    [1]   1   2  20  26  30  31  34  36  38  46  56  80  81  83  90  94  95 102
##   [19] 104 106 110
##   + ... omitted several groups/vertices

membershipCommunities function detects the community membership.

membershipCommunities(graph=graph, method="fastGreedy") 
##   [1] 3 3 5 5 5 5 1 2 2 2 5 5 4 1 4 1 2 6 4 3 6 2 2 2 5 3 4 6 5 3 3 4 1 3 4 3 6
##  [38] 3 4 1 5 2 6 4 6 3 2 1 6 2 5 2 5 2 4 3 6 6 6 6 1 4 6 6 1 6 6 2 2 5 6 4 5 2
##  [75] 5 6 6 2 2 3 3 5 3 5 5 4 6 6 2 3 5 6 6 3 3 6 6 6 5 4 1 3 5 3 2 3 1 5 2 3 2
## [112] 2 6 6 2

Community visualization

plotComm function produces an interactive 3D plot of the communites detected by the chosen algorithm.

members <- membershipCommunities(graph=graph, method="fastGreedy")
plotComm(graph=graph, members=members)

Robustness of a community detection algorithm

Null model

robin offers two choices for the null model:

  1. it can be generated by using the function random

  2. it can be built externally and passed directly to the argument graphRandom of the robinRobust function.

The function random creates a random graph with the same degree distribution of the original graph, but with completely random edges. The graph argument must be the same returned by prepGraph function.

graphRandom <- random(graph=graph)
graphRandom
## IGRAPH 32769a3 U--- 115 613 -- 
## + attr: id (v/n), label (v/c), value (v/n)
## + edges from 32769a3:
##  [1]   1-- 65   1-- 88  10--114   1-- 41   1--  6   1-- 34   1-- 36  42-- 91
##  [9]  52-- 66  52-- 88  34-- 94  54-- 60   2-- 26  75-- 92   2-- 34  39-- 42
## [17]   2-- 75   2-- 58   2-- 94   2--105  82--104  87--106 105--110  61-- 87
## [25]  58-- 84   3-- 63   3-- 15  30-- 53   3-- 51   3-- 61   3-- 42   3-- 73
## [33]   3-- 26  56-- 97   3-- 91   4-- 79  42--109   4--112   4-- 41   4-- 83
## [41]  55-- 59  85--110   3--  4   4--115  85--101   4--103   5--108   5--  8
## [49]   5--102   5-- 16  25-- 29   5-- 76   5-- 38  40-- 71   5-- 59   5-- 73
## [57]   6-- 11  12-- 32  70--112  10-- 49   6-- 82   6--107  91-- 99   6-- 98
## + ... omitted several edges

robinRobust

robinRobust function implements the validation of the community robustness. In this example we used “vi” distance as stability measure, “independent” type procedure and “louvain” as community detection algorithm.

Users can choose also different measures as: “nmi”,“split.join”, “adjusted.rand”.

The graph argument must be the one returned by prepGraph function. The graphRandom must be the one returned by random function or own random graph.

proc <- robinRobust(graph=graph, graphRandom=graphRandom, measure="vi", 
                  method="louvain", type="independent")
## Detected robin method  independent  type
## Perturbed  31  edges
## Perturbed  61  edges
## Perturbed  92  edges
## Perturbed  123  edges
## Perturbed  153  edges
## Perturbed  184  edges
## Perturbed  215  edges
## Perturbed  245  edges
## Perturbed  276  edges
## Perturbed  306  edges
## Perturbed  337  edges
## Perturbed  368  edges

As output robinRobust will give all the measures at different level of perturbation from 0% to 60% for the real and random graph.

plotRobin function plots the curves. The (x,y)-axes represents the percentage of perturbation and the average of the stability measure, respectively. The arguments of model1 and model2 must be the measures for the real graph and the random graph that are the outputs of the robinRobust function.

We will expect that with a robust algorithm the behavior of the two curves is different. We expect that the curve of the real graph vary less than the curve of the random graph, this visually means that the curve of the real graph is lower than the one of the random graph, so it is more stable than a random graph.

plotRobin(graph=graph, model1=proc$Mean, model2=proc$MeanRandom,
legend=c("real data", "null model"))


The procedure implemented depends on the network of interest. In this example, the Louvain algorithm fits good the network of interest,as the curve of the stability measure assumes lower values than the one obtained by the null model.

Statistical tests

The differences between the stability measure curves are tested using:

  1. Functional Data Analysis (FDA);

  2. Gaussian Process (GP).

Moreover to quantify the differences between the curves when they are very close the Area Under the Curves (AUC) are evaluated.

robinFDATest function implements a test giving a p-value for different intervals of the curves. It tests in which interval the two curves are different.

robinFDATest(graph=graph, model1=proc$Mean, model2=proc$MeanRandom, 
            legend=c("real graph", "random graph"))
## [1] "First step: basis expansion"
## Swapping 'y' and 'argvals', because 'y' is  simpler,
##   and 'argvals' should be;  now  dim(argvals) =  13 ;  dim(y) =  13 x 20 
## [1] "Second step: joint univariate tests"
## [1] "Third step: interval-wise combination and correction"
## [1] "creating the p-value matrix: end of row 2 out of 9"
## [1] "creating the p-value matrix: end of row 3 out of 9"
## [1] "creating the p-value matrix: end of row 4 out of 9"
## [1] "creating the p-value matrix: end of row 5 out of 9"
## [1] "creating the p-value matrix: end of row 6 out of 9"
## [1] "creating the p-value matrix: end of row 7 out of 9"
## [1] "creating the p-value matrix: end of row 8 out of 9"
## [1] "creating the p-value matrix: end of row 9 out of 9"
## [1] "Interval Testing Procedure completed"

## TableGrob (1 x 2) "arrange": 2 grobs
##   z     cells    name           grob
## 1 1 (1-1,1-1) arrange gtable[layout]
## 2 2 (1-1,2-2) arrange gtable[layout]
## $adj.pvalue
## [1] 0.2293 0.0080 0.0048 0.0022 0.0021 0.0026 0.0031 0.0040 0.0055
## 
## $pvalues
## [1] 0.2293 0.0020 0.0020 0.0020 0.0020 0.0020 0.0031 0.0040 0.0020

The first figure represents the stability measure plot using Louvain algorithm for detecting communities. The second one represents the corresponding p-values and adjusted p-values of the Interval Testing procedure. Horizontal red line corresponds to the critical value 0.05.

robinGPTest function implements the GP testing.

robinGPTest(model1=proc$Mean, model2=proc$MeanRandom)
##  Profile  1 
##  Profile  2
## [1] 139.2966

It tests the two curves globally. The null hypothesis claims that the two curves come from the same process, the alternative hypothesis that they come from two different processes. The output is the Bayes Factor. One of the most common interpretations is the one proposed by Harold Jeffereys (1961) and slightly modified by Lee and Wagenmakers in 2013:
IF B10 IS… THEN YOU HAVE…
° > 100 Extreme evidence for H1
° 30 – 100 Very strong evidence for H1
° 10 – 30 Strong evidence for H1
° 3 – 10 Moderate evidence for H1
° 1 – 3 Anecdotal evidence for H0
° 1 No evidence
° 1/3 – 1 Anecdotal evidence for H0
° 1/3 – 1/10 Moderate evidence for H0
° 1/10 – 1/30 Strong evidence for H0
° 1/30 – 1/100 Very strong evidence for H0
° < 1/100 Extreme evidence for H0

robinAUC function implements the AUC.

robinAUC(graph=graph, model1=proc$Mean, model2=proc$MeanRandom)
## $area1
## [1] 0.1210938
## 
## $area2
## [1] 0.2602397

The outputs are the area under the two curves.

Comparison of two community detection algorithms

In this example we want to compare the “Fast Greedy” and the “Louvain” algorithms to see which is the best algorithm.

We firstly plot the communities detected by both algorithms.

membersFast <- membershipCommunities(graph=graph, method="fastGreedy")
membersLouv <- membershipCommunities(graph=graph, method="louvain")
plotComm(graph=graph, members=membersFast)
plotComm(graph=graph, members=membersLouv)

Secondly, we compare them with robinCompare function.

robinCompare function compares two detection algorithms on the same network to choose the one that better fits the network of interest.

comp <- robinCompare(graph=graph, method1="fastGreedy",
                method2="louvain", measure="vi", type="independent")
## Detected robin method  independent  type
## Perturbed  31  edges
## Perturbed  61  edges
## Perturbed  92  edges
## Perturbed  123  edges
## Perturbed  153  edges
## Perturbed  184  edges
## Perturbed  215  edges
## Perturbed  245  edges
## Perturbed  276  edges
## Perturbed  306  edges
## Perturbed  337  edges
## Perturbed  368  edges

A parallelized and faster version of this function is robinCompareFast function. (we recommend to use it mostly with big size networks)

Thirdly, we plot the curves of the compared methods.

plotRobin(graph=graph, model1=comp$Mean1, model2=comp$Mean2, 
legend=c("fastGreedy", "louvain"), title="FastGreedy vs Louvain")

In this example, the Louvain algorithm fits better the network of interest, as the curve of the stability measure assumes lower values than the one obtained by the Fast greedy method.

Fourthly we test the statistical differences between these two curves that now are created on two different community detection algorithm. The tests are already explained with more detail above.

robinFDATest(graph=graph, model1=comp$Mean1, model2=comp$Mean2)
## [1] "First step: basis expansion"
## Swapping 'y' and 'argvals', because 'y' is  simpler,
##   and 'argvals' should be;  now  dim(argvals) =  13 ;  dim(y) =  13 x 20 
## [1] "Second step: joint univariate tests"
## [1] "Third step: interval-wise combination and correction"
## [1] "creating the p-value matrix: end of row 2 out of 9"
## [1] "creating the p-value matrix: end of row 3 out of 9"
## [1] "creating the p-value matrix: end of row 4 out of 9"
## [1] "creating the p-value matrix: end of row 5 out of 9"
## [1] "creating the p-value matrix: end of row 6 out of 9"
## [1] "creating the p-value matrix: end of row 7 out of 9"
## [1] "creating the p-value matrix: end of row 8 out of 9"
## [1] "creating the p-value matrix: end of row 9 out of 9"
## [1] "Interval Testing Procedure completed"

## TableGrob (1 x 2) "arrange": 2 grobs
##   z     cells    name           grob
## 1 1 (1-1,1-1) arrange gtable[layout]
## 2 2 (1-1,2-2) arrange gtable[layout]
## $adj.pvalue
## [1] 0.4789 0.0545 0.0092 0.0043 0.0076 0.0550 0.5236 0.1572 0.2286
## 
## $pvalues
## [1] 0.4789 0.0216 0.0027 0.0015 0.0015 0.0081 0.5236 0.0595 0.1345
robinGPTest(model1=comp$Mean1, model2=comp$Mean2)
##  Profile  1 
##  Profile  2
## [1] 78.4532
robinAUC(graph=graph, model1=comp$Mean1, model2=comp$Mean2)
## $area1
## [1] 0.1682709
## 
## $area2
## [1] 0.1220478