Getting Started with netjack

Teague Henry

2019-07-07

This vignette provides an introduction to the netjack package and overviews common data input and analysis pipelines. For a tutorial about creating custom network functions and network statistics see the “Custom Functions in netjack” vignette.

Introduction

Samples of registered networks, or networks that consist of the same node set, are increasingly common in a variety of scientific fields. The netjack package implements a framework that lets researchers quickly manipulate and analyze large samples of registered networks, as well as develop custom functionality that builds on the existing netjack framework.

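In this vignette, we cover the basic data objects and function classes, data input, network manipulation functions, network statistic functions, and difference, group, and group difference testing.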

Basic Data Objects and Function Classes

netjack is built around a series of S4 classes that represent different levels of network manipulation, and functions that act on each level of network manipulation. This section describes these classes and functions at a summary level.

The most basic data object is the Net object. This represents a single network, along with node level variables, such as partition assignments.

A NetSample object represents a collection of Net objects, along with network level variables. For example, if each network in the sample represents a single individual’s functional brain network, a network level variable could be the diagnostic status of each individual.

The Net and NetSample objects are representations of raw data. To work with these data objects, the net_apply function is used to apply a network manipulation function. This class of functions takes a single network, performs a series of manipulations on it, and returns the manipulated networks. For example, the node_jackknife() function applied to a Net object returns a set of Net objects, one per node, each corresponding to the original network with that node removed.
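To make the idea of a node jackknife concrete, here is a minimal base-R sketch that works on a plain adjacency matrix rather than a Net object. The helper `jackknife_nodes()` is hypothetical and is not part of netjack; it only illustrates the manipulation that node_jackknife() is described as performing.

```r
# Hypothetical helper (not a netjack function): remove each node in turn
# from an adjacency matrix, returning one reduced matrix per node.
jackknife_nodes <- function(adj) {
  # adj[-i, -i] drops row i and column i together, i.e. node i and its edges
  lapply(seq_len(nrow(adj)), function(i) adj[-i, -i, drop = FALSE])
}

# A small symmetric 4-node example network
adj <- matrix(c(0, 1, 1, 0,
                1, 0, 1, 1,
                1, 1, 0, 1,
                0, 1, 1, 0), nrow = 4, byrow = TRUE)
jk <- jackknife_nodes(adj)
length(jk)    # 4 reduced networks, one per removed node
dim(jk[[1]])  # each reduced network is 3 x 3
```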

To represent the output of net_apply, the S4 classes NetSet and NetSampleSet are used. These classes store both the original Net or NetSample and the products of net_apply.

A common step in network analysis is the calculation of network statistics. In netjack, network statistic functions are applied to NetSet and NetSampleSet objects with the net_stat_apply function, which returns a NetStatSet or NetSampleStatSet object, respectively.

netjack implements several statistical testing procedures, described below. Additionally, the to_data_frame() function extracts the calculated network statistics from a NetStatSet or NetSampleStatSet object as a data.frame.

Data Input

To illustrate the various features of netjack, two simulated datasets are provided, GroupA and GroupB. Networks can be loaded into the netjack framework from adjacency matrices, either as single Net objects, or more commonly as one NetSample object.

library(netjack)
data("GroupA")

Subject1 <- as_Net(GroupA[[1]], "Subject1")
show(Subject1)
## Net 
##  Net Name:  Subject1 
##  Node Size:  20 
##  Node variables:  index

Node variables can be assigned during construction as a named list:

Subject2 <- as_Net(GroupA[[2]], "Subject2", node.variables = list(community = c(rep(1,10), rep(2,10))))

show(Subject2)
## Net 
##  Net Name:  Subject2 
##  Node Size:  20 
##  Node variables:  index community

Typically, a researcher using netjack is analyzing a sample of registered networks rather than a single network. NetSample objects can be constructed in much the same way as a Net object can, using lists of adjacency matrices rather than a single matrix:

GroupASamp = as_NetSample(GroupA,
                          net.names = as.character(1:20),
                          node.variables = list(community = c(rep(1,10), rep(2,10))),
                          sample.variables = list(group = rep(1, 20)))

show(GroupASamp)
## Net Sample 
##  Net Names:  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 
##  Sample Variables:  group orig.net

Importantly, when a NetSample object is created, the list of node variables is applied to every network. This is appropriate for registered networks: in neuroimaging, for example, each node represents a specific brain region, and the same set of regions is used for every subject.
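as_NetSample() takes a list of adjacency matrices over the same node set. If you want to try the pipeline on data other than GroupA, the following base-R sketch simulates such a list and checks that the networks are registered (same dimensions) and undirected. The simulation settings (20 nodes, 5 subjects, edge probability 0.3) and the helper `sim_net()` are arbitrary assumptions for illustration only.

```r
set.seed(1)
n_nodes <- 20
n_subjects <- 5

# Hypothetical helper: one undirected, unweighted random network
sim_net <- function(n, p = 0.3) {
  m <- matrix(0, n, n)
  upper <- upper.tri(m)
  m[upper] <- rbinom(sum(upper), 1, p)
  m + t(m)  # mirror the upper triangle; the diagonal stays 0
}
nets <- replicate(n_subjects, sim_net(n_nodes), simplify = FALSE)

# One node-level variable shared by every network, as in the example above
community <- c(rep(1, 10), rep(2, 10))

# Registered networks must all have the same node set (same dimensions)
stopifnot(all(vapply(nets, function(m) all(dim(m) == c(n_nodes, n_nodes)),
                     logical(1))))
# nets and community could then be passed to as_NetSample() as shown above
```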

Sample variables represent network level characteristics. For example, if each network represents a functional connectivity network from a neuroimaging study, a sample variable might be the diagnostic status of a particular individual.

Network Manipulation Functions

Once a sample of networks is represented as a NetSample object, a network manipulation function can be applied. As described previously, these functions change a network in some way. As an example, the node_jackknife function returns a set of networks, where each node has been removed in turn.

Network manipulation functions can be applied via net_apply to a Net object to produce a NetSet, or to a NetSample object to produce a NetSampleSet:

Sub1Jackknifed <- net_apply(network = Subject1, net.function = "node_jackknife")

show(Sub1Jackknifed)
## Net Set
## Applied Function: "node_jackknife"
## Function Arguments: 
## Original Network Name: Subject1
## Contains 20 manipulated networks.
GroupAJackknifed <- net_apply(network = GroupASamp, net.function = "node_jackknife")

show(GroupAJackknifed)
## Net Sample Set
## Applied Function: "node_jackknife"
## Function Arguments: 
## Contains 20 NetSets
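Conceptually, applying a manipulation to a sample yields one set of manipulated networks per original network, which is why the NetSampleSet above reports 20 NetSets. A base-R analogue is a nested `lapply()`. This is a sketch of the data shape only, not of netjack internals; `jackknife_nodes()` is the same hypothetical helper as before, redefined so the block runs on its own, and the 3 random 6-node networks are simulated.

```r
# Hypothetical stand-in for node_jackknife (not a netjack function)
jackknife_nodes <- function(adj) {
  lapply(seq_len(nrow(adj)), function(i) adj[-i, -i, drop = FALSE])
}

set.seed(2)
# Three simulated undirected 6-node networks
sample_nets <- replicate(3, {
  m <- matrix(rbinom(36, 1, 0.5), 6, 6)
  m[lower.tri(m)] <- t(m)[lower.tri(m)]  # copy upper triangle to lower
  diag(m) <- 0
  m
}, simplify = FALSE)

# One "set" of jackknifed networks per original network
sample_set <- lapply(sample_nets, jackknife_nodes)
length(sample_set)   # 3 sets, one per original network
lengths(sample_set)  # each set holds 6 jackknifed networks
```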

Network manipulation functions that use node-level variables receive them through the net.function.args argument of net_apply. For example, network_jackknife removes sub-networks on the basis of a node-level grouping variable:

GroupANetJackknifed <- net_apply(GroupASamp, net.function = "network_jackknife", net.function.args = list(network.variable = "community"))

show(GroupANetJackknifed)
## Net Sample Set
## Applied Function: "network_jackknife"
## Function Arguments: 
##  network.variable = community
## Contains 20 NetSets
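Assuming network_jackknife removes all nodes that share a value of the grouping variable, one group at a time (our reading of the description above, not a confirmed implementation detail), the manipulation on a single adjacency matrix can be sketched in base R. `jackknife_groups()` is a hypothetical helper.

```r
# Hypothetical helper: drop every node belonging to each group in turn
jackknife_groups <- function(adj, groups) {
  levs <- unique(groups)
  out <- lapply(levs, function(g) {
    keep <- groups != g
    adj[keep, keep, drop = FALSE]
  })
  names(out) <- as.character(levs)  # name each result by the removed group
  out
}

adj <- matrix(1, 6, 6)
diag(adj) <- 0                      # complete graph on 6 nodes
community <- c(1, 1, 1, 2, 2, 2)
res <- jackknife_groups(adj, community)
names(res)         # "1" "2"
sapply(res, nrow)  # 3 nodes remain after removing either community
```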

Once a network manipulation function has been applied, network statistics can be computed.

Network Statistic Functions

A network statistic is a single numerical summary of some aspect of a network’s structure or topology. netjack focuses on the analysis of networks at a network statistic level, and provides simple interfaces for calculating network statistics on collections of networks.
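Global efficiency, used throughout this vignette, is a typical example. For an unweighted, undirected network it is usually defined as the average of 1/d(i, j) over all ordered pairs of distinct nodes, where d(i, j) is the shortest path length and unreachable pairs contribute 0. The base-R sketch below implements that standard definition with breadth-first search; we have not checked how netjack's own global_efficiency is implemented (for example, whether it handles weighted edges), so treat `global_eff_sketch()` as a hypothetical illustration.

```r
# Global efficiency of an unweighted, undirected network (standard definition;
# not necessarily how netjack computes it).
global_eff_sketch <- function(adj) {
  n <- nrow(adj)
  if (n < 2) return(0)
  total <- 0
  for (s in seq_len(n)) {
    # Breadth-first search gives shortest path lengths from node s
    dist <- rep(Inf, n)
    dist[s] <- 0
    queue <- s
    while (length(queue) > 0) {
      v <- queue[1]
      queue <- queue[-1]
      nbrs <- which(adj[v, ] != 0 & is.infinite(dist))
      dist[nbrs] <- dist[v] + 1
      queue <- c(queue, nbrs)
    }
    # Unreachable nodes keep dist = Inf, and 1 / Inf is 0 in R
    total <- total + sum(1 / dist[-s])
  }
  total / (n * (n - 1))
}

# Path graph 1 - 2 - 3: distances 1, 2, 1 for the three unordered pairs
path3 <- matrix(c(0, 1, 0,
                  1, 0, 1,
                  0, 1, 0), nrow = 3, byrow = TRUE)
global_eff_sketch(path3)  # 0.8333333, i.e. 2 * (1 + 1/2 + 1) / 6
```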

Similar in structure to the network manipulation functions, network statistic functions are applied via the net_stat_apply function, which can be used with either a NetSet or a NetSampleSet object. This produces a NetStatSet or a NetSampleStatSet, respectively.

Sub1JackknifedGlobEff <- net_stat_apply(Sub1Jackknifed, net.stat.fun = global_efficiency)
show(Sub1JackknifedGlobEff)
## Net Statistic Set 
##  Applied Function:  "node_jackknife" 
##  Function Arguments:   
##  Applied Statistic Function:  global_efficiency 
##  Statistic Function Arguments:   
##  Original Network Name:  Subject1
GroupAJackknifedGlobEff <- net_stat_apply(GroupAJackknifed, net.stat.fun = global_efficiency)

show(GroupAJackknifedGlobEff)
## Net Sample Statistic Set 
##  Applied Function:  "node_jackknife" 
##  Function Arguments:   
##  Applied Statistic Function:  global_efficiency 
##  Statistic Function Arguments:   
##  Original Network Names:  1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20

Once a NetStatSet or NetSampleStatSet has been computed, the computed network statistics can be extracted into a data.frame by using the to_data_frame function. The data frame returned is in long format, with a row for each manipulated network.

Sub1Data = to_data_frame(Sub1JackknifedGlobEff)

names(Sub1Data)
## [1] "orig.net"  "orig.stat" "net.names" "nets.stat" "stat.name"
GroupAData = to_data_frame(GroupAJackknifedGlobEff)

head(GroupAData)
##   orig.net orig.stat net.names nets.stat         stat.name group
## 1        1 0.6447368         1 0.6403509 global_efficiency     1
## 2        1 0.6447368         2 0.6432749 global_efficiency     1
## 3        1 0.6447368         3 0.6461988 global_efficiency     1
## 4        1 0.6447368         4 0.6403509 global_efficiency     1
## 5        1 0.6447368         5 0.6491228 global_efficiency     1
## 6        1 0.6447368         6 0.6491228 global_efficiency     1
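Because the data frame is in long format, ordinary base-R operations apply directly. As a small sketch, the six rows printed above are re-entered by hand below, and the change caused by each node removal is computed as nets.stat minus orig.stat. The `change` column name is our own choice, not something to_data_frame() produces.

```r
# The first six rows of GroupAData, as printed above, re-entered by hand
GroupAHead <- data.frame(
  orig.net  = "1",
  orig.stat = 0.6447368,
  net.names = as.character(1:6),
  nets.stat = c(0.6403509, 0.6432749, 0.6461988, 0.6403509, 0.6491228, 0.6491228),
  stat.name = "global_efficiency",
  group     = 1,
  stringsAsFactors = FALSE
)
# Change in global efficiency caused by removing each node
GroupAHead$change <- GroupAHead$nets.stat - GroupAHead$orig.stat
# Removed node that moved the statistic furthest (first match on ties)
GroupAHead$net.names[which.max(abs(GroupAHead$change))]  # "5"
```

On the full GroupAData, `aggregate(change ~ net.names, data = GroupAData, FUN = mean)` would then give the mean change per removed node across subjects.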

Difference, Group and Group Difference Testing

netjack implements three statistical testing procedures as easy-to-use functions with both tabular and graphical output. The first is the difference test, which assesses whether a specific network manipulation causes a significant difference from the original network in a given network statistic. This test is implemented in diff_test and plotted with diff_test_ggPlot. Plotting uses the ggplot2 package, so the aesthetic presentation is easy to customize.

The example dataset GroupA has been generated so that removing node 10 results in a significant difference in global efficiency from the original networks. Below is the full set of steps for this analysis:

GroupASamp = as_NetSample(GroupA, net.names = as.character(1:20))
GroupAJackknifed = net_apply(GroupASamp, net.function = "node_jackknife")
GroupAJackknifedGlobEff = net_stat_apply(GroupAJackknifed, net.stat.fun = "global_efficiency")

diff_test(GroupAJackknifedGlobEff)
##             net.names          diff            p   adjusted.p
## mean of x           1  0.0023245614 7.548641e-02 1.161329e-01
## mean of x1         10 -0.1257156781 1.144533e-12 2.289065e-11
## mean of x2         11  0.0032017544 1.426799e-03 9.285723e-03
## mean of x3         12  0.0029093567 2.114492e-03 9.285723e-03
## mean of x4         13  0.0004239766 7.472946e-01 7.866259e-01
## mean of x5         14  0.0017397661 1.386111e-01 1.848149e-01
## mean of x6         15  0.0018859649 3.286142e-02 6.572285e-02
## mean of x7         16  0.0020321637 9.205373e-02 1.315053e-01
## mean of x8         17  0.0011549708 3.240808e-01 3.600898e-01
## mean of x9         18  0.0032017544 5.382591e-03 1.794197e-02
## mean of x10        19  0.0024707602 4.495854e-02 8.174280e-02
## mean of x11         2  0.0013011696 1.567161e-01 1.958951e-01
## mean of x12        20  0.0001315789 8.786445e-01 8.786445e-01
## mean of x13         3  0.0037865497 3.843974e-04 3.843974e-03
## mean of x14         4  0.0024707602 1.850869e-02 4.627173e-02
## mean of x15         5  0.0023245614 1.758724e-02 4.627173e-02
## mean of x16         6  0.0033479532 2.321431e-03 9.285723e-03
## mean of x17         7  0.0023245614 2.621901e-02 5.826446e-02
## mean of x18         8  0.0018859649 6.155882e-02 1.025980e-01
## mean of x19         9  0.0011549708 2.974065e-01 3.498900e-01
diff_test_ggPlot(GroupAJackknifedGlobEff)
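The exact procedure behind diff_test is not spelled out here. The "mean of x" row labels in the output match the estimate name of a one-sample `t.test()` in R, and the adjusted.p column appears consistent with a Benjamini-Hochberg correction, so the base-R sketch below assumes both: for each removed node, a one-sample t-test of nets.stat minus orig.stat against zero across subjects, followed by `p.adjust(method = "BH")`. The long-format data are simulated (node 3's removal lowers the statistic by 0.1), and `diff_test_sketch()` is a hypothetical helper.

```r
set.seed(42)
n_subj <- 20
n_nodes <- 5
# Simulated long-format data with columns like to_data_frame() output
dat <- expand.grid(net.names = as.character(seq_len(n_nodes)),
                   orig.net  = as.character(seq_len(n_subj)),
                   stringsAsFactors = FALSE)
orig <- runif(n_subj, 0.6, 0.7)
dat$orig.stat <- orig[as.integer(dat$orig.net)]
# Small noise for every removal; removing node 3 costs about 0.1
dat$nets.stat <- dat$orig.stat + rnorm(nrow(dat), 0, 0.005) -
  ifelse(dat$net.names == "3", 0.1, 0)

# Hypothetical helper: per-node one-sample t-test on the change, then BH
diff_test_sketch <- function(d) {
  nodes <- sort(unique(d$net.names))
  res <- do.call(rbind, lapply(nodes, function(node) {
    sub <- d[d$net.names == node, ]
    tt <- t.test(sub$nets.stat - sub$orig.stat, mu = 0)
    data.frame(net.names = node, diff = unname(tt$estimate), p = tt$p.value)
  }))
  res$adjusted.p <- p.adjust(res$p, method = "BH")
  res
}
out <- diff_test_sketch(dat)
out
```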

The second test is the group test. It examines differences in a network statistic between two sample-level groups (such as healthy controls and individuals with a disorder), subject to a network manipulation.

In this example, GroupA has been simulated so that node 10 is important for global efficiency, while in GroupB node 15 is important. We combine these datasets into a single object and then perform the group test.

fullGroup = c(GroupA, GroupB)

fullSamp = as_NetSample(fullGroup,
                        net.names = as.character(1:40),
                        sample.variables = list(group = c(rep("GroupA", 20), rep("GroupB", 20))))

fullSampJackknifed = net_apply(fullSamp, net.function = "node_jackknife")
fullSampleJackknifedGlobEff = net_stat_apply(fullSampJackknifed, net.stat.fun = "global_efficiency")

group_test(fullSampleJackknifedGlobEff, grouping.variable = "group")
##    net.names            p         1         2   adjusted.p
## 1          1 3.479674e-01 0.6416667 0.6461988 3.479674e-01
## 2         10 5.457061e-15 0.5136264 0.6459064 5.457061e-15
## 3         11 6.055813e-01 0.6425439 0.6445906 6.055813e-01
## 4         12 6.505909e-01 0.6422515 0.6440058 6.505909e-01
## 5         13 1.402365e-01 0.6397661 0.6457602 1.402365e-01
## 6         14 1.435786e-01 0.6410819 0.6469298 1.435786e-01
## 7         15 3.240468e-16 0.6412281 0.5295224 3.240468e-16
## 8         16 1.961826e-01 0.6413743 0.6464912 1.961826e-01
## 9         17 9.077830e-02 0.6404971 0.6472222 9.077830e-02
## 10        18 7.671224e-01 0.6425439 0.6438596 7.671224e-01
## 11        19 2.873317e-01 0.6418129 0.6451754 2.873317e-01
## 12         2 2.970671e-01 0.6406433 0.6448830 2.970671e-01
## 13        20 1.230613e-01 0.6394737 0.6461988 1.230613e-01
## 14         3 8.593489e-01 0.6431287 0.6438596 8.593489e-01
## 15         4 2.796704e-01 0.6418129 0.6464912 2.796704e-01
## 16         5 3.257748e-01 0.6416667 0.6457602 3.257748e-01
## 17         6 7.125387e-01 0.6426901 0.6441520 7.125387e-01
## 18         7 2.771352e-01 0.6416667 0.6461988 2.771352e-01
## 19         8 1.800351e-01 0.6412281 0.6467836 1.800351e-01
## 20         9 3.712339e-01 0.6404971 0.6441520 3.712339e-01
group_test_ggPlot(fullSampleJackknifedGlobEff, grouping.variable="group")
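In the group_test output, the columns labelled 1 and 2 look like per-group means of the jackknifed statistic (for node 10, about 0.51 in GroupA versus 0.65 in GroupB). We have not checked which test or multiple-comparison handling group_test uses, so the base-R sketch below simply assumes a Welch two-sample t-test per removed node, with a Benjamini-Hochberg adjustment added as one reasonable option. The data are simulated (removing node 2 lowers the statistic in GroupA only), and `group_test_sketch()` is a hypothetical helper whose `grouping.variable` argument only mirrors the netjack argument name.

```r
set.seed(7)
n_per_group <- 20
n_nodes <- 4
dat <- expand.grid(net.names = as.character(seq_len(n_nodes)),
                   subject   = seq_len(2 * n_per_group),
                   stringsAsFactors = FALSE)
dat$group <- ifelse(dat$subject <= n_per_group, "GroupA", "GroupB")
dat$nets.stat <- rnorm(nrow(dat), 0.64, 0.01)
# In this simulation, removing node 2 hurts GroupA only
hit <- dat$net.names == "2" & dat$group == "GroupA"
dat$nets.stat[hit] <- dat$nets.stat[hit] - 0.1

# Hypothetical helper: compare the jackknifed statistic between groups
group_test_sketch <- function(d, grouping.variable) {
  nodes <- sort(unique(d$net.names))
  res <- do.call(rbind, lapply(nodes, function(node) {
    sub <- d[d$net.names == node, ]
    g <- factor(sub[[grouping.variable]])
    tt <- t.test(sub$nets.stat ~ g)  # Welch two-sample t-test by default
    means <- tapply(sub$nets.stat, g, mean)
    data.frame(net.names = node, p = tt$p.value,
               mean.1 = means[[1]], mean.2 = means[[2]])
  }))
  res$adjusted.p <- p.adjust(res$p, method = "BH")
  res
}
out <- group_test_sketch(dat, "group")
out
```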

Finally, the group difference test assesses whether the network manipulation has a differential impact on the network statistic between the groups. This test is implemented in group_diff_test and plotted with group_diff_test_ggPlot.

group_diff_test(fullSampleJackknifedGlobEff, grouping.variable = "group")
##    net.names            p             1             2   adjusted.p
## 1          1 7.837022e-01  0.0023245614  0.0027777778 8.707803e-01
## 2         10 5.474698e-19 -0.1257156781  0.0024853801 5.474698e-18
## 3         11 2.053881e-01  0.0032017544  0.0011695906 4.211775e-01
## 4         12 3.727280e-02  0.0029093567  0.0005847953 1.723862e-01
## 5         13 2.105888e-01  0.0004239766  0.0023391813 4.211775e-01
## 6         14 2.348841e-01  0.0017397661  0.0035087719 4.270620e-01
## 7         15 4.898286e-23  0.0018859649 -0.1138986355 9.796572e-22
## 8         16 5.036256e-01  0.0020321637  0.0030701754 7.748087e-01
## 9         17 5.329649e-02  0.0011549708  0.0038011696 1.776550e-01
## 10        18 1.113582e-01  0.0032017544  0.0004385965 3.181664e-01
## 11        19 6.274374e-01  0.0024707602  0.0017543860 8.646142e-01
## 12         2 9.032667e-01  0.0013011696  0.0014619883 9.508071e-01
## 13        20 4.309656e-02  0.0001315789  0.0027777778 1.723862e-01
## 14         3 1.857876e-02  0.0037865497  0.0004385965 1.238584e-01
## 15         4 6.484607e-01  0.0024707602  0.0030701754 8.646142e-01
## 16         5 9.900847e-01  0.0023245614  0.0023391813 9.900847e-01
## 17         6 1.398629e-01  0.0033479532  0.0007309942 3.496572e-01
## 18         7 7.441306e-01  0.0023245614  0.0027777778 8.707803e-01
## 19         8 3.589458e-01  0.0018859649  0.0033625731 5.982429e-01
## 20         9 7.695662e-01  0.0011549708  0.0007309942 8.707803e-01
group_diff_test_ggPlot(fullSampleJackknifedGlobEff, grouping.variable="group")

From this, we can see that removing node 10 or node 15 produces a change from the original global efficiency that differs significantly between Group A and Group B.
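The columns labelled 1 and 2 in the group_diff_test output match the per-node mean changes from diff_test for GroupA (for example, -0.1257156781 for node 10 in both tables). This suggests the test compares the change nets.stat minus orig.stat between groups. Assuming that, and assuming a Welch two-sample t-test with a Benjamini-Hochberg adjustment (neither confirmed here), a base-R sketch looks like this. The data are simulated so that removing node 1 matters in GroupA and node 4 in GroupB, loosely mirroring nodes 10 and 15 above; `group_diff_test_sketch()` is hypothetical.

```r
set.seed(11)
n_per_group <- 15
n_nodes <- 4
dat <- expand.grid(net.names = as.character(seq_len(n_nodes)),
                   orig.net  = seq_len(2 * n_per_group),
                   stringsAsFactors = FALSE)
dat$group <- ifelse(dat$orig.net <= n_per_group, "GroupA", "GroupB")
orig <- runif(2 * n_per_group, 0.6, 0.7)
dat$orig.stat <- orig[dat$orig.net]
dat$nets.stat <- dat$orig.stat + rnorm(nrow(dat), 0, 0.003)
# Group-specific important nodes in this simulation
hitA <- dat$net.names == "1" & dat$group == "GroupA"
hitB <- dat$net.names == "4" & dat$group == "GroupB"
dat$nets.stat[hitA] <- dat$nets.stat[hitA] - 0.1
dat$nets.stat[hitB] <- dat$nets.stat[hitB] - 0.1

# Hypothetical helper: compare the change from the original between groups
group_diff_test_sketch <- function(d, grouping.variable) {
  d$change <- d$nets.stat - d$orig.stat
  nodes <- sort(unique(d$net.names))
  res <- do.call(rbind, lapply(nodes, function(node) {
    sub <- d[d$net.names == node, ]
    g <- factor(sub[[grouping.variable]])
    tt <- t.test(sub$change ~ g)
    means <- tapply(sub$change, g, mean)
    data.frame(net.names = node, p = tt$p.value,
               change.1 = means[[1]], change.2 = means[[2]])
  }))
  res$adjusted.p <- p.adjust(res$p, method = "BH")
  res
}
out <- group_diff_test_sketch(dat, "group")
out
```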