netjack
This vignette introduces the netjack package and gives an overview of common data input and analysis pipelines. For a tutorial on creating custom network functions and network statistics, see the “Custom Functions in netjack” vignette.
Samples of registered networks, or networks that consist of the same node set, are increasingly common in a variety of scientific fields. The netjack package implements a framework that lets researchers quickly manipulate and analyze large samples of registered networks, as well as develop custom functionality that builds on the existing netjack framework.
In this vignette, we go over the following procedures:
netjack
netjack is built around a series of S4 classes that represent different levels of network manipulation, and functions that act on each level of network manipulation. This section describes these classes and functions at a summary level.
The most basic data object is the Net object. This represents a single network, along with node level variables, such as partition assignments.
A NetSample object represents a collection of Net objects, along with network level variables. For example, if each network in the sample represents a single individual’s functional brain network, a network level variable could be the diagnostic status of each individual.
The Net and NetSample objects are representations of raw data. To work with these data objects, the net_apply function can be used to apply a network manipulation function. A network manipulation function takes a single network, performs a series of manipulations on it, and returns the manipulated networks. As an example, the node_jackknife() function applied to a Net object returns a set of Net objects corresponding to the original network with each node removed in turn.
The output of a net_apply call is represented by the S4 classes NetSet and NetSampleSet. These classes hold both the original Net or NetSample and the product of the net_apply call.
One common procedure in network analysis is the calculation of various network statistics. In netjack, network statistic functions are applied via the net_stat_apply function, which quickly calculates them for both NetSet and NetSampleSet objects. The output of a net_stat_apply call is a NetStatSet or NetSampleStatSet object.
netjack implements several statistical testing procedures that are described below. Additionally, to extract a data.frame of the calculated network statistics from a NetStatSet or NetSampleStatSet object, to_data_frame() can be used.
To illustrate the various features of netjack, two simulated datasets are provided, GroupA and GroupB. Networks can be loaded into the netjack framework from adjacency matrices, either as single Net objects, or more commonly as one NetSample object.
library(netjack)
data("GroupA")
Subject1 <- as_Net(GroupA[[1]], "Subject1")
show(Subject1)
## Net
## Net Name: Subject1
## Node Size: 20
## Node variables: index
Node variables can be assigned during construction as a named list:
Subject2 <- as_Net(GroupA[[2]], "Subject2", node.variables = list(community = c(rep(1,10), rep(2,10))))
show(Subject2)
## Net
## Net Name: Subject2
## Node Size: 20
## Node variables: index community
Typically, a researcher using netjack is analyzing a sample of registered networks rather than a single network. NetSample objects can be constructed in much the same way as a Net object can, using lists of adjacency matrices rather than a single matrix:
GroupASamp = as_NetSample(GroupA, net.names = as.character(1:20) , node.variables = list(community = c(rep(1,10), rep(2,10))), sample.variables = list(group = rep(1, 20)))
show(GroupASamp)
## Net Sample
## Net Names: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
## Sample Variables: group orig.net
Importantly, when a NetSample object is created, the list of node variables is applied to every network. This is appropriate in registered network applications, where for example, in neuroimaging networks, each node represents a specific brain region, and each node is the same for each subject.
Sample variables represent network level characteristics. For example, if each network represents a functional connectivity network from a neuroimaging study, a sample variable might be the diagnostic status of a particular individual.
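For readers bringing their own data, the input described above is simply a list of adjacency matrices defined on a common node set. The following is a minimal base R sketch of building such a list. The helper make_adjacency, the object toy_nets, and the choice of binary, symmetric matrices are illustrative assumptions, not part of netjack, and the vignette does not state which matrix types the package accepts.

```r
# Simulate a small sample of registered networks as a list of adjacency matrices.
# make_adjacency() and toy_nets are hypothetical names used only for illustration.
set.seed(1)
n_nodes <- 20
n_subjects <- 5

make_adjacency <- function(n, density = 0.3) {
  adj <- matrix(0, n, n)
  upper <- upper.tri(adj)
  # Draw each possible edge independently, then mirror it to keep the graph undirected.
  adj[upper] <- rbinom(sum(upper), size = 1, prob = density)
  adj + t(adj)
}

toy_nets <- lapply(seq_len(n_subjects), function(i) make_adjacency(n_nodes))

# Registered networks share a node set, so every matrix must have the same dimensions.
same_dims <- vapply(toy_nets, function(m) all(dim(m) == c(n_nodes, n_nodes)), logical(1))
stopifnot(all(same_dims))

# With netjack attached, a list like this could then be passed on as in the vignette:
# toy_sample <- as_NetSample(toy_nets, net.names = as.character(seq_len(n_subjects)))
```

Checking the dimensions up front is a cheap way to catch a network that was built on a different node set before it enters the pipeline.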
Once a sample of networks is represented as a NetSample object, a network manipulation function can be applied. As described previously, these functions change a network in some way. As an example, the node_jackknife function returns a set of networks, where each node has been removed in turn.
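To make the idea of a node jackknife concrete, here is a conceptual sketch on a plain adjacency matrix in base R. It is not netjack's implementation: node_jackknife_sketch is a hypothetical helper, and it assumes that "removing" a node means dropping its row and column.

```r
# Conceptual node jackknife: return one reduced adjacency matrix per node.
# node_jackknife_sketch() is a hypothetical helper, not a netjack function.
node_jackknife_sketch <- function(adj) {
  lapply(seq_len(nrow(adj)), function(i) adj[-i, -i, drop = FALSE])
}

# A 4-node ring: 1-2, 2-3, 3-4, 4-1
ring <- matrix(0, 4, 4)
ring[cbind(c(1, 2, 3, 4), c(2, 3, 4, 1))] <- 1
ring <- ring + t(ring)

jackknifed <- node_jackknife_sketch(ring)
length(jackknifed)     # one reduced network per original node
dim(jackknifed[[1]])   # each reduced network has one fewer node
```

Each element of the result plays the role of one manipulated network, which is why a 20-node network yields 20 manipulated networks in the output shown in this vignette.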
Network manipulation functions can be applied via net_apply to a Net object to produce a NetSet, or to a NetSample object to produce a NetSampleSet.
Sub1Jackknifed <- net_apply(network = Subject1, net.function = "node_jackknife")
show(Sub1Jackknifed)
## Net Set
## Applied Function: "node_jackknife"
## Function Arguments:
## Original Network Name: Subject1
## Contains 20 manipulated networks.
GroupAJackknifed <- net_apply(network = GroupASamp, net.function = "node_jackknife")
show(GroupAJackknifed)
## Net Sample Set
## Applied Function: "node_jackknife"
## Function Arguments:
## Contains 20 NetSets
Network manipulation functions that involve node level variables can be used by including them in the net.function.args argument within net_apply. For example, network_jackknife removes sub-networks on the basis of a node level grouping variable.
GroupANetJackknifed <- net_apply(GroupASamp, net.function = "network_jackknife", net.function.args = list(network.variable = "community"))
show(GroupANetJackknifed)
## Net Sample Set
## Applied Function: "network_jackknife"
## Function Arguments:
## network.variable = community
## Contains 20 NetSets
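The sub-network removal described above can be sketched in base R as follows. This is a conceptual illustration only: network_jackknife_sketch is a hypothetical helper, and it assumes that removing a sub-network means dropping every node that shares a value of the grouping variable.

```r
# Conceptual sub-network jackknife: drop all nodes of one group at a time.
# network_jackknife_sketch() is a hypothetical helper, not a netjack function.
network_jackknife_sketch <- function(adj, membership) {
  groups <- sort(unique(membership))
  out <- lapply(groups, function(g) {
    keep <- membership != g
    adj[keep, keep, drop = FALSE]
  })
  names(out) <- as.character(groups)
  out
}

# Same two-community layout as the node variable used in this vignette.
community <- c(rep(1, 10), rep(2, 10))
adj <- matrix(1, 20, 20)
diag(adj) <- 0

reduced <- network_jackknife_sketch(adj, community)
names(reduced)         # one reduced network per community
dim(reduced[["1"]])    # community 1 removed, 10 nodes remain
```

With two communities of ten nodes, the result holds two reduced networks per original network, rather than one per node as in the node jackknife.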
Once a network manipulation function has been applied, network statistics can be computed.
A network statistic is a single numerical summary of some aspect of a network’s structure or topology. netjack focuses on the analysis of networks at a network statistic level, and provides simple interfaces for calculating network statistics on collections of networks.
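As an example of such a summary, global efficiency is commonly defined as the average inverse shortest path length over all pairs of nodes. The sketch below computes it in base R for an unweighted, undirected network. It is an assumption that netjack's global_efficiency follows this definition, and global_efficiency_sketch is a hypothetical name used only here.

```r
# Global efficiency of an unweighted, undirected network:
# the mean of 1 / (shortest path length) over all node pairs.
# global_efficiency_sketch() is a hypothetical helper, not a netjack function.
global_efficiency_sketch <- function(adj) {
  n <- nrow(adj)
  # Start from direct edges (length 1); unconnected pairs are infinitely far apart.
  dist <- ifelse(adj != 0, 1, Inf)
  diag(dist) <- 0
  # Floyd-Warshall: allow paths through node k, for each k in turn.
  for (k in seq_len(n)) {
    dist <- pmin(dist, outer(dist[, k], dist[k, ], "+"))
  }
  # Disconnected pairs contribute 1 / Inf = 0.
  mean(1 / dist[upper.tri(dist)])
}

# A 3-node path 1-2-3: pair distances are 1, 2 and 1.
path3 <- matrix(c(0, 1, 0,
                  1, 0, 1,
                  0, 1, 0), nrow = 3, byrow = TRUE)
global_efficiency_sketch(path3)   # (1 + 1/2 + 1) / 3
```

Because the statistic collapses a whole network into one number, it can be compared directly between an original network and each of its manipulated versions.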
Similar in structure to the network manipulation functions, network statistic functions are applied via the net_stat_apply function, which can be used with either a NetSet object or a NetSampleSet object. This produces a NetStatSet or a NetSampleStatSet, respectively.
Sub1JackknifedGlobEff <- net_stat_apply(Sub1Jackknifed, net.stat.fun = global_efficiency)
## Registered S3 methods overwritten by 'ggplot2':
## method from
## [.quosures rlang
## c.quosures rlang
## print.quosures rlang
show(Sub1JackknifedGlobEff)
## Net Statistic Set
## Applied Function: "node_jackknife"
## Function Arguments:
## Applied Statistic Function: global_efficiency
## Statistic Function Arguments:
## Original Network Name: Subject1
GroupAJackknifedGlobEff <- net_stat_apply(GroupAJackknifed, net.stat.fun = global_efficiency)
show(GroupAJackknifedGlobEff)
## Net Sample Statistic Set
## Applied Function: "node_jackknife"
## Function Arguments:
## Applied Statistic Function: global_efficiency
## Statistic Function Arguments:
## Original Network Names: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Once a NetStatSet or NetSampleStatSet has been computed, the computed network statistics can be extracted into a data.frame by using the to_data_frame function. The data frame returned is in long format, with a row for each manipulated network.
Sub1Data = to_data_frame(Sub1JackknifedGlobEff)
names(Sub1Data)
## [1] "orig.net" "orig.stat" "net.names" "nets.stat" "stat.name"
GroupAData = to_data_frame(GroupAJackknifedGlobEff)
head(GroupAData)
## orig.net orig.stat net.names nets.stat stat.name group
## 1 1 0.6447368 1 0.6403509 global_efficiency 1
## 2 1 0.6447368 2 0.6432749 global_efficiency 1
## 3 1 0.6447368 3 0.6461988 global_efficiency 1
## 4 1 0.6447368 4 0.6403509 global_efficiency 1
## 5 1 0.6447368 5 0.6491228 global_efficiency 1
## 6 1 0.6447368 6 0.6491228 global_efficiency 1
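Because the result is an ordinary long-format data frame, it can be processed with base R. The sketch below rebuilds the six rows printed above by hand, so it runs without netjack, and computes the change in the statistic caused by each node removal. The column types (character versus factor) and the name of the derived column change are assumptions made for illustration.

```r
# The six rows shown above, re-entered by hand so the example is self-contained.
jk <- data.frame(
  orig.net  = rep("1", 6),
  orig.stat = rep(0.6447368, 6),
  net.names = as.character(1:6),
  nets.stat = c(0.6403509, 0.6432749, 0.6461988, 0.6403509, 0.6491228, 0.6491228),
  stat.name = "global_efficiency",
  group     = 1,
  stringsAsFactors = FALSE
)

# Change in the statistic: manipulated network minus original network.
jk$change <- jk$nets.stat - jk$orig.stat

# Rank the node removals by how much they lower global efficiency.
jk[order(jk$change), c("net.names", "change")]

# On a full sample, the same column can be averaged across subjects per removed node.
mean_change <- aggregate(change ~ net.names, data = jk, FUN = mean)
```

Keeping orig.stat on every row is what makes this per-row subtraction possible without a join back to the original networks.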
netjack implements three statistical testing procedures in easy-to-use functions for both tabular and graphical output. The first test is the difference test, which assesses whether any specific network manipulation causes a significant difference from the original network in a given network statistic. This test is implemented with the diff_test function and graphically with the diff_test_ggPlot function. Plotting uses the ggplot2 package, making the aesthetic presentation easy to modify.
The example dataset GroupA has been generated so that the removal of node 10 will result in a significant difference in global efficiency from the original networks. Below is the full set of steps for this analysis:
GroupASamp = as_NetSample(GroupA, net.names = as.character(1:20))
GroupAJackknifed = net_apply(GroupASamp, net.function = "node_jackknife")
GroupAJackknifedGlobEff = net_stat_apply(GroupAJackknifed, net.stat.fun = "global_efficiency")
diff_test(GroupAJackknifedGlobEff)
## net.names diff p adjusted.p
## mean of x 1 0.0023245614 7.548641e-02 1.161329e-01
## mean of x1 10 -0.1257156781 1.144533e-12 2.289065e-11
## mean of x2 11 0.0032017544 1.426799e-03 9.285723e-03
## mean of x3 12 0.0029093567 2.114492e-03 9.285723e-03
## mean of x4 13 0.0004239766 7.472946e-01 7.866259e-01
## mean of x5 14 0.0017397661 1.386111e-01 1.848149e-01
## mean of x6 15 0.0018859649 3.286142e-02 6.572285e-02
## mean of x7 16 0.0020321637 9.205373e-02 1.315053e-01
## mean of x8 17 0.0011549708 3.240808e-01 3.600898e-01
## mean of x9 18 0.0032017544 5.382591e-03 1.794197e-02
## mean of x10 19 0.0024707602 4.495854e-02 8.174280e-02
## mean of x11 2 0.0013011696 1.567161e-01 1.958951e-01
## mean of x12 20 0.0001315789 8.786445e-01 8.786445e-01
## mean of x13 3 0.0037865497 3.843974e-04 3.843974e-03
## mean of x14 4 0.0024707602 1.850869e-02 4.627173e-02
## mean of x15 5 0.0023245614 1.758724e-02 4.627173e-02
## mean of x16 6 0.0033479532 2.321431e-03 9.285723e-03
## mean of x17 7 0.0023245614 2.621901e-02 5.826446e-02
## mean of x18 8 0.0018859649 6.155882e-02 1.025980e-01
## mean of x19 9 0.0011549708 2.974065e-01 3.498900e-01
diff_test_ggPlot(GroupAJackknifedGlobEff)
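The logic of a difference test can be sketched in base R on simulated data. This is an illustration under stated assumptions, not netjack's implementation: it uses a one-sample t-test on the per-subject change for each removed node, followed by a Benjamini-Hochberg correction. Both choices are inferred from the table above, whose "mean of x" row labels match the estimate name returned by a one-sample t.test and whose adjusted p-values are consistent with p.adjust(method = "BH"). All object names and simulated values are hypothetical.

```r
# Simulated per-subject change in a statistic (manipulated minus original),
# with one column per removed node. Values are made up for illustration.
set.seed(42)
n_subjects <- 20
n_nodes <- 5
change <- matrix(rnorm(n_subjects * n_nodes, mean = 0, sd = 0.005),
                 nrow = n_subjects, ncol = n_nodes)
change[, 3] <- change[, 3] - 0.1   # plant a large drop when node 3 is removed

# One-sample t-test per removed node: is the mean change different from zero?
diff_sketch <- data.frame(
  net.names = as.character(seq_len(n_nodes)),
  diff      = colMeans(change),
  p         = apply(change, 2, function(x) t.test(x, mu = 0)$p.value)
)

# Correct for testing every node removal at once.
diff_sketch$adjusted.p <- p.adjust(diff_sketch$p, method = "BH")
diff_sketch
```

The multiple-comparison correction matters here because one test is run per manipulated network, so a 20-node jackknife involves 20 tests.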
The second test implemented is the group test. This examines differences between two sample level groups (such as healthy controls and individuals with a disorder) in a network statistic, subject to a network manipulation.
In this example, GroupA has been simulated so that node 10 is important for global efficiency, while in GroupB node 15 is important for global efficiency. We combine these datasets into a single object and perform the group test.
fullGroup = c(GroupA, GroupB)
fullSamp = as_NetSample(fullGroup,net.names = as.character(1:40), sample.variables = list(group = c(rep("GroupA", 20), rep("GroupB", 20))))
fullSampJackknifed = net_apply(fullSamp, net.function = "node_jackknife")
fullSampleJackknifedGlobEff = net_stat_apply(fullSampJackknifed, net.stat.fun = "global_efficiency")
group_test(fullSampleJackknifedGlobEff, grouping.variable = "group")
## net.names p 1 2 adjusted.p
## 1 1 3.479674e-01 0.6416667 0.6461988 3.479674e-01
## 2 10 5.457061e-15 0.5136264 0.6459064 5.457061e-15
## 3 11 6.055813e-01 0.6425439 0.6445906 6.055813e-01
## 4 12 6.505909e-01 0.6422515 0.6440058 6.505909e-01
## 5 13 1.402365e-01 0.6397661 0.6457602 1.402365e-01
## 6 14 1.435786e-01 0.6410819 0.6469298 1.435786e-01
## 7 15 3.240468e-16 0.6412281 0.5295224 3.240468e-16
## 8 16 1.961826e-01 0.6413743 0.6464912 1.961826e-01
## 9 17 9.077830e-02 0.6404971 0.6472222 9.077830e-02
## 10 18 7.671224e-01 0.6425439 0.6438596 7.671224e-01
## 11 19 2.873317e-01 0.6418129 0.6451754 2.873317e-01
## 12 2 2.970671e-01 0.6406433 0.6448830 2.970671e-01
## 13 20 1.230613e-01 0.6394737 0.6461988 1.230613e-01
## 14 3 8.593489e-01 0.6431287 0.6438596 8.593489e-01
## 15 4 2.796704e-01 0.6418129 0.6464912 2.796704e-01
## 16 5 3.257748e-01 0.6416667 0.6457602 3.257748e-01
## 17 6 7.125387e-01 0.6426901 0.6441520 7.125387e-01
## 18 7 2.771352e-01 0.6416667 0.6461988 2.771352e-01
## 19 8 1.800351e-01 0.6412281 0.6467836 1.800351e-01
## 20 9 3.712339e-01 0.6404971 0.6441520 3.712339e-01
group_test_ggPlot(fullSampleJackknifedGlobEff, grouping.variable="group")
Finally, the group difference test assesses whether the network manipulation has a differential impact on the network statistic between the groups. This test is implemented with group_diff_test and graphically with group_diff_test_ggPlot.
group_diff_test(fullSampleJackknifedGlobEff, grouping.variable = "group")
## net.names p 1 2 adjusted.p
## 1 1 7.837022e-01 0.0023245614 0.0027777778 8.707803e-01
## 2 10 5.474698e-19 -0.1257156781 0.0024853801 5.474698e-18
## 3 11 2.053881e-01 0.0032017544 0.0011695906 4.211775e-01
## 4 12 3.727280e-02 0.0029093567 0.0005847953 1.723862e-01
## 5 13 2.105888e-01 0.0004239766 0.0023391813 4.211775e-01
## 6 14 2.348841e-01 0.0017397661 0.0035087719 4.270620e-01
## 7 15 4.898286e-23 0.0018859649 -0.1138986355 9.796572e-22
## 8 16 5.036256e-01 0.0020321637 0.0030701754 7.748087e-01
## 9 17 5.329649e-02 0.0011549708 0.0038011696 1.776550e-01
## 10 18 1.113582e-01 0.0032017544 0.0004385965 3.181664e-01
## 11 19 6.274374e-01 0.0024707602 0.0017543860 8.646142e-01
## 12 2 9.032667e-01 0.0013011696 0.0014619883 9.508071e-01
## 13 20 4.309656e-02 0.0001315789 0.0027777778 1.723862e-01
## 14 3 1.857876e-02 0.0037865497 0.0004385965 1.238584e-01
## 15 4 6.484607e-01 0.0024707602 0.0030701754 8.646142e-01
## 16 5 9.900847e-01 0.0023245614 0.0023391813 9.900847e-01
## 17 6 1.398629e-01 0.0033479532 0.0007309942 3.496572e-01
## 18 7 7.441306e-01 0.0023245614 0.0027777778 8.707803e-01
## 19 8 3.589458e-01 0.0018859649 0.0033625731 5.982429e-01
## 20 9 7.695662e-01 0.0011549708 0.0007309942 8.707803e-01
group_diff_test_ggPlot(fullSampleJackknifedGlobEff, grouping.variable="group")
From this, we can see that removing node 10 or node 15 changes global efficiency, relative to its original value, by a significantly different amount in Group A than in Group B.
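The comparison behind this conclusion can be sketched for a single removed node in base R. The example simulates the change in a statistic for two groups and compares them with a two-sample (Welch) t-test. Whether group_diff_test uses this exact test is an assumption, and all names and simulated values below are hypothetical.

```r
# Simulated change in a statistic after removing one node, for two groups.
# The removal matters in GroupA (large drop) but not in GroupB.
set.seed(7)
n_per_group <- 20
group <- rep(c("GroupA", "GroupB"), each = n_per_group)
change_node <- c(rnorm(n_per_group, mean = -0.12, sd = 0.01),
                 rnorm(n_per_group, mean = 0.002, sd = 0.01))

# Per-group mean change, analogous to the two group columns in the table above.
group_means <- tapply(change_node, group, mean)

# Two-sample (Welch) t-test: does the change differ between the groups?
res <- t.test(change_node ~ group)

group_means
res$p.value
```

Repeating this for every removed node and adjusting the resulting p-values would give a table with the same shape as the group difference output above.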