While {migraph} includes a number of datasets (see here), and there are
several packages in R that already include a range of social network
data, there often comes a time when it is necessary to import and
analyse data from other sources. Fortunately, {migraph} has a range of
tools you can employ to import your data and manipulate it.
There are a great number of network datasets and data resources. Here we keep a necessarily partial list, but we are happy to update it whenever additional datasets are suggested. See for example:
See also:
Please let us know if you identify any further repositories of social or political networks and we would be happy to add them here.
{migraph} includes several functions that help read from (import) and
write to (export) network data in a growing number of formats.
One format most users are long familiar with is Excel. In Excel,
users typically collect network data as edgelists, nodelists, or
both. Edgelists are typically the main object to be imported, and we can
import them from an Excel file or a .csv file.[1]
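library(migraph)
# import an edgelist from an Excel file
g1 <- read_edgelist("~/Downloads/mynetworkdata.xlsx")
# import an edgelist from a semi-colon separated .csv file
g1 <- read_edgelist("~/Downloads/mynetworkdata.csv", sv = "semi-colon")
# with no file name given, a file chooser opens
g1 <- read_edgelist()
n1 <- read_nodelist()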
If you do not specify a particular file name, a helpful popup will open that assists you with locating and importing a file from your operating system. Importing a nodelist of nodal attributes operates very similarly.
In some cases, users will have to collect data themselves, or will wish
to first manipulate the data in Excel before importing it, but may be
uncertain about the expected format of an edgelist. Here it may be
useful to try exporting one of the built-in datasets in {migraph} to see
how complete network data looks. If that is too complex a starting
point, calling write_edgelist() without any arguments will export a test
file with a barebones structure that you can overwrite with your own
data.
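For readers unsure what a barebones edgelist looks like, here is a minimal sketch in base R. It assumes the simplest layout, one row per tie with `from` and `to` columns (the column names that as_edgelist() prints later in this document). The names `edges`, `path`, and `reread` and the three ties are hypothetical, and the exact file that write_edgelist() produces may differ.

```r
# A hypothetical edgelist: one row per tie, sender in `from`, receiver in `to`
edges <- data.frame(
  from = c("Ana", "Ana", "Ben"),
  to   = c("Ben", "Cat", "Cat"),
  stringsAsFactors = FALSE
)

# Write it to a temporary comma-separated file, without row names
path <- tempfile(fileext = ".csv")
write.csv(edges, path, row.names = FALSE)

# Read it back to check that the structure survives the round trip
reread <- read.csv(path, stringsAsFactors = FALSE)
print(reread)
```

A file laid out like this can be edited in Excel and then passed to an import function.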
There are other functions here too that help import from or export to common external network data formats. Here are some examples:
# for importing or exporting .net or .paj files
read_pajek()
write_pajek()
# for importing or exporting .##h files
# (.##d files are automatically imported alongside)
read_ucinet()
write_ucinet()
By default, read_edgelist() and read_nodelist() import objects as data
frames or tibbles, while read_pajek() and read_ucinet() import objects
in a tidygraph class format.
These can already be useful, as {migraph} functions recognise and work
with most main classes of network/graph objects in R: edgelists,
matrices, igraph, tidygraph, and network objects. However, it is
sometimes necessary to convert a given object from one class to another.
Here we can use any of a collection of coercion functions, all prefixed
by as_, to move from any of those objects that {migraph} recognises to
any other.
Let’s use one of the built-in datasets in {migraph} to demonstrate
this. Davis, Gardner and Gardner’s (1941) ison_southern_women dataset is
a classic two-mode network, so let’s use this to start with. {migraph}
stores this dataset as an ‘igraph’ object, though other included
datasets are in ‘tidygraph’ or sometimes ‘network’ formats.
library(migraph)
# this is in igraph format
ison_southern_women
#> IGRAPH f8d9f5f UN-B 32 93 --
#> + attr: type (v/l), name (v/c)
#> + edges from f8d9f5f (vertex names):
#> [1] EVELYN --E1 EVELYN --E2 EVELYN --E3 EVELYN --E4 EVELYN --E5
#> [6] EVELYN --E6 EVELYN --E8 EVELYN --E9 LAURA --E1 LAURA --E2
#> [11] LAURA --E3 LAURA --E5 LAURA --E6 LAURA --E7 LAURA --E8
#> [16] THERESA --E2 THERESA --E3 THERESA --E4 THERESA --E5 THERESA --E6
#> [21] THERESA --E7 THERESA --E8 THERESA --E9 BRENDA --E1 BRENDA --E3
#> [26] BRENDA --E4 BRENDA --E5 BRENDA --E6 BRENDA --E7 BRENDA --E8
#> [31] CHARLOTTE--E3 CHARLOTTE--E4 CHARLOTTE--E5 CHARLOTTE--E7 FRANCES --E3
#> [36] FRANCES --E5 FRANCES --E6 FRANCES --E8 ELEANOR --E5 ELEANOR --E6
#> + ... omitted several edges
as_tidygraph(ison_southern_women) # now let's make it a tidygraph tbl_graph object
#> # A tbl_graph: 32 nodes and 93 edges
#> #
#> # A bipartite simple graph with 1 component
#> #
#> # Node Data: 32 × 2 (active)
#> type name
#> <lgl> <chr>
#> 1 FALSE EVELYN
#> 2 FALSE LAURA
#> 3 FALSE THERESA
#> 4 FALSE BRENDA
#> 5 FALSE CHARLOTTE
#> 6 FALSE FRANCES
#> # … with 26 more rows
#> #
#> # Edge Data: 93 × 2
#> from to
#> <int> <int>
#> 1 1 19
#> 2 1 20
#> 3 1 21
#> # … with 90 more rows
as_network(ison_southern_women) # a network object
#> Network attributes:
#> vertices = 32
#> directed = FALSE
#> hyper = FALSE
#> loops = FALSE
#> multiple = FALSE
#> bipartite = 18
#> total edges= 93
#> missing edges= 0
#> non-missing edges= 93
#>
#> Vertex attribute names:
#> vertex.names
#>
#> No edge attributes
as_matrix(ison_southern_women) # a matrix object
#> E1 E2 E3 E4 E5 E6 E7 E8 E9 E10 E11 E12 E13 E14
#> EVELYN 1 1 1 1 1 1 0 1 1 0 0 0 0 0
#> LAURA 1 1 1 0 1 1 1 1 0 0 0 0 0 0
#> THERESA 0 1 1 1 1 1 1 1 1 0 0 0 0 0
#> BRENDA 1 0 1 1 1 1 1 1 0 0 0 0 0 0
#> CHARLOTTE 0 0 1 1 1 0 1 0 0 0 0 0 0 0
#> FRANCES 0 0 1 0 1 1 0 1 0 0 0 0 0 0
#> ELEANOR 0 0 0 0 1 1 1 1 0 0 0 0 0 0
#> PEARL 0 0 0 0 0 1 0 1 1 0 0 0 0 0
#> RUTH 0 0 0 0 1 0 1 1 1 0 0 0 0 0
#> VERNE 0 0 0 0 0 0 1 1 1 0 0 1 0 0
#> MYRA 0 0 0 0 0 0 0 1 1 1 0 1 0 0
#> KATHERINE 0 0 0 0 0 0 0 1 1 1 0 1 1 1
#> SYLVIA 0 0 0 0 0 0 1 1 1 1 0 1 1 1
#> NORA 0 0 0 0 0 1 1 0 1 1 1 1 1 1
#> HELEN 0 0 0 0 0 0 1 1 0 1 1 1 1 1
#> DOROTHY 0 0 0 0 0 0 0 1 1 1 0 1 0 0
#> OLIVIA 0 0 0 0 0 0 0 0 1 0 1 0 0 0
#> FLORA 0 0 0 0 0 0 0 0 1 0 1 0 0 0
# this is an incidence matrix since it is a two-mode network
# if it were a one-mode network, the function would return an adjacency matrix
as_edgelist(ison_southern_women) # an edgelist data frame/tibble
#> # A tibble: 93 × 2
#> from to
#> <chr> <chr>
#> 1 EVELYN E1
#> 2 EVELYN E2
#> 3 EVELYN E3
#> 4 EVELYN E4
#> 5 EVELYN E5
#> 6 EVELYN E6
#> 7 EVELYN E8
#> 8 EVELYN E9
#> 9 LAURA E1
#> 10 LAURA E2
#> # … with 83 more rows
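To make the link between the incidence matrix and the edgelist above concrete, here is a small base R sketch that turns a toy incidence matrix into a `from`/`to` edgelist. The matrix `inc` and its labels are hypothetical, and this illustrates only the idea, not how as_edgelist() is implemented.

```r
# A hypothetical two-mode incidence matrix: rows are people, columns are events
inc <- matrix(c(1, 1, 0,
                0, 1, 1),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("ANA", "BEN"), c("E1", "E2", "E3")))

# Locate every cell equal to 1; each such cell is one tie
idx <- which(inc == 1, arr.ind = TRUE)

# Translate row/column positions back into node labels
edges <- data.frame(
  from = rownames(inc)[idx[, "row"]],
  to   = colnames(inc)[idx[, "col"]],
  stringsAsFactors = FALSE
)

# Sort by sender, then receiver, for readability
edges <- edges[order(edges$from, edges$to), ]
print(edges)
```

The number of rows in the edgelist equals the number of 1s in the matrix, which is why the two representations carry the same tie information.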
Generally, {migraph} attempts to retain as much information as possible
when converting objects between different classes. The presumption is
that users should explicitly decide to reduce or simplify their data.
{migraph} includes a number of functions for transforming (or removing)
certain properties of network objects. For example:
to_unnamed() removes/anonymises all vertex/node labels
to_undirected() replaces directed ties with an undirected tie (if an arc in either direction is present)
to_unweighted() binarises or dichotomises a network around a particular threshold (by default 1)
to_unsigned() returns just the “positive” or “negative” ties from a signed network, respectively
to_uniplex() reduces a multigraph or multiplex network to one with a single set of edges or ties
to_simplex() removes all loops or self-ties from a complex network
Then there are a few more special functions included here too:
to_multilevel() converts objects with two or more modes into a multimodal network structure with attribute ‘lvl’ (1, 2, etc) instead of ‘type’ (FALSE or TRUE)
to_onemode() converts multimodal networks into networks with only one type of node, retaining the same nodes and ties
to_main_component() identifies and returns only the main component of a network
to_named() adds random names to an anonymous network, which can be useful for pedagogical purposes
to_unnamed(ison_marvel_relationships)
#> # A tbl_graph: 53 nodes and 558 edges
#> #
#> # An undirected multigraph with 4 components
#> #
#> # Node Data: 53 × 9 (active)
#> Gender Appearances Attractive Rich Intellect Omnilingual PowerOrigin
#> <chr> <int> <int> <int> <int> <int> <chr>
#> 1 Male 427 0 0 1 1 Radiation
#> 2 Male 589 1 0 1 0 Human
#> 3 Male 1207 0 0 1 1 Mutant
#> 4 Male 7609 1 0 1 0 Mutant
#> 5 Male 2189 1 1 1 0 Human
#> 6 Female 2907 1 0 1 0 Human
#> # … with 47 more rows, and 2 more variables: UnarmedCombat <int>,
#> # ArmedCombat <int>
#> #
#> # Edge Data: 558 × 3
#> from to sign
#> <int> <int> <dbl>
#> 1 1 4 -1
#> 2 1 11 -1
#> 3 1 12 -1
#> # … with 555 more rows
to_named(ison_algebra)
#> # A tbl_graph: 16 nodes and 144 edges
#> #
#> # A directed simple graph with 1 component
#> #
#> # Node Data: 16 × 1 (active)
#> name
#> <chr>
#> 1 Jermaine
#> 2 Willie
#> 3 Erma
#> 4 Rita
#> 5 Michele
#> 6 Stacy
#> # … with 10 more rows
#> #
#> # Edge Data: 144 × 5
#> from to friends social tasks
#> <int> <int> <dbl> <dbl> <dbl>
#> 1 1 5 0 1.2 0.3
#> 2 1 8 0 0.15 0
#> 3 1 9 0 2.85 0.3
#> # … with 141 more rows
to_undirected(ison_algebra)
#> # A tbl_graph: 16 nodes and 76 edges
#> #
#> # An undirected simple graph with 1 component
#> #
#> # Node Data: 16 × 1 (active)
#> name
#> <chr>
#> 1 Melinda
#> 2 Abby
#> 3 Darryl
#> 4 Veronica
#> 5 Rylan
#> 6 Lindsey
#> # … with 10 more rows
#> #
#> # Edge Data: 76 × 5
#> from to friends social tasks
#> <int> <int> <dbl> <dbl> <dbl>
#> 1 1 2 1 0 0
#> 2 2 3 0 0.15 0
#> 3 1 5 0 1.2 0.3
#> # … with 73 more rows
to_unsigned(ison_marvel_relationships, keep = "positive")
#> # A tbl_graph: 53 nodes and 277 edges
#> #
#> # An undirected simple graph with 6 components
#> #
#> # Node Data: 53 × 10 (active)
#> name Gender Appearances Attractive Rich Intellect Omnilingual PowerOrigin
#> <chr> <chr> <int> <int> <int> <int> <int> <chr>
#> 1 Abom… Male 427 0 0 1 1 Radiation
#> 2 Ant-… Male 589 1 0 1 0 Human
#> 3 Apoc… Male 1207 0 0 1 1 Mutant
#> 4 Beast Male 7609 1 0 1 0 Mutant
#> 5 Blac… Male 2189 1 1 1 0 Human
#> 6 Blac… Female 2907 1 0 1 0 Human
#> # … with 47 more rows, and 2 more variables: UnarmedCombat <int>,
#> # ArmedCombat <int>
#> #
#> # Edge Data: 277 × 2
#> from to
#> <int> <int>
#> 1 2 25
#> 2 2 29
#> 3 2 44
#> # … with 274 more rows
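Two of the transformations described earlier, dichotomising around a threshold and replacing arcs with undirected ties, can be illustrated on a plain adjacency matrix in base R. This is only a sketch of the ideas: the matrix `w` is hypothetical, and treating ties at or above the threshold as present is an assumption, not a statement of how to_unweighted() or to_undirected() behave internally.

```r
# A hypothetical weighted, directed adjacency matrix (row sends to column)
w <- matrix(c(0, 2, 0.5,
              0, 0, 3,
              1, 0, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("A", "B", "C"), c("A", "B", "C")))

# Dichotomise: keep a tie only where the weight reaches the threshold
# (assumption: ">=" rather than ">")
threshold <- 1
binary <- (w >= threshold) * 1

# Symmetrise: an undirected tie exists if an arc in either direction is present
undirected <- ((binary + t(binary)) > 0) * 1

print(binary)
print(undirected)
```

Here the A to C arc (weight 0.5) is dropped by the threshold, but A and C still end up tied in the undirected version because of the C to A arc.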
Note that there are also functions for converting or ‘projecting’ two-mode networks into one-mode networks.
to_mode1(ison_southern_women)
#> IGRAPH 1960c78 UNW- 18 139 --
#> + attr: name (v/c), weight (e/n)
#> + edges from 1960c78 (vertex names):
#> [1] EVELYN --LAURA EVELYN --BRENDA EVELYN --THERESA EVELYN --CHARLOTTE
#> [5] EVELYN --FRANCES EVELYN --ELEANOR EVELYN --RUTH EVELYN --PEARL
#> [9] EVELYN --NORA EVELYN --VERNE EVELYN --MYRA EVELYN --KATHERINE
#> [13] EVELYN --SYLVIA EVELYN --HELEN EVELYN --DOROTHY EVELYN --OLIVIA
#> [17] EVELYN --FLORA LAURA --BRENDA LAURA --THERESA LAURA --CHARLOTTE
#> [21] LAURA --FRANCES LAURA --ELEANOR LAURA --RUTH LAURA --PEARL
#> [25] LAURA --NORA LAURA --VERNE LAURA --SYLVIA LAURA --HELEN
#> [29] LAURA --MYRA LAURA --KATHERINE LAURA --DOROTHY THERESA--BRENDA
#> + ... omitted several edges
to_mode2(ison_southern_women)
#> IGRAPH 517e95e UNW- 14 66 --
#> + attr: name (v/c), weight (e/n)
#> + edges from 517e95e (vertex names):
#> [1] E1 --E2 E1 --E3 E1 --E4 E1 --E5 E1 --E6 E1 --E8 E1 --E9 E1 --E7
#> [9] E2 --E3 E2 --E4 E2 --E5 E2 --E6 E2 --E8 E2 --E9 E2 --E7 E3 --E4
#> [17] E3 --E5 E3 --E6 E3 --E8 E3 --E9 E3 --E7 E4 --E5 E4 --E6 E4 --E8
#> [25] E4 --E9 E4 --E7 E5 --E6 E5 --E8 E5 --E9 E5 --E7 E6 --E8 E6 --E9
#> [33] E6 --E7 E6 --E10 E6 --E11 E6 --E12 E6 --E13 E6 --E14 E7 --E8 E7 --E9
#> [41] E7 --E12 E7 --E10 E7 --E13 E7 --E14 E7 --E11 E8 --E9 E8 --E12 E8 --E10
#> [49] E8 --E13 E8 --E14 E8 --E11 E9 --E12 E9 --E10 E9 --E13 E9 --E14 E9 --E11
#> [57] E10--E12 E10--E13 E10--E14 E10--E11 E11--E12 E11--E13 E11--E14 E12--E13
#> + ... omitted several edges
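The projections above return weighted one-mode graphs. A common way to compute such a projection is to multiply the incidence matrix by its transpose, so that each weight counts shared affiliations. The sketch below shows this in base R on a hypothetical matrix `inc`; whether to_mode1() and to_mode2() use exactly this calculation is an assumption.

```r
# A hypothetical incidence matrix: three people attending three events
inc <- matrix(c(1, 1, 0,
                0, 1, 1,
                0, 0, 1),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("ANA", "BEN", "CAT"), c("E1", "E2", "E3")))

# First-mode projection: people tied by the number of events they share
mode1 <- inc %*% t(inc)
diag(mode1) <- 0  # drop self-ties (a person's own attendance count)

# Second-mode projection: events tied by the number of people they share
mode2 <- t(inc) %*% inc
diag(mode2) <- 0

print(mode1)
print(mode2)
```

In this toy case ANA and BEN share one event (E2), while ANA and CAT share none, so they are not tied in the first-mode projection.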
If you import one or more edgelists and nodelists, it can be useful to bind these together in an igraph, tidygraph, or network class object.
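As one possible illustration, an edgelist and a nodelist held as data frames can be bound into an igraph object with igraph's graph_from_data_frame(). This is a sketch using the igraph package directly rather than a {migraph} function, and the data frames `edges` and `nodes` are hypothetical.

```r
library(igraph)

# Hypothetical edgelist: one row per tie
edges <- data.frame(
  from = c("Ana", "Ana", "Ben"),
  to   = c("Ben", "Cat", "Cat"),
  stringsAsFactors = FALSE
)

# Hypothetical nodelist: first column holds the node names used in the edgelist
nodes <- data.frame(
  name = c("Ana", "Ben", "Cat"),
  age  = c(31, 45, 27),
  stringsAsFactors = FALSE
)

# Bind ties and nodal attributes together in a single undirected graph
g <- graph_from_data_frame(edges, directed = FALSE, vertices = nodes)

vcount(g)  # number of nodes
ecount(g)  # number of ties
V(g)$age   # the nodal attribute carried over from the nodelist
```

The resulting object can then be coerced to other classes with the as_ functions described above.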
Adding nodal attributes to a given network is relatively
straightforward. One can bind a single new attribute to the nodes with
add_node_attribute() or copy a set of attributes from one network/graph
to another with copy_node_attributes(). But often the easiest way to do
this is to take a network/graph, make sure it is first coerced into a
tidygraph object, and then add any additional nodal attributes
(including measures from {migraph}) as follows:
as_tidygraph(mpn_elite_mex) %>%
  mutate(order = 1:35,
         color = "red",
         degree = node_degree(mpn_elite_mex))
#> # A tbl_graph: 35 nodes and 117 edges
#> #
#> # An undirected simple graph with 1 component
#> #
#> # Node Data: 35 × 11 (active)
#> name full_name entry_year military in_mpn PlaceOfBirth state region order
#> <chr> <chr> <dbl> <dbl> <dbl> <chr> <chr> <dbl> <int>
#> 1 Trev… Trevino,… 1910 1 0 Guerrero Coah… 1 1
#> 2 Made… Madero, … 1911 0 0 Parras de l… Coah… 1 2
#> 3 Carr… Carranza… 1913 1 0 Cuatro Cien… Coah… 1 3
#> 4 Agui… Aguilar,… 1918 1 0 Cordoba Vera… 3 4
#> 5 Obre… Obregon,… 1920 1 0 Siquisiva, … Sono… 1 5
#> 6 Call… Calles, … 1924 1 0 Guaymas Sono… 1 6
#> # … with 29 more rows, and 2 more variables: color <chr>, degree <node_msr>
#> #
#> # Edge Data: 117 × 2
#> from to
#> <int> <int>
#> 1 2 3
#> 2 2 5
#> 3 2 6
#> # … with 114 more rows
Adding edge attributes or new edges is not quite so straightforward,
in part because you will need to decide which it is that you want to do.
If you would like to just add a new tie attribute to an existing set of
ties, without adding any new edges, then add_tie_attributes() operates
similarly to add_node_attribute() above. But if the result should be a
multiplex network and the ties in the different component networks only
partially overlap, then you will need to use join_ties():
generate_random(10, .3) %>%
  join_ties(generate_random(10, .3), "next")
#> # A tbl_graph: 10 nodes and 24 edges
#> #
#> # An undirected simple graph with 1 component
#> #
#> # Node Data: 10 × 0 (active)
#> # … with 4 more rows
#> #
#> # Edge Data: 24 × 4
#> from to orig `next`
#> <int> <int> <dbl> <dbl>
#> 1 1 4 0 1
#> 2 1 7 1 0
#> 3 1 8 0 1
#> # … with 21 more rows
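The output above has one indicator column per component network, with 0 where a tie is absent from that network. The same idea can be sketched in base R as a full join of two edgelists. The data frames `orig`, `nxt`, and `both` are hypothetical, and this is an illustration of partially overlapping tie sets rather than a description of how join_ties() works internally.

```r
# Two hypothetical edgelists over the same nodes, each flagged with its own column
orig <- data.frame(from = c(1, 1, 2), to = c(2, 3, 3), orig = 1)
nxt  <- data.frame(from = c(1, 2, 3), to = c(2, 4, 4), nxt = 1)

# Full join on the tie endpoints: keep ties found in either network
both <- merge(orig, nxt, by = c("from", "to"), all = TRUE)

# A missing flag means the tie is absent from that network
both[is.na(both)] <- 0

print(both)
```

Only the tie between nodes 1 and 2 appears in both networks, so it is the only row with a 1 in both indicator columns.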
Lastly, sometimes we want to extract certain information from a
network or graph object. Here too {migraph} has you covered.
node_names(mpn_elite_mex) # gets the names of the nodes
#> [1] "Trevino" "Madero" "Carranza"
#> [4] "Aguilar" "Obregon" "Calles"
#> [7] "Aleman Gonzalez" "Portes Gil" "L. Cardenas"
#> [10] "Avila Camacho" "I. Beteta" "Jara"
#> [13] "R. Beteta" "Aleman Valdes" "Sanchez Taboada"
#> [16] "Serra Rojas" "Ruiz Galindo" "Bustamante"
#> [19] "Loyo" "Carvajal" "Ruiz Cortines"
#> [22] "Carrillo Flores" "Ortiz Mena" "Gonzalez Blanco"
#> [25] "Salinas Lozano" "Lopez Mateos" "Margain"
#> [28] "Diaz Ordaz" "M.R. Beteta" "Echeverria Alvarez"
#> [31] "Lopez Portillo" "C. Cardenas" "De la Madrid"
#> [34] "Salinas de Gortari" "Aleman Velasco"
node_attribute(ison_marvel_relationships, "Gender") # gets any named nodal attribute
#> [1] "Male" "Male" "Male" "Male" "Male" "Female" "Male" "Male"
#> [9] "Male" "Male" "Male" "Male" "Male" "Male" "Male" "Male"
#> [17] "Male" "Female" "Male" "Male" "Male" "Male" "Male" "Male"
#> [25] "Female" "Male" "Male" "Female" "Female" "Male" "Male" "Female"
#> [33] "Female" "Male" "Male" "Male" "Female" "Male" "Male" "Male"
#> [41] "Male" "Male" "Male" "Female" "Male" "Male" "Female" "Male"
#> [49] "Male" "Male" "Male" "Male" "Male"
tie_attribute(ison_marvel_relationships, "sign") # gets any named edge attribute
#> [1] -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 1 1 1 1 -1 -1 -1 1 -1 -1 -1 -1 -1 -1
#> [26] -1 -1 -1 -1 -1 -1 -1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#> [51] 1 1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 1 1 1 1 1 1 1 1 1 1 1 1 1
#> [76] 1 1 1 1 1 1 1 1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 1 1 1 1 1 1 1
#> [101] 1 1 1 1 1 -1 -1 -1 -1 -1 -1 1 1 1 1 1 1 1 1 -1 -1 -1 -1 1 1
#> [126] 1 1 1 1 1 1 1 1 1 1 1 1 1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 1
#> [151] 1 1 1 1 1 1 1 1 1 1 -1 -1 -1 -1 -1 -1 1 1 1 1 1 1 1 1 1
#> [176] 1 1 1 1 1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 1 1 1 1 1 1 1 1
#> [201] 1 -1 -1 -1 -1 -1 -1 1 1 1 1 1 1 1 1 1 -1 -1 -1 -1 -1 -1 -1 -1 1
#> [226] 1 1 1 1 1 1 1 1 1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1
#> [251] -1 1 -1 -1 -1 -1 -1 -1 -1 -1 -1 1 1 1 1 1 1 1 1 1 1 1 -1 -1 -1
#> [276] -1 1 1 -1 1 1 1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 1 1 -1 -1
#> [301] 1 -1 -1 -1 -1 -1 -1 -1 1 1 1 1 1 1 -1 1 1 1 1 1 1 1 1 1 -1
#> [326] -1 -1 -1 -1 -1 -1 -1 1 1 1 1 1 1 1 1 1 -1 -1 -1 -1 -1 -1 -1 -1 -1
#> [351] -1 -1 -1 -1 -1 -1 -1 1 1 1 1 1 1 1 1 1 1 1 1 -1 -1 -1 -1 -1 -1
#> [376] -1 -1 -1 1 1 1 1 1 1 1 1 1 1 1 1 -1 -1 -1 -1 -1 -1 -1 -1 -1 1
#> [401] 1 1 1 1 1 1 1 1 1 -1 -1 -1 -1 -1 -1 -1 1 1 -1 1 1 -1 -1 -1 -1
#> [426] -1 -1 -1 -1 -1 -1 -1 -1 1 1 -1 -1 1 1 1 1 1 1 -1 -1 -1 1 -1 -1 -1
#> [451] -1 -1 -1 1 1 -1 -1 -1 -1 -1 1 1 1 1 1 1 1 -1 -1 -1 -1 -1 -1 -1 -1
#> [476] -1 -1 -1 1 1 1 1 1 -1 -1 -1 -1 -1 1 1 1 1 1 -1 -1 1 1 1 1 1
#> [501] 1 1 -1 1 1 1 1 -1 1 1 1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 1 1 1
#> [526] -1 -1 -1 -1 1 1 1 1 -1 -1 1 1 1 1 1 -1 -1 -1 1 1 1 -1 -1 -1 -1
#> [551] 1 1 -1 -1 1 -1 1 -1
tie_weights(mpn_elite_mex)
#> [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#> [38] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#> [75] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
#> [112] 1 1 1 1 1 1
We can describe the network using similar functions. How many nodes are in the network, and how many ties?
graph_nodes(mpn_elite_mex)
#> [1] 35
graph_ties(mpn_elite_mex)
#> [1] 117
graph_dims(mpn_elite_mex)
#> [1] 35
[1] Note that if you import from a .csv file, please specify
whether the separation value should be commas (sv = "comma") or
semi-colons (sv = "semi-colon").