The sdcHierarchies package allows users to create, modify and export nested hierarchies, which are used, for example, to define tables in statistical disclosure control software such as sdcTable.
Before use, the package needs to be loaded:
library(sdcHierarchies)
hier_create() creates a hierarchy. Argument root specifies the name of the root node. Optionally, nodes can be added to the top level by listing their names in argument nodes. The function hier_display() shows the hierarchical structure of the current tree, as shown below:
h <- hier_create(root = "Total", nodes = LETTERS[1:5])
hier_display(h)
## Total
## ├─A
## ├─B
## ├─C
## ├─D
## └─E
Once such an object is created, it can be modified by the following functions:
hier_add(): adds nodes to the hierarchy
hier_delete(): deletes nodes from the tree
hier_rename(): renames nodes
These functions can be applied as shown below:
# adding nodes below the node specified in argument `root`
h <- hier_add(h, root = "A", nodes = c("a1", "a2"))
h <- hier_add(h, root = "B", nodes = c("b1", "b2"))
h <- hier_add(h, root = "b1", nodes = c("b1_a", "b1_b"))
h
# deleting one or more nodes from the hierarchy
h <- hier_delete(h, nodes = c("a1", "b2"))
h <- hier_delete(h, nodes = c("a2"))
h
# renaming nodes
h <- hier_rename(h, nodes = c("C" = "X", "D" = "Y"))
hier_display(h)
## Total
## ├─A
## ├─B
## │ └─b1
## │ ├─b1_a
## │ └─b1_b
## ├─X
## ├─Y
## └─E
Note that the underlying data.tree package modifies objects by reference, so an explicit re-assignment of the result is not strictly required.
Function hier_info() returns information about the nodes specified in argument nodes.
# information about specific nodes
info <- hier_info(h, nodes = c("b1", "E"))
info is a named list in which each element refers to one of the queried nodes. The results for node b1 can be extracted as shown below:
info$b1
## $name
## [1] "b1"
##
## $is_rootnode
## [1] FALSE
##
## $level
## [1] 3
##
## $is_leaf
## [1] FALSE
##
## $siblings
## character(0)
##
## $contributing_codes
## [1] "b1_a" "b1_b"
##
## $children
## [1] "b1_a" "b1_b"
##
## $parent
## [1] "B"
##
## $is_bogus
## [1] FALSE
##
## $parent_bogus
## character(0)
Information about all nodes can be extracted by not specifying argument nodes.
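To make some of the fields returned by hier_info() more concrete, here is a minimal base-R sketch that derives parent, level, children, siblings and the leaf/root flags from a plain parent/child table. The table h_pc and the helper node_info() are hypothetical illustrations that mirror the tree h displayed above; they are not part of sdcHierarchies, which computes these values in its own way.

```r
# Hypothetical parent/child table mirroring the tree `h` displayed above
h_pc <- data.frame(
  name   = c("Total", "A", "B", "b1", "b1_a", "b1_b", "X", "Y", "E"),
  parent = c(NA, "Total", "Total", "B", "b1", "b1", "Total", "Total", "Total"),
  stringsAsFactors = FALSE
)

# Hypothetical helper reproducing a few of the fields reported by hier_info()
node_info <- function(pc, node) {
  parent   <- pc$parent[pc$name == node]
  children <- pc$name[!is.na(pc$parent) & pc$parent == node]
  siblings <- if (is.na(parent)) {
    character(0)
  } else {
    setdiff(pc$name[!is.na(pc$parent) & pc$parent == parent], node)
  }
  # the level is one plus the number of ancestors (the root node has level 1)
  level <- 1
  p <- parent
  while (!is.na(p)) {
    level <- level + 1
    p <- pc$parent[pc$name == p]
  }
  list(
    name = node, is_rootnode = is.na(parent), level = level,
    is_leaf = length(children) == 0, siblings = siblings,
    children = children, parent = parent
  )
}

str(node_info(h_pc, "b1"))
```

Applied to b1, the sketch gives the same values as printed above: level 3, parent B, children b1_a and b1_b, and no siblings.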
Function hier_convert() converts the network-based structure of a hierarchy into different formats, while hier_export() performs the conversion and writes the result to a file on disk. The following formats are currently supported:
df: a “@;label”-based format that can be used in sdcTable
dt: the same as df, but the result is returned as a data.table
argus: also a “@;label”-based format that is used to create hrc-files suitable for τ-argus
json: a json-encoded string
code: the code required to re-build the current hierarchy
sdc: a list which is a suitable input for sdcTable
# conversion to a "@;label"-based format
res_df <- hier_convert(h, as = "df")
print(res_df)
## level name
## 1: @ Total
## 2: @@ A
## 3: @@ B
## 4: @@@ b1
## 5: @@@@ b1_a
## 6: @@@@ b1_b
## 7: @@ X
## 8: @@ Y
## 9: @@ E
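In the “@;label” format, the number of @ characters encodes the level of a code, and the parent of a row is the closest preceding row that sits exactly one level higher. The following minimal base-R sketch recovers the parent of every code from a plain data.frame copy of the table above; at_df and parents_from_at() are hypothetical names, and the sketch only illustrates how the format can be read (assuming well-formed input), not how sdcHierarchies parses it.

```r
# Plain data.frame copy of the "@;label"-based table printed above
at_df <- data.frame(
  level = c("@", "@@", "@@", "@@@", "@@@@", "@@@@", "@@", "@@", "@@"),
  name  = c("Total", "A", "B", "b1", "b1_a", "b1_b", "X", "Y", "E"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: parent of each row in a "@;label"-based table
parents_from_at <- function(df) {
  depth  <- nchar(df$level)  # number of "@" characters = hierarchy level
  parent <- rep(NA_character_, nrow(df))
  for (i in seq_len(nrow(df))[-1]) {
    # closest preceding row that is exactly one level above row i
    j <- max(which(depth[seq_len(i - 1)] == depth[i] - 1))
    parent[i] <- df$name[j]
  }
  parent
}

data.frame(at_df, depth = nchar(at_df$level), parent = parents_from_at(at_df))
```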
The code required to create this hierarchy can be obtained using:
code <- hier_convert(h, as = "code"); cat(code, sep = "\n")
## library(sdcHierarchies)
## tree <- hier_create(root = 'Total', nodes = c('A', 'B', 'X', 'Y', 'E'))
## tree <- hier_add(tree = tree, root = 'B', nodes = 'b1')
## tree <- hier_add(tree = tree, root = 'b1', nodes = c('b1_a', 'b1_b'))
## print(tree)
Using hier_export(), the results can be written to a file. This is useful, for example, to create hrc-files that can be used as input for τ-argus:
hier_export(h, as = "argus", path = file.path(tempdir(), "hierarchy.hrc"))
hier_import() returns a network-based hierarchy given either a data.frame (in “@;label”-format), a json string, code, or a τ-argus compatible hrc-file. For example, creating a hierarchy from res_df, which was previously created using hier_convert(), is as simple as:
n_df <- hier_import(inp = res_df, from = "df")
hier_display(n_df)
## Total
## ├─A
## ├─B
## │ └─b1
## │ ├─b1_a
## │ └─b1_b
## ├─E
## ├─X
## └─Y
Using hier_import(inp = "hierarchy.hrc", from = "argus"), an sdc hierarchy object can be created directly from an hrc-file.
Often, the nested hierarchy information is encoded in a string. Function hier_compute() transforms such strings into hierarchy objects. Two cases can be distinguished: in the first case all input codes have the same length, while in the second case the lengths of the codes differ. Let’s assume we have a geographic code given in geo_m, where digits 1-2 refer to the first level, digit 3 to the second and digits 4-5 to the third level of the hierarchy.
geo_m <- c(
"01051", "01053", "01054", "01055", "01056", "01057", "01058", "01059", "01060", "01061", "01062",
"02000",
"03151", "03152", "03153", "03154", "03155", "03156", "03157", "03158", "03251", "03252", "03254", "03255",
"03256", "03257", "03351", "03352", "03353", "03354", "03355", "03356", "03357", "03358", "03359", "03360",
"03361", "03451", "03452", "03453", "03454", "03455", "03456",
"10155")
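Splitting one code by the positions just described shows which digits belong to which level, and taking cumulative prefixes yields the node labels that appear in the tree further below. A minimal base-R sketch; the example code 03152 is taken from geo_m and the positions are the ones stated in the text:

```r
code <- "03152"  # one element of geo_m

# digits 1-2 (level 1), digit 3 (level 2) and digits 4-5 (level 3)
parts <- substring(code, first = c(1, 3, 4), last = c(2, 3, 5))
print(parts)  # "03" "1" "52"

# the node labels are the prefixes up to the end of each level
nodes <- substring(code, first = 1, last = c(2, 3, 5))
print(nodes)  # "03" "031" "03152"
```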
Function hier_compute() takes a character vector and creates a hierarchy from it. Argument method offers two ways of specifying the encoded levels:
endpos: an integerish vector in argument dim_spec holds the end position of each level
len: an integerish vector in argument dim_spec holds the number of digits of each level
In case the overall total is not encoded in the input, argument root gives a name to the overall total. Additionally, the desired output format can be set in parameter as. In the example below, setting as = "df" returns the result as a data.frame in “@;label”-format. The two methods of defining the positions of the levels are interchangeable and lead to the same hierarchy, as shown below:
v1 <- hier_compute(
  inp = geo_m,
  dim_spec = c(2, 3, 5),
  root = "Tot",
  method = "endpos",
  as = "df"
)
v2 <- hier_compute(
  inp = geo_m,
  dim_spec = c(2, 1, 2),
  root = "Tot",
  method = "len",
  as = "df"
)
identical(v1, v2)
## [1] TRUE
hier_display(v1)
## Tot
## ├─01
## │ └─010
## │ ├─01051
## │ ├─01053
## │ ├─01054
## │ ├─01055
## │ ├─01056
## │ ├─01057
## │ ├─01058
## │ ├─01059
## │ ├─01060
## │ ├─01061
## │ └─01062
## ├─02
## │ └─020
## │ └─02000
## ├─03
## │ ├─031
## │ │ ├─03151
## │ │ ├─03152
## │ │ ├─03153
## │ │ ├─03154
## │ │ ├─03155
## │ │ ├─03156
## │ │ ├─03157
## │ │ └─03158
## │ ├─032
## │ │ ├─03251
## │ │ ├─03252
## │ │ ├─03254
## │ │ ├─03255
## │ │ ├─03256
## │ │ └─03257
## │ ├─033
## │ │ ├─03351
## │ │ ├─03352
## │ │ ├─03353
## │ │ ├─03354
## │ │ ├─03355
## │ │ ├─03356
## │ │ ├─03357
## │ │ ├─03358
## │ │ ├─03359
## │ │ ├─03360
## │ │ └─03361
## │ └─034
## │ ├─03451
## │ ├─03452
## │ ├─03453
## │ ├─03454
## │ ├─03455
## │ └─03456
## └─10
## └─101
## └─10155
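Why do both specifications give identical results? The end positions are the cumulative sums of the lengths, and each node in the tree above is a prefix of a code up to one of these end positions. A minimal base-R sketch, assuming this prefix-based reading of the output; edges_from_codes() is a hypothetical helper and not the algorithm used inside hier_compute():

```r
len    <- c(2, 1, 2)
endpos <- cumsum(len)                          # 2 3 5, the "endpos" specification
stopifnot(identical(diff(c(0, endpos)), len))  # converting back gives the lengths

# Hypothetical helper: parent/child edges for codes of equal length
edges_from_codes <- function(codes, endpos, root) {
  # one character vector per level: codes truncated at that level's end position
  prefixes <- lapply(endpos, function(e) substr(codes, 1, e))
  # the parent of level k is level k - 1; the parent of level 1 is the root
  parents <- c(list(rep(root, length(codes))), prefixes[-length(prefixes)])
  edges <- data.frame(
    parent = unlist(parents),
    child  = unlist(prefixes),
    stringsAsFactors = FALSE
  )
  unique(edges)
}

codes <- c("01051", "01053", "03152", "03251", "10155")
e1 <- edges_from_codes(codes, endpos = cumsum(len), root = "Tot")
e2 <- edges_from_codes(codes, endpos = c(2, 3, 5), root = "Tot")
identical(e1, e2)
```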
If the total is contained in the string, say in the first three positions of the input values, the hierarchy can be computed as follows:
geo_m_with_tot <- paste0("Tot", geo_m)
head(geo_m_with_tot)
## [1] "Tot01051" "Tot01053" "Tot01054" "Tot01055" "Tot01056" "Tot01057"
v3 <- hier_compute(
  inp = geo_m_with_tot,
  dim_spec = c(3, 2, 1, 2),
  method = "len"
)
hier_display(v3)
## Tot
## ├─01
## │ └─010
## │ ├─01051
## │ ├─01053
## │ ├─01054
## │ ├─01055
## │ ├─01056
## │ ├─01057
## │ ├─01058
## │ ├─01059
## │ ├─01060
## │ ├─01061
## │ └─01062
## ├─02
## │ └─020
## │ └─02000
## ├─03
## │ ├─031
## │ │ ├─03151
## │ │ ├─03152
## │ │ ├─03153
## │ │ ├─03154
## │ │ ├─03155
## │ │ ├─03156
## │ │ ├─03157
## │ │ └─03158
## │ ├─032
## │ │ ├─03251
## │ │ ├─03252
## │ │ ├─03254
## │ │ ├─03255
## │ │ ├─03256
## │ │ └─03257
## │ ├─033
## │ │ ├─03351
## │ │ ├─03352
## │ │ ├─03353
## │ │ ├─03354
## │ │ ├─03355
## │ │ ├─03356
## │ │ ├─03357
## │ │ ├─03358
## │ │ ├─03359
## │ │ ├─03360
## │ │ └─03361
## │ └─034
## │ ├─03451
## │ ├─03452
## │ ├─03453
## │ ├─03454
## │ ├─03455
## │ └─03456
## └─10
## └─101
## └─10155
The resulting hierarchy is the same as v1 and v2 generated previously.
hier_compute() can also deal with input codes of different lengths, as shown in the next example.
# second example: codes of unequal length; overall total not included in input
yae_h <- c(
  "1.1.1.", "1.1.2.",
  "1.2.1.", "1.2.2.", "1.2.3.", "1.2.4.", "1.2.5.", "1.3.1.",
  "1.3.2.", "1.3.3.", "1.3.4.", "1.3.5.",
  "1.4.1.", "1.4.2.", "1.4.3.", "1.4.4.", "1.4.5.",
  "1.5.", "1.6.", "1.7.", "1.8.", "1.9.", "2.", "3.")
v1 <- hier_compute(
  inp = yae_h,
  dim_spec = c(2, 2, 2),
  root = "Tot",
  method = "len"
)
hier_display(v1)
## Tot
## ├─1.
## │ ├─1.1.
## │ │ ├─1.1.1.
## │ │ └─1.1.2.
## │ ├─1.2.
## │ │ ├─1.2.1.
## │ │ ├─1.2.2.
## │ │ ├─1.2.3.
## │ │ ├─1.2.4.
## │ │ └─1.2.5.
## │ ├─1.3.
## │ │ ├─1.3.1.
## │ │ ├─1.3.2.
## │ │ ├─1.3.3.
## │ │ ├─1.3.4.
## │ │ └─1.3.5.
## │ ├─1.4.
## │ │ ├─1.4.1.
## │ │ ├─1.4.2.
## │ │ ├─1.4.3.
## │ │ ├─1.4.4.
## │ │ └─1.4.5.
## │ ├─1.5.
## │ ├─1.6.
## │ ├─1.7.
## │ ├─1.8.
## │ └─1.9.
## ├─2.
## └─3.
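With codes of different lengths, a code simply stops contributing levels once its end is reached: 1.5. has two levels and 2. only one. A minimal base-R sketch, assuming that end positions beyond the length of a code are dropped, which matches the tree above; code_path() is a hypothetical helper:

```r
# Hypothetical helper: all ancestors of a code (including the code itself),
# given the number of characters per level as in method = "len"
code_path <- function(code, len) {
  endpos <- cumsum(len)
  endpos <- endpos[endpos <= nchar(code)]  # ignore levels the code does not reach
  substring(code, 1, endpos)
}

code_path("1.2.3.", len = c(2, 2, 2))  # "1." "1.2." "1.2.3."
code_path("1.5.",   len = c(2, 2, 2))  # "1." "1.5."
code_path("2.",     len = c(2, 2, 2))  # "2."
```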
There is also another way to specify the input for hier_compute(). Setting argument method = "list" creates a hierarchy from a named list, in which the name of each list element is interpreted as the name of the parent node of all codes contained in that element. An example is shown below:
yae_ll <- list()
yae_ll[["Total"]] <- c("1.", "2.", "3.")
yae_ll[["1."]] <- paste0("1.", 1:9, ".")
yae_ll[["1.1."]] <- paste0("1.1.", 1:2, ".")
yae_ll[["1.2."]] <- paste0("1.2.", 1:5, ".")
yae_ll[["1.3."]] <- paste0("1.3.", 1:5, ".")
yae_ll[["1.4."]] <- paste0("1.4.", 1:6, ".")
d <- hier_compute(inp = yae_ll, root = "Total", method = "list")
## Argument 'dim_spec' is ignored when constructing a hierarchy from a nested list.
hier_display(d)
## Total
## ├─1.
## │ ├─1.1.
## │ │ ├─1.1.1.
## │ │ └─1.1.2.
## │ ├─1.2.
## │ │ ├─1.2.1.
## │ │ ├─1.2.2.
## │ │ ├─1.2.3.
## │ │ ├─1.2.4.
## │ │ └─1.2.5.
## │ ├─1.3.
## │ │ ├─1.3.1.
## │ │ ├─1.3.2.
## │ │ ├─1.3.3.
## │ │ ├─1.3.4.
## │ │ └─1.3.5.
## │ ├─1.4.
## │ │ ├─1.4.1.
## │ │ ├─1.4.2.
## │ │ ├─1.4.3.
## │ │ ├─1.4.4.
## │ │ ├─1.4.5.
## │ │ └─1.4.6.
## │ ├─1.5.
## │ ├─1.6.
## │ ├─1.7.
## │ ├─1.8.
## │ └─1.9.
## ├─2.
## └─3.
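A named list of this kind is essentially a compact parent/child table. The following base-R sketch (an illustration, not the code used by hier_compute()) flattens a shortened version of yae_ll into such a table, which makes the "list name = parent" convention explicit:

```r
# Shortened version of `yae_ll` from above
ll <- list(
  "Total" = c("1.", "2.", "3."),
  "1."    = paste0("1.", 1:9, "."),
  "1.1."  = paste0("1.1.", 1:2, ".")
)

# each list name becomes the parent of every code stored in that element
edges <- data.frame(
  parent = rep(names(ll), lengths(ll)),
  child  = unlist(ll, use.names = FALSE),
  stringsAsFactors = FALSE
)
head(edges)
```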
Using hier_grid(), it is possible to compute all combinations of codes from several hierarchies. This is useful for building a complete table (e.g. for merging purposes). To demonstrate the functionality of hier_grid(), we first need to specify some hierarchies.
h1 <- hier_create("Total", nodes = LETTERS[1:3])
h1 <- hier_add(h1, root = "A", nodes = "a1")
h1 <- hier_add(h1, root = "a1", nodes = "aa1")
h1
h2 <- hier_create("Total", nodes = letters[1:5])
h2 <- hier_add(h2, root = "b", nodes = "b1")
h2 <- hier_add(h2, root = "d", nodes = "d1")
h2
Note that we deliberately added some “bogus” codes to both h1 and h2: codes a1 and aa1 in h1 as well as codes b1 and d1 in h2 are identical to their respective parent categories. Applying hier_grid() is as simple as calling:
hier_grid(h1, h2)
## v1 v2
## 1: Total Total
## 2: A Total
## 3: a1 Total
## 4: aa1 Total
## 5: B Total
## 6: C Total
## 7: Total a
## 8: A a
## 9: a1 a
## 10: aa1 a
## 11: B a
## 12: C a
## 13: Total b
## 14: A b
## 15: a1 b
## 16: aa1 b
## 17: B b
## 18: C b
## 19: Total b1
## 20: A b1
## 21: a1 b1
## 22: aa1 b1
## 23: B b1
## 24: C b1
## 25: Total c
## 26: A c
## 27: a1 c
## 28: aa1 c
## 29: B c
## 30: C c
## 31: Total d
## 32: A d
## 33: a1 d
## 34: aa1 d
## 35: B d
## 36: C d
## 37: Total d1
## 38: A d1
## 39: a1 d1
## 40: aa1 d1
## 41: B d1
## 42: C d1
## 43: Total e
## 44: A e
## 45: a1 e
## 46: aa1 e
## 47: B e
## 48: C e
## v1 v2
In other words, all target hierarchies are simply passed as comma-separated arguments. hier_grid() then computes all combinations of codes from hierarchies h1 and h2. With the default options, the bogus codes are included in the output data.table. Setting argument add_dups = FALSE removes all rows containing such bogus codes. Setting add_levs = TRUE adds columns labeled levs_v{n} to the output; each of these columns contains the hierarchy level of the code given in variable v{n} of the same row, as shown below.
hier_grid(h1, h2, add_dups = FALSE, add_levs = TRUE)
## v1 v2 levs_v1 levs_v2
## 1: Total Total 1 1
## 2: A Total 2 1
## 3: B Total 2 1
## 4: C Total 2 1
## 5: Total a 1 2
## 6: A a 2 2
## 7: B a 2 2
## 8: C a 2 2
## 9: Total b 1 2
## 10: A b 2 2
## 11: B b 2 2
## 12: C b 2 2
## 13: Total c 1 2
## 14: A c 2 2
## 15: B c 2 2
## 16: C c 2 2
## 17: Total d 1 2
## 18: A d 2 2
## 19: B d 2 2
## 20: C d 2 2
## 21: Total e 1 2
## 22: A e 2 2
## 23: B e 2 2
## 24: C e 2 2
## v1 v2 levs_v1 levs_v2
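The effect of add_dups and add_levs can be mimicked in base R. The sketch below assumes, as described above for a1, aa1, b1 and d1, that a bogus code is the only child of its parent (the package's own rule may differ in edge cases), and it uses hypothetical parent/child tables h1_pc and h2_pc and a hypothetical helper grid_codes() rather than sdcHierarchies objects. expand.grid() varies its first argument fastest, which yields the same row order as the output above.

```r
# Hypothetical parent/child tables mirroring h1 and h2 (rows in display order)
h1_pc <- data.frame(
  code   = c("Total", "A", "a1", "aa1", "B", "C"),
  parent = c(NA, "Total", "A", "a1", "Total", "Total"),
  level  = c(1, 2, 3, 4, 2, 2),
  stringsAsFactors = FALSE
)
h2_pc <- data.frame(
  code   = c("Total", "a", "b", "b1", "c", "d", "d1", "e"),
  parent = c(NA, "Total", "Total", "b", "Total", "Total", "d", "Total"),
  level  = c(1, 2, 2, 3, 2, 2, 3, 2),
  stringsAsFactors = FALSE
)

# assumption: a code is bogus if it is the only child of its parent
is_bogus <- function(pc) {
  # for every row, count how many rows share its parent
  n_same_parent <- vapply(
    pc$parent,
    function(p) sum(pc$parent == p, na.rm = TRUE),
    integer(1),
    USE.NAMES = FALSE
  )
  !is.na(pc$parent) & n_same_parent == 1
}

# sketch of the idea behind hier_grid(): all code combinations,
# optionally without bogus codes and optionally with level columns
grid_codes <- function(pc1, pc2, add_dups = TRUE, add_levs = FALSE) {
  c1 <- pc1$code[add_dups | !is_bogus(pc1)]
  c2 <- pc2$code[add_dups | !is_bogus(pc2)]
  g <- expand.grid(v1 = c1, v2 = c2, stringsAsFactors = FALSE)
  if (add_levs) {
    g$levs_v1 <- pc1$level[match(g$v1, pc1$code)]
    g$levs_v2 <- pc2$level[match(g$v2, pc2$code)]
  }
  g
}

full    <- grid_codes(h1_pc, h2_pc)
no_dups <- grid_codes(h1_pc, h2_pc, add_dups = FALSE, add_levs = TRUE)
nrow(full)     # 48
nrow(no_dups)  # 24
head(no_dups, 5)
```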
The package also contains a shiny-based interactive app that can be started using hier_app(). The app accepts as input either a character vector (to be converted into a hierarchy) or an existing hierarchy. Given the hierarchy d previously generated using hier_compute(), it can be started as follows:
d <- hier_app(d)
If a character vector is passed to hier_app(), the interface allows the user to specify the arguments for hier_compute(). Once a hierarchy is created, the interface changes and the tree can be modified dynamically by dragging nodes around. Furthermore, nodes can be added, removed or renamed. The code required to construct the current hierarchy is displayed and can be saved to disk. There is also functionality to undo the last step and to export results either to the R session or to a file. The latter is especially helpful if one wants to create, for example, an hrc-file as input for τ-argus. Please note that hier_app() can return the modified hierarchy, not only save results to disk. To continue working with it, assign the result to an object as shown in the code above.
In case you have any suggestions or improvements, please feel free to file an issue at our issue tracker or contribute to the package by filing a pull request against the master branch.