The arkhe package provides a set of S4 classes for archaeological data matrices. These new classes represent different special types of matrix:
CountMatrix represents absolute frequency data,
OccurrenceMatrix represents a co-occurrence matrix,
CompositionMatrix represents relative frequency data,
IncidenceMatrix represents presence/absence data,
StratigraphicMatrix represents stratigraphic relationships.
arkhe assumes that you keep your data tidy: each variable (taxon/type) must be saved in its own column and each observation (assemblage/sample) must be saved in its own row. Note that missing values are not allowed.
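As a rough illustration of the tidy layout described above, the following base R sketch builds a small matrix with one row per assemblage and one column per type, then checks that it contains no missing values. The data and the names `counts`, `site_A`, `bowl`, etc. are hypothetical and are not taken from the package.

```r
# Hypothetical toy data: rows are assemblages, columns are types
counts <- matrix(
  c(4L, 0L, 2L,
    1L, 3L, 5L),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("site_A", "site_B"), c("bowl", "jar", "plate"))
)

# Missing values are not allowed, so check before going further
has_missing <- anyNA(counts)

# One row per observation, one column per variable
n_assemblages <- nrow(counts)
n_types <- ncol(counts)
```

A matrix stored the other way round (types in rows) can be flipped with `t()` before use.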
The internal structure of S4 classes implemented in arkhe is depicted in the UML class diagram in the following figure.
Count matrix (CountMatrix)
We denote the \(m \times p\) count matrix by \(A = \left[ a_{ij} \right] ~\forall i \in \left[ 1,m \right], j \in \left[ 1,p \right]\) with row and column sums:
\[\begin{align} a_{i \cdot} = \sum_{j = 1}^{p} a_{ij} && a_{\cdot j} = \sum_{i = 1}^{m} a_{ij} && a_{\cdot \cdot} = \sum_{i = 1}^{m} \sum_{j = 1}^{p} a_{ij} && \forall a_{ij} \in \mathbb{N} \end{align}\]
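The three sums can be checked on a small example. The sketch below uses a plain base R matrix with hypothetical values rather than a CountMatrix object, so it only illustrates the notation.

```r
# Toy count matrix (hypothetical data): m = 3 assemblages, p = 4 types
A <- matrix(
  c(2L, 0L, 5L, 1L,
    3L, 4L, 0L, 2L,
    1L, 1L, 1L, 6L),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("row", 1:3), paste0("col", 1:4))
)

row_totals <- rowSums(A)  # a_{i.}: total count per assemblage
col_totals <- colSums(A)  # a_{.j}: total count per type
grand_total <- sum(A)     # a_{..}: total over all cells

row_totals
#> row1 row2 row3 
#>    8    9    9
grand_total
#> [1] 26
```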
Frequency matrix (CompositionMatrix)
A frequency matrix represents relative abundances.
We denote the \(m \times p\) frequency matrix by \(B = \left[ b_{ij} \right] ~\forall i \in \left[ 1,m \right], j \in \left[ 1,p \right]\) with row and column sums:
\[\begin{align} b_{i \cdot} = \sum_{j = 1}^{p} b_{ij} = 1 && b_{\cdot j} = \sum_{i = 1}^{m} b_{ij} && b_{\cdot \cdot} = \sum_{i = 1}^{m} \sum_{j = 1}^{p} b_{ij} && \forall b_{ij} \in \left[ 0,1 \right] \end{align}\]
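A minimal base R sketch of these constraints, assuming hypothetical counts and using `prop.table()` rather than any arkhe function: each row is divided by its total, so every row sums to 1 and the grand total equals the number of rows \(m\).

```r
# Toy counts (hypothetical data): m = 3 assemblages, p = 4 types
A <- matrix(
  c(2, 0, 5, 1,
    3, 4, 0, 2,
    1, 1, 1, 6),
  nrow = 3, byrow = TRUE
)

# Relative frequencies: divide every row by its own total
B <- prop.table(A, margin = 1)

row_sums <- rowSums(B)   # b_{i.}: 1 for every assemblage
col_sums <- colSums(B)   # b_{.j}: no fixed value
grand_sum <- sum(B)      # b_{..}: equals m, here 3
```

Because the row sums are fixed at 1, only the column sums carry information about the relative weight of each type across assemblages.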
Co-occurrence matrix (OccurrenceMatrix)
A co-occurrence matrix is a symmetric matrix with zeros on its main diagonal, whose entries count the number of samples in which each pair of taxa occurs together.
The \(p \times p\) co-occurrence matrix \(D = \left[ d_{ij} \right] ~\forall i,j \in \left[ 1,p \right]\) is defined over an \(m \times p\) abundance matrix \(A = \left[ a_{xy} \right] ~\forall x \in \left[ 1,m \right], y \in \left[ 1,p \right]\) as:
\[ d_{ij} = \sum_{x = 1}^{m} \bigcap_{y = i}^{j} a_{xy} \]
with row and column sums:
\[\begin{align} d_{i \cdot} = \sum_{j \geqslant i}^{p} d_{ij} && d_{\cdot j} = \sum_{i \leqslant j}^{p} d_{ij} && d_{\cdot \cdot} = \sum_{i = 1}^{p} \sum_{j \geqslant i}^{p} d_{ij} && \forall d_{ij} \in \mathbb{N} \end{align}\]
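The prose definition (for each pair of taxa, the number of samples in which both are present) can be sketched in base R with a cross-product of the presence/absence matrix. This is only an illustration with hypothetical data; it is not claimed to be how as_occurrence() is implemented.

```r
# Toy abundance matrix (hypothetical data): 3 samples x 4 taxa
A <- matrix(
  c(2, 0, 5, 1,
    3, 4, 0, 2,
    1, 1, 1, 6),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("row", 1:3), paste0("col", 1:4))
)

P <- (A > 0) * 1    # presence/absence coded as 0/1
D <- crossprod(P)   # t(P) %*% P: entry [i, j] counts samples with both taxa
diag(D) <- 0        # zeros on the main diagonal, as in the definition

D
#>      col1 col2 col3 col4
#> col1    0    2    2    3
#> col2    2    0    1    2
#> col3    2    1    0    2
#> col4    3    2    2    0
```

Since the matrix is symmetric, summing only over \(j \geqslant i\) (the upper triangle) counts each pair once.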
Incidence matrix (IncidenceMatrix)
We denote the \(m \times p\) incidence matrix by \(C = \left[ c_{ij} \right] ~\forall i \in \left[ 1,m \right], j \in \left[ 1,p \right]\) with row and column sums:
\[\begin{align} c_{i \cdot} = \sum_{j = 1}^{p} c_{ij} && c_{\cdot j} = \sum_{i = 1}^{m} c_{ij} && c_{\cdot \cdot} = \sum_{i = 1}^{m} \sum_{j = 1}^{p} c_{ij} && \forall c_{ij} \in \lbrace 0,1 \rbrace \end{align}\]
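For presence/absence data the sums have a direct reading: a row sum is the number of types present in an assemblage, and a column sum is the number of assemblages in which a type occurs. A small base R sketch with hypothetical counts (not an IncidenceMatrix object):

```r
# Toy counts (hypothetical data): 3 assemblages x 4 types
A <- matrix(
  c(2, 0, 5, 1,
    3, 4, 0, 2,
    1, 1, 1, 6),
  nrow = 3, byrow = TRUE
)

C <- A > 0                 # logical matrix: TRUE where a type is present

richness <- rowSums(C)     # c_{i.}: number of types per assemblage
ubiquity <- colSums(C)     # c_{.j}: number of assemblages per type
n_present <- sum(C)        # c_{..}: number of non-empty cells
```

The names `richness` and `ubiquity` are labels chosen for this example only.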
## Load packages
library(arkhe)
These new classes are simple to use, in the same way as the base matrix:
set.seed(12345)
### Create a count data matrix
### Data are rounded to zero decimal places, then coerced with as.integer
CountMatrix(data = sample(0:10, 100, TRUE),
nrow = 10, ncol = 10)
#> <CountMatrix: 10 x 10>
#> col1 col2 col3 col4 col5 col6 col7 col8 col9 col10
#> row1 2 6 2 3 9 7 9 3 3 6
#> row2 9 9 8 7 9 10 6 8 9 0
#> row3 7 0 3 10 2 3 6 10 3 2
#> row4 9 7 9 5 2 1 4 0 8 1
#> row5 10 6 6 8 2 2 6 2 1 4
#> row6 7 5 1 4 0 5 9 9 7 9
#> row7 1 0 3 2 9 2 7 6 9 5
#> row8 5 3 10 0 7 6 2 9 0 6
#> row9 10 7 8 0 10 9 4 9 8 8
#> row10 5 9 8 4 8 6 10 6 5 9
### Create an incidence (presence/absence) matrix
### Data are coerced to logical as by as.logical
IncidenceMatrix(data = sample(0:1, 100, TRUE),
nrow = 10, ncol = 10)
#> <IncidenceMatrix: 10 x 10>
#> col1 col2 col3 col4 col5 col6 col7 col8 col9 col10
#> row1 TRUE FALSE TRUE FALSE FALSE FALSE TRUE FALSE FALSE TRUE
#> row2 TRUE TRUE TRUE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
#> row3 TRUE TRUE TRUE FALSE TRUE FALSE TRUE FALSE TRUE TRUE
#> row4 TRUE FALSE TRUE TRUE FALSE TRUE FALSE FALSE TRUE TRUE
#> row5 FALSE FALSE TRUE FALSE TRUE TRUE FALSE FALSE TRUE TRUE
#> row6 TRUE TRUE TRUE FALSE TRUE FALSE FALSE FALSE FALSE TRUE
#> row7 TRUE FALSE TRUE FALSE FALSE TRUE FALSE FALSE TRUE TRUE
#> row8 FALSE FALSE TRUE TRUE TRUE FALSE TRUE TRUE FALSE FALSE
#> row9 FALSE TRUE TRUE TRUE TRUE TRUE FALSE TRUE FALSE FALSE
#> row10 TRUE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
arkhe uses coercion mechanisms (with validation methods) for data type conversions:
### Create a count matrix
A0 <- matrix(data = sample(0:10, 100, TRUE), nrow = 10, ncol = 10)
### Coerce to absolute frequencies
A1 <- as_count(A0)
### Coerce to relative frequencies
B <- as_composition(A1)
### Row sums are internally stored before coercing to a frequency matrix
### (use get_totals() to get these values)
### This allows the source data to be restored
A2 <- as_count(B)
all(A1 == A2)
#> [1] TRUE
### Coerce to presence/absence
C <- as_incidence(A1)
### Coerce to a co-occurrence matrix
D <- as_occurrence(A1)
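The comments above note that row sums are stored before coercion to relative frequencies, which is what makes the round trip back to counts possible. The idea can be sketched in base R, assuming hypothetical data and a plain vector named `totals` standing in for whatever arkhe keeps internally:

```r
# Toy counts (hypothetical data): 3 assemblages x 4 types
A1 <- matrix(
  c(2L, 0L, 5L, 1L,
    3L, 4L, 0L, 2L,
    1L, 1L, 1L, 6L),
  nrow = 3, byrow = TRUE
)

totals <- rowSums(A1)    # keep the row sums before converting
B <- A1 / totals         # relative frequencies (each row sums to 1)

# Restore counts: multiply back by the stored totals and round away
# floating-point error before converting to integer
A2 <- round(B * totals)
storage.mode(A2) <- "integer"

all(A1 == A2)
#> [1] TRUE
```

Without the stored totals the relative frequencies alone would not determine the original counts, since any rescaling of a row gives the same proportions. An assemblage with a total of zero would also need special handling, which this sketch does not attempt.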