Manual - xadmix

First, let’s load the package. If you haven’t installed it yet, you can do so with the install.packages("xadmix") command to get the latest stable release from CRAN. Alternatively, you can install the latest development release from GitHub: devtools::install_github("SpaceCowboy-71/xadmix"). This, however, requires the package devtools.

Simulated Dataset

The xadmix package provides a dummy dataset (“xadmixture”) containing simulated genetic admixture data. Each observation has an accession identifier, the country where the plant material was collected, an entry for the species and five admixture coefficients, K. The values of these ancestries sum up to 1.

data("xadmixture")
str(xadmixture)
#> 'data.frame':    600 obs. of  8 variables:
#>  $ acc    : chr  "AA_193373_" "AA_163594_" "AA_35629_" "AA_151602_" ...
#>  $ country: chr  "GBR" "NLD" "FRA" "ESP" ...
#>  $ species: chr  "lorem" "lorem" "ipsum" "dolor" ...
#>  $ K1     : num  0.0979 0.0196 0.0314 0.093 0.0756 ...
#>  $ K2     : num  0.0205 0.2463 0.0863 0.2586 0.079 ...
#>  $ K3     : num  0.0618 0.2183 0.3026 0.1551 0.1732 ...
#>  $ K4     : num  0.4511 0.4329 0.226 0.2289 0.0472 ...
#>  $ K5     : num  0.369 0.083 0.354 0.264 0.625 ...

# number of observations per country
table(xadmixture$country)
#> 
#> DEU ESP FRA GBR ITA NLD 
#> 100 160 130 100  50  60

# number of observations per species
table(xadmixture$species)
#> 
#> dolor ipsum lorem 
#>   200   200   200

Subsetting

xadmix comes with an optimized data subsetting function, admix_subset(). It enables filtering of the data for ancestry (K) percentages greater or less than a certain value. Additionally, any number of columns can be filtered for values supplied in a vector. By default, it also prints the progress after each subsetting step.

Multiple ancestries along with their percentages can be passed as argument vectors. Subsetting is then done pairwise, filtering the first ancestry by the first pecentage, then the second ones and so on. Some examples:

# keep only observations with K1 > 0.15 and K2 > 0.01
subset1 <- admix_subset(xadmixture, 
                        anc = c("K1", "K2"), 
                        pct = c(0.15, 0.01))
#> observations: 600 
#> 1. subset, K1 > 0.15
#>  observations: 39 
#> 2. subset, K2 > 0.01
#>  observations: 38

# keep only observations with K2 < 0.1 and K3 < 0.1
subset2 <- admix_subset(xadmixture, 
                        anc = c("K2", "K3"), 
                        pct = c(0.1, 0.1), 
                        comparison = "less")
#> observations: 600 
#> 1. subset, K2 < 0.1
#>  observations: 215 
#> 2. subset, K3 < 0.1
#>  observations: 41

# filtering for countries and species
subset3 <- admix_subset(xadmixture, 
                        country = c("GBR", "FRA"), 
                        species = c("lorem", "dolor"))
#> observations: 600 
#> keeping only specified values in col: country 
#>  observations left after this step: 230 
#> keeping only specified values in col: species 
#>  observations left after this step: 144

Subsets can be chained using the pipe operator (%>%) from package magrittr. This way, filtering for percentages greater and less than certain values is possible.
By setting the argument quiet to TRUE, no subset progress will be printed.

library(magrittr)
# keep only observations with K1 > 0.1 and K4 < 0.3,
# without printing subset progress
subset4 <- admix_subset(xadmixture, 
                        anc = "K1", 
                        pct = 0.1, 
                        quiet = TRUE) %>% 
admix_subset(anc = "K1", 
             pct = 0.3, 
             comparison = "less", 
             quiet = TRUE)

# print number of observations for comparison
nrow(xadmixture)
#> [1] 600
nrow(subset4)
#> [1] 157

Admixture Plots

xadmix also comes with a user-friendly function for generating pretty stacked barplots, optimized to use with admixture data.

If the dataset were to contain only the accession numbers in the first column and ancestry percentages in subsequent columns, the function could be used without any further arguments.

However, the xadmixture dataset contains additional columns. Therefore, the columns for all K`s have to be selected by passing an argument. An example plotting the whole dataset:

# ancestries (K) are in the fourth to last column,
# and plotted without showing bar labels
admix_barplot(xadmixture,
   K = 4:ncol(xadmixture),
   names = FALSE
)

Grouped Stacked Barplots & Customized Appearance

More information can be gained by looking at the subsets generated above:

# grouping data by column "country",
# and sorting each group by ancestry column "K1"
admix_barplot(subset1,
   K = 4:ncol(xadmixture),
   grouping = "country",
   sortkey = "K1"
)


# changing color palette to "turbo" from package 'viridis',
admix_barplot(subset2,
   K = 4:ncol(xadmixture),
   palette = "turbo",
   grouping = "species",
   sortkey = "K4"
)

# removing title and changing axis labels text
admix_barplot(subset3,
   K = 4:ncol(xadmixture),
   main = "",
   xlab = "Accessions",
   ylab = "Ancestry [%]", 
   palette = "alternating",
   sortkey = "K1",
   names = FALSE
)

Noclip Feature

Sometimes the group labels can get cut off. When there are not enough observations in the group, the width of the grouped bars can be smaller than the width of the group label. ggplot2 does not yet offer a flag to avoid this problem. In xadmix, however, there is an option to remove clipping altogether:

# directly output grouped plot with clipping removed from elements
# (useful if there are groups with a low number of observations)
subset5 <- admix_subset(xadmixture,
                        anc = c("K3", "K4"), 
                        pct = c(0.3, 0.2), 
                        quiet = TRUE)

# noclip set to "TRUE"
admix_barplot(subset5, 
            K = 4:ncol(xadmixture),          
            sortkey = "K5",
            grouping = "country", 
            palette = "viridis", 
            names = FALSE, 
            main = "Noclip on",
            noclip = TRUE)

# noclip set to "FALSE"
admix_barplot(subset5, 
            K = 4:ncol(xadmixture),          
            sortkey = "K5",
            grouping = "country", 
            palette = "viridis", 
            names = FALSE, 
            main = "Noclip off",
            noclip = FALSE)

♦