library(pureseqtmr)
library(testthat)
library(knitr)
pureseqtmr
is a package to call PureseqTM from R.
PureseqTM predicts the topology of a membrane protein, where the topology can
be either inside, or not inside the membrane.
To be able to call PureseqTM, it needs to be installed:
pureseqtmr::install_pureseqtm()
Note that this code is not actually run, to comply with CRAN guidelines.
PureseqTM supplies some example files. Use get_example_filenames
to
get the path to all these files:
if (is_pureseqtm_installed()) {
get_example_filenames()
}
#> [1] "/home/richel/.local/share/PureseqTM_Package/example/1bhaA.fasta"
#> [2] "/home/richel/.local/share/PureseqTM_Package/example/1bhaA.pdbtm"
#> [3] "/home/richel/.local/share/PureseqTM_Package/example/1bhaA.tgt"
#> [4] "/home/richel/.local/share/PureseqTM_Package/example/4j7cK.fasta"
#> [5] "/home/richel/.local/share/PureseqTM_Package/example/4j7cK.top"
#> [6] "/home/richel/.local/share/PureseqTM_Package/example/Q8TCT8.fasta"
#> [7] "/home/richel/.local/share/PureseqTM_Package/example/human_20416.fasta"
#> [8] "/home/richel/.local/share/PureseqTM_Package/example/pipeline_v0.20.png"
#> [9] "/home/richel/.local/share/PureseqTM_Package/example/test_proteome.fasta"
In this example, 1bhaA.fasta
will be used. To obtain the full path,
use get_example_filename
. get_example_filename
will give an
error if the file is not found.
if (is_pureseqtm_installed()) {
fasta_filename <- get_example_filename("1bhaA.fasta")
head(readLines(fasta_filename))
}
#> [1] ">1bhaA"
#> [2] "QAQITGRPEWIWLALGTALMGLGTLYFLVKGMGVSDPDAKKFYAITTLVPAIAFTMYLSMLLGYGLTMVPF"
Getting the topology of this protein:
if (is_pureseqtm_installed()) {
topology <- predict_topology(fasta_filename)
kable(topology)
}
name | topology |
---|---|
1bhaA | 00000000001111111111111111111000000000000001111111111111111111100000000 |
Or show the topology as a plot:
if (is_pureseqtm_installed()) {
plot_topology(topology)
}
One needs the exact same code for a full proteome.
Here we use a pureseqtmr
example file,
which is the COVID-19 reference proteome,
as downloaded from https://www.uniprot.org/proteomes/UP000464024
.
fasta_filename <- system.file(
"extdata",
"UP000464024.fasta",
package = "pureseqtmr"
)
expect_true(file.exists(fasta_filename))
Show the (top of the) proteome:
head(readLines(fasta_filename))
#> [1] ">sp|P0DTC7|NS7A_SARS2 Protein 7a OS=Severe acute respiratory syndrome coronavirus 2 OX=2697049 GN=7a PE=3 SV=1"
#> [2] "MKIILFLALITLATCELYHYQECVRGTTVLLKEPCSSGTYEGNSPFHPLADNKFALTCFS"
#> [3] "TQFAFACPDGVKHVYQLRARSVSPKLFIRQEEVQELYSPIFLIVAAIVFITLCFTLKRKT"
#> [4] "E"
#> [5] ">sp|P0DTD1|R1AB_SARS2 Replicase polyprotein 1ab OS=Severe acute respiratory syndrome coronavirus 2 OX=2697049 GN=rep PE=1 SV=1"
#> [6] "MESLVPGFNEKTHVQLSLPVLQVRDVLVRGFGDSVEEVLSEARQHLKDGTCGLVEVEKGV"
Getting the topology of this protein:
if (is_pureseqtm_installed()) {
topology <- predict_topology(fasta_filename)
}
Instead of directly showing the raw data, the protein names are shortened first:
if (is_pureseqtm_installed()) {
topology$name <- stringr::str_match(
string = topology$name,
pattern = "..\\|.*\\|(.*)_SARS2"
)[, 2]
}
Show the topology as a plot:
if (is_pureseqtm_installed()) {
plot_topology(topology)
}
And tally the number of transmembrane helices per protein:
if (is_pureseqtm_installed()) {
kable(tally_tmhs(topology))
}
name | n_tmhs |
---|---|
NS7A | 1 |
R1AB | 17 |
NS6 | 0 |
R1A | 17 |
VME1 | 3 |
A0A663DJA2 | 0 |
SPIKE | 1 |
ORF9B | 0 |
NS8 | 0 |
NCAP | 0 |
AP3A | 3 |
Y14 | 1 |
VEMP | 1 |
NS7B | 1 |