This vignette is a workflow template for data import and downstream analysis with mpwR. It covers the number of identifications, data completeness, missed cleavages, quantitative and retention time precision, and inter-software comparisons. It walks through the main steps and shows how the functions are applied.
library(mpwR)
library(flowTraceR)
library(magrittr)
library(dplyr)
library(tidyr)
library(stringr)
library(tibble)
library(ggplot2)
library(flextable)
The output files of each software can be imported with prepare_mpwR. Put all output files in one folder and follow the guidelines for naming the files. The folder must not contain any other files or subfolders. Details are provided in the vignette Import.
files <- prepare_mpwR(path = "Path_to_Folder_with_files")
Example data for exploring the workflow can be created with create_example.
files <- create_example()
The number of identifications can be determined with get_ID_Report.
ID_Reports <- get_ID_Report(input_list = files)
For each analysis an ID Report is generated and stored in a list. Each ID Report entry can be easily accessed:
flextable::flextable(ID_Reports[["DIA-NN"]])
Analysis | Run | ProteinGroup.IDs | Protein.IDs | Peptide.IDs | Precursor.IDs |
DIA-NN | R01 | 5 | 5 | 5 | 5 |
DIA-NN | R02 | 5 | 5 | 5 | 5 |
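Because the reports are stored in a named list, they can be inspected or combined with base R. The sketch below uses a hypothetical stand-in list (toy_reports) with a subset of the columns shown in the table above rather than real mpwR output; the second entry and its counts are made up for illustration.

```r
# Hypothetical stand-in for ID_Reports: a named list of data frames
toy_reports <- list(
  "DIA-NN" = data.frame(Analysis = "DIA-NN", Run = c("R01", "R02"),
                        ProteinGroup.IDs = c(5, 5)),
  "Spectronaut" = data.frame(Analysis = "Spectronaut", Run = c("R01", "R02"),
                             ProteinGroup.IDs = c(7, 6))  # made-up counts
)

names(toy_reports)        # which analyses are available
toy_reports[["DIA-NN"]]   # access a single report by name

# Stack all reports into one data frame for a side-by-side view
combined <- do.call(rbind, toy_reports)
rownames(combined) <- NULL
combined
```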
Each ID Report can be plotted with plot_ID_barplot at any level from precursor to protein group. The generated barplots are stored in a list.
ID_Barplots <- plot_ID_barplot(input_list = ID_Reports, level = "ProteinGroup.IDs")
The individual barplots can be easily accessed:
ID_Barplots[["DIA-NN"]]
As a visual summary, a boxplot can be generated with plot_ID_boxplot.
plot_ID_boxplot(input_list = ID_Reports, level = "ProteinGroup.IDs")
Data completeness can be determined with get_DC_Report, either in absolute numbers or as a percentage.
DC_Reports <- get_DC_Report(input_list = files, metric = "absolute")
DC_Reports_perc <- get_DC_Report(input_list = files, metric = "percentage")
For each analysis a DC Report is generated and stored in a list. Each DC Report entry can be easily accessed:
flextable::flextable(DC_Reports[["DIA-NN"]])
Analysis | Nr.Missing.Values | Precursor.IDs | Peptide.IDs | Protein.IDs | ProteinGroup.IDs | Profile |
DIA-NN | 1 | 0 | 0 | 4 | 2 | unique |
DIA-NN | 0 | 5 | 5 | 3 | 4 | complete |
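To make the Nr.Missing.Values column concrete, the following sketch counts, for a made-up set of identifiers, in how many runs each one is missing. The toy matrix and the object names are assumptions for illustration; this is not the internal implementation of get_DC_Report. With two runs, zero missing values corresponds to the complete profile and one missing value to the unique profile shown in the table.

```r
# Toy presence/absence data: rows = identifiers, columns = runs (made-up values)
detected <- matrix(
  c(TRUE,  TRUE,
    TRUE,  FALSE,
    FALSE, TRUE,
    TRUE,  TRUE),
  ncol = 2, byrow = TRUE,
  dimnames = list(c("P1", "P2", "P3", "P4"), c("R01", "R02"))
)

# Number of runs in which each identifier is missing
n_missing <- rowSums(!detected)

# Absolute count of identifiers per number of missing values
dc_abs <- table(Nr.Missing.Values = n_missing)

# The same distribution as a percentage
dc_perc <- 100 * dc_abs / sum(dc_abs)

dc_abs
dc_perc
```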
Each DC Report can be plotted with plot_DC_barplot at any level from precursor to protein group. The generated barplots are stored in a list.
DC_Barplots <- plot_DC_barplot(input_list = DC_Reports, level = "ProteinGroup.IDs", label = "absolute")
The individual barplots can be easily accessed:
DC_Barplots[["DIA-NN"]]
plot_DC_barplot(input_list = DC_Reports_perc, level = "ProteinGroup.IDs", label = "percentage")[["DIA-NN"]]
As a visual summary, a stacked barplot can be generated with plot_DC_stacked_barplot.
plot_DC_stacked_barplot(input_list = DC_Reports, level = "ProteinGroup.IDs", label = "absolute")
plot_DC_stacked_barplot(input_list = DC_Reports_perc, level = "ProteinGroup.IDs", label = "percentage")
A report on missed cleavages can be generated with get_MC_Report, either in absolute numbers or as a percentage.
MC_Reports <- get_MC_Report(input_list = files, metric = "absolute")
MC_Reports_perc <- get_MC_Report(input_list = files, metric = "percentage")
For each analysis an MC Report is generated and stored in a list. Each MC Report entry can be easily accessed:
flextable::flextable(MC_Reports[["Spectronaut"]])
Analysis | Missed.Cleavage | mc_count |
Spectronaut | 0 | 1 |
Spectronaut | 1 | 1 |
Spectronaut | 2 | 1 |
Spectronaut | 3 | 1 |
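As an illustration of what a missed cleavage is, the sketch below counts internal lysine (K) and arginine (R) residues in made-up peptide sequences, assuming tryptic digestion and ignoring the proline rule. This is a simplified assumption for illustration only; get_MC_Report may instead rely on the missed cleavage information reported by each software.

```r
# Made-up peptide sequences
peptides <- c("AGLSVDEK", "AGKLSVDER", "MKRLLK", "PEPTIDE")

# Count K/R residues that are not at the C-terminus (simplified tryptic rule)
count_missed_cleavages <- function(sequence) {
  internal <- substr(sequence, 1, nchar(sequence) - 1)
  lengths(regmatches(internal, gregexpr("[KR]", internal)))
}

mc <- count_missed_cleavages(peptides)

# Tabulate how many peptides have 0, 1, 2, ... missed cleavages
mc_table <- table(Missed.Cleavage = mc)
mc_table
```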
Each MC Report can be plotted with plot_MC_barplot. The generated barplots are stored in a list.
MC_Barplots <- plot_MC_barplot(input_list = MC_Reports, label = "absolute")
The individual barplots can be easily accessed:
MC_Barplots[["Spectronaut"]]
plot_MC_barplot(input_list = MC_Reports_perc, label = "percentage")[["Spectronaut"]]
As a visual summary, a stacked barplot can be generated with plot_MC_stacked_barplot.
plot_MC_stacked_barplot(input_list = MC_Reports, label = "absolute")
plot_MC_stacked_barplot(input_list = MC_Reports_perc, label = "percentage")
The coefficient of variation (CV) of the retention time can be calculated with get_CV_RT. Only complete profiles are used.
CV_RT <- get_CV_RT(input_list = files)
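For reference, the CV is the standard deviation divided by the mean, commonly expressed as a percentage. The sketch below computes it per precursor on made-up retention times and keeps only complete profiles. The data and object names are hypothetical, and the exact computation inside get_CV_RT may differ.

```r
# Made-up retention times (minutes): rows = precursors, columns = runs
rt <- matrix(
  c(10.0, 10.2, 9.8,
    25.1, NA,   25.3,
    40.0, 40.0, 40.0),
  ncol = 3, byrow = TRUE,
  dimnames = list(c("prec1", "prec2", "prec3"), c("R01", "R02", "R03"))
)

# Keep only complete profiles (no missing value in any run)
rt_complete <- rt[complete.cases(rt), , drop = FALSE]

# CV in percent: standard deviation relative to the mean
cv_percent <- apply(rt_complete, 1, function(x) 100 * sd(x) / mean(x))
cv_percent
```

Here prec2 is dropped because it lacks a value in one run, which mirrors the restriction to complete profiles.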
As a visual summary, a density plot for all analyses can be generated with plot_CV_density.
plot_CV_density(input_list = CV_RT, cv_col = "RT")
The CV of the quantitative LFQ data on peptide-level can be calculated with get_CV_LFQ_pep. Only complete profiles are used.
CV_LFQ_Pep <- get_CV_LFQ_pep(input_list = files)
#> For DIA-NN no quantitative LFQ data on peptide-level.
#> For PD no quantitative LFQ data on peptide-level.
As a visual summary, a density plot for all analyses can be generated with plot_CV_density.
plot_CV_density(input_list = CV_LFQ_Pep, cv_col = "Pep_quant")
The CV of the quantitative LFQ data on proteingroup-level can be calculated with get_CV_LFQ_pg. Only complete profiles are used.
CV_LFQ_PG <- get_CV_LFQ_pg(input_list = files)
#> For PD no quantitative LFQ data on proteingroup-level.
As a visual summary, a density plot for all analyses can be generated with plot_CV_density.
plot_CV_density(input_list = CV_LFQ_PG, cv_col = "PG_quant")
Common identifications and intersections between analyses can be highlighted.
Use get_Upset_list to prepare the data for Upset plotting.
Upset_prepared <- get_Upset_list(input_list = files, level = "ProteinGroup.IDs")
The Upset plot can be generated with plot_Upset.
plot_Upset(input_list = Upset_prepared, label = "ProteinGroup.IDs")
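An Upset plot visualizes the sizes of set intersections. The underlying idea can be sketched with base R set operations on made-up identifier sets. The analysis names are taken from this vignette, but the identifiers are hypothetical and the code does not reproduce what get_Upset_list returns.

```r
# Made-up protein group identifiers per analysis
ids <- list(
  "DIA-NN"      = c("P1", "P2", "P3", "P4"),
  "Spectronaut" = c("P2", "P3", "P5"),
  "PD"          = c("P3", "P4", "P6")
)

# Identifications common to all analyses
common_all <- Reduce(intersect, ids)

# Pairwise overlap sizes between analyses
pairs <- combn(names(ids), 2, simplify = FALSE)
overlap <- vapply(
  pairs,
  function(p) length(intersect(ids[[p[1]]], ids[[p[2]]])),
  integer(1)
)
names(overlap) <- vapply(pairs, paste, character(1), collapse = " & ")

common_all
overlap
```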
Functions of the package flowTraceR are incorporated in mpwR for inter-software comparisons. Software outputs are standardized and easily comparable.
Without standardizing the precursor-level information, the software outputs only form software-dependent clusters.
get_Upset_list(input_list = files, level = "Peptide.IDs") %>% #prepare Upset
  plot_Upset(label = "Peptide.IDs") #plot
#> geom_path: Each group consists of only one observation. Do you need to adjust
#> the group aesthetic?
By enabling flowTraceR the precursor-level information is standardized and common identifications can be inferred.
get_Upset_list(input_list = files, level = "Peptide.IDs", flowTraceR = TRUE) %>% #prepare Upset
  plot_Upset(label = "Peptide.IDs") #plot
mpwR offers functions to summarize the downstream analysis.
A summary report can be generated with get_summary_Report.
Summary_Report <- get_summary_Report(input_list = files)
As a visual summary, a radar chart for all analyses can be generated with plot_radarchart.
plot_radarchart(input_df = Summary_Report)
To highlight individual categories, the generated summary report can be easily adjusted and used for plotting.
#Focus on Data Completeness
Summary_Report %>%
  dplyr::select(Analysis, contains("Full")) %>% #Analysis column and at least one category column is required
  plot_radarchart()
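The same kind of column selection can be written in base R. The sketch uses a hypothetical summary data frame; the column names and values are invented for illustration and only assume that the data completeness columns contain "Full" in their name, as the contains("Full") call implies.

```r
# Hypothetical summary table (column names and values are made up)
toy_summary <- data.frame(
  Analysis = c("DIA-NN", "Spectronaut"),
  Median.ProteinGroup.IDs = c(5, 7),
  Full.profile.ProteinGroup.IDs = c(4, 6),
  Full.profile.Peptide.IDs = c(5, 8)
)

# Keep the Analysis column plus every column whose name contains "Full"
keep <- names(toy_summary) == "Analysis" |
  grepl("Full", names(toy_summary), fixed = TRUE)
toy_subset <- toy_summary[, keep, drop = FALSE]
toy_subset
```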