Differential Co-Expression and Differential Expression Analysis

Package description

Integrated differential expression (DE) and differential co-expression (DC) analysis of gene expression data, based on the DECODE (DifferEntial CO-expression and Differential Expression) algorithm. Given gene expression data and functional gene set data, the program returns a summary table of the selected gene sets with high differential co-expression and high differential expression (HDC-HDE).
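
As a rough illustration of the two quantities being combined, the sketch below computes a per-gene two-sample t-statistic (DE) and, for each gene pair, the scaled difference between Fisher z-transformed Pearson correlations in the two groups (DC). The toy matrix, the helper names deStatistic and dcMeasure, and the choice of the z-difference as the DC measure are assumptions made for illustration only; this is not the package's internal code.

```r
# Toy data: 4 genes x 10 samples; first 5 samples are control (1), last 5 are case (2).
set.seed(1)
expr <- matrix(rnorm(40, mean = 6), nrow = 4,
               dimnames = list(paste0("gene", 1:4), NULL))
classLabel <- rep(c(1, 2), each = 5)

# DE (hypothetical helper): Welch two-sample t-statistic per gene.
# Positive values mean higher expression in the case group.
deStatistic <- function(expr, classLabel) {
  apply(expr, 1, function(x) {
    unname(t.test(x[classLabel == 2], x[classLabel == 1])$statistic)
  })
}

# DC (hypothetical helper, assumed measure): difference of Fisher
# z-transformed correlations (case minus control), divided by its
# approximate standard error. Needs more than 3 samples per group.
dcMeasure <- function(expr, classLabel) {
  control <- expr[, classLabel == 1, drop = FALSE]
  case <- expr[, classLabel == 2, drop = FALSE]
  rControl <- cor(t(control))
  rCase <- cor(t(case))
  # Self-correlations are always 1; zero them so atanh() stays finite.
  diag(rControl) <- 0
  diag(rCase) <- 0
  se <- sqrt(1 / (ncol(control) - 3) + 1 / (ncol(case) - 3))
  (atanh(rCase) - atanh(rControl)) / se
}

de <- deStatistic(expr, classLabel)
dc <- dcMeasure(expr, classLabel)
print(round(de, 2))
print(round(dc, 2))
```

Genes with a large absolute t-statistic are candidates for high DE, and gene pairs with a large absolute entry in the DC matrix are candidates for high DC; how the package thresholds and combines them is described in the reference.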

Reference

Lui, TWH, Tsui, NBY, Chan, LWC, Wong, CSC, Siu, PM, Yung, BYM. (2015) DECODE: an integrated differential co-expression and differential expression analysis of gene expression data. BMC Bioinformatics, May 31;16:182. http://www.biomedcentral.com/1471-2105/16/182?fmt_math=yes&fmt_math_check=on

Input gene expression data format

Data format:

Columns:
- Columns are tab separated
- Column 1: Official gene symbol
- Column 2: Probe ID
- Starting from column 3: Expression for different samples
Rows:
- Row 1 (starting from column 3): Sample class (“1” indicates the control group; “2” indicates the case group)
- Row 2 (starting from column 3): Sample ID
- Starting from row 3: Expression values, one row per gene
An example of the data format:

geneName	probeID	2	2	2	1	1	1
-	-	Case Sample 1	Case Sample 2	Case Sample 3	Control Sample 1	Control Sample 2	Control Sample 3
7A5	ILMN_1762337	5.12621	5.19419	5.06645	5.40649	5.51259	5.38700
A1BG	ILMN_2055271	5.63504	5.68533	5.66251	5.37466	5.43955	5.50973
A1CF	ILMN_2383229	5.41543	5.58543	5.43239	5.49634	5.62685	5.36962
A26C3	ILMN_1653355	5.56713	5.55470	5.59547	5.46895	5.49622	5.50094
A2BP1	ILMN_1814316	5.23016	5.33808	5.31413	5.30586	5.40108	5.31855
A2M	ILMN_1745607	7.65332	6.56431	8.20163	9.19837	9.04295	10.1448
A2ML1	ILMN_2136495	5.53532	5.93801	5.33728	5.36676	5.79942	5.13974
A3GALT2	ILMN_1668111	5.18578	5.35207	5.30554	5.26107	5.26536	5.28932
A4GALT	ILMN_1735045	6.34751	5.56750	6.92335	7.49523	7.12119	6.54748
A4GNT	ILMN_1680754	5.26417	5.28596	5.27560	5.28830	5.08440	5.44869
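
The layout above can be parsed with base R alone. The sketch below writes a three-gene excerpt of the example to a temporary file, then splits it into class labels, sample IDs, and a numeric expression matrix. The helper name readExpressionFile is hypothetical and is not part of the decode package; it only illustrates how the two header rows and two leading columns fit together.

```r
# Hypothetical helper: parse a file in the format described above.
readExpressionFile <- function(path) {
  # Read everything as text first, because rows 1-2 are headers, not data.
  raw <- read.delim(path, header = FALSE, sep = "\t", quote = "",
                    stringsAsFactors = FALSE, colClasses = "character")
  sampleCols <- 3:ncol(raw)
  expr <- as.matrix(raw[-(1:2), sampleCols])
  storage.mode(expr) <- "double"
  rownames(expr) <- raw[-(1:2), 2]                      # probe IDs
  colnames(expr) <- unname(unlist(raw[2, sampleCols]))  # sample IDs
  list(
    geneSymbol = raw[-(1:2), 1],
    classLabel = as.integer(unlist(raw[1, sampleCols])),
    expression = expr
  )
}

# Three rows copied from the example above, written to a temporary file.
exampleLines <- c(
  "geneName\tprobeID\t2\t2\t2\t1\t1\t1",
  paste("-", "-", "Case Sample 1", "Case Sample 2", "Case Sample 3",
        "Control Sample 1", "Control Sample 2", "Control Sample 3",
        sep = "\t"),
  "7A5\tILMN_1762337\t5.12621\t5.19419\t5.06645\t5.40649\t5.51259\t5.38700",
  "A1BG\tILMN_2055271\t5.63504\t5.68533\t5.66251\t5.37466\t5.43955\t5.50973",
  "A2M\tILMN_1745607\t7.65332\t6.56431\t8.20163\t9.19837\t9.04295\t10.1448"
)
tmp <- tempfile(fileext = ".txt")
writeLines(exampleLines, tmp)

parsed <- readExpressionFile(tmp)
print(table(parsed$classLabel))
print(dim(parsed$expression))
```

Reading all cells as character and converting only the expression block avoids type guessing on the mixed header rows.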

Input gene set data format

Data format:

Rows:
- Each row describes one gene set (its name, ID, and member genes)
Columns:
- Columns are tab separated
- Column 1: Name of gene set
- Column 2: Gene set ID (e.g. GO ID)
- Starting from column 3: Genes (using official gene symbols) in the gene set
An example of the data format:

Column 1	Column 2	Column 3, 4, …
positive regulation of epidermal growth factor-activated receptor activity	GO 0045741	EREG FBXW7 EPGN ADAM17 ADRA2C ADRA2A TGFA EGF
pyrimidine-containing compound salvage	GO 0008655	UPP1 TYMP TK1 UPP2 UCKL1 CDA TK2 UCK1 DCK
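
Because each gene set can contain a different number of genes, the rows are ragged and do not fit a rectangular table reader well. The sketch below reads such a file line by line instead. The helper name readGeneSetFile is hypothetical (it is not part of the decode package), and the demo lines assume that every field, including each gene, is separated by a tab as the format description states.

```r
# Hypothetical helper: parse a gene set file with a variable number of genes per row.
readGeneSetFile <- function(path) {
  lines <- readLines(path)
  lines <- lines[nzchar(lines)]                  # skip empty lines
  fields <- strsplit(lines, "\t", fixed = TRUE)
  list(
    name  = vapply(fields, `[`, character(1), 1),  # column 1
    id    = vapply(fields, `[`, character(1), 2),  # column 2
    genes = lapply(fields, function(f) f[-(1:2)])  # column 3 onward
  )
}

# The two gene sets from the example above, written to a temporary file.
geneSetLines <- c(
  paste("positive regulation of epidermal growth factor-activated receptor activity",
        "GO 0045741", "EREG", "FBXW7", "EPGN", "ADAM17", "ADRA2C", "ADRA2A",
        "TGFA", "EGF", sep = "\t"),
  paste("pyrimidine-containing compound salvage", "GO 0008655", "UPP1", "TYMP",
        "TK1", "UPP2", "UCKL1", "CDA", "TK2", "UCK1", "DCK", sep = "\t")
)
tmpGeneSet <- tempfile(fileext = ".txt")
writeLines(geneSetLines, tmpGeneSet)

geneSets <- readGeneSetFile(tmpGeneSet)
print(lengths(geneSets$genes))
```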

To load the package:

library(decode)

Example 1:

This example runs a larger gene expression data set with 1400 genes. It takes about 16 minutes to complete (measured on an Intel Core i7-4600 processor, 2.69 GHz, 8 GB RAM).

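# Locate the example files shipped with the package
path <- system.file("extdata", package = "decode")
geneSetInputFile <- file.path(path, "geneSet.txt")
geneExpressionFile <- file.path(path, "Expression_data_1400genes.txt")
# Run the analysis; the final log message reports where the result is saved
runDecode(geneSetInputFile, geneExpressionFile)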

Example 2:

This example uses a smaller gene expression data set with 50 genes, included to satisfy CRAN’s submission requirements. (No results will be generated.)

# Locate the example files shipped with the package
path <- system.file("extdata", package = "decode")
geneSetInputFile <- file.path(path, "geneSet.txt")
geneExpressionFile <- file.path(path, "Expression_data_50genes.txt")
# Run the analysis on the 50-gene data set
runDecode(geneSetInputFile, geneExpressionFile)

## [1] "Reading gene expression data..."
## [1] "Calculating t-statistics..."
## [1] "Calculating pairwise correlation for normal states..."
## [1] "Calculating pairwise correlation for disease states..."
## [1] "Calculating differential co-expression measures ..."
## [1] "Reading functional gene set data"
## [1] "Identifying optimal thresholds for genes"
## [1] "Gene id: 1"
## [1] "Gene id: 2"
## [1] "Gene id: 3"
## [1] "Gene id: 4"
## [1] "Gene id: 5"
## [1] "Gene id: 6"
## [1] "Gene id: 7"
## [1] "Gene id: 8"
## [1] "Gene id: 9"
## [1] "Gene id: 10"
## [1] "Gene id: 11"
## [1] "Gene id: 12"
## [1] "Gene id: 13"
## [1] "Gene id: 14"
## [1] "Gene id: 15"
## [1] "Gene id: 16"
## [1] "Gene id: 17"
## [1] "Gene id: 18"
## [1] "Gene id: 19"
## [1] "Gene id: 20"
## [1] "Gene id: 21"
## [1] "Gene id: 22"
## [1] "Gene id: 23"
## [1] "Gene id: 24"
## [1] "Gene id: 25"
## [1] "Gene id: 26"
## [1] "Gene id: 27"
## [1] "Gene id: 28"
## [1] "Gene id: 29"
## [1] "Gene id: 30"
## [1] "Gene id: 31"
## [1] "Gene id: 32"
## [1] "Gene id: 33"
## [1] "Gene id: 34"
## [1] "Gene id: 35"
## [1] "Gene id: 36"
## [1] "Gene id: 37"
## [1] "Gene id: 38"
## [1] "Gene id: 39"
## [1] "Gene id: 40"
## [1] "Gene id: 41"
## [1] "Gene id: 42"
## [1] "Gene id: 43"
## [1] "Gene id: 44"
## [1] "Gene id: 45"
## [1] "Gene id: 46"
## [1] "Gene id: 47"
## [1] "Gene id: 48"
## [1] "Gene id: 49"
## [1] "Gene id: 50"
## [1] "Identifying best associated functional gene set for each gene..."
## [1] "Gene id: 1"
## [1] "Gene id: 2"
## [1] "Gene id: 3"
## [1] "Gene id: 4"
## [1] "Gene id: 5"
## [1] "Gene id: 6"
## [1] "Gene id: 7"
## [1] "Gene id: 8"
## [1] "Gene id: 9"
## [1] "Gene id: 10"
## [1] "Gene id: 11"
## [1] "Gene id: 12"
## [1] "Gene id: 13"
## [1] "Gene id: 14"
## [1] "Gene id: 15"
## [1] "Gene id: 16"
## [1] "Gene id: 17"
## [1] "Gene id: 18"
## [1] "Gene id: 19"
## [1] "Gene id: 20"
## [1] "Gene id: 21"
## [1] "Gene id: 22"
## [1] "Gene id: 23"
## [1] "Gene id: 24"
## [1] "Gene id: 25"
## [1] "Gene id: 26"
## [1] "Gene id: 27"
## [1] "Gene id: 28"
## [1] "Gene id: 29"
## [1] "Gene id: 30"
## [1] "Gene id: 31"
## [1] "Gene id: 32"
## [1] "Gene id: 33"
## [1] "Gene id: 34"
## [1] "Gene id: 35"
## [1] "Gene id: 36"
## [1] "Gene id: 37"
## [1] "Gene id: 38"
## [1] "Gene id: 39"
## [1] "Gene id: 40"
## [1] "Gene id: 41"
## [1] "Gene id: 42"
## [1] "Gene id: 43"
## [1] "Gene id: 44"
## [1] "Gene id: 45"
## [1] "Gene id: 46"
## [1] "Gene id: 47"
## [1] "Gene id: 48"
## [1] "Gene id: 49"
## [1] "Gene id: 50"
## [1] "Processing raw results..."
## [1] "Summarizing functional gene set results..."
## [1] "Done. Result is saved in out_summary.txt"
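
The last message names out_summary.txt as the output file. A minimal way to check what was written is sketched below. The helper name previewSummary is hypothetical, and the code assumes only that the file is plain text in the current working directory; it makes no assumption about its column layout.

```r
# Hypothetical helper: show the first few lines of the summary file, if present.
previewSummary <- function(path = "out_summary.txt", n = 5) {
  if (!file.exists(path)) {
    message("No summary file found at: ", path)
    return(invisible(character(0)))
  }
  head(readLines(path), n)
}

print(previewSummary())
```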