psSubpathway is a method designed to discover the relationship between subpathway and multiple sample phenotype.psSubpathway consists of four parts:
Its capabilities render psSubpathway could find the specific dysregulated subpathways in multi-phenotypes samples, and fill the gap in recent subpathway identification methods which generally found the differentially expressed subpathways between normal and cancer samples. psSubpathway may identify more precision biomarkers and phenotype-related disease mechanisms to help to tailored treatment for patients with cancer.
The function SubSEA can identify subtype-specific subpathways across multi-subtype groups of cancer samples. For each subpathway, we ranked the N samples in the dataset to form a sample list L=<s1, s2…sN> according to decreasing subpathway activity. The samples in a given subtype were mapped to the sample list L. If the samples in the subtype significantly cluster at the top or bottom of the entire ranked list L, the subpathway will be associated with the specific subtype. We used the weighted Kolmogorov-Smirnov statistic to calculate an sample enrichment score (SES), which reflects the degree to which the samples in a subtype is overrepresented toward the extremes (top or bottom) of the sample list L. The SES is calculated by walking down the list L, increasing a running-sum statistic when we encounter a sample in the subtype and decreasing it when we encounter a sample not in the subtype.
This function requires user input the gene expression profile identified and the corresponding phenotype file of the sample (CLS format).In addition, this function requires input of subpathway list data,which has been extracted from KEGG pathway data and saved into the package environment variables. Of course, users can also define their own dataset list data as input, as long as the gene identification and gene expression profile of gene sets are maintained.
We selected the breast cancer gene expression profile data in the GDC TCGA database and pm50 phenotype file containing four breast cancer subtypes for SubSEA and obtained the results of subpathways related to each subtype of breast cancer. The commands are as follows.
# load depend package.
require(GSVA)
#> Loading required package: GSVA
require(parallel)
#> Loading required package: parallel
# get breast cancer disease subtype gene expression profile.
Bregenematrix<-get("Subgenematrix")
# get path of the sample disease subtype files.
Subtypelabels<- system.file("extdata", "Sublabels.cls", package = "psSubpathway")
# perform the SubSEA method.
#SubSEAresult<-SubSEA(Bregenematrix,
# input.cls=Subtypelabels,
# nperm=50, # Number of random permutations
# fdr.th=0.01, # Cutoff value for fdr.when fdr.th=1,get all subpathway
# parallel.sz=2) # Number of processors to use
# get the result of the SubSEA function
SubSEAresult<-get("Subspwresult")
str(SubSEAresult)
#> List of 5
#> $ Basal :'data.frame': 15 obs. of 6 variables:
#> ..$ SubpathwayID : Factor w/ 15 levels "00120_19","00120_9",..: 2 1 3 4 5 6 7 8 9 10 ...
#> ..$ PathwayName : Factor w/ 14 levels "Apoptosis","C-type lectin receptor signaling pathway",..: 10 10 3 13 8 6 4 11 1 14 ...
#> ..$ SubpathwayGene: Factor w/ 15 levels "AMACR ACOX2 HSD17B4 SCP2",..: 1 13 6 7 9 3 15 12 2 4 ...
#> ..$ SES : num [1:15] -0.885 -0.829 -0.848 -0.767 -0.665 ...
#> ..$ Pvalue : num [1:15] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ FDR : num [1:15] 0 0 0 0 0 0 0 0 0 0 ...
#> $ Her2 :'data.frame': 25 obs. of 6 variables:
#> ..$ SubpathwayID : Factor w/ 25 levels "00010_6","00020_4",..: 1 2 3 4 5 6 7 8 9 10 ...
#> ..$ PathwayName : Factor w/ 24 levels "Arginine and proline metabolism",..: 12 3 8 1 16 11 24 22 5 17 ...
#> ..$ SubpathwayGene: Factor w/ 25 levels "ADCY1 ADCY2 ADCY3 ADCY5 ADCY6 ADCY7 ADCY8 ADCY9 ADCY4 PRKACA PRKACB PRKACG CREB1 ATF2 CREM ATF1 ATF3 ATF4 XBP1 "| __truncated__,..: 3 18 2 23 13 22 12 24 7 16 ...
#> ..$ SES : num [1:25] 0.704 0.681 0.725 0.817 0.711 ...
#> ..$ Pvalue : num [1:25] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ FDR : num [1:25] 0 0 0 0 0 0 0 0 0 0 ...
#> $ LumA :'data.frame': 49 obs. of 6 variables:
#> ..$ SubpathwayID : Factor w/ 49 levels "00052_2","00340_5",..: 1 2 3 5 4 6 7 8 9 10 ...
#> ..$ PathwayName : Factor w/ 36 levels "Adipocytokine signaling pathway",..: 15 18 32 11 11 30 13 21 7 14 ...
#> ..$ SubpathwayGene: Factor w/ 49 levels "ADCY1 ADCY2 ADCY3 ADCY5 ADCY6 ADCY7 ADCY8 ADCY9 ADCY4 PRKACA PRKACB PRKACG CREB1 ATF2 CREM ATF1 ATF3 ATF4 XBP1 "| __truncated__,..: 3 23 14 17 12 49 16 42 41 10 ...
#> ..$ SES : num [1:49] -0.731 0.773 -0.554 0.677 0.703 ...
#> ..$ Pvalue : num [1:49] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ FDR : num [1:49] 0 0 0 0 0 0 0 0 0 0 ...
#> $ LumB :'data.frame': 22 obs. of 6 variables:
#> ..$ SubpathwayID : Factor w/ 22 levels "00350_2","00563_1",..: 1 2 3 4 5 6 7 8 9 10 ...
#> ..$ PathwayName : Factor w/ 22 levels "Amyotrophic lateral sclerosis (ALS)",..: 22 7 18 15 6 4 12 8 2 13 ...
#> ..$ SubpathwayGene: Factor w/ 22 levels "ADAM17 HBEGF ADAM10 CXCL8 CXCR1 CXCR2 EGFR",..: 6 17 7 11 10 14 12 18 19 21 ...
#> ..$ SES : num [1:22] 0.576 0.559 0.608 0.627 -0.658 ...
#> ..$ Pvalue : num [1:22] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ FDR : num [1:22] 0 0 0 0 0 0 0 0 0 0 ...
#> $ spwmatrix: num [1:725, 1:67] 0.1892 0.0467 0.1907 0.0874 0.1993 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : chr [1:725] "00010_1" "00010_2" "00010_5" "00010_6" ...
#> .. ..$ : chr [1:67] "Her2" "LumA" "LumB" "LumA" ...
head(SubSEAresult$Basal)
#> SubpathwayID
#> 00120_9 00120_9
#> 00120_19 00120_19
#> 00232_1 00232_1
#> 00280_10 00280_10
#> 00510_8 00510_8
#> 00601_8 00601_8
#> PathwayName
#> 00120_9 Primary bile acid biosynthesis
#> 00120_19 Primary bile acid biosynthesis
#> 00232_1 Caffeine metabolism
#> 00280_10 Valine, leucine and isoleucine degradation
#> 00510_8 N-Glycan biosynthesis
#> 00601_8 Glycosphingolipid biosynthesis - lacto and neolacto series
#> SubpathwayGene
#> 00120_9 AMACR ACOX2 HSD17B4 SCP2
#> 00120_19 SLC27A5 CYP27A1 AMACR ACOX2 HSD17B4
#> 00232_1 CYP1A2 XDH NAT2 NAT1 CYP2A6
#> 00280_10 EHHADH HADH ACADSB AOX1 ALDH2 ALDH1B1 ALDH9A1 ALDH3A2 ALDH7A1 ALDH6A1 ABAT HIBADH HIBCH ECHS1 HADHA ACADM ACADS ACAD8 AGXT2
#> 00510_8 FUT8 MGAT3 MGAT2 MAN2A2 MAN2A1 MAN1A2 MAN1B1 MAN1A1 MAN1C1 MGAT4B MGAT4A MGAT1
#> 00601_8 B4GALT4 B4GALT3 B4GALT2 FUT4 FUT7 FUT3 FUT5 FUT6 ST8SIA1 ST3GAL6 FUT9 B3GNT3 B3GNT2 B3GNT4 B4GALT1 ABO FUT1 FUT2 B3GNT5 B3GALT1 B3GALT2 B3GALT5 ST3GAL3 ST3GAL4 A4GALT B3GALNT1
#> SES Pvalue FDR
#> 00120_9 -0.88462 0 0
#> 00120_19 -0.82945 0 0
#> 00232_1 -0.84790 0 0
#> 00280_10 -0.76735 0 0
#> 00510_8 -0.66495 0 0
#> 00601_8 0.74271 0 0
The function DCSA used the information-theoretic measure of statistical dependence, mutual information (MI), to estimate the dynamically changed subpathways associated with the phenotypic change.
This function requires loading dependent mpmi packages. We selected adrenocortical cancer (ACC) gene expression profile and the disease stage phenotypes from the GDC TCGA database as an example of DCSA function. The sample contains four disease stages(StageI~StageIV). The following commands will perform DCSA functions to obtain dynamic change subpathways related to changes in the breast cancer development stage.
# load depend package.
require(mpmi)
# get ACC disease stage gene expression profiling.
ACCgenematrix<-get("DCgenematrix")
# get path of the sample disease stage phenotype files.
Stagelabels<-system.file("extdata", "DClabels.cls", package = "psSubpathway")
# perform the DCSA method.
#DCSAresult<-DCSA(ACCgenematrix,input.cls=Stagelabels,nperm=50,fdr.th=0.01,parallel.sz=2)
# get the result of the SubSEA function
DCSAresult<-get("DCspwresult")
str(DCSAresult)
#> List of 2
#> $ DCSA :'data.frame': 13 obs. of 6 variables:
#> ..$ SubpahwayID : Factor w/ 13 levels "00220_2","00520_2",..: 1 2 3 4 5 6 7 8 9 10 ...
#> ..$ PahwayName : Factor w/ 12 levels "Amino sugar and nucleotide sugar metabolism",..: 2 1 4 12 6 3 3 10 11 5 ...
#> ..$ SubpathwayGene: Factor w/ 13 levels "CDK1 CCNB2 ANAPC10 CDC26 ANAPC13 ANAPC2 ANAPC4 ANAPC5 ANAPC7 ANAPC11 ANAPC1 CDC23 CDC16 CDC27 CDC20 PTTG2 PTTG1"| __truncated__,..: 7 9 4 11 3 6 5 1 10 13 ...
#> ..$ D(M) : num [1:13] 0.191 0.166 0.254 0.191 0.153 ...
#> ..$ Pvalue : num [1:13] 0 0 0 0 0 0 0 0 0 0 ...
#> ..$ FDR : num [1:13] 0 0 0 0 0 0 0 0 0 0 ...
#> $ spwmatrix: num [1:725, 1:77] -0.146 -0.507 -0.503 -0.502 -0.158 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : chr [1:725] "00010_1" "00010_2" "00010_5" "00010_6" ...
#> .. ..$ : chr [1:77] "StageI" "StageI" "StageI" "StageI" ...
head(DCSAresult$DCSA)
#> SubpahwayID PahwayName
#> 00220_2 00220_2 Arginine biosynthesis
#> 00520_2 00520_2 Amino sugar and nucleotide sugar metabolism
#> 00982_2 00982_2 Drug metabolism - cytochrome P450
#> 03013_2 03013_2 RNA transport
#> 04066_10 04066_10 HIF-1 signaling pathway
#> 04110_6 04110_6 Cell cycle
#> SubpathwayGene
#> 00220_2 GLS2 GLS GLUL CPS1 OTC ASS1 ARG1 ARG2 NAGS ACY1 NOS1 NOS2 NOS3 GLUD1 GLUD2 GOT1 GOT2 GPT GPT2
#> 00520_2 NAGK PGM3 UAP1 UAP1L1 GNE GNPDA1 GNPDA2 AMDHD2 GFPT1 GFPT2 GNPNAT1 NANS RENBP HEXA HEXB CHIT1 CHIA NPL HK1 HK2 HK3 HKDC1
#> 00982_2 FMO1 FMO2 FMO3 FMO4 FMO5 CYP3A4 CYP2D6 CYP2C9
#> 03013_2 RAN XPO1 NCBP2 NCBP1 PHAX NXF1 NXF5 NXF3 NXF2 NXF2B NUP214 NUP88 NUP42
#> 04066_10 EGLN2 EGLN3 EGLN1 PIK3CA PIK3CB PIK3CD PIK3R1 PIK3R2 PIK3R3 AKT3 AKT1 AKT2 MTOR PLCG1 PLCG2 PRKCA PRKCB PRKCG EIF4EBP1 RPS6KB1 RPS6KB2 RPS6 MAPK1 MAPK3
#> 04110_6 FZR1 ANAPC10 CDC26 ANAPC13 ANAPC2 ANAPC4 ANAPC5 ANAPC7 ANAPC11 ANAPC1 CDC23 CDC16 CDC27 CDK1 PLK1 CDC14B CDC14A YWHAQ YWHAB YWHAE YWHAG YWHAH YWHAZ SFN CHEK1 CHEK2 PKMYT1 WEE2 WEE1 GADD45G GADD45A GADD45B TP53 CDC20 CDC25B CDC25C CCNB3 CCNB1 CCNB2
#> D(M) Pvalue FDR
#> 00220_2 0.1909344 0 0
#> 00520_2 0.1663632 0 0
#> 00982_2 0.2541684 0 0
#> 03013_2 0.1913032 0 0
#> 04066_10 0.1529787 0 0
#> 04110_6 0.1782991 0 0
We provide a set of visual analysis functions including plotSubSEScurve
,plotSpwACmap
,plotSpwNetmap
and plotheatmap
.
The plotSubSEScurve function can plot the subtype set sample enrichment curve graph of the subpathway. The follows commands can plot the subtype set sample enrichment curve graph and we can see the enrichment distribution of the four subtypes of breast cancer samples in the subpathway 00120_9 activity. We can observe that the sample of subtype Basal is specifically enriched to the low expression region of subpathway 00120_9.
# plot enrichment score curve of the subpathway 00120_9 in all breast cancer subtypes
plotSubSEScurve(Subspwresult,
spwid="00120_9", # The subpathway id which subpahtway is ploted
phenotype="all") # Which phenotypic phenotype set enrichment curve is drawn
# plot enrichment score curve of the subpathway 00120_9 in the Basal breast cancer subtype.
plotSubSEScurve(Subspwresult,spwid="00120_9",phenotype="Basal")
The plotSpwACmap function can plot subpathway activity change map (includes subpathway active change box plot and subpathway active change heat map). We can observe the changes in subpathway activity values in breast cancer subtypes or stages and the distribution of each subtype samples. The commands are as follows:
# plot the subpathway 00120_9 in the SubSEA function result.
plotSpwACmap(Subspwresult,spwid="00120_9")
# plot the subpathway 00982_2 in the DCSA function result.
plotSpwACmap(DCspwresult,spwid="00982_2")
The plotheatmap function presents a heat map of the subpathway matrix according to the user’s set conditions. The following commands plot heat map of significant up-regulation or down-regulation subpathways. We can observe the obvious block area from the heat map in each subtype.
# load depend package.
require(pheatmap)
#> Loading required package: pheatmap
# plot significant up-regulation subpathway heat map specific for each breast cancer subtype.
plotheatmap(Subspwresult,
fdr.th=0.01, # Cutoff value for fdr
plotSubSEA=TRUE, # Indicate that the input data is the result of the SubSEA function
SES="positive", # Obtain a subpathway with a positive SES value
phenotype="all")
# plot significant down-regulation subpathway heat map specific for each breast cancer subtype.
plotheatmap(Subspwresult,plotSubSEA=TRUE,fdr.th=0.01,SES="negative",phenotype="all")
We can also draw a single subtype-specific significant pathway heat map. We can see that there are distinct specific regions under the Basal subtype sample. The commands and results are as follows:
# plot Basal subtype specific significant subpathway heat map.
plotheatmap(Subspwresult,plotSubSEA=TRUE,fdr.th=0.01,SES="all",phenotype="Basal")
Since the DCSA function is used to mine the subpathways that change dynamically with the phenotype, the subpathway heat map is scattered. As follows:
# plot a heat map of the subpathway that is significantly associated with breast cancer stages.
plotheatmap(DCspwresult,plotSubSEA=FALSE,fdr.th=0.01)
The plotSpwPSheatmap function presents a heat map of the T-test P-value of the activity of the subpathway between the phenotypes. The lower the number in the cells in the heat map, the greater the difference in the activity of the subpathways between the two phenotypes. The following commands plot the heat map. The activity of subpathway 00120_9 is significantly different from other subtypes in the breast cancer basal subtype.
# get the Subspwresult which is the result of SubSEA method.
Subspwresult<-get("Subspwresult")
# plot significant heat map between the activity of the subpathway in each subtype of breast cancer.
plotSpwPSheatmap(Subspwresult,spwid="00120_9")
The plotSpwPSheatmap function can also be used for the results of the DCSA method.
# get the DCspwresult which is the result of DCSA method.
DCspwresult<-get("DCspwresult")
##' # plot significant heat map between the activity of the subpathway in each stage of breast cancer.
plotSpwPSheatmap(DCspwresult,spwid="00982_2")
The function plotSpwNetmap for visualization of a subpathway network map. The commands are as follows: