Gene expression profiling of ovarian carcinomas and prognostic analysis of outcome

Overall gene-expression profiling standardization

As previously described, we obtained genome-wide and mRNA expression profiles from four
data sets (GSE12470, GSE14764, GSE49997 and GSE63885). Probe IDs on each platform were
matched to gene symbols using the GEO database, and the corresponding genes and gene IDs
were collected from each data set. The distributions of expression values are shown as
box-plots (see Fig. 1). Values provided as logarithms were linearized, and raw files
were converted into pre-processed data by RMA with default parameters [27].
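The linearization step and the numbers behind a box-plot can be illustrated with a minimal sketch. It assumes the logarithms are base 2 (the usual scale of RMA output, but not stated in the text); the function names and the sample values are hypothetical and are not taken from any of the GEO series.

```python
import statistics

def linearize(log2_values):
    """Convert log2-scale expression values back to the linear scale."""
    return [2.0 ** v for v in log2_values]

def five_number_summary(values):
    """Minimum, quartiles and maximum: the numbers a box-plot displays."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)

# Hypothetical log2 intensities for one sample (assumption, not real data).
sample_log2 = [3.0, 4.0, 5.0, 6.0, 7.0]
sample_linear = linearize(sample_log2)
summary = five_number_summary(sample_linear)
print(summary)
```

Comparing these summaries across samples is one simple way to check that the pre-processed distributions are on a comparable scale.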

Fig. 1. Box-plots of the distribution of gene expression values used for ovarian cancer
gene expression profiling (p-value of 0.05 and FC of 2.0). The abscissa shows the
candidate IDs, and the vertical axis shows the expression values of the corresponding
patients; all expression values are mean values over multiple experimental locations.
(a) GSE12470 dataset, 13356 genes in 53 samples (10 normal, 35 advanced, 8 early);
(b) GSE14764 dataset, 13046 genes in 80 samples of different types of ovarian cancer;
(c) GSE49997 dataset, 16150 genes in 204 samples of various epithelial ovarian cancers;
(d) GSE63885 dataset, 20693 genes in 101 samples of different ovarian cancers.
Panels (a), (b) and (c) consist of gene-expression data, while (d) shows mRNA expression
levels. Expression values are distributed approximately equally across the samples

Genetic screening and pathway analysis of ovarian cancer

According to the procedure adopted by Dai et al. [26], the pre-processed data of the 53
samples (Fig. 1a), which include patients at various stages and non-cancer individuals,
were analyzed by SAM in the R environment. A list of 3095 differentially expressed genes
(Accompanying Table 1) was generated at a fold change (FC) threshold of 2.0 and a SAM
p-value threshold of 5 %.
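The two cut-offs can be illustrated with a short sketch. It does not reimplement SAM: the p-values are assumed to come from an earlier test, and treating down-regulation symmetrically (FC of 0.5 or less) is an assumption, not something the text states. Gene names and values are hypothetical.

```python
from statistics import mean

FC_THRESHOLD = 2.0   # fold-change cut-off stated in the text
P_THRESHOLD = 0.05   # 5 % p-value cut-off stated in the text

def fold_change(tumor, normal):
    """Ratio of mean linear-scale expression, tumor over normal."""
    return mean(tumor) / mean(normal)

def is_differential(tumor, normal, p_value):
    """True when a gene passes both cut-offs (up- or down-regulated)."""
    fc = fold_change(tumor, normal)
    passes_fc = fc >= FC_THRESHOLD or fc <= 1.0 / FC_THRESHOLD
    return passes_fc and p_value <= P_THRESHOLD

# Hypothetical genes: (tumor values, normal values, p-value from a prior test).
genes = {
    "GENE_UP":   ([40.0, 44.0, 36.0], [10.0, 12.0, 8.0], 0.01),
    "GENE_DOWN": ([5.0, 4.0, 6.0], [20.0, 22.0, 18.0], 0.03),
    "GENE_FLAT": ([10.0, 11.0, 9.0], [10.0, 10.0, 10.0], 0.60),
    "GENE_WEAK": ([30.0, 30.0, 30.0], [10.0, 10.0, 10.0], 0.20),
}
selected = [name for name, (t, n, p) in genes.items() if is_differential(t, n, p)]
print(selected)
```

In this toy input, GENE_WEAK has a large fold change but fails the p-value cut-off, which shows why both criteria are applied together.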

Table 1. Pathway analysis comparing normal individuals with patients at different stages (top 10)

To identify the biological processes associated with these 3095 differentially expressed
genes, we used DAVID (http://david.abcc.ncifcrf.gov/). Compared with the online human
genome database, the top 10 enriched clusters, comprising 511 genes, were mainly
distributed in the cell cycle (including mitosis and deposition of nucleosomes at the
centromere), chromosome maintenance (including chromosome and telomere maintenance and
nucleosome assembly), and regulation of RNA transcription (including RNA polymerase I)
(Table 1, Accompanying Table 2).
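The idea behind pathway enrichment can be sketched as a hypergeometric tail probability. This is a simplified stand-in: DAVID reports its own EASE score, a modified Fisher exact test, so the function below is an illustration of the principle and not DAVID's calculation. The function name and the example counts are hypothetical.

```python
from math import comb

def hypergeom_tail(k, n, K, N):
    """P(X >= k): chance of seeing at least k pathway genes in a list of n
    genes drawn from a background of N genes, K of which are in the pathway."""
    total = comb(N, n)
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total

# Toy numbers (assumption): background of 10 genes, 4 in the pathway,
# gene list of 3, of which 2 fall in the pathway.
p_enrich = hypergeom_tail(k=2, n=3, K=4, N=10)
print(round(p_enrich, 4))
```

A small tail probability means the pathway is over-represented in the gene list relative to the background.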

Table 2. Top 10 genes in the weighted gene co-expression network analysis

Based on these 511 genes related to the top 10 pathways, all 80 candidates were completely
clustered by principal component analysis (PCA), which indicates the high performance
of the differential gene screening (Fig. 2).

Fig. 2. Clustering map based on the 511 screened differential genes. Red spots indicate healthy
individuals and blue spots indicate patients with ovarian cancer; spots of
different colors are effectively separated from each other

Differential gene screening and pathway analysis of ovarian cancer at different
stages

Using the WGCNA package in R, gene co-expression networks (Accompanying Table 3) were
established from the 3095 differentially expressed genes (Accompanying Table 1). Each gene
was weighted and ranked by calculating the network edges; the top 10 are shown in Table 2.
The genes RACGAP1 [34], RAD51AP1 [35], RAE1 [36] and NEK2 [37] had been reported as ovarian
cancer related genes, while the others are newly identified related genes. In addition,
these 3095 genes were divided into 17 modules (Table 3) by the blockwiseModules function
of the WGCNA package. After further screening with the GlobalAncova package in R and
comparison with the original gene-expression data set GSE12470, 4 network modules of
differentially expressed cancer genes were identified (Table 4; details are shown in
Accompanying Table 4) as the representative modules for functional analysis, because most
of the genes in these networks are expressed in the candidates who suffered from cancer.
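One plausible reading of "weighted and ranked by calculating the network edges" is a ranking by weighted connectivity, which can be sketched as follows. This is not the WGCNA implementation: it assumes an unsigned network with adjacency |cor|^beta and a soft-thresholding power beta of 6 (a commonly used value, not given in the text). Gene names and expression values are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equally long expression vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def connectivity(expr, beta=6):
    """Sum of soft-thresholded edge weights |cor|**beta for every gene."""
    names = list(expr)
    return {
        g: sum(abs(pearson(expr[g], expr[h])) ** beta for h in names if h != g)
        for g in names
    }

# Hypothetical expression profiles across four samples (assumption).
expr = {
    "G1": [1.0, 2.0, 3.0, 4.0],
    "G2": [2.0, 4.0, 6.0, 8.0],    # perfectly correlated with G1
    "G3": [4.0, 3.0, 2.0, 1.0],    # perfectly anti-correlated with G1
    "G4": [1.0, -1.0, -1.0, 1.0],  # uncorrelated with G1, G2 and G3
}
k = connectivity(expr)
ranked = sorted(k, key=k.get, reverse=True)
print(ranked)
```

Raising the correlation to a power suppresses weak edges, so genes with many strong co-expression partners rise to the top of the ranking.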

Table 3. Differentially expressed genes divided into 17 network modules

Table 4. Pathway analyses of the dominant network modules composed of differentially expressed
ovarian cancer genes

GO and KEGG analysis of these 4 modules (Table 4) shows that the blue module mainly takes
part in the regulation and control of female metabolism: androgen and estrogen metabolism
and steroid hormone biosynthesis, which are directly related to ovarian function, and
aminoacyl-tRNA biosynthesis, which plays a key role in protein synthesis [38] and has been
suggested to be associated with the progression of various ovarian cancers [39], [40].
Most interestingly, the porphyrin and chlorophyll metabolism pathway is also involved in
ovarian cancer progression. Porphyrin has been reported as a treatment element for ovarian
cancer [41], while chlorophyll is described as important in grapevine iron nutrition and
for blood [42], [43], in which most females are deficient [44]. In addition, some reports
illustrated that cancer resistance protein can act against porphyrin and chlorophyll
metabolism [45]. Thus, the blue module may denote the progress of ovarian cancer and
support our subsequent prognosis analysis. Moreover, the gene UMPS, ranked second, is
involved in the aminoacyl-tRNA biosynthesis pathway, which further suggests that UMPS
could be related to a certain ovarian cancer. The gene IARS, which belongs to the drug
metabolism pathway in the blue module, ranked eighth in Table 2, suggesting that this gene
may be important for the applicability of drug treatment in specific cases.

The greenyellow module is mainly related to the PPAR signaling pathway, which is involved
in ovarian follicle development [46] and ovarian cancer progression [47]. The grey module
is mainly devoted to melanogenesis. At present, no report shows that melanogenesis is
related to cancer progression, but melanogenesis is regarded as a potential guide for
understanding complex diseases [48]. In the current study, we selected the modules to
evaluate ovarian cancer at different stages and of various types; thus, this module
probably plays an important part in the subsequent prognosis analysis for patients in
various conditions. In addition, the other functions of this module, such as amino acid
metabolism and energy homeostasis, also help to analyze cancer progression. The tan module
is mainly devoted to carbohydrate and sucrose metabolism, which is a risk factor for many
cancers [49] and for female ovarian health [50], and is also very important for the
diagnosis of advanced ovarian cancer patients [51], [52].

All these supporting studies and relevant data illustrate that we successfully generated
network modules from the differentially expressed genes of various ovarian cancers,
and that these networks are able to predict ovarian cancer subgroups and potentially
indicate the progression of ovarian cancer in different patients.

Prognostic analysis of subgroups of ovarian cancers

The 1073 differentially expressed genes involved in the 4 dominant network modules were
generated from the GSE12470 expression dataset as previously described. Using the survival
package in R with these differential genes, the GSE14764 dataset, composed of the gene
expression profiles of various ovarian cancer patients (n = 80), was classified into 3
subgroups (Fig. 3a). Pairwise comparisons between clusters based on p-values were carried
out with Kaplan-Meier estimates of OS and their respective 95 % confidence intervals (CIs).
The Kaplan-Meier estimates for Fig. 3a are shown in Fig. 3b (P = 0.0323).
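The Kaplan-Meier estimate used for each subgroup can be sketched with the product-limit formula. This is a plain illustration, not the R survival package; confidence intervals and the comparison between clusters are omitted. The follow-up times and event flags are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit estimate: [(time, survival probability)] at each
    time with at least one death. events[i] is True for death and
    False for a censored observation."""
    order = sorted(zip(times, events))
    at_risk = len(order)
    survival = 1.0
    curve = []
    i = 0
    while i < len(order):
        t = order[i][0]
        deaths = 0
        removed = 0
        # Group all patients whose follow-up ends at the same time t.
        while i < len(order) and order[i][0] == t:
            deaths += int(order[i][1])
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve

# Hypothetical follow-up in months for one cluster (assumption, not real data).
times = [2, 3, 3, 5, 8]
events = [True, True, False, True, False]
curve = kaplan_meier(times, events)
print(curve)
```

Censored patients leave the risk set without lowering the curve, which is why the estimate only steps down at times where a death is observed.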

Fig. 3. Cluster analysis: heat map profiles of ovarian cancer patients based on the 1073
differential genes extracted from the GSE12470 data set (n = 53). (a) Heat map profile of the
extracted differential genes for various ovarian cancer patients from the GSE14764 dataset
(genome expression, n = 80), with (b) the corresponding Kaplan-Meier curves for overall
survival (OS) rate, P = 0.0323; (c) heat map profile of the extracted differential genes for
various ovarian cancer patients from the GSE49997 dataset (mRNA expression, n = 204), with
(d) the corresponding Kaplan-Meier curves, P = 1.02e-05; (e) heat map profile of the extracted
differential genes for various subtypes of epithelial ovarian cancer patients from the
GSE63885 dataset (genome expression, n = 101), with (f) the corresponding Kaplan-Meier curves
for overall survival (OS) rate, non-significant at P = 0.0781. (a) is used for the prognosis
trial; (c, e) are used to verify the applicability of the selected modules and the extracted
differentially expressed genes. All estimates of OS are given with their respective 95 %
confidence intervals (CIs)

To verify the prognostic value of these 4 modules, we used the GSE49997 and GSE63885
datasets to repeat the same experiment. GSE49997 [23] is composed of mRNA expression data
from epithelial ovarian cancer patients (n = 204), while GSE63885 consists of genome
expression data from various ovarian cancers. According to the original articles,
candidates in the GSE49997 [23] dataset are classified by clinicopathologic parameters into
serous and non-serous histological tumor subtypes, and each subtype can be divided into 2
subclasses derived from an International Federation of Gynecology and Obstetrics (FIGO)
stage-directed supervised classification approach. In one group (subclass 2), the patients'
condition deteriorated sharply from a certain time point and survival was much lower than
in the other group (subclass 1) in both serous and non-serous histological subtypes, as
revealed by univariate analysis (hazard ratios [HR] of 3.17 and 17.11, respectively;
P 0.001) and in models corrected for relevant clinicopathologic parameters (HR 2.87 and
12.42, respectively; P 0.023). Similarly, the GSE63885 [22] dataset adopts the same
classification approach (FIGO); its authors found that histological type could be a
confounding factor and that gene expression exploration of ovarian carcinomas should be
performed on histologically homogeneous groups to direct the prognostic analysis of
chemotherapy. In their experiment, clinical endpoints such as overall survival,
disease-free survival and tumor response to chemotherapy were not confirmed by validation
either on the same group or on an independent group of patients; only the CLASP1 gene,
together with BRCA1 mutation status, was related to one ovarian cancer subclass that tends
to deteriorate easily.

By comparison, the heat map profiles in the current study (Fig. 3c and Fig. 3e) show that
the samples from the GSE49997 and GSE63885 datasets were efficiently divided into 2 groups
based on the same differentially expressed genes and 4 network modules used in Fig. 3a,
which is consistent with the original dataset information. Kaplan-Meier estimates of OS
with their respective 95 % confidence intervals (CIs) were provided for these two heat
maps, with p = 1.02e-05 and p = 0.0781, respectively. According to these two verification
models and the similarity of their classification to the original data sources described
above, the selected 1073 differential genes in the 4 major network modules are able to
classify ovarian cancer into subtypes that are prognostic of different chemotherapy
outcomes, especially for epithelial ovarian cancer and ovarian germ cell cancer (especially
for stage 4 and stage 5), which are notoriously difficult to diagnose and distinguish at an
early stage because of their analogous morphological characteristics. In addition, the
modules we established may offer greater accuracy and practicability, as the GSE63885 [22]
dataset was analyzed with less stringent criteria for gene selection (FDR 10 % and
uncorrected p-value 0.001).

To further extract genes directly related to ovarian cancer survival, we used univariate
Cox regression to calculate the correlation between the genes within the modules and
survival prognosis. A total of 35 genes were associated with prognosis in the GSE14764
dataset, 47 genes in the GSE49997 dataset (Additional file 1: Table S5) and 57 genes in the
GSE63885 dataset (Additional file 1: Table S6). Examining the intersections of these three
sets of prognostic genes in a Venn diagram (Fig. 4), we found that the intersection between
any two sets is relatively small; the intersection consists of the six genes LRRC8D,
TTC304, TFCP2L1, LIBRIN, EPOR and PAR52. Notably, dysregulation of EPOR may affect the
growth of certain tumors [53], [54].
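The regions of such a Venn diagram reduce to set intersections, as the following minimal sketch shows. The dataset keys are the GEO accessions from the text, but the gene names are hypothetical placeholders and are not the genes reported above.

```python
from itertools import combinations

# Placeholder prognostic gene sets per dataset (assumption, not real results).
prognostic = {
    "GSE14764": {"GENE_A", "GENE_B", "GENE_C"},
    "GSE49997": {"GENE_B", "GENE_C", "GENE_D"},
    "GSE63885": {"GENE_C", "GENE_D", "GENE_E"},
}

# Overlap of every pair of datasets.
pairwise = {
    (a, b): prognostic[a] & prognostic[b]
    for a, b in combinations(prognostic, 2)
}

# Genes that are prognostic in all three datasets.
shared_by_all = set.intersection(*prognostic.values())

for pair, genes in pairwise.items():
    print(pair, sorted(genes))
print("all three:", sorted(shared_by_all))
```

Genes that recur across independent datasets are the natural candidates for follow-up, since a chance association is less likely to repeat.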

Fig. 4. Venn diagram of the ovarian cancer prognostic genes screened by univariate Cox
regression analysis in the three data sets