Patient samples

The OTSCC patient cohort from the Peter MacCallum Cancer Centre has been previously
described (Table 1) 15], 22]. Briefly, the cohort consisted of patients with invasive OTSCC, with comprehensive
clinicopathological details and pre-treatment archival specimen blocks. For this study,
109/131 samples had sufficient DNA for analysis on the HM450K array.

Level 1 raw IDAT files were downloaded from the TCGA data portal (https://tcga-data.nci.nih.gov) on 24 June, 2013, and the clinical annotation was downloaded on 22 July 2013. The
data we used was a superset of data used in the published TCGA head and neck cancer
analysis 23]. For our analyses, the TCGA HNSCC cohort consisted of 373 patient samples with HM450K
data available. Only limited clinical annotation was available for these TCGA samples,
thus only the variables of overall survival status and tumour-node-metastases (TNM)
stage were used for downstream analyses. Patients with metastatic disease or those
with an unknown M category (whereby the M category indicates the presence of metastatic
disease) were excluded from further analysis, leaving a cohort of 331 TCGA HNSCC samples,
which included 91 OTSCC. The samples used are listed according to anatomical subsite
in Additional file 1: Table S1. Additional file 2 lists the downloaded TCGA sample IDs, the samples included for analysis and those
also analysed in the TCGA head and neck cancer analysis 23].

The data was grouped for interrogation in three ways. The first group consisted of
109 OTSCC samples processed in one batch from the Peter MacCallum Cancer Centre (“OTSCC
cohort”). The second group combined the pre-processed OTSCC cohort with the 91 OTSCC
samples from the TCGA database, which we termed the “combined OTSCC cohort”. The final
analysis was conducted on the “entire cohort” of samples, comprising 109 OTSCC and
all 331 TCGA HNSCC samples (total ?=?440).

Data analyses

Bioinformatics processing of the raw IDAT files was performed utilising R statistical
software (version 3.0.1, http://cran.r-project.org/). Currently, no consensus guidelines exist on the optimal method of pre-processing
HM450K or the optimal method of feature selection. Given the variety of methods accounting
for the same technical biases without clear indication of superiority between the
techniques, for the pre-processing and subsequent downstream analyses described below,
each method was considered of similar quality. The workflow for the bioinformatics
analysis is summarised in Fig. 1, and the methods of analysis for each of the cohorts is summarised in Additional
file 1: Table S2, according to the R software library packages utilised.

Fig. 1. Workflow for bioinformatics processing of raw data

Data pre-processing, filtering and normalisation of data

Due to the different pre-processing criteria and stringency criteria utilised according
to each R package, different subgroups from each cohort were defined and were analysed
separately. Details of the different criteria utilised according to each library is
outlined in the “Methods” section and in Additional file 1: Table S2. However, regardless of what pre-processing criteria were utilised, the
size of the refined cohort examined or the combinations of cohorts analysed, results
for the downstream analyses were similar.

Downstream processing and feature selection

Unsupervised feature selection

After quality control and pre-processing of 450K data using Minfi and Methylumi, 83/109
samples from the OTSCC cohort, 174 (83 OTSCC cohort plus 91 TCGA OTSCC) from the combined
cohort and a total of 414 samples for the entire cohort were taken through to analysis.
A bimodal distribution of ? values for the OTSCC cohort and combined OTSCC cohort was observed indicating the
absence of a group of samples with promoter hypermethylation (Fig. 2). Similarly, hierarchical clustering, multidimensional scaling (MDS) plots and principal
components analysis did not clearly identify a subgroup of differentially methylated
samples according to phenotypic data (Additional file 1: Figure S1). These findings were again confirmed using another technique in RnBeads
to look at the clustering of data, mean silhouette values. Figure 3 demonstrates the poor classification of samples using this method and the numerous
small clusters identifiable throughout the dataset, with little similarities between
them. A silhouette value of “1” indicates good classification of the observation into
the cluster, and a value of “0” indicates that the observation lies independent of
the groupings, whilst a negative value indicates poor classification and likely incorrect
grouping (http://stat.ethz.ch/R-manual/R-devel/library/cluster/html/silhouette.html)24]. Given that increasing the number of clusters makes the similarities within the cluster
worse (approaches “0”), this suggests that the clustering of samples is based on very
subtle and small differences in methylation.

Fig. 2. Plot of median beta values of the combined OTSCC cohort. A plot of the median beta
values for the combined cohort of 174 OTSCC samples with outcome data, with the blue line indicating the overall median value for the group. A bimodal distribution is observed,
without evidence of a hypermethylated group of samples

Fig. 3. Plot of the average silhouette value according to the number of clusters. This was
assessed on RnBeads using an average linkage-based algorithm. The combined cohort
demonstrates no distinct clustering according to observed methylation values. A silhouette
value of “1” indicates good classification of the observation into the cluster, a
value of “0” indicates that the observation lies independent of the groupings, whilst
a negative value indicates poor classification and likely incorrect grouping

Therefore, despite increasing the OTSCC cohort size by inclusion of the 91 TCGA OTSCC
samples to create a combined cohort of 174 OTSCC samples, DNA methylation data did
not identify any unique subgroups according to phenotypic data. Investigating the
entire cohort of 414 heterogeneous HNSCC samples also failed to demonstrate clear
clustering of samples according to methylation profiles (Additional file 1: Figure S1). This was not surprising as heterogeneous cohorts of head and neck samples
are likely to demonstrate heterogeneous methylation profiles.

Feature selection methods

Utilising a variety of feature selection methods, we sought to determine if methylation
was informative for patient survival or other clinicopathological variables.

For the OTSCC cohort, the relationship between DNA methylation and multiple phenotypic
variables was investigated. This included assessment of relationship with overall
survival, smoking history, history of alcohol excess and age 45 years versus 45 years
due to the recognition of a subpopulation of younger patients that develop OTSCC in
the absence of any risk factors 25], 26]. Of particular interest was to determine if tumours with nodal involvement had a
differential methylation profile compared to those without, given that nodal status
(N category) is one of the most reliable clinical risk classifiers of worse prognosis
27]. Analyses investigating the relationship between DNA methylation and ECOG performance
status, progression-free survival, disease-specific survival, T category, pathological
differentiation status, and gender were also performed.

Based on the clinical annotation available for the TCGA cohort, the supervised analyses
for the combined OTSCC cohort and the entire cohort only investigated the association
between DNA methylation and overall survival status, and DNA methylation and TNM staging.

Linear models for microarray data analyses

A linear models for microarray data (LIMMA) framework was initially utilised to interrogate
the pre-processed OTSCC cohort (?=?83). After correction for multiple testing, significant associations between DNA
methylation and gender, age and pathological differentiation status were identified
(q values 0.05, Benjamini-Hoechberg method). These specific variables were investigated
further to see if they were informative for patient survival. Given the small numbers
of samples in the age less than 45 years category (less than 10 samples), only methylation
status according to gender and differentiation status were investigated for their
impact on patient outcome.

For the analysis according to differentiation status, there were 25 patients with
poorly differentiated tumours compared with 58 well differentiated tumours. Of these
poorly differentiated tumours, 13 patients died. For the analysis according to gender,
there were 57 male patients and 26 female patients, of which 27 male and 11 female
patients survived. The LIMMA framework was used to discern if probes that significantly
defined a variable were also associated with patient survival. However, no significant
association between a tumour’s differentiation status stratified according to differential
methylation profiles, and patient survival was found (q value 0.05). Similarly, no association was found when analysed according to patient
gender (q value 0.05).

When the combined OTSCC cohort (?=?174) was investigated with dmpFinder, after correction for multiple testing, methylation
values were found to be uninformative for survival status and TNM stage. For overall
survival, 28,078 probes had a p value of less than 0.05 but failed to reach significance (q value 0.05) after correction for multiple testing. When disease stage was used as
the classifier, 19,472 probes were identified with p values 0.05, with one probe remaining after correction for multiple testing. This
single probe within a promoter region corresponded to a SNP (cg17508434, a promoter-associated
CpG to the LMTK2 gene). Please refer to the “Methods” section for information regarding
filtering of probes. In a separate analysis where filtering of SNP-associated probes
were removed, no informative probes were identified.

Therefore, these analyses suggested that methylation values did not inform patient
outcome. The small numbers of events as described may have impacted these findings.

Machine learning approaches

Machine learning algorithms were also employed for analyses of the data. These were
wrapped into a tool called “Genome toolbox” (GTB), a custom R library (personal communication
Dr Justin Bedo, IBM) 28], 29]. Initially, the OTSCC cohort (?=?83) was assessed using centroid feature selection (CFS) with three-fold cross validation.
A performance of 1.0 represents a 100 % chance of being able to classify groups according
to the nominated variable, according to the number of methylation probes interrogated.
A prediction performance of 0.5 represents a prediction capability no better than
chance to classify the variable. Overall, prediction performance was poor and ranged
between 0.5 to 0.7 for classification of pathological differentiation status, history
of alcohol excess and smoking history according to DNA methylation. The prediction
performance utilising other phenotypic variables was worse. The top eight ranked methylation
probes identified to be informative for overall survival in the OTSCC cohort (cg141693,
cg079784, cg148610, cg146787, cg184013, cg080496, cg137586, cg0322520) failed to classify
samples according to survival status in the combined OTSCC cohort and the entire cohort.
When the alternate method of recursive feature elimination – support vector machine
(RFE-SVM) was used, again no significant association between DNA methylation and survival
status was identified.

A similar analysis was performed using machine learning methods for the larger combined
OTSCC cohort (?=?174), using overall survival status as the classifying variable. For each method,
ranges of tuning parameters (kernels and lambda values) were employed to optimise
the prediction performance. However, both CFS and RFE-SVM models demonstrated no prediction
capability, with performance ranging from 0.52 to 0.58 for both methods (whereby a
prediction performance of 0.5 indicates no prediction capability beyond chance, Fig. 4).

Fig. 4. Prediction performance plots for overall survival. a Prediction performance plot for overall survival of the combined oral tongue cohort
of the mean value derived from the machine learning method RFE-SVM using a range of
parameters (lambda range 0.01 with a poly kernel). Error bars represent the range of values obtained across the three-fold cross validation performed.
The prediction performance ranged between 0.5 and 0.58 indicating a poor performance.
b Representative prediction performance plot for overall survival of the combined oral
tongue cohort using CFS. The training set demonstrates a decrease in performance with
an increasing number of probes utilised, and the test model derived from the training
set shows poor performance. Error bars represent the range of values across the three-fold cross validation performed. For
both plots, a performance of 1.0 represents a 100 % chance of being able to classify
groups according to the nominated variable, according to the number of probes interrogated.
A prediction performance of 0.5 represents a capability of no better than chance

In addition, using these machine learning frameworks, a variety of other mathematical
models were investigated including the t test and the LIMMA method. These analyses similarly demonstrated poor prediction
performance metrics supportive of the conclusion that a prognostic methylation signature
did not exist for the cohort.