#### Dataset

We obtained isoform-level gene expression estimates and molecular subtype information

for 342 and 155 GBM samples profiled by Affymetrix exon-arrays and RNA-seq, respectively.

The four molecular subgroups are neural (N), proneural (PN), mesenchymal (M) and classical

(CL). Gene expression profiles for 76 samples (18 N, 22 PN, 16 M, and 20 CL) were available from both the RNA-seq and exon-array platforms. These common

samples were used to assess classification performance for platform transition. We

followed the data pre-processing procedure and obtained the patients' GBM subtype information (class labels) from our recent study [14]. Briefly, we downloaded the unprocessed Affymetrix exon-array dataset of 426 GBM samples and 10 normal brain samples from the TCGA data portal (https://tcga-data.nci.nih.gov/tcga/) and obtained the isoform expression of 114,930 transcript variants (equivalent to 35,612 genes) using the Multi-Mapping Bayesian Gene eXpression program [33]. The estimated expression values were then normalized across samples using the locally weighted scatterplot smoothing algorithm [34], a non-parametric regression method. To select the 2,000 most variable transcripts, we applied Pearson's correlation coefficient with a cutoff of 0.8, followed by the coefficient of variation (CV). See [14] for more details.

#### Data type and transformation

We processed the gene expression data to estimate the FC values and then applied three unsupervised discretization techniques (Equal-W binning, Equal-F binning, and *k*-means clustering) to the continuous FC data.

FC is a measure of the quantitative change in gene expression, defined as FC = log_{2}(*T*/*N*), where *T* is the estimated expression value of a tumor sample and *N* is the median expression of the normal brain samples.
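As a minimal illustration of this transformation (using synthetic matrices, not the study's data), the FC values can be computed per transcript as follows:

```python
import numpy as np

# Hypothetical expression matrices (transcripts x samples), not the study's data.
rng = np.random.default_rng(0)
tumor = rng.lognormal(mean=3.0, sigma=1.0, size=(1000, 76))   # tumor samples
normal = rng.lognormal(mean=3.0, sigma=1.0, size=(1000, 10))  # normal brain samples

# FC = log2(T / N), with N the per-transcript median over the normal samples.
normal_median = np.median(normal, axis=1, keepdims=True)
fc = np.log2(tumor / normal_median)
print(fc.shape)  # (1000, 76)
```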

To determine the number of bins for discretization, Dougherty *et al*. [28] suggested a heuristic that sets the maximum number of bins to *k* = max(1, 2 log(*l*)), where *l* is the number of distinct values of the attribute. Boulle [29] proposed an algorithm to find an optimal bin number for Equal-F and Equal-W and demonstrated that the optimal bin number performs similarly to the bin number *k* = 10, which is considered a reasonable default in most cases. While the former approach resulted in a maximum bin number of *k* = 11, we evaluated the methods extensively by exploring various bin numbers *k* = 2*i* for *i* = {1, 2, …, 10}.

The Equal-W binning algorithm finds the maximum and minimum values and then divides the range into user-defined intervals of equal width, Equal-W = (max(GE(*i*)) - min(GE(*i*)))/number of bins, where GE(*i*) is the isoform-level transcript expression of sample *i*. Each continuous value is then assigned to its corresponding bin.

The Equal-F binning algorithm sorts all continuous values in ascending order and then divides the range into user-defined intervals so that every interval contains the same number of sorted values, Equal-F = sort(GE(*i*))/number of bins.

The *k*-means clustering algorithm groups the continuous values by distance-based similarity. Given a user-defined number of clusters, the algorithm iteratively updates the centroids until no data point is reassigned to a different centroid.
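A brief sketch of the three discretization schemes, using scikit-learn's `KBinsDiscretizer` as a stand-in for the implementations used in the study (the 'uniform', 'quantile', and 'kmeans' strategies correspond to Equal-W, Equal-F, and *k*-means binning, respectively; the data and the bin number here are illustrative):

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

# Synthetic FC matrix standing in for the real data: transcripts x samples.
rng = np.random.default_rng(0)
fc = rng.normal(size=(2000, 76))

def discretize(fc, n_bins, strategy):
    """Discretize each transcript (row) of the FC matrix into n_bins bins.

    strategy: 'uniform'  -> equal-width (Equal-W) binning
              'quantile' -> equal-frequency (Equal-F) binning
              'kmeans'   -> one-dimensional k-means binning
    """
    # KBinsDiscretizer bins columns (features), so transpose to bin per transcript.
    disc = KBinsDiscretizer(n_bins=n_bins, encode="ordinal", strategy=strategy)
    return disc.fit_transform(fc.T).T.astype(int)

# The study explored several bin numbers; k = 10 is used here only as an example.
equal_w = discretize(fc, n_bins=10, strategy="uniform")
equal_f = discretize(fc, n_bins=10, strategy="quantile")
kmeans_bins = discretize(fc, n_bins=10, strategy="kmeans")
```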

#### Feature selection methods

To capture the most significant features, we first applied Pearson's correlation to the

normalized expression data with a cutoff value of 0.8. Second, we used the CV to assess

the degree of variability for each transcript.

CV is defined as CV = *σ*/*μ*, where *σ* and *μ* are the standard deviation and mean, respectively. Based on the CV scores, we selected the top 2,000 transcripts out of ~115,000. To refine the selected features further,

we employed two advanced feature selection algorithms based on SVM and RF that iteratively

evaluate each feature's contribution to the classification performance. We adopted the programs available in the R packages 'mSVM-RFE' and 'varSelRF'.
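For illustration, the initial variance-based filter described above might look like the following sketch (synthetic data; the Pearson correlation step is omitted for brevity, and this is not the pipeline of [14]):

```python
import numpy as np

# Synthetic stand-in for the normalized expression matrix (transcripts x samples).
rng = np.random.default_rng(1)
expr = rng.lognormal(mean=2.0, sigma=0.8, size=(5000, 76))

# Coefficient of variation per transcript: CV = sigma / mu.
cv = expr.std(axis=1) / expr.mean(axis=1)

# Keep the most variable transcripts (the study kept the top 2,000 of ~115,000).
top_k = 2000
keep = np.argsort(cv)[::-1][:top_k]
expr_filtered = expr[keep]
print(expr_filtered.shape)  # (2000, 76)
```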

SVM-RFE is a feature search algorithm that measures each feature's importance to the data by iteratively eliminating one feature at a time [13]. Derived from the weight vector **w** of the binary classification problem, the ranking criterion is the squared coefficient *w*_{i}^{2}; features with the highest weights are the most informative. Thus, the procedure

of SVM-RFE is composed of training the SVM classifier, computing the ranking criteria

for all features, and eliminating the feature with the lowest ranking criterion.

This process is repeated until a small subset of features is achieved.
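The study used the 'mSVM-RFE' R package; a conceptually similar sketch with scikit-learn's `RFE` wrapper around a linear SVM (synthetic data, simplified settings) is shown below:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.feature_selection import RFE

# Synthetic stand-in: 76 samples x 2,000 candidate features, four subtypes.
rng = np.random.default_rng(2)
X = rng.normal(size=(76, 2000))
y = rng.integers(0, 4, size=76)

# Linear SVM; RFE removes the lowest-weight fraction of features each round.
svm = SVC(kernel="linear", C=1.0)
rfe = RFE(estimator=svm, n_features_to_select=50, step=0.1)
rfe.fit(X, y)

selected = np.where(rfe.support_)[0]
print(selected.size)  # 50 features retained
```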

The RF_based_FS method uses both a backward elimination strategy and the importance spectrum to search for a set of important variables [31]. Concisely, multiple random forests are iteratively constructed to search for the set of variables in each forest that yields the smallest out-of-bag (OOB) error rate.

The main advantage of this method is that it returns a very small set of genes while

retaining high accuracy.
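The study used the 'varSelRF' R package; the following is a much-simplified sketch of the underlying idea (backward elimination guided by the OOB error), not the varSelRF algorithm itself:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in: 76 samples x 500 features, four subtypes.
rng = np.random.default_rng(3)
X = rng.normal(size=(76, 500))
y = rng.integers(0, 4, size=76)

features = np.arange(X.shape[1])
best_feats, best_oob = features, 0.0

# Backward elimination: repeatedly drop the least important 20% of features,
# keeping the feature set with the best out-of-bag (OOB) score seen so far.
while features.size > 2:
    rf = RandomForestClassifier(n_estimators=500, oob_score=True, random_state=0)
    rf.fit(X[:, features], y)
    if rf.oob_score_ >= best_oob:
        best_oob, best_feats = rf.oob_score_, features
    order = np.argsort(rf.feature_importances_)   # ascending importance
    n_drop = max(1, int(0.2 * features.size))
    features = features[order[n_drop:]]

print(best_feats.size, best_oob)
```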

#### Classification methods

We considered four classification methods (SVM, RF, NB, and PAM) to compare the

performance on platform transition using the 76 GBM samples.

SVM is primarily a two-class classifier that constructs a hyperplane to separate the

data with maximum margin [35, 36]. For multiclass classification problems, two techniques are widely used: one is to

build one-versus-all classifiers, and choose the class that yields maximum margin

for test examples; the other is to build a set of one-versus-one classifiers. For

*C* > 2 classes, *C*(*C*-1)/2 binary classifiers are trained and the appropriate class is determined by majority voting. In this study, we used the latter approach with a linear kernel, as the number of features is larger than the number of samples.
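For illustration, scikit-learn's `SVC` trains the *C*(*C*-1)/2 one-versus-one classifiers internally and resolves the predicted class by voting; a minimal sketch with placeholder data:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# Placeholder data: 76 samples x 200 selected features, four subtypes (0..3).
rng = np.random.default_rng(4)
X = rng.normal(size=(76, 200))
y = rng.integers(0, 4, size=76)

# Linear kernel (features >> samples); SVC trains C(C-1)/2 = 6 one-vs-one
# classifiers internally and predicts the class by majority voting.
clf = SVC(kernel="linear", C=1.0, decision_function_shape="ovo")
scores = cross_val_score(clf, X, y, cv=5)
print(scores.mean())
```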

RF is an ensemble learning method that builds decision trees with binary splits [37]. Each tree is grown randomly in two steps. First, a subset of predictors is chosen

at random from all the predictors. Second, a bootstrap sample of the data is randomly

drawn with replacement from the original sample. For each RF tree, the observations left out of the bootstrap sample are used to estimate the classification accuracy.
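A minimal RF sketch with placeholder data, using the out-of-bag samples as an internal accuracy estimate:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Placeholder data: 76 samples x 200 features, four subtypes.
rng = np.random.default_rng(5)
X = rng.normal(size=(76, 200))
y = rng.integers(0, 4, size=76)

# Each tree uses a bootstrap sample and a random subset of predictors at each
# split; the left-out (OOB) observations give an internal accuracy estimate.
rf = RandomForestClassifier(n_estimators=1000, max_features="sqrt",
                            oob_score=True, random_state=0)
rf.fit(X, y)
print(rf.oob_score_)
```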

NB is a simple probabilistic classification method grounded in Bayes' theorem for calculating conditional probabilities, with an independence assumption [38]. For a given instance (example), the NB classifier calculates the probability of belonging

to a certain class. The basic underlying assumption is that the features (*x*_{1}, …, *x*_{n}) of an instance *X* are conditionally independent given the class *C*. The class *C* that maximizes the likelihood is given by P(*X*|*C*) = P(*x*_{1}, …, *x*_{n}|*C*). The conditional independence assumption allows this conditional probability to be written as a product of simpler probabilities: P(*X*|*C*) = ∏_{i} P(*x*_{i}|*C*).
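The likelihood model used for NB is not specified here; a Gaussian naive Bayes sketch (an assumed choice) illustrates the conditional-independence product:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Placeholder data: 76 samples x 200 features, four subtypes.
rng = np.random.default_rng(6)
X = rng.normal(size=(76, 200))
y = rng.integers(0, 4, size=76)

# Gaussian NB models P(X|C) as the product of per-feature Gaussian likelihoods
# P(x_i|C), following the conditional-independence assumption.
nb = GaussianNB()
nb.fit(X, y)
print(nb.predict(X[:5]))
```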

PAM is a sample classification method that uses the nearest shrunken centroid approach

for transcript-variant gene expression data [26]. Briefly, the method computes a standardized centroid for each class. It then shrinks each class centroid toward the overall centroid of all classes using a user-defined threshold, removing genes whose contributions shrink to zero. A new sample is assigned to the nearest shrunken centroid based on its gene expression profile.
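As a rough stand-in for PAM, scikit-learn's `NearestCentroid` with a shrinkage threshold implements a related nearest shrunken centroid classifier (an approximation, not the PAM implementation used in the study); a minimal sketch:

```python
import numpy as np
from sklearn.neighbors import NearestCentroid

# Placeholder data: 76 samples x 200 features, four subtypes.
rng = np.random.default_rng(7)
X = rng.normal(size=(76, 200))
y = rng.integers(0, 4, size=76)

# Class centroids are shrunk toward the overall centroid; features whose
# centroid differences shrink to zero stop contributing to the classification.
pam_like = NearestCentroid(shrink_threshold=0.5)
pam_like.fit(X, y)
print(pam_like.predict(X[:5]))
```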

#### Accuracy

We estimated the overall classification accuracy based on the number of correct predictions

divided by the total number of test samples, defined as ACC = (number of correct

predictions)/(total number of test samples). In addition, sensitivity (Sn) and specificity

(Sp) for each sub-group (one GBM sub-group vs the rest of the GBM groups together)

are calculated as Sn_{i} = *tp*_{i}/(*tp*_{i} + *fn*_{i}) and Sp_{i} = *tn*_{i}/(*tn*_{i} + *fp*_{i}), where *tp*_{i}, *tn*_{i}, *fp*_{i}, and *fn*_{i} are the true positives, true negatives, false positives, and false negatives for class *C*_{i}, respectively.
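These quantities can be read off a confusion matrix; a small sketch with hypothetical predicted labels:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical true and predicted subtype labels.
y_true = np.array([0, 0, 1, 1, 2, 2, 3, 3, 1, 0])
y_pred = np.array([0, 1, 1, 1, 2, 0, 3, 3, 1, 0])

cm = confusion_matrix(y_true, y_pred)   # rows: true class, columns: predicted
acc = np.trace(cm) / cm.sum()           # ACC = correct predictions / test samples

# One sub-group vs. the rest: sensitivity and specificity per class.
for i in range(cm.shape[0]):
    tp = cm[i, i]
    fn = cm[i, :].sum() - tp
    fp = cm[:, i].sum() - tp
    tn = cm.sum() - tp - fn - fp
    print(f"class {i}: Sn={tp / (tp + fn):.2f}, Sp={tn / (tn + fp):.2f}")

print(f"ACC={acc:.2f}")
```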