Laquinimod efficacy in relapsing-remitting multiple sclerosis: how to understand why and if studies disagree

The results of the ALLEGRO and BRAVO trials have been described elsewhere, as have
the findings of the prespecified and post hoc sensitivity analyses [14, 15]. The following summaries are provided as context for the current propensity score
analysis.

Main findings from ALLEGRO and BRAVO

ALLEGRO was a double-blind, international study in 1106 patients with RRMS who were
randomly assigned in a 1:1 ratio to receive either oral laquinimod 0.6 mg once daily
or oral placebo for 24 months [14]. Treatment with laquinimod vs placebo was associated with a reduction in the mean ± standard
error (SE) ARR (0.30 ± 0.02 for laquinimod vs 0.39 ± 0.03 for placebo; P = 0.002). Laquinimod was also associated with significant reductions in the risk
of 3-month confirmed disability progression, the mean cumulative number of gadolinium
(Gd)-enhancing lesions at 12 and 24 months, and the cumulative number of new or enlarging
lesions on T2-weighted images [14].
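As a quick arithmetic check, the crude relative reduction in ARR implied by the two reported ALLEGRO means can be computed directly. This is a minimal sketch: it uses only the rounded group means quoted in the text, so it is not the model-based rate ratio reported by the trial, and the function name is hypothetical.

```python
def crude_arr_reduction(arr_treatment: float, arr_placebo: float) -> float:
    """Crude relative ARR reduction, 1 - ARR_treatment / ARR_placebo.

    Uses only the reported group means; no covariate adjustment is applied,
    so this is not the trial's model-based estimate.
    """
    if arr_placebo <= 0:
        raise ValueError("placebo ARR must be positive")
    return 1.0 - arr_treatment / arr_placebo


# Rounded ALLEGRO means from the text: 0.30 (laquinimod) vs 0.39 (placebo)
allegro = crude_arr_reduction(0.30, 0.39)
print(f"ALLEGRO crude ARR reduction: {allegro:.1%}")
```

Because the inputs are rounded to two decimals, the result should be read as approximate.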

BRAVO was a placebo-controlled, international study in 1331 patients with RRMS who
were randomly assigned with equal probability to receive oral laquinimod 0.6 mg once
daily, matching oral placebo, or interferon beta-1a (IFNβ-1a) (30 μg intramuscularly
once weekly) for 24 months [15]. Patients who received IFNβ-1a were excluded from this analysis, which therefore compares
only two groups. Treatment with laquinimod vs placebo was associated with a nonsignificant
reduction in ARR (0.28 ± 0.03 for laquinimod vs 0.34 ± 0.03 for placebo; risk ratio
[RR] 0.82; 95 % confidence interval [CI] 0.66–1.02; P = 0.075) [15]. Percent brain volume change from baseline to month 24 was significantly reduced
with laquinimod vs placebo [15].
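The reported BRAVO rate ratio, its 95 % CI, and its P value can be checked for internal consistency by back-calculating an approximate P value from the CI. The sketch below assumes that the CI was computed on the log scale as a symmetric Wald interval; this is an assumption, not something stated in the text, and the helper name is hypothetical.

```python
import math


def p_from_ratio_ci(ratio, lower, upper, z_crit=1.959963984540054):
    """Approximate two-sided P value for a ratio estimate from its 95 % CI.

    Assumes log(ratio) is normally distributed and the CI is symmetric on
    the log scale, so SE = (log(upper) - log(lower)) / (2 * z_crit).
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(ratio) / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))


# BRAVO ARR: RR 0.82, 95 % CI 0.66 to 1.02
p = p_from_ratio_ci(0.82, 0.66, 1.02)
print(f"approximate P = {p:.3f}")
```

Because the published ratio and limits are rounded, the back-calculated value lands close to, but not exactly on, the reported P = 0.075.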

Findings from prespecified and post hoc analyses

The BRAVO prespecified sensitivity analysis revealed that the baseline mean volume
of T2 lesions was greater for laquinimod (9.6 cm³) than for placebo (7.9 cm³; P = 0.009). Further, despite randomization, more patients in the laquinimod group (40 %) than in the placebo group (33 %; P = 0.055) had Gd-enhancing
lesions at baseline [15]. Previous literature has shown that the number of new, active T2 lesions can serve
as a predictor of relapse rate, both in individual-patient analyses and in the ratio between experimental and control arms across studies [16, 17]. Similarly, the presence of Gd-enhancing lesions and the T2 lesion volume
at baseline were found to be strong predictors of the relapse rate during the BRAVO
study (linear β estimates of 0.45 with P < 0.0001 for the categorical Gd-enhancing T1 lesion variable and 0.0112 with P = 0.0126 for the continuous T2 volume variable); therefore, they were added as covariates
to the statistical model for several post hoc analyses [15].

In one post hoc analysis of the BRAVO study that included the two baseline MRI parameters as covariates,
the ARR for laquinimod vs placebo was reduced by 21 % (P = 0.0264), and the risk of disability worsening confirmed at 3 months
was reduced by 33.5 % (P = 0.044) [15]. In another post hoc analysis of the BRAVO study, the observed relapse rate in the placebo group at 24 months
was lower (0.34 relapses/year) than the rate assumed in the power calculation for the study design (0.6 relapses/year). Had the study been designed
with this knowledge, it would have had only 48 % statistical power to detect the observed effect of laquinimod vs placebo on ARR [15].
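Why a lower placebo relapse rate erodes power can be illustrated with a normal-approximation power formula for a rate ratio. This is a minimal sketch under stated assumptions: relapse counts are modeled as negative binomial with a hypothetical overdispersion parameter `k`, the test is a two-sided Wald test on the log rate ratio, and the sample size per arm (444, roughly one third of 1331) is an illustrative assumption. It does not reproduce the trial's own power calculation.

```python
import math
from statistics import NormalDist


def nb_power(rate_placebo, rate_ratio, n_per_arm, years=2.0, k=0.0, alpha=0.05):
    """Approximate power of a two-sided Wald test on log(rate ratio).

    Counts per patient are assumed negative binomial with
    Var = mu + k * mu**2 (k = 0 is the Poisson case), where mu is the
    expected number of relapses per patient over `years`.
    The negligible probability of rejecting in the wrong direction is ignored.
    """
    mu0 = rate_placebo * years
    mu1 = rate_placebo * rate_ratio * years
    # Delta method: Var(log mean count) ~ (1/mu + k) / n in each arm
    var_log_rr = (1 / mu0 + k) / n_per_arm + (1 / mu1 + k) / n_per_arm
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    shift = abs(math.log(rate_ratio)) / math.sqrt(var_log_rr)
    return NormalDist().cdf(shift - z_crit)


# Hypothetical inputs: same rate ratio, expected vs observed placebo rate
for placebo_rate in (0.6, 0.34):
    print(placebo_rate, round(nb_power(placebo_rate, 0.82, 444, k=0.5), 2))
```

Holding the rate ratio fixed, fewer expected relapses per patient inflate the variance of the log rate ratio, so power falls as the placebo rate drops.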

Propensity score model

The results from BRAVO suggested that, although randomization assigned treatments
in an unbiased manner, imbalances still occurred, and that exploring these imbalances might improve
understanding of the results; propensity scores offer one way to do so. The propensity score was defined as the probability of an individual patient being assigned to the laquinimod arm
rather than the placebo arm, given a known set of covariates. If the covariates are balanced between arms,
the propensity score for each patient is expected to lie close to 0.5 (given 2 equally allocated treatment groups). The propensity to be allocated
into each group was summarized into 1 score, and that score was used as a covariate
with 1 degree of freedom (when entered as a continuous covariate) in the primary analysis
model. This allows adjustment for many baseline characteristics at once, whereas a conventional analysis of covariance
would need to include too many covariates simultaneously.
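The idea that randomized assignment yields propensity scores clustered around 0.5 can be sketched with a plain logistic regression on simulated data. Everything here is hypothetical: the two covariates and the 1:1 coin-flip assignment are simulated, the model contains main effects only, and it is fitted by simple batch gradient ascent rather than by whatever software the analysis used.

```python
import math
import random


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def fit_propensity(X, treated, lr=1.0, iters=300):
    """Fit P(treated = 1 | x) by logistic regression (batch gradient ascent).

    X is a list of covariate rows; an intercept column is prepended here.
    Returns the fitted propensity score for every row.
    """
    rows = [[1.0] + list(x) for x in X]
    beta = [0.0] * len(rows[0])
    n = len(rows)
    for _ in range(iters):
        grad = [0.0] * len(beta)
        for r, y in zip(rows, treated):
            p = sigmoid(sum(b * v for b, v in zip(beta, r)))
            for j, v in enumerate(r):
                grad[j] += (y - p) * v  # score of the log-likelihood
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return [sigmoid(sum(b * v for b, v in zip(beta, r))) for r in rows]


# Simulated trial: covariates independent of a 1:1 randomized assignment
random.seed(1)
n = 400
X = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n)]
treated = [1 if random.random() < 0.5 else 0 for _ in range(n)]
scores = fit_propensity(X, treated)
print(f"min {min(scores):.2f}, max {max(scores):.2f}, mean {sum(scores) / n:.3f}")
```

With an intercept in the model, the fitted scores average exactly to the observed proportion of treated patients; systematic departures from 0.5 in one arm would signal covariate imbalance.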

The goal of using a propensity score was to obtain an estimate of the probability
of being assigned to one or the other treatment arm based on characteristics within
the trial, when the theoretical probability was known to be 0.50 [3]. A major concern in the use of propensity score analyses is the existence of unmeasured covariates
that are critical to the assignment of treatments. This is not the concern here because we
accept that randomization has balanced the unmeasured covariates, and we are only adjusting
for known differences as explanations for differences in results. However, the potential
for unmeasured confounding cannot be fully ruled out, and it represents
a potential limitation of the study. Independent variables included pretreatment covariates
that may have been associated with treatment imbalance, as well as the reported number
of relapses. Explanatory variables included age, sex, country, weight at baseline,
time from first symptom, time from diagnosis, tobacco use, indicator for the number
of Gd-enhancing lesions at baseline, log of the total number of exacerbations in the
last year, log of the total number of exacerbations in the last 2 years, baseline
Expanded Disability Status Scale (EDSS) score, baseline Multiple Sclerosis Functional
Composite score, T2 lesion volume at baseline, T1 hypointense lesion volume at baseline,
and normalized brain volume at baseline. All second-degree (pairwise) interactions among the variables
listed were also included in the model, except interactions with country,
because of the small number of patients from some countries.
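The size of the resulting propensity model can be made concrete by enumerating its terms. This sketch assumes that "second-degree interactions" means products of two distinct variables (no squared terms) and treats country as a single main-effect term; the short variable names are hypothetical labels for the covariates listed above.

```python
from itertools import combinations


def design_terms(variables, no_interactions=("country",)):
    """Main effects plus all pairwise interactions ("a:b"),
    skipping any pair that involves a variable in no_interactions."""
    mains = list(variables)
    pairs = [
        f"{a}:{b}"
        for a, b in combinations(variables, 2)
        if a not in no_interactions and b not in no_interactions
    ]
    return mains + pairs


# Hypothetical names for the 15 listed baseline covariates
variables = [
    "age", "sex", "country", "weight", "time_first_symptom",
    "time_diagnosis", "tobacco", "gd_lesion_indicator",
    "log_relapses_1y", "log_relapses_2y", "edss", "msfc",
    "t2_volume", "t1_hypointense_volume", "brain_volume",
]
terms = design_terms(variables)
print(len(terms), "terms")
```

Under these assumptions there are 15 main effects and 14 × 13 / 2 = 91 pairwise interactions; in a fitted model, country would further expand into several indicator columns.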

Baseline covariates not included because of missing values were as follows: EDSS score
on date of onset of last exacerbation prior to randomization, EDSS score on date of
diagnosis of MS, time from date of onset of last exacerbation, time from stabilization
of last exacerbation, and previous exposure to glatiramer acetate; race was also omitted
because nearly all of the patients were white.

The analysis included two main stages: (1) calculation of a propensity score for each
patient, given a broad set of baseline covariates that also included second-degree
interactions, and (2) incorporation of the propensity score as an additional covariate
into the predefined primary analysis model to test the treatment effect of laquinimod
(0.6 mg/d) vs placebo on ARR. For comparison, the second stage used two adjustment
approaches: one included the continuous propensity score as a covariate, and the other
subclassified the range of continuous propensity scores into quintiles and included
the quintile as a categorical variable (with 5 levels and 4 degrees of freedom).
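The difference between the two adjustment approaches is visible in the covariate columns each adds to the primary model. This is a minimal sketch: the helper names are hypothetical, and using quintile 1 as the reference level is an assumption (any single reference level gives the same 4 degrees of freedom).

```python
def continuous_ps_column(scores):
    """Approach 1: propensity score as one continuous covariate (1 df)."""
    return [[s] for s in scores]


def quintile_dummies(quintiles):
    """Approach 2: propensity score quintile (1..5) as a categorical covariate,
    coded as 4 indicator columns with quintile 1 as reference (4 df)."""
    for q in quintiles:
        if q not in (1, 2, 3, 4, 5):
            raise ValueError(f"quintile must be 1..5, got {q}")
    return [[1 if q == level else 0 for level in range(2, 6)] for q in quintiles]


print(continuous_ps_column([0.42, 0.55]))
print(quintile_dummies([1, 3, 5]))
```

The continuous version assumes a linear relationship between the score and the outcome on the model scale, while the quintile version lets each stratum shift the outcome freely at the cost of 3 extra degrees of freedom.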

Statistical analyses

The logistic regression model estimated the probability for each patient to be assigned
to the laquinimod arm; as expected from the baseline imbalances, propensity
scores for patients in both BRAVO and ALLEGRO who actually received laquinimod were
lower than those for patients who actually received placebo. To simplify the presentation
of results, quintiles were used as categorical variables in the current analysis [18]. Propensity score quintiles were calculated from the pooled
propensity scores of the laquinimod and placebo groups.
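Pooling matters because the quintile cut points are then common to both arms, so an imbalance shows up as unequal arm counts within a quintile. The sketch below uses Python's `statistics.quantiles` (default "exclusive" method) to obtain the 4 cut points; the interpolation method used in the analysis is not stated, so that choice and the example scores are assumptions.

```python
from bisect import bisect_right
from statistics import quantiles


def pooled_quintiles(scores_laquinimod, scores_placebo):
    """Assign quintiles 1..5 using cut points from the pooled scores of both arms."""
    pooled = list(scores_laquinimod) + list(scores_placebo)
    cuts = quantiles(pooled, n=5)  # 4 cut points from the pooled distribution

    def assign(score):
        # Number of cut points at or below the score, shifted to 1..5
        return bisect_right(cuts, score) + 1

    return ([assign(s) for s in scores_laquinimod],
            [assign(s) for s in scores_placebo])


# Hypothetical, perfectly interleaved scores (not trial data)
laq = [i / 100 for i in range(0, 100, 2)]
plc = [i / 100 for i in range(1, 100, 2)]
q_laq, q_plc = pooled_quintiles(laq, plc)
print([(q_laq + q_plc).count(k) for k in range(1, 6)])
```

Each pooled quintile holds one fifth of all patients; in the trials, the split of each quintile between laquinimod and placebo is what reveals residual imbalance.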