Cumulative subgroup analysis to reduce waste in clinical research for individualised medicine

Although subgroup analyses in randomised trials have well-known limitations, such as inadequate statistical power and an inflated false positive rate, identifying subgroup effects is important for clinical practice and further research. The two case studies and the computer simulations presented in this paper indicate that cumulative subgroup analysis should be used to overcome the limitations of isolated subgroup analyses in single trials, and to encourage appropriate conduct, complete reporting and timely synthesis of subgroup analyses in clinical trials.

Detecting differences in effect between subgroups usually requires larger sample sizes than evaluating the overall treatment effect, unless the effects in different subgroups are in opposite directions. A single trial is unlikely to have sufficient statistical power to detect meaningful subgroup effects, so the focus needs to shift from subgroup analyses in separate trials to subgroup analyses involving all related trials. Although IPD meta-analyses have been increasingly used for this purpose, cumulative subgroup analysis may detect subgroup effects much earlier than IPD meta-analyses. Early identification of important subgroup effects may help clinicians in patient care and inform the design and analysis of further research [48]. After the publication of the IPD meta-analysis of beta blockers for heart failure in 2014 [16], debate continues about the use of beta blockers in patients with heart failure and atrial fibrillation. For example, the results of the IPD meta-analysis were dismissed as a “retrospective subgroup analysis” in the European Society of Cardiology 2016 Guidelines for the Diagnosis and Treatment of Acute and Chronic Heart Failure [49]. Had the cumulative subgroup analysis been reported by 1999, the observed subgroup effect could have been investigated prospectively in the subsequent large-scale trials, and potential mechanisms might have been better explored.
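To make the sample-size point concrete, the sketch below uses a textbook normal approximation rather than any calculation from this paper. It assumes a continuous outcome with a known standard deviation `sigma`, 1:1 randomisation and two equal-sized subgroups; the function names are hypothetical. Under these assumptions the standard error of the difference between the two subgroup effects is twice that of the overall effect, so detecting an interaction as large as the overall effect needs roughly four times as many patients.

```python
from math import ceil
from statistics import NormalDist


def _z_sum(alpha, power):
    """z_{1-alpha/2} + z_{power} for a two-sided test."""
    nd = NormalDist()
    return nd.inv_cdf(1 - alpha / 2) + nd.inv_cdf(power)


def n_overall(delta, sigma, alpha=0.05, power=0.8):
    """Total N (1:1 allocation) to detect an overall mean difference `delta`.

    SE of the overall difference = 2 * sigma / sqrt(N).
    """
    k = _z_sum(alpha, power)
    return ceil((k * 2 * sigma / delta) ** 2)


def n_interaction(delta_int, sigma, alpha=0.05, power=0.8):
    """Total N to detect a difference `delta_int` between the effects
    in two equal-sized subgroups (a test of interaction).

    Each subgroup has N/2 patients, so each subgroup effect has
    SE = 2 * sigma / sqrt(N/2), and their difference has SE = 4 * sigma / sqrt(N).
    """
    k = _z_sum(alpha, power)
    return ceil((k * 4 * sigma / delta_int) ** 2)


if __name__ == "__main__":
    print("overall effect 0.5 SD:", n_overall(0.5, 1.0))
    print("interaction of 0.5 SD:", n_interaction(0.5, 1.0))
    print("interaction of 0.25 SD:", n_interaction(0.25, 1.0))
```

Halving the size of the interaction quadruples the requirement again, which is why a trial powered for its overall effect is rarely adequate for subgroup effects. When the subgroup effects have opposite signs, the difference between them exceeds either effect, which offsets the penalty: an interaction of 1.0 SD needs the same N as an overall effect of 0.5 SD.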

An inflated false positive (type I error) rate is often cited as a reason to restrict the conduct of multiple subgroup analyses in clinical trials. However, inflated false positive findings depend on two conditions: selective reporting of the statistically significant results among multiple subgroup analyses, and counting together significant subgroup effects that are defined by different patient characteristics. The first problem, selective reporting, is not unique to subgroup analysis [50]. It can be addressed by reporting all subgroup analyses conducted, and by clearly stating whether each reported subgroup analysis was pre-specified or post hoc. In addition, a cumulative subgroup analysis should not be stopped early when a statistically significant subgroup effect is observed, particularly at an early stage with only a few included trials. Irrespective of the current estimate of the subgroup effect, data from all subsequent trials, whatever the reasons for conducting them, should be continuously added to the cumulative subgroup analysis. The second problem may be of limited relevance in practice because it makes little sense to merge subgroup effects defined by different patient variables. For example, purely by chance, the treatment effect may be associated with age in one trial and with other baseline variables in other trials. A subgroup effect by age, or by any other variable, should be interpreted separately from other subgroup effects and on the basis of pooled data from all related trials. For the same subgroup variable, the false positive rate will not be inflated and will correspond closely to the significance level adopted in both conventional and cumulative subgroup analyses (Tables 1 and 2). Moreover, the potential harms of false positive subgroup effects in individual trials are likely to be minimal in practice when clinical guidelines are developed after a rigorous assessment of the validity of all available evidence [51].
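The difference between the two conditions can be illustrated with a short simulation. This is a sketch under simplifying assumptions of our own, not the simulations reported in Tables 1 and 2: there is no true subgroup effect, the interaction z-statistics for `k` different baseline variables are independent standard normal draws, and each is tested at a two-sided level `alpha`. The chance that at least one of the `k` variables appears significant grows as 1 - (1 - alpha)^k, whereas the false positive rate for a single, pre-specified variable stays at alpha. The function names are hypothetical.

```python
import random
from statistics import NormalDist


def analytic_fwer(k, alpha=0.05):
    """Probability that at least one of k independent null tests is significant."""
    return 1 - (1 - alpha) ** k


def simulate_null_trials(k, alpha=0.05, n_trials=20000, seed=2024):
    """Simulate trials with no true subgroup effect and k independent interaction tests.

    Returns a pair:
      - share of trials in which at least one of the k variables is significant
      - share of trials in which the first (pre-specified) variable is significant
    """
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    any_sig = 0
    first_sig = 0
    for _ in range(n_trials):
        # Under the null, each interaction z-statistic is N(0, 1)
        significant = [abs(rng.gauss(0.0, 1.0)) > crit for _ in range(k)]
        any_sig += any(significant)
        first_sig += significant[0]
    return any_sig / n_trials, first_sig / n_trials


if __name__ == "__main__":
    for k in (1, 5, 10):
        any_rate, single_rate = simulate_null_trials(k)
        print(
            f"k={k:2d}: any significant {any_rate:.3f} "
            f"(analytic {analytic_fwer(k):.3f}), "
            f"pre-specified variable {single_rate:.3f}"
        )
```

With `k = 5`, about 23% of null trials would report at least one spurious subgroup effect if any significant variable were counted as a finding, while each individual variable remains at about 5%. Real baseline variables are usually correlated, so the independence assumption somewhat overstates the inflation.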

Subgroup analyses in clinical trials may be used to test or to generate hypotheses about subgroup effects [2, 52]. Given limited statistical power and the lack of clear prior understanding of which subgroup variables are important, subgroup analyses should generally be considered hypothesis-generating when single trials are viewed in isolation. In a cumulative subgroup analysis, the subgroup analysis in the first trial or the first few trials serves to generate a hypothesis, whereas the same subgroup analysis in subsequent trials may be regarded as testing it. Because it may be difficult to decide whether a given subgroup analysis is hypothesis-testing or hypothesis-generating, a Bayesian approach may provide a more convenient theoretical framework for cumulative subgroup analysis [53]. Analogous to the Bayesian method of combining prior and new evidence, a cumulative subgroup analysis continuously combines existing information with data from each new trial.
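The Bayesian analogy can be written down for the simplest case. The sketch below assumes a normal prior and a normally distributed interaction estimate from each trial (for example, a log ratio of odds ratios with its standard error); the estimates are hypothetical numbers chosen for illustration, not data from the case studies, and `normal_update` is a hypothetical helper name. Each posterior becomes the prior for the next trial, and with a very vague starting prior the result coincides with a cumulative fixed-effect inverse-variance meta-analysis of the same estimates.

```python
from math import sqrt


def normal_update(prior_mean, prior_var, estimate, se):
    """Conjugate normal-normal update: precisions add, means are precision-weighted."""
    prior_precision = 1.0 / prior_var
    data_precision = 1.0 / se ** 2
    post_var = 1.0 / (prior_precision + data_precision)
    post_mean = post_var * (prior_precision * prior_mean + data_precision * estimate)
    return post_mean, post_var


# Hypothetical interaction estimates (log ratio of odds ratios, SE),
# one per trial, in order of completion.
trials = [(0.40, 0.35), (0.10, 0.30), (0.25, 0.20)]

# Vague prior centred on "no subgroup effect"
mean, var = 0.0, 1e6
for i, (est, se) in enumerate(trials, start=1):
    # Yesterday's posterior is today's prior
    mean, var = normal_update(mean, var, est, se)
    print(f"after trial {i}: posterior mean {mean:.3f}, SD {sqrt(var):.3f}")
```

Replacing the vague prior with an informative one, for instance a prior concentrated near zero, would make an apparent subgroup effect in the first one or two trials harder to declare, which is one way of formalising caution about early findings.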

Data on patient characteristics at baseline are routinely collected in randomised controlled trials for multiple purposes, including describing the study population, assessing the comparability of trial groups, adjusting for possible confounding factors, and conducting subgroup analyses [54]. During the past several decades, subgroup analyses in trials have not been encouraged [10]. Consequently, baseline data collected in trials have been under-used, or wasted entirely, for the purpose of subgroup analysis. According to the recommended criteria for credible subgroup analyses [9], a trial should conduct no more than five subgroup analyses. Another recommended criterion is whether subgroup effects are consistent across related studies [9], which cannot be assessed if the same subgroup analysis has not been conducted and reported in the other related studies. The argument that only a few pre-specified subgroup analyses should be conducted in a trial therefore conflicts with the need to compare and combine the results of the same subgroup analysis from all related trials [3]. The current emphasis on avoiding false positive subgroup effects has restricted the conduct and reporting of exploratory subgroup analyses in trials, wasting research data and missing opportunities to detect subgroup effects that are meaningful for clinical practice or further research [2, 48].

Inaccessibility and incomplete information are sources of avoidable waste in biomedical research [55]. Subgroup analyses in meta-analyses of published data are often very limited, or impossible, because trials report the results of subgroup analyses inadequately [2, 12]. The development of IPD meta-analyses has facilitated the identification of important subgroup effects. However, trial data on subgroup effects were wasted until IPD meta-analyses were conducted, and continue to be wasted where IPD meta-analyses remain unavailable, although the magnitude of this waste is currently unclear. More IPD meta-analyses of existing trials should therefore be conducted to identify meaningful subgroup effects. In future trials, exploratory subgroup analyses using full data on baseline patient characteristics should be encouraged.

Subgroup analyses in clinical trials should be conducted using appropriate statistical tests of interaction, and subgroup analyses should be reported completely, with sufficient information for inclusion in cumulative analyses. To conduct a cumulative subgroup analysis, related clinical trials need to adopt the same or similar definitions of the subgroups of interest. Cumulative subgroup analyses should therefore be taken into account when decisions about data collection are made at the design stage. Patient subgroups could be defined by baseline characteristics using data routinely collected in clinical trials. Ideally, increased sharing of trial data may enable a prospective IPD meta-analysis with cumulative subgroup analyses that starts when data from the first two RCTs are available and is repeated each time a new RCT is completed. A prospective IPD meta-analysis would also allow subgroups to be defined in the same way, for example using the same cut-points between subgroups.
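A cumulative subgroup analysis of this kind can be sketched in a few lines. The sketch assumes a binary outcome, the same two subgroups defined identically in every trial, and a fixed-effect inverse-variance model; within each trial the subgroup effect is measured as the difference between the subgroup log odds ratios (a log ratio of odds ratios), which is one standard test of interaction. The function names and trial counts are hypothetical and are not taken from the case studies. After each new trial, the pooled interaction estimate, its standard error and a two-sided p value are recomputed from all trials so far.

```python
from math import log, sqrt
from statistics import NormalDist


def log_odds_ratio(events_t, n_t, events_c, n_c):
    """Log odds ratio (treatment vs control) and its variance from a 2x2 table."""
    a, b = events_t, n_t - events_t
    c, d = events_c, n_c - events_c
    if 0 in (a, b, c, d):  # 0.5 continuity correction for empty cells
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    return log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def trial_interaction(trial):
    """Log ratio of odds ratios (subgroup 1 vs subgroup 2) and its variance for one trial."""
    lor1, var1 = log_odds_ratio(*trial["subgroup_1"])
    lor2, var2 = log_odds_ratio(*trial["subgroup_2"])
    return lor1 - lor2, var1 + var2


def cumulative_subgroup_analysis(trials):
    """Re-pool the interaction estimates after each trial (fixed-effect, inverse variance)."""
    sum_w = 0.0
    sum_wy = 0.0
    results = []
    for name, trial in trials:
        y, v = trial_interaction(trial)
        sum_w += 1 / v
        sum_wy += y / v
        pooled = sum_wy / sum_w
        se = sqrt(1 / sum_w)
        p = 2 * (1 - NormalDist().cdf(abs(pooled / se)))
        results.append((name, pooled, se, p))
    return results


# Hypothetical counts per subgroup:
# (events_treatment, n_treatment, events_control, n_control)
trials = [
    ("Trial A", {"subgroup_1": (30, 100, 40, 100), "subgroup_2": (20, 50, 18, 50)}),
    ("Trial B", {"subgroup_1": (45, 150, 60, 150), "subgroup_2": (25, 70, 24, 70)}),
    ("Trial C", {"subgroup_1": (55, 200, 75, 200), "subgroup_2": (40, 90, 38, 90)}),
]

if __name__ == "__main__":
    for name, pooled, se, p in cumulative_subgroup_analysis(trials):
        print(f"up to {name}: log ROR {pooled:+.3f} (SE {se:.3f}), p = {p:.3f}")
```

The first row simply reproduces Trial A's own interaction estimate; every later row pools all trials completed so far, and the analysis is updated rather than stopped as trials accrue. A random-effects model, or a different effect measure such as the risk ratio or hazard ratio, could be substituted without changing this structure, provided that every trial contributes the same interaction contrast.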

The usefulness of cumulative subgroup analysis will be limited when the number of related trials is very small. Our search of PubMed (see Additional file 2 for the search strategy) identified 60 IPD meta-analyses published in 2014, of which only three provided sufficient data for cumulative subgroup analysis. We discussed only two cases in detail to illustrate the usefulness of cumulative subgroup analyses for clinical practice and further research. We hope that our study will encourage others to conduct more cumulative subgroup analyses using data collected in existing and future IPD meta-analyses.