Can screening instruments accurately determine poor outcome risk in adults with recent onset low back pain? A systematic review and meta-analysis

Based on high quality prognostic studies, this systematic review provides evidence that LBP PSIs perform poorly at assigning higher risk scores to individuals who develop chronic pain than to those who do not. Clinicians can expect that a PSI administered within the first 3 months of an episode of LBP will correctly classify a patient as high or low risk of developing chronic pain between 60% and 70% of the time. PSIs perform somewhat better at discriminating between patients who will and will not have persisting disability (70–80% probability of correct classification) and appear most successful (≥ 80% probability) at discriminating between patients who will or will not return to work successfully.
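The "probability of correct classification" reported above can be read as the area under the ROC curve (AUC): the chance that a randomly chosen person who goes on to have a poor outcome scored higher at baseline than a randomly chosen person who does not. The following is a minimal sketch of that interpretation, not the analysis code used in this review. The function name and the scores are hypothetical and invented purely for illustration.

```python
from itertools import product


def auc_concordance(scores_poor, scores_good):
    """Return the share of (poor, good) pairs in which the person with the
    poor outcome had the higher baseline score; ties count as half."""
    if not scores_poor or not scores_good:
        raise ValueError("both outcome groups must contain at least one score")
    concordant = 0.0
    for poor, good in product(scores_poor, scores_good):
        if poor > good:
            concordant += 1.0
        elif poor == good:
            concordant += 0.5
    return concordant / (len(scores_poor) * len(scores_good))


# Hypothetical baseline PSI scores (illustrative only, not study data).
poor_outcome_scores = [7, 5, 6, 3]
good_outcome_scores = [2, 4, 5, 1]

auc = auc_concordance(poor_outcome_scores, good_outcome_scores)
print(f"AUC = {auc:.2f}")
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two groups, which is why values of 0.6 to 0.7 for pain outcomes are interpreted here as poor.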

This review also provides information about the prognostic performance of specific instruments. The OMPSQ and VDPQ appear to perform well at predicting return to work outcomes and the SBT and the OMPSQ have modest predictive value for disability outcomes, but the included instruments demonstrate little value for predicting likely pain outcomes. Problems associated with using a screening instrument for a purpose other than intended (i.e. based on interest in a specifically defined outcome, at a specific time point) have been introduced in this paper. The instruments included in this study were designed to predict outcomes at time points varying between 3 and 6 months. Two were designed to predict work absenteeism (VDPQ, ASQ), one to predict status on a chronic pain scale (CPRS), one to predict LBP recovery (HCPR), and one to predict functional limitation (SBT). Only two instruments (BDRQ, OMPSQ) were developed to predict more than one clinical outcome. This may have played a role in the poor performance of several of the instruments when evaluated according to the uniform methods we employed.

While our classification of the SBT as a PSI may be arguable, we considered that its clinical use as a prognostic instrument warranted its inclusion in this review. The NICE guidelines [15] recommend that clinicians use tools such as the SBT to identify patients at risk of poor outcome and tailor their management accordingly. Our findings suggest, however, that there is a need for caution if the SBT is administered only for the purpose of predicting the risk of poor outcome. As a ‘stratified care tool’ with matched treatment pathways, the merits of the SBT have been reported elsewhere [2, 53].

While it is ideal that stratified care tools such as the SBT have high predictive validity, this may not be realistic if the approach is to only include modifiable items during instrument development. Additionally, screening instruments designed for clinical use must be brief and simple to score. A trade-off of these factors may be reduced discriminative performance. Notably, the discriminative performance of the SBT is better in a UK General Practice setting than in Physiotherapy or Chiropractic settings – a finding consistent with the understanding that the usefulness of a screening instrument is highly setting-specific [44, 54] and optimal in the cohort for which it was developed [55]. In contrast, however, the ‘excellent’ performance of the OMPSQ for discriminating workers at risk of prolonged absenteeism regardless of country and across varied clinical settings suggests the wider utility of this PSI.

This study was prospectively registered with full adherence to the published protocol. We used the QUIPS methodological appraisal tool [28], a valid and reliable tool for evaluating prognostic studies. The general quality of included studies was assessed to be high with the exception of two studies that had high loss to follow-up [44, 51]. To our knowledge, this is the first quantitative synthesis and analysis of the discriminative performance of PSIs. All previous systematic reviews of PSIs have been unable to conduct meta-analyses of predictive accuracy because of clinical heterogeneity [9, 17, 56, 57]. It is also the first review to include studies testing the SBT. Additional data obtained from study authors facilitated data pooling from similar adult populations, with consistent follow-up time points and identical classifications of poor outcome. Pooling data from instruments that were designed with different purposes in mind may, however, limit the strength of the conclusions that can be drawn from this study.

ROC analyses are recommended for discriminative accuracy studies [58], but come with some limitations. A ROC analysis requires dichotomisation of outcomes, which means that the definition of ‘poor outcome’ can affect findings. In the absence of a general consensus on the definition of ‘poor outcome’, we followed previous studies and recommendations [24, 27, 59]. The selected cut-off score of ≤ 3/10 on a pain NRS was based on the understanding that many people with pain scores of ≤ 3 consider themselves to be ‘recovered’ [1]. Boonstra et al. [60] report that people with pain NRS scores of ≤ 3 describe themselves as experiencing only ‘mild’ symptoms. We classified participants who were ‘not recovered’ at follow-up (or those experiencing more than mild symptoms) as having a ‘poor outcome’. Since the outcome classification can influence discriminative performance, it would have been interesting to evaluate alternative cut-off points for poor outcome for each of the outcomes considered; this could be considered in further research. The definitions we applied were used by several included studies [25, 39, 42, 61]. In addition, AUC values (derived from the ROC analysis) are a function of sensitivity and specificity – both of which are influenced by cohort characteristics (e.g. symptom severity and psychological profile). Variations are therefore expected for the same instrument among different populations.
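The dependence of discriminative performance on the definition of ‘poor outcome’ can be illustrated with a small sketch. It assumes that a poor outcome means a follow-up pain NRS score above the chosen cut-off, and it recomputes the AUC for the same baseline scores under three cut-offs. All names and data are hypothetical and chosen only to show the mechanism; they are not drawn from the included studies.

```python
def auc(scores, is_poor):
    """Rank-based AUC: share of (poor, good) pairs in which the poor-outcome
    participant had the higher baseline score; ties count as half."""
    poor = [s for s, flag in zip(scores, is_poor) if flag]
    good = [s for s, flag in zip(scores, is_poor) if not flag]
    if not poor or not good:
        raise ValueError("each outcome group needs at least one participant")
    wins = sum(1.0 if p > g else 0.5 if p == g else 0.0
               for p in poor for g in good)
    return wins / (len(poor) * len(good))


def dichotomise(followup_nrs, cutoff):
    """Assumed rule: a follow-up NRS above the cut-off is a poor outcome."""
    return [nrs > cutoff for nrs in followup_nrs]


# Hypothetical cohort: baseline PSI score and follow-up pain NRS (0-10).
baseline_psi = [2, 3, 5, 6, 4, 8, 1, 7]
followup_nrs = [0, 4, 2, 5, 3, 7, 1, 2]

# Same instrument, same participants, three definitions of poor outcome.
auc_by_cutoff = {
    cutoff: auc(baseline_psi, dichotomise(followup_nrs, cutoff))
    for cutoff in (2, 3, 4)
}
for cutoff, value in auc_by_cutoff.items():
    print(f"poor outcome = NRS > {cutoff}: AUC = {value:.2f}")
```

In this toy cohort the AUC shifts noticeably as the cut-off moves, even though the instrument and the participants are unchanged, which is the reason alternative cut-off points merit evaluation in further research.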

Recommendations for the management of LBP in primary care frequently include using available screening instruments to obtain information about ‘risk’ of a poor outcome. This review highlights that clinicians may need to be cautious about placing too much weight on PSIs during their clinical assessment, under the misimpression that they are able to accurately determine chronic pain risk. Using PSIs to allocate care carries the risk that patients misclassified by PSIs as low-risk are undertreated and patients misclassified as high-risk are overtreated. Estimates of the risk of poor disability outcomes and prolonged absenteeism are likely to be more accurate – indicating that it is necessary to consider the clinical outcomes of interest when seeking prognostic information.
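The misclassification risk described above can be made concrete by counting, at a given high-risk threshold, how many patients would be labelled low-risk despite a poor outcome (potentially undertreated) and how many would be labelled high-risk despite a good outcome (potentially overtreated). The sketch below is illustrative only. The threshold, the scores and the outcomes are hypothetical assumptions and do not correspond to any instrument's published scoring rules.

```python
def misclassification_summary(scores, poor_outcome, high_risk_threshold):
    """Cross-tabulate risk label (score >= threshold means high risk)
    against observed outcome and derive sensitivity and specificity."""
    tp = fp = tn = fn = 0
    for score, poor in zip(scores, poor_outcome):
        high_risk = score >= high_risk_threshold
        if high_risk and poor:
            tp += 1
        elif high_risk and not poor:
            fp += 1  # labelled high-risk, good outcome: possible overtreatment
        elif not high_risk and poor:
            fn += 1  # labelled low-risk, poor outcome: possible undertreatment
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("each outcome group needs at least one participant")
    return {
        "possibly_overtreated": fp,
        "possibly_undertreated": fn,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


# Hypothetical cohort (illustrative only, not study data).
scores = [2, 3, 5, 6, 4, 8, 1, 7]
poor_outcome = [False, True, False, True, False, True, False, False]

summary = misclassification_summary(scores, poor_outcome, high_risk_threshold=5)
print(summary)
```

Moving the threshold trades one type of error for the other, so the acceptable balance depends on the relative costs of undertreatment and overtreatment in the setting of interest.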

It is important to note, however, that this study investigated the predictive performance of PSIs and does not indicate whether the implementation of prognostic screening improves outcomes for adults with recent onset LBP. Alternative research approaches, namely randomised ‘impact’ trials [1], are required to address this question. Furthermore, it is relevant to consider whether the use of PSIs offers more accurate estimation of a patient’s course of LBP than clinician judgement. Previous studies comparing the discriminative performance of screening instruments (including the SBT and the OMPSQ) with primary care clinicians’ estimation of risk of poor outcome [38, 52] have failed to show superior capabilities of the questionnaires.

As highlighted in the PROGRESS recommendations [21], the validation of predictive models requires a succession of steps from development through to external validation and impact analysis – a process which has been only partially fulfilled by the PSIs in this review. Further research according to PROGRESS recommendations will allow improved confidence in the selection and application of available instruments. Less well understood factors (e.g. structural pathology, sleep or social factors) should be further investigated and integrated into prognostic models to improve predictive accuracy beyond what is currently achievable. In addition, there remains a need to undertake further prospective clinical trials investigating the effectiveness of screening to direct stratified care approaches for patients with LBP. The performance of a stratified care instrument is best evaluated by an effect size derived from a randomised controlled trial.