Mapping EQ-5D utilities to GBD 2010 and GBD 2013 disability weights: results of two pilot studies in Belgium

DALY-based burden of disease studies play an increasingly important role in public health research [20]. Because the required DWs for the health states under study may not be available, flexible and practical ways of eliciting DWs that are comparable with existing GBD DWs are needed. We proposed a mapping approach based on a loess regression model to predict any GBD 2010/2013 DW from EQ-5D-5L utilities. To our knowledge, the current study is the first to map EQ-5D utilities to GBD 2010/2013 DWs.
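As a rough illustration of the mapping idea, the sketch below fits a local straight line around a target utility, with tricube weights, and reads off the predicted DW. It is a minimal sketch, not the model fitted in the pilot studies: the function names, the `frac` window share and the utility/DW pairs are all hypothetical and chosen only for illustration.

```python
def tricube(d):
    """Tricube kernel: weight for a scaled distance d (0 at or beyond d = 1)."""
    return (1.0 - d ** 3) ** 3 if d < 1.0 else 0.0


def loess_predict(x_train, y_train, x0, frac=0.75):
    """Predict y at x0 with a locally weighted straight-line fit.

    frac is the share of training points used in each local window
    (an illustrative default, not a value taken from the study).
    """
    n = len(x_train)
    k = min(n, max(2, int(round(frac * n))))
    dists = sorted(abs(x - x0) for x in x_train)
    h = dists[k - 1]  # bandwidth: distance to the k-th nearest point
    if h == 0.0:
        # All k nearest points sit exactly at x0: average their y values.
        ys = [y for x, y in zip(x_train, y_train) if x == x0]
        return sum(ys) / len(ys)
    w = [tricube(abs(x - x0) / h) for x in x_train]
    sw = sum(w)
    if sw == 0.0:
        # Degenerate window: every one of the k nearest points lies on the edge.
        nearest = sorted(zip(x_train, y_train), key=lambda p: abs(p[0] - x0))[:k]
        return sum(y for _, y in nearest) / k
    mx = sum(wi * x for wi, x in zip(w, x_train)) / sw
    my = sum(wi * y for wi, y in zip(w, y_train)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, x_train))
    if sxx == 0.0:
        return my  # only one distinct x carries weight: use the weighted mean
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, x_train, y_train))
    return my + (sxy / sxx) * (x0 - mx)


# Hypothetical (utility, DW) pairs for eight health states; not study data.
utilities = [0.95, 0.90, 0.80, 0.65, 0.50, 0.35, 0.20, 0.05]
gbd_dws = [0.01, 0.03, 0.08, 0.20, 0.35, 0.50, 0.60, 0.75]

# Predicted DW for a new health state with an elicited mean utility of 0.70.
predicted = loess_predict(utilities, gbd_dws, 0.70)
```

A local linear fit is used here because it reproduces a straight-line relationship exactly and adapts to curvature elsewhere. In practice, an established loess implementation would normally be preferred over a hand-written one.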

Recently, Burstein et al. used loess regression to map SF-12 utilities to GBD 2010 DWs [21]. They showed a weaker rank order correlation (?0.7) between GBD 2010 DWs and SF-12 scores than we found between DWs and utilities (???0.8). They also found that the relationship between GBD 2010 DWs and SF-12 scores was non-monotonic. The mapping function developed by Burstein et al. performed less well than ours in the first pilot study but better than ours in the second pilot study. Burstein et al. selected 62 health states, collected data from a sample of 3791 respondents and corrected for outliers. Apart from the fact that participants were enrolled in Seattle and during two GBD workshops, no information was available on the age, educational or income level, or disease experience of their study population.
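For readers unfamiliar with rank order correlation, the sketch below computes Spearman's rho as the Pearson correlation of the ranks, using average ranks for ties. The function names and the four utility/DW pairs are hypothetical. Because utilities and DWs run in opposite directions (full health is a utility of 1 but a DW of 0), strong agreement between the two scales shows up as a strongly negative rho.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(a, b):
    """Pearson correlation of two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def spearman_rho(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(a), average_ranks(b))


# Hypothetical values: the ordering of the DWs is the exact reverse of the
# ordering of the utilities, so the two rankings agree perfectly.
example_utilities = [0.9, 0.7, 0.4, 0.1]
example_dws = [0.05, 0.20, 0.50, 0.80]
rho = spearman_rho(example_utilities, example_dws)
```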

We observed major differences in the validity of the mapping function between the two pilot studies, which could be explained by the different study populations and questionnaire forms. Indeed, a higher prediction quality was observed when we derived utilities from a written version of the EQ-5D questionnaire than from a web-based questionnaire. One explanation could be that during the first study, which used the written version of the EQ-5D questionnaire, one of the study leaders was present to answer participants' questions. This may have improved the understanding of the health state definitions. The wider standard deviations in the second pilot study support this hypothesis. Even though we used the updated version of the lay definitions developed by Salomon et al. [3], we still observed that the descriptions were not straightforward for a lay population to understand. In the second pilot study, we observed a weaker quality of the prediction for more severe disorders, for example ‘Schizophrenia, acute’, ‘Epilepsy severe’ or ‘Rectovaginal fistula’.

In addition, we observed some overlap between the health state descriptions developed by Salomon et al. [3] and the five dimensions included in the EQ-5D questionnaire, which could have influenced the health state valuations. For example, the definition of ‘Fracture of pelvis: short term’ was, “…You have severe pain, and cannot walk or do daily activities”, which provided information on the mobility, self-care, usual activities and pain/discomfort dimensions included in the EQ-5D-5L questionnaire. We observed that utilities for health states with overlap in descriptions had lower variation. For example, in the second pilot study the lowest standard deviations were observed for ‘Amputation of toe’ (SD 0.127) and ‘Asthma, controlled’ (SD 0.128), both of which included information on pain and daily activities in their descriptions.

We also observed some important differences in ranking between the GBD 2010/2013 DWs and the utilities derived in the second pilot study. Respondents in our study ranked amputation of both legs lower (more severe) and acute schizophrenia higher (less severe) than the respondents of the GBD 2010/2013 studies. In addition to the overlap between the health state descriptions described above, one explanation could be that participants in the pilot studies found it more difficult to imagine and evaluate functional limitations (D1 – D3) for psychiatric disabilities than for impaired mobility.

Several limitations may also have influenced the final results.

In the second pilot study, a sample of 27 GBD 2010 health states was used, and each health state was evaluated at least 30 times. Chuang et al. determined that each health state has to be evaluated at least 100 times to be representative of the population of responses [22]. We indeed observed better results in the first study, where each health state was evaluated at least 81 times.

The study population was not representative of the Belgian population. The first study population (n = 81) was composed of students in public health, mostly female and young adults. In the second study (n = 393), despite the snowball strategy, most of the participants were between 20 and 39 years old (72%) and were highly educated. The web-based questionnaire as well as the age of the two main coordinators of the study could explain these study population characteristics. Reassuringly, Haagsma et al. demonstrated no significant effects of educational level on DWs for injury consequences in the Dutch population [17]. However, other studies, which also used the EQ-5D questionnaire, demonstrated that participants aged 18–59 evaluated health states less severely than those aged 60 and over, and that older participants attributed less weight to morbidities and pain experience than younger participants [23, 24].

In addition, some authors reported that the judgments of people who are in a given health state and of health professionals differ significantly from those of healthy people [15]. Thirty-one percent of the respondents included in the second study had experience of disease, and most were experts in public health, which might have influenced the results. Utilities could have been under- or overestimated. However, we do not recommend restricting participants to healthy individuals, because we believe that utilities have to represent the average ‘preference’ for health of the studied population.

We chose to include negative values of individual utility in Cleemput’s model, indicating that some health states were considered worse than death by some participants. We obtained a negative value for one (25%) health state in the first pilot study and for 11 health states (41%) in the second pilot study. For a study including a larger population, it is recommended to constrain values between 0 and 1 to improve the prediction model [25], but in practice the issues raised by negative values for EQ-5D health states are complex [26]. In addition, to perform the final mapping we arbitrarily set a DW of 1 to correspond to a utility of 0, which implies that health states worse than death exist. This is one of the methodological and philosophical differences between utilities derived from the EQ-5D tool and DWs derived from pairwise comparison that could affect the predictions. This is also why assuming the DW to be equal to one minus the utility does not guarantee comparability with the GBD 2010/2013 DWs.
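The problem with the one-minus-utility shortcut can be seen in a few lines. The sketch below is only an illustration with hypothetical utilities: a utility of 0 converts to a DW of 1, but any negative utility converts to a value above 1, which falls outside the 0 to 1 range on which GBD DWs are defined.

```python
def naive_dw(utility):
    """Naive conversion: disability weight taken as one minus the utility."""
    return 1.0 - utility


# Hypothetical utilities, chosen only to illustrate the three cases.
examples = {
    "mild state": 0.85,
    "anchor (utility 0)": 0.0,
    "worse than death": -0.20,
}

converted = {name: naive_dw(u) for name, u in examples.items()}

# States whose naive DW falls outside the 0-1 range used for GBD DWs.
out_of_range = [name for name, dw in converted.items() if not 0.0 <= dw <= 1.0]
```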

In addition, both ‘short’ and ‘long’ term health states were included in the study, and the time framing was not explicitly defined. Some studies demonstrated that the duration of a health state has an impact on health state valuations and that poor states of health become more intolerable the longer they last [17, 18, 23, 27]. However, Salomon et al. showed that framing paired comparison questions in terms of temporary or chronic outcomes did not affect the valuation of the health states [3].

Finally, there are fundamental differences between the pairwise comparison (PC) and EQ-5D techniques.

First, the GBD PC was anchored to Population Health Equivalence answers, while the EQ-5D Flemish tariffs [10] were anchored to a Visual Analogue Scale. Although there are systematic differences between those methods, both are indirect, and this difference should not affect the mapping. Second, the EQ-5D questionnaire was designed to assess the Quality of Life (QoL) of patients, with a given scenario of health states, whereas DWs were designed to quantify the severity of a single health state. In other words, utilities are patient specific, whereas DWs are health state specific. In this study, we deviated from this original definition by defining health state specific utilities. Third, the two methods do not evaluate health states on the same dimensions of health [28]. With the EQ-5D instrument, participants have to evaluate health states on five dimensions of health, each with five levels of severity. With pairwise comparison, two health states are presented to healthy people, who have to decide which one they regard as healthier based on their own judgment and experience. These fundamental differences in methodology can also explain the differences we observed between GBD 2010/2013 DWs and utilities.