Cut-off values for the applied version of the Beck Depression Inventory in a general working population


Design

lidA is a German prospective cohort study in which individuals will be followed up
at three-year intervals. The study investigates the associations of work, health and
employment in an ageing work force. The current validation study is based on respondents
from the first wave. The focus was on the association between depressive symptoms
and functioning. The responses regarding functioning were obtained by a computer-assisted
personal interview (CAPI). Depressive symptoms were assessed by the BDI-V (see below).
The BDI-V used was a paper-and-pencil version returned to the interviewer in a sealed
envelope.

Sample

Employees were recruited as part of the first wave of the German lidA cohort
study, which targets work, age, health and work participation in employees born in
1959 and 1965. Forming part of the German baby-boom generation, these employees are
highly relevant for lidA’s primary research goal [3]. The response rate was 27.3 percent. The sample was selected in a two-stage random
process: first, 222 sample points from all over Germany were randomly chosen. Second,
6585 study participants were randomly selected from the database of the Institute
for Employment Research (IAB) of the Federal Employment Agency on the reference date
31 December 2009. These data, also referred to as Integrated Employment Biographies
(IEB), include all German employees subject to social insurance contributions. Persons
who are unemployed, self-employed or freelancers, as well as civil servants, are excluded
by definition [3].

Following the International Labour Organization (ILO), an employee is defined as a
person who works at least one hour a week [14]. Thus 6339 employees were available for this analysis.

In Germany, there are extensive legal requirements for data protection. Therefore,
an application process in accordance with German social legislation was required to
carry out the sampling procedure [15]. The approval was issued by the ethics committee of the University of Wuppertal.

Measurements

Depressive symptoms

In the original version of the BDI, depressive symptoms are assessed by 21 typical
symptoms, each with four statements ranked according to their severity level, resulting
in 84 statements. The wording of items and comprehensiveness of the questionnaire’s
structure might be well suited for a clinical setting, but the applicability of the
BDI in epidemiological studies is limited by the length of the instrument. This was
part of the motivation behind the construction of an applied version of the BDI in German
by Schmitt et al. [13]. For this reason, only 20 items from the original BDI were selected. One item about
weight loss was dropped. Instead of the four statements per symptom in the original BDI,
only one statement is used in the simplified version. This sole
statement serves as a reference point when participants are asked to rate how frequently
they experience it on a simple six-level scale. The sum scores range
from 0 to 100. Schmitt et al. [13] compared the reliability of the original BDI and their simplified version: Cronbach’s
alpha of the original version was lower (α = 0.84) than that of the simplified version
(α = 0.94). The simplified BDI and the original version correlated with r = 0.83.
This high degree of convergence was supported by the results of a confirmatory factor
analysis, which showed only a slight deviation from perfect measurement equivalence, in spite
of the higher efficiency of the applied BDI-V. Moreover, norm values are provided
by Schmitt et al. [16] based on a sample from the general population in Germany. Criterion-referenced validity
was investigated [13] by correlations between the BDI-V and similar or convergent self-rating scales of depression
and by comparing BDI-V scores between different clinical and nonclinical samples.
This kind of evidence supports the assumption that all these instruments are similar
representations of the same concept (depression).
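
The scoring rule described above can be illustrated with a minimal sketch. It assumes that each of the 20 items is coded from 0 to 5, which is inferred from the six-level scale and the stated sum-score range of 0 to 100 rather than taken from the original scoring manual. The function name `bdi_v_sum_score` is hypothetical.

```python
def bdi_v_sum_score(item_responses):
    """Sum score of the simplified BDI (BDI-V).

    Assumption: 20 items, each coded 0..5 on the six-level frequency
    scale, so that the sum score ranges from 0 to 100.
    """
    if len(item_responses) != 20:
        raise ValueError("the BDI-V sketch expects exactly 20 item responses")
    if any(r not in range(6) for r in item_responses):
        raise ValueError("each item response must be an integer from 0 to 5")
    return sum(item_responses)


# Boundary cases of the assumed coding
lowest = bdi_v_sum_score([0] * 20)
highest = bdi_v_sum_score([5] * 20)
```

How missing item responses were handled is not stated here, so the sketch simply rejects incomplete questionnaires.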

Functioning

Two items of the Work Ability Index [17] were selected in order to define dichotomous criteria for functioning at work:

The respondents are asked “How do you rate your current work ability with respect
to the physical demands of your work?” (Work ability – physical), and

“How do you rate your current work ability with respect to the mental demands of
your work?” (Work ability – mental)

The response categories for both items are “very good”, “rather good”, “moderate”,
“rather poor”, “very poor”. The cut-off is made between “moderate” and “rather poor”.

Three items of the CAPI were extracted from the modified German version of the SF-12
[18]. They are related to mental health, emotional problems and social limitations: “During
the last four weeks, how often did you feel that due to …”

“mental health or emotional problems you achieved less than you wanted to at work
or in everyday activities?” (SF-12 role emotional 1),

“mental health or emotional problems you carried out your work or everyday tasks
less thoroughly than usual?” (SF-12 role emotional 2),

“physical or mental health problems you were limited socially, that is, in contact
with friends, acquaintances, or relatives?” (SF-12 social functioning).

The response categories for these three items are “always”, “often”, “sometimes”,
“almost never”, and “never”. The cut-off was made between “sometimes” and “almost
never”.
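
The dichotomization of the five functioning indicators can be sketched as follows. The direction of the coding is an assumption: responses on the unfavourable side of each cut-off ("rather poor" and "very poor" for the Work Ability Index items; "always", "often" and "sometimes" for the SF-12 items) are treated as impaired functioning. The function and constant names are hypothetical.

```python
# Response categories in the order given in the text
WAI_LEVELS = ["very good", "rather good", "moderate", "rather poor", "very poor"]
SF12_LEVELS = ["always", "often", "sometimes", "almost never", "never"]


def wai_impaired(response):
    """True if a Work Ability Index item indicates impaired work ability.

    Cut-off between "moderate" and "rather poor" (assumed direction:
    "rather poor" and "very poor" count as impaired).
    """
    return WAI_LEVELS.index(response) >= WAI_LEVELS.index("rather poor")


def sf12_impaired(response):
    """True if an SF-12 item indicates impaired functioning.

    Cut-off between "sometimes" and "almost never" (assumed direction:
    "always", "often" and "sometimes" count as impaired).
    """
    return SF12_LEVELS.index(response) <= SF12_LEVELS.index("sometimes")
```

An unknown response string raises a `ValueError` from `list.index`, which keeps coding errors visible instead of silently assigning a category.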

Statistical analysis

Criterion-referenced validity of the BDI-V was assessed by the rank correlation between
the BDI-V score and the values of five indicators of functioning. Receiver operating characteristic
(ROC) analysis was used to define cut-off values for the BDI-V (Figs. 1 and 2). ROC curves originate from signal detection theory and were later introduced in
medicine to validate and improve diagnostic measures [19]. In the classical paradigm for evaluating test performance, the result of an index
test is compared with a reference test, typically an instrument widely accepted as
a gold standard. An example is liver biopsy as the standard for evaluating liver fibrosis.
No such gold standard is available to date for depressive symptoms. In the field
of psychometrics, however, there are other strategies to evaluate the validity
of an index test [10]. An important source for interpretations and inferences of validity is evidence based
on relations to other variables. The current analysis focusses on the relationship
of the BDI-V to other variables relevant to the participants’ daily functioning and
ability to work. ROC curve analysis is then carried out on the basis of these other
variables serving as external criteria. BDI-V values were chosen for optimally distinguishing
cases with impaired functioning in different areas of life. Participants who have
a sum score above a given cut-off value are classified as cases with impairment; those below are
non-cases. Within a range from 0 to 100 there are 100 possible cut-off values, each
corresponding to a cumulative percentage. This cumulative distribution is used for
the computation of sensitivity (SENS) and specificity (SPEC). The misclassification
of cases and non-cases is given by 1 – sensitivity (1 – SENS) and 1 – specificity
(1 – SPEC). The area under the curve (AUC) indicates the accuracy of the instrument
in detecting impairment. The curve is generated by plotting SENS against 1 – SPEC (see e.g.
[19]). There are different strategies and rules to determine the optimal cut-off and to
reduce the risk of misclassification (1 – SENS, 1 – SPEC). One common approach is
the computation of the Youden index (Y), Y = SENS + SPEC – 1, for each possible cut-off.
The optimal cut-off is then indicated by the highest Y value. This rule, however, allows
low values of SENS to be compensated by high values of SPEC and vice versa. Such
compensation is avoided by computing the maximum of both errors in classification.
The subsequent selection of a cut-off is based on a minimization of these maximum
errors (min-max principle) [20].
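
The two selection rules and the AUC can be illustrated with a short sketch. It is not the authors' implementation: the function names are hypothetical, the toy data are invented for illustration, and two points are assumptions, namely that a score strictly above the cut-off counts as a case and that ties between cut-offs are resolved in favour of the lowest one. The AUC is computed with the rank-based (Mann-Whitney) formula, which equals the area under the empirical ROC curve.

```python
def sens_spec(scores, impaired, cutoff):
    """Return (SENS, SPEC) when a score above `cutoff` is classified as a case."""
    tp = fn = tn = fp = 0
    for score, is_impaired in zip(scores, impaired):
        flagged = score > cutoff  # assumption: strictly above the cut-off
        if is_impaired and flagged:
            tp += 1
        elif is_impaired:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def choose_cutoffs(scores, impaired, cutoffs=range(0, 100)):
    """Return (Youden cut-off, min-max cut-off); ties go to the lowest cut-off."""
    rows = []
    for c in cutoffs:
        sens, spec = sens_spec(scores, impaired, c)
        youden = sens + spec - 1             # Y = SENS + SPEC - 1
        max_error = max(1 - sens, 1 - spec)  # larger of the two error rates
        rows.append((c, youden, max_error))
    youden_cut = max(rows, key=lambda r: r[1])[0]  # highest Y
    minmax_cut = min(rows, key=lambda r: r[2])[0]  # smallest maximum error
    return youden_cut, minmax_cut


def auc(scores, impaired):
    """Share of (case, non-case) pairs in which the case scores higher (ties count 0.5)."""
    pos = [s for s, y in zip(scores, impaired) if y]
    neg = [s for s, y in zip(scores, impaired) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented toy data: 10 impaired and 10 non-impaired participants
toy_scores = [90, 85, 80, 75, 70, 65, 60, 55, 50, 45,
              40, 35, 30, 25, 20, 15, 10, 8, 5, 2]
toy_impaired = ([True] * 6 + [False] * 3 + [True] + [False] * 4
                + [True] * 3 + [False] * 3)
youden_cut, minmax_cut = choose_cutoffs(toy_scores, toy_impaired)
```

On these toy data the Youden index selects a cut-off of 60 (SENS 0.6, SPEC 1.0), whereas the min-max principle selects 40 (SENS 0.7, SPEC 0.7). This shows how the Youden rule lets a high SPEC compensate for a low SENS. The sketch requires at least one case and one non-case, since otherwise SENS or SPEC is undefined.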

Fig. 1. Area under the curve for men; sensitivity and 1 – specificity for all BDI-V scores

Fig. 2. Area under the curve for women; sensitivity and 1 – specificity for all BDI-V scores