Quality of outcome reporting in phase II studies in pulmonary tuberculosis

Methods

Randomised controlled trials, or quasi-randomised trials, were included in our systematic
review of the available literature. Eligible studies had to include patients with
smear- and culture-positive pulmonary tuberculosis who were being treated for the
first time, or who had known isoniazid mono-resistant organisms on susceptibility testing.
Only trials including regimens containing any combination of historic (rifampicin
(R), isoniazid (H), pyrazinamide (Z), ethambutol (E), thiacetazone (T), para-aminosalicylic
acid (P), and streptomycin (S)), or novel drugs used or proposed for use in first-line
treatment regimens (rifabutin (Rb), rifapentine (Rp), levofloxacin (L), ofloxacin
(O), gatifloxacin (G), moxifloxacin (M), bedaquiline (J), and PA-824 (Pa)) were considered.

A systematic search of databases (PubMed, MEDLINE, EMBASE and LILACS) was conducted
on 1 May 2015 to retrieve relevant peer-reviewed articles. The employed search strategy
drew upon common phrases and terms used in the literature. Keywords (appropriately
truncated to allow a wide search) were combined with medical subject headings (MeSH)
to comprehensively search four databases. The PubMed inclusive search strategy was
as follows, with relevant modifications made as necessary for the other databases:

1. Search (tuberculosis) AND clinical trials

2. Search ((((((((((((((rifampicin) OR isoniazid) OR pyrazinamide) OR ethambutol)
OR thiacetazone) OR pyrazinamide) OR streptomycin) OR rifabutin) OR rifapentine) OR
levofloxacin) OR ofloxacin) OR gatifloxacin) OR moxifloxacin) OR bedaquiline) OR PA-824

3. Search (#1) AND #2
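
The three search steps above amount to a disjunction of drug names combined with the tuberculosis clinical trials search. As a minimal sketch only (the helper names `or_chain` and `build_query` are hypothetical and do not belong to any PubMed tooling), the query string could be assembled programmatically, which may make it easier to adapt the term list for other databases:

```python
# Drug terms as listed in step 2 of the search strategy above.
DRUG_TERMS = [
    "rifampicin", "isoniazid", "pyrazinamide", "ethambutol", "thiacetazone",
    "pyrazinamide", "streptomycin", "rifabutin", "rifapentine", "levofloxacin",
    "ofloxacin", "gatifloxacin", "moxifloxacin", "bedaquiline", "PA-824",
]


def or_chain(terms):
    """Join terms with OR; a repeated term cannot change an OR query, so
    duplicates are dropped while the original order is kept."""
    unique = list(dict.fromkeys(terms))
    return "(" + " OR ".join(unique) + ")"


def build_query(base, terms):
    """Combine the base search (step 1) with the drug terms (step 2),
    mirroring step 3: (#1) AND #2."""
    return f"({base}) AND {or_chain(terms)}"


query = build_query("(tuberculosis) AND clinical trials", DRUG_TERMS)
print(query)
```

The flat OR list is logically equivalent to the nested parentheses shown in step 2, since OR is associative.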

No language restrictions were imposed. The search strategy was supplemented by hand
searching reference lists of included studies and relevant reviews. One author (GD)
reviewed the title and available abstract for all identified citations to determine
relevance. Another author (LB) repeated this process on 1 May 2015 to check for additional
studies. Following the initial review, both authors (LB and GD) independently reviewed
full-text publications to make a final selection of included Phase II studies evaluating
either monotherapy or combination regimens. A structured form was used to record relevant
information and ensure uniformity of evaluation for each study. Extracted data included
study characteristics such as country of study, sample size, treatments (including
dosages and regimens), and all reported outcomes. Risk of bias was considered via
sequence generation, allocation concealment, blinding, reasons for exclusions, and
selective reporting.

Results

The flow of studies through the review is shown in Fig. 1. The main reasons for exclusion were failure to meet the inclusion criteria, and
study design other than randomised controlled trial. In total, 55 relevant studies
were identified and included.

Fig. 1. Flow of studies in the review

A bar chart summarising the year of publication of the included studies can be seen
in Fig. 2. The CONSORT guidelines for transparent reporting of clinical trials were first
published in 1996 [4]. The majority (79 %) of studies included in our review were published after 1996,
and consequently should conform to the CONSORT guidelines and present thorough information
on items such as trial design, intervention, participants, and outcomes, which must
be completely defined [4].

Fig. 2. Year of publication of included studies

Despite the CONSORT guidelines, a core outcome set for TB, that is, a minimum set
of clearly defined outcomes to be reported in each future study [5], has not yet been developed. Consequently, there is wide variation in the definition
and type of outcomes reported. Most studies included in our review reported more than
one outcome, and the distinction between primary and secondary outcomes was often
unclear. Therefore all outcomes included in each study have been considered and are
summarised in Table 1.

Table 1. Reported outcomes

The methods and purposes of Phase IIA and IIB studies in tuberculosis differ, and
we have considered these study types separately in the quantitative review that follows.
Of the 55 included studies, 32 were Phase IIA studies, 20 were Phase IIB studies,
and three were designed to consider alternative outcomes such as contamination [6] and Gaffky code, a numerical rating for the classification of tuberculosis according
to the number of tubercle bacilli in the sputum [7]. One study considered both Phase IIA and IIB outcomes [8].

In addition to the differing types of phase II studies, different culture media were
used across trials. For example, 34 studies (26 Phase IIA studies, five Phase
IIB studies, one study considering EBA and culture together [8], and two that considered
alternative outcomes [6], [9]) reported results obtained using solid media such as
Löwenstein-Jensen and Middlebrook 7H10. In 12 cases (eight Phase IIB studies and four Phase IIA studies), the laboratory
methodology described the use of both solid and liquid media, such as the BACTEC or
MGIT system, and a single set of results combined over the multiple media was presented.
Three studies (all Phase IIB studies [10]–[12]) described culture on both liquid and solid media but presented disaggregated results
per medium. Several studies did not describe the medium used: one Phase IIA study
[13], four Phase IIB studies [14]–[17], and one study looking at an alternative outcome [7]. Three of these studies were published pre-CONSORT [7], [14], [15]. One study was a conference abstract [13] where space constraints meant methodology could not be reported, and two were published
in Russian language journals, which appeared not to adopt the CONSORT reporting guidelines
[16], [17].

A range of analytic approaches to these varied data was used, with multiple
and wide-ranging methods being reported in most publications. Some authors opted to
analyse their data using regression models such as logistic [12], [18] or linear [10], [19]–[21] regression. More commonly, authors used t-tests [6], [22], [23], ANOVA methodology [24]–[26], and chi-squared tests [11], [18], [27] for normally distributed data, or Kruskal-Wallis [28], [29], Mann-Whitney U [30], [31], or equivalent tests when the data were not normally distributed. Infrequently, time-to-event
analysis methodology was used [12], [28], [32], as well as correlation methodology, including the Wilcoxon signed-rank test [8], [30], [31]. None of the studies adjusted for multiple comparisons.
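
For context, one of the simplest adjustments for multiple comparisons is the Bonferroni correction, which multiplies each p-value by the number of tests performed and caps the result at 1. The sketch below is illustrative only: the function name `bonferroni` is an assumption, and the p-values are made up rather than taken from any of the reviewed studies.

```python
def bonferroni(p_values):
    """Return Bonferroni-adjusted p-values: min(1, p * m) for m tests."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical p-values from three outcome comparisons within one trial.
raw = [0.01, 0.04, 0.30]
adjusted = bonferroni(raw)

for p, q in zip(raw, adjusted):
    print(f"raw p = {p:.2f}, adjusted p = {q:.2f}")
```

With these invented values, the second comparison would fall below a 0.05 threshold before adjustment but not after it, which is the kind of shift that unadjusted reporting of many outcomes can conceal.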

Regarding risk of bias, 24 (44 %) studies did not report the method of sequence generation.
All but four studies (83 %) used random allocation (with stratification in some cases),
rather than consecutive allocation. Only four studies (7 %) mentioned allocation concealment,
mainly via opaque envelopes. Six studies (11 %) were of a double-blind design, and
another six were single-blind studies. Twenty-eight studies (51 %) provided reasons
for exclusions, or numbers lost to follow-up. Seventeen (31 %) studies were published
pre-CONSORT when selective reporting was not considered as a possible source of bias.
In all studies published post-CONSORT, the risk of bias is unclear, as there is insufficient
information to determine whether the published reports include all expected outcomes,
including those that were pre-specified.

Phase IIA studies

More than half (56 %) of the included studies were designed to assess EBA, although
authors did not always precisely define this term and explicit definitions differed
between studies. In most cases, EBA was defined as the fall, or mean rate of change,
in log10 colony-forming units (CFU) per ml sputum over various time periods or between two
time-points. Some authors did not define their outcome as EBA but used methods that
conformed to this approach, for example, decrease in sputum bacillary load of Mycobacterium TB (M. TB) from pre-treatment to day 15 of study drug treatment [33], or mean rate of decline of CFU [20], or decrease in viable count [34]. In one case, EBA was reported over 8 weeks [8]. Figs. 3 and 4 summarise the reported time points in Phase IIA studies, showing that for the majority
of studies included in this review, endpoints were focused only on the first week
of treatment.
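
Under the most common definition described above, EBA over an interval is the fall in log10 CFU per ml sputum divided by the number of days elapsed. A minimal sketch follows; the function name `eba` and the example counts are illustrative assumptions, not data from any included study.

```python
import math


def eba(cfu_start, cfu_end, day_start, day_end):
    """Mean daily fall in log10 CFU/ml sputum between two time points."""
    if day_end <= day_start:
        raise ValueError("day_end must be later than day_start")
    if cfu_start <= 0 or cfu_end <= 0:
        raise ValueError("CFU counts must be positive to take log10")
    fall = math.log10(cfu_start) - math.log10(cfu_end)
    return fall / (day_end - day_start)


# Hypothetical patient: 10^6 CFU/ml at baseline, 10^5 on day 2, 10^4.5 on day 7.
eba_0_2 = eba(1e6, 1e5, 0, 2)        # about 0.5 log10 CFU/ml/day
eba_2_7 = eba(1e5, 10 ** 4.5, 2, 7)  # about 0.1 log10 CFU/ml/day
eba_0_7 = eba(1e6, 10 ** 4.5, 0, 7)

print(round(eba_0_2, 3), round(eba_2_7, 3), round(eba_0_7, 3))
```

The same invented patient yields quite different values depending on the interval chosen, which is why results reported over differing intervals are difficult to compare directly.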

Fig. 3. Reported time points in Phase IIA studies – discrete quantitative bacteriological
time points

Fig. 4. Reported time points in Phase IIA studies – interval quantitative bacteriological
time points

EBA studies showed a range of durations from 2 to 90 days. Seven studies lasted 2 days,
three lasted 7 days, and seven lasted 14 days. Other frequent durations were 5 days
(six studies), 28 days (three studies), and 15 days (two studies). Infrequently chosen
durations were 8, 9, 30, 56 and 90 days, each of which was used for individual studies
only. EBA results were most frequently reported in a table with differing time intervals
from zero to 14 days. Measures included EBA 2 to 14 days [35], 2 to 5 days [24], and 7 to 14 days [36], along with the more common 0 to 2 days, 2 to 7 days and 0 to 7 days [37]. In other studies, EBA was reported in a figure and was therefore presented at a
number of time points, for example, once daily from days 0 to 5 [38], or on days 0, 1, 2, 3, 4, 6, 8, 10, 12 and 14 [35].

Some authors referred to EBA specifically as the change in log10 CFU/ml sputum during
the first 2 days of treatment and referred to ‘extended EBA’
as the decline in bacilli during the last 5 days of study drug administration (for
example, days 2 to 7) [26], [29]. Several studies, instead of reporting fall or change in log10
CFU/ml, defined as EBA above, reported mean concentration of viable bacilli at a fixed
time point, or mean viable count (log10 CFU/ml) [39]. CFU count was always presented in table form, and there was better agreement among
authors about the definition of this outcome. However, in one case [40] the CFU counts were standardised and in another the rate of fall of CFU counts was
reported (referred to as the ‘kill index’) [41]. The time interval over which CFU counts were presented ranged from 2 days up to 56 days
in one case [6].

Phase IIB studies

Within studies designed to consider 2-month outcomes, the most frequently reported
outcome (30 %) related to culture positivity. This was measured in many ways, including
time of last positive sputum culture or smear [9], and time to stable culture conversion. The latter was defined as the number of days from
study treatment initiation to the time of sputum collection yielding the first negative
culture that was followed by at least one subsequent negative culture and no subsequent
positive culture [42]. In one study, positivity was expressed as the percentage of cultures positive at
fixed time points such as at 28 days [39].
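
The definition of time to stable culture conversion given above can be read as a simple rule over a patient's ordered culture results. The following is a sketch of one way to apply it, assuming each result is stored as a hypothetical `(day, is_positive)` pair; the function name and the example results are invented for illustration.

```python
def time_to_stable_conversion(samples):
    """samples: list of (day, is_positive) tuples, one per sputum culture.

    Returns the day of the first negative culture that is followed by at
    least one later negative culture and by no later positive culture,
    or None if no culture meets that rule.
    """
    ordered = sorted(samples)
    for i, (day, positive) in enumerate(ordered):
        if positive:
            continue
        later = [pos for _, pos in ordered[i + 1:]]
        # 'later' must be non-empty and contain only negative results.
        if later and not any(later):
            return day
    return None


# Hypothetical patient with a transient negative culture on day 14.
results = [(0, True), (14, False), (28, True), (42, False), (56, False)]
print(time_to_stable_conversion(results))
```

In this invented example the day 14 negative is not counted because a positive culture follows it, so conversion is dated to day 42.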

Regarding culture negativity, whilst most (24 %) authors opted to present the proportion
of negative cultures at a time point (usually 2 months), some used time to [28], or speed of [12], culture conversion. Negativity was more simply expressed as either a binary outcome
at a fixed time point [27], or the proportion of patients whose culture had converted at a fixed time point
[10], [42]. The definition of time to culture conversion varied between studies. One study defined
time to culture conversion as the time from the start of treatment to the first of
two consecutive culture negative sputum samples on non-consecutive days that were
not followed by a positive sputum sample [10]. Another defined the outcome as the time point after which all sputum cultures were
negative [12]. All relevant studies presented results at 2 months, but also additional time points
where culture status was considered (weekly, or biweekly from zero to 8 weeks [18], [27]). Figure 5 summarises the reported time points in Phase IIB studies. Infrequently, studies reported
culture conversion over a range of days, for example, 0 to 2 days.
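
To illustrate how the two definitions of time to culture conversion described above can disagree, the sketch below applies both to the same invented culture series. The function names are hypothetical, results are assumed to be `(day, is_positive)` pairs, and the second function reflects only one possible reading of "the time point after which all sputum cultures were negative".

```python
def conversion_two_consecutive(samples):
    """First of two consecutive negative cultures, taken on non-consecutive
    days, with no positive culture afterwards. Returns None if never met."""
    s = sorted(samples)
    for i in range(len(s) - 1):
        (d1, p1), (d2, p2) = s[i], s[i + 1]
        no_later_positive = not any(p for _, p in s[i + 2:])
        if not p1 and not p2 and d2 - d1 > 1 and no_later_positive:
            return d1
    return None


def conversion_all_negative_after(samples):
    """Day of the earliest culture from which every culture is negative
    (assumed interpretation). Returns None if the last culture is positive."""
    day = None
    for d, positive in reversed(sorted(samples)):
        if positive:
            break
        day = d
    return day


# Hypothetical patient whose only negative culture is the final one.
late_converter = [(0, True), (28, True), (56, False)]
print(conversion_two_consecutive(late_converter))    # no conversion recorded
print(conversion_all_negative_after(late_converter)) # conversion at day 56
```

For this invented patient, one definition records no conversion while the other records conversion at day 56, so the same data would contribute differently to a 2-month conversion outcome.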

Fig. 5. Reported time points in Phase IIB studies – discrete time points

Notably, few studies clearly reported numbers of culture samples missing due to non-attendance,
sample contamination, or lack of sputum production at each time point.

Trial characteristics

In addition to the diversity of the outcomes reported, and the variation in their
definitions, it should be noted that there was also variation across trial characteristics.
As mentioned above, different time points were used for the reporting of outcomes.
This adds to the complication when attempting to compare the results from multiple
studies to synthesise evidence to support treatment regimens. Finally, in most cases
an estimate of variability, such as a standard deviation or 95 % confidence interval,
was provided together with the point estimate. Some studies, however, only presented
a point estimate (for example, [43]). These issues all make combining evidence from multiple studies via meta-analysis
challenging.
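
To illustrate why a missing estimate of variability matters, consider a fixed-effect inverse-variance meta-analysis, in which each study is weighted by the reciprocal of its squared standard error. A study that reports only a point estimate cannot be weighted and has to be left out. The sketch below uses invented numbers and a hypothetical `pool_fixed_effect` helper; it is not the method of any included study.

```python
import math


def pool_fixed_effect(estimates):
    """estimates: list of (point_estimate, standard_error or None).

    Returns (pooled estimate, pooled standard error, number of studies
    dropped because no usable standard error was reported).
    """
    usable = [(est, se) for est, se in estimates if se is not None and se > 0]
    if not usable:
        raise ValueError("no study reports a usable standard error")
    weights = [1.0 / se ** 2 for _, se in usable]
    total = sum(weights)
    pooled = sum(w * est for w, (est, _) in zip(weights, usable)) / total
    return pooled, math.sqrt(1.0 / total), len(estimates) - len(usable)


# Hypothetical EBA estimates (log10 CFU/ml/day); the third has no SE reported.
studies = [(0.2, 0.1), (0.4, 0.1), (0.9, None)]
pooled, pooled_se, dropped = pool_fixed_effect(studies)
print(round(pooled, 3), round(pooled_se, 3), dropped)
```

In this invented example the study with the largest estimate is the one excluded, so the pooled value reflects only the two studies that reported variability.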