Fidelity to and comparative results across behavioral interventions evaluated through the RE-AIM framework: a systematic review


The original search yielded 241 potentially eligible articles. After title and abstract
review, 107 articles were fully reviewed for potential inclusion and 37 were excluded.
See Fig. 1 for more details. Thirty-one additional articles were referenced in eligible articles
and coded as companion documents. These eligible papers (N = 101) represented 82 unique intervention studies for inclusion in this review. Notably,
some of the original full articles assessed were based on the same intervention (i.e.,
companions to each other). For the remainder of the manuscript, a compilation of studies
is referred to as a “trial.” See Additional file 3 for the PRISMA checklist.

Fig. 1. Results of literature search. PRISMA representation of search strategy and results

Overall summary

Inter-rater reliability was 76 % across the first four articles coded by all reviewers.
The reviewers met to clarify operational definitions of codes. Across the remaining
97 articles (and 163 collected variables for each article), inter-rater reliability
was over 80 %. All discrepancies were resolved.
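Inter-rater reliability here is simple percent agreement: the share of coded variables on which reviewers assigned the same value. A minimal sketch of that calculation (the reviewer codes below are hypothetical, not data from the review):

```python
def percent_agreement(codes_a, codes_b):
    """Percent of items coded identically by two reviewers (simple percent agreement)."""
    if len(codes_a) != len(codes_b):
        raise ValueError("Reviewers must code the same set of items")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return 100.0 * matches / len(codes_a)

# Hypothetical codes for four variables from two reviewers
reviewer_1 = ["reported", "not reported", "misreported", "reported"]
reviewer_2 = ["reported", "not reported", "reported", "reported"]
print(percent_agreement(reviewer_1, reviewer_2))  # 75.0
```

Percent agreement does not correct for chance agreement (unlike Cohen's kappa), but it is the statistic the review reports.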

For those trials that were represented across multiple articles (n = 12), there was an average of 2.58 (±1.24) articles per trial, with a range (R) of 2–6. There was a significant difference (p = 0.02) in the average number of reported indicators between multiple-paper interventions
(7.9 ± 3.8 indicators) and one-paper interventions (5.78 ± 2.8 indicators).

Figure 2 describes (1) whether each dimension was included in the trial (i.e., reported or
not reported) or misreported (i.e., misidentification of indicators) and (2)
if the dimension was included, whether the research design intervened to improve outcomes
related to that dimension or merely described it for context. Related to the latter,
some trials provided descriptive information on a particular dimension,
but the research design did not include methods to improve that dimension.
For example, an author might describe approaching five eligible schools to
deliver an intervention (adoption) without reporting strategies or evaluation
regarding increased uptake of the intervention at all five eligible schools. In contrast,
an intervention that aimed to improve the adoption rate at the school level would
include data on these efforts (e.g., attendance at relevant school meetings, identification
of program champions, and provision of incentives). The most accurately reported dimension
was reach (89 %), yet it was the dimension least often intervened upon to improve (3 % of the
time). All misreporting related to misidentification of individual-level variables
(i.e., those that relate to the end-users) and setting-level variables.

Fig. 2. Accuracy of reporting and intervening status by dimension. This illustrates the proportion
of interventions that accurately reported, misreported, or did not report on each
dimension, as well as the proportion of interventions that intervened to improve each dimension

Fifty-three percent of the trials used a randomized controlled trial design,
17 % were evaluation studies, 9 % were quasi-experimental, 8 % were translational/dissemination
studies, 4 % were pre/post design, 3 % were cross-sectional, and 6 % were others (e.g.,
design included cross-sectional and observational methods). Sixty-nine percent of
the studies used a quantitative methodology, 30 % were mixed methods, and one study
used a qualitative approach only. Fifty-seven percent of the studies reported on the
individual-level, 26 % were both at the individual- and setting-level, 14 % were at
the setting-level, and 2 % accounted for individuals clustered within a setting (i.e.,
athletes on a team and church members within a congregation). Twenty-six trials (32 %)
targeted two or more behavioral outcomes (e.g., dietary improvements and physical activity participation) and were operationalized as “multiple behavioral
outcomes.” The remaining studies targeted smoking/substance abuse (15 %), physical
activity (10 %), disease self-management (5 %), diet (5 %), weight (2 %), and other
(12 %) or had no targeted individual behavioral outcome (19 %). The trials were conducted
in the United States (70 %), Australia (7 %), the Netherlands (7 %), Germany (4 %),
Finland (3 %), Canada (4 %), and Belgium (3 %); one trial was conducted in both the
United States and Australia. The text of this manuscript refers to the 82 trials; all
articles included in the study (N = 101) are summarized in Additional file 4.

RE-AIM dimensions

The results section for each dimension describes study reporting across indicators,
the outcomes that were reported, and any qualitative or cost information that was
provided. Table 2 details the constitutive definition of the RE-AIM framework, while the text below
provides information on each collected indicator (i.e., full employment of RE-AIM).

Table 2. Individual- and staff/setting-level RE-AIM dimensions by targeted behavioral outcome
summary table

Individual-level outcomes


Reach

Overall, 17 % of the trials reported on all four indicators of reach (see Table 2). Those that reported a method to identify the target population (n = 50) used existing records (e.g., medical records and registries). Sixty-eight percent of
the trials reported at least one eligibility criterion, and of those, 25 explicitly
stated exclusion criteria. These eligibility criteria typically related to age
(n = 37), membership (n = 33; e.g., church and school), physical or mental condition (n = 14), language (n = 14), tobacco use (n = 11), location (n = 9), activity level (n = 9), access to a phone (n = 4), and others (n = 3: gender, lost job, and completed screening). The participation rate was accurately
reported for 55 % of the trials, 10 % of the trials misreported participation rates,
and one trial accurately reported reach in some articles but not others.

The median number of participants was 320 (mean (M) = 4817 (±28,656); R = 28–234,442). The trials that accurately reported on the participation rate reached
45 % (±28) of eligible and invited individuals, with a range from 2 to 100 %.
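The participation rate underlying these figures is simply the number of enrollees divided by the number of eligible, invited individuals. A minimal sketch, using hypothetical counts rather than data from any reviewed trial:

```python
def participation_rate(enrolled, eligible_invited):
    """Reach participation rate: percent of eligible, invited individuals who enrolled."""
    if eligible_invited <= 0:
        raise ValueError("Must invite at least one eligible individual")
    return 100.0 * enrolled / eligible_invited

# Hypothetical trial: 320 participants drawn from 711 eligible, invited individuals
print(round(participation_rate(320, 711), 1))  # 45.0
```

Misreporting of this indicator typically stems from using the wrong denominator (e.g., all screened individuals rather than those both eligible and invited).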
Thirty-seven trials (48 %) reported on representativeness. The number of characteristics
compared ranged from 1 to 13, with a mean of 3.90 (±3.30). Of those that examined representativeness,
17 (46 %) found at least one significant difference between participants
and the target population; most commonly, participants in these behavioral trials
were more often Caucasian (n = 5), of higher income (n = 3), and of higher education (n = 2). Seven studies also found significant differences in age between
participants and nonparticipants; in some, participants were older than the target audience (n = 4) and in others younger (n = 3). All other characteristics were reported as significantly different
in only one trial each (e.g., profession, comorbidities, and English language).

Four trials (9 %) included qualitative data to address reach. One telephone interview
protocol evaluated the reach of program awareness, finding that 35 % of
eligible residents responding were aware of the program [26]. In a hospital worksite obesity prevention trial [27, 28], researchers captured open-ended responses on why eligible persons declined
participation; reasons included lack of interest (56 %), no time (19 %),
and personal health or family obligations (2 %), while 22 % gave no reason. For one
trial, interviewees from ten focus groups described barriers to and facilitators of participation
in a worksite smoking cessation intervention [29, 30]. Respondents provided data related to the recruitment methods to which they were
exposed and reported that better marketing, supervisor encouragement, weekly bulletins,
and announcements at worksite meetings would increase participation [29, 30]. Four trials also reported on the costs of recruitment. Of those, three reported
numerical values (R = $10–252.54 per participant [31–40]), while one study reported information that could be used to determine recruitment
costs (e.g., the costs associated with an interactive voice response system that made
40,185 calls across 3695 individuals [41]).


Effectiveness

One trial [42–45] accurately reported on all five indicators within this dimension. Of those that accurately
reported effectiveness on individual behavioral outcomes (n = 55), 89 % had positive findings on the behavioral outcome and 11 % had null findings.
These results are presented by targeted outcome in Table 2.

Twenty-five percent of the trials (n = 19) included a moderation analysis to determine robustness across subgroups. Eleven
trials (14 %) reported broader outcomes, quality of life (QOL), or unintended negative outcomes. Some
measures included the Centers for Disease Control and Prevention's Healthy Days measure
[46], the Patient Health Questionnaire (PHQ) [46, 47], and the Problem Areas in Diabetes 2 (PAID-2) scale [47]. Five trials used qualitative measures of effectiveness; three used open-ended
survey items and two conducted interviews. Twenty-one trials reported attrition rates
(M = 22 %). Qualitative data related to effectiveness primarily focused on participant
experiences [29, 30, 41, 42, 48, 49] and suggested that program adaptations for specific sub-populations could improve
participant perceptions of effectiveness [47]. Only three trials reported any measure of the costs associated with effectiveness:
two reported costs per participant ($4634 and $1295 [33–40]) and the third reported that costs were considered in the design and analysis.

Individual-level maintenance

None of the studies reported on all three indicators of individual-level maintenance.
However, nine trials (11 %) reported individual-level behavior change at least 6 months
post-treatment. All nine reported positive outcomes when compared to baseline. One
study included qualitative interviews through which participants indicated the need
for stronger volunteer and staff support to bolster individual-level maintenance [52]. None of the studies reported individual-level maintenance costs.

Setting-level outcomes


Adoption

One trial, across two studies [31, 32], reported on all six indicators of adoption. Sixty-three percent of the trials (n = 52) reported on both staff- and setting-level adoption factors. Forty percent of
the trials reported setting-level adoption rates, which averaged 75 % (±32).
Fifteen of the trials (19 %) reported setting-level eligibility criteria; these criteria
included size, location, demonstration of need, and being within a particular health
insurance network. Twenty trials (26 %) compared the characteristics of participating
settings to all targeted settings. Five trials found significantly different characteristics:
single-physician practices were less likely to participate, governmental-sector
settings were more likely to participate, and settings with an increasing number
of patients/respondents over time were more likely to participate.

The average staff-level adoption rate was 79 % (±28). Sixteen studies (20 %) reported
delivery agent eligibility (i.e., criteria that enable an individual to deliver the
intervention, such as education and role within the system). These criteria were usually
based on expertise (n = 6), affiliation with the targeted setting (n = 4), and other disparate criteria such as not planning on retiring or having enough
patients. Ten trials (12 %) compared the characteristics of participating staff
to all targeted staff (M = 1.30 comparisons (±0.9); R = 1–4). Only one study found significant differences between participating and eligible
staff. In this case, the delivery staff were more likely to be women and reported more
years of experience in physical activity program delivery [26]. All setting and staff indicators can be found in Table 2.

Thirteen studies used qualitative measures for adoption and found that adoption rates
were improved through partnerships and increased awareness. For example, Vick et al.
[53] found that a lack of awareness, combined with scheduling conflicts, decreased the
likelihood of staff attending training, whereas partnering with representatives within
the organization led to strategic, feasible, and well-accepted training sessions and
intervention [54]. Only two studies reported monetary values associated with adoption. One reported
a total adoption cost of $21,134 [35–40] while the other indicated $15 per hour to train coaches [55].


Implementation

One study reported all three implementation indicators [21]. Thirty-five trials (44 %) reported on the degree to which the program was delivered
as intended. Across all targeted outcomes, the average percent fidelity was 81 % (±16.49).
Seventeen trials (22 %) reported that adaptations were made to program delivery. Thirty
trials (39 %) provided information on the number and frequency of trial contacts,
which represented the resource of "time." Eighteen of the trials (24 %) used qualitative
inquiry for implementation: surveys (n = 7), interviews (n = 6), observations and interviews (n = 2), focus groups (n = 2), and an implementation checklist (n = 1). Qualitative inquiry identified barriers to and facilitators of implementation.
Example barriers included scheduling and staff turnover [56] as well as a lack of role clarity (i.e., understanding one's responsibilities related
to the intervention) [57], while successes were attributed to increased patient trust of care providers [49] and multilevel commitment (e.g., management and investment of partnerships [57]). Eight percent of the trials (n = 6) reported at least some data around implementation monetary costs (e.g., program
updates and manuals) but did not include raw data on costs.

Organizational-level maintenance

None of the studies reported on all three indicators within maintenance. Eleven of
the trials (13 %) reported alignment with an organizational mission. Twenty-eight
of the trials (34 %) reported on whether or not the program was still in place. Of
those that reported on institutionalization of the program, 16 (62 %) were still in
place. Eleven trials (13 %) included information on modifications that were made for
system-level maintenance. Seven trials reported on organizational attrition (M = 9.82 % (±10.55)). Finally, 15 % reported qualitative measures of maintenance via
interviews (n = 10) and open-ended surveys (n = 2). These data indicated compatibility with the delivery system and delivery
agents' skill sets, as well as a wide array of themes related to ongoing staff and management
support (support for the duration, frequency, and type of trial). No salient barriers were
identified via the interviews and open-ended surveys. No data were reported on costs
of organizational-level maintenance.