[123I]FP-CIT ENC-DAT normal database: the impact of the reconstruction and quantification methods

The impact of the reconstruction method and of the quantification method has been analysed for the EARL ENC-DAT database. Two simple reconstruction methods, FBP and OSEM without any corrections, have been compared to the “gold standard” recommended by EARL, OSEM with AC and SC corrections. The salient characteristics of these databases (the mean age-corrected SBR value and its natural variability) are summarised in Table 1.

Contrary to expectations, the difference between the FBP and IRNC databases, although small, was found to be significant, even after phantom calibration (Table 1). This may be an indication that OSEM has not reached full convergence. On this point, the work by Seret and co-workers [12] recommended a much higher number of iterations (24 iterations with 8 subsets) for state-of-the-art quantitative OSEM algorithms that incorporate attenuation, scatter, resolution recovery, and noise suppression. Even accounting for the fact that resolution recovery per se requires an increase in iterations, this number is comparatively higher than the one used for the ENC-DAT database [2]. Such a high number of iterations, however, would not be feasible with the ENC-DAT databases for two fundamental reasons. Firstly, the ENC-DAT data have been acquired following a clinical protocol (in terms of injected activity and acquisition time), which results in a number of total counts (~2 million) two orders of magnitude lower than that obtained in their phantom work (~100 million). Secondly, the basic OSEM used in the ENC-DAT reconstructions does not have the noise-suppression capabilities that come with resolution-modelling reconstructions to counteract the steady increase of noise with the number of iterations. Furthermore, the observation that calibration cannot fully resolve the difference between the FBP and IRNC databases seems to suggest possible differences in OSEM performance between phantom and human data. OSEM convergence has been observed to be variable across different phantom filling ratios and, in particular, to struggle at the higher SBR values associated with low count concentrations in the background compartment used in this study [2], due to its non-negativity constraint.
Human studies, on the other hand, are expected to have relatively similar background concentrations irrespective of their striatal uptake, and should therefore be similarly affected by the non-negativity constraint across all SBR values; variability in the background concentration and striatal uptake, however, would be expected across different cameras and collimators. Convergence with iterative reconstruction in clinical studies is complex and deserves further investigation, but this is outside the remit of this work.
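The count argument above can be illustrated with a rough estimate. The sketch below assumes pure Poisson statistics on the total acquired counts, so that the relative statistical uncertainty scales as 1/sqrt(N). It does not model how noise propagates through OSEM iterations, and the function name is illustrative only.

```python
import math


def relative_poisson_noise(total_counts: float) -> float:
    """Relative standard deviation (sigma / N) of a Poisson count N."""
    if total_counts <= 0:
        raise ValueError("total_counts must be positive")
    return 1.0 / math.sqrt(total_counts)


# Approximate total counts quoted in the text.
clinical_counts = 2e6  # ENC-DAT clinical acquisitions
phantom_counts = 1e8   # phantom work of Seret and co-workers [12]

# Ratio of relative noise levels, equal to sqrt(phantom / clinical).
noise_ratio = (relative_poisson_noise(clinical_counts)
               / relative_poisson_noise(phantom_counts))
print(f"Clinical/phantom relative-noise ratio: {noise_ratio:.1f}")
```

Under this simplified assumption the clinical data start from a relative noise level roughly seven times higher than the phantom data, before any amplification by additional iterations.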

The ACSC reconstruction, recommended by ENC-DAT, brings a significant increase in the SBRs which, prior to phantom calibration, is of the order of 47% for both the BRASS and Southampton databases (Fig. 3a, b and Fig. 4a, b, Table 2). This increase is in line with the expectation that AC boosts striatal counts relative to the peripheral background, and that SC improves the contrast between hot and cold regions.

The main imaging factor affecting quantification, however, remains the partial volume effect (PVE). Its magnitude can be estimated from the percentage difference between the BRASS and Southampton databases before the introduction of phantom calibration. The systematic difference between these two methods, in fact, is ultimately due to their different approaches to the PVE (Fig. 1). While BRASS, based on direct measurement of count concentration from tight striatal ROIs, is susceptible to partial volume losses, the Southampton method, based on the Specific Uptake Size Index (SUSI) approach, is able to overcome them [5]. Furthermore, the respective choices for the reference region (the occipital cortex for BRASS and the whole brain without the striatal VOI for the Southampton method) will also contribute to their different outcomes [13]. Their difference can be fully appreciated by comparing their respective pre-calibration graphs in Figs. 3a, b and 4a, b, where the BRASS SBR range is much lower than the Southampton one. The magnitude of this difference, of the order of 96% (Table 3), is a clear indication of how partial volume losses outweigh by far the 47% under-estimation related to attenuation and scatter/septal penetration (Table 2). This is in line with published literature [14].
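To illustrate how partial volume losses depress a count-concentration SBR, the following minimal sketch uses the standard definition SBR = (striatal - reference) / reference. The recovery-coefficient model, the function names and all numerical values are assumptions made for illustration. They are not taken from the study, and the sketch does not implement the SUSI calculation used by the Southampton method.

```python
def specific_binding_ratio(striatal_conc: float, reference_conc: float) -> float:
    """SBR from count concentrations: (striatal - reference) / reference."""
    if reference_conc <= 0:
        raise ValueError("reference_conc must be positive")
    return (striatal_conc - reference_conc) / reference_conc


def apply_partial_volume_loss(striatal_conc: float,
                              reference_conc: float,
                              recovery_coefficient: float) -> float:
    """Hypothetical PVE model: only the specific (above-background)
    signal is scaled by the recovery coefficient."""
    return reference_conc + recovery_coefficient * (striatal_conc - reference_conc)


# Hypothetical concentrations (arbitrary units) and recovery coefficient.
true_striatal, reference = 10.0, 1.0
measured_striatal = apply_partial_volume_loss(true_striatal, reference, 0.5)

sbr_true = specific_binding_ratio(true_striatal, reference)
sbr_measured = specific_binding_ratio(measured_striatal, reference)

# Percentage by which the PVE-free value exceeds the PVE-affected one.
percent_gap = 100.0 * (sbr_true - sbr_measured) / sbr_measured
print(sbr_true, sbr_measured, percent_gap)
```

In this toy model a recovery coefficient of 0.5 halves the SBR, so the PVE-free value exceeds the measured one by 100%, which is the same order as the inter-method difference discussed above.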

Phantom calibration brings significant changes to the databases, with a large increase of the SBR values, particularly for BRASS. The aim of the calibration is, in fact, not only the harmonization of the differences in performance between different camera models, but also the recovery of the “true” SBR values. In a sense, the calibration can be thought of as having three “recovery components”, dealing with the AC, SC, and PVE degradations respectively. Depending on the database used, these components may be “turned on” or “off”, and can act in combination or in isolation. For example, the AC and SC recovery components will always be “turned on” when calibrating FBP or IRNC databases, but “off” for ACSC ones; the PVE component will always be “on” for BRASS databases but “off” for Southampton ones. As for their relative magnitudes, PVE recovery is the dominant component, and hence responsible for the larger changes of the calibrated SBRs, followed by SC and finally by AC (Table 6).
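The idea of recovering “true” SBR values from phantom measurements can be sketched as follows. The text does not specify the functional form of the calibration, so the sketch assumes a straight-line relation between the known phantom SBR and the measured one, fitted by ordinary least squares and then inverted. The function names and the phantom values are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    if n < 2 or n != len(y):
        raise ValueError("need at least two paired points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def calibrate(measured_sbr, slope, intercept):
    """Invert the fitted line to estimate the 'true' SBR."""
    return (measured_sbr - intercept) / slope


# Hypothetical phantom data: known filling ratios and measured SBRs.
true_phantom_sbr = [1.0, 2.0, 4.0, 8.0]
measured_phantom_sbr = [0.6, 1.1, 2.1, 4.1]

slope, intercept = fit_line(true_phantom_sbr, measured_phantom_sbr)
recovered = calibrate(3.1, slope, intercept)
print(slope, intercept, recovered)
```

In such a model a slope well below 1 would correspond to a large recovery (as for BRASS, where PVE dominates), while a slope close to 1 would leave the SBR almost unchanged (as for Southampton ACSC).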

Accordingly, the calibration corrections needed to recover the true SBR values are much larger for BRASS than for the Southampton method, as summarised in Table 4. For all BRASS databases, the outcome of the calibration is in fact dominated by the PVE recovery, which leads to a 67% increase of the ACSC database (Fig. 3b, d). Calibration of the FBP/IRNC databases, which incorporates the additional AC and SC recoveries, leads to an increase of the order of 100% (Fig. 3a–c). In the case of the Southampton method, the calibration has to deal, in principle, with AC and SC compensations only when these are not applied during reconstruction; consequently, it is expected to have a significant impact on the FBP/IRNC databases but not on the ACSC one. This is confirmed by the results of Fig. 4 and Table 4, which reveal a significant increase of ~31% when comparing FBP/IRNC pre- and post-calibration (Fig. 4a, c), but no significant effect on the ACSC database (Fig. 4b, d, p = 0.44).

When considering the inter-subject variability of the databases, as expressed by the standard deviation of the age-corrected SBRs (Table 1), it is noticeable how the calibration tends to increase the variability, particularly for BRASS. At first, this may appear disconcerting given the expectation that phantom calibration is intended to harmonise camera performance and therefore to reduce variability. One possible explanation is that calibration, in recovering the “true” values, is actually restoring the true natural variability, which was somehow “lost” or “masked” by SPECT degradations. For BRASS ACSC, therefore, calibration will bring a pronounced increase in data variability, as its primary effect is to unmask and compensate for the differences in resolution performance across the various gamma cameras. For the Southampton databases, on the other hand, the data variability is more consistent, as the confounding factor of PVE is inherently eliminated at source. In particular, the Southampton ACSC is the only case where the calibration brings a minor (and not significant) decrease of variability, likely to represent the result of harmonisation of residual equipment-related differences.

In principle, if full recovery were possible, phantom calibration should lead to equivalence of all databases, no matter which reconstruction or quantification method was used. In reality, despite becoming much closer to each other, the calibrated databases remain significantly different. The success of calibration is ultimately determined by the ability of the phantom study to reproduce the clinical situation. The striatal phantom, however, is an approximate representation of a human study, due to a combination of factors such as the shape of the striatal vessels, which differs somewhat from human anatomy, the uniformity of the “non-specific” background, which ignores the ventricular space void of activity and, above all, the lack of scatter and septal penetration of the radiation emitted from distant parts of the body. This is particularly relevant for 123I because of the presence of low-abundance, highly penetrating emissions, as demonstrated by the comparison of phantom and human results at both raw data level (projection counts, Table 5) and quantification level (SBR, Table 6). In Table 5, differences in scatter and septal penetration between phantom and human data are negligible for the SCl window (they oscillate around 0) but show a marked increase in humans for the SCu window (last two columns). As expected, the stopping capability of medium-energy collimators (Philips IRIX) is noticeably superior to that of the low-energy ones used in all other cameras (columns 3 and 5). Interestingly, a marked difference in collimator performance across manufacturers is also evident.

The fact that the phantom is not fully representative of a human study suggests that calibration can be thought of as a “first order”, camera-specific compensation. Subject-specific “second order” effects, associated with the individual anatomy and tracer binding, can only be corrected by subject-data-driven approaches. Consequently, calibration alone is not sufficient to fully resolve database differences, nor can it ensure full recovery of the “true” SBR.

This offers new insight into the results in Table 1. The significant differences between the non-corrected (FBP and IRNC) databases and the ACSC one that persist after calibration can be explained as “second order effects” related to the fact that scatter and septal penetration corrections are performed on an individual basis for the latter, but as generic camera-dependent compensations for the former. Furthermore, the observation that the differences between the calibrated FBP, IRNC and ACSC databases are relatively smaller for BRASS than for Southampton can be explained as a direct reflection of the dominance of partial volume recovery in the BRASS calibration for all three databases, the magnitude of which would mask the more subtle second-order effects associated with their different approaches, generic or patient-driven, to scatter compensation.

Similarly, the fact that the differences between the BRASS and Southampton databases remain significantly large after calibration (ACSC mean values of 6.4 and 9.0, respectively) underlines that the phantom can compensate for PVE at a first-order level only. The proposition that the Southampton ACSC mean SBR of 9.0 could be a close representation of the “true” value is supported by the work of Soret et al. [15], which reports a mean of 8.6 in patients suffering from Alzheimer’s disease (a neurodegenerative disease not characterized by loss of striatal dopamine transporters), obtained by applying, besides ACSC, an individualised MRI-driven partial volume correction [16] to a count-concentration, “BRASS-like” calculation of the SBR.

Of the compensation methods for scatter and septal penetration, a known disadvantage of TEW compared to alternatives such as transmission-dependent convolution subtraction (TDCS) [17] is the increase of Poissonian noise in the projection data. However, being patient-driven, TEW has the advantage of being able to take into account the individuality of the tracer distribution in the whole body and to correct for its effect on the brain image, an individuality which is ignored by the pre-determined camera-specific factors used by TDCS. The observed reduction, ~10%, in inter-subject variability recently reported for the ENC-DAT ACSC database when using TDCS as opposed to TEW [13] could therefore be explained as natural variability which is missed by this methodology.
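The patient-driven nature of TEW can be seen in a minimal sketch of the commonly used triple-energy-window estimate, in which the scatter and septal penetration under the photopeak are approximated by a trapezoid spanned by the count densities of the lower (SCl) and upper (SCu) windows. The function names, the window widths and the counts below are hypothetical and are not the ENC-DAT acquisition settings. Clipping negative results at zero is a common practical choice, assumed here.

```python
def tew_scatter_estimate(lower_counts: float, upper_counts: float,
                         main_width_kev: float,
                         lower_width_kev: float,
                         upper_width_kev: float) -> float:
    """Trapezoidal estimate of non-primary counts in the main window."""
    if min(main_width_kev, lower_width_kev, upper_width_kev) <= 0:
        raise ValueError("window widths must be positive")
    return (lower_counts / lower_width_kev
            + upper_counts / upper_width_kev) * main_width_kev / 2.0


def tew_primary_counts(main_counts: float, lower_counts: float,
                       upper_counts: float, main_width_kev: float,
                       lower_width_kev: float,
                       upper_width_kev: float) -> float:
    """Main-window counts after subtracting the TEW estimate, clipped at 0."""
    scatter = tew_scatter_estimate(lower_counts, upper_counts, main_width_kev,
                                   lower_width_kev, upper_width_kev)
    return max(main_counts - scatter, 0.0)


# Hypothetical projection-bin counts and window widths (keV).
primary = tew_primary_counts(main_counts=1000.0, lower_counts=100.0,
                             upper_counts=40.0, main_width_kev=31.8,
                             lower_width_kev=10.0, upper_width_kev=10.0)
print(round(primary, 1))
```

Because the estimate is built from the patient's own sub-window counts in every projection bin, it responds to activity elsewhere in the body, and the subtraction of two noisy measurements is what raises the Poissonian noise in the corrected projections.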

The ENC-DAT database has been acquired without the CT component because it was not available on most of the participating cameras. Access to SPECT/CT systems in clinical practice would provide CT-derived attenuation maps which, besides delivering a more accurate attenuation correction, could also be incorporated in iterative reconstructions for driving scatter corrections based on Monte Carlo simulation algorithms [18]. In these cases, however, the adoption of the ENC-DAT database in clinical use would require further validations, to assess the extent of the differences of the SBR values obtained with the different attenuation and scatter correction methods. The latter, again, would not be able to account for the extra-body activity and, therefore, could lead to SBRs significantly different from those obtained with TEW.

Although outside the scope of this study, it is worth mentioning that there are further confounding aspects encountered in routine clinical investigations which have an effect on resolution and SBR quantification. Tremor-related patient movements and radii of rotation larger than the standard 15 cm used for the ENC-DAT database, sometimes necessary to accommodate patient anatomy or claustrophobia, may lead to significant reductions of the SBR [19–21]. In particular, the radius-dependence of the SBR should be considered as a “second order” effect which the phantom calibration cannot correct for, and whose severity depends on both reconstruction and quantification methods. While the Southampton method was found not to be affected by it, the use of morphological VOIs led to a loss of approximately 3% per cm of additional radius [21], which should be taken into consideration when using the ENC-DAT database in clinical practice.
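As a rough worked example of the radius effect, the sketch below applies the reported loss of approximately 3% per cm beyond the standard 15 cm radius. It assumes the loss accumulates linearly with the additional radius, which the text does not state explicitly. The function name and the example SBR are hypothetical.

```python
def sbr_after_radius_loss(sbr: float, radius_cm: float,
                          reference_radius_cm: float = 15.0,
                          loss_per_cm: float = 0.03) -> float:
    """SBR expected at a larger radius of rotation, assuming a linear
    fractional loss per cm beyond the reference radius."""
    extra_cm = max(radius_cm - reference_radius_cm, 0.0)
    remaining_fraction = max(1.0 - loss_per_cm * extra_cm, 0.0)
    return sbr * remaining_fraction


# Hypothetical SBR of 6.0 measured with morphological VOIs at 18 cm.
reduced = sbr_after_radius_loss(6.0, radius_cm=18.0)
print(round(reduced, 2))
```

Under this assumption, 3 cm of additional radius reduces the SBR by about 9%, a shift that may matter when a patient value is compared with the normal database.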

Ultimately, the results of this study are representative of the on-going struggle between robustness and accuracy in SPECT imaging. The determination of an accurate SBR would require attenuation, scatter and partial volume correction to be subject-data driven, with phantom calibration having the function of removing residual camera-related variability. On the other hand, phantom-driven compensations will produce an “index” less dependent on the reconstruction method, at the expense of accuracy and with a loss of individual variability. The results of the present study do not allow conclusions about the impact of reconstruction and quantification methods on the diagnostic utility of the specific binding ratio; its clinical relevance in the context of the six databases considered here has been investigated in the companion paper by Dickson et al. [8].