Monte Carlo-based noise compensation in coil intensity corrected endorectal MRI

In this section, the problem is formulated and the process of how the noise-compensated
image is reconstructed is discussed.

Problem formulation

The acquired MRI magnitude image, V, can be expressed as the following relationship 35]:

(1)

where s is the pixel location, G is the noise-compensated reconstruction and N is the non-stationary noise. Knowing the noise process N, Eq. 1 can be reformulated as an inverse problem where the noise-compensated reconstruction
G can be found. Bayesian least-squares estimation 36]–38] is used to estimate G that minimizes the expected squared estimation error. This formulation is shown below:

(2)

Taking the derivative of Eq. 2:

(3)

Then setting the derivative in Eq. 3 to zero:

(4)

Simplifying to:

(5)

In Eq. 5, G(s) can be estimated using the conditional mean of G(s) on V(s), E(G(s)|V(s)), or the mean of the posterior distribution, p(G(s)|V(s)). An estimate of the posterior distribution, p(G(s)|V(s)), can be calculated using a spatially-adaptive importance-weighted Monte-Carlo sampling
approach. The approach is adapted to account for the non-stationary Rician characteristics
of MRI magnitude data. This is explained in more detail in the next section.

Spatially-adaptive Rician distributed Monte Carlo Posterior estimation

MRI magnitude data is Rician distributed, following:

(6)

where ? and ? are parameters that control the distribution’s scale and skew and I0
is the modified Bessel function of the first kind with order zero. As a result of
coil intensity correction, the data’s Rician distribution becomes spatially-dependent
and results in the following distribution, where x0;?,??0:

(7)

This distribution can be accounted for in estimating the posterior distribution via
an importance-weighted Monte Carlo sampling approach 39]. The approach forms ?, a set of samples and importance weights selected from a search space, ?. Pixels, s k
, are selected in a region around a pixel of interest, s0
, and from these samples, a subset are collected randomly using an instrumental distribution,
Q(s k
|s0
), such as a uniform distribution. For each randomly drawn pixel, s k
, an acceptance probability, ?(s k
|s0
) (Eq. 8), is calculated which indicates the probability that the neighbourhood of s k
is similar to the neighborhood of s0
:

(8)

where x(j)=h k
[j] and ?(j)=h0
[j]. The terms h k
[j] and h0
[j] denote the j t h
pixels in the neighbourhoods around s k
and s0
. The variable ? normalizes ?(s k
|s0
) so that in the case the neighbours of s k
are duplicates of s0
, ?(s k
|s0
)=1. The variables is the estimated scale, for the pixel of interest, s0
(its estimation is explained in more detail in the following section). This acceptance
probability is used to determine if the sample s k
is a realization of the posterior p(G(s)|V(s)) and should be accepted into the set ?. The acceptance probability reformulates the Rician-distributed statistics to handle
the non-stationarity of the coil-intensity corrected MRI data when deciding whether
a pixel is accepted or rejected. To use the acceptance probability, a random value
u is first generated from a uniform distribution. Then, the pixel s k
is accepted into the set ? if u??(s k
|s0
), otherwise it is rejected. The process of selection and acceptance is continued
until N samples are accepted into ?. The posterior distribution estimate can then be calculated using a weighted-histogram
39]:

(9)

where ?() is the Dirac delta function and Z is a normalization term to enforce . The posterior distribution can then be used to calculate the noise-compensated reconstruction
using Eq. 5.

Non-stationary unified ERC parametric model

To estimate the posterior distribution p(G(s)|V(s)) in a spatially-adaptive manner, the scale parameter of each pixel of interest,
, is estimated using maximum likelihood estimation,

(10)

where x are the observed intensities in V(s) and ? are the parameters to be estimated: in this case, the scale parameter, . To refine the scale estimation, an existing SNR profile, defined as ?(?), which is characteristic to a given ERC, is fitted. Given an ERC, an SNR profile
can be mapped to characterize the change in SNR as a function of distance from the
ERC surface. Literature has shown that the ERC SNR profile differs from a rigid and
inflatable coil, however both coils share a common trend where there is an SNR gain
nearest the coil surface which diminishes with distance 40]–42]. Considering the SNR depth profile from posterior to anterior of a rigid coil, a
sharp increase in SNR of 3 to 5 times the normal SNR is demonstrated at the ERC surface.
This increase is followed by a decrease through the peripheral zone and central gland.
Despite the quick decline in SNR, the peripheral zone still experiences a gain in
SNR of 1.5 to 3 times. The continual decrease then finds the central gland with only
a fraction of the SNR 40]–42]. An inflatable coil has demonstrated a weaker response with less SNR increase near
the coil. In addition to the variation between SNR profiles for inflatable and rigid
ERC, ERC brands have their own characteristic profiles which can be determined by
measuring phantoms. Two SNR profiles were modeled in this study using the findings
from Venugopal et al. 40] for two ERCs: a Hologic rigid ERC and a Medrad inflatable ERC. The inflatable and
rigid ERC SNR profiles demonstrate a 1 and 5-fold improvement in SNR at the ERC surface
respectively with an exponential drop leading to a final abrupt drop. The full algorithm,
Adaptive Coil Enhancement Reconstruction (ACER), is summarized in Algorithm 1.

Experiments

To interpret the performance of the proposed approach, clinical patient data and phantom
data were used. Clinical patient and phantom endorectal T2 (spin-spin relaxation time)
and axial diffusion-weighted MRI (DWI) corrected with the pre-calibration coil intensity
correction approach by GE called Phased array UnifoRmity Enhancement (PURE) was collected
at Sunnybrook Health Sciences Center. Two types of coils were used to acquire the
data: an inflatable Medrad eCoil ERC and a rigid Hologic ERC. The data was collected
using a GE Discovery MR750 3 T MRI for phantom data (inflatable coil only) and a GE
Signa HDxt 1.5 T MRI for patient data (collected with both rigid and inflatable ERCs).

The proposed approach was compared against three other MRI denoising approaches: 1.)
an optimized variance-stabilizing transformation for Rician distributions (ROVST)
19], 2.) noise removal by a multi-resolution adaptive non-local means approach (ANLM)
29] and 3.) a linear minimum mean squared error estimator (LMMSE) 32]. The ROVST, LMMSE and ANLM codes used for comparison were provided by their respective
authors. All approaches were implemented using MATLAB and the parameters were selected
to provide a reasonable balance between prostate detail and noise compensation in
the background. The experimental setup for the phantom and patient experiments are
described in more detail in the following sections.

Phantom experimental setup

For phantom experiments, a multi-modality prostate training phantom from Computerized
Imaging Reference Systems Inc (CIRCS Model 053) was used. The phantom is contained
within a 12×7.0×9.5 cm clear container made of acrylic. The container has two openings
for the probe (front – 3.2 cm diameter and rear – 2.6 cm diameter). Located inside
the container is a prostate replica composed of high scattering Blue Zerdine (5.0×4.5×4.0
cm) that is placed in a water-like background gel with little backscatter attenuation
(?0.07 dB/cm-MHz). Within the prostate itself, there are three 0.5?1.0 cm lesions
placed hypoechoic to the prostate. The urethra and rectal wall are made of low scattering
Zerdine with diameter of 0.70 cm and dimensions 6.0×11×0.5 cm respectively.

This phantom was then placed in a tub of water to increase signal amplification and
placed between cushions to elevate and stabilize the phantom during acquisition and
to improve the realism of the phantom. The phantom was then imaged with the inflatable
Medrad Prostate eCoil MR endorectal coil using T2 MRI and DWI. Both T2 and DWI MRI
were acquired with the built-in pre-calibration correction approach PURE using one
excitation with a 3 T GE Discovery MR750. The three phantom data sets were acquired
using 1) DWI b=0 s/ mm
2
, 2) DWI b=1000 s/ mm
2
and 3) T2 and the central slice selected for experimentation. As a result of PURE
correction, the cushion in these slices are emphasized by a noise band shown in Fig.
2 for T2. To focus on the phantom itself, these slices were cropped.

Fig. 2. Left: Uncorrected uncropped T2 slice. A noise band (red) is present due to PURE correction
which amplifies the noise around the cushion used to stabilize the phantom during
imaging. Right: Corresponding slice cropped for processing to include only the ROI
with selected regions (blue and red) for SNR and CNR calculation on phantom DWI and
T2

The T2 data collection protocol using the GE Discovery MR750 scanner used for phantom
imaging is based on a standard clinical endorectal DWI data collection protocol used
at the Sunnybrook Health Sciences Centre, and described in detail in Table 1. Furthermore, the DWI data collection protocol using the GE Discovery MR750 scanner
used for phantom imaging is a standard clinical endorectal DWI data collection protocol
used at the Sunnybrook Health Sciences Centre, and described in detail in Table 2. Central slices from each modality were then considered for SNR and CNR.

Table 1. Phantom endorectal T2 data collection protocol

Table 2. Clinical endorectal DWI data collection protocol

Patient experimental setup

The second experiment evaluates the image reconstruction performance of the various
tested approaches on endorectal T2 axial MRI with PURE within a clinical scenario.
The data was collected and then selected for this study retrospectively using a GE
Discovery 1.5 T Signa HDxt MRI scanner, a Medrad eCoil inflatable ERC or a Hologic
rigid ERC. Institutional research ethics board approval was obtained from the Research
Ethics Board of Sunnybrook Health Sciences Centre and patient informed consent for
this study was obtained. For the purpose of evaluating imaging reconstruction performance,
fourteen patient cases were used in this study. Eleven patients were imaged using
an inflatable Medrad coil and the central slices were selected for analysis. Three
patients were imaged using a rigid Hologic coil and three slices were selected from
each volume and considered as a separate case. The patients ranged in age from 54?79
years with a median age of 72 years. The T2 data collection protocol using the GE
Discovery 1.5 T Signa HDxt MRI scanner used to perform patient imaging is a standard
clinical endorectal T2 data collection protocol used at the Sunnybrook Health Sciences
Centre, and described in detail in Table 3. The central slices from each patient case were assessed using SNR, CNR and edge
preservation and 3 cases were selected to be qualitatively assessed via a subjective
scoring method.

Table 3. Clinical endorectal T2 data collection protocol

Results and discussion

Following the experimental setup, a number of quantitative and qualitative analysis
methods were executed to evaluate the performance of the proposed approach against
the state-of-the-art techniques.

Phantom experiment

For the phantom experiments, the noise suppression approaches were compared using
signal-to-noise ratio (SNR), contrast-to-noise ratio (CNR) and visual analysis. P-values were also calculated to determine the statistical significance of the SNR
and CNR results. The null hypothesis used was that the proposed ACER approach had
no improvement for a subjective metric as compared to a given correction approach.
P-values were calculated for a two-tailed normal distribution with a statistical significance
level of 5 %.

Due to the known homogeneity of the phantom, for quantitative analysis, SNR and CNR
were calculated for two regions: one region on the phantom farthest away from the
coil and a second region on the prostate itself. These regions are shown in Fig. 3 in the uncorrected image in blue and red respectively. SNR and CNR (in decibels)
were calculated as follows:

(11)

Fig. 3. Noise suppressed T2 phantom experiment results: A background region (blue) and a prostate
region (red) are shown where the SNR and CNR were calculated in the uncorrected (UC)
slice. ACER maintains a good balance between noise compensation in smooth regions
while retaining edges

In the SNR equation, the parameter, , defines the mean value of the region and ? signifies the standard deviation of the region. The SNR metric used in this study
is based on the region-of-interest (ROI) approach commonly used in clinical practice
43], 44]. In CNR, and , denote the mean values of the selected background and prostate regions respectively
and ? is the standard deviation of the background region which is more indicative of the
noise process.

The final SNR and CNR results are shown in Tables 4, 5, and 6 with visual results for the T2 phantom case in Fig. 3. All approaches demonstrated improvement upon the uncorrected (UC) slice with the
proposed approach, ACER, having the highest average SNR in the selected background
and prostate regions. The uncorrected (UC) slice refers to the slice with no application
of any algorithm. ROVST and LMMSE proved to have the next best SNRs in the two regions
however, considering the visual results, noise was under or overcompensated with deterioration
of structure. In the case of DWI at b=1000 s/mm
2
, where noise was more prominent and contrast was already low, ROVST had greater SNR
metrics over ACER however, at the cost of structure preservation. Finally, ANLM exhibited
the least SNR improvement in both selected regions, indicating an inaccurate noise
estimate.

Table 4. Phantom SNR analysis of a selected prostate region (in dB with highest measures in
bold). ACER proved to have the greatest SNR improvement in the prostate regions. ANLM
showed an inaccurate noise variance estimate which led to less significant SNR improvement

Table 5. Phantom SNR analysis of a selected background region (in dB with highest measures
in bold). ACER proved to have the greatest SNR improvement in the background regions.
ANLM showed an inaccurate noise variance estimate which led to less significant SNR
improvement

Table 6. Phantom CNR analysis based on the selected background and prostate regions (in dB
with highest measures in bold). ACER demonstrated the greatest improvement in CNR
illustrating its capacity to augment the detail within the prostate

CNR analysis (Table 6) showed that ACER had the highest average CNR. ROVST had the second highest average
CNR and ANLM with the least improvement. These results indicate ACER’s ability to
increase the contrast between the background and prostate regions, thereby improving
the visibility of detail within the prostate.

P-values (Table 7) were also calculated for the SNR and CNR results to verify the statistical significance
of the differences between the average SNR and CNR compared to ACER. ACER demonstrated
statistical significance for SNR and CNR compared to all other methods except for
ROVST, in which it was shown that the improvements of ACER over ROVST is statistical
insignificant for SNR and CNR.

Table 7. The p-values for the metrics measured for the phantom experiments compared to the
proposed ACER method. Values below 0.05 are shown bolded which indicate the average
score across the cases has statistical significance

The noise suppressed T2 phantom slices for each approach are shown in Fig. 3. The proposed method demonstrates the best noise compensation while enhancing the
detail contrast within the prostate. LMMSE and ROVST also compensate for noise however
at the cost of visible structure and edge blurring.

Patient experiment

The noise suppression approaches were then compared using patient data by analyzing
SNR, CNR (Eq. 11), edge preservation (Eq. 12) and subjective scores. P-values were also calculated to determine the statistical significance of the SNR
and CNR results. Here, the null hypothesis used was that a given approach had no improvement
for a subjective metric as compared to the uncorrected image. P-values were calculated for a two-tailed normal distribution with a statistical significance
level of 5 %.

For the SNR and CNR assessment, a high noise, structure-free region in the background
was selected similar to the phantom experiments. A second homogeneous region with
higher intensity was then selected for CNR calculation. The results are shown in Tables
8 and 9 where all approaches improved upon the background SNR of the uncorrected slice. In
the case of background SNR, ACER had the highest average SNR with ROVST in second.
The visual results for ROVST and LMMSE demonstrated that in regions far away from
the ERC, noise was effectively removed however, at the cost of detail within the prostate.
ACER and ANLM were more effective in retaining the prostatic detail, with ACER having
an average improvement over the uncorrected slice of 11.7 dB. Similar to the background
SNR results, ACER had the highest average CNR with ROVST in second. ACER demonstrated
an average 11.2 dB improvement over the uncorrected slice in CNR. Subsequent p-value
analysis (Table 10) showed the average improvement over the uncorrected slices for each approach was
statistically significant with p-values of less than 0.05.

Table 8. The patient experiment SNR of a background region are shown (largest values are shown
in bold). ACER demonstrates an average increase of 11.7 dB for SNR over the uncorrected
(UC) slice which has no noise suppression applied

Table 9. The patient experiment CNR of two regions are shown (largest values are shown in bold).
ACER demonstrates an average increase of 11.2 dB for CNR over the uncorrected (UC)
slice which has no noise suppression applied

Table 10. The p-values for the metrics measured for the patient experiments. Values below 0.05
indicate the average score for all slices corrected by each approach represents statistically
significant change from the uncorrected slices. All approaches have p-values below
0.05

The edge preservation (EP) measurement evaluates image edge degradation. The EP measurement
compares the noise-free reconstruction with the uncorrected image and can be calculated
as follows 45]:

(12)

where ?
2V and are the Laplacian of the intensity bias corrected image and noise-free reconstruction
respectively using a 3×3 filter. The parameters, and , are the mean values of a neighbourhood around ?
2V and . An image where there is perfect EP results in a measurement of ?=1. This refers to the technique’s ability to retain the structure and edges of the
image. For the purpose of this study, since noise can be recognized as edges or details,
the EP metric is calculated for the prostate gland only using a user defined mask.
This region was selected for high SNR and high importance for detail preservation.

Considering the EP of the noise compensation approaches (Table 11), ANLM had the highest average EP with ACER having the second highest. In the real
T2 cases, noise was more prominent than compared to the phantoms and as a result,
more compensation was required to suppress the noise. This led to overcompensation
in other regions where detail is important. Following the conclusions made in the
phantom experiment, ROVST and LMMSE led to over suppression of noise and a lower EP
measurement. ANLM however, had better EP for all but one case, as a consequence of
its insufficient noise compensation. Due to the strong presence of noise in these
slices, the Laplacian operator of the EP metric realized noise as edges. The insufficient
noise suppression by ANLM resulted in structure preservation in the prostate, however
also retained noise in regions of low SNR. This was demonstrated by the lower average
background SNR compared to ACER. ACER proved to have a suitable balance of noise suppression
and EP as a result of the non-stationary unified ERC parametric model used.

Table 11. Patient experiment edge preservation results: ANLM has the highest average edge preservation
(EP) metrics as a result of insufficient noise suppression. ROVST and LMMSE demonstrate
lower average metrics as a result of overcompensation. ACER defines an optimal balance
between noise suppression and edge preservation which enhances visualization with
the second highest EP metrics

The EP analysis is further supported by the visual results shown in Figs. 4, 5 and 6. LMMSE and ROVST are able to apply moderate noise suppression in the background regions
where signal is low, however nearest the coil the prostate details are difficult to
visualize due to overcompensation. ANLM is more effective in retaining the detail
within the prostate region however at the cost of retaining the noise farther away
from the ERC at high noise levels. ACER strikes an optimal balance between detail
preservation within the prostate where signal is higher and effectively suppresses
noise in the regions with low signal. This correction allows for improved visibility
for diagnosis.

Fig. 4. Case 12: A central T2 MRI slice from a patient imaged using a Hologic rigid ERC with
moderate noise compensated by various approaches. ACER maintains the detail within
the prostate while compensating for the background noise

Fig. 5. Close-up views of background (left column) and prostate (right column) regions for
Case 12. The selected regions are shown in Fig. 8

Fig. 6. Case 3: A central T2 MRI slice from a patient imaged using a Medrad inflatable ERC
compensated by various approaches. LMMSE and ROVST suppress noise in the background,
consequently blurring details within the prostate. ACER effectively compensates the
noise in low signal regions while taking advantage of the high signal near the coil.
ANLM maintains similar detail preservation however retains some noise

Image analysis and subjective interpretation

To appropriately assess the quality of the noise compensation approaches, a blind
subjective scoring system similar to the evaluation system proposed by Walsh et al.
46] was used. In this system, the scorers were unaware of which approach was applied
on the compensated data presented to them. A central slice from three volumes was
selected and evaluated by seven evaluators ranging in experience. They are listed
below from most to least experience:

MH, 16 years of clinical radiology experience with specialization in genitourinary
cancers and 11 years of experience interpreting prostate MRI

LM, 7 years of clinical radiology experience with specialization in cancer imaging

FK, 5 years of prostate MRI research experience

HC, 1.5 years of clinical radiology experience

AM, 1.5 years of clinical imaging research experience

JK, 2 months of clinical prostate MRI experience

KC, 50 hours of clinical prostate MRI experience

To collect the subjective scores, the noise-suppressed and uncorrected versions of
three slices were presented to the evaluators in an unknown and random sequence. Based
on the individual slice, they were asked to assess the reconstruction based on the
following criteria: contrast, sharpness, lack of noise and fitness for purpose. These
criteria can be scored using the following terms: very poor, poor, satisfactory, good
or very good. For the sake of our evaluation, we assigned these scores from 1 to 5,
with 1 being very poor and 5 being very good. The rank sums (Eq. 13), median and F-pseudosigma scores (Eq. 14) across all slices and evaluators were calculated and are shown in Tables 12, 13 and 14. Histograms of each scoring criterion and the frequency of each score across all
evaluators is included in Fig. 7.

Fig. 7. Subjective scoring histograms for the compared approaches. The y-axis depicts Frequency
(0 to 22) and the x-axis depicts the subjective score (1 to 5)

Table 12. The rank sum subjective score values (with highest scores shown in bold): ACER has
the highest rank sum for contrast and lack of noise

Table 13. The median subjective score values (with the highest scores shown in bold): ACER and
ANLM demonstrated the same median scores as UC except for lack of noise where all
approaches improved upon UC

Table 14. The F-pseudosigma subjective score values (with the lowest scores shown in bold):
With the exception of the unanimous decision that LMMSE had poor sharpness, most of
the criteria for the approaches had high variance indicating large inconsistencies
in opinion implying that personal preference has a large impact upon the approach

The rank sum, S R
, is the total of all subjective scores by all the evaluators for a particular criterion.

(13)

where N is the number of evaluators and M is the number of slices evaluators evaluated and S ij
are the individual scores of each evaluator for each slice. The total rank sum can
then be used to determine whether in general the evaluators decided a particular criterion
was high or low for a given approach.

The next metric considered is the F-pseudosigma, F ?
, which is a measurement of variance and is calculated using:

(14)

where IQR is the interquartile range. A smaller F-pseudosigma denotes a more precise
score.

Considering the histograms (Fig. 7), rank sum (Table 12), median (Table 13) and F-pseudosigma (Table 14) metrics for contrast, ACER had the highest rank sum with a median score of satisfactory.
It also had the smallest F-pseudosigma which indicates there was little variation
between all scores. For the sharpness criterion, it was interesting that the uncorrected
image had the largest rank sum with ANLM having the next highest rank sum. ACER, ANLM
and uncorrected tied with the highest median scores of satisfactory however also had
the highest F-pseudosigmas indicating large variation in opinion. It was unanimous
however that LMMSE had very poor sharpness and was found to be less sharp than the
uncorrected slices. For the lack of noise criterion, ACER again had the largest rank
sum with a median score of good. All correction approaches had high rank sums and
median scores of good however again, F-pseudosigmas hinted at large variance in opinion.
This may have been caused by the large number of evaluators and the variance in noise
level between cases. Finally, ACER and ANLM had the highest rank sums for fitness
for purpose with a median score of satisfactory. LMMSE and ROVST were found to be
unfit for the purpose in comparison to uncorrected slices. It is intriguing to point
out that evaluators found that the uncorrected slices were just as sufficient for
analysis as ACER and ANLM however there was large variance in opinion with large F-pseudosigma
scores.

Visual analysis

Visual results for two different cases are shown in Figs. 4 and 6 for a Hologic rigid ERC and a Medrad inflatable ERC respectively. The results demonstrate
ACER’s ability to retain prostate detail with effective compensation of background
noise using different ERCs with different SNR characteristics. In Fig. 4, it is evident that LMMSE and ANLM are able to reduce the noise in the background
regions however with noise still visibly present. ROVST does a better job at compensating
for noise however upon closer inspection of Fig. 4 in regions specified by Fig. 5 for a background and prostate region (Fig. 8) it is apparent that the level of detail is compromised for these approaches. ROVST
and LMMSE approaches were unable to preserve the tissue texture within the prostate,
demonstrating oversmoothing in the prostate in order to compensate for the high level
of noise in the background. In contrast, ANLM was able to retain the detail within
the prostate however showed some noise in the background. ACER successfully balances
the noise reduction and the detail preservation by incorporating the ERC SNR profile
as well as the non-stationary characteristics of the MRI data. Similar conclusions
can be made when considering the performance of the approaches for the inflatable
ERC case (Fig. 6). In this example, the ANLM applies insufficient noise compensation and shows evidence
of noise. ROVST and LMMSE suppress the noise however at the cost of removing detail
in the prostate. Again, ACER exhibits apt noise compensation while retaining tissue
texture and details.

Fig. 8. Selected background and prostate regions (shown on the uncorrected image) for closer
inspection in Fig. 5

Timing analysis

The various MRI compensation approaches were also analyzed based on their computation
times. Tests were completed on a 3.10 GHz AMD Athlon(tm) II X3 445 processor with
4.00 GB of RAM. The various approaches were not optimized for timing performance.
The timing analysis is shown for the patient data in Table 15. The LMMSE approach demonstrated the fastest computation times with an average computation
time of 0.13 s while ANLM exhibited the slowest computation time with an average calculation
time of 1060 s. The proposed approach, ACER, showed middle range performance with
an average computation time of 284 s.

Table 15. Computation times for each approach on the real T2 endorectal MRI shown in seconds.
Shortest computation times are shown bolded. LMMSE had the shortest average computation
time with 0.13 s while ANLM had the longest average computation time with 1060 s