Pupil diameter changes reflect difficulty and diagnostic accuracy during medical image interpretation

The present pilot study was designed to test for tonic and phasic modulation of pupil diameter while pathologists interpreted and diagnosed digitized whole slide images of breast biopsies. In accordance with recent theories of the LC-NE system’s role in modulating pupil diameter through dynamic shifts between exploitation and exploration cognitive processes, we formulated two primary hypotheses. First, we expected that tonic pupil diameter would be influenced by the perceived difficulty of a case, reflecting the overall engagement of exploratory search processes. Second, we expected that phasic pupil diameter upon fixating in a predefined diagnostic region of interest would be modulated by case difficulty and would predict whether a diagnosis converged with the consensus reference diagnosis. This pattern would indicate the recognition of task-relevant visual features and the engagement of exploitation-based processes. Overall, we found support for both hypotheses and also identified some patterns that motivate continuing research.

Tonic pupil diameter has been shown to correlate reliably with the level of mental effort involved in processing and interpreting visual information [7]. Data from this study support and extend this concept to physicians processing and interpreting visual medical data: biopsy cases associated with higher levels of perceived clinical difficulty elicited the largest pupil diameters. Using individual pathologists’ difficulty ratings, we found the smallest tonic pupil diameters in low difficulty cases and a step-wise increase in pupil diameter with higher difficulty ratings. Thus, tonic pupil diameter may be a useful, unobtrusive indicator of perceived difficulty during physicians’ interpretive process. In training contexts, pupil diameter data may prove a reliable surrogate for subjective difficulty ratings, providing an objective measure of the mental effort a pathologist is exerting to interpret and diagnose a case; this effort likely reflects active exploration of image areas in search of diagnostically relevant regions. In the future, monitoring tonic pupil diameter may also inform the timing and administration of medical decision support tools, interventions designed to aid interpretation and facilitate access to information and second opinions [25].
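As an illustration of the step-wise comparison described above, the minimal sketch below groups per-case tonic pupil diameters by difficulty rating and checks whether the group means increase with each higher rating. It assumes, hypothetically, that each case has already been reduced to a single tonic value (e.g., mean diameter in mm over the whole interpretation period) and that ratings are ordinal integers; the function names are illustrative and are not the analysis pipeline used in this study.

```python
from collections import defaultdict
from statistics import mean


def tonic_by_difficulty(trials):
    """Average tonic pupil diameter per difficulty rating.

    trials: iterable of (difficulty_rating, tonic_diameter_mm) pairs.
    tonic_diameter_mm is a hypothetical per-case summary value, e.g. the
    mean pupil diameter over the full interpretation period.
    """
    groups = defaultdict(list)
    for rating, diameter in trials:
        groups[rating].append(diameter)
    # Sort by rating so the result is ordered from low to high difficulty.
    return {rating: mean(values) for rating, values in sorted(groups.items())}


def is_stepwise_increasing(means_by_rating):
    """True if the mean diameter strictly increases with each higher rating."""
    values = list(means_by_rating.values())
    return all(lower < higher for lower, higher in zip(values, values[1:]))


# Illustrative toy values only (not data from this study).
toy_trials = [(1, 3.1), (1, 3.3), (2, 3.4), (2, 3.6), (3, 3.9), (3, 3.7)]
means = tonic_by_difficulty(toy_trials)
print(means, is_stepwise_increasing(means))
```

A group-mean comparison like this ignores within-reader variability; a real analysis would more likely use a mixed-effects model with pathologist as a random factor.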

Phasic pupil diameter refers to dynamic, event-related variations in pupil diameter, such as those elicited by viewing and exploiting information particularly relevant to interpretation. Though speculative, some theories suggest that phasic increases in pupil diameter may even be necessary for gathering the clinical information required for successful diagnosis [5, 6]. Data from this study provide some support for this notion, demonstrating that pupil diameter is temporarily modulated by the difficulty of a case and reflects the pathologists’ agreement with the consensus reference diagnosis, possibly indicating the perceived diagnostic resolution of a case. Cases rated low difficulty tended not to elicit any positive- or negative-going deflection of pupil diameter upon arrival in a diagnostically relevant region of interest (dROI). One possibility, though entirely speculative, is that in relatively easy cases pathologists had already arrived at a successful interpretation before viewing a dROI. For instance, though benign and invasive cases may have particular regions most representative of those diagnoses, regions adjacent to the dROI likely hold similar (and easily extracted) informational value.
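To make the notion of a phasic deflection upon entering a dROI concrete, the sketch below computes a baseline-corrected change: the mean diameter in a window after first fixation in the region minus the mean in a short window before it (subtractive baseline correction, a common choice in pupillometry). The window lengths, sampling layout, and function name are hypothetical assumptions for illustration, not the parameters used in this study; real pipelines would also handle blinks and missing samples first.

```python
from statistics import mean


def phasic_deflection(times_ms, diameters_mm, entry_ms,
                      baseline_ms=500, window_ms=1000):
    """Baseline-corrected pupil change after first fixation in a dROI.

    Hypothetical windows: the baseline is the mean diameter over the
    `baseline_ms` preceding `entry_ms`; the response is the mean over the
    `window_ms` starting at `entry_ms`. Positive values indicate dilation
    (positive-going), negative values constriction (negative-going).
    """
    samples = list(zip(times_ms, diameters_mm))
    baseline = [d for t, d in samples if entry_ms - baseline_ms <= t < entry_ms]
    response = [d for t, d in samples if entry_ms <= t < entry_ms + window_ms]
    if not baseline or not response:
        raise ValueError("no samples in baseline or response window")
    return mean(response) - mean(baseline)


# Toy trace sampled every 100 ms, dilating from 3.0 to 3.4 mm at dROI entry.
times = list(range(0, 2000, 100))
diameters = [3.0 if t < 1000 else 3.4 for t in times]
print(phasic_deflection(times, diameters, entry_ms=1000))
```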

In contrast, cases rated high difficulty tended to elicit robust pupil diameter changes upon viewing a dROI. Interestingly, positive- versus negative-going waveforms corresponded to agreement versus disagreement, respectively, between the pathologist’s diagnosis and the consensus reference diagnosis. When the pupil diameter waveform was positive-going upon viewing, pathologists ultimately delivered a diagnosis in agreement with consensus. This positive deflection may reflect pathologists recognizing particular regions of the image as highly relevant to their interpretation. Such recognition could occur prior to conscious awareness, affording the accurate perception and integration of relevant information [7]. In addition, we found a strong negative-going deflection when pathologists ultimately delivered a diagnosis different from the consensus reference diagnosis. This pattern was not hypothesized, though we believe it is compatible with some extant literature [6]. A speculative account is that when pathologists view a dROI but do not interpret it as relevant to their task, pupil diameter decreases and valuable visual details are not sufficiently processed. This failure to identify regions as relevant to interpretation may ultimately result in a less accurate diagnosis. To our knowledge, this is the first report of such a pattern; it extends current theories of pupil response [4, 5] and motivates continuing research into this phenomenon.

The present data suggest that phasic changes in pupil diameter may be used to monitor and guide the interpretive process. Using standardized cases with pre-determined regions of diagnostic relevance and consensus reference diagnoses, future computerized training platforms could monitor phasic pupil response and adaptively customize real-time feedback and cueing. For instance, if a trainee’s pupil diameter remains constant or shows a negative-going deflection upon fixation in a pre-determined dROI, they may have failed to identify a region as relevant to interpretation. Adaptive learning systems might leverage this information for more personalized, timely, and effective feedback [26].
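The cueing logic sketched in this paragraph could take a form like the following minimal rule, offered only as an illustration. It assumes a baseline-corrected phasic deflection has already been computed for the trainee’s first fixation in a pre-determined dROI, and it uses a hypothetical dilation threshold to separate a genuine positive-going response from a flat one; the `Cue` values, threshold, and function name are illustrative, not part of any existing training platform.

```python
from enum import Enum


class Cue(Enum):
    NONE = "none"
    HIGHLIGHT_REGION = "highlight_region"


def choose_cue(deflection_mm, in_droi, threshold_mm=0.05):
    """Decide whether to cue a trainee after fixating a pre-determined dROI.

    deflection_mm: baseline-corrected phasic pupil change, assumed to be
    computed elsewhere. threshold_mm: hypothetical minimum dilation that
    counts as a positive-going response.
    """
    if not in_droi:
        return Cue.NONE
    # Anything below the dilation threshold (flat or negative-going) suggests
    # the region may not have been recognized as relevant, so cue it.
    if deflection_mm < threshold_mm:
        return Cue.HIGHLIGHT_REGION
    return Cue.NONE


print(choose_cue(0.20, in_droi=True))   # clear dilation: no cue
print(choose_cue(-0.30, in_droi=True))  # constriction: highlight the region
```

In practice the threshold would need to be calibrated per reader, since resting pupil size and response magnitude vary across individuals.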

Some limitations of the present study are worth considering. First, though our sample size is substantially larger than those of other eye tracking studies [1, 3, 27–29] using samples of medical practitioners, a sample with even greater breadth of pathology experience and specialization may reveal additional or different patterns of interest and would also allow an assessment of the reliability of the present results. Second, though the present results were obtained with breast biopsy images, we cannot draw any conclusions regarding whether the patterns of pupil size variation generalize to other biopsy types or medical specialties (e.g., radiology). Third, while we adjusted for image brightness and encouraged a consistent seating distance from the monitor, future research might benefit from stabilizing the head with a chin rest. Indeed, anterior-posterior head movement toward and away from the monitor during image interpretation might influence recorded pupil diameter. However, we note that the experimenter ensured a consistent participant seating position (60 cm from the monitor) to maintain eye tracking quality, and remote eye trackers have been validated as reliable instruments for monitoring task-evoked pupil responses [30]. Fourth, future research may benefit from comparing pupil diameter responses elicited when pathologists view regions of interest established through consensus versus participant-specific regions of interest deemed diagnostically relevant during interpretation [31]. Fifth, all data analyses were conducted and presented in aggregate, without direct consideration of intra- and inter-individual differences in pupil response; to ensure applicability to individual readers, continuing research may benefit from investigating the reliability of the present patterns within individuals. Finally, though digital images (unlike glass microscopy) provide a tractable mechanism for eye tracking and are increasingly used for gathering second opinions [32], the U.S. Food and Drug Administration (FDA) has not yet approved digital whole slide images for the rendering of primary diagnoses.

In conclusion, we provide preliminary evidence that pupil diameter may prove valuable in monitoring pathologists’ interpretive process and in reflecting agreement with the consensus diagnosis during image interpretation. This evidence comes from tonic differences during the ongoing interpretive process and, more specifically, from phasic differences in response to viewing diagnostically relevant image regions. These findings with physicians support theories of pupil response that posit dynamic interactions between LC-NE function and the engagement of exploit-versus-explore cognitive control states. Uniquely, the present findings extend the predictions of these theories to the challenging real-world setting of high-stakes medical decision making.