Testing Uncertainty Models in Machine Learning Systems

In a world of uncertain human insights, embracing uncertainty could help machines and humans work together more effectively and reliably. While current systems are programmed to assume that human interventions are always accurate and confident, this research considers uncertainty in such interactions. A team of researchers from the University of Cambridge has created "UElic," a platform to collect real-world human uncertainty data and demonstrate its value in improving models' ability to handle uncertainty. The work emphasizes that allowing humans to express uncertainty is significant for enhancing the reliability of machine learning models.

The researchers introduce concept-based models that aim to enhance interpretability and enable human interventions to correct errors. These involve supervised learning with inputs (x), concepts (c), and outputs (y), where the concepts can be binary or categorical and may encompass uncertainty. They used an image classification dataset so that humans could provide feedback and indicate uncertainty while labeling a particular image. These models predict concepts using neural networks, focusing on concept embedding models (CEMs), an extension of concept bottleneck models (CBMs).
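The x → c → y pipeline and the idea of a human overriding predicted concepts can be illustrated with a minimal sketch. This is not the paper's implementation: the weights are random stand-ins for trained networks, and the `predict` helper and its NaN-masking convention for partial interventions are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical minimal concept bottleneck: inputs x -> concept probabilities c -> output y.
# Random weights stand in for the trained concept and label predictors.
W_xc = rng.normal(size=(4, 3))   # input (4 features) -> 3 binary concepts
W_cy = rng.normal(size=(3, 2))   # concepts -> 2 output classes

def predict(x, human_concepts=None):
    c = sigmoid(x @ W_xc)               # model-predicted concept probabilities
    if human_concepts is not None:
        # A human intervention overrides predicted concepts; NaN marks
        # concepts the human leaves to the model. An uncertain annotator
        # could supply a soft value (e.g. 0.7) instead of a hard 0/1.
        mask = ~np.isnan(human_concepts)
        c = np.where(mask, human_concepts, c)
    logits = c @ W_cy
    return np.exp(logits) / np.exp(logits).sum()  # softmax over classes

x = rng.normal(size=4)
model_only = predict(x)                                   # no intervention
intervened = predict(x, np.array([1.0, np.nan, 0.7]))     # partial, uncertain intervention
```

The point of the bottleneck design is visible in `predict`: because the label depends on x only through c, correcting (or softening) a concept directly changes the downstream prediction.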

The research questions explored how concept-based models handle human uncertainty at test time and how they can better support humans in expressing uncertainty, and at what level of granularity. The researchers used benchmark machine learning datasets with varying uncertainty: CheXpert for classifying chest X-rays and UMNIST, which is formed from MNIST digits and used for digit classification. They simulated uncertainty on these datasets, while for the bird dataset they had human participants indicate their certainty when classifying, for example, whether a bird is red or orange.

The study encompasses both controlled simulations and real human uncertainty, investigating coarse-grained and fine-grained uncertainty expressions. Performance is influenced by design choices in handling the discrete uncertainty scores, including how scores are mapped to probabilities, broad versus narrow uncertainty, and instance-level versus population-level uncertainty. The researchers underscore the importance of incorporating human uncertainty into concept-based models and the need for comprehensive datasets like CUB-S to study these challenges.
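One such design choice, mapping discrete uncertainty scores to probabilities, can be sketched as follows. The 1–4 scale and the specific probability values here are illustrative assumptions, not the mapping used in the paper.

```python
# Hypothetical mapping from a discrete 1-4 confidence scale to a probability;
# the exact values are illustrative assumptions, not the paper's mapping.
SCORE_TO_PROB = {1: 0.5, 2: 0.65, 3: 0.85, 4: 1.0}  # 1 = "guessing", 4 = "certain"

def soft_label(concept_present: bool, score: int) -> float:
    """Turn a binary concept judgment plus a confidence score into a soft label."""
    p = SCORE_TO_PROB[score]
    return p if concept_present else 1.0 - p

certain_yes = soft_label(True, 4)    # confident "present"
guess_yes = soft_label(True, 1)      # guessing "present"
likely_no = soft_label(False, 3)     # fairly sure "absent"
```

A soft label produced this way can then replace the hard 0/1 concept value during an intervention, which is one way the "mapping" design choice interacts with broad versus narrow uncertainty.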
Some open challenges the authors identify are (1) the complementarity of human and machine uncertainty, (2) treating human (mis)calibration, and (3) scaling uncertainty elicitation. The researchers explain the shortcomings of popular concept-based models and introduce the UElic interface and the CUB-S dataset to facilitate further research on human uncertainty interventions.