Holistic Evaluation of AI-assisted Biomedicine: A Case Study on Interactive Cell Segmentation

Wout Schellaert

PhD student at Universitat Politècnica de València

Abstract

Rapid advances in artificial intelligence have made AI-based tools increasingly prominent in day-to-day biomedical workflows. Because biomedicine is a high-risk domain with an impact on human health, it is vital that any AI system in use is reliable, safe, and trustworthy. A first step, often ignored, is making sure that evaluation procedures align with expert usage and accurately reflect the associated benefits and risks. This project is a necessary effort to validate the applicability of the TAILOR network’s research on AI trustworthiness to real-world scenarios. It aims to do so by applying new insights in robust evaluation to AI-assisted biomedicine, focusing on interactive cell segmentation. A three-pronged user study will be conducted to help set up holistic evaluation methodologies and metrics, investigate the performance of human-AI teams, and analyse system predictability in the context of user (over)reliance and error amplification. The goal is both to demonstrate the feasibility of ideas developed by the TAILOR network and to pave the way for the adoption of procedures and metrics for general segmentation evaluation that align better with expert usage.