Types of Contamination in AI Evaluation: Reasoning and Triangulation

Behzad Mehrbakhsh

PhD student at Universitat Politècnica de València

A comprehensive and accurate evaluation of AI systems is indispensable for advancing the field and fostering a trustworthy AI ecosystem. AI evaluation results have a significant impact on both academic research and industrial applications, ultimately determining which products or services are deemed effective, safe and reliable enough for deployment. Contamination at any stage of the AI evaluation process can compromise the integrity and reliability of the measurement results and hence increase the risks associated with deploying the evaluated models. The visit will contribute to enhancing the robustness and trustworthiness of AI systems by identifying and reasoning about different types of contamination during AI evaluation, with particular emphasis on non-malicious test-set contamination, an often overlooked but critical factor that can invalidate evaluation results.

Keywords: Trustworthy AI, evaluation, contamination, causality, knowledge representation, train-test leakage, belief revision.

Scientific area: Artificial Intelligence, Machine Learning, Trustworthy AI

Bio: PhD student at Universitat Politècnica de València

Visiting period: 01/06/2024 to 15/08/2024