Jeroen G. Rook – University of Twente
Comparing algorithms is a non-trivial task. Often, a set of representative problem instances is used to compare algorithms. However, these problem instances introduce biases into the comparison outcomes, which is often not taken into account. The confidence in a comparison can be strengthened with statistical tests, for example based on bootstrap distributions. When there is a single performance objective, a computationally efficient bootstrap-based approach exists for statistically robust ranking of algorithms. However, algorithm performance can rarely be expressed with only one performance objective. Therefore, there is a need for statistically robust ranking methods that evaluate algorithms on multiple performance objectives. This need is further strengthened by the increasing demand for algorithms that are simultaneously trustworthy and performant, and is relevant to many AI domains such as learning, reasoning and optimisation. We propose an approach for obtaining such a method and demonstrate its behaviour on SAT, MIP, TSP and ML problems.
Keywords: Algorithm Comparison, Statistical Robustness, Multi-Objective Ranking
Scientific area: AutoAI, Trustworthy AI
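The single-objective bootstrap comparison mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' actual method: the runtimes are made-up values, and `bootstrap_win_rate` is a hypothetical helper that resamples benchmark instances with replacement and counts how often one algorithm's mean runtime beats the other's.

```python
import random

# Hypothetical per-instance runtimes (seconds) for two algorithms on the
# same benchmark instances; the numbers are invented for illustration.
runtimes_a = [3.2, 5.1, 2.8, 7.4, 4.0, 6.3, 3.9, 5.5]
runtimes_b = [4.1, 4.9, 3.5, 6.8, 5.2, 5.9, 4.4, 6.0]

def bootstrap_win_rate(a, b, n_resamples=10_000, seed=0):
    """Fraction of paired bootstrap resamples (over instances) in which
    algorithm A has a strictly lower mean runtime than algorithm B."""
    rng = random.Random(seed)
    n = len(a)
    wins = 0
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]  # resample instances
        mean_a = sum(a[i] for i in idx) / n
        mean_b = sum(b[i] for i in idx) / n
        wins += mean_a < mean_b
    return wins / n_resamples

rate = bootstrap_win_rate(runtimes_a, runtimes_b)
print(f"A beats B in {rate:.1%} of resamples")
```

Because the same resampled instance indices are used for both algorithms, this is a paired comparison, which reflects the instance-induced bias the abstract refers to; extending such a win-rate to multiple performance objectives is the open problem the work addresses.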