A Benchmark Suite for Systematically Evaluating Reasoning Shortcuts
Samuele Bortolotti, Emanuele Marconato, Tommaso Carraro, Paolo Morettin, Emile van Krieken, Antonio Vergari, Stefano Teso, Andrea Passerini · June 14, 2024
Summary
rsbench is a benchmark suite for systematically evaluating reasoning shortcuts (RSs) in neural and neuro-symbolic models on tasks that combine learning and reasoning. It provides customizable tasks, metrics for assessing concept quality, and tooling for formally verifying the presence of RSs, with the goal of improving model reliability in high-stakes applications such as autonomous vehicles. The suite includes datasets of varying complexity and RS severity, including MNMath, MNLogic, Kand-Logic, and SDD-OIA. Experiments on models such as DeepProbLog, LTN, CBMs, and black-box NNs show that learned concepts are often of poor quality and that overcoming RSs remains an open challenge. rsbench is publicly available, enabling researchers to study and mitigate RSs and to improve the trustworthiness of AI systems.
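To make the notion of a reasoning shortcut concrete, here is a minimal toy sketch (not code from rsbench itself) in the spirit of an MNLogic-style task where the label is the XOR of two binary concepts. It enumerates all concept maps and shows that, besides the intended identity map, the map that flips both concepts also predicts every label correctly, so label accuracy alone cannot detect that the concepts are wrong.

```python
from itertools import product

def label_preserving_maps():
    """Enumerate concept maps alpha: {0,1} -> {0,1} that keep the
    XOR label correct for every pair of ground-truth concepts.
    Any such map other than the identity is a deterministic
    reasoning shortcut: the model gets every label right while
    its concepts are systematically wrong."""
    good = []
    # A map alpha is encoded as the pair (alpha(0), alpha(1)).
    for alpha in product([0, 1], repeat=2):
        if all((alpha[a] ^ alpha[b]) == (a ^ b)
               for a in (0, 1) for b in (0, 1)):
            good.append(alpha)
    return good

print(label_preserving_maps())  # → [(0, 1), (1, 0)]
```

The identity `(0, 1)` is the intended semantics; the flip `(1, 0)` is a reasoning shortcut, which is why rsbench pairs label metrics with concept-quality metrics and formal verification of RSs.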
Advanced features