A Benchmark Suite for Systematically Evaluating Reasoning Shortcuts

Samuele Bortolotti, Emanuele Marconato, Tommaso Carraro, Paolo Morettin, Emile van Krieken, Antonio Vergari, Stefano Teso, Andrea Passerini · June 14, 2024

Summary

rsbench is a benchmark suite for systematically evaluating reasoning shortcuts (RSs) in neural and neuro-symbolic models on tasks that combine learning and reasoning. An RS occurs when a model attains high task accuracy by learning concepts with unintended semantics, e.g., predicting correct sums of MNIST digits while assigning wrong values to the individual digits. The suite offers customizable tasks, metrics for assessing concept quality, and formal verification of which shortcuts a task admits, with the aim of improving model reliability in high-stakes applications such as autonomous vehicles. It includes datasets such as MNMath, MNLogic, Kand-Logic, and SDD-OIA, spanning varying levels of complexity and susceptibility to RSs. Experiments on models including DeepProbLog, LTN, CBMs, and black-box neural networks reveal persistent gaps in concept quality and underline the difficulty of overcoming RSs. rsbench is available to researchers for studying and mitigating RSs, and thereby improving the trustworthiness of AI systems.
