LMUnit: Fine-grained Evaluation with Natural Language Unit Tests
Jon Saad-Falcon, Rajan Vivek, William Berrios, Nandita Shankar Naik, Matija Franklin, Bertie Vidgen, Amanpreet Singh, Douwe Kiela, Shikib Mehri · December 17, 2024
Summary
The LMUnit framework introduces natural language unit tests for evaluating language models: instead of judging response quality as a single opaque property, it decomposes quality into explicit, testable criteria, making evaluation more transparent and adaptable. A unified scoring model, LMUnit, combines the strengths of generative judge models and classifier-based reward models through multi-objective training on direct ratings and natural language rationales. Given a response and a unit test, the model outputs rationale tokens and a score token, and from these computes a continuous score prediction. LMUnit outperforms existing methods on evaluation benchmarks, including RewardBench, and human studies validate the approach, showing improved inter-annotator agreement and more effective LLM development workflows.
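The summary describes LMUnit as emitting rationale tokens and a score token, then deriving a continuous score. The sketch below illustrates one plausible way such scoring could work with a Hugging Face-style causal LM acting as the judge: generate a rationale, locate the emitted score digit, and take a probability-weighted average over the 1-5 digit tokens. The model name, prompt template, and 1-5 scale are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of natural-language unit-test scoring with a causal LM judge.
# The model, prompt format, and score scale are placeholders for illustration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Llama-3.1-8B-Instruct"  # placeholder judge model
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.bfloat16)
model.eval()


def score_unit_test(query: str, response: str, unit_test: str) -> float:
    """Return a continuous score in [1, 5] for one natural language unit test.

    Assumes the judge follows the requested format and emits a digit 1-5
    somewhere after its reasoning; raises StopIteration otherwise.
    """
    prompt = (
        f"Query: {query}\n"
        f"Response: {response}\n"
        f"Unit test: {unit_test}\n"
        "Explain your reasoning, then give a single score from 1 to 5.\n"
        "Reasoning:"
    )
    inputs = tokenizer(prompt, return_tensors="pt")

    # Generate rationale tokens followed by a score token, keeping per-step logits.
    with torch.no_grad():
        out = model.generate(
            **inputs,
            max_new_tokens=256,
            do_sample=False,
            return_dict_in_generate=True,
            output_scores=True,
        )

    # Find the last generated token that is one of the digits "1".."5",
    # then read the probability mass over those digit tokens at that step.
    gen_ids = out.sequences[0, inputs["input_ids"].shape[1]:]
    digit_ids = [tokenizer.encode(str(d), add_special_tokens=False)[0] for d in range(1, 6)]
    step = next(i for i in range(len(gen_ids) - 1, -1, -1) if gen_ids[i].item() in digit_ids)
    probs = torch.softmax(out.scores[step][0][digit_ids], dim=-1)

    # Continuous score = probability-weighted average over the 1-5 scale.
    return float(sum(p * d for p, d in zip(probs.tolist(), range(1, 6))))
```

Reading the full distribution over score tokens, rather than just the argmax digit, is what turns a discrete score token into a continuous prediction in this sketch; the exact training objectives and prompt format used by LMUnit are described in the paper itself.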
Advanced features