Learning Mathematical Rules with Large Language Models

Antoine Gorceix, Bastien Le Chenadec, Ahmad Rammal, Nelson Vadori, Manuela Veloso · October 22, 2024

Summary

The paper investigates whether large language models can learn and generalize mathematical rules, such as distributivity and equation simplification, from synthetic data, and whether they can then apply those rules to word problems of varying mathematical complexity. The study also tracks how performance changes as complexity increases, highlighting top-down generalization.

The authors construct synthetic data that encodes mathematical rules, such as finding roots of polynomials, solving equations, and applying distributivity, with the aim of studying both bottom-up and top-down generalization. The data is designed to resemble textbook exercises rather than word problems, so that models learn the underlying mathematical operations. Llama-3 8B Instruct is then fine-tuned on this synthetic data combined with the Orca dataset, which improves its ability to apply the learned rules to unseen variable names and to solve more complex equations.

Evaluation covers several problem-solving tasks, including finding non-integer roots, computing equivalent resistances in resistor circuits, and recovering prices in fruit-basket problems. The fine-tuned Llama-3 8B model outperforms the baseline on quadratic polynomials and on resistor circuits, especially in parallel configurations; it excels on series circuits and also improves on more complex resistor setups. The fruit-basket problems require solving linear equations in multiple variables, and here too the fine-tuned model performs better.

The discussion situates the work among recent advances and open challenges in mathematical reasoning with large language models, referencing Llemma, QLoRA, and related efforts, and emphasizes the role of efficient fine-tuning and low-rank adaptation in improving performance on arithmetic and quantitative-reasoning tasks.
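As a rough illustration of the synthetic-data construction described above, the sketch below generates textbook-style prompt/answer pairs for the distributivity rule using sympy. The variable pool, prompt wording, and the helper name make_distributivity_example are assumptions made for illustration; the paper's actual generation templates are not reproduced here.

```python
import random
import sympy as sp


def make_distributivity_example(seed: int) -> dict:
    """Build one textbook-style prompt/answer pair for the distributivity rule.

    Hypothetical sketch: the variable pool and prompt wording are illustrative,
    not the paper's actual templates.
    """
    rng = random.Random(seed)
    # Pick three distinct symbols so the expanded form keeps two terms.
    a, b, c = (sp.Symbol(name) for name in rng.sample("xyzuvw", 3))
    expr = a * (b + c)          # factored form, e.g. x*(y + z)
    expanded = sp.expand(expr)  # expanded form, e.g. x*y + x*z
    return {
        "prompt": f"Expand the expression {sp.sstr(expr)} using distributivity.",
        "answer": sp.sstr(expanded),
    }


if __name__ == "__main__":
    # A fine-tuning corpus would contain many such pairs across many rules.
    for seed in range(3):
        print(make_distributivity_example(seed))
```

Varying the symbol names between training and evaluation is one simple way to probe the generalization to unseen variable names mentioned above.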
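The two word-problem families above, resistor circuits and fruit baskets, reduce to small closed-form computations, so reference answers for checking model outputs are easy to produce. The following sketch uses made-up numbers; none of the values come from the paper's test set.

```python
import sympy as sp


# Resistor circuits: equivalent resistance in series and in parallel.
def series(*r):
    return sum(r)


def parallel(*r):
    return 1 / sum(1 / x for x in r)


print(series(10, 20, 30))    # 60
print(parallel(10, 20, 30))  # ~5.45

# Fruit baskets: recover unit prices from basket totals via a linear system.
apple, banana = sp.symbols("apple banana")
solution = sp.solve(
    [2 * apple + 3 * banana - 8,  # basket 1: 2 apples + 3 bananas cost 8
     4 * apple + 1 * banana - 6], # basket 2: 4 apples + 1 banana cost 6
    [apple, banana],
)
print(solution)  # {apple: 1, banana: 2}
```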


