Leanabell-Prover: Posttraining Scaling in Formal Reasoning

Jingyuan Zhang, Qi Wang, Xingguang Ji, Yahui Liu, Yang Yue, Fuzheng Zhang, Di Zhang, Guorui Zhou, Kun Gai · April 08, 2025

Summary

This paper studies posttraining scaling for automated theorem proving (ATP) in Lean 4. The resulting ATP models, trained on a hybrid dataset, surpass their baselines and reach a 59.8% pass rate on MiniF2F. The study targets the difficulties of reasoning in a formal language and improves performance through posttraining strategies. In one case study, an initially flawed proof for a problem about a list of distinct integers is corrected, with valid constructions including powers of 2, specific relationships between elements, or a factorial-based approach. The paper also analyzes the distribution of valid proofs produced by the Leanabell-Prover-GD-SFT model.
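To make the pass-rate figure concrete, here is a minimal sketch of how such a metric is commonly computed for whole-proof generation: a problem counts as solved if at least one of its sampled proofs is accepted by the proof checker. The function name `pass_rate`, the problem names, and the toy outcomes are hypothetical illustrations. They are not taken from the paper, and the paper's exact sampling budget and evaluation protocol are not given in this summary.

```python
from typing import Dict, List


def pass_rate(results: Dict[str, List[bool]]) -> float:
    """Return the fraction of problems with at least one verified proof.

    `results` maps a problem name to the verification outcome of each
    sampled proof attempt (True means the checker accepted the proof).
    """
    if not results:
        raise ValueError("no problems to score")
    # A problem is solved if any of its sampled attempts verified.
    solved = sum(1 for attempts in results.values() if any(attempts))
    return solved / len(results)


# Hypothetical outcomes for five problems with three samples each.
toy_results = {
    "problem_a": [False, True, False],
    "problem_b": [False, False, False],
    "problem_c": [True, True, False],
    "problem_d": [False, False, True],
    "problem_e": [False, False, False],
}

print(f"{pass_rate(toy_results):.1%}")  # prints "60.0%"
```

Under this definition, drawing more samples per problem can only keep or raise the score, so pass rates are only comparable between models evaluated with the same sampling budget.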

Introduction

Background

Overview of automated theorem proving (ATP)

Importance of Lean 4 in ATP

Objective

Focus on posttraining scaling in ATP models

Improvement in performance through posttraining strategies

Method

Data Collection

Hybrid dataset used for model training

Data Preprocessing

Techniques applied to prepare data for training

Model Training

Description of the ATP models trained on the hybrid dataset

Posttraining Scaling

Strategies employed to enhance model performance

Techniques for optimizing ATP models after training

Results

Performance Evaluation

Comparison with baseline models

59.8% passing rate on MiniF2F benchmark

Formal Language Reasoning

Challenges addressed in formal language understanding

Proof Correction

Initial flawed proof for distinct integer list problem

Valid solutions identified: powers of 2, specific element relationships, factorial approach

Leanabell-Prover-GD-SFT Model Analysis

Distribution of valid proofs generated by the model

Discussion

Insights on Posttraining Strategies

Impact of posttraining on ATP model performance

Formal Language Reasoning Enhancements

Improvement in handling formal language complexities

Proof Generation and Validation

Analysis of proof generation mechanisms

Future Directions

Potential areas for further research and development

Conclusion

Summary of Findings

Recap of the research outcomes

Implications

Impact on the field of automated theorem proving

Recommendations

Suggestions for future studies and practical applications

Insights

What innovative approaches are introduced in the research to address formal language reasoning challenges?

How does the study utilize posttraining strategies to enhance the performance of ATP models?

What limitations are identified in the study regarding the Leanabell-Prover-GD-SFT model's proof distribution?

What are the main objectives and findings of the research on automated theorem proving with Lean 4?