Beyond Theorem Proving: Formulation, Framework and Benchmark for Formal Problem-Solving
Qi Liu, Xinhao Zheng, Renqiu Xia, Xingzhi Qi, Qinxiang Cao, Junchi Yan · May 7, 2025
Summary
The paper formulates problem-solving as a deterministic Markov decision process and introduces FPS (formal problem-solving), a framework that uses existing formal theorem proving (FTP) environments to produce process-verified solutions. A variant, D-FPS, decouples solving from answer verification for better human alignment. The expressiveness, soundness, and completeness of FPS are proven, and the framework is evaluated on newly constructed benchmarks. RPE is a symbolic method that checks whether a final answer is correct by formal verification. Four FTP models and two prompting methods serve as baselines and achieve only limited success on these benchmarks.
The related work covers AI advances in formal mathematics, automated theorem proving, and large language models, including the miniF2F benchmark and systems such as InternLM2.5-StepProver and Lean-STaR, which use machine learning and expert iteration to tackle complex problems. The cited research centers on deep learning models and their applications in theorem proving, mathematical reasoning, and physics.
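To make the idea of "problem-solving as a deterministic MDP with process-level verification" concrete, here is a minimal Python sketch under loudly stated assumptions. It is a hypothetical toy, not the paper's FPS, which operates inside FTP environments. The state is a linear equation a·x + b = c, the two actions ("sub" and "div") are hypothetical rewrite steps, and the names `State`, `step`, `verify_trajectory`, and `check_answer` are illustrative only. Because `step` is deterministic, a checker can replay a proposed action sequence and accept the answer only if every step is valid and the final state is solved. `check_answer` is a separate substitution check on the final answer, loosely echoing D-FPS's split between solving and answer verification; it is not RPE.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import List, Optional, Tuple


# Hypothetical toy state: the equation a*x + b = c over the rationals.
@dataclass(frozen=True)
class State:
    a: Fraction
    b: Fraction
    c: Fraction

    def solved(self) -> bool:
        # Terminal when the equation has been reduced to x = c.
        return self.a == 1 and self.b == 0


# Hypothetical action encoding: ("sub", k) or ("div", k).
Action = Tuple[str, Fraction]


def step(state: State, action: Action) -> Optional[State]:
    """Deterministic transition: the same (state, action) always gives the same
    next state. Returns None when the step is invalid and must be rejected."""
    op, k = action
    if op == "sub":
        # Subtract k from both sides: a*x + (b - k) = c - k.
        return State(state.a, state.b - k, state.c - k)
    if op == "div" and k != 0:
        # Divide both sides by a nonzero k.
        return State(state.a / k, state.b / k, state.c / k)
    return None  # unknown operation or division by zero


def verify_trajectory(start: State, actions: List[Action]) -> Optional[Fraction]:
    """Process-level check: replay every step from the start state and return
    the answer only if all steps are valid and the final state is terminal."""
    state = start
    for action in actions:
        nxt = step(state, action)
        if nxt is None:
            return None  # an invalid intermediate step rejects the whole solution
        state = nxt
    return state.c if state.solved() else None


def check_answer(problem: State, answer: Fraction) -> bool:
    """Independent answer check: substitute the answer into the original equation."""
    return problem.a * answer + problem.b == problem.c


if __name__ == "__main__":
    problem = State(Fraction(2), Fraction(3), Fraction(7))  # 2x + 3 = 7
    answer = verify_trajectory(problem, [("sub", Fraction(3)), ("div", Fraction(2))])
    print(answer, check_answer(problem, answer))  # prints "2 True"
```

Since `step` is a pure function of the state and action, replaying the same trajectory always reproduces the same answer, which is what makes step-by-step checking possible in this toy; a trajectory that stops early (e.g. only the "sub" step) or contains an invalid step yields `None`.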
Introduction
Background
Overview of deterministic Markov decision processes (MDPs)
Role of FTP environments in solving problems
Objective
The aim of using FPS in formal mathematics and theorem proving
FPS: A Process-Verified Solution
Process-Verified Solutions
Formulation of problem-solving as a deterministic MDP
How FPS ensures process verification in problem-solving
Decoupling Solving and Answer Verification: D-FPS
Decoupling Mechanism
Description of D-FPS architecture
Benefits of separating solving from answer verification
Proven Expressiveness, Soundness, and Completeness of FPS
Benchmark Results
Overview of FPS's performance on various benchmarks
Proofs of FPS's expressiveness, soundness, and completeness
RPE: Verification of Symbolic Correctness
RPE Functionality
Role of RPE in ensuring symbolic correctness
Integration of RPE with FPS for enhanced reliability
Baseline Models and Their Limitations
FTP Models and Prompting Methods
Description of four FTP models and two prompting methods
Analysis of their limited success on benchmarks
AI Innovations in Formal Mathematics and Theorem Proving
Innovations Overview
AI advancements in formal math and automated theorem proving
Utilization of large language models in theorem proving
Systems for Complex Problem Solving
Case Studies
miniF2F (benchmark), InternLM2.5-StepProver, and Lean-STaR
How these systems employ machine learning and expert iteration
Research Focus: Deep Learning Models and Applications
Research Areas
Deep learning models in theorem proving
Applications in mathematical reasoning and physics
Current trends and future directions in AI research
Basic info
Computation and Language
Logic in Computer Science
Artificial Intelligence
Insights
What are the key differences between FPS and D-FPS in terms of solving and answer verification?
How does the FPS system utilize FTP environments to provide process-verified solutions?
How do the four FTP models and two prompting methods perform as baselines in benchmarks?
What AI innovations are highlighted in the context of formal math and automated theorem proving?