Beyond Theorem Proving: Formulation, Framework and Benchmark for Formal Problem-Solving

Qi Liu, Xinhao Zheng, Renqiu Xia, Xingzhi Qi, Qinxiang Cao, Junchi Yan · May 07, 2025

Summary

The paper formulates problem-solving as a deterministic Markov decision process and introduces FPS (Formal Problem-Solving), a framework that uses existing formal theorem proving (FTP) environments to produce process-verified solutions. A variant, D-FPS, decouples solving from answer verification for better human alignment. The expressiveness, soundness, and completeness of the frameworks are proven, and they are evaluated on problem-solving benchmarks. To judge answers, the paper proposes RPE, a symbolic approach to checking answer correctness. Four FTP models and two prompting methods serve as baselines and achieve only limited success on the benchmarks. The work builds on recent AI advances in formal mathematics, automated theorem proving, and large language models, including the miniF2F benchmark and provers such as InternLM2.5-StepProver and Lean-STaR, which use machine learning and expert iteration to tackle complex problems. Related research covers deep learning models and their applications in theorem proving, mathematical reasoning, and physics.

Introduction

Background

Overview of deterministic Markov decision processes (MDPs)

Role of FTP environments in solving problems

Objective

The aim of using FPS in formal mathematics and theorem proving

FPS: A Process-Verified Solution

Process-Verified Solutions

Explanation of FPS's deterministic nature

How FPS ensures process verification in problem-solving
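To make the idea of process verification concrete, here is a minimal sketch of a deterministic decision process in which every step of a candidate solution is re-checked by the environment. It is a toy illustration only, not the FPS framework itself: the integer states, the action names `halve` and `decrement`, and the helpers `step` and `replay` are all assumptions made for this example.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical toy environment: a state is an integer, and each named action
# either returns the next state or None when it is not legal in that state.
Step = Callable[[int], Optional[int]]

ACTIONS: Dict[str, Step] = {
    "halve": lambda s: s // 2 if s % 2 == 0 else None,   # legal only on even states
    "decrement": lambda s: s - 1 if s > 0 else None,     # legal only on positive states
}


def step(state: int, action: str) -> int:
    """Deterministic transition: the same (state, action) pair always gives the same result."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    next_state = ACTIONS[action](state)
    if next_state is None:
        raise ValueError(f"action {action!r} is not legal in state {state}")
    return next_state


def replay(start: int, actions: List[str], goal: int = 0) -> bool:
    """Re-check every step of a candidate solution, then test the final state."""
    state = start
    for action in actions:
        state = step(state, action)  # raises if any single step is invalid
    return state == goal
```

Because transitions are deterministic, replaying the same action sequence always reproduces the same trajectory, so a solution can be accepted or rejected step by step rather than by its final answer alone.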

Decoupling Solving and Answer Verification: D-FPS

Decoupling Mechanism

Description of D-FPS architecture

Benefits of separating solving from answer verification
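The separation of solving from answer verification can be sketched in a few lines. This is a hedged toy example under stated assumptions, not the D-FPS architecture: the linear-equation task and the function names `solve_linear` and `verify_answer` are hypothetical and chosen only to show two phases that do not share state.

```python
from fractions import Fraction


def solve_linear(a, b, c) -> Fraction:
    """Solving phase: derive x from a*x + b = c by forward steps.

    The solver never sees a reference answer; it only transforms the problem.
    """
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    if a == 0:
        raise ValueError("coefficient a must be nonzero")
    rhs = c - b      # subtract b from both sides
    return rhs / a   # divide both sides by a


def verify_answer(a, b, c, candidate) -> bool:
    """Verification phase: check a candidate independently by substitution."""
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    return a * Fraction(candidate) + b == c
```

A caller would first run `solve_linear(2, 3, 7)` to obtain a candidate and then pass it to `verify_answer(2, 3, 7, candidate)`. Keeping the two functions apart means the verifier can also judge answers produced by any other solver.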

Proven Expressiveness, Soundness, and Completeness of FPS

Benchmark Results

Overview of FPS's performance on various benchmarks

Evidence of FPS's proven expressiveness, soundness, and completeness

RPE: Verification of Symbolic Correctness

RPE Functionality

Role of RPE in ensuring symbolic correctness

Integration of RPE with FPS for enhanced reliability
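This page does not describe how RPE works beyond saying that it verifies symbolic correctness. As a loose analogy only, the sketch below compares two answers exactly after a restricted normalization instead of comparing their surface strings. The restriction to rational-number answers and the names `normalize` and `equivalent` are assumptions for illustration and are not taken from the paper.

```python
from fractions import Fraction


def normalize(answer: str) -> Fraction:
    """Parse an answer such as '0.5' or '2/4' into an exact rational number."""
    return Fraction(answer.strip())


def equivalent(candidate: str, reference: str) -> bool:
    """Accept a candidate only if it is exactly equal to the reference.

    Answers that cannot be parsed are rejected rather than guessed at.
    """
    try:
        return normalize(candidate) == normalize(reference)
    except (ValueError, ZeroDivisionError):
        return False
```

The check is exact: `"2/4"` matches `"1/2"`, while a rounded value such as `"0.33"` does not match `"1/3"`.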

Baseline Models and Their Limitations

FTP Models and Prompting Methods

Description of four FTP models and two prompting methods

Analysis of their limited success on benchmarks

AI Innovations in Formal Mathematics and Theorem Proving

Innovations Overview

AI advancements in formal math and automated theorem proving

Utilization of large language models in theorem proving

Systems for Complex Problem Solving

Case Studies

miniF2F, InternLM2.5-StepProver, and Lean-STaR

How these systems employ machine learning and expert iteration

Research Focus: Deep Learning Models and Applications

Research Areas

Deep learning models in theorem proving

Applications in mathematical reasoning and physics

Current trends and future directions in AI research

Basic info

papers

computation and language

logic in computer science

artificial intelligence

Insights

What are the key differences between FPS and D-FPS in terms of solving and answer verification?

How does the FPS system utilize FTP environments to provide process-verified solutions?

How do the four FTP models and two prompting methods perform as baselines in benchmarks?

What AI innovations are highlighted in the context of formal math and automated theorem proving?