Back to publications

Conjecture extraction for proof autoformalization

Paper Details

Published: 2025/10/31

Container Title: The International Conference on Information Management and Big Data (SIMBig)

Paper Links

HTML

Autoformalization and ATP have each advanced the mechanization of mathematics, yet the translation of informal proofs into fully formalized counterparts remains an open challenge: especially for interactive theorem provers beyond Isabelle. We introduce conjecture extraction, a novel proof autoformalization pipeline tailored to Lean 4 that decomposes an informal proof into individual lemmas (conjectures), formalizes and proves each in isolation, and then reassembles them to recover the original argument. Unlike prior sketch-based methods, our approach is compatible with end-to-end proof generation models and leverages repeated conjecture refinement to incrementally improve performance. We implement an open-source system that integrates off-the-shelf autoformalization LLMs, automated theorem provers, and an online reinforcement-learning loop to optimize both conjecture generation and proof search. On the MiniF2F benchmark, conjecture extraction achieves an absolute improvement of 11.2 percentage points in pass@1 over the Draft, Sketch, and Prove port (DSP) for Lean 4, demonstrating the efficacy of proof decomposition and recomposition. Our results suggest that conjecture extraction not only bridges a gap in proof autoformalization for Lean but also offers a general framework for scaling formalization efforts across diverse proof assistants. We release our code and models to foster further research in large-scale formalized mathematics.

Authors

Soumya Banerjee

Cambridge University

Senior Research Associate, Accelerate Programme

Read Bio

Conjecture extraction for proof autoformalization

Paper Details

Paper Links

Authors

Soumya Banerjee

Simon Sorg

Wenda Li

Conjecture extraction for proof autoformalization

Paper Details

Paper Links

Authors

Soumya Banerjee

Simon Sorg

Wenda Li

Help Accelerate Research

Essential Cookies Always Required

Research Analytics

Essential Cookies
Always Required