This is an explanation of why “auto-formalization”, as a general concept, is not real, but probably still helpful some of the time.
Some people talk about “auto-formalization”: the act of convincing a computer to automatically formalize informal stuff (such as mathematical statements). They might be referring to the ability of GPTs to make conversions like
for every natural number, there is a larger prime number
into
theorem exists_larger_prime (n : ℕ) : ∃ (p : ℕ), prime p ∧ p > n
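For concreteness, here is roughly what that might look like today in Lean 4 with Mathlib; this is a sketch of mine, not part of the original conversion, and it leans on Mathlib’s Nat.exists_infinite_primes for the proof (the exact lemma names are a detail of the library, not of the auto-formalization question):

```lean
import Mathlib

-- A sketch in Lean 4 + Mathlib: the statement above, plus a proof that simply
-- defers to Mathlib's `Nat.exists_infinite_primes`, which produces a prime p
-- with n + 1 ≤ p.
theorem exists_larger_prime (n : ℕ) : ∃ p : ℕ, Nat.Prime p ∧ p > n := by
  obtain ⟨p, hnp, hp⟩ := Nat.exists_infinite_primes (n + 1)  -- hnp : n + 1 ≤ p, hp : p.Prime
  exact ⟨p, hp, by omega⟩  -- p > n follows from n + 1 ≤ p
```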
The tl;dr here is this:
If it were possible to auto-formalize some range of problems, you would have had to already formalize that range of problems.
Or, as Perlis’s epigram 102 puts it: “One can’t proceed from the informal to the formal by formal means”.
This is not meant to devalue the work folks are doing on machine learning and theorem proving, which I would like to succeed. I’m not part of any LLM research communities, which I assume host lively debate around what they mean by auto-formalization and the pros and cons of various approaches. I do sometimes sense confusion in discussions around verification: an implicit, unchallenged belief that “verification” is an objective measure that an object either does or does not possess.
To be a little more concrete, suppose you want me to start by automatically formalizing a particular textbook (say, Algebra by Artin). You want a program that takes in the “natural” language and spits out formal theorems and formal proofs of those theorems. Here are several things that I might do, each resulting in a particular program that constructs a formal object given some input:
1. […]
2. […]
3. […] auto_theorems (…), then it does a brute force search to find instantiations of the abstract proofs from the library that prove the theorems.
4. […] auto_theorems, but then the natural language proofs are parsed and an attempt is made to structurally translate them into formal proofs with the help of a library of prior math knowledge. Whenever gaps are found, the program asks a user (in some mix of formal and natural language) how to fill them.
5. […]

The first point is: there are no formal grounds by which any of these responses can be refuted as incorrect (even the first one). It is representative of a general danger: the formal theorem statement chosen to represent an informal one might simply not be relevant or useful. These errors could be buried arbitrarily deeply in the formalization, although end-to-end proofs (with vetted hypotheses and conclusion) are a way to guard against that. Still, the vetting must take place.
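To make that danger concrete, here is a hypothetical mis-formalization of the prime example from earlier (mine, purely illustrative): it typechecks, it is even provable, and no checker will ever complain that it fails to say what the English sentence says.

```lean
import Mathlib

-- A hypothetical mis-formalization: the binder `n` is never used, so this
-- theorem only says "some prime exceeds 1". The kernel accepts it without
-- complaint; only a human reading the statement can notice that it does not
-- capture "for every natural number, there is a larger prime number".
theorem exists_larger_prime_bad (n : ℕ) : ∃ p : ℕ, Nat.Prime p ∧ p > 1 :=
  ⟨2, Nat.prime_two, by norm_num⟩
```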
The second point is: auto-formalization should (probably) be general in an interesting way. The second strategy is not general (cannot formalize any other text) and the third is dubious (its prior knowledge is highly domain specific).
Another point about (3): once you choose a formal system and formal theorem expressions, proofs can be checked for validity without further human intervention. However, determining whether or not a particular formal proof represents a given natural language proof is highly problematic. Normally, people learn by reading proofs; if formal proofs are to be helpful for learning, something more is required.
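A small illustration of that gap (again mine, not from any textbook): both proofs below are accepted by the checker, but only the first could reasonably be said to represent the usual informal argument by cases on the sign of x; the second just points at a library lemma.

```lean
import Mathlib

-- A proof that mirrors the textbook argument: split on the sign of x.
theorem sq_nonneg_cases (x : ℤ) : 0 ≤ x * x := by
  rcases le_or_lt 0 x with hx | hx
  · exact mul_nonneg hx hx
  · exact le_of_lt (mul_pos_of_neg_of_neg hx hx)

-- Equally valid, but it defers entirely to a library lemma; checking it tells
-- a reader nothing about the informal proof it supposedly corresponds to.
theorem sq_nonneg_oneliner (x : ℤ) : 0 ≤ x * x := mul_self_nonneg x
```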
The fourth strategy is not automatic, but I would argue it is highly useful. There are people out there who are already highly interested in building formal math; routines that fill in more of the busywork for them are great. They expect tools to be interactive: it is simply not the norm for one-shot methods to succeed. When an automated method does fail, it should fail in an informative way: some hint as to what to try next is necessary.
The final option is the most objective, but it still relies on human vetting that auto_theorems is sound, and it may fail to provide proofs that are useful in some desired sense.
Replacing all programming activity with “auto”-translated natural language is not a holy grail. Programming languages tend to introduce vast amounts of incidental complexity, but there are still real reasons to prefer formal languages for describing complex processes. A well-designed language or notation supports thinking.
As far as mixing natural language and theorem proving goes, my take is: effort is better spent moving away from the auto_theorems problem and towards the more intrinsically useful problem of teaching people how to use rigorous language to work with reality.