How to search for theorems in Lean 4

Summary

Use a mix of LeanSearch, Loogle, and exact?.

Introduction

In the "All Lean Books And Where To Find Them" article, I recommend a sequence of formalization books (Formalizing Mathematics => Theorem Proving in Lean => Mathematics in Lean) to people who want to learn to formalize mathematics in Lean.

This was based on the contents and relative difficulty of the books. Now, after participating in the formalization of Carleson's theorem, my mind has changed on this a bit. I think that there is little sense in reading Lean textbooks about formalization - the easiest route to learning formalization is to:
1. get the hang of the logic behind how goals and hypotheses interact in Lean (play the "Natural Number Game")
2. learn ~10 basic tactics in Lean (do a couple of levels of "Formalizing Mathematics")
3. dive into a real formalization project.

There is one caveat, however - one more skill that needs to be learned: one needs to know how to search for theorems in Lean 4. As Will Crichton mentions in his "How to Make Mathematicians Into Programmers (And Vice Versa)" talk, theorem search is essential for formalization, and yet it is not taught in any Lean textbook.

This is my attempt at explaining how to search for theorems in Lean 4.

Overview

In Lean, there are two main types of search engines:

  • "neural/vector-based" search engines (Moogle, LeanSearch, Search-Mathlib)
    Neural search engines work via LLM embedding vectors and return approximate matches. Neural search engines can be used when you're not yet sure what's likely to be present in Mathlib, or when you're not in the mood for typing out a formal search query and prefer to quickly query something in natural language.
  • "formal" search engines (Loogle, Mathlib's exact?)
    Formal search engines work via Lean metaprogramming, and return all theorems in Mathlib that match your query. Formal search engines are predictable and comprehensive, but they do require you to have a good idea of the shape of the statement of the searched-for theorem, and do require you to be careful with the symbols you use.
Overview of searching for theorems in Lean 4: Moogle, LeanSearch.net, Loogle, #find, #library_search, Lean Copilot

 

Throughout this post, we'll be searching for the theorem Metric.ball_subset_ball from Mathlib, a theorem telling us that if two balls share a center, then the ball with a smaller radius is a subset of the ball with a bigger radius:

theorem Metric.ball_subset_ball {α : Type u} [PseudoMetricSpace α] {x : α} {ε₁ : ℝ} {ε₂ : ℝ}
(h : ε₁ ≤ ε₂) : Metric.ball x ε₁ ⊆ Metric.ball x ε₂

Neural search engines will be queried with "from: x ≤ y, to: ball c x ⊆ ball c y";
formal search engines will be queried with "Metric.ball ?c ?_ ⊆ Metric.ball ?c _".

NEURAL SEARCH

Moogle, LeanSearch and Search-Mathlib are vector-based search engines.

They all share a similar technical implementation - for each theorem in Mathlib, they take the formal statement of that theorem, compute a semantic embedding from it, and put it into a vector database. One vector == one theorem; two theorems are considered similar when the cosine distance between their corresponding vectors is small.

When searching for theorems in a neural search engine, the user can type anything they like - LaTeX formulas, an approximate theorem name, term descriptions, natural-language queries - semantic search can digest all of it.

The query is converted into an embedding vector of its own, and, using cosine similarity, we find the theorem vectors most similar to the query vector. Those are the search results.


There is an interesting question of how exactly one should best query semantic search engines. 

Anecdotally, if you input something obviously silly, like a theorem name with many letters "z", semantic search will happily return the theorem List.zip_unzip.
If you input a formula learned in school, semantic search will happily return the theorem Nat.sq_sub_sq (a : ℕ) (b : ℕ) : a ^ 2 - b ^ 2 = (a + b) * (a - b).

The LeanSearch paper (link) offers the following categorization of all search queries:

  • natural-description
    "If there exist injective maps of sets from A to B and from B to A, then there exists a bijective map between A and B."
  • LaTeX-formula
    "If there exist f: A→B injective, g: B→A injective, then there exists h:A→B bijective."
  • theorem-name
    "Schroeder Bernstein Theorem"
  • Lean-term
    "{f : A → B} {g : B → A} (hf : Injective f) (hg : Injective g) : ∃ h, Bijective h"

In my experience, theorem-name queries are exceedingly rare - most lemmas you might want to find do not have well-known names. Similarly, I have never had the need to query a LaTeX formula. Technically, we sometimes have the necessary LaTeX in our formalization blueprints, but it's hard to transform that into a query - we are not working within a LaTeX environment, we are working within a Lean environment.

So, what gets used in practice is natural-descriptions and Lean-terms. In fact, what I find most useful is a heavy mix of these two.

For example, LeanSearchClient (a Lean library that lets you interact with Moogle and LeanSearch via a Lean command) offers the following example of a query:
"If a natural number n is less than m, then the successor of n is less than the successor of m"
which would get classified as a purely natural-language query.

Here is the query I would use instead:
"natural numbers. from: n < m, to: n + 1 < m + 1"
which belongs to a mix of the natural-description and Lean-term categories.
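
Such a mixed query can be submitted without leaving the editor. This is a sketch using LeanSearchClient's #leansearch command (which, if I recall correctly, ships as a dependency of recent Mathlib and expects the query to end with a period or question mark); the suggested theorems appear in the infoview:

```lean
import LeanSearchClient

-- Mixed natural-description + Lean-term query; LeanSearch should
-- surface something like Nat.succ_lt_succ among the suggestions.
#leansearch "natural numbers. from: n < m, to: n + 1 < m + 1."
```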


No matter what query category you decide to go for, during the formalization process you have to perform the same ritual:

  1. DISTILL THE ESSENCE OF YOUR SITUATION
    What usually happens is you have a hypothesis with type d ^ m < d ^ n * d * 25 - d ^ n * 4, and you want to turn it into a hypothesis with type d ^ m < d ^ (1 + n) * 25 - d ^ n * 4.
    You can try to query the following string
    """
    from: d ^ m < d ^ n * d * 25 - d ^ n * 4
    to: d ^ m < d ^ (1 + n) * 25 - d ^ n * 4
    """,
    however it won't find anything of use.
    You need to distill your situation to its bare essence!
    So, you manually turn your initial query into the following query:
    """
    from: (a ^ b) * a
    to: a ^ (b + 1)
    """
  2. SEARCH
    You submit the distilled search query to the vector search engine. It returns a number of theorems to you. You decide which theorem looks good based on its signature. You copypaste it into your editor.
  3. TYPE PUZZLE
    Now comes the puzzle game where you try to fit your existing hypotheses into this theorem's signature.
    This usually involves dealing with coercions (e.g., turning your Nats into Reals), dealing with associativity/commutativity, providing additional data to the tactic (e.g., positivity of some variable), and, finally, matching hypotheses to arguments.

Now - even though semantic search engines all share the same technical underpinning, there is some variation in what exactly gets converted into the embedding vector, in how frequently they sync with the latest Mathlib, etc. Below I describe the main search engines available for Lean 4 at the moment, how they differ, and my personal experience with them.

Moogle

Link: https://moogle.ai
Authors: The Morph Labs
Announcement: Zulip, November 2023

Moogle is a search engine by Morph Labs, and, to my knowledge, Moogle was the first semantic search engine for proof assistants ever created - or, at the very least, the first semantic search engine actively used by the community. After Moogle, other such efforts followed - a competing search engine for Lean 4 (LeanSearch) and a search engine for Coq (ProofDB).

At the very least from November 2023 to July 2024, Moogle was the leading search engine for Lean.
Anecdotally: on July 24th 2024, Kevin Buzzard gave the "The state of the art in the formalisation of research-level mathematics" talk during the "Interactive Theorem Proving in Education and Research" workshop (link). During the questions section, I asked: "what do mathematicians use for searching theorems in Lean lately - Loogle, Moogle, ReProver, the #find tactic, something else completely?"
Kevin Buzzard responded with something along the lines of "Moogle! Once there was a 6-hour Moogle outage, and that was the only thing Zulip discussed".

Despite the fact that Moogle was a pioneer in this space, at the moment Morph Labs doesn't seem interested in keeping Moogle up to date with current Mathlib, or in improving the tech behind it, or in open-sourcing the codebase.

Moogle held the leadership among semantic search engines for a few months, but I get the feeling LeanSearch is at least on par with Moogle nowadays.

MY USE
I have a preference for LeanSearch at the moment.
This preference is not based on any proper statistical comparison; it's based on the intuitive "hey, LeanSearch returns the desired theorem in 3rd place, and Moogle returns it in 10th place", repeated a few times during my formalization experience.
Moogle is functional too, however, and is worth trying out - maybe you'll have better luck with your queries.

LeanSearch

Link: https://leansearch.net
Authors: Jiang Jiedong, Guoxiong Gao, Haocheng Ju
Announcement: Zulip, 30th July 2024

LeanSearch is the semantic search engine I use the most. Generally, LeanSearch just has a higher chance of returning what I'm searching for, at higher positions than Moogle would have it.

Moogle neither open-sourced their implementation nor wrote a paper about it, but from what I understand, internally they turn [Lean statement] into the embedding vector. LeanSearch published a paper, so we know it takes [Lean statement, docstring, related definitions, natural-language version of the theorem generated using GPT-3], and turns that into the embedding vector.

It's unclear whether that is the main reason behind LeanSearch showing a better performance.
There were cases when I couldn't find a theorem in Moogle via a search query that was a direct copypaste of that theorem's name - it's possible that Moogle simply doesn't refresh its database frequently enough to keep up with the latest Mathlib.
On the other hand, I just tried to recreate that situation, and what I found instead is the opposite - on March 1st 2025, Real.rpow_le_rpow_of_exponent_le (a theorem that does exist in current Mathlib) is returned by Moogle and not returned by LeanSearch!

In either case - LeanSearch has been more consistent for me results-wise.

As you can see in the screenshot, LeanSearch has one unusual feature - the "Query Augmentation" button. What it does is turn your query string into a natural-language description using GPT-4. For example, it turns our "from: x ≤ y, to: ball c x ⊆ ball c y" query into "Monotonicity of Balls with Respect to Radius: In a metric space, if the radius of a ball centered at a point c is increased from a value x to a value y, where x is less than or equal to y, then the ball with radius x is entirely contained within the ball with radius y. This means that every point within the smaller ball is also within the larger ball".
 I do use that on occasion, but regular search normally suffices.

MY USE
I use LeanSearch all the time.

Search-Mathlib (simple model on huggingface)

Link: huggingface.co/spaces/dx2102/search-mathlib
Authors: Deming Xu
Announcement: Zulip, 30th July 2024

Published, quite unfortunately, on the same day as LeanSearch, this is a simple model on Hugging Face. It returns pretty good results, but it was merely a training exercise for the author, as he states in his introduction on Zulip.

I think the Zulip message by Deming Xu is worth a read - with only 100 lines of Lean code (adjusted code taken from another library) + 200 lines of Python code (100 of which just set up the web page), he managed to create a search engine that performs reasonably well, which is quite inspiring to anyone who wants to experiment with semantic search in Lean 4.

The principles are similar to what LeanSearch (and, likely, Moogle) is doing - convert all theorems into vectors, convert the search query into a vector, put the theorem vectors into a vector database, and find the theorem vectors closest to the query vector. Deming turns [theorem name, theorem statement, theorem docstring] into a vector.

MY USE
I don't use it (and it was never intended for use!), as there exist LeanSearch and Moogle.

FORMAL SEARCH

Loogle (aka Mathlib's #find)

Link: https://loogle.lean-lang.org
Authors: Joachim Breitner
Announcement: Zulip

Loogle was developed by Joachim Breitner, and was reportedly (link) inspired by Haskell's Hoogle.

Loogle's search is not semantic - it matches expressions by their exact structure. Technically, Loogle is a wrapper around Mathlib's #find: it adds some functionality to the default #find command, and creates a web interface around it.

Initially, Joachim tried to merge two #find PRs (#6643 and #6363) into Mathlib 4, but the reviews dragged on for too long, so he just went ahead and:
1) factored out #find + Batteries into a separate repo (https://github.com/nomeata/loogle), and
2) created https://loogle.lean-lang.org, which uses this #find + Batteries internally.
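
For reference, Loogle accepts several query shapes; the following examples loosely follow the ones on the Loogle help page, plus the query used throughout this post:

```text
Real.sin                               -- lemmas mentioning this constant
"differ"                               -- lemmas whose name contains "differ"
_ * (_ ^ _)                            -- lemmas containing this subexpression
|- tsum _ = _ * tsum _                 -- lemmas whose conclusion matches this pattern
Metric.ball ?c _ ⊆ Metric.ball ?c _    -- ?c repeated forces both centers to coincide
```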

One surprising thing about Loogle/#find is that they don't return theorems defined in your local environment. It's understandable why Moogle and LeanSearch don't return local results - that would require parsing your project (though it's not expensive; I do think semantic search engines will implement a "submit a project to parse" feature at some point) - but formal search should be able to do it with little programming overhead; e.g., the exact? tactic does return local lemmas.

MY USE
I use Loogle as frequently as I use LeanSearch!
I think a good rule of thumb is: when you're new to formalisation, you should start by using LeanSearch (because you don't yet have a gut feeling for "what kind of theorem is likely to exist in Mathlib"), and then add Loogle to your toolbox.

I mostly use Loogle from within VS Code (this functionality is provided by the default vscode-lean4 extension); this is what it looks like:

Mathlib's apply? and exact? (aka library_search)

From the docs:

library_search has been renamed to apply? (or exact? if you only want solutions closing the goal)

Both apply? and exact? work on the goal only.
apply? will find all theorems that can legally be applied to the goal.
exact? will find all theorems that can legally be applied to the goal, AND that do close it.

Unlike all the tools described above, exact? does search among the locally available lemmas.
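
As a sketch, here is how exact? finds the running example of this post; it should suggest `exact Metric.ball_subset_ball h` (the exact suggestion may vary with your Mathlib version):

```lean
import Mathlib

-- `exact?` searches Mathlib and local lemmas for a term closing this goal.
example {α : Type*} [PseudoMetricSpace α] (x : α) {ε₁ ε₂ : ℝ}
    (h : ε₁ ≤ ε₂) : Metric.ball x ε₁ ⊆ Metric.ball x ε₂ := by
  exact?
```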

Good docs: https://lean-lang.org/blog/2024-4-4-lean-470/.

MY USE
I use exact? all the time, and I do not use apply? at all.
exact? is wonderful, but two things to keep in mind: 1) it won't find anything if some additional hypothesis is lacking, so always do the preliminary have h : 2 > 0 := by positivity in your context! 2) it won't deal with casting - make sure all of your hypotheses and the goal are cast appropriately before calling exact?.
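
A sketch of both caveats in one toy goal (the numbers and lemma name mentioned below are illustrative, not prescriptive):

```lean
import Mathlib

example (n : ℕ) (x : ℝ) (h : (n : ℝ) ≤ x) : (n : ℝ) ≤ x + 2 := by
  -- caveat 1: supply the side condition up front, so that lemmas
  -- requiring nonnegativity become applicable to the goal
  have h2 : (0 : ℝ) ≤ 2 := by positivity
  -- caveat 2: the cast `(n : ℝ)` is already in place, so `h` and the
  -- goal live in the same type before we search
  exact?  -- expected to suggest something like `le_add_of_le_of_nonneg h h2`
```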

OTHER

Something not immediately obvious is that powerhouse tactics such as linarith/omega/ring/simp/simp_all/aesop are also, practically, search.
The idea is that you look at your goal and you have a hunch for whether one of these tactics would solve it. For example, if you have a set of inequalities, and you want to prove an inequality, and all of them are simple enough, then linarith should succeed. If not, try to make the inequalities in question even simpler, and repeat the linarith attempt.
Right now I'm going by such hunches, but I do rather wish there was a flowchart for when exactly to try each such tactic, and what other such tactics are available.
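
The linarith hunch in action - a goal that is a linear consequence of linear hypotheses, so no theorem search is needed at all (a toy example of my own):

```lean
import Mathlib

-- `a < b ≤ c - 1` linearly implies `a + 1 < c + 1`.
example (a b c : ℝ) (h₁ : a < b) (h₂ : b ≤ c - 1) : a + 1 < c + 1 := by
  linarith
```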

CMD+F in your editor is also search - don't neglect it! In fact, it's frequently the best way to find theorems defined in your own Lean project. The Carleson project is a few-dozen-files project, and CMD+F-ing for a specific object tends to return 3-10 entries - it's typically reasonably easy to read through them all!

Usage of LLMs is also search. Both Claude and ChatGPT have already gotten up to speed with Lean 4, but we still want to provide them with relevant tactics and theorem names. Claude is much better than ChatGPT at proof writing at the moment (March 2025).

In the paid version of Claude, I create projects like this:

Which I then query like this: