*It's August 2024. *To say t*he "theorem search in Lean" ecosystem is thriving is an understatement, last time I used Lean (less than a year ago?) most of these tools didn't exist.*

*July 24th 2024 Kevin Buzzard was giving the "The state of the art in the formalisation of research-level mathematics" talk during the "Interactive Theorem Proving in Education and Research" workshop (**link**).**During the questions section, I asked something along the lines of "what do mathematicians use for searching theorems in Lean lately - Loogle, Moogle, ReProver, #find tactic, something else completely?".**Kevin Buzzard responded with something along the lines of "Moogle! Once there was a 6-hour Moogle outage, and that was the only thing zulip discussed".*

*So - let's start from listing all of our options, perhaps with a mild bias towards Moogle.*

*NOTE - this is very much WIP, but already explorable.*

First of all, an overview.

While traditional search engines only begin to experiment with semantic search (say, nn-first approach of exa.ai), Lean theorem search seems to be overpowered by nn-first approaches.

Both **Moogle.ai** and **LeanSearch.net** support a formal pattern-matching approach apart from their main nn-based search entry.

Throughout this post, we'll be searching for a simple theorem - **not_exists** from Mathlib.

```
theorem not_exists {α : Sort u_1} {p : α → Prop} :
(¬∃ (x : α), p x) ↔ ∀ (x : α), ¬p x
```

Neural search engines will be queried with `¬∃ (x : α), p x then ∀ (x : α), ¬p x`

,

formal search engines will be queried with `(¬∃ (x : α), p x) → ∀ (x : α), ¬p x`

.

In no way is this the best possible testing query - my goal for this post is to showcase the interfaces of these systems for a theorem that everyone can understand.

Link | Authors | Announcement |

https://leansearch.net | Jiang Jiedong,Guoxiong Gao,Haocheng Ju | Zulip, 30th July 2024 |

LeanSearch is a semantic search engine.

Long story short - per each theorem in Mathlib, it takes:

- mathlib formal statement and its context (docstring, related definitions, etc.), and "informalizaes" it using GPT-3 - turns the theorem statement into a natural language description.
- Then, it creates the semantic embedding [theorem name; theorem's formal statement in Lean; natural language version of this theorem] and puts it into the vector database (one vector == one theorem; theorems are considered similar when the cosine distance between these two vectors is small).
- When searching for theorem in the LeanSearch interface, the user can type anything they like - LaTeX formulas, approximate theorem name, term descriptions, natural language queries - LeanSearch can digest all of it.

Anecdotally, if you input something obviously silly like `theorem name that has many letters "z"`

, semantic search will happily return `theorem List.zip_unzip`

. If you input `theorem that they learn in school`

, semantic search will happily return `theorem Nat.sq_sub_sq (a : ℕ) (b : ℕ) : a ^ 2 - b ^ 2 = (a + b) * (a - b)`

.

- After the query is submitted, the query text undergoes query augmentation using GPT-4 (something along the lines of
*"You're a maths expert, please transform this lousy query into proper maths speak AND accompanying Lean 4 code"*).

After this process, we have vectors representing all theorems in Mathlib, and a vector representing our augmented search query. Using cosine similarity, we find the theorem vectors that are most similar to the query. Those are our search results.

Link | Authors | Announcement |

https://moogle.ai | The Morph Labs | Zulip, November 2023 |

Moogle is a hybrid. In order to perform a formal pattern-matching search, start your query with `#find`

.

Unlike LeanSearch.net, Moogle is yet to publish a paper about their system.

It's unclear whether LeanSearch.net or Moogle.ai performs better, both are worth trying. LeanSearch.net paper claims performance superior to Moogle.ai's, with a caveeat that these are not directly comparable.

As mentioned in the introduction, Moogle seems to be the most popular search engine at the moment - probably because it was the first semantic search engine for Lean. At the moment of writing, LeanSearch.net is one-month-old.

Link | Authors | Announcement |

huggingface.co/spaces/dx2102/search-mathlib | Deming Xu | Zulip, 30th July 2024 |

Published, quite unfortunately, on the same day as **LeanSearch.net** - this is a simple model on huggingface. Returns pretty good results, but this was merely a training exercise for the author as he states in his introduction on zulip.

I think the zulip message by Deming Xu is worth a read - with only 100 lines of Lean code (adjusted code taken from another library) + 200 lines of Python code (100 of which is just setting up the web page), he managed to create a search engine that performs reasonably well, which is quite inspiring to anyone who wants to experiment with semantic search in Lean 4.

The principles are similar to what **LeanSearch.net** (and, likely, **Moogle**) is doing - convert all theorems into vectors, convert a search query into a vector, put all of these vectors into vector database, and determine the theorem vectors closest to the search query vector. The main difference between **LeanSearch.net **and **Deming's huggingface model **is that **LeanSearch.net** enhances both the theorem vectors and the query vector using conversion to natural language - Deming just uses raw Lean data (theorem name, theorem type, theorem docstring).

`#find`

)Link | Authors | Announcement |

https://loogle.lean-lang.org | Joachim Breitner | Zulip |

Loogle was developed by Joachim Breitner, and was reportedly (link) inspired by Haskell's Hoogle.

Loogle's search is not semantic, it matches expressions by their exact structure. Technically, Loogle is a wrapper around Mathlib's `#find`

- it adds some functionality to the default `#find`

command, and creates a web interface around it.

Initially, Joachim tried to merge two `#find`

PRs (#6643 and #6363) into Mathlib 4, but the reviews went for too long, so he just went ahead and:

1) factored out `#find + batteries`

into a separate repo (https://github.com/nomeata/loogle), and

2) created https://loogle.lean-lang.org, which uses this `#find + batteries`

internally.

`apply?`

and `exact?`

(aka `library_search`

)From the docs:

`library_search`

has been renamed to`apply?`

(or`exact?`

if you only want solutions closing the goal)

Both `apply?`

and `exact?`

work on a goal only.`apply?`

will find all theorems that are legal to execute on a goal.`exact?`

will find all theorems that are legal to execute on a goal, AND that do close the goal.

Here is the interface for the `apply?`

tactic:

Good docs: https://lean-lang.org/blog/2024-4-4-lean-470/.

Link | Authors | Announcement |

https://github.com/lean-dojo/LeanCopilot(install as a Lean library) | Kaiyu Yang,others | Zulip,Twitter |

Unlike the search engines above, Lean Copilot takes inspiration from GitHub Copilot - it looks at the current state of your proof, and tries to predict the next useful step.

Lean Copilot can be considered a search engine because, using the `select_premises`

tactic, you can search "relevant theorems to my proof" with it (confusingly enough, when they say `"premises"`

- they mean `"theorems"`

). It will look something like this (image courtesy of LeanCopilot repo):