Show HN: MathLingua – The Language of Mathematics

fspeech · on July 27, 2020

Documenting mathematics rigorously is actually hard. For example see the Formal Abstracts (https://formalabstracts.github.io) project by Thomas Hales. Hales' other writings on the subject may be informative too: https://jiggerwit.wordpress.com/2019/06/20/an-argument-for-c...

CatsAreCool · on July 27, 2020

Thanks for the links. I was not aware of Thomas Hale's research even though I searched extensively to try to find related work.

His descriptions of what he wants to attain align with what I would like, a formalized way to describe what is known in mathematics.

He also describes how this can be used for machine learning, and area that I am also pursuing, but my results are still in an exploratory state.

Either way, I am really glad others are looking into this area too. I think it is something that will be very beneficial to the math community.

fspeech · on July 27, 2020

This is an area of my interest too, esp. if one can find something that is actually useful for a practicing mathematician. I mentioned Hales' project but based on their github project progress (a postdoc of his did some exploratory work) they seemed to have hit some road blocks, which is an example of this problem being hard. I don't know what your background is. If you are interested in further discussion I would be happy to p.m. over email.

CatsAreCool · on July 27, 2020

I have a PhD in mathematics, I am a software engineer working at Google, and have practical experience with programming language design and usage.

MathLingua is not a Google product and is not affiliated with Google in any way. Instead it is my own work I am exploring at the intersection of math, computer science, language design, and natural language processing.

It would be great to chat more over email. My email can be found on my github page https://github.com/DominicKramer. Let me know if you have problems getting it.

ratmice · on July 27, 2020

This also reminds me of mmt https://uniformal.github.io/ The primary developer of that recently gave a talk about it here https://vimeo.com/421123419

jhanschoo · on July 27, 2020

Also see: https://en.wikipedia.org/wiki/QED_manifesto

CatsAreCool · on July 28, 2020

I have read through this before, but thanks for sharing the link.

CatsAreCool · on July 26, 2020

Hello, this is the creator of MathLingua.

MathLingua is still in development, and the documentation still needs work.

However, I wanted to get feedback earlier rather than later to help guide development and the writing of the docs.

Note that MathLingua is the language that powers https://mathlore.org.

uryga · on July 27, 2020

could you explain the usecase a bit? i skimmed the docs and don't quite get what writing definitions and theorems in this format gets me, apart from an automated "go to definition". what can the tooling do, or what functionality are you planning to implement?

btw "The Language of Mathematics" sounds a bit over the top ;) how about "A Language for Mathematics"?

CatsAreCool · on July 27, 2020

Thanks for the feedback. I'll update the docs to make this more clear.

The goal is to allow the building of a system so someone could ask, for example, "what are the known theorems that describe when a function uniformly continuous?" or "under what conditions is a group solvable?".

To enable this, it is necessary to build a system that catalogs precise statements of math theorems, definitions, and other statements (https://mathlore.org is the start of this system that uses MathLingua but currently only uses keywords and some structure for searching).

This is similar to https://mathworld.wolfram.com or the math pages in Wikipedia except the statements are not text or LaTeX, that is hard for a computer to understand the meaning of. Instead the MathLingua language is used to identify math statements by meaning.

Right now the tooling can render math statements, find duplicate definitions, and can find missing definitions. Further https://mathlore.org uses MathLingua and has some basic searching.

Further, experimental work is also underway so that the MathLingua tooling can expand a math statement by expanding the definition of items used in that statement.

Yeah, I thought about using something like "A Language for Mathematics" but I thought "The Language of Mathematics" might draw more attention :) But it is only the title of my post on hacker news, not the title of the MathLingua website.

uryga · on July 27, 2020

thanks!

Kednicma · on July 27, 2020

Cool work. You should make it very loud and clear that your toolchain does not verify proofs, since otherwise folks may think that your computer-assisted proof layout system does just that; it's rare to see tooling in this subfield which does not verify proofs.

CatsAreCool · on July 27, 2020

Thanks for the feedback. I will definitely do that.

xiaodai · on July 27, 2020

it's cool but why not just Latex? Genuine question.

I am interested in learning more abt your rationale for creating MathLingua? What drove u? What existing faults do you see in existing system that made you do it.

CatsAreCool · on July 27, 2020

Thanks for the question. I thought about using LaTeX, however, it doesn't explain what a math statement means. Instead, it describes that it looks like. With MathLingua, the meaning is made precise so that a computer can understand the statement. (I'll make this more clear in the docs).

TLDR: Having a computer understand what a statement written in LaTeX means is a very difficult Machine Learning (Natural Language Processing) problem (if not impossible since math statements often have lots of hidden context).

For more details:

In LaTeX, what does "if $f$ is continuous on $[a, b]$, $f \geq 0$ there, and $\int_a^b f = 0$ then $f = 0$?

It looks clear at first, but what type of thing is $f$. From context of reading analysis books, it could be clear that $f$ must be a function because "continuous" describes functions (but can also describe other things).

Next, what does $\int_a^b f = 0$ mean? It is the integral of $f$ from $a$ to $b$, but is it the Riemann integral, the Lebesgue integral, the Darboux integral? Does the difference matter?

What exactly does $f = 0$ mean. Does it mean $f$ as a function is the same as $0$ as a function, or does it mean $f(x) = 0$ for all $x$? Are those two the same or different? Does the distinction matter?

All of these questions, show up after a computer has correctly parsed the sentence and identified the structure. However, just that step itself is a very hard natural language processing problem.

In fact, if someone comfortable in reading math was given a random theorem from a random paper and was told to explain what it means, they would likely have problems and would need to look at the context of the paper (or reference other papers) to understand the syntax. This just gives a feel for the difficultly to build a machine learning model.

That is why I decided to build a custom language. In MathLingua, for example, the theorem could be described as:

  Theorem:
  . for: f(x), a, b
    where:
    . 'a, b is \real'
    . 'a \leq b'
    . 'f is \continuous.function:on{\closed.interval{a, b}}'
    . 'f is \nonnegative.function:on{\closed.interval{a, b}}'
    . '\riemann.integral_a^b[x]{f(x)} = 0'
    then:
    . 'f is \identically.zero.function:on{\closed.interval{a, b}}'

It is more verbose, but gives a more precise meaning of what each item means. Then definitions can be made to describe what each referenced math definition means with a description of how its associated notation is written to make the rendering look similar to how it is written on paper.

For example, one could have the definition:

  [\identically.zero.function:on{A}]
  Defines: f(x)
  assuming: 'A is \set'
  means:
  . for: a
    where: 'a \in A'
    then: 'f(a) = 0'
  Metadata:
  . written: "f? = 0 \textrm{ on } A?"

This makes the definition more clear and the statement `g is \identically.zero.function:on{X}`, for example, is rendered as the familiar $g = 0 \textrm{ on } X$.

Other definitions can be written too, to make everything clear (or left for later. MathLingua doesn't require you to define the entire math universe before defining a more complex concept).

My response was a little long, but that is because there are a lot of subtleties that show up when trying to get LaTeX to work for describe the meaning of math statements. I hope this helps.

auggierose · on July 27, 2020

There is a conference related to that topic starting today actually (CICM 2020). Registration might still be available. The subcategory that concerns things like math lingua is called mathematical knowledge management (MKM).

CatsAreCool · on July 28, 2020

Cool. Thanks for letting me know.

mkl · on July 27, 2020

I spotted a typo on the linked page: "vistual"

The dot indentation syntax seems very strange. Why not just indent? Is the intention to reduce nested indentation a little?

CatsAreCool · on July 28, 2020

Thanks for letting me know about the typo.

The dot space is used to mark arguments since some arguments span multiple lines but are connected. For example, in

  Theorem:
  . for: x
    where: y
    then: z
  . 'some statement'

the 'for:where:then' group together is one argument, while the 'some statement' item is the second argument. The dot is needed to distinguish from

  Theorem:
    for: x
    where: y
    then: z
    'some statement'

where it is hard to tell where one argument stops and another starts.

mkl · on July 28, 2020

What about the following, though?

  Theorem:
    for: x
      where: y
      then: z
    'some statement'

syats · on July 27, 2020

This is an awesome effort!

Regarding a comment on possible use cases.. One component that would really compliment this would be a programming-language specific checker of definitions. In the documentation's example of "integer" it would be straight forward to write in python a function that checks if a given variable is an integer. The same can be for other definitions such as "invertible matrix" or "set of orthonormal vectors".. other languages could provide such checks for more symbolic stuff like "commutative ring" etc.

Once you have this in place, you can load your nice data into a (e.g.) numpy.ndarray object, and checks would be triggered for all possible properties. This would automatically tag your data as, e.g. "stochastic matrix" and then you could even get suggestions of the sort of "build a time-discrete stochastic system based on this matrix?". This is would be very helpful for "data scientists" who are a bit far from theory.. at least I know I would use this very often.

There is of course the issue of expressivity to be taken into account. Much of mathematical progress involves invention of new notation, not all of which is neatly written in "left to right English with only binary operators" (think of commutative diagrams for example). While the language can of course be expanded in this direction, there will be a very big temptation (and I would guess frequent pushes from the future community) to change to more and more general languages. In this case, please always keep in mind the very expressive but useless notation of Russell & Whiteheads's Principia. There is a big theory of language and expresivity, and there are also practical examples (think of OWL or other semantic-web languages) of how to reach a compromise that allows for enough expressivity for the use cases of interest.

fdej · on July 27, 2020

Nice project! I'm working on something similar with http://fungrim.org/grim/ and http://fungrim.org/

fspeech · on July 27, 2020

Pretty cool. Are you aware that there is a content markup part of MathML https://www.w3.org/TR/MathML3/chapter4.html ? There are other efforts geared towards math authoring as well if that is your focus，for example people have authored open textbooks using https://pretextbook.org .

fdej · on July 27, 2020

Indeed. I decided not to use MathML for various reasons:

1. The syntax is horrible (for writing by hand).

2. I'm more interested in computation than presentation, so something closer to S-expressions along the lines of Mathematica feels more comfortable to me.

3. The progress so far on defining semantics for content MathML has failed to impress me.

Concretely, I think if you are trying to do semantic mathematics, you need an implementation of the semantics in software, not just a hand-written spec. Ideally this would be backed by a formal theorem proving system. So far I just do randomized testing using symbolic evaluation and interval arithmetic. See this writeup for more details: https://arxiv.org/abs/2003.06181

CatsAreCool · on July 27, 2020

Your work looks really cool and your paper was a great read.

I too didn’t like the syntax of MathML, and I agree the semantics needs to be in the software.

For MathLingua I built a pattern matcher that is built in, but I haven’t tied MathLingua to a verifier.

I viewed these as separate things. This is because one can encode a statement but have a proof as a separate entity (which may be in a paper, digital, or verified form).

This allows one in mathlingua to express theorems and definitions from a book, for example, with references to the proofs in the book even though the results are not formally proven.

A layer on top of MathLingua would use MathLingua to formally verify statements. If this is needed, my plan is to leverage existing theorem provers for this.

Feel free to reach out by email if you want to chat more, or we can chat here. My email is on my github page https://github.com/DominicKramer.

fspeech · on July 27, 2020

Thanks! The symbolic computation aspect of your paper is very interesting. I would be very interested in checking out how conditional rewriting is achieved. Do you have more in depth writings on the theory and mechanics of your approach?

fdej · on July 27, 2020

There is nothing in depth yet. I'm happy to discuss (here or by email).

fspeech · on July 27, 2020

Cool. I have skimmed through the code base. I will email you with further questions.

CatsAreCool · on July 27, 2020

It looks like there is a lot of shared interest in this area.

fspeech, fdej, and others, what do you think of having a Google group, mailing list, or something similar to have a communication channel for anyone that wants to chat more?

If you are open to this (or have another suggestion) I’ll set one up and share the link here.

If you prefer email, that’s cool too.

fdej · on July 27, 2020

Sure, go ahead and set something up!

CatsAreCool · on July 28, 2020

Cool. I've setup a Google Group at https://groups.google.com/d/forum/mathlingua-discuss.

fspeech · on July 27, 2020

Yes that would be nice. Thanks!

CatsAreCool · on July 28, 2020

Awesome. I've setup a Google Group at https://groups.google.com/d/forum/mathlingua-discuss.

CatsAreCool · on July 28, 2020

A Google Group has been created to discuss MathLingua and Mathematical Knowledge Management at https://groups.google.com/d/forum/mathlingua-discuss.

Everyone is welcome to join.

xvilka · on July 27, 2020

Is it able to transpile to Coq/Lean/Agda/etc?

CatsAreCool · on July 27, 2020

It is already difficult to have a formalized math language with a level of formality that matches that found in papers.

The goal of MathLingua, at this time, is to match such a level of formality that is in math books and papers.

However, the syntax was designed to allow support for transpiling to a theorem proving language to be added at a later time. It is something, in the future, that I would like to do, if possible, but I don't have a time estimate if/when it will be available.

xvilka · on July 27, 2020

I recommend to check also Decucteam[1][2] projects, they also convert between different formats. Even as they focus on mathematical logic mostly, it still can be useful to check.

[1] http://deducteam.gforge.inria.fr

[2] https://github.com/Deducteam

jlelonm · on July 27, 2020

This is really cool!! I hope this becomes popular. Imagine having all theorems etc indexed like this!

CatsAreCool · on July 27, 2020

Thanks for the support! Yes, that is one of the goals, to make it really easy to find what is known in mathematics and contribute to that knowledge base. I have https://mathlore.org which is a demonstration of how such as system could work.

xiaodai · on July 27, 2020

It's cool and all but what's wrong with Latex?

Doesn't answer the "why" question at all.

CatsAreCool · on July 27, 2020

Thanks for the question. I have answered it in another question in the comments, and I'll update the docs to make the difference more clear.

magnio · on July 27, 2020

Searchability I guess? LaTex is for displaying mathematics, whereas this looks like an AST for math.

Vivtek · on July 27, 2020

This feels like "mathematics pseudocode" and I kind of love it. Indexable math Zettelkasten notes.

CatsAreCool · on July 27, 2020

Thanks! I wasn't sure how people would feel about the syntax, and I'm glad you like it.

bionhoward · on July 27, 2020

Would it be possible to put the theorem name (and possibly a description) at the top instead of buried at the bottom in a metadata block?

Theorem: “Fundamental Theorem of Calculus”

CatsAreCool · on July 28, 2020

I thought of this, but couldn't find a nice way to fit this into the syntax.

That is,

  Theorem: "some text"

is the same as

  Theorem:
  . "some text"

Thus it would be hard to distinguish a named theorem with one statement

  Theorem:
  . "some name"
  . "statement 1"

from an un-named statement with two statements:

  Theorem:
  . "statement1"
  . "statement2"

So for now I'm leaving the name in the metadata section, but I'm open to putting the title higher if I can come up with a nice way to do so.