Thanks for the question. I thought about using LaTeX, however, it doesn't explain what a math statement means. Instead, it describes that it looks like. With MathLingua, the meaning is made precise so that a computer can understand the statement. (I'll make this more clear in the docs).
TLDR: Having a computer understand what a statement written in LaTeX means is a very difficult Machine Learning (Natural Language Processing) problem (if not impossible since math statements often have lots of hidden context).
For more details:
In LaTeX, what does "if $f$ is continuous on $[a, b]$, $f \geq 0$ there, and $\int_a^b f = 0$ then $f = 0$?
It looks clear at first, but what type of thing is $f$. From context of reading analysis books, it could be clear that $f$ must be a function because "continuous" describes functions (but can also describe other things).
Next, what does $\int_a^b f = 0$ mean? It is the integral of $f$ from $a$ to $b$, but is it the Riemann integral, the Lebesgue integral, the Darboux integral? Does the difference matter?
What exactly does $f = 0$ mean. Does it mean $f$ as a function is the same as $0$ as a function, or does it mean $f(x) = 0$ for all $x$? Are those two the same or different? Does the distinction matter?
All of these questions, show up after a computer has correctly parsed the sentence and identified the structure. However, just that step itself is a very hard natural language processing problem.
In fact, if someone comfortable in reading math was given a random theorem from a random paper and was told to explain what it means, they would likely have problems and would need to look at the context of the paper (or reference other papers) to understand the syntax. This just gives a feel for the difficultly to build a machine learning model.
That is why I decided to build a custom language. In MathLingua, for example, the theorem could be described as:
Theorem:
. for: f(x), a, b
where:
. 'a, b is \real'
. 'a \leq b'
. 'f is \continuous.function:on{\closed.interval{a, b}}'
. 'f is \nonnegative.function:on{\closed.interval{a, b}}'
. '\riemann.integral_a^b[x]{f(x)} = 0'
then:
. 'f is \identically.zero.function:on{\closed.interval{a, b}}'
It is more verbose, but gives a more precise meaning of what each item means. Then definitions can be made to describe what each referenced math definition means with a description of how its associated notation is written to make the rendering look similar to how it is written on paper.
For example, one could have the definition:
[\identically.zero.function:on{A}]
Defines: f(x)
assuming: 'A is \set'
means:
. for: a
where: 'a \in A'
then: 'f(a) = 0'
Metadata:
. written: "f? = 0 \textrm{ on } A?"
This makes the definition more clear and the statement `g is \identically.zero.function:on{X}`, for example, is rendered as the familiar $g = 0 \textrm{ on } X$.
Other definitions can be written too, to make everything clear (or left for later. MathLingua doesn't require you to define the entire math universe before defining a more complex concept).
My response was a little long, but that is because there are a lot of subtleties that show up when trying to get LaTeX to work for describe the meaning of math statements. I hope this helps.
TLDR: Having a computer understand what a statement written in LaTeX means is a very difficult Machine Learning (Natural Language Processing) problem (if not impossible since math statements often have lots of hidden context).
For more details:
In LaTeX, what does "if $f$ is continuous on $[a, b]$, $f \geq 0$ there, and $\int_a^b f = 0$ then $f = 0$?
It looks clear at first, but what type of thing is $f$. From context of reading analysis books, it could be clear that $f$ must be a function because "continuous" describes functions (but can also describe other things).
Next, what does $\int_a^b f = 0$ mean? It is the integral of $f$ from $a$ to $b$, but is it the Riemann integral, the Lebesgue integral, the Darboux integral? Does the difference matter?
What exactly does $f = 0$ mean. Does it mean $f$ as a function is the same as $0$ as a function, or does it mean $f(x) = 0$ for all $x$? Are those two the same or different? Does the distinction matter?
All of these questions, show up after a computer has correctly parsed the sentence and identified the structure. However, just that step itself is a very hard natural language processing problem.
In fact, if someone comfortable in reading math was given a random theorem from a random paper and was told to explain what it means, they would likely have problems and would need to look at the context of the paper (or reference other papers) to understand the syntax. This just gives a feel for the difficultly to build a machine learning model.
That is why I decided to build a custom language. In MathLingua, for example, the theorem could be described as:
It is more verbose, but gives a more precise meaning of what each item means. Then definitions can be made to describe what each referenced math definition means with a description of how its associated notation is written to make the rendering look similar to how it is written on paper.For example, one could have the definition:
This makes the definition more clear and the statement `g is \identically.zero.function:on{X}`, for example, is rendered as the familiar $g = 0 \textrm{ on } X$.Other definitions can be written too, to make everything clear (or left for later. MathLingua doesn't require you to define the entire math universe before defining a more complex concept).
My response was a little long, but that is because there are a lot of subtleties that show up when trying to get LaTeX to work for describe the meaning of math statements. I hope this helps.