
Marking up the semantic structure of all your mathematical formulas is about as likely as marking up the parse tree and parts of speech for all your regular prose sentences. Presentational MathML is the only thing that matters; Semantic MathML is W3C semantic web wank.



The idea is that the MathML is generated by a computer algebra system or similar, which has all that data, not written manually.


What if you do not have any computer algebra system in your pipeline? LaTeX can be written and read by actual humans, MathML cannot.


You can use LaTeX to author MathML, just as we use Markdown everywhere to author HTML.


The first hit in Google for "latex mathml converter" is https://temml.org/ . The list of supported LaTeX constructs looks fine, but the output… The example LaTeX converts to 67 lines of MathML. The vectors (arrows above B, l, E) are built like this:

    <mover>
      <mi>B</mi>
      <mo stretchy="false" style="transform:scale(0.75) translate(10%, 30%);">→</mo>
    </mover>
But the hat above n is:

    <mover>
      <mi>n</mi>
      <mo stretchy="false" style="math-style:normal;math-depth:0;">^</mo>
    </mover>
Markdown is great, but I can always take the HTML produced by a Markdown→HTML converter and edit it by hand, since HTML is human-readable and human-editable. MathML is neither.

The output of MathML is also uglier than KaTeX’s, with wonky spacing. In Firefox, there’s an ugly space after all \d's (derivative sign). It’s better in Edgium, but there is still an ugly space in \d\vec{l}.

If I need an intermediary tool or format for math, I'd prefer beautiful LaTeX-esque output, and I'd keep the LaTeX code, which would be converted by KaTeX on the fly to whatever ugly HTML is required.


Is that the canonical MathML way of expressing vectors? Then yikes, I thought it was more semantic than that.


LaTeX isn't semantic, so how could MathML computed from LaTeX be semantic?


\vec{x} is quite semantic.


I guess <mover> is semantic enough, actually? Though I don't get why the engine doesn't just know to render the arrow appropriately and it needs all those <mo stretchy> things with inline styling.


<mover> has a boolean accent attribute so I guess you can write:

    <mover accent="true">
      <mi>B</mi>
      <mo>→</mo>
    </mover>


So maybe that's a problem with a particular LaTeX->MathML converter, but the idea itself should work.
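
A minimal sketch of a converter step that emits the leaner `<mover accent="true">` form shown above, using only Python's standard library. The `vec` helper name is hypothetical, and the sketch does not check whether any particular browser renders the accent well without extra styling.

```python
import xml.etree.ElementTree as ET

def vec(identifier: str) -> ET.Element:
    """Build an accented <mover> element for a vector symbol such as B."""
    mover = ET.Element("mover", {"accent": "true"})
    ET.SubElement(mover, "mi").text = identifier
    ET.SubElement(mover, "mo").text = "\u2192"  # rightwards arrow
    return mover

# Serialize to a str (not bytes) so the arrow stays a literal character.
markup = ET.tostring(vec("B"), encoding="unicode")
print(markup)
```

This only covers the markup side; the inline styles in the temml output are presumably there to work around rendering differences, which this sketch ignores.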


What is anyone going to do with that information?


> Presentational MathML is the only thing that matters; Semantic MathML is W3C semantic web wank.

No. There are situations where a semantic interpretation is beneficial. A simple example is ‘(x + y) / z’. The best rendering (inline or full line) depends on the context.

People can discuss the tradeoffs without exaggeration and name-calling. Let's try.


That's a presentational consideration, whether you want a fraction that can stretch into {a \over b} or shrink into a/b, or if it always stays a/b.

Semantics means whether the / denotes eg division of real numbers or a group quotient. Semantic MathML basically wants you to provide a URI pointing to the definition of / you are using, eg. http://www.openmath.org/cd/arith1#divide. This is nonsense and about as likely as getting humans to mark up every word in your sentences with a URL for the numbered dictionary definition of the word you are using.
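
To make the distinction concrete, here is a rough sketch that builds both encodings of a/b with Python's standard library. The function names are made up for illustration; the `csymbol` form with `cd="arith1"` is how MathML 3 content markup refers to the OpenMath `arith1#divide` symbol mentioned above.

```python
import xml.etree.ElementTree as ET

def presentation_fraction(num: str, den: str) -> ET.Element:
    """Presentation MathML: records only how the fraction is laid out."""
    frac = ET.Element("mfrac")
    ET.SubElement(frac, "mi").text = num
    ET.SubElement(frac, "mi").text = den
    return frac

def content_divide(num: str, den: str) -> ET.Element:
    """Content MathML: names the operator via an OpenMath content dictionary."""
    apply_el = ET.Element("apply")
    # cd="arith1" plus the text "divide" identifies arith1#divide.
    ET.SubElement(apply_el, "csymbol", {"cd": "arith1"}).text = "divide"
    ET.SubElement(apply_el, "ci").text = num
    ET.SubElement(apply_el, "ci").text = den
    return apply_el

presentation = ET.tostring(presentation_fraction("a", "b"), encoding="unicode")
content = ET.tostring(content_divide("a", "b"), encoding="unicode")
print(presentation)
print(content)
```

The first form says nothing about what the bar means; the second says nothing about how to draw it.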


Thanks; I stand corrected about my example of a/b.

> Semantic MathML basically wants you to provide a URI pointing to the definition of / you are using, eg. http://www.openmath.org/cd/arith1#divide.

A better example, thanks.

> This is nonsense...

Could you clarify (dare I say mark up) which sense of 'nonsense' you mean?

  1. spoken or written words that have no meaning or make no sense: he was talking absolute nonsense.
  2. foolish or unacceptable behavior: put a stop to that nonsense, will you?
I'm not trying to just be pedantic: driving out ambiguity has value; even more so in mathematics.

Whether a particular technical suggestion for reducing ambiguity seems practical to you is quite different from it being understandable from some point of view and/or having internal consistency; i.e. making sense.

Let's move away from squishy, vague notions comparing semantic math markup vs English markup. Let's talk about some objective differences:

* "who": audience: mathematicians appreciate and deal in formality orders of magnitude more than English-speakers in general (perhaps second only to lawyers and grammar teachers?)

* "what": dictionary size: 1600 in OpenMath vs ~600K English words.

* "how": For a particular math document, it is likely that each operator used has a single, consistent meaning. So it would make sense to state a mapping from certain operators to their semantics up front and reuse these definitions throughout the document. See the bottom of this comment for some ideas on how this might be done.

* "why": computability / reasoning: If mathematics is sufficiently formalized, it can be reasoned over very effectively, limited only by the computational algorithms. Reasoning over English sentences is fraught because English is frightfully ambiguous.

Conclusion: the 'who', 'what', and 'why' differ substantially.
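
As a rough illustration of the "how" point, here is a sketch in which a document declares each operator's meaning once and every later use is expanded from that table. The `OPERATOR_SEMANTICS` table and `apply_operator` helper are hypothetical, not part of MathML or OpenMath; only the `arith1` symbol names are real.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-document declaration: each operator's meaning is stated once.
# The (cd, name) pairs refer to symbols in the OpenMath arith1 dictionary.
OPERATOR_SEMANTICS = {
    "+": ("arith1", "plus"),
    "/": ("arith1", "divide"),
}

def apply_operator(op: str, *operands: str) -> ET.Element:
    """Expand an operator into an <apply> element using the shared table."""
    cd, name = OPERATOR_SEMANTICS[op]
    node = ET.Element("apply")
    ET.SubElement(node, "csymbol", {"cd": cd}).text = name
    for operand in operands:
        ET.SubElement(node, "ci").text = operand
    return node

markup = ET.tostring(apply_operator("+", "x", "y"), encoding="unicode")
print(markup)
```

The author writes `+` and `/` as usual; the tooling, not the human, repeats the dictionary reference.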

P.S. Regarding the "how" discussion, here is a conversation with ChatGPT 4.0:

> In Semantic MathML, what mechanisms (if any) exist so that one math document does not have to restate the semantics of a particular operator repeatedly?

> "In Semantic MathML, use the <csymbol> element and Content Dictionaries (CDs) to avoid restating the semantics of an operator multiple times in a document. The <csymbol> element refers to an operator in a CD, which provides the operator's semantics. This way, you can reuse the operator without redefining its semantics in the document."

> Please double check this.

> Apologies for the confusion earlier. In Semantic MathML, there isn't a built-in mechanism to globally define the semantics of an operator for the entire document. However, you can use XML entities to create a shorthand reference for operators, and their semantics can be defined using the <apply> and <ci> elements, which encapsulate the semantic content. This will help reduce redundancy while still preserving the semantics of the operators throughout the document.

I'll spare you the rest of the chat log; the syntax is not pretty.


Semantic MathML is not useful. Consider a simple statement like

> Given any positive integers r and k, there is some number N such that if the integers {1, 2, ..., N} are colored, each with one of r different colors, then there are at least k integers in arithmetic progression whose elements are of the same color.

The majority of the semantics of this statement are part of the prose. Even having perfect "semantic" information about the formulas gets you absolutely nothing. You cannot reason about this without being able to reason about human language in general like a human can, in which case you do not need the "machine-readable semantics" of MathML at all.

The division of math into "ambiguous" English and "formal" formulae/symbolism is semantically artificial. Both of these are human languages. There is not some fully marked up form which is mechanically unambiguous that you can produce if you try hard enough. Presentationally the distinction is that special typesetting facilities are needed for the appearance of mathematical formulae and this is why presentational MathML is useful, but the semantics flow from the same source that the semantics of all written language flow from. It is not fundamentally different than, say, an Egyptologist who requires special facilities to typeset hieroglyphics.


>> Given any positive integers r and k, there is some number N such that if the integers {1, 2, ..., N} are colored, each with one of r different colors, then there are at least k integers in arithmetic progression whose elements are of the same color.

> The majority of the semantics of this statement are part of the prose.

The semantics can be expressed as prose if you like. They can also be expressed formally. As you know, 'color' is used here for the convenience of the reader. The underlying semantics works independently of the English notion of 'color'. You know this. What do you think is not formalizable about your example? "Given / assume": no problem. Introducing variables: no problem. Specifying variable types: no problem. Various mathematical structures (sets, sequences): no problem. "there are / there exists": no problem. "at least": no problem. And so on.
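
One way to see that nothing here depends on English: for fixed N, r, and k the statement is a finite check that a short program can carry out. This is only a brute-force sketch with names of my own choosing, assuming k >= 2; it verifies small cases and is not a proof of the general theorem.

```python
from itertools import product

def has_monochromatic_ap(coloring: tuple, k: int) -> bool:
    """True if some k-term arithmetic progression in 1..N has a single color.

    coloring[i] is the color assigned to the integer i + 1. Assumes k >= 2.
    """
    n = len(coloring)
    for start in range(1, n + 1):
        for step in range(1, n):
            last = start + (k - 1) * step
            if last > n:
                break  # larger steps only overshoot further
            terms = range(start, last + 1, step)
            if len({coloring[t - 1] for t in terms}) == 1:
                return True
    return False

def every_coloring_has_ap(n: int, r: int, k: int) -> bool:
    """Check the statement for a fixed N by trying all r**N colorings."""
    return all(has_monochromatic_ap(c, k) for c in product(range(r), repeat=n))

print(every_coloring_has_ap(8, 2, 3), every_coloring_has_ap(9, 2, 3))
```

For r = 2 and k = 3 the check fails at N = 8 and succeeds at N = 9. "Color" is just an integer label here.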

> Even having perfect "semantic" information about the formulas gets you absolutely nothing.

You have shared some good counterarguments, and I appreciate it, but "nothing" is an exaggeration here. It doesn't advance your argument.

> You cannot reason about this without being able to reason about human language in general like a human can

This is incorrect. You sound knowledgeable about mathematics, so I'm quite surprised you would make such a claim. Are your emotions (perhaps a loathing of Semantic MathML? -- your comments suggest this story) clouding your logic here? Deductive reasoning is well studied in computer science. To state a previous point again: there are many formal systems that can be reasoned over to various degrees. Such systems are still useful and quite different than English.


> Semantic MathML is not useful

Earlier, you said it was both nonsense and unlikely to ever work. Now you say it is not useful. You are moving the goal posts. Your comments have had a mix of thoughtful examples (thanks) with some exaggeration and rant. I'm trying to tease these apart, but it isn't easy. I can see you have a strong negative reaction to Semantic MathML, but so far, your writing hasn't been persuasive.


> The division of math into "ambiguous" English and "formal" formulae/symbolism is semantically artificial. Both of these are human languages.

I'll grant the obvious: both are human languages. But what does this assumption prove? It certainly does not prove that math is not formal.


That's put bluntly but well.

I understand the parent's sentiment though. Every few years when I need to use TeX for something, it always takes me a second to remember that it is not a computer algebra system, just a rendering engine hyper-focused on rendering 2D math notation that predates web rendering engines by about two decades. For some reason it surprises me that CSS did not grow to eat the publication rendering use case. I get that TeX has solved that problem quite well, but it seems like the sheer number of people who use CSS would mean that those capabilities would be ported to the language. But no, that didn't happen. So we still have the two rendering languages.


> as likely as marking up the parse tree and parts of speech for all your regular prose sentences

That sounds completely doable with modern NLP like GPT-4. Maybe we should start doing more design experiments with how one could exploit such semantics? There's no reason to think that syntax-free writing ought to be ideal, any more than space-free and punctuation-free writing was the ideal form of writing which could not be improved...


Yes it does, so there's no reason to mark it up. The failure of the semantic web was trying to lower human text to the level where the machine could operate on it. The success of LLMs is to raise the machine to the level where it can operate on ordinary human text.


You mark it up with the LLM so you can do something with it. An LLM like GPT-4 is still just emitting text tokens; it's not also generating pixels to push to your monitor. If you want to do something involving semantics like part-of-speech tagging, you'd obviously use the LLM to parse and annotate it, and then do stuff like use CSS to style nouns differently from verbs, or whatever. Maybe you don't distribute that and only run it locally, but you still have to do that at some point!
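
A toy sketch of the last step, assuming the tagging has already happened. The `tagged` list stands in for whatever an LLM or NLP library would return and is hard-coded here; the `to_html` helper and the class names are my own invention.

```python
from html import escape

# Hypothetical tagger output: (token, part_of_speech) pairs. In practice these
# would come from an LLM or an NLP library; they are hard-coded for the sketch.
tagged = [("Cats", "noun"), ("chase", "verb"), ("mice", "noun")]

def to_html(tokens) -> str:
    """Wrap each token in a span whose class is its part of speech."""
    return " ".join(
        f'<span class="{escape(pos, quote=True)}">{escape(word)}</span>'
        for word, pos in tokens
    )

html_line = to_html(tagged)
print(html_line)
```

A stylesheet rule such as `.noun { font-weight: bold; }` would then style nouns differently from verbs.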


> Marking up the semantic structure of all your mathematical formulas is about as likely as marking up the parse tree and parts of speech for all your regular prose sentences.

No. The semantics of math are simpler than English prose.


My above comment may not be useful or interesting. I'm not complaining; rather, I'm seeking feedback. Is there a strong substantive argument against what I said? I'm happy to be proven wrong, or to be shown that I'm missing the point. Learning and seeing a new point of view is more enjoyable than pointing out weak argumentation. I push back to see if there is any there there.

To the substance: If one wants to argue that Semantic MathML ("SMML") will not be adopted, one would need to define a metric. Then we can model and forecast. I don't think either the parent poster or I have the time or interest to do this. I'll leave this heady analysis to the mysterious proprietary commercialized no-one-gets-fired wisdom of the Gartner Magic Quadrant [1].

Now we shift to utility. To argue SMML is not useful, you have to explain the costs and benefits of specific aspects.

* Of course markup has costs, but what are they specifically? It depends on many things, but the key ones seem to be the format/markup itself and the tooling to help with it. Hopefully we can steer clear of holy wars here (aka the semantic analogue of the tabs versus spaces forever-war).

* And what are the benefits? I see three intertwined benefits: driving out ambiguity, easier categorization, and improved reasoning. Reasoning over mathematics can help find mistakes, prove theorems, connect previously disparate proofs and even fields, and lots more.

I haven't seen a convincing argument against the value of such benefits, other than arguing against their likelihood. But the latter seems hollow; such benefits can and do happen with human minds reasoning over mathematics. It is clear that computers can assist, at the very least, and perhaps even lead. I'm inclined to think that AI can scale up mathematical reasoning beyond human capabilities, even if the AI is quite far from human intelligence.

One might argue that one purported benefit, reasoning, is impossible. Of course, this is downstream of the "for what" question above. In sibling threads, I see what seems to be a false equivalence between formalization of mathematics and formalization of English. That argument seems to imply that formalization is "too costly" or "too difficult" or possibly even impossible. It offers shallow dismissals of the benefits at best. In summary, it doesn't make a convincing cost versus benefits argument.

[1] https://www.theinformation.com/articles/the-tech-tussle-over...



