I was wondering about that as well. Getting equations right is critically important. But, from my limited experience, the MathJax syntax is LaTeX-enough that equations might be one of the easier parts of doing this?
What does it do beyond what tex4ht does? If there is some way tex4ht could be improved, perhaps it would be best to contribute to that project? https://www.tug.org/applications/tex4ht/mn.html
It is built on top of tex4ht. It merely provides a few settings for tex4ht and post-processing scripts that beautify the generated HTML. You might ask, why post-processing? Well, because it was simpler for me than figuring out how to get tex4ht to do the desired thing. I just don't find TeX/LaTeX pleasant to use as a programming language, but that's personal taste.
If the post-processing stage is useful to others, perhaps it could be upstreamed into tex4ht?
(I sometimes think that the user interface of GitHub puts too much emphasis on cloning and not enough on cooperation. Many useful tools end up in a dozen forks, all with slightly different features, all equally inactive.)
What science needs, I believe, is not another tool for making TeX more usable for information interchange, but a simple (much simpler than LaTeX, whose complexity and user experience are terrible), web-oriented standard language for typing in libre scientific and technical documents that browsers would support (or that can be translated to valid HTML+CSS seamlessly). Web documents are cheaper and more accessible to people, so we should concentrate on those. I don't know if there is a usable language and platform of this kind already. Once we have that, tools for converting web documents to other, less important formats such as paper-printable ones can be created.
Web documents are cheaper and more accessible, but there's still a quite large usage of print documents, so at least I, as a document author, don't want to commit to a "web-only" toolchain without a good to-print workflow also being available.
It's possible for HTML+CSS to also provide a good to-print workflow, but I don't think it's there yet, at least using open-source tools. I have heard PrinceXML can produce good results, with sufficient control over the print layout to make HTML+CSS usable as a print-oriented markup language. But between the cost, and the prospect of becoming dependent on a proprietary tool with no obvious alternatives, I haven't tried it.
PrinceXML is definitely the best tool, but there are a couple of similar commercial alternatives, and wkhtmltopdf has become a decent open-source solution as long as you only want basic docs.
Agreed. I would use LaTeX for everything, but I need to work with collaborators who are only used to writing in Word with track changes. Something that would allow me to write in LaTeX and still share for editing with less tech-savvy people would be wonderful.
You should definitely post some info about your project to the tex4ht mailing list. I hope some interesting and more informed discussion might happen there.
You may also take a look at make4ht (https://github.com/michal-h21/make4ht). It is a build tool for tex4ht, included in TeX distributions, and it also solves some of the same problems as your script (ligatures, spurious span elements, image conversion, Unicode, etc.). It can execute custom commands on all output files, so your script could be used with it as well.
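To make the "ligatures, spurious span elements" part concrete, here is a rough sketch of what a post-processing pass over generated HTML could look like. It is not taken from either project: the function names are made up for illustration, and it assumes the converter wraps runs of text in `<span class="...">` elements and emits Unicode ligature characters. Regular expressions are a fragile way to edit HTML, so treat this as a toy rather than a robust cleaner.

```python
import re

# Unicode ligature characters and their plain-letter expansions.
LIGATURES = {
    "\ufb00": "ff",
    "\ufb01": "fi",
    "\ufb02": "fl",
    "\ufb03": "ffi",
    "\ufb04": "ffl",
}

# Two directly adjacent spans that carry the same class attribute
# (assumed output shape; real converter output may differ).
ADJACENT_SPANS = re.compile(
    r'<span class="([^"]*)">([^<]*)</span><span class="\1">'
)


def expand_ligatures(html):
    """Replace ligature code points so the text stays searchable."""
    for ligature, plain in LIGATURES.items():
        html = html.replace(ligature, plain)
    return html


def merge_adjacent_spans(html):
    """Fuse neighbouring spans with identical classes until nothing changes."""
    while True:
        merged = ADJACENT_SPANS.sub(r'<span class="\1">\2', html)
        if merged == html:
            return html
        html = merged


def clean(html):
    """Hypothetical entry point: run both clean-up passes."""
    return merge_adjacent_spans(expand_ligatures(html))


if __name__ == "__main__":
    sample = '<span class="a">ef</span><span class="a">\ufb01cient</span>'
    print(clean(sample))
```

A script like this could then be run on each output file by whatever build step you already use.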
Strictly speaking, isn't this false by virtue of LaTeX compiling to PDF?
And I have been looking hard for years and haven't found anything to replace TeX that fulfills even half my needs (as a person who does math typesetting in large documents almost daily). LaTeX has its issues, but it's the best we've got.
Compiling to PDF as a display format, and being amenable to translation to another markup format that retains the structure of the original markup, are slightly different tasks.
In general, I'd agree with the parent that LaTeX is something of a dead end in terms of translation. LaTeX will happily compile to PS/PDF/DVI, but translation to something like HTML is pretty reliant on using a subset of LaTeX. You can, after all, write LaTeX code to do computation.
In my experience with Pandoc, there are a good number of packages that simply don't work when translating from LaTeX. The more specialized or formatting-specific the package--the further you deviate from the standard Article class and simple section headers--the less likely that you'll have a good result.
I've found that pandoc (and every other conversion utility) doesn't produce useful results for LaTeX even when the source doesn't use any complex macros or fringe packages.
The problem seems not to be that LaTeX is too powerful, but rather that the point of HTML and Markdown is to be as lightweight as possible. LaTeX, on the other hand, is meant to be a useful tool for humans to use to minimize the amount of boilerplate needed to write a large technical document (in print or on the web!).
This is actually kind of true. LaTeX is a Turing-complete language, which makes converting it to other things somewhat unreliable.
But there's kind of an informal subset of LaTeX that you can use and not be too badly off with Pandoc.
In general, I'd like to do academic and report writing in Markdown (or Org!), but limitations of Markdown usually make it easier for me to do my writing directly in LaTeX, portability be damned. The biggest impediment for Markdown in academic / report writing, in my opinion, is the lack of internal references (things like "see figure X").
Org-mode can be converted via Pandoc much the same as markdown, but doesn't have the issue with cross-references.
Seconded the cross-reference gripe with Markdown. Somebody clever has to be working on that already; it's such an obvious gap. I think it would enable things like RMarkdown to reach "critical mass".
LaTeX has too much inertia for users to switch anytime soon. The only competition is from commercial tools like Adobe FrameMaker.
I hope the future will turn out to be LyX + Asciidoctor export. If Microsoft Word ever improves its math input interface, I think it would just kill off LaTeX completely. It's already almost mandatory to use for document interchange.
Markdown is too limited. It has no standard for extensions; AsciiDoc does. Markdown IMO is strictly for writing blog posts that are mostly text.
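For what it's worth, the core of "see figure X" is small enough to sketch as a preprocessing pass. The `{#fig:name}` label and `@fig:name` reference syntax below is just an illustrative convention, not part of Markdown itself, and `resolve_figure_refs` is a made-up name. It numbers figures in order of appearance and rewrites references, nothing more.

```python
import re

# Assumed conventions for this sketch only:
#   label:     ![Caption](plot.png){#fig:plot}
#   reference: see @fig:plot
LABEL = re.compile(r"\{#fig:([\w-]+)\}")
REF = re.compile(r"@fig:([\w-]+)")


def resolve_figure_refs(text):
    """Number figure labels in document order and expand references to them."""
    numbers = {}
    for match in LABEL.finditer(text):
        # The first occurrence of a label wins; duplicates keep its number.
        numbers.setdefault(match.group(1), len(numbers) + 1)

    def replace_ref(match):
        name = match.group(1)
        # Leave unknown references untouched so they are easy to spot.
        return f"Figure {numbers[name]}" if name in numbers else match.group(0)

    text = REF.sub(replace_ref, text)
    # Drop the label markers so plain Markdown renderers never see them.
    return LABEL.sub("", text)


if __name__ == "__main__":
    doc = "See @fig:b and @fig:a.\n![A](a.png){#fig:a}\n![B](b.png){#fig:b}\n"
    print(resolve_figure_refs(doc))
```

The hard part is not this pass but getting everyone to agree on the syntax, which is the "no standard for extensions" problem.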
The main historical reason for using sans-serif fonts on displays was the displays' lousy resolution. With modern display technology that reason has mostly disappeared.
At what point does the resolution become good enough for serif fonts I wonder? Unfortunately 1366 x 768 is still quite the norm (118 ppi on a 13.3" screen).
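For anyone who wants to check the 118 ppi figure or try other screens: pixel density is just the diagonal in pixels divided by the diagonal in inches. A quick sketch (the function name is mine):

```python
import math


def pixels_per_inch(width_px, height_px, diagonal_in):
    """Pixel density: diagonal length in pixels over diagonal length in inches."""
    return math.hypot(width_px, height_px) / diagonal_in


if __name__ == "__main__":
    # 1366 x 768 on a 13.3" panel, as in the comment above.
    print(round(pixels_per_inch(1366, 768, 13.3)))  # prints 118
```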
I've had so many ideas for studies about the impact of fonts I wish I'd pursued. A psych professor of mine was interested in them and did at least one study that I know of[1], but information on speed and legibility would be fascinating.
I tried Pandoc before reverting to tex4ht. Unfortunately, it models a rather small subset of the things I was interested in, specifically around the typesetting of citations and listings, as far as I remember. So, tex4ht and HTML post-processing it was.
In general, mapping LaTeX to HTML is an unsolvable problem (and I speak as the author of one attempt to solve it (http://github.com/softcover/softcover)).