Cool project! I like the idea of easily sharing LaTeX formulas. It's impressive how smoothly it works right in the browser.
I've always thought compiling LaTeX in WebAssembly would be a tough nut to crack, so I was curious if that's what you'd done here. Turns out you're using KaTeX.
Well, for one, KaTeX doesn't implement "LaTeX" but only a limited subset of TeX's equation syntax.
As such, it can't handle more complicated macros or typeset anything apart from equations.
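To make that concrete, here's my own illustration (not from the project) of the kind of document-level LaTeX that falls outside KaTeX's equation-only subset:

```latex
% Fine in KaTeX: plain equation syntax, e.g.
%   c = \pm\sqrt{a^2 + b^2}

% Outside KaTeX's subset: packages and document-level typesetting
\documentclass{article}
\usepackage{tikz}
\begin{document}
\begin{tikzpicture}
  \draw (0,0) circle (1); % drawing, tables, page layout etc. need real LaTeX
\end{tikzpicture}
\end{document}
```

Anything like this would indeed need a full TeX engine, which is where the WebAssembly route would come in.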
How does this compare to DuckDB/Polars? I wonder whether a GPU-based compute engine is a good idea: GPU memory is expensive and limited, and the bandwidth between the GPU and main memory isn't great either.
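Back-of-envelope on the bandwidth point (my own assumed numbers, not benchmarks): moving a card's worth of data over PCIe already costs the better part of a second, which a query has to amortize before the GPU wins anything.

```python
# Rough sketch: time just to transfer a large DataFrame host -> GPU.
# Assumptions: 24 GB of data (a high-end card's VRAM) over PCIe 4.0 x16
# at a realistic effective ~25 GB/s (theoretical peak is ~32 GB/s).
data_gb = 24
pcie_gbps = 25  # assumed effective host-to-device bandwidth

transfer_s = data_gb / pcie_gbps
print(f"transfer alone: {transfer_s:.2f} s")  # ~0.96 s before any compute
```

So a GPU engine really pays off when the data stays resident on the device across many operations, not for one-shot scans.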
The same group (Nvidia/Rapids) is working on a similar project but with Polars API compatibility instead of Pandas. It seems to be quite far from completion, though.
I've been watching CUDA since its introduction, and Polars since I had an intern port our Pandas code to it a couple of years ago, but I had no idea Polars would go this far, this fast!
I used to think reads are always faster than writes on an SSD. But in figures 5.3 and 5.4, read IOPS are lower than write IOPS.
When queue depth is low (qd=1), random 4K read IOPS are far lower than random 4K write IOPS (14.5 kIOPS vs 128 kIOPS). When queue depth is high (qd=32), read and write IOPS become similar, but reads are still slower (436 kIOPS vs 608 kIOPS).
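At qd=1 the IOPS numbers translate directly into average per-operation latency (latency ≈ 1/IOPS when only one request is in flight), which makes the gap easier to interpret:

```python
# Convert the qd=1 IOPS from the figures into average per-op latency.
read_iops = 14_500    # 4K random read, qd=1
write_iops = 128_000  # 4K random write, qd=1

read_lat_us = 1e6 / read_iops    # microseconds per read
write_lat_us = 1e6 / write_iops  # microseconds per write
print(f"read:  {read_lat_us:.1f} us/op")   # ~69 us: has to reach the NAND
print(f"write: {write_lat_us:.1f} us/op")  # ~7.8 us: can land in a fast cache
```

A ~69 µs read is in the ballpark of an actual NAND page read, while ~7.8 µs is far faster than NAND programming, which points at exactly the caching explanation below.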
I wonder why reads are slower than writes. Is it because the SSD has a fast write cache and completes the write request as soon as the data lands in that cache? Or does it simply report the data as written and flush it to flash in batches in the background?
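One way to test the cache hypothesis yourself is with fio: if writes are only fast because they land in the drive's cache, forcing a sync after every write should collapse write IOPS toward real NAND latency. A sketch of such a job file (device path and runtime are placeholders; writing to a raw device is destructive):

```
; fio job sketch -- compares cached vs. synced 4K random writes
[global]
filename=/dev/nvme0n1   ; placeholder; destructive for writes!
direct=1
bs=4k
iodepth=1
runtime=30
time_based=1
ioengine=libaio

[randwrite-cached]
rw=randwrite

[randwrite-synced]
stonewall               ; run after the first job finishes
rw=randwrite
fsync=1                 ; force data out of the write cache each op
```

If the synced job's IOPS drop by an order of magnitude, the fast qd=1 writes were indeed being absorbed by a cache.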
Have you considered any WebAssembly approaches?