Shiki Syntax Highlighter (matsu.io)
123 points by stefankuehnel on March 7, 2023 | 30 comments



Question: why use TextMate instead of tree-sitter?

Tree-sitter's drawbacks compared to TextMate and other regex-based solutions are slower speed and the chance of edits breaking parse trees, neither of which is an issue here. Its advantages are that the grammar definition can be more expressive and is much more readable.

Not saying you should abandon using TextMate, "if it ain't broke..." and I doubt all of the grammars you support right now have as pretty tree-sitter equivalents. But maybe add support for tree-sitter as well, especially because it seems to be growing in popularity, with growing support in emacs and neovim.


The advantage is reusing existing themes. TextMate themes as a standard are the de facto not-great-but-used-everywhere thing and in this case it means you can style your blog or whatever with what you use in VSCode or Sublime or whatever.


Ironically, the last time I checked (which was some number of years ago), TextMate 2.0 did not support the TextMate 1.x theme format, not even via any sort of import or conversion mechanism. Which is odd, considering that it's become a de facto standard supported by other editors.

TextMate 1.x was the king of Mac text editors back in its day. Is anyone actually still using TextMate today?


I’m amazed I can’t remember whether I converted my custom TextMate theme when I started using 2.0! I also wish I’d realized the vivid colors in the theme (I called it “acid skittles”) were appealing to me for contrast, not their range of color spectrum.

I don’t use TextMate anymore but I surely would have stuck with it longer if it had a debugger and language server. I panned all of the non-native editors that came between, but sadly had to admit that VSCode fit my workflow a lot better.


Is tree-sitter really slower than TextMate grammars? Some benchmarks indicate that this isn't really the case [1]. On the other hand, breaking parse trees is a real issue, because the error-recovery in tree-sitter is pretty rudimentary [2][3], but as you said, it's not an issue for Shiki.

Several TextMate grammars suffer from inaccuracy bugs and maintainability issues. Perhaps the biggest hindrance to the adoption of tree-sitter is that the most popular editor, VSCode, still doesn't support it.

[1]: https://github.com/microsoft/vscode/pull/161479

[2]: https://github.com/tree-sitter/tree-sitter/issues/1870

[3]: https://github.com/tree-sitter/tree-sitter/issues/224


Tree-sitter is going to be a lot slower to load: https://github.com/tree-sitter/tree-sitter/issues/1942.

It's OK for things like an editor, but using it in an SSG, for example, is hard when just loading the syntaxes takes several times as long as rendering the whole thing with a regex-based highlighter.


I would use tree-sitter if it had more documentation and examples. For example, how to find all function declarations, all variable declarations, etc.
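
For what it's worth, the usual pattern is to walk the parse tree and collect nodes by their type. Here is a minimal sketch of that idea, assuming a simplified `ParsedNode` shape that I made up for illustration (it is not the real tree-sitter binding API) and assuming node type names like `function_declaration`, which depend on the grammar in use. Tree-sitter also has an S-expression query language where a pattern such as `(function_declaration) @fn` expresses the same thing more directly.

```typescript
// Hypothetical, simplified node shape for illustration only.
// Real tree-sitter bindings expose a richer node API.
interface ParsedNode {
  type: string; // grammar node type; names depend on the grammar
  text: string; // source text covered by this node
  children: ParsedNode[];
}

// Pre-order walk that collects every node whose type matches wantedType.
function findByType(
  node: ParsedNode,
  wantedType: string,
  found: ParsedNode[] = []
): ParsedNode[] {
  if (node.type === wantedType) found.push(node);
  for (const child of node.children) findByType(child, wantedType, found);
  return found;
}

// Hand-built tree standing in for real parser output.
const demoProgram: ParsedNode = {
  type: "program",
  text: "function outer() { function inner() {} } var count = 1;",
  children: [
    {
      type: "function_declaration",
      text: "function outer() { function inner() {} }",
      children: [
        { type: "function_declaration", text: "function inner() {}", children: [] },
      ],
    },
    { type: "variable_declaration", text: "var count = 1;", children: [] },
  ],
};

// Nested declarations are found too, in source order.
const demoFunctions = findByType(demoProgram, "function_declaration").map(
  (found) => found.text
);
```

Calling `findByType(demoProgram, "variable_declaration")` would collect the variable declarations the same way.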


The problem with Shiki is that it uses TextMate grammars which in turn require the Oniguruma regex engine. Shiki compiles that to WASM I believe, but it makes Shiki larger and slower than a system that uses plain JavaScript regexes or a real parser.

I'd only use it if it's absolutely necessary that your syntax highlighting exactly matches VS Code. (It's also a shame that VS Code requires Oniguruma)


> I'd only use it if it's absolutely necessary that your syntax highlighting exactly matches VS Code. (It's also a shame that VS Code requires Oniguruma)

This is more important than you state. I have tried many JS syntax highlighters and most of them are garbage. Like, they don't even recognize some obvious JavaScript keywords like `const`, `async` and `await`. I'd rather choose something with a better built-in grammar. I am not sure what the slowdown is like, but I don't imagine it'd be much. And with SSR, you can even offload it to a server and cache the generated HTML.
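
On the caching point, a minimal sketch of what I mean: wrap whatever highlighter you use in a small in-memory cache keyed by language and source text. The `Highlighter` type and `withCache` helper are names I made up for illustration, and the stand-in highlighter below is a placeholder, not Shiki's API.

```typescript
// Any function that turns source code into highlighted HTML.
type Highlighter = (code: string, lang: string) => string;

// Wraps a highlighter so identical (lang, code) inputs are rendered once.
function withCache(highlight: Highlighter): Highlighter {
  const cache = new Map<string, string>();
  return (code, lang) => {
    // JSON.stringify keeps the two parts of the key unambiguous.
    const key = JSON.stringify([lang, code]);
    const hit = cache.get(key);
    if (hit !== undefined) return hit;
    const html = highlight(code, lang);
    cache.set(key, html);
    return html;
  };
}

// Stand-in highlighter that counts how often it really runs.
// A real one would also escape HTML and emit token spans.
const demoStats = { calls: 0 };
const cachedHighlight = withCache((code, lang) => {
  demoStats.calls += 1;
  return `<pre class="lang-${lang}">${code}</pre>`;
});

const firstHtml = cachedHighlight("const a = 1", "js"); // rendered
const secondHtml = cachedHighlight("const a = 1", "js"); // served from the cache
```

An in-memory `Map` only helps within one process; for a blog or SSG you would probably persist the HTML next to the post instead.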


Coincidentally, I recently looked into the slow startup time for shiki and it was mostly from parsing JSONC, not WASM: https://github.com/shikijs/shiki/issues/439.


Hmm, I don't see why it's parsing it at all. It could just be inside the Rust source code. And it might be faster to run, not just faster to load.


Not all WASM files are in the same ballpark. This one is 500K while esbuild is almost 10MB.

https://cdn.jsdelivr.net/npm/shiki@0.14.1/dist/

https://cdn.jsdelivr.net/npm/esbuild-wasm/

That's no knock on esbuild. esbuild benefits from having been built with Go and wasn't designed to be run in the browser. (TinyGo can compile Go to a much smaller binary, but you give up some of Go's power.)

WASM is fast becoming a standard thing.


500k to mostly duplicate an existing platform feature? No thanks.


What feature are you referring to?


Regular expressions it seems. I disagree that it's a duplicate or a waste. It's a different regex engine with different features.
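
One concrete difference, as a small sketch: Oniguruma documents possessive quantifiers such as `a++`, while the built-in JavaScript `RegExp` rejects that syntax outright. The helper name `compilesAsJsRegex` is mine, and this only checks what the JavaScript engine accepts; it does not run Oniguruma.

```typescript
// Returns true if the pattern is accepted by the built-in JavaScript engine.
function compilesAsJsRegex(pattern: string): boolean {
  try {
    new RegExp(pattern);
    return true;
  } catch {
    // Invalid patterns throw a SyntaxError at construction time.
    return false;
  }
}

const plainOk = compilesAsJsRegex("a+b"); // ordinary greedy quantifier
const possessiveOk = compilesAsJsRegex("a++b"); // possessive quantifier syntax
```

Here `plainOk` is `true` and `possessiveOk` is `false`, so a grammar relying on that feature cannot simply be handed to the platform regex engine.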


Yes, I have also noticed that Shiki is slower. We use it in our blogging platform for code block highlighting and posts with code blocks are noticeably slower to save than posts without them.

Are there any alternatives that support VS Code syntax but are faster?


In the examples it doesn't seem to match VS Code at all. It's way less colourful and missing useful features like different colours for nested brackets.


There's a similar one for Emacs I know of, htmlize.el. It reuses the parsing already done by Emacs (not just by the major mode but by any minor mode too) and outputs HTML that exactly matches the syntax highlighting colors you see in your Emacs buffer.


I think it's normally used if you export an Org file with fragments of code. It can be invoked directly, too.


Yes! Using org-publish with htmlize is wonderful.



Is the name 式? I always found this word interesting, it can mean "mathematical term/expression", but also "form" (as in formalism) in general, and can also mean "ceremony" as in ritual form


Shiki is awesome for code samples. It’s even better paired with a set of light/dark themes designed to be used together (and I really need to get around to open sourcing my solution for swapping inline styles with classes for that use case, it’s great for using Shiki without a client side runtime).

ALSO awesome is Shiki Twoslash[1], for displaying TypeScript editor feedback in code examples.

1: https://github.com/shikijs/twoslash


Being out of the loop: what do other highlighters used in static websites use as engines?


Related and similar, there's also starry-night: https://github.com/wooorm/starry-night


This is really cool. I've been looking for a high-quality (TextMate-based) syntax highlighter for rendering source code on a poster. It would be great to integrate this library with ctags or something in order to perform accurate semantic highlighting for C/C++ (distinguishing between macros and functions, etc.).


Looks nice. Are there any tree-sitter grammar-based code highlighters too?


htmlize.el if you use tree-sitter based syntax highlighting in emacs.

I’ve written one myself, it’s pretty simple to walk a tree-sitter parse tree and generate spans with classes based on the type of each node. Then syntax highlighting is “just” a matter of writing a bit of CSS.
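
Roughly, the walk looks like the sketch below. It assumes a simplified `SyntaxNode` shape (type, byte offsets, children) that I am using for illustration rather than the actual tree-sitter binding API, and the `ts-` class prefix is arbitrary. Text not covered by any leaf node is emitted unstyled.

```typescript
// Hypothetical, simplified node shape; real tree-sitter bindings differ.
interface SyntaxNode {
  type: string; // node type, used here as the CSS class suffix
  startIndex: number; // offset where the node starts in the source
  endIndex: number; // offset just past the node's last character
  children: SyntaxNode[];
}

function escapeHtml(text: string): string {
  return text.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Leaves become <span class="ts-TYPE">; inner nodes just recurse.
function renderNode(node: SyntaxNode, source: string): string {
  if (node.children.length === 0) {
    const text = escapeHtml(source.slice(node.startIndex, node.endIndex));
    return `<span class="ts-${node.type}">${text}</span>`;
  }
  let html = "";
  let cursor = node.startIndex;
  for (const child of node.children) {
    // Copy text between children (whitespace, punctuation) unstyled.
    html += escapeHtml(source.slice(cursor, child.startIndex));
    html += renderNode(child, source);
    cursor = child.endIndex;
  }
  html += escapeHtml(source.slice(cursor, node.endIndex));
  return html;
}

// Hand-built tree standing in for real parser output.
const demoSource = 'let x = "<a>";';
const demoTree: SyntaxNode = {
  type: "program",
  startIndex: 0,
  endIndex: 14,
  children: [
    { type: "keyword", startIndex: 0, endIndex: 3, children: [] },
    { type: "identifier", startIndex: 4, endIndex: 5, children: [] },
    { type: "string", startIndex: 8, endIndex: 13, children: [] },
  ],
};
const demoHtml = renderNode(demoTree, demoSource);
```

The CSS side is then rules like `.ts-keyword { color: ... }`. With a real grammar, some node types are punctuation and would need mapping to valid class names first.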


It's cool; the SVG generation feature is quite useful for embedding code snippets into a diagram.


What, that's so cool. I was surprised to find the code sample to be runnable. Nice work.



