Do you have a link to a copy of the video with captions? YouTube autogen doesn't cut it unfortunately. Or perhaps a written-form version (slide deck + transcript)?
The remaining 8% mostly falls into the following categories:
- Code that uses OS functionality that is cumbersome to mock in tests
- Code paths that are triggered relatively rarely, which I was simply too lazy to add tests for
Nothing is impossible to cover, but for whatever reason it was too much work for me when I wrote the code.
However, it's worth mentioning that I only settled on the transcript test pattern fairly recently, and if I were to rewrite or refactor some of the untested code today I would add tests for it, because the cost of adding tests has been lowered considerably. So Elvish's test coverage is still increasing slowly as the cost of testing decreases.
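For anyone unfamiliar with the pattern: a transcript test is basically a recorded session of inputs and expected outputs that the test harness replays. This is not the actual Elvish code, just a rough sketch of the idea in Go (evalLine is a stand-in for the real evaluator):

    package transcript

    import (
        "strings"
        "testing"
    )

    // One recorded session: lines starting with "~> " are inputs, everything
    // else is the expected output of the preceding input.
    var session = []string{
        "~> echo foo",
        "foo",
        "~> echo a b",
        "a b",
    }

    func TestTranscript(t *testing.T) {
        var input string
        var want []string
        check := func() {
            if input == "" {
                return
            }
            if got := evalLine(input); got != strings.Join(want, "\n") {
                t.Errorf("input %q: got %q, want %q", input, got, strings.Join(want, "\n"))
            }
        }
        for _, line := range session {
            if rest, ok := strings.CutPrefix(line, "~> "); ok {
                check() // finish the previous input/output pair
                input, want = rest, nil
            } else {
                want = append(want, line)
            }
        }
        check()
    }

    // evalLine stands in for the real evaluator; here it just handles echo so
    // that the sketch compiles and the example session passes.
    func evalLine(src string) string {
        return strings.TrimPrefix(src, "echo ")
    }

The nice part is that adding a test is mostly just pasting a session you already ran, which is what keeps the marginal cost of a new test low.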
There were a lot of aspects of this talk that I thought were really great. The willingness to try something unscripted, diving into the code repo live (e.g. to show where fuzzing is used), and the discussions of the reasoning behind the design choices. Great job @xiaq. This really makes me want to try elvish out, and I usually am quite skeptical of new shells.
haha, I can't present nearly as well as you, but maybe one day.
It's not easy to present though. I know on HN we see a lot of very clever people give some well executed presentations and it's sometimes easy to forget how much preparation and courage it takes to perform like that. And it's great to see how engaged people were with the content too.
Sorry, this is less of a question and more just a comment of appreciation.
Did you set your login shell to Elvish? Vim unfortunately relies on your shell being a POSIX shell, but you can fix that with "set shell=/bin/sh" in your vimrc.
This is meant as additional information, not criticism. I skimmed the transcript really fast, so if this is in there and I missed it, please correct me. Two things I think are helpful for people creating projects like this to be aware of:
- This video seems to combine the concepts of lexing and parsing. It is usually beneficial to separate these two steps and lex the input into tokens before passing to the parser.
- Go actually has a pure Go implementation of Yacc in the toolset, and I've used it in several projects to make parsers. Dealing with the Yacc file is often much easier than dealing with the code directly, since it takes care of writing the actual parser. There is a lot of boilerplate that goes into parsers, and when you use Yacc it "just works".
Edit: there are also some tools for writing lexers in a Lex/Flex-like syntax (re2c comes to mind), but I've found hand-writing lexers to be effective in Go if your language doesn't have many different types of tokens.
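To give an idea of what hand-writing a lexer looks like in Go, here's a rough sketch with an invented token set; real ones mostly just have more cases in the switch:

    package lexer

    import "unicode"

    type Kind int

    const (
        EOF Kind = iota
        IDENT
        NUMBER
        LPAREN
        RPAREN
    )

    type Token struct {
        Kind Kind
        Text string
    }

    // Lex turns the input into a flat list of tokens, discarding whitespace.
    func Lex(src string) []Token {
        var toks []Token
        rs := []rune(src)
        for i := 0; i < len(rs); {
            switch r := rs[i]; {
            case unicode.IsSpace(r):
                i++
            case r == '(':
                toks = append(toks, Token{LPAREN, "("})
                i++
            case r == ')':
                toks = append(toks, Token{RPAREN, ")"})
                i++
            case unicode.IsDigit(r):
                j := i
                for j < len(rs) && unicode.IsDigit(rs[j]) {
                    j++
                }
                toks = append(toks, Token{NUMBER, string(rs[i:j])})
                i = j
            default:
                // Anything else is an identifier-ish run of characters.
                j := i
                for j < len(rs) && !unicode.IsSpace(rs[j]) && rs[j] != '(' && rs[j] != ')' {
                    j++
                }
                toks = append(toks, Token{IDENT, string(rs[i:j])})
                i = j
            }
        }
        return append(toks, Token{EOF, ""})
    }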
Right, I may have forgotten to mention that lexerless parsers are somewhat unusual.
I didn't have much time in the talk to go into the reason, so here it is:
- You'll need a more complex lexer to parse a shell-like syntax. For example, one common thing you do with lexers is throw away whitespace, but shell syntax is whitespace-sensitive: "a$x" and "a $x" (double quotes not part of the code) are different things: the first is a single word containing a string concatenation, the second is two separate words (see the sketch below).
- If your parser backtracks a lot, lexing can improve performance: you're not going back characters, only tokens (and there are fewer tokens than characters). Elvish's parser doesn't backtrack. (It does use lookahead fairly liberally.)
Having a lexerless parser does mean that you have to deal with whitespace explicitly everywhere, though, and it can get a bit annoying. But personally I like the conceptual simplicity and not having to deal with silly tokens like LBRACE, LPAREN, PIPE.
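To make the whitespace point concrete, here's a very rough sketch (not the real Elvish parser) of parsing words directly from characters; whether $x glues onto the previous segment or starts a new word depends only on whether whitespace was consumed in between:

    package parse

    type Segment struct {
        IsVar bool   // true for $x, false for a bareword
        Text  string
    }

    // A Word is a concatenation of segments: "a$x" is one Word with two Segments.
    type Word []Segment

    // ParseWords parses a whitespace-separated list of words with no lexing step.
    func ParseWords(src string) []Word {
        var words []Word
        i := 0
        for i < len(src) {
            if src[i] == ' ' {
                // Consuming whitespace is what terminates the previous word.
                i++
                continue
            }
            var w Word
            for i < len(src) && src[i] != ' ' {
                isVar := src[i] == '$'
                if isVar {
                    i++ // skip the '$'
                }
                j := i
                for j < len(src) && src[j] != ' ' && src[j] != '$' {
                    j++
                }
                w = append(w, Segment{IsVar: isVar, Text: src[i:j]})
                i = j
            }
            words = append(words, w)
        }
        return words
    }

    // ParseWords("a$x")  => one word:  [{a} {$x}]
    // ParseWords("a $x") => two words: [{a}] and [{$x}]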
I have not used parser generators enough to comment about the benefits of using them compared to writing a parser by hand. The handwritten one works well so far :)
That example you gave could certainly be handled in Lex/Flex, and I assume in other lexers/tokenizers as well; for instance, you would probably use states and have "$x" in the initial state evaluate to a different token type than "$x" in the string state.
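The hand-rolled equivalent of those states would be something like this in Go (invented token names, just to show the state mechanism):

    package lexer

    type Kind int

    const (
        VAR     Kind = iota // $x at the top level
        STR_VAR             // $x inside "..."
        TEXT                // everything else, one character at a time (simplified)
    )

    type Token struct {
        Kind Kind
        Text string
    }

    func Lex(src string) []Token {
        const (
            initial = iota
            inString
        )
        state := initial
        var toks []Token
        for i := 0; i < len(src); {
            switch {
            case src[i] == '"':
                // Entering or leaving the string state.
                if state == initial {
                    state = inString
                } else {
                    state = initial
                }
                i++
            case src[i] == '$':
                j := i + 1
                for j < len(src) && src[j] != ' ' && src[j] != '"' && src[j] != '$' {
                    j++
                }
                kind := VAR
                if state == inString {
                    kind = STR_VAR
                }
                toks = append(toks, Token{kind, src[i+1 : j]})
                i = j
            default:
                toks = append(toks, Token{TEXT, string(src[i])})
                i++
            }
        }
        return toks
    }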
But I do get your meaning; I've written a lot of tokenizers by hand as well, and sometimes the hand-written code is easier to follow. Config files for grammars can get convoluted fast.
Again, I didn't mean it as criticism. But your talk title does start with "How to write a programming language and shell in Go", so given the title I think lexers/tokenizers are worth noting.
Yeah, ultimately there's an element of personal taste at play.
The authoritative tone of "how to write ..." is meant in jest, but obviously by doing that I risk being misunderstood. A more accurate title would be "how I wrote ...", but it's slightly boring and I was trying hard to get my talk proposal accepted you see :)
That's no problem in many modern lexers, as they usually have a "state": when you encounter "echo" you can switch to a new state, and that state may have different tokenization rules. So "if" in the "echo" state could be a string literal, whereas it may be a keyword in the initial state.
Lex/Flex takes care of that mostly for you, which is one of the benefits of using a well-worn lexer generator rather than rolling your own.
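If you were rolling it by hand in Go instead, the equivalent bookkeeping is roughly this (invented names, very simplified):

    package lexer

    import "strings"

    type Kind int

    const (
        KEYWORD Kind = iota
        WORD
    )

    type Token struct {
        Kind Kind
        Text string
    }

    var keywords = map[string]bool{"if": true, "while": true, "for": true}

    // Lex recognizes keywords only at the start of a command; once the first
    // word of a line has been read, everything else is a plain WORD.
    func Lex(src string) []Token {
        var toks []Token
        for _, line := range strings.Split(src, "\n") {
            atCommandStart := true // the "initial" state
            for _, w := range strings.Fields(line) {
                kind := WORD
                if atCommandStart && keywords[w] {
                    kind = KEYWORD
                }
                toks = append(toks, Token{kind, w})
                atCommandStart = false
            }
        }
        return toks
    }

    // Lex("if true") => KEYWORD(if) WORD(true)
    // Lex("echo if") => WORD(echo)  WORD(if)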
Unless I'm missing something, this should not be an issue: the lexer could emit "if" as an IF token, and the parser could treat args as STRING || IF (|| other keywords…)
That seems like it'd get really awkward pretty quickly. "if" isn't unique in this regard; there are about a hundred shell builtins, and all of them can be used as an argument to a command. (For example, "echo then complete command while true history" is a valid shell command consisting entirely of names of builtins, and the only keyword in it is the leading "echo".)
The problem lies with shells' extensive use of barewords. If you could eliminate the requirement that any bareword be treated as a string, parsing shell code would become much simpler... but also few people would want to use it, because nobody wants to quote every single word (e.g. "echo" "hello" instead of echo hello) in their interactive shell.
> Dealing with the Yacc file is often much easier than dealing with the code directly, since it takes care of writing the actual parser. There is a lot of boilerplate that goes into parsers, and when you use Yacc it "just works".
Honestly, I think this is overstating the amount of boilerplate in a parser and overstating how well a parser generator "just works". I haven't used Yacc, so maybe it's better than ANTLR, but having tried ANTLR and written a few recursive descent parsers I've been pretty well cured of wanting to ever use a parser generator. ANTLR's generated code is verbose, the data structures are hard to work with, and error handling leaves a lot to be desired.
Parser boilerplate can be reduced to a large extent with a good set of helper methods (I often find myself referring back to the set used in Crafting Interpreters [0]), and what you get in exchange is full control over the data structure generated by the parser and over the error handling. For a language that you're serious about, that tradeoff is totally worth it.
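For reference, that helper set is roughly the following, transliterated from the book's Java to Go (Token and TokenType here are placeholders for whatever your lexer produces, and the token slice is assumed to end with an EOF token):

    package parser

    import "fmt"

    type TokenType int

    const EOF TokenType = iota // plus whatever other token types you have

    type Token struct {
        Type   TokenType
        Lexeme string
    }

    type Parser struct {
        tokens  []Token // must end with an EOF token
        current int
    }

    // match consumes the next token if it is one of the given types.
    func (p *Parser) match(types ...TokenType) bool {
        for _, t := range types {
            if p.check(t) {
                p.advance()
                return true
            }
        }
        return false
    }

    // consume is like match, but reports an error if the expected token is missing.
    func (p *Parser) consume(t TokenType, msg string) (Token, error) {
        if p.check(t) {
            return p.advance(), nil
        }
        return Token{}, fmt.Errorf("%s (got %v)", msg, p.peek())
    }

    func (p *Parser) check(t TokenType) bool {
        return !p.isAtEnd() && p.peek().Type == t
    }

    func (p *Parser) advance() Token {
        if !p.isAtEnd() {
            p.current++
        }
        return p.previous()
    }

    func (p *Parser) isAtEnd() bool   { return p.peek().Type == EOF }
    func (p *Parser) peek() Token     { return p.tokens[p.current] }
    func (p *Parser) previous() Token { return p.tokens[p.current-1] }

Each production rule then becomes a method built from match/check/consume, and both the AST types and the error messages stay entirely under your control.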
Maybe it's just my skill level, but I've used both hand-rolled recursive-descent and ANTLR for the same project (Thrift parser), and hoo boy I would never go back to recursive-descent for that. ANTLR shrank my code by an order of magnitude, and cleaned up some bugs too.
I'd be willing to believe that beyond a certain level of input complexity, ANTLR no longer pays for itself. In my experience, there exists a class of languages for which there's no better tool.
I would love to see the diff between the hand-rolled recursive-descent parser and the ANTLR syntax!
I certainly feel the amount of boilerplate in my hand-rolled recursive-descent parser is manageable. Of course it's not as succinct as an EBNF grammar:
- For example, you have to write an actual loop (with "for" and looping conditions) instead of just * for repetition
- The Go formatter demands newlines in most control-flow constructs
- Go is also not the most succinct language in general
So you do end up with many more lines of code. But at the end of the day, the structure of each parsing function is remarkably similar to a production rule, and for simpler ones I can mentally map between them pretty easily, with the added benefit of being able to insert code anywhere if I need something beyond old-school context-free parsing.
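For example, a rule like list = item ("," item)* ends up as a function of roughly this shape (a sketch, not the actual Elvish code):

    package parse

    type parser struct {
        src string
        pos int
    }

    type ListNode struct {
        Items []string
    }

    // list = item ("," item)*
    func (p *parser) parseList() *ListNode {
        n := &ListNode{Items: []string{p.parseItem()}}
        // The * in the grammar becomes an explicit loop, and the formatter puts
        // the braces and body on their own lines; that's where the extra lines go.
        for p.pos < len(p.src) && p.src[p.pos] == ',' {
            p.pos++ // consume ","
            n.Items = append(n.Items, p.parseItem())
        }
        return n
    }

    // item = a run of characters other than ","
    func (p *parser) parseItem() string {
        start := p.pos
        for p.pos < len(p.src) && p.src[p.pos] != ',' {
            p.pos++
        }
        return p.src[start:p.pos]
    }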
> This video seems to combine the concepts of lexing and parsing. It is usually beneficial to separate these two steps and lex the input into tokens before passing to the parser.
Historically, yes. In recent years combined lexer-parsers have outperformed dedicated lexer + dedicated parser combinations, and with modern tooling this isn't the janky mess it used to be. Some of the best tools out there are combined lexer-parsers.
> - This video seems to combine the concepts of lexing and parsing. It is usually beneficial to separate these two steps and lex the input into tokens before passing to the parser.
With traditional techniques, yes. But if you e.g. use parser combinators (which would admittedly be a bit unusual in Go), combining both steps is pretty common.
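E.g. here's a minimal scannerless combinator sketch in Go (not from any particular library); the "lexer" is just more parsers built from the same primitives:

    package combinator

    import "unicode"

    // A Parser consumes some prefix of the input and either fails or returns
    // the parsed value plus the remaining input.
    type Parser[T any] func(input []rune) (value T, rest []rune, ok bool)

    // Rune matches a single rune satisfying pred.
    func Rune(pred func(rune) bool) Parser[rune] {
        return func(in []rune) (rune, []rune, bool) {
            if len(in) == 0 || !pred(in[0]) {
                return 0, in, false
            }
            return in[0], in[1:], true
        }
    }

    // Many1 applies p one or more times.
    func Many1[T any](p Parser[T]) Parser[[]T] {
        return func(in []rune) ([]T, []rune, bool) {
            var out []T
            for {
                v, rest, ok := p(in)
                if !ok {
                    break
                }
                out, in = append(out, v), rest
            }
            return out, in, len(out) > 0
        }
    }

    // Ident recognizes what a separate lexer would call an IDENT token, but it
    // is built directly from character-level parsers; there is no token stream.
    var Ident = Many1(Rune(unicode.IsLetter))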
> - Go actually has a pure Go implementation of Yacc in the toolset, and I've used it in several projects to make parsers. Dealing with the Yacc file is often much easier than dealing with the code directly, since it takes care of writing the actual parser. There is a lot of boilerplate that goes into parsers, and when you use Yacc it "just works".
You are right that it's best to avoid Go when you can. Just like Java folks (stereotypically) seemed to avoid writing Java at all costs and rather wrote XML config files to drive their logic.
Yacc (and Lex) are otherwise not a good choice for specifying languages these days.
As the sibling comment mentioned, you can find documentation on Elvish itself on the website https://elv.sh. There are tutorials and (not 100% but fairly complete) reference documents.
If you're interested in Elvish, you may also be interested in the talk on its design - https://www.youtube.com/watch?v=wrl9foNXdgM