GNU Bison 3.3 released (savannah.gnu.org)
73 points by jrepinc on Jan 26, 2019 | 43 comments



"There are also many smaller improvements, including a fix for a bug which is at least 31 years old."


It's weird that web pages can succumb to link rot so quickly, and apps succumb to bit rot almost as quickly, but a bug will live on.

In a thousand years' time, will archaeologists study us through the bugs left behind in Linux 1300.05 and Windows (30)95? Do you think there will still be jobs for COBOL programmers?


> but a bug will live on.

Sometimes, yes. But it has also happened often to me that an old bug I was trying to fix was simply gone, either because the error mode has gone away, or the feature that the bug related to has been removed.


> But it has also happened often to me that an old bug I was trying to fix was simply gone, either because the error mode has gone away, or the feature that the bug related to has been removed.

And this is why groups like the OpenBSD team and the Linux kernel team are so eager to remove support for old hardware. Removing 80386 support cleaned up code that no longer needed special cases for opcodes and features the 80386 lacks but later x86 processors have.


It will be the "junk DNA" of the source code -- because of workarounds and baked in assumptions it will be impossible to eradicate.


Something similar is actually a plot point in Vernor Vinge's "A Fire Upon The Deep".


I wish it were easier to have more target languages. (And that there were more target languages.)

I know the docs say that additions are welcome, but I'm still at the beginning of working out how to :)

Congrats though! I love it when these tried-and-true tools continue to perform and improve!


I always kind of wonder what people use lex and yacc for. 20 years ago I made something to pretty print random C structs, but I have no idea what people do with these tools. Historically, these have been billed as "compiler compilers", and while that's certainly a use case, it never struck me as a very popular use case.


The main advantage, to me, is that the input of Yacc is a (mostly) self-contained, (mostly) declarative representation of the grammar to be parsed. This format is portable between projects.

You can, for example, take Postgres’s SQL syntax parsing “engine” out of Postgres and use it in your own project, because it’s not actually an “engine”, but rather just a Yacc grammar.
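For a sense of what that looks like, here's a toy sketch in the same style (not Postgres's actual gram.y, just an illustration of the self-contained, declarative format):

    /* Toy sketch of a Yacc grammar file -- not Postgres's actual
       gram.y, just the general shape of the declarative format. */
    %token SELECT FROM WHERE IDENT NUMBER
    %%
    select_stmt: SELECT column_list FROM IDENT opt_where ;
    column_list: IDENT
               | column_list ',' IDENT
               ;
    opt_where:   /* empty */
               | WHERE condition
               ;
    condition:   IDENT '=' NUMBER ;
    %%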


How often are yacc/lex grammars actually portable? Truly portable, too: usable in different threading contexts, for example.

You say they are declarative. My experience is admittedly very limited, but I would call them anything but declarative. There are declarative elements, but the actual payload of your code is very, very often imperative. (A lot of this is shaped by the API, imo).

I compare this to parser combinators (like Parsec) and PEGs (like LPeg and rust-peg). I find these other options are more naturally side-effect free, both internally and externally.


They are used to build a number of tools you probably rely on or use regularly, including ruby, golang, bash, MySQL and PostgreSQL.


golang uses a handwritten recursive descent parser and a handwritten lexer.

https://github.com/golang/gofrontend/blob/master/go/parse.cc
https://github.com/golang/gofrontend/blob/master/go/lex.cc

I can't think of a single major tool that uses ANTLR or bison. I say that being a huge ANTLR fan. Parser/lexer generators are really easy to use for DSLs but for real languages and tools, they're too heavyweight and constraining.


> Parser/lexer generators are really easy to use for DSLs but for real languages and tools, they're too heavyweight and constraining.

Haskell and OCaml are examples of real languages that use parser generators.

GHC: https://github.com/ghc/ghc/blob/master/compiler/parser/Parse...

Ocaml: https://github.com/ocaml/ocaml/blob/trunk/parsing/parser.mly


GCC used to use bison, and I don't think GCC became a major project only recently. You're correct that recursive descent is more efficient, but not everything is about efficiency (parsers are usually 1% of compile time, since they're dirt cheap compared to the code generator, and especially the static analyzer and optimizer). Bison makes parsing extremely simple. I can't see why anyone would use recursive descent unless they're a super major project and need to squeeze out every ms.


A big part of why tools move away from Bison and ANTLR isn't performance, but UX (especially error reporting).


In my experience recursive descent doesn't have any better error reporting than Bison or ANTLR. Parser combinators are much better if error reporting is what you're aiming for. Also, for LR parsers like Bison, the canonical trick is to encode errors in your grammar, which really isn't a huge deal. E.g., in pseudocode:

    suite: stmt ';' stmt {/* ok */}
         | one_line_stmt NEWLINE stmt {raise("you need a ';' there buddy");}
         | // other rules
         ;
Obviously this won't be useful in every case (otherwise why use ';' at all?), but in most cases it'll give a helpful error.
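Bison also has a built-in `error` token for this kind of recovery, which lets the parser resynchronize at a delimiter (a minimal sketch):

    stmt: expr ';'    { /* ok */ }
        | error ';'   { yyerrok;  /* on a syntax error, discard input up
                                     to the next ';', then resume normal
                                     parsing and error reporting */ }
        ;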

Some more powerful parsing algorithms, like Earley, have much worse error reporting.


Context-sensitive grammars.


You can deal with context-sensitive grammars with the "lexer hack" or multi-pass parsers.
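The classic instance is C's typedef ambiguity: the parser's actions record typedef names in a symbol table, and the lexer consults it so the same spelling comes back as a different token. A rough sketch (scan_identifier and symtab_is_type are hypothetical helpers):

    /* Rough sketch of the "lexer hack" for C's typedef ambiguity.
       scan_identifier() and symtab_is_type() are hypothetical helpers;
       the parser's actions populate the symbol table on typedefs. */
    enum { IDENTIFIER = 258, TYPE_NAME };  /* token codes, as yacc assigns */

    extern char *scan_identifier(void);             /* next word from input */
    extern int   symtab_is_type(const char *name);  /* known typedef name? */

    int yylex(void) {
        char *word = scan_identifier();
        return symtab_is_type(word) ? TYPE_NAME : IDENTIFIER;
    }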



That's the gccgo frontend; the more usual Go compiler is here: https://github.com/golang/go/tree/master/src/cmd/compile/int...

It's still handwritten, though, so your point stands.


Golang also has its own linker and assembler (going as far as having different notation for asm instructions), so it doesn't count in these comparisons.


PHP appears to. It was the first language that came up in my GitHub search, and I stopped looking there.



Hand-coding {object code, codecs, parsers, ...} is a disaster. That's why compilers exist. It's also why code generators exist. That's what lex/yacc (flex/bison) are: code generators specialized for generating parsers. They are not compiler-compilers, but tools used in the construction of compilers (and other things that aren't compilers too).


You use it when you need a parser or a lexer. Even GCC used to use it (they switched to a recursive descent parser a few years ago). Not sure what's surprising here. Maybe you're not familiar with parsers?


Yacc is a very important part of UNIX because it allows one to create a parser for a language with limited knowledge of parsing. Of course, if a language is very complicated, it is better to create a special purpose parser, but with yacc one can easily create parsers for most simple DSLs.
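To give a sense of scale, a complete interactive four-function calculator fits on one screen (a textbook-style sketch with a hand-rolled yylex):

    /* calc.y: textbook-style sketch of a tiny DSL in yacc/bison.
       Build with: bison calc.y && cc calc.tab.c -o calc */
    %{
    #include <ctype.h>
    #include <stdio.h>
    int yylex(void);
    void yyerror(const char *msg) { fprintf(stderr, "error: %s\n", msg); }
    %}
    %token NUM
    %left '+' '-'
    %left '*' '/'
    %%
    input: /* empty */
         | input line
         ;
    line:  expr '\n'        { printf("= %d\n", $1); }
        ;
    expr:  NUM
        |  expr '+' expr    { $$ = $1 + $3; }
        |  expr '-' expr    { $$ = $1 - $3; }
        |  expr '*' expr    { $$ = $1 * $3; }
        |  expr '/' expr    { $$ = $1 / $3; }
        |  '(' expr ')'     { $$ = $2; }
        ;
    %%
    int yylex(void) {            /* hand-rolled lexer: integers + operators */
        int c = getchar();
        while (c == ' ' || c == '\t') c = getchar();
        if (isdigit(c)) {
            ungetc(c, stdin);
            if (scanf("%d", &yylval) == 1) return NUM;
        }
        return c == EOF ? 0 : c; /* returning 0 means "end of input" */
    }
    int main(void) { return yyparse(); }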


> Yacc is a very important part of UNIX because it allows one to create a parser for a language with limited knowledge of parsing.

This isn't true. You basically need to know the entire algorithm Yacc uses to use it. What does someone with limited knowledge of parsing do when Yacc tells them this?

    5: reduce/reduce conflict (reduce 3, reduce 4) on c
    state 5
            B : x y .  (3)
            E : x y .  (4)
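(For reference, the grammar behind a diagnostic like that is shaped roughly like this sketch; after shifting `x y`, the parser cannot tell which rule to reduce by:)

    %token x y c
    %%
    start : B c        /* rule 1 */
          | E c        /* rule 2 */
          ;
    B : x y ;          /* rule 3 */
    E : x y ;          /* rule 4: after "x y", with c as lookahead,
                          should the parser reduce B or E? LALR(1)
                          cannot decide -- a reduce/reduce conflict. */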


I can't see the problem here. To use stdlib `rand` you need to understand that you should first call `srand(seed)` and why, and that the seed needs to be unique for each run. Then, in order to use it correctly (e.g. not in a physics simulation) you should understand its algorithm. Using programming tools requires reading the manual, and sometimes understanding algorithms.


> I can't see the problem here.

I didn't say it was a problem.


This unfortunately all comes down to what you mean by "basically" and what they meant by "limited".

I think what they said is closer to the truth. I've used flex/bison a few times to create simple languages, and as long as they're parsed correctly, which is quick and easy to roughly check with a REPL (mine were in that form), conflicts can be ignored. Mine worked fine. It was a joy to use: with both the flex and bison program windows open, plus bash to run the makefile, I could add a feature and test it within seconds. I'd say I have very limited knowledge of parsing, but I've read a few flex/bison books, enough to grasp what that message means. It's not necessary to have no conflicts, from what I understand. And it seems you have a more... professional language in mind, like in a commercial situation.

But then again, yacc/bison isn't very important *because* it allows one to create parsers with limited knowledge of parsing (i.e. that's not the reason it's important).


One thing is knowing about parsers, another completely different thing is knowing how to implement an efficient parser. This is the difference, e.g., between using a machine learning algorithm and implementing one from scratch.


They figure it out, I assume? LR parsing is accountancy, not rocket science. You don't need to be a genius.


Open Shading Language (from Sony) uses it to parse the, umm, shading language.

Pixie (the Renderman renderer) uses two or three bison grammars.

I personally like SQLite's Lemon parser generator for my grammar poking around when I'm not messing with my C++ Earley parser implementation -- which I can never seem to get working properly with empty rules, since the papers on the subject are less than clear.

But I'm probably a bit of a strange one in that I'll spend days getting a random grammar working then drop it for some shiny new thing to play with.


In the C family of languages, all the notable compilers either never used a parser generator and instead used simple recursive descent, or moved away from using one. It's easier to write, debug, and extend an RD (or table-driven precedence-climbing) parser than to work with the (large amounts of) generated code that tools like Bison create.
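For comparison, the recursive descent version of a small chunk of an expression grammar is plain C you can step through in a debugger (a minimal sketch; peek() and next() are hypothetical one-token-lookahead lexer calls):

    /* Minimal recursive descent sketch for: expr := term (('+'|'-') term)*
       One function per nonterminal; peek()/next() are hypothetical lexer
       calls returning the lookahead token / consuming a token. */
    extern int peek(void);
    extern int next(void);
    int parse_term(void);          /* parses a term, returns its value */

    int parse_expr(void) {
        int value = parse_term();
        while (peek() == '+' || peek() == '-') {
            int op = next();                   /* consume the operator */
            int rhs = parse_term();
            value = (op == '+') ? value + rhs : value - rhs;
        }
        return value;
    }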


In general, I don't think so. It's one thing if you parse a well understood grammar like C. But many people who use recursive descent from the start for a new language or DSL just don't bother about the grammar, and when someone takes a look it is suddenly LL(4) and relies on whatever the hand-written parser does.


So what? I personally am a fan of recursive descent parsing for most programming languages, and if it works with recursive descent, why would it be troubling that it's actually an LL(4) grammar?

Now it would only be troubling if the grammar itself is intended to be widely usable across different languages, say JSON or XML. A parser generator really shines because it can generate code to parse this grammar in multiple languages. For programming languages intended to be written by humans, I value things like good error reporting far more.


I'd assumed VimScript was written using Flex/Bison; turns out it's written directly in C: https://github.com/vim/vim/blob/master/src/eval.c


I would guess that a lot of your cell data are processed by code generators that use the lex/yacc toolchain. While perhaps they're not commonly used in implementations, they have a large number of deployments.


I mean, they're basically used whenever you need either a lexer or a parser, or both. So yeah, there are quite a number of possible use cases.


lex and yacc are also used by the ircd-hybrid and ircu families of IRCds to parse their configuration files. OpenBSD also uses yacc for the configuration files of pf, doas, httpd, etc.


yecc, the Erlang port of yacc, is used by the Elixir parser.


SQL parsers, like those in Postgres and MySQL.


I think a lot of people just use parser combinators, which I personally find a lot easier to use and much more powerful. Code as data, data as code, yadda yadda.



