Yacc is dead: An update (might.net)
93 points by Autre on Dec 13, 2010 | 18 comments



People should take away from this paper the central claim: it offers a fun, easy, and startlingly practical way to implement a parsing library in your favorite language or environment. It is but a few hundred lines in some functional languages, and scarcely more in others. It is (per their plausible claims) a nice alternative to mucking up some dubious approximation with your regexp engine.
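To give a sense of how small the core really is, here is a hypothetical Python sketch (not the authors' Scala code) of Brzozowski's derivative-based matching for regular languages, the idea the paper extends to context-free grammars. A regex either matches the empty string or it doesn't (nullability), and its "derivative" with respect to a character c is the regex matching whatever can follow c:

```python
# Minimal Brzozowski-derivative regex matcher (illustrative sketch).
class Empty:                      # matches nothing
    pass
class Eps:                        # matches only the empty string
    pass
class Chr:
    def __init__(self, c): self.c = c
class Alt:
    def __init__(self, a, b): self.a, self.b = a, b
class Seq:
    def __init__(self, a, b): self.a, self.b = a, b
class Star:
    def __init__(self, a): self.a = a

def nullable(r):
    """Does r match the empty string?"""
    if isinstance(r, (Eps, Star)): return True
    if isinstance(r, (Empty, Chr)): return False
    if isinstance(r, Alt): return nullable(r.a) or nullable(r.b)
    return nullable(r.a) and nullable(r.b)   # Seq

def derive(r, c):
    """The language of words w such that c+w is in the language of r."""
    if isinstance(r, (Empty, Eps)): return Empty()
    if isinstance(r, Chr): return Eps() if r.c == c else Empty()
    if isinstance(r, Alt): return Alt(derive(r.a, c), derive(r.b, c))
    if isinstance(r, Star): return Seq(derive(r.a, c), Star(r.a))
    # Seq: derivative of the head, plus the tail's if the head is nullable
    d = Seq(derive(r.a, c), r.b)
    return Alt(d, derive(r.b, c)) if nullable(r.a) else d

def re_matches(r, s):
    for c in s:
        r = derive(r, c)
    return nullable(r)
```

For example, `re_matches(Seq(Chr('a'), Star(Chr('b'))), "abbb")` is true, i.e. the sketch recognizes the language of the regex `ab*`. A real library would add smart constructors to keep the derivatives small, but the whole trick fits on one screen.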

The "YACC is dead" claim of the original paper, and of this follow-up, is not so much "this is so good, everyone will stop using YACC". Rather, it's that people don't use YACC when they ought to because it is inconvenient -- but this derivative-based parsing hack is convenient in important ways that YACC fails to be.


    Over the past year, even in the little time we've had to
    work on the paper, we've learned a lot more about parsing
    with derivatives.

    In the week after the community found it, you all taught
    us ten times more than that. Thank you!
Why isn't all computer science research done like this?


/article author

In short, there is a paranoid mentality in academia that if people let their ideas go before formal publication, they'll be "scooped."

I'll admit to falling prey to that mentality too.

This experiment has made me rethink a lot of my assumptions.

I'd say that, in general, most papers wouldn't elicit a community response like this.

I think the fact that this paper was on parsing, a topic accessible and important to many, helped fuel the interest.

I bet if I arXived the rest of my rejected papers (all on static analysis of dynamic languages), there wouldn't be much of a commotion.

But, I'm intrigued enough to give it another shot. I'll write a blog post about another rejected paper in the near future, release the draft on arXiv, post the reviews and see what happens.

If it happens again, maybe "naked science" is a realistic proposition.


>In short, there is a paranoid mentality in academia that if people let their ideas go before formal publication, they'll be "scooped."

That is because academia is mainly about status seeking (see Robin Hanson's many posts on Overcoming Bias, http://www.overcomingbias.com/), which is inherently a zero-sum game.

ADDED: That is about academia as a social institution; it is not intended as an attack on all academics all the time, though any successful academic does have to accommodate to it.


You also picked a pretty troll-worthy title (irrespective of content).

We could build a Drudge Report for CS, but I somehow don't think that'd be useful to the community :)

(I found the paper interesting and useful, though! Reading through the Scala has been instructive.)


Too true about the troll-worthy title! I'm sure that helped.

We never got around to picking a more "academic" title, so we put it on arXiv under the working title.


The claim strikes me as a bit premature given the amount of Yacc-based code there is in the world, busily producing away day in and day out. If the algorithm is what it is said to be, and I've no reason to doubt that, then at some point we will all be blessed with a Yacc replacement. Since I've always found Yacc to be a PIA, I certainly won't miss the old regime, but since there is nothing to use at the moment, I'm going to hold off a bit...


I have found hand-written parsers to be easier to write and easier to maintain than parsers generated by Yacc or its descendants.


There's nothing to use? What about ANTLR?


Discussion for the previous post: http://news.ycombinator.com/item?id=1948048


I understand almost nothing of this. It sounds cool, though. There is so much math and Lisp...

But I want to learn and if possible even help. What do I have to learn to understand this and maybe implement it in my own languages?


I created a list of articles a few years ago: http://www.delicious.com/akkartik/parse

Intense in parts, but they are all less dry than a textbook.

(I did have some background, though, so perhaps you'll need some prior reading..)


You need to know

1. Context Free Grammars (essentially regular expressions that can be recursive)

2. Two kinds of parsing techniques: LL and LR. LR is much harder to implement; LL is straightforward recursive descent. You don't have to actually study LR parsing.

3. Tools like yacc (an LALR parser generator; LALR is a weaker but more efficient form of LR) can be used to generate parser code in C or your favorite language. However, parser combinator libraries are often preferred over code generators. These libraries are typically easier to write in functional languages, but tend to be LL parsers that don't handle left recursion well.

4. This paper proposes a technique that lets you use left recursion in a parser combinator library. So, the only reason you would now use yacc is parsing speed.
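To make point 4 concrete, here is a hypothetical Python sketch (the paper's implementation is in Scala, and uses laziness plus compaction that this sketch omits) of a derivative-based recognizer that tolerates a left-recursive grammar. The grammar is a possibly cyclic graph of nodes; nullability is computed as a least fixed point, and the derivative memoizes per character so that a grammar referring to itself produces a derivative referring to itself:

```python
class Node:
    """Grammar node: kind in {'empty', 'eps', 'char', 'alt', 'seq'}."""
    def __init__(self, kind, *args):
        self.kind, self.args = kind, list(args)

EMPTY, EPS = Node('empty'), Node('eps')

def reachable(n):
    """All nodes reachable from n (the grammar may be cyclic)."""
    seen, stack = {}, [n]
    while stack:
        m = stack.pop()
        if id(m) in seen:
            continue
        seen[id(m)] = m
        stack.extend(a for a in m.args if isinstance(a, Node))
    return list(seen.values())

def nullable(n):
    """Least fixed point: start everything at False, iterate to stability."""
    nodes = reachable(n)
    null = {id(m): m.kind == 'eps' for m in nodes}
    changed = True
    while changed:
        changed = False
        for m in nodes:
            if null[id(m)] or m.kind not in ('alt', 'seq'):
                continue
            vals = [null[id(a)] for a in m.args]
            if any(vals) if m.kind == 'alt' else all(vals):
                null[id(m)] = changed = True
    return null[id(n)]

def derive(n, c, memo):
    """Derivative of grammar n w.r.t. character c. The memo ties the knot:
    a placeholder is inserted before recursing, then filled in, so
    (left-)recursive grammars yield recursive derivatives."""
    if id(n) in memo:
        return memo[id(n)]
    if n.kind in ('empty', 'eps'):
        return EMPTY
    if n.kind == 'char':
        return EPS if n.args[0] == c else EMPTY
    res = Node(n.kind)
    memo[id(n)] = res
    if n.kind == 'alt':
        res.args = [derive(a, c, memo) for a in n.args]
    else:  # seq: D(ab) = D(a)b, plus D(b) when a is nullable
        first, second = n.args
        da_b = Node('seq', derive(first, c, memo), second)
        if nullable(first):
            res.kind, res.args = 'alt', [da_b, derive(second, c, memo)]
        else:
            res.args = da_b.args
    return res

def matches(g, s):
    for c in s:
        g = derive(g, c, {})
    return nullable(g)

# A left-recursive grammar: E -> E '+' 'n' | 'n'
E = Node('alt')
E.args = [Node('seq', Node('seq', E, Node('char', '+')), Node('char', 'n')),
          Node('char', 'n')]
```

With this, `matches(E, "n+n+n")` is true and `matches(E, "n+")` is false, even though a naive recursive-descent combinator for E would loop forever on the left recursion. Without the paper's compaction step the derived graphs grow with input length, which is where the performance work comes in.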


> So, the only reason you would now use yacc is parsing speed.

"speed" means a couple different things. Someone might say the only reason you would use C++ rather than Python is speed -- i.e. you can always get within a constant factor with Python. Ignoring other aspects of the languages, let's call that true.

But Russ Cox is saying that you would also use yacc when you want a guaranteed complexity bound on the parsing algorithm (a linear algorithm). That is, the technique they advocate will produce parsers more than a constant factor slower than yacc.

Frankly I don't understand their rebuttal at all. The rebuttal is that it doesn't matter for practical purposes? That doesn't sound like computer science to me.

And practically speaking, it does matter, because plenty of people write exponential time regexps and don't know it, and it will blow up in their face on large input.
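For instance, the classic nested-quantifier blowup can be demonstrated in a few lines of Python (an illustrative demo; the pattern and inputs are made up):

```python
import re
import time

# Nested quantifiers make a backtracking engine try exponentially many
# ways to split the run of a's before giving up on the trailing mismatch.
pattern = re.compile(r'(a+)+$')

for n in (10, 15, 20):
    s = 'a' * n + 'b'          # almost matches, so backtracking is maximal
    t0 = time.perf_counter()
    assert pattern.match(s) is None
    print(f'n={n}: {time.perf_counter() - t0:.4f}s')
```

Each extra `a` roughly doubles the running time; the pattern itself is harmless on matching input like `'a' * n`, which is exactly why people ship these bugs without noticing.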


> "the only reason you would now use yacc is parsing speed"

Wikipedia says another advantage of LR parsing over LL is the error messages are more localized, and therefore relevant. I wonder whether derivatives in an LL parser (e.g. combinators) would help with that disadvantage?

If not, and because derivatives run quickly in practice on correct input but approach their theoretical exponential worst case on random input, perhaps the best use for derivative grammars is giving real-time feedback when typing code into an IDE which enforces correct input and editing (e.g. always producing a corresponding correctly-placed close bracket in the code somewhere when the programmer types in the open bracket), and only allows previously entered source code to be edited within the IDE. Such a use would give decent parsing speed and no need for error messages, the very function parsing derivatives are good at (assuming this research is correct).


Everything you need to know to understand the algorithm would be covered in a mandatory "intro to theory of computation" or "intro to automata" course in a CS undergrad degree. You could start by reading the first half of a good textbook: we used the book by Hopcroft, Ullman and Motwani: http://infolab.stanford.edu/~ullman/ialc.html and I liked it a lot, but I've also heard good things about Sipser's book: http://www-math.mit.edu/~sipser/book.html


Almost everything I know about automata and formal languages I learned from Sipser's book or Wikipedia.


My class used Sipser, and it's generally clear and understandable.



