
Cool -- I'd never heard of ripper before (Googling for it doesn't produce much information either).

Can you comment on the differences/advantages of ripper, compared with RubyParser? For my use case I just want to get a sensible AST from some Ruby source code; a SAX-style interface is not necessary.




> Can you comment on the differences/advantages of ripper, compared with RubyParser?

Absolutely! To provide context, what got me working on Ripper was my undergraduate thesis: Laser, a Ruby static analyzer, written in Ruby. It targets Ruby 1.9 only. To find out more, see the GitHub repo: https://github.com/michaeledgar/laser/

My comparisons will be to the latest version of Ripper (the one in 1.9.3).

1. RubyParser only parses Ruby 1.8.x code - this was a dealbreaker for me, but won't be for some.

2. Ripper is primarily C: it's a separate set of action routines brutally hacked into the normal Ruby grammar file, which also makes its code pretty inscrutable. But it's very fast and integrated into the actual parser, so by design it should grow with the language. In practice, bugs have shown that this isn't as reliable as it could be.

3. Ripper does not attach comments to any nodes, whereas RubyParser provides them for def, singleton def, class, and module nodes. For my purposes I had to reconstruct this manually from the lexer stream, though I get the added benefit that I can attach comments to many (but not all) kinds of nodes. See the code here: https://github.com/michaeledgar/laser/blob/master/lib/laser/...
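To illustrate the lexer-stream approach, here is a minimal sketch. It is not Laser's actual code: the def_lines helper and the rule "a comment on line N documents a def starting on line N + 1" are assumptions made up for this example. Only Ripper.sexp and Ripper.lex come from the standard library.

```ruby
require 'ripper'

source = <<~RUBY
  # Adds two numbers.
  def add(a, b)
    a + b
  end
RUBY

# The s-expression tree carries no comment text at all.
tree = Ripper.sexp(source)
tree_has_comment = tree.flatten.any? { |x| x.is_a?(String) && x.include?('Adds two') }

# The lexer stream does: each token starts with [[line, col], type, text].
comments = Ripper.lex(source)
                 .select { |_pos, type, _text| type == :on_comment }
                 .map { |(line, _col), _type, text| [line, text.strip] }

# Hypothetical helper: map the starting line of every def to its method name.
def def_lines(node, found = {})
  return found unless node.is_a?(Array)
  if node[0] == :def
    _, (_, name, (line, _col)) = node
    found[line] = name
  end
  node.each { |child| def_lines(child, found) }
  found
end

# Hypothetical pairing rule: a comment on line N belongs to a def on line N + 1.
defs = def_lines(tree)
docs = comments.each_with_object({}) do |(line, text), acc|
  name = defs[line + 1]
  acc[name] = text if name
end

p tree_has_comment # prints false
p docs             # prints {"add"=>"# Adds two numbers."}
```

A real implementation would also need to handle multi-line comment blocks, blank lines between comment and node, and node types other than def.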

4. Ripper produces a true AST; in fact, it is sometimes closer to a concrete syntax tree, which makes working with it a bit harder. RubyParser has friendlier output, but goes too far in the other direction: it infers semantic information for you, sometimes (imo) quite egregiously.

4a. In RubyParser, constant literals - from numbers to regexes - are returned as objects, with no AST information. Ruby's own parser does this internally to save some execution time, but I don't believe that's appropriate for a general-purpose parser.

4b. RubyParser adds :scope nodes inside class, module, and def nodes, to indicate closed scopes. They don't need to be there: they aren't part of the syntax, they're a property of how you interpret the syntax as a Ruby program.

4c. The worst offender: if you parse "begin; rescue Foo => x; end", RubyParser actually inserts an assignment "x = $!" into the rescue block. That goes well beyond an AST.
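For comparison, a small sketch of what Ripper returns for the same kinds of input, using only Ripper.sexp from the standard library. The exact tree shape can differ between Ruby versions, so the rescue example only inspects which node types appear rather than printing the whole tree.

```ruby
require 'ripper'

# Literals stay as token nodes carrying the source text and [line, column].
int_tree = Ripper.sexp('42')
p int_tree # prints [:program, [[:@int, "42", [1, 0]]]]

# The rescue clause keeps the exception variable as a field node;
# no assignment node is synthesized for it.
rescue_tree = Ripper.sexp('begin; rescue Foo => x; end')
node_types = rescue_tree.flatten.grep(Symbol)

p node_types.include?(:rescue)    # prints true
p node_types.include?(:var_field) # prints true
p node_types.include?(:assign)    # prints false
```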

5. Sometimes RP's output is a bit inconsistent in the number of child nodes for a given node. For example, the :rescue node in "begin; foo; rescue; end" has two child nodes. In "begin; rescue; end", it has one child node: just the :resbody node. A proper tree would have at least a nil node for the begin body, but RP elides it. This means that whenever you see a :rescue node, you have to check the first child node's type before you can do anything with it. That's why consistency is important.
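As a rough check of the same two snippets on the Ripper side, the sketch below compares the shape of the :bodystmt node with and without a begin body. find_node is a hypothetical helper written for this example, not part of Ripper.

```ruby
require 'ripper'

# Hypothetical helper: depth-first search for the first node tagged `type`.
def find_node(node, type)
  return nil unless node.is_a?(Array)
  return node if node[0] == type

  node.each do |child|
    hit = find_node(child, type)
    return hit if hit
  end
  nil
end

with_body    = find_node(Ripper.sexp('begin; foo; rescue; end'), :bodystmt)
without_body = find_node(Ripper.sexp('begin; rescue; end'), :bodystmt)

# Both nodes have the same number of slots, and the rescue clause sits in
# the same slot, so no type check on the first child is needed.
p with_body.length == without_body.length # prints true
p with_body[2][0]                         # prints :rescue
p without_body[2][0]                      # prints :rescue
```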

6. RubyParser doesn't pick up errors as often and, as far as I can tell, doesn't report them at all. For example, parsing "def foo(x, x); end" in Ripper will give you a :param_error node. I'm not too happy with the exact reporting style, and I've blogged about it (http://carboni.ca/blog/p/ripper-plus-How-Ripper-Must-Change), but it at least notes the error so you don't have to, as a parser should for a parse-level error. RP doesn't note it. If you write an invalid global alias, "class A; alias $foo $1; end", RP just ignores the node, whereas Ripper will report it with an :alias_error node.
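A hedged sketch of checking for these errors with Ripper.sexp. How the error surfaces depends on the Ruby version: the 1.9.3 behaviour described above embeds an error node in the tree, while, as far as I know, newer versions return nil from Ripper.sexp when the source has a parse error. The error_report helper is hypothetical and handles both cases.

```ruby
require 'ripper'

# Hypothetical helper: returns :ok for clean source, :no_tree if Ripper.sexp
# gave up (nil), or the list of *_error node types found in the tree.
def error_report(source)
  tree = Ripper.sexp(source)
  return :no_tree if tree.nil?

  error_nodes = tree.flatten.grep(Symbol).select { |t| t.to_s.end_with?('_error') }
  error_nodes.empty? ? :ok : error_nodes
end

p error_report('def foo(x); end')             # prints :ok
p error_report('def foo(x, x); end')          # duplicate parameter name
p error_report('class A; alias $foo $1; end') # invalid global alias
```

Either way the caller learns that the source is invalid without having to re-implement the check.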

All told, RubyParser is a nice library and it's important to have an all-Ruby option out there. But for me, Ripper is far more appropriate from a theoretical and practical standpoint for a large-scale project.

Edit: grr, my formatting wasn't saved twice now.


Awesome! Thanks for the information.



