Regular Expressions in 100 lines of Scala (rcoh.svbtle.com)
40 points by rusbus on Sept 13, 2014 | 21 comments



It's not so easy to see in the linked article, but the article is actually in 3 parts:

Part 1: http://rcoh.svbtle.com/no-magic-regular-expressions

Part 2: http://rcoh.svbtle.com/regular-expressions-part-2

Part 3: http://rcoh.svbtle.com/no-magic-regular-expressions-part-3

It's really well done!


There is a similar chapter [1] in the book Beautiful Code, in which Brian Kernighan explains how to implement a regular expression matcher using C code written by Rob Pike.

[1]: http://www.cs.princeton.edu/courses/archive/spr09/cos333/bea...


See also the parser combinator approach:

Regexp engine in 14 lines of Python: https://news.ycombinator.com/item?id=3202313


Btw, is Scala still relevant with the coming of Java 9?


Java is improving, but it's got a long, long way to go before it can catch Scala. Can you write data types in Java 9 as simply as case classes? Does type inference just work? Is there something as concise as pattern matching? Does it have typeclasses, or an equivalent solution to the expression problem? Can you write DSLs with the flexibility of Scala's for syntax? Can you write tagged types? Even if it has all these features, does it have them in a unified way, implemented by consistently combining some simple primitives, or is it a mess of special cases?


Considering that Java still has about 95% market share on the JVM, I'd say Scala has a long way to go before it can catch up to Java. Same observation for jobs and tooling.

It really all depends on what criterion you use.


The grandparent asked explicitly about Java 8 and 9. What are the market share, jobs and tooling like for those versions?

As someone who's had to push for both, many organizations find introducing Scala easier than doing a JVM upgrade (since the latter requires work on the ops side). Last time I looked, e.g. the New Relic agent would crash if you tried to run it under Java 8 - whereas it will profile your Play transactions just fine.

Popularity is a lagging indicator. The tooling is there already, quite frankly - we see first-class support for Scala in new efforts like Takipi. Jobs are what you make them - more than once I've taken a Java job and turned it into a Scala job. I honestly believe it's the best general-purpose language going right now (which is why I use it, and why I have a job doing it full-time).


I definitely think so. Why wouldn't it be? Do you think Java 8 or 9 would be able to solve OP's problem with as much expressiveness as Scala?


Not completely, but there was (and is) a line of advice to use Scala as a better Java. Java keeps getting better with each version, so I believe that argument for Scala, at least, loses strength.


Scala is still a much better java than even Java 9. Even if you don't want to go FP crazy, it still makes sense to at least try scala.


Sean, you've written in the past that you tried Scala 7+ years ago; any reason beyond working for M$ that you don't give it another look? It's certainly improved a ton since your explorations during the stone ages of Scala ;-)

For example, you probably were never able to take advantage of type classes, (G)ADTs, value classes, recursive types, exhaustive pattern matching, etc., etc. back in the day.

Would be nice to have an OO expert/advocate on the Scala side of the fence to provide a counter point to the FP wave.


I have nothing personally to write for the JVM...or I would use Scala. That's what I meant by "better Java." Besides, the Scala community might not be a great place for an OO advocate; Dart and TypeScript, or maybe Swift, are more of the future for us.


The Scala community is, and always has been, quite heterogeneous. There's a vocal, and sometimes quite abrasive, group of fundamentalists on HN and Twitter. But the Scala community is much more diverse than what you would hear from them.


There is a bit of difference between OO people who use Scala and OO people who influence it (through implementation work or design input); I'm not seeing much of the latter these days as an outsider, but I admit I might not be looking hard enough.


Another factor is that Scala runs on JRE 6 and 7. There's still a huge amount of Java 6 in production environments, and much much less Java 8.


You're asking that question about two years too early.


The title is a little misleading - perhaps better would be "A very limited subset of Regular Expressions in 100 lines of Scala".

While the code is concise, this is more a case of "it implements A and B of an X" than "it implements an X, apart from A and B".

Still, all three parts of the series are a good read, and definitely helpful for learning about parsing and about Scala.


Mathematically, regular expressions are limited in comparison to what programmers commonly call 'regular expressions' or 'regexes'. Mathematical regular expressions can always be translated into finite automata; regex-style 'regular expressions' often cannot.

The advantage of mathematical regular expressions is that, once compiled to a deterministic automaton, they match in O(n) time in the length of the input string. The disadvantage is that they lack features like capture groups and look ahead/behind. The important point is that engines like those of Perl or Java are not based on regular expressions in the mathematical sense. This is true of any regex engine that backtracks or stores state beyond the automaton's current state.
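
To make the O(n) point concrete, here is a minimal sketch of matching with a deterministic automaton. The DFA is hand-built for the pattern (a|b)*abb, and the names `TRANSITIONS`, `ACCEPTING` and `dfa_match` are made up for illustration. It assumes the regex has already been compiled to a DFA, which is the step a real engine would do for you.

```python
# Hand-built DFA for (a|b)*abb. State k means "the last k characters
# read form the longest prefix of 'abb' seen so far".
TRANSITIONS = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 1, (2, "b"): 3,
    (3, "a"): 1, (3, "b"): 0,
}
ACCEPTING = {3}


def dfa_match(text):
    """Return True if the whole of text is in the language (a|b)*abb."""
    state = 0
    for ch in text:
        # Exactly one table lookup per input character: no backtracking,
        # and no state kept other than the current state number.
        state = TRANSITIONS.get((state, ch))
        if state is None:
            return False  # character outside the alphabet {a, b}
    return state in ACCEPTING
```

The loop does a constant amount of work per character, which is where the linear bound comes from. A backtracking engine can revisit the same input position many times instead.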


Regular expressions just need 3 things: sequence, alternatives (|), and repetition (*). Everything else is an extension.


And the empty versions/identities of those operations


Right, those three things plus three times nothing. 3 + 3*0 = 3
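
For what it's worth, the three operations plus their identities fit in a few dozen lines. This is a rough sketch using Brzozowski derivatives, which is not the approach the linked article takes. All the class and function names here are invented for illustration. `Empty` matches nothing (the identity for alternation) and `Eps` matches only the empty string (the identity for sequence).

```python
from dataclasses import dataclass


class Re:
    """Base class for regular expression nodes."""


@dataclass(frozen=True)
class Empty(Re):  # matches no string at all
    pass


@dataclass(frozen=True)
class Eps(Re):  # matches only the empty string
    pass


@dataclass(frozen=True)
class Char(Re):  # matches one literal character
    c: str


@dataclass(frozen=True)
class Seq(Re):  # first followed by second
    first: Re
    second: Re


@dataclass(frozen=True)
class Alt(Re):  # left | right
    left: Re
    right: Re


@dataclass(frozen=True)
class Star(Re):  # zero or more repetitions of inner
    inner: Re


def nullable(r):
    """True if r matches the empty string."""
    if isinstance(r, (Eps, Star)):
        return True
    if isinstance(r, Seq):
        return nullable(r.first) and nullable(r.second)
    if isinstance(r, Alt):
        return nullable(r.left) or nullable(r.right)
    return False  # Empty and Char


def derive(r, ch):
    """Regex matching whatever may follow ch in a string matched by r."""
    if isinstance(r, Char):
        return Eps() if r.c == ch else Empty()
    if isinstance(r, Seq):
        head = Seq(derive(r.first, ch), r.second)
        # If first can match "", ch may also be consumed by second.
        return Alt(head, derive(r.second, ch)) if nullable(r.first) else head
    if isinstance(r, Alt):
        return Alt(derive(r.left, ch), derive(r.right, ch))
    if isinstance(r, Star):
        return Seq(derive(r.inner, ch), r)
    return Empty()  # Empty and Eps have nothing left to consume


def matches(r, text):
    """True if r matches the whole of text."""
    for ch in text:
        r = derive(r, ch)
    return nullable(r)
```

For example, `matches(Seq(Star(Alt(Char("a"), Char("b"))), Char("c")), "abac")` checks the string against (a|b)*c. The sketch does no simplification of the derived expressions, so they grow with the input. A real implementation would normalize them.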



