Implementing a programming language in D: Lexical Analysis

landr0id · on Dec 28, 2015

Wow, perfect timing! I'm currently writing my first lexer/parser in D as well (http://github.com/landaire/hsc) for a Scheme-like language used in Halo 3. I found Rob Pike's talk on lexical scanning to be pretty useful (https://www.youtube.com/watch?v=HxaD_trXwRE) and I've modeled mine pretty heavily after the text/template lexer: https://golang.org/src/text/template/parse/lex.go.

dimgl · on Dec 28, 2015

I've never seen the D language before and I have to admit, it looks very elegant and has piqued my interest. Thanks for your post.

I'm wondering why you used D rather than Rust? Was it just that you were curious about the D language or is there something about Rust that you don't like?

jeremiep · on Dec 28, 2015

Out of all the modern systems-level languages (D, Nim, Rust, Go), D is by far my favorite.

I recently came upon this post and couldn't agree more with it: https://www.quora.com/Which-language-has-the-brightest-futur...

dom96 · on Dec 29, 2015

I see that you mentioned Nim. Have you used it much? I would be interested to hear your thoughts on the main features which make D your favourite over Nim.

jeremiep · on Dec 29, 2015

I haven't looked at Nim in a while actually (it was still called Nimrod back then). But I want to say right away that Nim is a great language!

I prefer meta-programming in D as I don't have to think at the AST level; it's practical in Lisp because the code is the AST, but everywhere else I like compile-time function evaluation better - it feels more natural. It's the same reason why I don't like Template Haskell.

D also has string mixins for the rare occasion when templates reach their limits. You get something akin to "eval" in JavaScript, but as a compile-time construct. Mixed with the other compile-time features of D (function evaluation, reading files, full reflection and more) it makes for the most powerful thing I know of after Lisp macros. I never have to write offline code analyzers and generators again!

I prefer interfacing with C in D because I don't have to mess with an FFI; I can copy/paste the C definitions in D, run a few Emacs macros and I'm good to go. (Although that's becoming a non-issue over time as more and more bindings are published!)

As far as I know (correct me if I'm wrong!) there's no way to declare pure functions in Nim. This is something I do all the time in D!

Nim has memory-safety only when using the GC while D has decoupled it from the allocation strategy. D's memory-safety is actually part of the type-system (as function annotations) and I absolutely love it!

That's about what comes to mind right now :)

Please note that these points merely make D a better fit for my own use-cases. I'm also biased from knowing D a whole lot better than Nim, so take what I say with a grain of salt ;)

dom96 · on Dec 29, 2015

Interesting points. I personally consider Nim's metaprogramming to be the best there is, although I must admit that I have not tried D's yet. Let me try to give you some reasons why I love Nim (I'll try to touch on all the points you've made about D) :)

Nim includes templates (declarative metaprogramming), and macros (procedural metaprogramming). The latter gives you access to Nim's AST. Macros (and compile-time functions) are evaluated using Nim's VM. The VM supports the full language, with the exception of FFI, but you do get special compile-time functions for reading files and executing external processes at compile-time so it's already very powerful!

Nim compiles to C/C++ so interfacing with C (and C++) could not get easier. I have not tried writing a macro for my editor which converts a C definition into Nim, but it sounds like a fun thing to implement.

> As far as I know (correct me if I'm wrong!) there's no way to declare pure functions in Nim. This is something I do all the time in D!

By pure I assume you mean "side-effect free"? If so there is: http://nim-lang.org/docs/manual.html#pragmas-nosideeffect-pr...

> Nim has memory-safety only when using the GC while D has decoupled it from the allocation strategy. D's memory-safety is actually part of the type-system (as function annotations) and I absolutely love it!

Going to need to look into that. Sounds awesome!

In any case, you should definitely give Nim another chance. I can certainly say that I will do the same for D :) Thank you for taking the time to write this up!

jeremiep · on Dec 29, 2015

I wasn't aware of the distinction between templates and macros in Nim, it does indeed sound powerful.

I'm also glad to see Nim having purity, it's incredibly hard to go back once you've tasted it!

The Emacs macros I mentioned aren't saved anywhere; the similarities in syntax between C and D make it very straightforward to write dumb macros on-the-fly and forget them afterwards. For the most part D improves on C's syntax by fixing a lot of its shortcomings and the macros merely reflect that.

One example would be to convert "#define foo 1" into "foo = 1," to be put inside an enum declaration. Another would be to remove the DLL_EXPORT references (as D does not require them). The function definitions themselves barely change at all :)

One thing I'm also curious about is compilation times. My D program still compiles in under 2-3 seconds even with recursive reflection of 100+ aggregate types, outputting meta-data and code for every single type and field (to handle serialization, generate on-screen editors, resolve dependency graphs and more). I haven't found any other systems language giving me this much power for so little compilation time.

This even includes transforming my regular expressions into D code during compilation so the regexes get the full power of the language's optimizer.

I'll definitely keep Nim in mind for my next pet project, right now I'm getting over 30k LoCs in my D project and really don't want to switch languages halfway through! :)

FraaJad · on Dec 28, 2015

I'm not the poster, but having tried my hand at both D and Rust, I chose D over Rust for my new project at $work. There are far fewer upfront mental gymnastics with D compared to Rust, and that is a nice benefit when you are trying to get things done.

I continue to dabble in Rust, but D is so much more comfortable.

felixangell1024 · on Dec 28, 2015

Hey! Yeah D is a really nice language. I have been poking around in Rust, but I don't feel I'm good enough at it to write a blog series with the language.

groovy2shoes · on Dec 28, 2015

If you want to see what a lexical analyzer looks like in Rust, here's one that I wrote for Lua:

https://github.com/aswyk/oxidation/blob/master/oxidation/src...

Of course, Rust's own lexer is also written in Rust.

ETA: I can't promise my lexer is actually any good... it's really the first non-trivial thing I've written in Rust. But it works ;)
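
For readers who just want the general shape of a hand-written lexer in Rust, here is a minimal sketch. It is not the linked Lua lexer and it does not follow Lua's grammar; the `Token` type, the `lex` function and the three token kinds are made up purely for illustration.

```rust
// Hypothetical token type: only identifiers, integers and one-character symbols.
#[derive(Debug, PartialEq)]
enum Token {
    Ident(String),
    Number(i64),
    Symbol(char),
}

// Walks the input once, peeking at the next character to decide which
// kind of token starts there, then consuming characters until it ends.
fn lex(input: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut chars = input.chars().peekable();

    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            // Whitespace only separates tokens.
            chars.next();
        } else if c.is_ascii_digit() {
            // Accumulate a decimal integer (saturating instead of overflowing).
            let mut value: i64 = 0;
            while let Some(d) = chars.peek().and_then(|ch| ch.to_digit(10)) {
                value = value.saturating_mul(10).saturating_add(i64::from(d));
                chars.next();
            }
            tokens.push(Token::Number(value));
        } else if c.is_alphabetic() || c == '_' {
            // Identifier: a letter or underscore followed by letters, digits, underscores.
            let mut name = String::new();
            while let Some(&ch) = chars.peek() {
                if ch.is_alphanumeric() || ch == '_' {
                    name.push(ch);
                    chars.next();
                } else {
                    break;
                }
            }
            tokens.push(Token::Ident(name));
        } else {
            // Anything else becomes a single-character symbol.
            tokens.push(Token::Symbol(c));
            chars.next();
        }
    }
    tokens
}
```

A real lexer would also track source positions and report errors rather than treating every unknown character as a symbol.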

gnuvince · on Dec 28, 2015

Thanks for that link! I'm currently writing a compiler for a minuscule language[1] and it's great to be able to look at how other people tackle the same problem and learn.

[1] https://github.com/gnuvince/minilang-rs/

thinkpad20 · on Dec 28, 2015

One advantage I could see of Rust over D for language stuff is sum types (enums in Rust). So for example you can write:

    enum MyLanguage {
      Var(String),
      Int(i32),
      Sum(Box<MyLanguage>, Box<MyLanguage>),
      Let(String, Box<MyLanguage>, Box<MyLanguage>)
    }

(Apologies if I got a few things wrong; I just mean the general idea)

Seems like without a construct like this, you'd have to use subclasses or something similar, which (to me) isn't quite as nice.
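
To make the general idea concrete, here is a rough sketch of how such an enum might be consumed with `match`. The `eval` function, the `HashMap` environment and the semantics chosen for `Let` (bind the name, then evaluate the body) are assumptions for illustration only, not part of any particular language implementation.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
enum MyLanguage {
    Var(String),
    Int(i32),
    Sum(Box<MyLanguage>, Box<MyLanguage>),
    Let(String, Box<MyLanguage>, Box<MyLanguage>),
}

// Hypothetical evaluator: returns None for an unbound variable or on overflow.
fn eval(expr: &MyLanguage, env: &HashMap<String, i32>) -> Option<i32> {
    match expr {
        // Look the variable up in the current environment.
        MyLanguage::Var(name) => env.get(name).copied(),
        MyLanguage::Int(n) => Some(*n),
        MyLanguage::Sum(lhs, rhs) => {
            let l = eval(lhs, env)?;
            let r = eval(rhs, env)?;
            l.checked_add(r)
        }
        // Evaluate the bound value, then the body in an extended environment.
        MyLanguage::Let(name, value, body) => {
            let v = eval(value, env)?;
            let mut inner = env.clone();
            inner.insert(name.clone(), v);
            eval(body, &inner)
        }
    }
}
```

The compiler checks that the `match` covers every variant, which is a large part of why this style is pleasant for ASTs.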

jeremiep · on Dec 28, 2015

There's std.variant in D which provides type-safe sum types.

    alias MyLanguage = Algebraic!(<insert types here>);

p0nce · on Dec 29, 2015

Like others said, this can be done as a library type in D: http://p0nce.github.io/d-idioms/#Recursive-Sum-Type-with-mat...

edit: not that it proves anything, just that "it can eventually be done".

tomjakubowski · on Dec 29, 2015

Can you name the variants to distinguish two variants which "carry" the same types? It wouldn't be as useful if you can't express something like:

    enum ConnState {
        Disconnected,
        Connecting,
        Connected(net::TcpSocket),
        Transferring(net::TcpSocket),
    }
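
A small sketch of why the names matter: the two socket-carrying variants hold the same payload type but are still told apart in a `match`. The `Socket` struct here is a hypothetical stand-in for a real socket type, and `describe` is an illustrative helper, so the example stays self-contained.

```rust
// Hypothetical stand-in for a real socket type.
#[derive(Debug)]
struct Socket {
    peer: String,
}

enum ConnState {
    Disconnected,
    Connecting,
    Connected(Socket),
    Transferring(Socket),
}

// Connected and Transferring carry the same type, yet each gets its own arm.
fn describe(state: &ConnState) -> String {
    match state {
        ConnState::Disconnected => "disconnected".to_string(),
        ConnState::Connecting => "connecting".to_string(),
        ConnState::Connected(s) => format!("connected to {}", s.peer),
        ConnState::Transferring(s) => format!("transferring with {}", s.peer),
    }
}
```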

jeremiep · on Dec 29, 2015

I would probably do it like this in D (disclaimer: I haven't tested this code!)

  import std.typecons;
  import std.socket;
  import std.variant;

  struct Disconnected {}
  struct Connecting {}

  alias Connected = Typedef!(TcpSocket, null, "connected");
  alias Transferring = Typedef!(TcpSocket, null, "transferring");

  alias ConnState = Algebraic!(Disconnected, Connecting, Connected, Transferring);

You could use plain structs instead of the typedef template for the same result.

p0nce · on Dec 29, 2015

I see what you mean.

You can create such new types with std.typecons.Typedef https://dlang.org/phobos/std_typecons.html#.Typedef

Though it's less pretty.

felixangell1024 · on Dec 28, 2015

Yeah, tagged unions and the pattern matching make things so much nicer. I love a lot of things about Rust, it's got a lot of things right.

ksherlock · on Dec 28, 2015

Another option is to use a lexer/state machine generator. Ragel supports D and Rust (as of the not-quite-released v7) as target languages.

nunull · on Dec 30, 2015

Nice read! I'd really like to know when we can expect the next post to be published.

felixangell1024 · on Jan 1, 2016

Thanks :) Second post is in the works, will probably be out next week when I have time around college :)