Do you have a problem? Write a compiler (oleg.fi)
327 points by lelf on Oct 3, 2019 | 139 comments



To people saying that writing a compiler is almost always overkill - that's because you and the author mean different things by the word "compiler". Or, in other words, it's because your intuition is based on the inflexible languages you're working with.

It's not an accident that the talk/article is about Clojure[0]. A Lisp. In Lisp, "writing a compiler" is a fancier way of saying "writing a possibly code-walking macro", which is still just a fancy way of saying "writing a function that transforms a tree into another tree, and having it run at compile time on parsed program source as the input". 99% of the usual compiler work - tokenizing, building an AST, generating machine code/bytecode, optimizing - is handled for you by the host Lisp compiler. All the mechanics are there, and all you have to do is add the brains you need.
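
Concretely, here's a minimal sketch of such a tree-to-tree "compiler" in Common Lisp (FOLD is a made-up name for illustration; a real code walker would handle more cases):

    ;; Folds constant arithmetic at macroexpansion time, before the
    ;; host compiler ever sees the form.
    (defmacro fold (expr)
      (labels ((walk (e)
                 (if (consp e)
                     (let ((args (mapcar #'walk (cdr e))))
                       (if (and (member (car e) '(+ - * /))
                                (every #'numberp args))
                           (apply (car e) args)    ; fold this subtree
                           (cons (car e) args)))   ; rebuild with folded children
                     e)))
        (walk expr)))

    ;; (fold (+ 1 2 (* 3 4))) expands to the literal 15;
    ;; (fold (+ x (* 3 4))) expands to (+ X 12).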

--

[0] - Clojure itself actually makes this slightly more difficult than it looks, because of decisions it made that make the syntax slightly less obviously homoiconic.


There is a joke about Lisp and its dialects: Lispers don't write their code in Lisp; they write, in Lisp, the language they want to write their code in.

Then they solve the problem in that language. It's often not literally true, but writing DSLs in a Lisp is technically much simpler than in other languages, I reckon, if your problem domain actually requires such a thing.


Lisp is not alone in that regard; to be honest, I suspect it is even more true of FORTH. In FORTH you define words, which you then write your program in. These words very quickly become problem/domain-specific.


Speaking as someone who has written maybe 100 lines of code in both languages, thus someone who should not be taken too seriously, there seem to be a lot of similarities between Lisp and Forth.


Speaking as someone who has written maybe 10000 lines of code in both languages, and may be worth taking a little more seriously, the similarities are there but the macros in Forth are just somehow off.

There is extremely little written online about Forth macros. The best book I've found on the subject is Let Over Lambda, whose latter chapters talk about Forth macros and how it's possibly the best macro system second only to Lisp. And yet I don't know of anyone, outside possibly Chuck Moore in his "Glow" system for CAD for circuits, that can exploit layer upon layer of Forth macros.

The beauty of Lisp macros, when you really dive to the bottom of it, is not what it appears to be. It's not about being able to turn one thing into another, no matter how much flexibility it allows in doing so. Any language can do that. JavaScript, for one: just use strings and eval and you can create interesting macros. It's not about a domain-specific language: every fucking language lets you define words in it with which you'll do the work, and often a derived syntax too (operator overloading). No. And it's not about brevity: because of S-expressions, Lisp is really not that brief compared to other languages (like C), when you are doing everything right.

The beauty of Lisp macros is simply and only that you can do all these things over and over again, layer upon layer, "ten or twenty layers of macros deep" (to paraphrase Paul Graham), with each layer making the next layer no more difficult to work with. Nothing else does that. Not even Forth.
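
To make "layer upon layer" concrete, a minimal sketch in Common Lisp (WITH-LOGGING and WITH-TIMED-LOGGING are made-up names; a real version would GENSYM the START variable to avoid capture):

    (defmacro with-logging (tag &body body)
      `(progn (format t "~&start ~a~%" ,tag)
              (prog1 (progn ,@body)
                (format t "~&end ~a~%" ,tag))))

    ;; Second layer, written in terms of the first - and no harder
    ;; to use than the first.
    (defmacro with-timed-logging (tag &body body)
      `(with-logging ,tag
         (let ((start (get-internal-real-time)))
           (prog1 (progn ,@body)
             (format t "~&took ~a ticks~%"
                     (- (get-internal-real-time) start))))))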

And a note on homoiconicity: it appears standard Forth is homoiconic... maybe. Mostly. Not transparently so like Lisp (see above), where no matter how much you exploit homoiconicity, your code doesn't degrade in structure in a way that would prevent you from doing so n+1 times [1]. No, in Forth you get mostly-homoiconicity, but with caveats. However, F18 code, featured in the GA144 chip, is basically homoiconic, both because it's Forth with pointers and because it's assembly, and it does allow a lot of macro-like behavior, including an 18-bit self-replicating virus. It's nice. But it's still not Lisp.


+1 to this. I did this literally last week for a toy project and it was hilarious to be reminded of how easy it was.

I know I'm just rehashing the above comment, but: if your target language is complicated, and your host language is complicated, sure, it can be a pain. If they're both lisps, you can almost do it in your sleep.


Writing a compiler is still almost always overkill, because implementing the compiler is one of the smallest parts of creating an effective programming language.

Let's say you're starting a project, and you're choosing a language to write it in. You have two options:

Language A:

1. Has multiple mature compiler/interpreter implementations.

2. Has lots of tooling built around it.

3. Has extensive documentation.

4. Has a diverse community to hire programmers from.

5. Fits the problem domain okay.

Language B:

1. Has one buggy implementation that returns mangled stack traces on errors. See Kernighan's Lever[1].

2. Has no tooling.

3. The documentation is the compiler. So... no documentation.

4. The only programmer that knows it is the one who wrote the compiler.

5. Fits the problem domain as well as the programmer understood the problem domain when he started writing the compiler.

I know which language Dan McKinley[2] would choose. :)

You might say, "But my language uses the same homoiconic syntax of Lisp, so the tooling of Lisp carries over, and it doesn't require more documentation than just normal functions, and Lisp programmers can pick it up easily." To which I would respond, "Sounds like you took the long way to implementing a library." I'd level this criticism against a lot of Racket languages, which are basically just shorthand for a bunch of require statements. I'd rather copy/paste those same require statements.

The fact that Lisps make creating a compiler so easy is actually a downside, because it leads people to write compilers without thinking through the full secondary effects of doing so. This is one of many heads of the Hydra that is The Lisp Curse[3].

EDIT: Decided to say more.

EDIT 2: Decided to say less.

EDIT 3: Always be editing.

[1] https://blogs.msdn.microsoft.com/alfredth/2009/01/08/are-you...

[2] https://mcfunley.com/choose-boring-technology

[3] http://www.winestockwebdesign.com/Essays/Lisp_Curse.html


You're still thinking of languages as these huge things that can be offered as products. It's not the way a Lisper thinks about it.

> To which I would respond, "Sounds like you took the long way to implementing a library."

And I'd respond: that library is the language. There's a spectrum of complexity in what you can call "a language". General-purpose programming languages like the ones you seem to be considering are at one end of that spectrum. At the other end are the abstraction layers and APIs you're coding to. Take e.g. the OpenGL API - you can view it as a set of functions, but once you consider the rules about calling them, you may as well call that a language.

So when you're designing a "DSL" in a Lisp, it's literally no different than designing any other module. You have to name things, you have to figure out how they should be used. Lisps just don't force you to shoehorn your API into built-in language syntax. They don't force you to write "open-file" and "close-file" because the language only has functions and no built-in "with" construct; they let you add a "with" construct in four lines of code, making the API cleaner.
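
For instance (a minimal sketch; WITH-THING, OPEN-THING, and CLOSE-THING are hypothetical names standing in for your API):

    (defmacro with-thing ((var &rest open-args) &body body)
      `(let ((,var (open-thing ,@open-args)))
         (unwind-protect (progn ,@body)
           (close-thing ,var))))   ; cleanup runs even on a non-local exit

    ;; Usage sketch: (with-thing (th "config") (use th))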

Most DSLs sit somewhere between "API for a stack" and "API for OpenGL". They're just code - code that in some cases happens to run at compile time and operate on other code. But on the user level, on the API level, it's no different at all from using any other library. A documented DSL is no harder to use than a documented function library; a function library lacking documentation is no easier to use than a similarly undocumented macro-based DSL.

Some people seem to have an irrational fear of "less mainstream" things, even though they're not fundamentally more difficult than the mainstream things. I've been working with Lisp professionally for a while now, and I've seen plenty of really complex and hairy DSLs and had to debug some when they broke. Fixing a broken macro or its use is not fundamentally different from debugging Java code of similar complexity, but when the DSL works, it can vastly improve readability of the code that uses it.


No, @kerkeslager has it exactly right.

1. Fixing broken generated code is fundamentally harder than debugging non-generated code, because the code generator doesn't appear in the call stack, and because a line of generated code doesn't necessarily even have any specific line number in the input program.
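
(To make the mechanics concrete: in Common Lisp you can inspect an expansion by hand with macroexpand-1, but nothing in a runtime stack trace points you to the expander. with-open-file here stands in for any macro, and the exact expansion varies by implementation:)

    (macroexpand-1 '(with-open-file (s "log.txt") (read-line s)))
    ;; => roughly (LET ((S (OPEN "log.txt"))) (UNWIND-PROTECT ...)),
    ;;    implementation-dependent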

2. "Tooling" includes type systems that can detect and prevent bugs at compile time, and IDEs that can highlight the bug and often automatically fix the bug with a shortcut action.

You can write your own type system in Lisp, but testing a type system is uniquely harder than testing ordinary functions. Type systems prove facts about a program, and type-system bugs are cases where the system can't prove the safety of something that's actually safe, or, worse, where it accepts as safe something that isn't.

Finding important truths that a proof system can't prove (or finding falsehoods that a proof system can generate) is much harder than ensuring that a well-scoped function generates valid outputs for its inputs.

3. Documenting languages (especially type systems, especially type system errors) is also much harder than documenting function parameters.

In a non-Lisp language, developer-users don't have to understand how the language (and especially the type system) is implemented, because the language devs keep the language small and well tested. (Yes, the Java compiler does have bugs sometimes, but odds are, any given bug is in your code, not the compiler.)

Keeping the language small allows a community to form around documenting it. If you write your own DSL, there's no language community to support you.

And, sure, having a big community of people supporting each other and answering questions is always a huge help, even just for libraries, so there is some pressure to stick with mainstream languages. But languages have come and gone from the mainstream while Lisp has remained sidelined.

Lisp makes it easy to generate code, and that's the "Lisp curse," because writing your own mini language is usually a bad idea.


CLOS started as a portable library for Common Lisp. It was implemented as an embedded language extension and provides on the user level a set of macros: DEFCLASS, DEFGENERIC, DEFMETHOD, etc...
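
For a sense of that user-level surface, a small sketch using only the standard CLOS macros named above:

    (defclass account ()
      ((balance :initarg :balance :accessor balance :initform 0)))

    (defgeneric withdraw (account amount))

    (defmethod withdraw ((acct account) amount)
      (decf (balance acct) amount))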

There is no reason such an embedded DSL can't have documentation, find widespread use etc. In the Lisp world, there are a bunch of examples like these - where DSLs are written for larger groups of users and not just the original implementors.

One has to understand, though, that not all attempts have the same quality and adoption. But that's fine in the Lisp world: one can do language experiments, and if tools are designed for larger groups, one can just put more effort into them to improve the quality for non-implementors.


> CLOS started as a portable library for Common Lisp. It was implemented as an embedded language extension and provides on the user level a set of macros: DEFCLASS, DEFGENERIC, DEFMETHOD, etc...

Right, that shows that if a highly skilled group of people with community support put in a lot of effort, they can do something that is extremely difficult to do.

The Scheme community has not had such success. All three of the Scheme object systems I've experimented with suffer from all four of the problems I mentioned. And at least one of them was recent enough that they had the advantage of having CLOS to guide the way on how to do it right.

I'm not saying that DSLs are impossible to create successfully. I'm saying that many Lispers drastically underestimate the difficulty of doing so in proportion to the value.

Keep in mind that you have people in this thread claiming that "DSLs are trivial to implement in Lisp." And I'm saying, DSLs are trivial to implement counterproductively in Lisp, but implementing a high quality DSL is still very hard.


> Right, that shows that if a highly skilled group of people with community support put in a lot of effort, they can do something that is extremely difficult to do.

I don't think it's 'extremely' difficult to do. Object systems in Lisp were for some time an area of heavy experimentation; dozens were written, of varying quality.

CLOS, OTOH, is on the 'extremely difficult' side, since it has a lot of features and even provides its own meta-object protocol. But even then it is possible to leverage a lot of Lisp features (like a well-documented code generation/transformation system) and thus reduce some of the implementation complexity.

When CLOS was designed, several well-documented OOP extensions already existed (and were used): Flavors, New Flavors, LOOPS, Common LOOPS, Object Lisp, CommonObjects, a bunch of frame languages like FRL, KEE, ...

> DSLs are trivial to implement in Lisp

Some are, some are not. There is a wide range of approaches. This starts relatively simple for some 'embedded DSLs' and gets more difficult for non-embedded DSLs.

There is a decades-long practice of developing languages in Lisp (since the 1960s), especially embedded DSLs, and thus there is a wide range of them, including many that are well documented.

Many Lispers know that this CAN be a lot of work: many complex DSLs have been developed, many of which took more than a person-year (or even dozens) to develop, and some of which have been maintained for a decade or more. In many projects one has also seen the limits of Lisp (speed, debugging, delivery, etc.) for this.


> There is no reason such an embedded DSL can't have documentation, find widespread use etc.

Yes, actually, there is. It turns out that type systems are actually hard. They're hard to understand even when the documentation is as good as it can possibly be. (Look at all the confusion around Rust's borrow checker. Rust's borrow-checker documentation is world class; this is just a challenging area to understand.)

Have you ever had that feeling where someone explained a mathematical proof to you and you just couldn't understand it, even though it was presented as clearly as it possibly could? You had to stop and think about it, play with it, and then you think you get it, but then you realize you didn't get it, and then, finally, you get it, and you can't even really understand how you misunderstood it in the first place.

Type systems are literally proof tools. Designing your own type system and documenting it is harder than solving your actual problem.

This yak hair is made of steel fiber. Don't shave it.


> Have you ever had that feeling where someone explained a mathematical proof to you and you just couldn't understand it, even though it was presented as clearly as it possibly could? You had to stop and think about it, play with it, and then you think you get it, but then you realize you didn't get it, and then, finally, you get it, and you can't even really understand how you misunderstood it in the first place.

I thought this is what programming of anything other than trivial, repetitive web CRUD looks like?

I mean, seriously, you have to stop and think sometimes. I'd say, fairly often. And you and 'kerkeslager keep bringing up type systems for some reason, as if this is something one would reasonably want to write in a project that's not strictly a type-research project. CL already has a type system that's adequate for most tasks; it's easy to add and document new types. It's not Haskell, but then again, Lisps aren't what you want to pick up if you need a proper type system.

Designing your own DSL is not about designing a type system, except for the most trivial (or convoluted) meanings of "designing".


> You're still thinking of languages as these huge things that can be offered as products.

I do wonder if "Data-Specific Language" would be a better expansion of the acronym; successful DSLs are nearly always confined to processing very narrow sets of data.


> You're still thinking of languages as these huge things that can be offered as products.

No, I'm really really not.

You're assuming my criticisms are because I don't understand how DSLs work in Lisp, but I would request that you not make that assumption.

> And I'd respond: that library is the language. There's a spectrum of complexity in what you can call "a language". General-purpose programming languages like the ones you seem to be considering are at one end of that spectrum. At the other end are the abstraction layers and APIs you're coding to. Take e.g. the OpenGL API - you can view it as a set of functions, but once you consider the rules about calling them, you may as well call that a language.

You might as well not call it a language, though. I understand what you're saying about the spectrum, and I do agree that a library of functions is a DSL, but a library of functions is a special case of a DSL that doesn't have any of the usual downsides of a DSL. You can't really use a library of functions to justify the complexity of other DSLs.

A library of macros isn't a library of functions, and a library of macros has all the downsides I talked about. And if you're talking about DSLs as being part of what makes Lisp special, then you're not talking about libraries of functions, because almost any language can do that.

Let's not get lost in a semantic argument here: when we're talking about DSLs, libraries of functions aren't the central example of a DSL that we're talking about.

> So when you're designing a "DSL" in a Lisp, it's literally no different than designing any other module. You have to name things, you have to figure out how they should be used. Lisps just don't force you to shoehorn your API into built-in language syntax. They don't force you to write "open-file" and "close-file" because the language only has functions and no built-in "with" construct; they let you add a "with" construct in four lines of code, making the API cleaner.

Not being forced to shoehorn your API into the built-in language syntax is exactly the problem I'm talking about.

Either you shoehorn your API into the built-in language syntax, or your users have to learn a new syntax, which mangles your stack traces through the macro expansions, breaks your tooling, isn't documented, and is only understood by you.

Incidentally, Lisp (Common Lisp) definitely implements a with-like syntax (with-open-file, for example). It's been a while since I used it, but I remember this clearly.

> A documented DSL is no harder to use than a documented function library; a function library lacking documentation is no easier to use than a similarly undocumented macro-based DSL.

That is very much your opinion, and not one that is borne out by my experience.

> Some people seem to have an irrational fear of "less mainstream" things, even though they're not fundamentally more difficult than the mainstream things.

Again, please don't make assumptions about me. I'm not avoiding macros because they're less mainstream, I'm avoiding them because I've experienced a lot of pain from them.

> Fixing a broken macro or its use is not fundamentally different from debugging Java code of similar complexity, but when the DSL works, it can vastly improve readability of the code that uses it.

It's true that fixing a broken macro is not fundamentally different from debugging Java code of similar complexity. You can write bad code in any language. But if we're trying to write good code, limiting the complexity is a huge priority, and you can't deny that introducing macros introduces complexity. Comparing to Java misses the point: we can look at Lisp with macro-based DSLs versus Lisp without macro-based DSLs, and see the benefits there.

Just to be clear: my criticisms here aren't of Lisp as a whole. I've been working a lot in Racket lately and there are a lot of things I like about it--it's batteries-included in the way Python used to be. I really hope Racket takes off more.


"isn't documented"

This applies equally to libraries or DSL. The developer can document them clearly and extensively, or not at all.

"Incidentally, Lisp (Common Lisp) definitely implements a with-like syntax."

As a macro, most likely?

"and you can't deny that introducing macros introduces complexity"

It can also eliminate a massive amount of incidental complexity, by eliminating lots of repetitive, boilerplate, copy-pasted code, where bugs will inevitably pop up as an error is fixed in one place but not another, or a subtle typo slips in, or a programmer forgets to clean up a resource... etc.

"I really hope Racket takes off more."

Does Racket use macros less aggressively or egregiously than other Lisps? (Serious question, I honestly don't know.)


I like to think of it as concentrating complexity. This allows a programmer to focus intensely on a small area of the code with the benefit of easing the development process of a much greater portion of the code base.


> This applies equally to libraries or DSL. The developer can document them clearly and extensively, or not at all.

This is true, but functions usually don't require as much documentation. With a function, I usually know when and how the arguments will be evaluated, for example; that isn't an assumption I can make with a macro. If you don't change the syntax, you don't need to document changes in syntax.
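
(A minimal illustration of the evaluation question; MY-WHEN and LAUNCH-MISSILES are made-up names:)

    (defmacro my-when (test &body body)
      `(if ,test (progn ,@body) nil))

    ;; (my-when nil (launch-missiles)) never evaluates (LAUNCH-MISSILES);
    ;; a function call of the same shape would evaluate it unconditionally,
    ;; before the function body even ran.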

> "Incidentally, Lisp (Common Lisp) definitely implements a with-like syntax."

> As a macro, most likely?

Possibly, but a mature Common Lisp implementation has likely taken care to preserve stack traces and make sane decisions about how to expand, is probably thoroughly documented, etc. A macro implemented by your average programmer has little in common with this, and if you're doing all this work, the idea that DSLs are trivial to implement in Lisp begins to erode.

> It can also eliminate a massive amount of incidental complexity, by eliminating lots of repetitive, boilerplate, copy-pasted code, where bugs will inevitably pop up as an error is fixed in one place but not another, or a subtle typo slips in, or a programmer forgets to clean up a resource... etc.

That's all true, and I'm not completely against their use. DSLs can be an effective tool. But I think that tool is a double-edged sword which needs to be used much more judiciously than the glib, "Do you have a problem? Write a compiler!".

> Does Racket use macros less aggressively or egregiously than other Lisps?

I don't think so. It's really the programmers, not the language, that use macros, but I would say that the culture around Racket has a big focus on creating DSLs, so if anything Racket programmers use macros more aggressively and more egregiously. :)

But there are other upsides. Racket is the only language I know of nowadays with a decent, truly cross-platform desktop UI library that ships with the language. Python and Java were once contenders, but support for Java has dropped quite a bit, and Tk isn't even used by many Python programmers any more (Qt is just so much better - but it doesn't ship with the language).


> 3. The documentation is the compiler. So... no documentation.

I don't understand this one. Shouldn't small compilers for small languages be excellent documentation? I can't really ground the point or my response, but it seems that clever tooling should be able to analyze the compiler to create good documentation. That information could include type annotations, a timeline of the creation of the DSL's features, or any number of other things that could be kept during the creation of the compiler.

> Sounds like you took the long way to implementing a library

DSLs aren't libraries. When would you want one over the other? The Racket languages I've seen have impressed me with their simplicity and focused scope; I would never think the same of C with includes or JS with requires. The point is to be writing a file which is perfectly scoped to the issue, right? I imagine it would suck to "write" HTML if it were a library in C, rather than a file format with all the benefits that come from being its own thing.


> Sounds like you took the long way to implementing a library.

You are right that DSLs should be about as natural as libraries. Linq in C#, regex, and SQL are good examples of DSLs you probably rely on regularly.

I suggest looking at the symbolic differentiator and circuit simulator in SICP. It is composed of just a few functions and is far cleaner than a library of functions on a custom data structure.
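
For readers without SICP at hand, a minimal sketch in its spirit (the book's version is Scheme and much richer; this Common Lisp transliteration handles only sums and products):

    (defun deriv (expr var)
      (cond ((numberp expr) 0)
            ((symbolp expr) (if (eq expr var) 1 0))
            ((eq (first expr) '+)
             `(+ ,(deriv (second expr) var) ,(deriv (third expr) var)))
            ((eq (first expr) '*)    ; product rule
             `(+ (* ,(second expr) ,(deriv (third expr) var))
                 (* ,(deriv (second expr) var) ,(third expr))))))

    ;; (deriv '(* x x) 'x) => (+ (* X 1) (* 1 X))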


> Linq in C#, regex, and SQL are good examples of DSLs you probably rely on regularly.

Right, but with the exception of Linq, those all have addressed all the problems I brought up:

1. Has multiple mature compiler/interpreter implementations.

2. Has lots of tooling built around it.

3. Has extensive documentation.

4. Has a diverse community to hire programmers from.

They were able to address these issues because the DSLs solved very common problems that a lot of people had, and even with that incentive, it took a lot of effort from a lot of people to get there. A new DSL you implement to solve your specific issue is not going to trivially address these issues.

And Linq is an "exception that proves the rule". Linq has not taken off quite as well as the initial hype would have indicated. I'm not sure where it is now, because I haven't worked on .NET in over 4 years, but I remember times when it was unclear whether Linq would be supported in future Microsoft products. It's very difficult even for an organization with Microsoft's clout to create a DSL which is successful.

> I suggest looking at the symbolic differentiator and circuit simulator in SICP. It is composed of just a few functions and is far cleaner than a library of functions on a custom data structure.

I suggest looking at the 4 problems mentioned in my posts, and noting that all of them still apply to the SICP example(s) of DSLs.


> 4 problems mentioned in my posts

But we are comparing a small library of functions with a DSL of similar complexity.

> You might say: But my language uses the same homoiconic syntax of Lisp, so the tooling of Lisp carries over, and it doesn't require more documentation than just normal functions, and Lisp programmers can pick it up easily.

Doesn't that address them? Why are those 4 issues more applicable to a Lisp DSL than a library?

To learn a Lisp DSL, I read the docs for the relevant ideas, and look at a few examples. What about that process is different than a library, except that it tends to be natural and flexible for certain problems? The SICP examples would be a major pain to implement as a typical library.


"Sounds like you took the long way to implementing a library."

And of course in Lisp, writing a compiler is just as easy as writing a library.


...which doesn't address any of what I said.


Well, I don't typically use languages like that or trees per se, but usually when I'm trying to do something I don't know how to do, I define a data structure that expresses what I want to do as naturally and succinctly as possible, and then try to write code to transform it into the actions I want.

And then I'd get bogged down in edge cases, until one day I realized: hey, I really should, as a matter of course, treat tricky coding as feedback telling me to tweak my data structure(s) to make them simpler to parse, eliminating bugs in advance by reducing my cognitive load.


Oleg has worked in other non-Lisp languages, notably members of the ML family. I know that you want to be all pro-Lisp and such, but it's just another family of languages, really; there's nothing magical or special about them. All compilers use an anamorphic parser, a hylomorphic reducer, a paramorphic optimizer, and a catamorphic instruction selector.

It's not about what the language provides, but about what the language doesn't require. Lisps can be low-ceremony, but they're not the only ones. Trees might be universal data structures, but they're not the only ones.


I'm not saying Lisps are unique in this regard, but I am saying that complaints that I see in today's HN discussion most likely stem from lack of exposure to that kind of programming. In the mainstream world, "language" and "compiler" are products. In Lisp world, they're no different than functions and classes.

Out of language families providing this level of capability, Lisps are probably closest to mainstream, and that's because of Clojure.


In the mainstream world, we study a single category of types and logic at a time. There is an ∞-category Tomb whose objects are languages and arrows are compilers; we don't talk about it much. Lisp is certainly not somehow an arena where we transcend computing concepts; it's a lightweight syntax for untyped lambda calculus, at best. A single topos, not the entire multiverse of topoi.

I could restate your entire argument for ANS FORTH and I'm not sure whether it would lose anything. That should clue you into the idea that your argument is really about the age of the language family, and the corresponding maturity of philosophy and reasoning around members of the family.

The only thing that Lisp makes easy is the parser, by dint of reusing the host Lisp's s-expression reader. The most trivial and surface-level part of a compiler, really.


"In Lisp world, they're no different than functions and classes."

Haskell may be similar in this regard, due to the popularity of parser combinators?

(Or were years ago when I last looked, don't know if they are still popular or widely used tool in that community.)


Are you confusing Oleg Grenrus (OPA) and Oleg Kiselyov (http://okmij.org/ftp/)?


Wow. There's more than one of them. Today, I learned something. Thank you.


Hmm, does it have zygohistomorphic prepromorphisms too?


> It's not an accident that the talk/article is about Clojure

On the other hand, I'm surprised that the optimizations in question don't happen automatically for Clojure programs. Surely the underlying JVM is capable of constant propagation and folding? Maybe it's the let-from-let optimization that actually gets the 10% speedup.


I probably don't say it in the talk very explicitly (or do I?): the target architecture is web = JavaScript.


Oh, you do say it in the article: "However, for the maximum reach, we want our game to run in the web browsers." I missed that, thanks.

But even so, JavaScript compilers aren't stupid. No need to do this for me of course, but if you got around to measuring the performance impact of the two optimizations separately, I'd be interested in that. (And the ClojureScript developers might be interested too.)


Years ago I was at a trading firm where I needed to write a test suite for our order entry system (the software that sends orders to a stock exchange). Tests could become quite large because I had to track the sequence of acknowledgments and executions from the test exchange.

The tests would have been a real pain to write in C++, so I created my own language! It was inspired by both Expect and Cucumber.

https://en.wikipedia.org/wiki/Expect

https://cucumber.io/docs

That allowed me to write a simple script of expected behavior, like:

    test "Cross own order"
    -> new buy IBM 100 $141.01 1
      <- ack 1
    -> new sell IBM 200 $141.00 2
      <- ack 2
      <- trade 1 100 $141.01
      <- trade 2 100 $141.01
My interpreter would run the above script against the exchange's UAT environment.

The tests could get pretty sophisticated. I could investigate swanky exchange-specific order types by verifying queue position, liquidity flags, etc. By writing these test scripts, I could verify (1) that my order entry code was correct and (2) that I actually understood what the exchanges were doing with my orders.

These test scripts had another added benefit in that traders now had executable documentation. So whenever a hypothetical scenario came up, I could write a script to verify that our assumptions were correct and then email the script's code as a precise set of steps that the exchange was taking.

I ended up writing the whole language in Boost.Spirit, which admittedly was a beast. But I still believe that creating a testing language was the right thing to do.


Okay but why not:

  test("Cross own order");
  send(newBuy("IBM", 100, 141.01, 1));
  expect(ack(1));
  send(newSell("IBM", 200, 141.00, 2));
  expect(ack(2));
  expect(trade(1, 100, 141.01));
  expect(trade(2, 100, 141.01));
Because then you'd just need to write a few helper functions in C++, not an entire interpreter.


One reason, from our similar experience, is that showing non-technical business people (in some environments) text with brackets, semicolons, and the general code punctuation you show here effectively "scares" them.

This isn’t always true. But it tends to be institutional when it happens. In some places people learn deeply that code is the realm of the tech teams and they should not engage with it.

Getting them to engage with the executable tests can be fantastically useful, but it can also be a delicate sales game, where simple things like using a DSL can be key.


That's assuming the "non-tech business people" are actually going to spend time writing and reading such scripts.

I've never seen a situation where that ended up happening (because they're going to write the script wrong, they're going to need help debugging, etc...)

Their time is probably better spent doing something else. So they will write English in mails and PowerPoint, and a tech will write code in a programming language.

The closest thing I can imagine to having biz people "write scripts" is if it's done in a GUI with a lot of assistance and a quick visual feedback loop (think Excel macros).

Maybe writing the GUI is a better investment than writing a compiler.


The parent comment doesn't say the non-technical people wrote the scripts, but that they were shown the scripts.

Think of it this way: Writing iambic pentameter is hard. Without practice, you're almost certainly going to get the meter wrong at some point. However, reading it is easy, because it's just English.

The same can be true for scripts. You don't need to be able to have business people write them, but it's nice (and in some domains critical) to be able to have them read and validate them.


"So they will write english in mails and powerpoint, and a tech will write code in a programming language."

This reminds me of the old movies where the Executive calls in the secretary to transcribe his words.

"Miss Parker! Take this down!"


Something I've done in the past is write multiple backends for an internal DSL like this. One runs the tests; another prints out the steps as nice-looking HTML to show to The Business. It means they can't read and edit the source code directly, but in practice this doesn't matter. We even had one that produced Excel files that could be handed over to QA as manual test scripts, so they could reproduce what we were doing.

An old boss once pulled off a hack where code drove a browser test, but you could run it step by step, with the element being inspected or interacted with being highlighted with a red border, and a separate window explaining what was happening in natural language. Non-technical people could follow along with what the test was doing. It was an amazingly effective way of explaining what automated testing was all about.


So show the laymen cute text without Scary Symbols, no? Heck, these days my tests with certain frameworks are automatically inundated with unsolicited fire emojis and green check marks. I'd imagine that isn't particularly difficult with C++, right?


They already wrote

> The tests would have been a real pain to write in C++

so I guess the script piece above wasn't the entire deal. When you have a custom language, you have full control over its syntax, and you can make it do all sorts of things that are hard to do in something like C++.

As an example, you can build a tree of key-value nodes using C++ calls just fine, but writing a JSON file is usually much simpler and more concise (especially when deep nesting comes into play). And a custom syntax could be even simpler than JSON.


You just described functions.


That just compiles like normal code, which is obviously a win in many scenarios, but there can be a few downsides. The first is that it can quickly devolve into template/generic hierarchical madness (assuming the real example was a bit more complex than the one shown), and this is more complex to maintain than a DSL parser. The second is that users will get crazy metaprogramming error messages, especially in C++, where you can get multi-page ones. You're probably unlikely to get non-tech people writing these tests, but you will get people who won't be able to untangle that mess, like QA and interns. There are a few other little things too, like the original including the '$' sign, making it clearer what is a monetary value.

Personally, nearly every time I write something resembling a DSL, it's not for production code, and it tends to be written in shell script or awk and only translates to something like SQL.


They could also overload operators for terseness.


The Scala programmer has logged on.


I believe it's called a Domain-Specific Language (DSL), and there are systems that excel at quickly, easily, and reliably creating these languages - for example, Groovy. You could write a language like the one in your example, and it would be usable and stable in a couple of hours. Then you could script all kinds of programs in it quickly, while also being able to leverage the rest of the flexibility of the Groovy language, interchanging its code with DSL statements.


JetBrains have a new tool specifically for this purpose - https://www.jetbrains.com/mps/

I haven't played with it yet, it looks interesting!


MPS is hardly new, but it’s certainly an interesting tool if you are targeting the JVM.


Interestingly, we're doing something very similar, but using Cucumber syntax, so the tests are much more chatty.

In our situation we probably have more concepts (middle office) but the interactions we’re modelling are simpler.

Going down this route is quite interesting: initially we got a lot of scepticism from other groups, but over time the solution has evolved with more and more documentation/traceability, and the tests ended up as pretty complete documentation.

Every time someone decides that the development teams have life too easy and tries to impose specific extra paperwork or documentation burdens on the day-to-day work, giving them that test pack seems to make them go quiet :)


Why not use Expect directly? For one of my most complex setups, I have a Perl module (using Expect) that is used by all my test scenarios (Perl scripts).


Excellent. Your solution is a common pattern, after all: creating a DSL (domain-specific language), for which you then obviously need to write some kind of interpreter.

For all it matters, I found myself in similar situations, but what I decided to do was to use an existing language (in my case it was JavaScript, though nothing to do with the JavaScript you run on Node or on the web).


Vaguely related: I came across this talk the other day, and it is interesting to see this sort of thing come up twice in short succession for me. The talk, from Jane Street, is about building an exchange.

Leaving it here in case anyone is interested.

https://www.youtube.com/watch?v=b1e4t2k2KJY


I am curious. Was there any particular feature that your language had, which was lacking in C++, that helped in simplifying the test script?


Also not OP but: Having a limited language that looks more like data than code provides great opportunities to analyze it in ways that you can't analyze Turing-complete languages (halting problem and all).

An example: a friend works for a game translation studio, and one of their games has a tool that reads the game scripts and is able to detect missing/unused external resources, overlapping dialogue, unmatched mouth animations/voiceovers, etc... without even running the game.


Not OP, but looking at his example, conciseness.

DSLs are feared by many, but actually when you're doing the same thing over and over again and it's hugely verbose, they're great!


In essence, as you gain expressive power you lose analytical/introspective/reasoning power. Being more powerful on the concrete level costs power on the meta-level. There is a great talk about this called "Constraints Liberate, Liberties Constrain" by Rúnar Bjarnason.


These are things you typically want to do at run time, without requiring a compile cycle to change.

There's also the risk of breaking changes to the scripts, if writing in C++ exposes too much of the project's object model.


Almost every DSL's capabilities are expressible in almost any language. The value is in the expression, not in what it expresses (usually).

I can describe (off-the-cuff example) a regular expression using any programming language's basic constructs:

  Regex r = new Character('a').then(new Repetition(new Character('b')));
Or I can write that as:

  ab*
Which is clearer? You can probably imagine a simpler example in most languages, but you still have to design and construct an API, and then train people to that API. The DSL offers (in theory, not all do) a more natural representation of the problem, which can then be converted to an internal structure or directly executed.

That more natural representation is what brings you most of the value. Another aspect of this, and this is where some have argued benefits in more algebraic-notation languages (the MLs) over others: You can often write DSLs that allow reasoning to be clearer.

Back to that regex example, you can view regular expressions as an algebra. You can manipulate them like algebraic expressions. You cannot manipulate that first example algebraically, so if you decide you want to change it in some way or conduct some analysis on it, those activities are much harder.

In the first API example, how would I alter it to indicate that a should be repeated 0 or more times? I have to insert a new Repetition(...) element around the first Character element. It's tedious and error-prone. For the regex, I add an asterisk after the a. It's clearer, more concise, more amenable to reflection (by the author and others), and more maintainable in the end.

Additionally, if I want, I can change the implementation of that DSL to reflect new capabilities or new backends. With the API version, that's harder to accomplish.

=========================================================

At work, we also have test DSLs. One of the biggest benefits, besides the more natural expression of test cases, was that the DSL could be reimplemented and all the old tests continued to run without modification - from an old Fortran implementation to a more modern C# implementation. The ability to migrate 20+ years' worth of test cases was a great boon. Every one of them could've been written in Fortran directly before, or C# now. But in 10-20 years we may have motivation to reimplement again. The one reimplementation happened because: the original dev retired, the code base was poorly documented, it couldn't be ported to modern operating systems due to the graphics API used and the code design, and it couldn't be extended to support newer systems being tested. I believe the C# implementation is better in all those ways, but limitations may be discovered in the future that require another rewrite (I do know it was successfully handed off to new developers, so it is more comprehensible and maintainable, if nothing else).


Some co-workers and I did a mini-language a long time ago too, for the kind of BPM workflow software that was so in vogue ten years ago [1]. It only had a super buggy graphical interface that customers hated. Some of the largest customers had workflows with more than 100 steps/conditions, and we were losing money because we needed a full-time coder on-site at each customer just to program the workflows.

The database structure, however, was quite good, and I noticed that it was being deleted and recreated each time the workflow was edited.

Adding a text mode was just a matter of parsing individual lines and persisting them to the database. No real parser, just regex matching - even an extra blank space was a syntax error. Something like this:

    q1 = Question("Submit Request?")
    a1 = Answer("Submit")
    a2 = Answer("Reject")
    Connect(q1, a1)
    Connect(a1, a2)
    c1 = Cancel("Cancel")
    f1 = Finish("Finish")
    Connect(a1, f1)
    Connect(a2, c1)
It was implemented as an easter egg. Using text mode disabled the graphical part, since we lost the positions of each box. But nobody ever complained about that, of course.

[1] Looked similar to this: https://archibus.ro/wp-content/uploads/2017/07/GWE.png


Many years ago I was working on a complex screen-scraper (the good kind) app, and every site we scraped was ~1000 lines of error-prone code, plus all the usual problems of sites changing randomly under us and the need to respond quickly. They wanted to scale up to scraping dozens of sites and we weren't given the resources, so I came up with a little XPath-like language - "//content-area/table/tbody/@foreach/tr/@get-value" or something like that - and after that the largest scraper was a dozen lines of code.

I think it came down on the right side of reducing complexity rather than creating more, but I was let go shortly after for the heinous crime of not being at my desk at 9am.


This is a Most Righteous Answer, kissing cousin to The Correct Answer.

Repeating myself, sorry:

I've done a lot of electronic medical records. At the time, SOAP, WSDL, HL7 3.x (XML format) reigned.

Techniques like yours are how our two-person team was able to run circles around our much larger partners. In other words, while they were stuck trying to update and compile the XSDs, we just treated inbound data as screen scraping.

Sharing a tip of my own: while XPath expressions are mostly great, we migrated to globbing expressions. The wildcarding allowed our scrapers to be a lot more flexible and robust in the face of all the chaotic mutant data we were receiving.

https://en.wikipedia.org/wiki/Glob_%28programming%29

PS - IMHO, The Correct Answer is to have path-expression intrinsics built right into the language. What LINQ should have been. Imagine if Java had regex intrinsics like Perl, instead of that weird library. Same idea.


So how did you use globbing on XML? Was there an existing library for it, or did you write your own?


Rolled my own that I'd copy-paste between projects. I always ended up with my own graph model (e.g. HL7, XML DOM, parse trees) and would add query methods to the node objects. I really dislike having separate query objects, preferring a tighter fit.

If you need a globber, maybe you can extract it from this library. https://github.com/EsotericSoftware/wildcard It's nicer than mine, but I regarded mine as easier to debug.


> Implementing small (domain specific) languages is fun.

This, from the conclusion, is something I totally agree with. However, fun isn't all there is to production systems. Writing so many little languages just makes the program overall more complicated, with increased cognitive burdens. Writing new languages must generally be carefully weighed against the alternative of picking one single language with good abstraction power for almost everything.


C++ or JavaScript is a "language", but so is the API of that library you're currently using in your project. Different ends of the language spectrum. The article is about Clojure, a Lisp, and Lisps are good at giving you access to the entire spectrum. When you hear a Lisper talking about "writing a language", half of the time it's just a couple dozen lines of code defining a macro that makes some code much more readable and expressive. Then there are times when "a language" is an extensible pattern-matching engine or otherwise half of Prolog, but those cases are usually handled by someone who has a real need for it and later publishes it as a library - and you're just happy you can now use half of Prolog without actually adding a Prolog to your codebase, with all the devops headaches that would create.


That's what I meant by a "language with good abstraction power." Lisps with macros, or Haskell with its incredibly flexible do-notation, are examples of languages that allow you to reuse the host language's parser and compiler. It takes a lot of the fun out of actually developing a language, but it's nevertheless more practical.


Some languages have constructs that make it possible to build pseudo-DSLs in them - for instance, functions with receivers in Kotlin. That's a nice tradeoff, I think.


Often called internal DSLs.


I once took over a project from an engineer who'd been let go, and after I saw the mess of a DSL he'd made and used in 100% of his contributions to the project I understood why. We called it our infernal DSL.


Fortunately, in the near future, all programming will be done in YAML. Apart from OPA obviously. /s


I had a problem with Telecom C back in 1982, so I decided to write a C compiler. Still working on it.


I had a problem, so I wrote a compiler. Now I have two problems, at lines 33 and 67.



Made me exhale out my nose:

"Whenever I gave even a moment's thought to whether I needed to learn compilers, I'd think: I would need to know how compilers work in one of two scenarios. The first scenario is that I go work at Microsoft and somehow wind up in the Visual C++ group. Then I'd need to know how compilers work. The second scenario is that the urge suddenly comes upon me to grow a long beard and stop showering and make a pilgrimage to MIT where I beg Richard Stallman to let me live in a cot in some hallway and work on GCC with him like some sort of Jesuit vagabond."

"Both scenarios seemed pretty unlikely to me at the time, although if push came to shove, a cot and beard didn't seem all that bad compared to working at Microsoft."


Holy fuck this is hilarious.


"Imagine you are writing a cool new rogue-like game. So cool many have no idea what's going on."

...and then you nerd-snipe yourself by writing a compiler in order to ensure the fair randomness that none of your end users perceived to be off in the first place.

The game still isn't fun.


> The game still isn't fun.

Joke's on you; writing it was the fun part. Roguelikes are meant to be hard. Fun really isn't the point.


Want to hammer a nail into a wall? Build a robot to do it! If you have lots of spare time, of course...


I've actually come to the conclusion over the years that automation is a form of optimisation, and that all of the rules of optimisation apply in full force.

Do you need to process a list of 10,000 things, once a day, forever? Then you do need to automate.

Do you need to process a list of 100 things, once? Just roll up your sleeves and start typing (or get someone else to do it). The chances that you could come up with something remotely reusable in less than the two minutes it'd take to do manually are slim to none, yet I have often seen people spend a couple of hours coding up a script to do a 15 minute non-recurring job.


Sometimes it helps to automate small things anyway, as practice. I've noticed that as I become more practiced with automation, even the small things take less time (and are less prone to error) than doing things manually.

It's also possible to do partial automation, where you just automate what's easy and finish up manually.


On the flip-side, people who rarely automate things, become accustomed to manually doing things and increasingly miss opportunities to automate/script their workflow...


I get your point, but a benefit of coding the script is that it's frequently easier to verify the script is correct than to verify that the repeated operation you did by hand on the large list was done correctly for every entry.


100% agree. Also, if you need to add additional steps to your processing or make small changes to what you're doing then processing the 100 items again will be pretty fast compared to if you had done it manually.


Macros, regexes, and awk are the tools for quickly automating text processing. It literally takes less than two minutes in most cases once you're proficient.


Honestly the amount of time I've saved just by throwing stuff into a text editor with multi-cursor support is kinda unbelievable.


Yes, this is the right thing to do very often, especially for one time tasks.

It’s not the thing you should do if you need to do it repeatedly, like once a day or even once a month.

This all depends on a lot of circumstances, and it's hard to weigh the time spent implementing a script against doing the task by hand for a future you don't know yet.


I often do a similar thing using emacs macros (I assume most good editors have a similar feature)


Step 1 - Let me try using command-line magic to transform this data.

Step 2 - ??

Step 3 - Create a scratch file in IDEA, some cursor + regex magic, and I am done!


And those regexes are the first part of an awk script or something similar which will save time when you want to do this repeatedly.

And when your awk script grows into something that doesn't produce the final output, but has to be run through another interpreter first in order to produce the final result, then you've created your own custom language.


Oh sure, if it's absolutely trivial then bash it (hah) through some shell scripts or even just do it on the command line. I'm talking more about things that are slightly more complicated and will need a bit more effort than find & replace.


I was consulting with a customer, and we hit a point where they needed to check a change into source control. They were using a VCS that I'd not used before, so I asked how to do it. The answer was "Oh, let me get Tim, he's the one who knows how to check stuff in." It took 5 minutes or so to find Tim; after a few minutes of moving through the "interestingly designed" GUI of their VCS he was able to get it checked in, and after another couple of minutes he was able to enter a description into their change-tracking system to pend it for code review (required to merge into their mainline branch).

I overnighted a book from Amazon on the VCS and process-management tools they were using, read it the following night, and the next day took about 15 minutes to talk to Tim and then another 30 minutes writing a Perl script that prompted for the 4 bits of information they needed and automated the check-in and change-review request. Total billable time spent on this: 45 minutes. Time saved by having this script around during the 2 weeks I was with the customer: at least 2 hours (10 minutes of my time had been wasted every time a change needed to be submitted previously!).

Every time I visited that customer in the future, the group was still using that script I wrote. Also, every time, the manager wanted me to assure him I wouldn't waste time by writing a script like that, because that's not what they were paying me for.


> Also, every time, the manager wanted me to assure him I wouldn't waste time by writing a script like that, because that's not what they were paying me for.

It eternally baffles me (although no longer, sadly, surprises me) how often this happens. Clear evidence of a simple change which dramatically improves efficiency and therefore profits, and someone complains and tries to prevent it ever happening again because "that wasn't what I specifically wanted to do right then."


> Total billable time spent on this: 45 minutes.

Should've added the time spent on the book to the billable time (though I guess your manager would approve of that even less than they approve of scripts, given the description). In a few similar situations I've managed to get my manager's approval to bill the time I spent on such emergency learning, because let's be honest, I'm not speed-reading a book on an obscure piece of tech for fun.


Or just wait till we have a distributed problem-recognizer or single-consciousness general AI: solved and automated once, replicated and reused forever.


As usual Randall Munroe has the answer: https://xkcd.com/1205/


"Tell me, Gaston, this machine, how do you attach it to the wall?"

https://twitter.com/joel_kienast/status/859412383650938880


Might still be useful if you're placing enough nails in the right arrangement - sort of like railroad track-laying machines.


this is brilliant


Want to hammer _a_ nail into your wall? Do it yourself. Want to hammer nails into your wall all day, every day? Build a robot to do it, and do something else.


If you build a robot you'll be better at building robots. Personal growth vs product.


While reading this blog post I feel that my programming knowledge is an order of magnitude below. It's a bit depressing.


Writing compilers for languages that look like Lisp in Lisp is so effortless that most don't bother to go further. I feel that's often a missed opportunity. Hook in a parser [0] and the sky is the limit.

[0] https://github.com/codr7/lila


I feel like this and the motivation for golang are vaguely similar. Hear me out. The intention of golang's design is to make complex things hard to do and to make the language easy to understand in any piece of code, because there isn't much syntax. But sometimes you want to code more expressively, and having some nice syntax would be nice. It's not a big deal if it's local and limited to what this exact problem is. That's tough to get from a general-purpose language.


I thought the intention of golang was to avoid C++'s super slow compile times.


Do you have a problem? Write a compiler... and now you have two problems!


Fun to implement, sucky to maintain! Like event-sourcing.


Why would event sourcing be sucky to maintain?

My last job was to build a system to replace something that used to require lots of maintenance. Event sourcing made most of this maintenance trivial, and the remaining cases were made easier to explain and handle.

Event-sourcing things doesn't mean you need to use a complex FRP framework; a simple Postgres table and a state machine in your language of choice is usually enough.
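
(A minimal sketch of that no-framework flavor, in Common Lisp; the account events here are made up for illustration - state is just a left fold over the event log:)

    (defstruct account (balance 0))

    (defun apply-event (state event)
      (destructuring-bind (kind amount) event
        (ecase kind
          (:deposited (incf (account-balance state) amount))
          (:withdrawn (decf (account-balance state) amount))))
      state)

    ;; Rebuild current state by replaying the log (e.g. rows from a table).
    (defun replay (events)
      (reduce #'apply-event events :initial-value (make-account)))

    ;; (account-balance (replay '((:deposited 100) (:withdrawn 30)))) => 70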


In this case the maintainer is the builder, which is always comfortable.


I have recently moved to another team, so we shall see.

However, we also administered two CRUD DBs, one we inherited and another more recent one. The three DBs were doing pretty much the same thing. Understanding and maintaining what happened in the CRUD DB we built was significantly harder than understanding and maintaining the event-sourced one.


> Fun to implement, sucky to maintain! Like event-sourcing.

This comment begs for a blog post or some links that illuminate what you mean.


Here's the link to the video: https://youtu.be/kOXfdZRD0wM


> JavaScript numbers are a mess. Next a bit of arithmetics.

They aren't a mess. Every programming platform has limits on number sizes. In JavaScript numbers are floating-point values, but you get a guarantee of integer precision within the range of Number.MAX_SAFE_INTEGER, and Number.isSafeInteger lets you check it. Is that a mess? I'm not one to defend JavaScript, it has lots of problems, but in this case the behavior works as defined in the specification and is predictable.


Do you not find the fact that JavaScript has no concept of integers and treats everything as floating point a valid point toward it being a mess with numbers?

Sure, you can make it work. But I think it's a fair criticism when compared to just about any other language.


Seems like the opposite of a 'mess'? Maybe it's too minimal and limited. But having just one of a thing seems hard to characterise as being 'messy'.


You can force JS to treat a Number type as an integer. It is uncommon and a bit more verbose.

https://en.wikipedia.org/wiki/Asm.js

I agree though. Dealing with arithmetic (or with Dates) in JS can be painful.


Conversely, do you find that this criticism applies to Lua?


Absolutely. As a self-admitted language snob I don't particularly like either language much, but I have to acknowledge their successes and try to remain rather objective in my statements. Subjectively, I think that Python does numbers best for a scripting language.


As a person who spends his days writing Lua, it absolutely applies.


I would say ever so slightly less.

Lua has the same problem as JS, in that you get 53 bits of integer precision (now 64 with Lua >= 5.3).

However, Lua has operator overloading, so it's practical to import an infinite-precision integer library, define the operators, and then use ordinary arithmetic from there, with correct results.

I would prefer a proper built-in numeric tower, but I'm in the minority there; Lua is a minimalist language and generally doesn't add things to the core which can be written as extensions.
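
(For contrast, a taste of what a built-in numeric tower buys you - these are standard Common Lisp semantics, with exact results from ordinary operators:)

    (expt 2 100)      ; => 1267650600228229401496703205376, an exact bignum
    (* (expt 2 64) 3) ; exact, no silent precision loss
    (/ 1 3)           ; => 1/3, an exact rational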


This is quite pedantic because I really agree with your overall point, but:

> Every programming platform has limits on number sizes.

Arbitrary precision integers are a thing, and lots of languages at least have library support for them (I'm sure JavaScript does in an npm library). In Python they're even the way that all integers work (as of Python 3). Of course they're limited in principle by available memory.


They wouldn't be adding BigInt to the language if everything was fine.

The current status quo is that everyone pulls in math libraries if they need precise math. Dealing with overflow issues is difficult in most languages. The float representation adds another layer of difficulty.


JavaScript can’t represent 64-bit integers without using a string.

Calling it just a mess is being polite.



The article really should have been called "Do you have a problem? Write a DSL". I use DSLs (Domain-Specific Languages), or "little languages", all the time to be much more productive in C++/Java/C#/...


"When inlining is a valid rewrite?" "When constant folding is a valid rewrite?"

Are these valid questions? Structurally speaking... they seem somehow off, a lot off.

I'm asking because I'm way more bothered by them than I should be.


> Write a compiler

Then you have two problems?


Yes, that quote is also the first thing I thought of.


Now you have two problems!

:-D


Only if you use regex in the lexer.


I guess now you have 3 problems


Now you have 2 problems.


Alternatively, use blockchain.


> Alternatively, use blockchain.

How's that work?

Step 1. Raise 4 billion dollars in an unlawful ICO.

Step 2. Pay 24 million (0.6%) in a settlement with the SEC that admits your actions were unlawful but absolves you of any further issues.

Step 3. Use the remaining 3.998 billion to pay someone to write a compiler for you instead of writing it yourself?


This, or perhaps MechanicalTurk.


I was half expecting the punch line to be "now you have 2 problems!".



