
I used to be a fervent embedded DSL fan many years ago (particularly in college with Scheme/Racket), but since then I have grown to hate them, particularly Scala, Groovy, and Ruby DSLs.

External DSLs (that is, a DSL independent of the host language)... I love them. An external DSL is like a protocol or the ultimate formal API (an example would be HTTP).

Why do I dislike embedded DSLs? I guess the crux is that you are essentially mixing two languages when in reality it is just one, so you are doing non-canonical things in the host to make the guest language look pretty (e.g. going to town with implicits, extension functions, and operator overloading in Scala). Basically a whole bunch of stuff that is confusing to normal developers, just to make the DSL look pretty.

With Scheme/Lisp/Clojure it is OK because embedded DSLs using sexps look like Lisp. In Haskell it is sort of OK because of Monads and the various other things Haskell provides (laziness).

With Scala it's the absolute worst, with operator overloading, implicits, and various other gotchas.

In my experience it also gets really confusing trying to figure out what is going on when an embedded DSL breaks, and it is extremely difficult to secure an embedded DSL: `System.exit(1)` is just one call away.

There are exceptions of course: DSLs that mimic external DSLs one-to-one. One example is jOOQ, as it aims to be one-to-one with an external DSL ... SQL.




Embedded DSLs often seem like the typical "bike shed coding styles" taken to the extreme. In my opinion they often value form over function, creating an entirely new (subset of the) language just because.

I can obsess about code formatting as much as anybody else, but creating my own subset of the language for a specific domain just seems excessive. It puts additional burden on anyone else trying to work with it, because they now not only have to worry about the "host" language but also the idiosyncrasies of the embedded DSL.


Counter-argument for statically enforced DSLs (e.g. in Kotlin):

I have an idea for a DSL: parse the templates for a site to generate type information about what buttons, inputs, interpolated values, etc. can be found on each page. Combine that with parsed route information to generate DSL bindings for all the pages / routes.

If we're doing that with Kotlin, then afterwards we can get a really nice, statically enforced testing API (for Selenium, etc.), with IDE completion. And you don't even have to run the tests to make sure that the element selectors work (barring, of course, any bugs in your parsing setup).

I had this idea several months ago but haven't had the opportunity to use it. I'm itching to, because it seems like a feasible, if slightly insane, way to make automated end-to-end tests easier to write and maintain.

tl;dr: if you're going to do some code generation anyway, why not make the generated interface a DSL?


I'd go the other way - rather than generate code from templates, declare the dynamic parts of the page in code, and generate both the implementation and the test accessors from that (I'd use a Wicket-like approach where the HTML fragments only included ids and all the logic was in code). I haven't seen this done for the actual HTML side of things, but https://github.com/http4s/rho gives you a declarative way to, uh, declare your HTTP routes, and then you can generate swagger or a client from the exact same declarative routing construct value that you're using to run your actual routes.


Would you say this is the battle between syntax & semantics?

If you are inventing new syntax to support semantics you don't fully understand along the way, it usually ends up as some franken language.

Lisp makes this a feature by limiting what most people end up doing into one syntax, and with things like macroexpand-1, you can see what "real code" is being generated.


I failed to explain why it feels OK with Lisp and Haskell, but I think you sort of nailed it with the "expansion" part. With Haskell and Lisp you think in terms of expressions being expanded.

The DSLs in Ruby, Groovy, and Scala are (often but not always) not macro expansion. Oftentimes these DSLs are very mutable and act like a state machine (and in fact have a state machine underneath, which causes a whole bunch of concurrency issues).


I would add that Haskell DSLs can be formed from functionality derived from different kinds of monads and other categorical structures [1]. That sounds highfalutin, but different types of monads and other structures (functors, applicatives, etc.) give you insane amounts of power to define your DSL in a very precise way. It even gives you a formal basis to reason about your DSL, should you want to go that far with it.

[1] https://wiki.haskell.org/Typeclassopedia#Introduction


I think another difference, at least for Common Lisp, is that DSLs are embedded into the language itself (look at loop), so people are comfortable with using them and have some idea of how to write them.

This doesn't mean there aren't people who hate stuff like loop being in CL, but it does go some way toward showing why it's okay. I think the design of Lisp encourages them as well (as the previous poster said), by the fact that the entirety of Lisp is built up from an extremely limited set of primitives, so the entire language can be argued to be a DSL on top of a DSL on top of a DSL, etc.


I find them very useful in Scala, because they make for a way to do "config" that you can refactor safely under the normal rules of the language. E.g. in akka-http/spray if there's a bunch of directives that I keep repeating in my HTTP routing config then I can just pull that out into a variable that's just a plain old Scala variable, and use that variable like a normal variable. Whereas if I want to pull out a repeated stanza from a routing config file I have to find out the rules for that config file and maybe it's not even possible.

The stuff Scala does can be confusing to some people, but I find a lot of it actually simplifies things. Certainly I never want to go back to a language where operators are treated as a special case differently from normal functions (at the same time there are certain Scala libraries I avoid because those libraries give their functions stupid names - but that's a problem that exists in any language). Implicits make for a nice middle-ground if you use them the right way, to represent things that would have to be either painfully verbose or completely invisible in another language but really want to be almost-but-not-quite invisible. If you use them for something that should be more explicit, or for something that's not worth tracking at all, then that can become a problem (though again a problem you'd have in any language).

A lot of the stuff certain libraries do with "embedded DSLs" in Scala bugs me, because it just shouldn't be a DSL at all - IMO there's no reason unit test code shouldn't just be plain old code. But the cases where you really do need a DSL more than make up for it. The wonderful thing I've found in Scala is that you never need an external config file or magic annotation or anything like that - everything is just Scala, and once you understand Scala (which, granted, probably takes a little longer than other languages) you never need to worry about understanding what some framework or external DSL is doing.


> I find them very useful in Scala, because they make for a way to do "config" that you can refactor safely under the normal rules of the language

If you see the choices as either DSL or external config file, then I can understand why DSLs are attractive.

But I found that Spray suffers because it tries too hard to be a DSL rather than just being Scala. When our team tried to do something that wasn't 100% obvious, or that didn't fit with the documentation, we ended up spending a lot of time trying to understand how the DSL worked under the covers so that we could work out what we needed to do, and then work out how to turn that back into the DSL.

It's been a while since I worked with it so (a) it might have changed, (b) I'm probably remembering details incorrectly, but I recall several conversations with team-mates who were just trying to do something simple like extracting a value from a request in a slightly non-standard way, where I would have to explain "no, this part is installing a Spray directive, it runs at class initialisation, this is the part that runs at request time". Once you understand how Spray is implemented, you can work those bits out, but the DSL doesn't make it clear at all, and if getting something to work requires that you understand the Spray design, and you need to know Scala, then the DSL is just one more thing to learn, when an idiomatic standard Scala API would be clearer all round.

My experience was that Spray (and Slick) fell right into the trap of: you need to know how the tool is implemented to be productive in it, but once you know that, the DSL is no longer adding enough value to justify its awkwardness.


> if getting something to work requires that you understand the Spray design, and you need to know Scala, then the DSL is just one more thing to learn, when an idiomatic standard Scala API would be clearer all round.

I found that at least it was all ordinary Scala code, so I could always click through to the definitions in my IDE - I actually did that quite a lot early on, basing my custom directives off of copy/pasting directives from the standard library. Contrast that with e.g. Jersey, where when I wanted to add a custom marshaller I couldn't even tell where to start - the existing marshallers had annotations, but the annotation was just an inert annotation, I had no idea how to get from there to the implementation code or how to hook up my own annotation that would do the same thing.

I'd be interested in an "idiomatic standard Scala API" that had no pretensions to be anything other than code, but I suspect it would end up looking very similar to Spray. It seems like every web framework in every language does some kind of routing config outside the language - even e.g. Django, you'd think Python is dynamic enough that you could just use Python, but actually the way you do the routing is registering a bunch of (string, function) pairs where the string is some kind of underdocumented expression language for HTTP routes. So from my perspective the options really are DSL or a config that's to a certain extent external, because those are the only things I've ever seen web frameworks offer - I'd be very interested to see any alternative.


Pandas & Numpy are extremely useful DSLs in Python. Being able to add vectorization to the language and use structures like DataFrames in an R-like way is very powerful. There are some good use case scenarios.

Language features like operator overloading do exist for a reason, and that's because the language designer was aware of situations where you do want to use those features.


Do pandas and numpy count as DSLs? This always confuses me. I think it's just a library, with the same language and the same semantics, but different data structures.


This is a good question: at what point does an extensive abstraction - a library on top of a language that extends the language to make it accessible for a specialized use case - turn into a "DSL"?

I also never saw Numpy/Pandas as a DSL but rather as an extensive layer on top of Python whose complexity is largely a result of the use case, rather than the result of attempting to be a full DSL on top of the language a la Matlab for Python.

This is likely one of those scenarios where DSLs are one of many options available to particular languages to solve a particular problem set, but are hard to identify in practice. Not to mention the many times it doesn't make sense to develop a full DSL layer, yet the ease of creating them in some languages makes them a commonly abused trope (as many OO-related concepts are applied to everything where other, older solutions are far superior).

It's difficult to differentiate between the functional utility vs purely aesthetic optimizations of various abstractions, so I wouldn't be quick to blame negligence as much as communicating the best tools for the job on a language-by-language basis.


I'd even go as far as arguing that the fact that pandas / numpy isn't a DSL causes some of its awkwardness, e.g. the fact that you have to use & for `and` in pandas, and the fact that you have to parenthesize expressions like `df[(df.a == 7) & (df.b == 2)]` instead of `df[df.a == 7 & df.b == 2]`, or Python's wonky operator precedence will try to execute `7 & df.b` first. We could even have special dataframe scoping rules like `df[a == 7 and b == 2]`, but we have to write `df.a` instead, exactly because pandas is NOT a DSL.
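A quick demo of that precedence trap (toy frame, values made up for illustration):

  import pandas as pd

  df = pd.DataFrame({"a": [7, 7, 1], "b": [2, 5, 2]})

  # the parenthesized form does what you mean: rows where a == 7 AND b == 2
  print(df[(df.a == 7) & (df.b == 2)])

  try:
      # without parentheses, & binds tighter than ==, so `7 & df.b` runs first
      df[df.a == 7 & df.b == 2]
  except ValueError as e:
      print(e)  # "The truth value of a Series is ambiguous..."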


That makes sense. I do find that stuff to be awkward and sometimes wish the syntax could be simpler.


You can't do this in regular Python:

Numpy array * 10

Pandas column A + column B

Pandas Dataframe[ column C < 10 ]

Numpy array 1 / array 2 where the second array has 0s and NaNs in it. Numpy has overridden division to allow division by 0 and by NaN (a NumPy-added data type), in addition to vectorization.

Moreover, you're encouraged not to iterate (it's generally a lot slower) if you can help it when using these libraries.
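For instance (a small sketch; the arrays and column names are just made up):

  import numpy as np
  import pandas as pd

  a = np.array([1.0, 2.0, 3.0])
  print(a * 10)                    # vectorized: [10. 20. 30.]

  df = pd.DataFrame({"A": [1, 2], "B": [3, 4], "C": [5, 20]})
  print(df["A"] + df["B"])         # elementwise column addition
  print(df[df["C"] < 10])          # boolean-mask row selection

  b = np.array([2.0, 0.0, np.nan])
  with np.errstate(divide="ignore", invalid="ignore"):
      print(a / b)                 # [0.5 inf nan] -- no exception raised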


I believe the dot product for an array is a.dot(b) ?

Would a.mult(b) be terrible for the first example?

I assume the third example is R-style:

  df[df['foo'] < 10 ] ?
I don't believe I can override 'is', or 'instanceof', plus df has to pre-exist:

  foo = make_df()[foo['col'] > 10]
Why does it have to be R-style? Is that necessarily more powerful than something more pythonic?

  df.filt(lambda x: x > 10, ['foo'])
or even

  df.filt(lambda x, y: (x > 10) and (y > 10), ['foo', 'bar'])

  new_tbl = make_df().filt(lambda x, y: (x > 10) and (y > 10), ['foo', 'bar'])
vs

  df[(df['foo'] < 10) & (df['bar'] < 10)]
also, I believe James Powell does a talk about the inconsistencies of the pandas/numpy interface.


Subclass array.array and specialize the operators as you desire. All in pure, albeit slower, Python. Numpy is just a library.
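A minimal sketch of that idea (the class name `Vec` and its behavior are made up for illustration):

  from array import array

  class Vec(array):
      # elementwise scalar multiplication, returning a new Vec
      def __mul__(self, scalar):
          return Vec(self.typecode, (x * scalar for x in self))

      # elementwise division that maps division by zero to nan, numpy-style
      def __truediv__(self, other):
          return Vec('d', (x / y if y else float('nan') for x, y in zip(self, other)))

  v = Vec('d', [1.0, 2.0, 3.0])
  print(list(v * 10))                         # [10.0, 20.0, 30.0]
  print(list(v / Vec('d', [2.0, 0.0, 4.0])))  # [0.5, nan, 0.75]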


Embedded DSLs are just libraries; what makes something an embedded DSL is that it attempts to be a literate, fluent configuration language in the host language's native syntax. If it doesn't use the host language's syntax, it's not an embedded DSL, it's an external one.


Numpy doesn't introduce new syntax. Novel operator behavior does not a DSL make.


You don't have to introduce new syntax to create an embedded DSL; that's the whole point of an embedded DSL, it uses the language's existing syntax. Smalltalk and Lisp are full of DSLs, as is Ruby, and of the three only Lisp has the ability for syntactic abstraction; every Smalltalk DSL uses native syntax. See Seaside's DSL for HTML generation or Glorp's for database mappings.


I don't think you can introduce new syntax in Python and have it run as part of the language, so magic methods, decorators and metaclasses are as good as it gets. You'd have to write a parser to handle new syntax, and that makes it external, right?


You can also use MacroPy [1] to create embedded DSLs with a macro system inspired by Scheme and Elixir.

You don't need to write a parser, btw, because the stdlib provides one for you (in `ast` module).

[1] https://github.com/lihaoyi/macropy
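For example, here's a rough sketch of leaning on the stdlib parser to treat a made-up expression "DSL" as data, without MacroPy and without ever exec()ing anything:

  import ast

  source = "price * quantity + shipping"   # hypothetical user-supplied expression
  tree = ast.parse(source, mode="eval")

  # whitelist the node types we consider part of our little language
  allowed = (ast.Expression, ast.BinOp, ast.Name, ast.Constant,
             ast.Add, ast.Mult, ast.Load)
  assert all(isinstance(node, allowed) for node in ast.walk(tree))

  # e.g. collect the variable names the expression refers to
  names = {node.id for node in ast.walk(tree) if isinstance(node, ast.Name)}
  print(names)  # {'price', 'quantity', 'shipping'} (in some order)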


No, because they're valid Python. Similarly, what Ruby kids call "DSLs" don't count as DSLs, at best they can be called DSAPIs.


They are not DSLs. They are literally Python, and they are APIs.


An internal DSL would have to be part of the native language. Either Python doesn't (directly) support this, or magic methods partially allow the creation of DSLs by extending the operators.


I get what you're trying to say, I think, but you should use a different term. Pandas and NumPy aren't DSLs unless you interpret the 'L' to mean Library.

It is unusual but perfectly cromulent in Python to overload the magic methods on a class to provide whatever semantics you like through operators. So to me it doesn't seem like a DSL.

There was a recipe on ActiveState's site for essentially creating new operators by defining classes that overrode default operator semantics in "both directions" if you will. So you could write:

    foo <<my_op>> bar
And my_op could do whatever it wanted to with foo and bar, by overloading the left- and right-shift magic methods in the my_op class. Neat, eh? (But still not a DSL! Heh.)
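From memory, the recipe boils down to something like this (my reconstruction, not the original code):

  class Infix:
      # wraps a two-argument function so it can be spelled  foo <<my_op>> bar
      def __init__(self, fn):
          self.fn = fn
      def __rlshift__(self, left):       # handles  foo << my_op
          return Infix(lambda right: self.fn(left, right))
      def __rshift__(self, right):       # handles  (foo << my_op) >> bar
          return self.fn(right)

  my_op = Infix(lambda a, b: (a, b))     # my_op can do whatever it wants with foo and bar
  print("foo" <<my_op>> "bar")           # ('foo', 'bar')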


By definition, internal (or embedded) DSLs (a term with well established use) are valid host-language code, relying on whatever host language features exist that allow defining code that reads fluently for the application domain. That is what distinguishes them from external DSLs.


I didn't know that. Thank you.


I have a theory on why people maybe overvalue embedded DSLs: many problems become trivial when the terms of a language are a natural fit for the problem domain, where 'terms' are just any abstractions (new types, instantiations of old types).

So, maybe people are assuming that this benefit derives from having a full language which is a natural fit for a problem domain? (My contention being that the terms are what's really significant and the rest can be nice, but diminishing returns + tradeoffs.)

edit: fixed phrasing. I'll also add: I think certain sub-problems in an application can be specialized enough to need something really different (e.g. SQL, Prolog), but it seems like a relatively uncommon thing.


I used to like embedded DSLs a lot too, but then I realized they (mostly) were just the builder pattern + a lot of "extra" syntax (I've never written anything large enough in a Lisp or Haskell). The builder pattern feels more readable and has fewer context switches when writing code.


A "nice syntax to the builder pattern" describes every single applicative based DSL I've seen on Haskell. (I think this generalizes for all theoretically possible ones, but I need more coffee for being sure.)

But you get the advantage that applicative is a standard syntax, which you don't get with most builders.


I always had a difficult time understanding applicatives. Do you know of any good resources for understanding them?


I have started intuiting it like this. I want you to imagine the concept of a "producer of values", which is a fairly broad notion. This can be

1. a list, which produces its elements one at a time;

2. a computation that might fail to produce a value, which either produces one value or none at all;

3. an I/O computation, which may ask the user for a value to produce;

4. a stateful computation, which produces a value that may depend on some hidden state;

5. a parser, which produces a value that may depend on the source data being parsed;

6. a source of pseudorandomness, which produces a value that is unpredictable;

or any other of a nearly infinite set of things. For such a producer of values, we can imagine that

A) The Functor instance says, "Hey, give me a function and I'll create a new producer that produces the values you get if you apply that function to the values I produce."

-- Example: If the producer is the infinite stream [1..] and the function is (^2), the Functor instance lets us create a new producer that produces the infinite stream of positive perfect squares.

B) The Applicative instance says, "Okay, that's cool, but you know what I can do beyond that? Give me two different producers of values, and I promise you I'll create a producer that combines values from both of the two producers you gave me. So in a sense, I am a combiner of producers."

-- Example: If the first producer is a database of previous actions the player of a game has taken, and the second producer is a source of randomness, the Applicative instance lets us create a new producer that produces an AI decision based on player action history but with some randomness thrown in to look more human.

C) The Monad instance says, "Pah, and you thought that was neat? Look what I can do! If you give me a producer and several different possible producers, I can create a new producer that chooses which of the different possible producers to run next based on the values produced by the first producer. So in a sense, I am the opposite of Applicative: I am a splitter of producers."

-- Example: if the first producer is a stateful computation that extracts the player health from the state in a game, we may have two different producers lined up to follow: one produces a commiserative message mentioning their score, and the other produces a message telling them round number n is starting. The Monad instance lets us create a new producer that delegates to either of the two depending on whether the player is dead (health <= 0) or alive (health > 0).
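(If a sketch outside Haskell helps: here is the same ladder in Python with a Maybe-style producer. The names Maybe / map / combine / then are stand-ins for the real Functor / Applicative / Monad operations, invented just for this illustration.)

  class Maybe:
      # a "producer" that yields either one value or nothing at all
      def __init__(self, value=None, empty=False):
          self.value, self.empty = value, empty

      # Functor: turn a producer of x into a producer of f(x)
      def map(self, f):
          return self if self.empty else Maybe(f(self.value))

      # Applicative: combine the outputs of two producers with f
      def combine(self, other, f):
          if self.empty or other.empty:
              return Maybe(empty=True)
          return Maybe(f(self.value, other.value))

      # Monad: let the produced value decide which producer runs next
      def then(self, f):
          return self if self.empty else f(self.value)

  health = Maybe(0)
  message = health.then(lambda h: Maybe("game over") if h <= 0 else Maybe("round %d starts" % h))
  print(message.value)  # game over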

... I should really get around to writing this blog post ... I feel like I have rewritten it a thousand times in various comments at various points...


This is one of the best comments I've ever read on this matter. Please link that blog post when you write it!


+1. And you have to learn the whole damn DSL, which is always badly documented and comes with zero tooling or support, instead of just using the regular language API which you are already equipped for.



