
Do pandas and numpy count as DSLs? This always confuses me. I think they're just libraries, with the same language and the same semantics, but different data structures.



This is a good question: at what point does an extensive abstraction (a library on top of a language that extends it for a specialized use case) turn into a "DSL"?

I also never saw Numpy/Pandas as a DSL, but rather as an extensive layer on top of Python whose complexity comes largely from the use case rather than from an attempt to be a full DSL on top of the language, a la Matlab for Python.

This is likely one of those scenarios where DSLs are one of many options a language offers for a particular problem set, but are hard to identify in practice. Not to mention the many cases where developing a full DSL layer doesn't make sense, yet the ease of creating one in some languages makes it a commonly abused trope (much as OO concepts get applied to everything, even where older solutions are far superior).

It's difficult to separate the functional utility of an abstraction from purely aesthetic optimization, so I'd be slower to blame negligence and quicker to focus on communicating the best tools for the job on a language-by-language basis.


I'd even go as far as arguing that the fact that pandas / numpy aren't DSLs causes some of their awkwardness, e.g. the fact that you have to use & for `and` in pandas, and the fact that you have to parenthesize expressions like `df[(df.a == 7) & (df.b == 2)]` instead of writing `df[df.a == 7 & df.b == 2]`, or Python's operator precedence (where `&` binds tighter than `==`) will try to evaluate `7 & df.b` first. We could even have special dataframe scoping rules like `df[a == 7 and b == 2]`, but we have to write `df.a` instead, exactly because pandas is NOT a DSL.
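One way to see the precedence issue described above is to ask Python's own parser how it groups the two spellings. This is a minimal sketch using the stdlib `ast` module; `df`, `a` and `b` are only names inside a string here, nothing is evaluated, and `ast.unparse` assumes Python 3.9 or later.

```python
import ast

unparenthesized = ast.parse("df.a == 7 & df.b == 2", mode="eval").body
parenthesized = ast.parse("(df.a == 7) & (df.b == 2)", mode="eval").body

# Without parentheses, `&` binds tighter than `==`, so the parser builds a
# chained comparison: df.a == (7 & df.b) == 2.
middle_operand = ast.unparse(unparenthesized.comparators[0])
print(type(unparenthesized).__name__, "with middle operand:", middle_operand)

# With parentheses, the top-level node is the `&` joining two comparisons.
print(
    type(parenthesized).__name__,
    "joining",
    type(parenthesized.left).__name__,
    "and",
    type(parenthesized.right).__name__,
)
```

Because the unparenthesized form is a chained comparison, Python also inserts an implicit `and` between its two halves, which calls `bool()` on an intermediate result. That is typically where an array-like operand runs into trouble.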


That makes sense. I do find that stuff to be awkward and sometimes wish the syntax could be simpler.


You can't do this in regular Python:

Numpy array * 10

Pandas column A + column B

Pandas Dataframe[ column C < 10 ]

Numpy array 1 / array 2 where the second array has 0s and NaNs in it. Numpy has overridden division so that dividing by 0 or NaN yields inf or NaN instead of raising ZeroDivisionError, in addition to vectorizing the operation.

Moreover, you're encouraged to not iterate (generally a lot slower) if you can help it when using these libraries.


I believe the dot product for an array is a.dot(b) ?

Would a.mult(b) be terrible for the first example?

I assume the third example is R-style:

  df[df['foo'] < 10] ?

I don't believe I can override 'is', or 'instanceof', plus df has to pre-exist:

  foo = make_df()[foo['col'] > 10]

Why does it have to be R-style? Is that necessarily more powerful than something more pythonic?

  df.filt(lambda x: x > 10, ['foo'])

or even

  df.filt(lambda x, y: (x > 10) and (y > 10), ['foo', 'bar'])

  new_tbl = make_df().filt(lambda x, y: (x > 10) and (y > 10), ['foo', 'bar'])

vs

  df[(df['foo'] < 10) & (df['bar'] < 10)]

Also, I believe James Powell gave a talk on the inconsistencies of the pandas/numpy interface.


Subclass array.array and specialize the operators as you desire. All in pure, albeit slower, Python. Numpy is just a library.
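As a rough sketch of that suggestion, assuming a hypothetical `Vec` class (the name and the choice of operators are mine, not from the thread), the stdlib `array.array` can be subclassed so that `*`, `+` and `<` act elementwise and a boolean mask can be used as an index:

```python
from array import array


class Vec(array):
    """Hypothetical array.array subclass with elementwise operators."""

    def __mul__(self, scalar):
        # array.array's own `*` repeats the sequence; scale each element instead.
        return Vec(self.typecode, [x * scalar for x in self])

    def __add__(self, other):
        # array.array's own `+` concatenates; add element by element instead.
        return Vec(self.typecode, [x + y for x, y in zip(self, other)])

    def __lt__(self, bound):
        # Return a list of booleans (a mask) rather than a single bool.
        return [x < bound for x in self]

    def __getitem__(self, key):
        # Accept a boolean mask in addition to the usual index or slice.
        if isinstance(key, list):
            return Vec(self.typecode, [x for x, keep in zip(self, key) if keep])
        return super().__getitem__(key)


v = Vec("d", [1.0, 2.0, 30.0])
print(list(v * 10))
print(list(v + v))
print(list(v[v < 10]))
```

Everything here goes through ordinary magic methods, so the syntax is unchanged Python; only the operator behavior differs. It loops in the interpreter, so it will be much slower than numpy.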


Embedded DSLs are just libraries. What makes something an embedded DSL is that it attempts to be a literate, fluent configuration language in the host language's native syntax. If it doesn't use the host language's syntax, it's not an embedded DSL; it's an external one.


Numpy doesn't introduce new syntax. Novel operator behavior does not a DSL make.


You don't have to introduce new syntax to create an embedded DSL. That's the whole point of an embedded DSL: it uses the language's existing syntax. Smalltalk and Lisp are full of DSLs, as is Ruby. Of the three, only Lisp has the ability for syntactic abstraction; every Smalltalk DSL uses native syntax. See Seaside's DSL for HTML generation or Glorp's for database mappings.


I don't think you can introduce new syntax in Python and have it run as part of the language, so magic methods, decorators and metaclasses are as good as it gets. You'd have to write a parser to handle new syntax, and that makes it external, right?


You can also use MacroPy[1] to create an embedded DSL with a macro system inspired by Scheme and Elixir.

You don't need to write a parser, btw, because the stdlib provides one for you (in the `ast` module).

[1] https://github.com/lihaoyi/macropy
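To illustrate the point about the stdlib parser, here is a minimal sketch that uses `ast` to accept a small filter expression with bare column names and `and` / `or`. The `where` helper, the `ALLOWED_NODES` whitelist and the list-of-dicts "table" are all hypothetical, and this is not MacroPy's API; since the expression arrives as a string, it is arguably closer to an external mini-language that borrows Python's grammar.

```python
import ast

# Node types the hypothetical mini-language accepts: names, constants,
# comparisons, and `and` / `or`. Anything else is rejected.
ALLOWED_NODES = (
    ast.Expression, ast.BoolOp, ast.And, ast.Or, ast.Compare,
    ast.Eq, ast.NotEq, ast.Lt, ast.LtE, ast.Gt, ast.GtE,
    ast.Name, ast.Load, ast.Constant,
)


def where(rows, expression):
    """Keep the dict rows for which `expression` is true."""
    tree = ast.parse(expression, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            raise ValueError(f"unsupported syntax: {type(node).__name__}")
    code = compile(tree, "<where>", "eval")
    # Each row serves as the local namespace, so bare names resolve to columns.
    return [row for row in rows if eval(code, {"__builtins__": {}}, row)]


rows = [{"a": 7, "b": 2}, {"a": 7, "b": 3}, {"a": 1, "b": 2}]
print(where(rows, "a == 7 and b == 2"))
```

pandas offers a comparable string-based form through `DataFrame.query`, which is one place where the library does step outside plain operator overloading.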


No, because they're valid Python. Similarly, what Ruby kids call "DSLs" don't count as DSLs; at best they can be called DSAPIs.



