Hacker News new | past | comments | ask | show | jobs | submit login
Hidden treasure in the D standard library (nomad.so)
127 points by nkurz on Aug 28, 2014 | hide | past | favorite | 29 comments



I'm curious how D implements memoizing an arbitrary function. To memoize something, you need to be able to check if it's being called with the same arguments. But to do that, you need a definition of "same" (i.e., an equivalence relation). For example, if the argument is a dictionary, is it considered the same argument only if the pointer is the same, or if the keys and values are the same, assuming a recursive definition of "same"? Also, you need a way to store computed values, using the arguments as keys, which for efficiency would require either an ordering (for a tree) or hashing (for a hash table) on the arguments. Multiple arguments would further complicate things. How does this get automated? And for some things, memoization wouldn't make sense (for example, with function arguments in higher-order functions). Sensible defaults would take care of most of these things, but it makes me wonder how they chose to address these things. And is the memo recreated each time the (outer) function is called, or is it stored for the duration of the program? Also, how does the type system play with this? What's the type of `memoize!`?


1. Sheer comparison for equality is used. (It is user definable.) For e.g. built-in hashes, contents must compare equal.

2. The store is a hashtable.

3. Multiple arguments are grouped in a tuple - no major complication.

4. Where memoization doesn't make sense you'll get a compile-time error because e.g. comparison for equality is't defined. At least most of the time :o).

5. The memory is thread-local.

6. memoize takes the name of the memoized argument and creates a distinct type for each memoized function.

TL;DR: business as usual


> 1. Sheer comparison for equality is used. (It is user definable.) For e.g. built-in hashes, contents must compare equal.

Wow, so if you compare aHash == anotherHash, it doesn't just compare the references, it actually examines both hashes for content equivalence? That's fairly unconventional, but probably convenient sometimes.


Affirmative. Use "aHash is anotherHash" to compare references.


The standard library (Phobos) is open source. You can find the implementation of memoize at https://github.com/D-Programming-Language/phobos/blob/6cd2e4...

memoize is a template (http://dlang.org/template.html). In this case, a template is used to create a "version" of "memoize" with the right parameters and return value (i.e., `fn` and `memoize!fn` are the same type).

Each of these "versions" has a hash table from argument tuples to results. I believe the hash is computed by calling a toHash function for each item in the tuple.

Tuples and type tuples (http://dlang.org/tuple.html) in D are really quite powerful.


How do you know what types are hashable?


Must support comparison for equality and the hash function. Default built-ins provided appropriately.


In theory you could use a tree for unhashable things with a total order, and you could use a list for just about anything else (though you might want to emit a warning...).

I assume D has enough magic to pattern match on the traits of the types in the argument list?


Presuming you have equality.


Oh, that's tragic.


s/tr/m/


Very magical—but I'm not convinced all types have meaningful equality or hashes. It's convenient to have, but thorny whenever you, say, have anonymous functions lying around.


Those won't compile.


Oh, dlthomas caught what went wrong! I misunderstood, apologies. If they dont compile then that's great.


Why is it tragic?


My read of this is that tel (mis)interpreted andralex as saying that all types must support equality and hashing, not that types must support equality and hashing to be hashable. It would be tragic because there can be types with no meaningful way to produce a hash or test for equality.

If that's not the case, then I don't understand either.


Have a look at the sourcecode [1]. It is just implemented with an associative array (implemented as hash table) from argument tuples to the return type. This means that keys have to be atomic or provide their to_hash and equality check themselves. Check out the D template system, it is quite impressive what you can express with it.

1: https://github.com/D-Programming-Language/phobos/blob/master...


Memoization is one the most useful things I learned about in Common Lisp. Glad to see it in D.

It allows the program to have recursive algorithms with the performance of optimized for loops.


Memoization is pretty easy to implement yourself without any special language support. It's cool to make it part of the standard library but it's not like some magic sauce. Especially given that D always uses hash tables. In the Fibonacci example given, a simple array would have much, much better performance.


> In the Fibonacci example given, a simple array would have much, much better performance.

When I looked at the output of the hash function in Python for integers a few weeks ago it looked like the identity function was being used. You'll still have greater complexity than an array, but I'm not sure how much greater.


It was dead easy in Common Lisp. That's the most impressive thing about it.


Tail call optimization is necessary as well (or at least extremely beneficial). I don't know enough about D compilers to say whether or not they do this, but this discussion gives a strong "perhaps?":

http://forum.dlang.org/thread/jk5jc0$25cd$1@digitalmars.com


Not really. Only if a hash table lookup is the first part of every iteration of the for loop.

It does allow some recursive algorithms to reduce their algorithmic complexity significantly, however.


I recommend try std.parallelism. I did a test with a n Body simulator and runs very well on a 15 core machine with a speed up around x13-14 over the serial version.

https://github.com/Zardoz89/nBodySim


The "flag" example could be easily replaced with named parameters. Seems a bit unnecisary to add a new type for that purpose, unless I'm missing something.


You are not missing anything: named parameters would have been a much nicer choice, but I think they don't want to make the language more complex.


Are there any plans to add named arguments a la C# or Python in the future?


Currently 'the plate is full': D is full of features and many of them are not finished/polished so I doubt that they will add such kind of feature in the near future..


Haskell has treasures.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: