Hidden treasure in the D standard library

thinkpad20 · on Aug 28, 2014

I'm curious how D implements memoizing an arbitrary function. To memoize something, you need to be able to check if it's being called with the same arguments. But to do that, you need a definition of "same" (i.e., an equivalence relation). For example, if the argument is a dictionary, is it considered the same argument only if the pointer is the same, or if the keys and values are the same, assuming a recursive definition of "same"? Also, you need a way to store computed values, using the arguments as keys, which for efficiency would require either an ordering (for a tree) or hashing (for a hash table) on the arguments. Multiple arguments would further complicate things. How does this get automated? And for some things, memoization wouldn't make sense (for example, with function arguments in higher-order functions). Sensible defaults would take care of most of these things, but it makes me wonder how they chose to address these things. And is the memo recreated each time the (outer) function is called, or is it stored for the duration of the program? Also, how does the type system play with this? What's the type of `memoize!`?

andralex · on Aug 28, 2014

1. Sheer comparison for equality is used. (It is user definable.) For e.g. built-in hashes, contents must compare equal.

2. The store is a hashtable.

3. Multiple arguments are grouped in a tuple - no major complication.

4. Where memoization doesn't make sense you'll get a compile-time error because e.g. comparison for equality is't defined. At least most of the time :o).

5. The memory is thread-local.

6. memoize takes the name of the memoized argument and creates a distinct type for each memoized function.

TL;DR: business as usual

delluminatus · on Aug 28, 2014

> 1. Sheer comparison for equality is used. (It is user definable.) For e.g. built-in hashes, contents must compare equal.

Wow, so if you compare aHash == anotherHash, it doesn't just compare the references, it actually examines both hashes for content equivalence? That's fairly unconventional, but probably convenient sometimes.

andralex · on Aug 28, 2014

Affirmative. Use "aHash is anotherHash" to compare references.

jnetterf · on Aug 28, 2014

The standard library (Phobos) is open source. You can find the implementation of memoize at https://github.com/D-Programming-Language/phobos/blob/6cd2e4...

memoize is a template (http://dlang.org/template.html). In this case, a template is used to create a "version" of "memoize" with the right parameters and return value (i.e., `fn` and `memoize!fn` are the same type).

Each of these "versions" has a hash table from argument tuples to results. I believe the hash is computed by calling a toHash function for each item in the tuple.

Tuples and type tuples (http://dlang.org/tuple.html) in D are really quite powerful.

tel · on Aug 28, 2014

How do you know what types are hashable?

andralex · on Aug 28, 2014

Must support comparison for equality and the hash function. Default built-ins provided appropriately.

repsilat · on Aug 29, 2014

In theory you could use a tree for unhashable things with a total order, and you could use a list for just about anything else (though you might want to emit a warning...).

I assume D has enough magic to pattern match on the traits of the types in the argument list?

dllthomas · on Aug 29, 2014

Presuming you have equality.

tel · on Aug 28, 2014

Oh, that's tragic.

andralex · on Aug 28, 2014

s/tr/m/

tel · on Aug 28, 2014

Very magical—but I'm not convinced all types have meaningful equality or hashes. It's convenient to have, but thorny whenever you, say, have anonymous functions lying around.

andralex · on Aug 28, 2014

Those won't compile.

tel · on Aug 28, 2014

Oh, dlthomas caught what went wrong! I misunderstood, apologies. If they dont compile then that's great.

sanderjd · on Aug 28, 2014

Why is it tragic?

dllthomas · on Aug 28, 2014

My read of this is that tel (mis)interpreted andralex as saying that all types must support equality and hashing, not that types must support equality and hashing to be hashable. It would be tragic because there can be types with no meaningful way to produce a hash or test for equality.

If that's not the case, then I don't understand either.

v0i0-0 · on Aug 28, 2014

Have a look at the sourcecode [1]. It is just implemented with an associative array (implemented as hash table) from argument tuples to the return type. This means that keys have to be atomic or provide their to_hash and equality check themselves. Check out the D template system, it is quite impressive what you can express with it.

1: https://github.com/D-Programming-Language/phobos/blob/master...

Shorel · on Aug 28, 2014

Memoization is one the most useful things I learned about in Common Lisp. Glad to see it in D.

It allows the program to have recursive algorithms with the performance of optimized for loops.

blt · on Aug 28, 2014

Memoization is pretty easy to implement yourself without any special language support. It's cool to make it part of the standard library but it's not like some magic sauce. Especially given that D always uses hash tables. In the Fibonacci example given, a simple array would have much, much better performance.

repsilat · on Aug 29, 2014

> In the Fibonacci example given, a simple array would have much, much better performance.

When I looked at the output of the hash function in Python for integers a few weeks ago it looked like the identity function was being used. You'll still have greater complexity than an array, but I'm not sure how much greater.

Shorel · on Aug 29, 2014

It was dead easy in Common Lisp. That's the most impressive thing about it.

jacobwcarlson · on Aug 29, 2014

Tail call optimization is necessary as well (or at least extremely beneficial). I don't know enough about D compilers to say whether or not they do this, but this discussion gives a strong "perhaps?":

http://forum.dlang.org/thread/jk5jc0$25cd$1@digitalmars.com

yoklov · on Aug 29, 2014

Not really. Only if a hash table lookup is the first part of every iteration of the for loop.

It does allow some recursive algorithms to reduce their algorithmic complexity significantly, however.

Zardoz84 · on Aug 28, 2014

I recommend try std.parallelism. I did a test with a n Body simulator and runs very well on a 15 core machine with a speed up around x13-14 over the serial version.

https://github.com/Zardoz89/nBodySim

tonysuper · on Aug 28, 2014

The "flag" example could be easily replaced with named parameters. Seems a bit unnecisary to add a new type for that purpose, unless I'm missing something.

asterite · on Aug 28, 2014

You are not missing anything: named parameters would have been a much nicer choice, but I think they don't want to make the language more complex.

EvenThisAcronym · on Aug 28, 2014

Are there any plans to add named arguments a la C# or Python in the future?

renox · on Aug 29, 2014

Currently 'the plate is full': D is full of features and many of them are not finished/polished so I doubt that they will add such kind of feature in the near future..

pyed · on Aug 28, 2014

Haskell has treasures.