Hacker News new | past | comments | ask | show | jobs | submit login
Merging Maps (maxbittker.github.io)
51 points by mmastrac on May 9, 2017 | hide | past | favorite | 19 comments



Why not use `update` in Python? e.g. if you want to merge dictA and dictB, you'd write

    dictA.update(dictB)
If you have many you can always use this:

    dictD = {}
    map(dictD.update, [dictA, dictB, dictC])


Or

    dictAB = dict(dictA, **dictB)
in Python 2 and 3.


That'll create an intermediate dict (due to the unpacking) for Python <3.6 and then another dict. a.update(b) does neither.

It also has different semantics from a.update(b). The unpacking way will explode when both a and b contain the same key, while a.update(b) does not; existing keys also contained in b will be overwritten by the values from b.


It thought ChainMap might be useful (Python) but found out I need to be a lot more careful using it.

    from collections import ChainMap

    a, b, c = {'a': 1}, {'b': 12, 'c': 13}, {'c': 23}
    d = ChainMap(a, b, c) # ChainMap({'a': 1}, {'b': 12, 'c': 13}, {'c': 23})
    e = ChainMap(c, b, a) # ChainMap({'c': 23}, {'b': 12, 'c': 13}, {'a': 1})

    # Lookups cascade
    d['c']  # -> 13
    e['c']  # -> 23

    # Updates mutate first dict
    d['c'] = 1003 # ChainMap({'a': 1, 'c': 1003}, {'b': 12, 'c': 13}, {'c': 23})
 
    d['c'] # 1003
    e['c'] # 23
    a   # {'a': 1, 'c': 1003}
Python 3.3+ https://docs.python.org/3/library/collections.html#collectio...


You can add an empty dictionary as the first argument if you want to prevent that.


> Except you may not call them “maps” — key value data structures inexplicably have a different name in every single programming language:

And then there's the cases where one language/framework has multiple associative arrays.

In C++, an std::map is an ordered map, where the key needs to implement a comparison method. An std::unordered_map, on the other hand, uses a hash function to organize items into buckets. (Same in Qt with QMap and QHash, by the way.)

When a language only provides one type of associative array, it tends to be the unordered variant, but there's usually a library for ordered associative arrays.

Usually, the unordered variant is preferred because it's much faster to compare computed hashes (usually an int32 or int64) instead of the keys themselves (e.g. potentially long strings), esp. if the map is clever and caches the hashes along with the keys. However, the implementation is more involved since it needs to implement a sufficiently nice hash function, that also adds a random salt to counter complexity attacks.


"Most big Go or Lua projects end up pulling in dependencies like mergo or penlight tablex."

Don't know about Lua, but in Go this is absolutely not true. I didn't even know "mergo" and it seems to have a mere 200 stars on GitHub.

As the article notes, the culture in Go is to write explicit imperative code instead of hiding operations behind functional-sounding routines that have none of the functional properties.


Erlang:

    maps:merge(M1,M2).
Composable & preserving.

Variadic doesn't matter much as can do fold over a sequence of maps:

    lists:foldl(fun(E,A) -> maps:merge(E,A) end, #{}, [M1,M2,M3, ...]).


Java9 also has the Map.ofEntries() method. Map.merge() supports supplying a BiFunction for resolution. And Java8 streams make it easy to write composeable, arbitrary merge functions.


Haskell's "containers" package has a whole module devoted to map merging: http://hackage.haskell.org/package/containers-0.5.10.2/docs/...


Perl5:

  my %merged = ( %first, %second, %third, extra => entries, %etc );
  # or slightly more efficient, in-place update one hash with another
  @merged{keys %other} = values %other;
Every statement in Perl5 is an expression, so they're trivial to compose.


I like Perl, and use it quite a bit. But the obvious drawback of that decision is that sub(%foo,%bar) passes a merged hash. You can use refs of course, but I've seen this bite beginners a lot.


Yeah, list context is not an obvious paradigm for beginners, and hashes in list context being a list of pairs is also a surprise.


TXR Lisp: hash-uni:

http://www.nongnu.org/txr/txr-manpage.html#N-00F548D1

For cases when the key occurs in both hashes, you can specify a join-func which takes the two values. Its return value is used to populate the key in the union. The default behavior is to take the left value.

This is very useful.

Suppose that both hashes associate keys with lists. We want all the items in the union: [hash-uni left right append].

We have two frequency histograms. We want to combine them: [hash-uni left right +].


Next Gen JS is even easier:

  const a = { 1: '1', 2: '2'};
  const b = { 1: '1', 2: 'b', c: {3: 'c' } };

  const merged = {
    ...a,
    ...b,
    ...b.c,
  }


Common Lisp - well, depends on how you implement your maps.

alists and plists - just consing them together will do, though they're wasteful in terms of space as all the keys and values of both maps are preserved (or you can treat it as a feature - merging preserves history); anyway, it's easy to trim them with remove-dupliates.

hash tables - nothing in the standard, though trivial to add.


I'm surprised the author doesn't go into deeply nested maps and deep merges. That's an issue none of the languages solves natively.


PHP is the only language compared in the article that natively provides this:

"PHP also includes a functionality in its standard library that none of the other languages here do: deep merging. array_merge_recursive will recursively call merge on string key collisions."


Ruby can also use the Python solution nowadays.




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: