Very nice! I've been a bit leery though of defaultdict since it contributed to two different bugs in http://norvig.com/spell-correct.html -- in less than a page of code, by a first-rate programmer, which was read by many thousands of others for years before anyone noticed the second bug. I'm not saying don't use this, I just have a little warning light blink on when I see 'defaultdict' anywhere.
Both of those errors seem like fundamental category errors. He's using a dictionary construct which effectively makes it so that every key has a value, but then relies on dictionary membership when testing. This happens to work, but it makes little sense. (One could argue that 'foo in bar' should always be True if bar is a defaultdict.) Seems to me like he should have checked the value of the word rather than using 'in'.
You make a good point about this being written by a known intelligent person and seen by many others who didn't see the bugs, but I think it's ultimately a lesson in thinking about semantics rather than behavior. This code relies too much on the behavior of defaultdict rather than just what such a thing means. As long as what it means is kept in mind I think it should be safe.
At some level it was sloppy thinking, yes; most bugs have some obvious-in-retrospect reason they were stupid. You may be right about how to think about this one.
I've seen other defaultdict bugs -- this is just the case with the highest eyeball-count-times-attention product I can point to.
If this bug was simply due to having the wrong concept of defaultdict, and if the others you saw were the same, then you can avoid the bugs forever by just having the right concept. Lots of ifs there, of course.
A little known fact is that it works in C++ STL too as long as the objects in your containers have default constructors that make sense. So a map<int, map<int, value_type> > does what you expect when you try: my_map[12][3] = some_value;
I don't think it's a little known fact that std::map works that way, ie. that operator [] inserts if the key doesn't exist (or at least, it's surely not little known among people using C++?).
Also, I don't think you can have an infinitely expanding std::map in the same way, because all the key/value types have to get rolled into the type signature of the top level map. It's not that hard to write something yourself to achieve it, but it's not going to be the one-liner that it is in Python.
Autovivification through a rvalue (for clarity: the magic here is that x[a][b]=c works to create x[a] despite that index never seeing an assignment) has a long history in perl. The C++ feature is more limited in scope and AFAIK accidental -- it works because the automatic construction is a side effect of reading the value, which is often considered an undesired feature of STL (you can't use an expression that contains x[a] to test if x contains a, like you can in many languages).
And you may be less cynical, but I've had to explain this sort of thing to a huge number of C++ programmers, most of whom write STL code that looks like Java, with explicit initialization of all the intermediate containers.
Automatic construction is not really a "side-effect" of reading a value in std::map, it's an explicit design choice. They want users to be able to write code like this:
If using operator[] on a key did not implicitly mean "create an entry for this key if it is not there," the above would not work. Rather, you would have to write the above as:
The reason being that std::map is purely a library, not a part of the language. It does not "know" that operator[] is actually being used as a part of an assignment.
Strictly speaking, it's a side effect, because it mutates an existing object. It's also a surprising and bug-prone side effect (as mentioned upthread; also, Darius Bacon found a bug or two in Norvig's spell-corrector due to the corresponding behavior of defaultdict in Python) and an avoidable side effect, although it's not avoidable in C++. In Ruby, Python, Smalltalk, Common Lisp, Lua, and many other languages, you can make this (or its weird-syntax equivalent) work:
dict["one"] = 1;
without causing this to add "one" to the dictionary:
if (dict["one"]) { /* ... */ }
even for a "dict" that is purely a library, not a part of the language.
The underlying problem is that C++ calls the same operator[] in both lvalue and rvalue contexts. Which is pretty odd, really, when you think about it; it certainly doesn't generate the same code for indexing into native arrays in lvalue and rvalue contexts. All the other languages distinguish between the lvalue and rvalue contexts. Ruby calls [] or []=, Python calls __getitem__ or __setitem__; Smalltalk has #at: and #at:put:; Common Lisp doesn't have separate names for the two things, but one of them is defined with (defun foo (dict key) ...) and the other is defined with (defun (setf foo) (dict key) ...); in Lua, these are the metamethods __index and __newindex.
So it's sort of true that it's not std::map's fault, but C++'s. But not really. Stroustrup fixed several things about the way templates worked to make STL work better; he should have fixed this one too.
do
local mt = {
__index = function(t, k)
t[k] = tree()
return t[k]
end
}
function tree()
return setmetatable({}, mt)
end
end
t = tree()
t.foo.bar = 'foobar'
attrdict is indeed a useful thing to have around. I usually base it off on the built in "dict", but as this example shows, it is useful on top of "defaultdict" as well.
This is a tree with labelled edges, not just a tree. This tree will not hold multiple values with the same label. I always imagine my trees to have no labelled edges.
Why is this haphazard? If you don't want reads to change your data structure shape in a well-defined manner, use 'exists' or make an immutable hash with Hash::Util.