This doesn't feel right... it's not actually a nice side effect at all.
1. it's not in the spec
2. you shouldn't rely on it
3. python can't figure out if you're relying on it, so no error will be raised
4. subtle bugs are sure to be introduced by people who "know" this "feature" exists and use it.
Regardless of the cool implementation details, this post shouldn't advertise that "keywords become ordered" and "A nice "side effect" of compact dict is that the dictionary now preserves the insertion order".
Ergo... we built this awesome thing that you'd love to use but you can't. Don't use it or you'll be a bad programmer who doesn't read specs!
For what it's worth, the JS ecosystem has tried for years to get users to not rely on key iteration order. It was always unspecified but no matter what, users still expected them to iterate in insertion order.
V8 fought against it for years. Look at this bug horror show:
I think Python is doing the right thing. The only reason to not provide a deterministic iteration order for unsorted maps is to give more room for optimizers. But it seems like there is a clean implementation that is faster while also iterating over keys in insertion order.
Sure, maybe someone will come up with an even better optimization in the future that would break this, but at some point, you have to say, "OK, what we have is good enough, and it gives users a more predictable system."
If you are going to declare insertion order is non-deterministic, you really should make it non-deterministic by doing something like shuffling it. Otherwise, users will just inadvertently rely on whatever deterministic-but-unspecified-order the current implementation happens to provide.
If you think offering every variation of data structures is doing usability right, then C++ has done a lot of usability right. So much that its usability has suffered.
I would guess it's probably the same way in C++, but I've never spent more than five seconds considering which collection to use in Java. It has a decent number of them, but once you're aware of their existence they're incredibly intuitive.
In CPython 3.3 - 3.5 the order is deterministic within an execution, but non-deterministic across executions (hash randomization has been on by default since 3.3). This was implemented to avoid the possibility that an attacker could slow down dict key lookup/insertion by creating a large number of hash table collisions.
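The randomization is easy to observe from the outside; a minimal sketch (assuming Python 3.7+ for `subprocess.run(..., capture_output=True)`):

```python
import os
import subprocess
import sys

def hash_of_a(seed):
    # Run a child interpreter with a fixed hash seed and report hash('a').
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.run([sys.executable, "-c", "print(hash('a'))"],
                         capture_output=True, text=True, env=env)
    return out.stdout.strip()

# A fixed seed is reproducible across runs; different seeds give different
# string hashes, which is what scrambled dict iteration order between
# executions on the old hash-ordered dict.
print(hash_of_a(1), hash_of_a(2))
```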
It never seemed to catch on with the bad guys, but that doesn't mean it wouldn't have happened if major languages hadn't implemented countermeasures.
DDoS using DNS amplification was possible for decades but never happened. Then one day a bad guy got bored, wrote a proof-of-concept, and the rest is history.
I believe it's an implementation detail that in CPython 3.6 dict and odict are essentially equivalent now, but it's an implementation detail that exists to support an official feature of the language.
If your project can sustain a minimum version of (the as yet unreleased) Python 3.6, I don't see why you shouldn't begin to rely on this feature straightaway.
This just isn't the sort of thing that benefits from fucking with.
This is imposing a slight but not-insignificant open-ended technical debt on the whole of the Python ecosystem, for what seems to me to be very obscure benefits.
1. No need for metaclasses simply to make the class construction locals dict an ordered dict. ORM library devs will be happy.
2. No need to pass a list of tuples when you wanted to pass keyword arguments. That's a big win for usability.
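Both wins are easy to demonstrate; a small sketch assuming Python 3.6+ semantics (PEP 468 for kwargs, PEP 520 for class namespaces):

```python
# On Python 3.6+, **kwargs preserves the caller's argument order (PEP 468),
# and class bodies record attributes in definition order (PEP 520).

def f(**kwargs):
    return list(kwargs)          # keys in the order the caller wrote them

print(f(b=1, a=2, c=3))          # ['b', 'a', 'c']

class Model:
    name = "x"
    age = 0

# Attribute definition order is visible without any metaclass tricks.
fields = [k for k in vars(Model) if not k.startswith("__")]
print(fields)                    # ['name', 'age']
```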
> technical debt
The language spec will say that class construction locals and kwargs are insertion-ordered, but other dicts are ordered according to the whims of the interpreter. What is causing technical debt?
> This just isn't the sort of thing that benefits from [change].
The core devs say that status quo is the sane default. They're not just mucking about for the amusement of code churn.
> The language spec will say that class construction locals and kwargs are insertion-ordered, but other dicts are ordered according to the whims of the interpreter. What is causing technical debt?
The debt will accrue in the wider ecosystem. The spec doesn't guarantee dicts are ordered, but people will rely on it. Then, when a new dict implementation comes along, all code written based on a (then) correct assumption (but not based on the spec) suddenly stops working.
Only if someone writes (and shares widely, to be used) code that depends on it.
I've [ab]used the stability of dict order when you haven't modified the dict in the meantime, in one-off scripts.
But I wouldn't write code that depended on predicting dict key order though, that seems a bit too much to me.
----
For me, I feel that Python source code should run unmodified on as many interpreters as possible: CPython, PyPy, Jython, Brython, et al., embedded, etc. There are lots of Python and Python-ish runtimes out there, beyond just what the core devs produce.
If I can write code that works on most of those systems out-of-the-box that's a huge win from my point of view.
It should be clear that changing the semantics of the interpreter, even in subtle ways, puts a burden onto anyone hoping to maintain parity in their own runtime, yes?
> It should be clear that changing the [behavior] of the interpreter, even in subtle ways, puts a burden onto anyone hoping to maintain parity in their own runtime, yes?
No, that's not clear. Any changes to the language specification put a burden on the maintainers of interpreters/compilers. Any extra changes to the CPython interpreter do not. You argued that behavior of CPython becomes its own spec, beyond that of Python itself, but this does not seem to be the case. There are a bunch of optimizations that CPython does, like caching small ints, that programmers do not tend to rely on and do not form a sort of shadow-spec for other interpreters.
You might say that dict keys staying in insertion order is more useful than the other CPython idiosyncrasies and therefore will become a shadow-spec, unlike the others. That's possible, but let's consider the risk.
Pythonistas have done just fine without this behavior for a while. There's OrderedDict, but it was added in 2008, relatively late. If there had been lots of demand/usage, I'd have expected it to be added earlier. PEP 372 [0] indicates that use in a metaclass was one of the big motivations, which was only enabled the year before.
When a new Pythonista considers this issue, they are likely to search the internet for discussion and will find commentary on OrderedDict. I expect most will use OrderedDict when appropriate.
The riskiest group would be folks porting code from PHP and Ruby, or other languages that have a core mapping type that keeps insertion order according to the language spec. They'd just expect a Python dict to behave the same way and won't bother to read about it. To evaluate the impact of this change, we should ask how much porting from PHP/Ruby/etc. do we expect to occur, how likely is it that dict implementation will drop insertion ordering in the future, and how much benefit does this implementation provide.
Based on the explanation of the memory and compute savings, this new dict implementation sounds like a good idea.
Also, Ruby made the change in 2009 to the language spec. I'm not an expert Rubyist, but AFAICT it hasn't caused any problems for their community.
> No need for metaclasses simply to make the class construction locals dict an ordered dict. ORM library devs will be happy.
That's an understatement. I saw someone trying to avoid using a metaclass for that; they "had" to use descriptor objects which all shared the same global counter to work out their order.
Hey, guess what? Just don't use metaclasses. At all, for anything. Done.
GvR called it "The Killing Joke" for a reason.
This is a clear case of "negative productivity": People are working "harder than a cat trying to bury a turd on a marble floor" to pee in my soup. It makes me grumpy.
What? Not using metaclasses in that case caused more "issues" than would have been caused by using metaclasses.
Multiple global values to track state, since you can't bind the state to the class's creation (the conversion to the final form of the class happened after its creation), versus being able to bind the state to the class's creation, since the conversion happens as the class is created.
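The workaround being described looks roughly like this (a hypothetical reconstruction, not the actual codebase; `Field`, `Model`, and the module-level counter are illustrative names):

```python
import itertools

# Module-level global: the state that can't be bound to any one
# class's creation.
_counter = itertools.count()

class Field:
    def __init__(self):
        # Record a creation index so fields can be sorted back into
        # definition order after the class body has been evaluated.
        self.order = next(_counter)

class Model:
    name = Field()
    age = Field()

# Recover definition order by sorting on the global counter.
fields = sorted(
    (k for k, v in vars(Model).items() if isinstance(v, Field)),
    key=lambda k: vars(Model)[k].order,
)
print(fields)                    # ['name', 'age']
```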
Knowing nothing more about this codebase than the above, I would fire everyone and make the CEO's nephew write it. It could hardly turn out worse and it would be a lot cheaper.
One of their expectations is that the attributes have their definition order stored somewhere. You may use either the decorator method or the metaclass method.
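The metaclass method hinges on `__prepare__`, which lets the metaclass supply the mapping the class body executes against; a minimal sketch:

```python
from collections import OrderedDict

class OrderedMeta(type):
    @classmethod
    def __prepare__(mcs, name, bases):
        # The class body executes against this mapping, so it records
        # assignments in definition order.
        return OrderedDict()

    def __new__(mcs, name, bases, namespace):
        cls = super().__new__(mcs, name, bases, dict(namespace))
        # Stash the definition order before the namespace is lost.
        cls._field_order = [k for k in namespace if not k.startswith("__")]
        return cls

class Model(metaclass=OrderedMeta):
    name = "x"
    age = 0

print(Model._field_order)        # ['name', 'age']
```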
I don't get your concern. You can always treat an ordered collection as unordered without harm, aside from maybe not picking the most optimized solution had you considered order. Existing code will assume it's unordered, code written for 3.6+ can assume it's ordered.
That's discouraged for general dict usage, as non-CPython interpreters may choose a different implementation. Only kwargs and class creation dicts will be specified as ordered.
I'm imagining a scenario where I want to back-port Python 3 code to work with Python 2 and now must take account of it relying on ordering (that's not guaranteed by Python 2 semantics.)
Also it's just bad design. Ordering is an additional constraint so I feel that it should require additional code ("OrderedDict" vs "dict"). It's a deep semantic change for what seems to me to be bullshit superficial justification.
I tend to agree that this is something that could be confusing to rely on, especially if running tests against multiple versions of Python (such as a previous version where this isn't the behavior).
Explicit is better than implicit.
There should be one-- and preferably only one --obvious way to do it.
In Python 2 the ordering was deterministic. Cue lots of weird bugs when upgrading to Python 3. Of course, it should never have been relied on... but it had been.
I'd prefer either a completely random order or a completely deterministic one - but please no take backs (people will rely on it).
These reliances usually aren't explicit, they just happen to work that way and it can be very hard to tease out the dependency unless you have a way to easily screw with iteration order.
Pretty much any unspecified but (because of the implementation) reliable ordering will eventually end up with code relying on it, usually implicitly. E.g. some Go code broke in 1.5 (and the release notes noted it explicitly) when scheduling order changed[0], despite it always having been undefined. In fact, specifically to catch this kind of hidden dependency, the Go developers added limited scheduling randomisation to the race detection mode.
[0] from definition-ordering to last-definition-bias e.g. if you launched 3 goroutines 1, 2 and 3, in 1.4 the scheduler would just run them in that order, in 1.5 it would run 3 first followed by 1 and 2.
That's true, but to rely on these behaviors is to introduce bugs into your code, and we should be detecting and correcting them, not forgoing semantically clean optimizations because someone's buggy code will break.
Yeah, this is not a dict that has ordered keys. It's a dict that has that side effect.
I can't think of any reason why a set's members, or a dict's keys (which are just set members with associated values, conceptually) should be unordered. Yes, set membership is traditionally unordered, but we have ordered sets and dicts, so they must be desirable, and they're certainly coherent.
I just hope that if sets and dicts become ordered by spec, that some day someone doesn't feel compelled to implement unordered versions.
> I just hope that if sets and dicts become ordered by spec, that some day someone doesn't feel compelled to implement unordered versions.
Unspecified order minimizes constraints on future implementations. If someone finds a useful time or space optimization (for particular workloads, not necessarily in big-O asymptotic terms) that isn't consistent with the specified ordering, that might be a compelling reason to implement a version without the ordering constraint. (But if it's not the best choice for general use, that's true even if the default implementation isn't ordered, since it's possible that the optimization will still be unsuitable as the default.)
And, obviously, even if the base case has one ordering, there's all kinds of reasons you might need to implement a version with a different ordering.
"Ordered sets" are coherent in the sense that you can make a data structure like a set and then define an order over its members, and call it an ordered set, but "ordered sets" are not sets.
There are certain purposes sets are natural for. When you introduce ordering, you will naturally be asked to add methods for querying or perhaps modifying that ordering in various ways. Many methods later, it's worth asking why you are calling this data structure a set, and what its actual purpose is supposed to be.
I interpret it differently: the ordering is a side effect, and the positive aspect is that it's easier to debug conveniently, not a feature you should rely on version to version.
I am not very sure this is that bad. If code breaks because dict is no longer ordered, people should have used an ordered one from the start. It's a happy accident that it is faster and ordered, and whoever relies on dicts being ordered (when the spec says they aren't) deserves whatever breakage they get.
This is a very similar issue to the one that made migration to Python 3 so painful.
A lot of the code had broken Unicode handling from the start. The difference was that it worked in Python 2 most of the time and failed in special cases, while Python 3 was stricter, and mixing bytes with strings always caused a crash.
Once dict() is ordered, people will use that property (even unknowingly); if the behavior changes again in the future, a lot of their code will break.
It may be similar in that it allows for relying on consistent behavior when one shouldn't, but fixing it seems a lot easier than decades of not thinking about Unicode.
Here's what Guido wrote about some of the issues that have been raised here:
"""
I've been asked about this. Here's my opinion on the letter of the law in 3.6:
- keyword args are ordered
- the namespace passed to a metaclass is ordered by definition order
- ditto for the class __dict__
A compliant implementation may ensure the above three requirements
either by making all dicts ordered, or by providing a custom dict
subclass (e.g. OrderedDict) in those three cases.
I'd like to handwave on the ordering of all other dicts. Yes, in
CPython 3.6 and in PyPy they are all ordered, but it's an
implementation detail. I don't want to force all other
implementations to follow suit. I also don't want too many people
start depending on this, since their code will break in 3.5. (Code
that needs to depend on the ordering of keyword args or class
attributes should be relatively uncommon; but people will start to
depend on the ordering of all dicts all too easily. I want to remind
them that they are taking a risk, and their code won't be backwards
compatible.)
Put another way, "We're trying out a smaller, faster dict implementation that has the side-effect of being ordered. We would like to be able to change our mind in the future, except for the three cases listed above where we really want to guarantee ordering."
This is fine, but with PYTHONHASHSEED no longer impacting dictionary ordering, we need a new way to test our applications to ensure they aren't relying on dictionary ordering, without having to replace all dict / {} instances with "myapplication.utils.patchable_dict". What solution has been arrived at for this use case? (There IS a solution, right?)
I'm not quite sure what you're worried about. Are you worried someone might accidentally change the interpreter when pushing to production? If so, I'd prefer to fix the testing policy to ensure that all production environments are used as test environments.
If you're worried about making sure your library/application is correct for older versions? If so, test it under Python v3.5 as well.
Or you could shuffle your test data to change the insertion order and run the test again.
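For test data you control, that shuffle is mechanical; a sketch (`build_dict` is a stand-in for whatever code fills the dict):

```python
import random

def build_dict(pairs):
    # System under test: any code path that fills a dict.
    d = {}
    for k, v in pairs:
        d[k] = v
    return d

pairs = [("a", 1), ("b", 2), ("c", 3)]
random.shuffle(pairs)            # vary insertion order between test runs
d = build_dict(pairs)

# Assertions must hold regardless of insertion order; anything that
# compares against a specific iteration order would flake under shuffling.
assert sorted(d) == ["a", "b", "c"]
assert d["b"] == 2
```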
Relying upon the ordering of a dictionary is wrong, even with py3.6. Lots of people test for that by using an explicitly random dictionary ordering per test run via PYTHONHASHSEED (way more feasible than "shuffling test data", since not all dictionary use is that simplistic; sys.modules is a dict, for example, and how do you "shuffle" that?). That apparently accidental feature is being removed.
It means code will suddenly break when you move it to an interpreter that doesn't have this implementation detail. This kind of breakage is really easy in the area of class instrumentation libraries, where the order of class attributes affects something. Every non-CPython implementation will support this anyway, unwitting reliance upon it will be widespread, and there really won't be much of a "problem", other than that they really should make this behavior official someday.
I'm one of those weirdos who wants to continue to use Python 2 for the foreseeable future, so to me this is just introducing a way to break backwards compatibility all the way back to 2 for the benefit of a similar but ever-more-incompatible language that I'll never use.
I'm not trying to be negative, I just want to let you know that people like me are out there.
The optimization is awesome but IMHO the key ordering "benefit" in the implementation but not the spec is a so-so move:
It can cause some future bugs in the code assuming it is there. Some languages like Go added some key ordering randomization to maps to be sure to avoid people counting on any specific key order.
It's not so-so, it's a terrible idea. At a minimum, the implementation should cough up keys in (say) reverse order, or some cheap to implement different order than insertion order.
Two years down the line, the amount of code that depends on insertion order will make it impossible to do further optimization.
[Many years ago I wrote a platform with an unintended ordering scheme in its dictionary. The documentation stated in bold that order was not deterministic and could change. Users started reporting "bugs" when I changed the order. That sucked.]
It seems strange that that's what you'd be worried about. Your users found bugs in their own code. That's GOOD, isn't it? Does it matter that they blame you?
Like if I'm using pointers after freeing them and just happen to get usable data back... then you go and change it so that memory is reallocated and my code breaks. Yeah, it is a bug. In my code. It's pretty clear that I should not have been doing that. If I'm too stubborn or stupid to figure that out, then I'm really just a shit programmer.
EDIT: On rereading, I see that the parent and GP aren't specifically talking about the kwarg-ordering benefit, but the general dict-ordering benefit. I would be surprised if dict did not become a synonym for OrderedDict in the spec in the near future.
The only potential disadvantage of dict==odict is performance, as anyone relying on the current arbitrarily-ordered dict behavior by definition isn't worried about the underlying implementation moving to a deterministic-by-insertion-order dictionary.
I've never run into a case where the difference in performance between Dict and OrderedDict was important.
I have dealt with legacy code (especially code that needs to interact with JavaScript libraries) where strict-ordering was critical and it was great that at the time I was using Ruby 1.9 in which dicts are explicitly ordered.
I agree that waffling about it is bad, but I think ordered is a better default than unordered. All this stuff about deliberately trying to shuffle order to catch bugs just says to me that most code is a lot simpler to get right if dicts are ordered-by-default.
I've seen some Python 2 code that "made use" of the deterministic ordering of dict. Migrating to Python 3 sucked; it took a while to realize this was the reason for some failing tests.
As long as the language specification is updated to specify ordered keys in dicts then this is good. The big complaint about key ordering I remember hearing is that it was basically a side effect of the CPython implementation (back in the 2.X days), but not part of the spec.
If the spec won't be updated to specify that dict keys must be ordered, then I agree with you.
EDIT: ganduG clarified below that this is just an implementation change. No bueno.
If the ordering of the keys is not in the spec, a programmer should not assume the ordering of the keys. If none of the training material says or implies that the keys will always be ordered, why would a programmer?
The new and improved algorithm happens to preserve ordering. Future algorithms might not. The dictionary data structure does not specify order preservation so that, in the future, a better algorithm that doesn't have this same side effect could be used.
> a programmer should not assume the ordering of the keys
The Python community has many inexperienced programmers, even non-technical people. Particularly, it's so beginner friendly by making it very, very easy to get started with the REPL, etc. A lot of these programmers aren't even going to look at the docs and even if they do, might not understand them.
Programmers should not try to access memory beyond the end of an array. They should not attempt to add ints and strings. They should not give a function a float when it expects a struct.
But they do. So we try to design languages in such a way as to minimise the likelihood of these mistakes going unnoticed.
Designing a language to minimize the likelihood of these mistakes is a motivation for language design, but usually far from the primary.
"Programmers should not try to access memory beyond the end of an array"
^ Right, and when they do, it's called a logical bug, something to be fixed. A programmer relying on the incidental but unspecified behavior of a function would be making a similar mistake.
But, we learned from C that the possibility of trying to access arbitrary parts of memory means that programs will end up trying to access parts they shouldn't. So, higher level languages like Python solve the problem by having memory-safe types like lists, which preclude the possibility of making this mistake.
Similarly, if we randomise the ordering of keys in the dict then it's impossible for a programmer to make the mistake (consciously or unconsciously) of relying on ordering.
At the cost of an additional processing step to scramble the keys. The balance between safeguarding ignorant or overactive programmers against the performance of a critical builtin type is not going to tip in that direction.
The whole reason we can even have this discussion is because a new algorithm was developed to give us the increased performance in the first place.
Why would a programmer? Because the programmer would just run the code and check what the different methods return as a result. Then, if it looks like the keys are ordered, said programmer will rely on this behavior.
For instance, I've been writing some JavaScript recently. I try a little array in the console; if Array.keys() returns the ordered keys, I will assume it does, and would check the docs (where?) only if I had a problem. Something unstable should not look stable. I agree with the Golang way: make it look like what it is by the spec.
I have trouble understanding a programmer who takes a couple of samples of a method's behavior and feels confident enough to trust that as the actual behavior of the function. At least guard your flippant assumption with a check or something!
The problem is not adding some checks (asserts? No, just kidding); the problem is that you're quickly writing a little thing for a demo or an experiment, and faster than you'd expect these ten lines of innocuous code have grown into a gigantic ball of mud upon which many businesses are built.
To me the least-bad way to handle this is a softened TDD, where a substantial part of the code is covered (described, checked, structured) by a suite of reasonably atomic automated tests.
"try a little array in the console, if Array.keys() returns the orders keys, I will assume it is"
That may work for some toy script you write for your own use, but must be avoided if you intend to engineer a program.
Certainly, that kind of thinking will lead to very bad results if you ever do low-level multi-threaded programming or try to write portable code.
For multi-threaded programming, "is this call thread-safe?" is not something answered in a test, and certainly not in a quick test.
For portable programming, you simply cannot run the test on all possible systems.
In general, when you program like that, you will not only rely on implementation details that may change (for example, this may change for arrays with lots of entries) but also forget to handle error conditions.
It is very easy to inadvertently add dependencies on key order. For example, if you are generating HTML, you might have something like:
print("<a " + " ".join(k+"=\""+v+"\"" for (k,v) in d.items()) + "/>")
And d would be a dict of attributes for the anchor element. The code will produce different output depending on the computer, which is not what you want, because it can cause subtle problems for other systems expecting a certain output.
Several compiler bugs have been of this kind so it is not only newbie programmers that make these mistakes.
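The usual fix for this class of bug is to impose an explicit order rather than trusting the dict's; a sketch:

```python
d = {"href": "/home", "class": "nav", "id": "top"}

# Sorting the items makes the serialized output deterministic regardless
# of how the dict happens to iterate on a given interpreter.
tag = "<a " + " ".join('{}="{}"'.format(k, v)
                       for k, v in sorted(d.items())) + "/>"
print(tag)                       # <a class="nav" href="/home" id="top"/>
```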
It shouldn't affect the way a browser renders the HTML. But there are many possible scenarios in which the order is relevant for some other reason.
Suppose you have another system monitoring the web page generated by the above code, like a CDN or something. That system would think the web page is constantly updated, as the HTML would be different each time it accessed it, leading to lots of redundant work if it has to update caches, check outbound links, and so on.
How many programmers have you worked with? (People will assume all sorts of things that are not true if it makes their job slightly easier, or even harder)
I haven't really followed the discussion around the decision, so perhaps I am missing something, but my initial feeling is that either ordering should be required by the spec and implemented, or the ordering should be randomised.
But, that is the outcome of about 1 minute's thought :)
It needs to be realized that this dict implementation landed literally yesterday. There are space benefits and it simplifies the language in three places where we are adding ordered mapping guarantees (kwargs, namespace passed to metaclasses, and cls.__dict__, all of which were planned to be ordered prior to this dict implementation coming into the language).
Before we bake the requirement of dictionaries being ordered into the language spec and require all current and future Python implementations to support it for Python 3.6 onwards -- remember, Python is 26 years old -- we want to live with the dict implementation for a few releases first.
That's an argument against ever adding ANYTHING to a new language!
"Oh, we have this neat feature in mind, but because code that uses it wont work on older versions of the compiler/interpreter, we can't use it!"
Python code that will depend on the ordering will either have to specify Python 3.6 as the minimum version, or just use OrderedDict, which presumably isn't going anywhere.
Also, it's rare that anything "depends" on the ordering of dictionary keys, but it's frequently "nice to have". For instance, it means that if you output JSON, all the keys are in a sensible order. This matters not at all for other computers reading the JSON (as the JSON spec explicitly says that ordering of keys is irrelevant), but it's lovely for developers debugging the program.
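For JSON specifically, you don't need to lean on dict behavior at all; a sketch using `json.dumps`:

```python
import json

d = {"name": "x", "id": 1, "tags": []}

# On CPython 3.6+ (and any 3.7+ implementation), json.dumps emits keys
# in insertion order, which is pleasant for humans reading the output.
print(json.dumps(d))             # {"name": "x", "id": 1, "tags": []}

# sort_keys=True is the portable way to get a stable order anywhere,
# on any interpreter and any Python version.
print(json.dumps(d, sort_keys=True))
```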
I am trying to imagine a bug that would happen with an ordered dictionary but not with an unordered one. An unordered dictionary may happen to return keys in order by chance.
Parent said for bugs happening in the other direction -- written for 3.6, run on earlier versions, so it would be the inverse happening (ordered assumption not holding).
I suspect that any procedure relying on dictionary order can experience all the classes of bugs associated with relying on mutated state. For example, if procedure P1 relies on dictionary order and calls another procedure P2 on a dictionary, it might get a new dictionary back in a case where P2 is written in a functional rather than imperative style.
Essentially, relying on ordering requires tracking identities rather than values. Which suggests a second bug: what does it now mean for two dictionaries to be equal?
Really? It's in the freaking name. Code is written that depends on the dictionary being ordered. It works in CPython 3.6 and fails on earlier CPython and other implementations of Python.
In 3.6, this will be ordered. Python creates a dict out of the keyword arguments, so before 3.6, this creates the old hash-ordered dict, which is then converted to an OrderedDict.
But, as the other commenter mentioned, a list of 2-tuples is the safe bet.
I find myself doing that and similar things (objects with `key` and `value` attributes) to deal with JSON non-ordering a lot. I can definitely see value in preserving keyword argument order.
The problem isn't that .keys() isn't returning them in order. The problem is that the kwargs (a=1, b=2) aren't passed to OrderedDict() in order, so the constructor has no way of knowing which one should be first.
But as pointed out elsewhere, kwargs aren't guaranteed to be in order, so by the time the OrderedDict is built it's already wrong. It seems the change we're discussing was primarily motivated by a desire to fix that specific problem.
The point OP is making is that none of these statements will guarantee the order of insertion, since OrderedDict must take the variadic kwargs input as a dict, which is inherently unordered. You have to use a list of tuples.
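Concretely, a sketch of the pre-3.6 pitfall and the portable construction:

```python
from collections import OrderedDict

# Before Python 3.6, the kwargs below arrived in an ordinary dict, so
# OrderedDict had no way of knowing that 'a' was written first:
maybe_wrong = OrderedDict(a=1, b=2)

# A list of 2-tuples carries the order explicitly and works on every
# version and implementation:
definitely_right = OrderedDict([("a", 1), ("b", 2)])
print(list(definitely_right))    # ['a', 'b']
```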
Python should have a "grumpy mode" which causes dict-keyword order to be randomized, and other deliberate attempts to break code where people are doing the wrong thing :P
The equality test for dict objects will be unchanged. Note that it's not as if they said "Oh gee let's just alias dict() to OrderedDict() and call it a day".
Because this is an implementation detail. The language spec doesn't enforce this.
The real reason they did this was the performance gains from the approach; the ordering is just a nice side effect. It's an idea originally from PyPy, afaik.
`OrderedDict` is now just a thin wrapper around `dict`.
i.e. if you want your code to be portable among different Python implementations then you should still use `OrderedDict`.
Do you have a link to info about why it's faster? My initial reaction was that the extra indirection would slow it down, but of course that's based on nothing but general knowledge and two seconds' thought.
> My initial reaction was that the extra indirection would slow it down, but of course that's based on nothing but general knowledge and two seconds' thought.
There are already indirections in the current implementation.
The new dict has much better memory locality: the "actual data" (24 bytes on 64b platforms) is stored in a dense array instead of the former sparse array, and the sparse array only stores indices so it can be made much more compact (especially as it can switch the index size depending on the size of the compact array, for small dicts it'll store byte indices). So not only is the new dict smaller, the way the data is laid out is better fetchable.
The original point[0] was actually iteration speed: since the data is stored in a dense array, that can be iterated directly and efficiently instead of having to iterate a sparse array and skip the random empty entries (leading to awful branch predictability; whether a given entry of the sparse array is empty is essentially random if the hash is any good).
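The two-array layout can be modeled in pure Python; a much-simplified sketch (linear probing only, no deletion or resizing, just enough to show why iteration falls out in insertion order):

```python
# Simplified model of CPython 3.6's compact dict: a small sparse index
# array plus a dense entries array, instead of one big sparse table of
# 24-byte entries. Deletion and resizing are omitted for clarity.

class CompactDict:
    def __init__(self, size=8):
        self.indices = [None] * size      # sparse: small ints (or None)
        self.entries = []                 # dense: (hash, key, value)

    def _slot(self, key):
        # Linear-probe the index array until we find the key or a free slot.
        i = hash(key) % len(self.indices)
        while self.indices[i] is not None:
            if self.entries[self.indices[i]][1] == key:
                return i
            i = (i + 1) % len(self.indices)
        return i

    def __setitem__(self, key, value):
        i = self._slot(key)
        if self.indices[i] is None:
            # New key: append to the dense array, hence insertion order.
            self.indices[i] = len(self.entries)
            self.entries.append((hash(key), key, value))
        else:
            # Overwrite: the entry keeps its original position.
            self.entries[self.indices[i]] = (hash(key), key, value)

    def __getitem__(self, key):
        i = self._slot(key)
        if self.indices[i] is None:
            raise KeyError(key)
        return self.entries[self.indices[i]][2]

    def __iter__(self):
        # Iteration walks the dense array: cache friendly, no empty-slot
        # skipping, and insertion-ordered as a side effect.
        return (key for _h, key, _v in self.entries)

d = CompactDict()
d["b"] = 2
d["a"] = 1
d["b"] = 20
print(list(d))                   # ['b', 'a']
print(d["b"])                    # 20
```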
> `OrderedDict` is now just a thin wrapper around `dict`.
It is not. odictobject.c has not been touched at all yet. Also, OrderedDict defines several methods that ordinary dict does not (and unlikely ever will).
edit: it seems like this is exactly what OrderedDict does; I presumed it was ordered by key, like std::map (red-black tree) vs. std::unordered_map (hash table) in C++.
No, this new dict implementation preserves insertion order, it's not ordered by value of the key.
The motivation is to preserve the order of kwargs.
> An OrderedDict is a dict that remembers the order that keys were first inserted. If a new entry overwrites an existing entry, the original insertion position is left unchanged. Deleting an entry and reinserting it will move it to the end.
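Those three quoted rules can be checked directly (runnable as-is):

```python
from collections import OrderedDict

d = OrderedDict([("a", 1), ("b", 2), ("c", 3)])

d["a"] = 99                        # overwrite an existing entry:
assert list(d) == ["a", "b", "c"]  # original position unchanged

del d["a"]
d["a"] = 99                        # delete, then reinsert:
assert list(d) == ["b", "c", "a"]  # moves to the end
```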
Any benchmarks yet?
I'm curious about the cost of the indirection, and of the second array vs. the smaller sparse array (more easily cacheable), and am thinking of doing the same for cperl hashes.
Splits also need to realloc twice now, which might have some cost. Still, the run-time advantage should beat all the new costs.
What's the point of the compact dict? Does having all the keys together outweigh the extra indirection? That would surprise me. Can someone help explain why that would be the case?
It all depends on the underlying technology. Yes, an indirection costs more. But at the same time, the first-level array can be much smaller, and it can even be that both arrays together are smaller than the old array, because the old one had to have gaps. The new compact array does not need gaps.
It of course depends on many factors, such as the fill factor.
But saving memory can also save transfers between the processor and main memory. In old processors, raw clock cycles and the complexity of the opcodes were the main factors for speed. Today, the complexity of opcodes (for example, an indirection) matters less and less; the number of cache misses and how much memory must be transferred between the processor and the memory chips are what decide.
Suppose you want to make something like an XML generator helper function:
def start(tag_, **kwargs):
write("<" + escape(tag_))
if kwargs:
for k, v in kwargs.items():
write(" " + escape(k) + "=" + quote_escape(v))
write(">")
(Apply hand-wavying to get the correct code.) This might be called as:
start("abc", x="1.0", y="2.0")
but generate the output
<abc y="2.0" x="1.0">
when you want it to be:
<abc x="1.0" y="2.0">
The output order depends on the Python hash implementation, which (in modern Pythons) is randomly selected during startup. For an API which preserves order, you must currently either pass in the pairs in iterable order, like:
start("abc", (("x", "1.0"), ("y", "2.0")))
or switch the API to pass in a dictionary-like object instead of kwargs, then switch to an OrderedDict (which must also be initialized with pairs in iterable order).
In 3.6, there's no need for that -- kwargs will preserve the keyword parameter order.
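A runnable variant of the sketch above shows this (I've swapped the hypothetical `write`/`escape` helpers for a plain string return so it runs standalone; on CPython 3.6+, and guaranteed by the language from 3.7, `**kwargs` preserves the call-site order):

```python
def start(tag_, **kwargs):
    # kwargs keeps the keyword order the caller wrote (CPython 3.6+)
    attrs = "".join(' %s="%s"' % (k, v) for k, v in kwargs.items())
    return "<" + tag_ + attrs + ">"

print(start("abc", x="1.0", y="2.0"))  # <abc x="1.0" y="2.0">
```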
They are equivalent. That doesn't mean that all tools will use XML semantics to test for equivalence.
A testing tool might require that the output is byte-for-byte equivalent to a known good output. Python's pseudorandomly determined hash function won't preserve that order across multiple runs.
XML itself doesn't, no, but humans may, and some crappy tools sadly do as well[0].
Round-tripping the ordering is also convenient for automated processing tools (e.g. adding/removing parameters without introducing extraneous changes).
[0] some also do care about namespace aliases/prefixes, which is a bit of a shock the first time they choke on your perfectly valid and namespaced XML.
This is why it will be defacto stabilized quickly (because it's so useful).
I'm sure the python developers have considered that, since they explicitly chose to preserve order instead of other small non-order preserving optimizations that could have been done with the new representation.
This news has spread and seems to cause a great deal of confusion. I wish the headline had been "Python dict faster [because of some subtle details ... don't look behind the curtain]"
Across processes, perhaps? But hash(a) == hash(a) and hash(b) == hash(b) within a single run, so a dictionary with a and b inserted will yield the same order every time.
The order of the keys in a dictionary was merely guaranteed to be consistent for a given instance: that is
d1 is d2 ⇒ list(d1) == list(d2)
But the order is not guaranteed to be consistent between two equal dictionaries, in that
d1 == d2 ⇏ list(d1) == list(d2)
This is because the history of the dictionary can affect its order, not just its contents. For example,
x = {-1: (), -2: ()}
y = {-2: (), -1: ()}
list(x) != list(y)
This remains true, but now the particular order produced from a given list of insert and delete operations is well-defined, rather than arbitrary.
It's worth noting that previously the order of two dictionaries with the same history was also the same (but may vary between program runs), but that was not guaranteed.
It's not actually clear whether this is still regarded as an implementation detail, but I expect it will be effectively impossible to stop this from becoming a key assumption of many libraries, so I'd expect any attempt to require OrderedDict (except for pre-3.6 compatibility) to fail.
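The "well-defined order from a given sequence of operations" claim is easy to check on CPython 3.6+: even a delete followed by a reinsert produces one specific, reproducible order (the reinserted key goes to the end of the dense array), where the old implementation's result was arbitrary:

```python
d = {"a": 1, "b": 2, "c": 3}
del d["a"]
d["a"] = 1  # reinsertion appends to the dense entries array
assert list(d) == ["b", "c", "a"]
```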
Yes, across processes. You don't want to change (reseed) the hash function after the dictionary is created. (Otherwise you'd need to rehash everything.)
Deterministic means it is completely determined by the parameters, so it doesn't change from one call to another. This seems to be exactly what you are naming "stable" here. If it output the keys in a known order, it would be predictable.
Yet, the GP is just complaining about two different and widespread definitions of "random", so, whatever. It would be great to have a contexts dictionary we could use to resolve such things.
If you've disabled the StartCom CA due to concerns about lack of transparency[0] and are therefore unable to view pages like this one, you can always click the "web" link above and then view the cached page on Google.
If you care enough to disable one specific CA, then you likely know how caches work and how to find the alternative. Did you plan to post a comment like this on every single HN article which uses StartCom's cert?
OK, so I was also raising awareness of the issue, guilty. And no, this was the only time I've mentioned this on HN, and I'll never mention it again here, you have my word. Had no fucking idea it would draw so much ire (most downvoted comment ever here in half a decade). Was trying to be helpful. Sorry for pissing you all off so much.
(Just use addict or OrderedDict.)