Cocoa's NSMutableArray changes implementation based on the size of the dataset (ridiculousfish.com)
58 points by macrael on April 6, 2011 | 16 comments



I predict 262140 for the inflection point.

OK, not so much predicted as read in the CFArray.c sources from Darwin.

http://www.google.com/codesearch/p?hl=en#pFm0LxzAWvs/darwins...

I think people forget that huge portions of OS X are available under the APSL. It saved me hours of debugging time when I was writing out a PKCS#8 private key that openssl was happy with but Keychain could not import. (openssl will happily use encryption methods that are probably not part of PKCS#8, but the RFC is cloudy on this.)


That's from CF-Lite, though, which isn't necessarily quite the same implementation as the full CoreFoundation. It's probably similar, though. :)


NSArray/NSMutableArray are class clusters, meaning that upon allocation/initialization they choose among several private concrete classes (and it may also swizzle the class pointer so the instance reports its type as NSArray/NSMutableArray; I can't remember).

What's interesting to me is that this uses the standard Objective-C [[class alloc] init] mechanism, which is more intuitive than using a nonstandard class method (because in most other languages new will only allocate one class). When you look at Apple's (NeXTSTEP's) APIs, there's a lot of evidence of this kind of thought in the design process.
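
In Objective-C terms, a minimal sketch of the idiom might look like this (hypothetical names, not Apple's actual private classes; assumes ARC):

    #import <Foundation/Foundation.h>

    // Public "abstract" class; callers only ever name this type.
    @interface Stash : NSObject
    - (instancetype)initWithCapacity:(NSUInteger)capacity;
    @end

    // Hypothetical private concrete subclasses.
    @interface SmallStash : Stash
    @end

    @interface HugeStash : Stash
    @end

    @implementation Stash
    - (instancetype)initWithCapacity:(NSUInteger)capacity {
        if ([self isMemberOfClass:[Stash class]]) {
            // The placeholder replaces itself with a concrete
            // subclass picked by the initializer's arguments.
            if (capacity > 5000) {
                self = [[HugeStash alloc] initWithCapacity:capacity];
            } else {
                self = [[SmallStash alloc] initWithCapacity:capacity];
            }
            return self;
        }
        return [super init];
    }
    @end

    @implementation SmallStash
    @end

    @implementation HugeStash
    @end

This is also why identity checks like [obj class] == [Stash class] misbehave with clusters, and why the docs steer you toward isKindOfClass: for cluster types.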


> because in most other languages new will only allocate one class

Now that's simply not true.

In C++-derived languages it will (and new mandates that a fresh object be created, to boot), and C++-derived languages are the most common for Enterprise Programming, but that's a far cry from "most other languages".


There's nothing special about this. I've written numerous classes in C++/C# that return a single object to the creator but instantiate or swap between different impl classes based on usage patterns.


What I'm saying is that you can't do something like this in C++-derived OO languages:

    class Test
    {
        Test( int count ) {
            // Illegal: a C++ constructor can't return a value at all,
            // let alone substitute an instance of another class.
            if ( count > 5000 ) return new Implementation();
            return new OtherImplementation();
        }
    };

You'd have to use a class method, whereas the standard alloc/init in Objective-C lets you do exactly that.


This is trivially done in C++ with a private implementation. It doesn't matter that you can only allocate one class; that class can allocate whatever it wants in turn.
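
For symmetry, here is that composition version sketched in Objective-C (hypothetical names again; the C++ version would hold a pointer-to-implementation member and make the same choice in its constructor):

    #import <Foundation/Foundation.h>

    // The public class never changes identity; it picks a private
    // backing object once and forwards everything to it.
    @protocol StashImpl <NSObject>
    - (void)addObject:(id)obj;
    @end

    @interface SmallImpl : NSObject <StashImpl>
    @end

    @interface HugeImpl : NSObject <StashImpl>
    @end

    @interface Stash2 : NSObject
    - (instancetype)initWithCapacity:(NSUInteger)capacity;
    - (void)addObject:(id)obj;
    @end

    @implementation SmallImpl
    - (void)addObject:(id)obj { /* small-N storage */ }
    @end

    @implementation HugeImpl
    - (void)addObject:(id)obj { /* large-N storage */ }
    @end

    @implementation Stash2 {
        id <StashImpl> _impl;
    }
    - (instancetype)initWithCapacity:(NSUInteger)capacity {
        if ((self = [super init])) {
            if (capacity > 5000) {
                _impl = [[HugeImpl alloc] init];
            } else {
                _impl = [[SmallImpl alloc] init];
            }
        }
        return self;
    }
    - (void)addObject:(id)obj {
        [_impl addObject:obj]; // delegate to whichever impl was chosen
    }
    @end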


This kind of data structure abstraction can leak if the implementation's assumptions don't match the actual usage.

I remember ASP.NET used a weird collection class to store JavaScript snippets that were to be included in the page being generated. Internally, the class used an array and then switched to a hashtable once the number of items went beyond some pre-tuned limit. As a result, if you registered too many JavaScript blocks for inclusion on the page, they got emitted out of order, because the internal storage was now a hashtable with non-deterministic key order.

I spent an entire night debugging this issue.
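
A hypothetical Objective-C analogue of that bug (made-up class and threshold, just to illustrate the failure mode):

    #import <Foundation/Foundation.h>

    @interface SnippetRegistry : NSObject
    - (void)registerSnippet:(NSString *)script forKey:(NSString *)key;
    - (NSArray *)allSnippets;
    @end

    @implementation SnippetRegistry {
        NSMutableArray *_ordered;     // (key, script) pairs, in order
        NSMutableDictionary *_byKey;  // non-nil once past the threshold
    }
    static const NSUInteger kThreshold = 32; // made-up tuning limit

    - (instancetype)init {
        if ((self = [super init])) {
            _ordered = [NSMutableArray array];
        }
        return self;
    }

    - (void)registerSnippet:(NSString *)script forKey:(NSString *)key {
        if (!_byKey && _ordered.count < kThreshold) {
            [_ordered addObject:@[key, script]];
            return;
        }
        if (!_byKey) {
            // One-time migration from ordered to hashed storage.
            _byKey = [NSMutableDictionary dictionary];
            for (NSArray *pair in _ordered) {
                _byKey[pair[0]] = pair[1];
            }
        }
        _byKey[key] = script;
    }

    - (NSArray *)allSnippets {
        if (!_byKey) {
            NSMutableArray *out = [NSMutableArray array];
            for (NSArray *pair in _ordered) {
                [out addObject:pair[1]];
            }
            return out;
        }
        // The leak: NSDictionary enumerates in no particular order, so
        // registration order is silently lost past the threshold.
        return _byKey.allValues;
    }
    @end

Below kThreshold everything comes back in registration order; one extra snippet and the order scrambles.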


I don't get why this is surprising. NSArray takes a static dataset and cannot be changed; NSMutableArray, by definition, can add or delete items. Wouldn't it make sense to store a huge array differently from a small-to-medium one? And since NSMutableArray must be able to change, it can't necessarily allocate the right amount of memory up front.


It is surprising because most people probably expect NSArray to just work like a plain array (see the naive C implementation in the article).


I think the ability to combine C code with Obj-C allows a huge legacy code base to be reused. However, I wouldn't be surprised if there was a design decision at some point that deliberately made it harder to port NeXTSTEP (NS) objects back to C. Creates lock-in...


Why'd I get down-voted for this?


Yes, so there'd be NSArray and NSImmutableArray.


I love the graphs, but the article only presents data that would back up its conclusion. I want to see graphs of access time and memory usage as a function of data size; those are the two factors that are definitely going to be worse for CFArray than for his naïve C arrays.


Python dicts have similar behavior: there is a lookup path optimized for string-only keys, and as soon as you insert a non-string key the dict switches to the slower generic version.


Looked interesting, but the writing was horrible.



