I don't understand how you can write a post this long and meandering about how Google will select a 64-bit processor without even noting the obvious, fundamental fact that Apple's new processor is faster because it's a new ARM architecture, not because it's 64-bit. It would probably be faster even if it were 32-bit. Doubling the size of your pointers and/or machine words certainly does not make typical code (which passes around 32-bit or smaller integers most of the time) faster. Architectural improvements make code faster, and the architecture used in the A7 is much improved.
EDIT: Does the new iPhone even have 2GB of addressable memory? Does it benefit from the 64-bit address space at all? Apple's specs don't say how much RAM the thing has, and whether it has any VRAM. What else would the address space be used for? I imagine you could map the whole internal flash to memory, but that seems like an incredibly awful idea to me unless you want random addressing errors to corrupt storage (maybe it'd be fine if only ring 0 had access)
It doesn't matter if it has over 4GB of physical memory. What matters is that it has a 64-bit address space. Memory is abstracted away from the app by the OS, and for many applications 32 bits is too small. Examples of such apps include A) games that use a lot of (compressed) textures, B) image editing, C) video editing, and D) other more demanding applications.
The RAGE approach to textures is an interesting point here, I suppose. I can see how 64-bit address space is a big improvement there.
EDIT: However, RAGE style textures do not really benefit in any way from the CPU being 64-bit. They benefit from virtualization of textures, which is usually done on the GPU - and in fact I believe there are OpenGL extensions that offer the ability to create a large 'virtual' texture that is not resident in video memory. So I don't actually believe that 64-bit address space is any better for RAGE (especially because random stalls from disk paging of textures are not something a game developer would tolerate).
However, I should note that compressed textures don't really come into the picture. You're not going to map a compressed texture into memory such that you access uncompressed pixels by reading/writing at a given address with compression happening behind the scenes - at least, I've never seen a real world shipped scheme for that in graphics. (I think the XBox 360's memory controller might have done something like that for compressed audio, though...)
EDIT 2: See this talk by id and nvidia that explains how virtual textures in RAGE are applied:
http://www.nvidia.com/content/GTC-2010/pdfs/2152_GTC2010.pdf
Note that instead of the kind of stall you'd get when paging data from disk to memory, the optimal behavior is that it uses blurry 'low resolution' data until high-resolution pages are available. Totally different from the 64-bit virtual memory you use on a CPU.
My main question is whether it would actually be realistic to do that kind of demand paging in most use cases. Do you really have the ability to create your own fault handlers in user mode? Otherwise, all you can do is page data in from disk, which is just that 'map storage into memory' use case I mentioned earlier.
Reduced fragmentation is a plus but getting it at the cost of doubling every pointer's size is not necessarily a huge win.
It is obvious that smartphones will break the 4GB limit fairly soon (< 5 years), though I think right now the 5S has 1GB.
They added more registers and cleaned up some cruft in the 64-bit ISA, so that is where the speed is coming from. You also get some speed from the extra bits even if you aren't addressing 4+ GB of memory (e.g. 64-bit longs are native).
It doesn't make much sense to clean up the architecture without adding 64-bit support at the same time. x86 -> x86-64 followed the same path.
Yeah, I agree that it made sense to introduce 64-bit support when updating the architecture, it's just crazy seeing people argue that the 64-bitness is what made things faster.
Do 64-bit machine words actually speed things up on average? I would really assume that if an application is designed around 32-bit integers, doubling the size of every register is just going to waste resources.
It doesn't matter whether you call it "new 64-bit A7 architecture" or "4th gen Core i5" architecture. That's just a marketing label to attach to a bundle of improvements. It doesn't have to be an accurate reflection of what really makes the processor faster (remember the bit race with consoles?)
The bottom line is that Apple delivered on the performance front.
I don't think this is supported by the content of his post, though, where he starts speculating about how Google will get '64-bit processors', as if that's the thing that matters (it's not).
Furthermore, it's repurposing an EXISTING label (64-bit has an established meaning for desktop CPUs) to mean something else. I think that's silly.
Absolutely horrendous post that uses the trope of presenting a wide array of opinions as if they all come from a single source. ARMv8 is pretty cool, which is why the entire industry is moving to it, and was before this.