Memristors promise to flatten part of the memory hierarchy, merging non-volatile storage and RAM. But the interesting question to me as a systems programmer is whether they will ever merge cache with everything else too. This question is really important because if there were no cache there would no longer be any benefit to compactness and locality in memory -- data structures distributed across large areas of the address space would be just as efficient as ones that are well-localized.
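To make that concrete, here is a minimal sketch (plain C, nothing memristor-specific; the array size and the shuffle are just illustrative) of the locality effect a cache gives you today. On current hardware the random-order walk is typically several times slower than the sequential one; on a truly flat memory the two loops would cost the same.

    /* Sketch: sum the same array sequentially vs. in a shuffled order.
     * The array is sized well past any cache so the random walk misses often. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define N (1 << 24)   /* 16M ints, far larger than typical caches */

    static double elapsed(clock_t a, clock_t b) { return (double)(b - a) / CLOCKS_PER_SEC; }

    int main(void) {
        int *data = malloc(N * sizeof *data);
        size_t *order = malloc(N * sizeof *order);
        for (size_t i = 0; i < N; i++) { data[i] = (int)i; order[i] = i; }

        /* Fisher-Yates shuffle to build a cache-hostile access order. */
        srand(42);
        for (size_t i = N - 1; i > 0; i--) {
            size_t j = (size_t)rand() % (i + 1);
            size_t t = order[i]; order[i] = order[j]; order[j] = t;
        }

        long long sum = 0;
        clock_t t0 = clock();
        for (size_t i = 0; i < N; i++) sum += data[i];          /* sequential: cache-friendly */
        clock_t t1 = clock();
        for (size_t i = 0; i < N; i++) sum += data[order[i]];   /* shuffled: cache-hostile */
        clock_t t2 = clock();

        printf("sequential %.3fs, random %.3fs (sum=%lld)\n",
               elapsed(t0, t1), elapsed(t1, t2), sum);
        free(data);
        free(order);
        return 0;
    }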
To be honest this idea worries me a little. The performance advantage of small and local data structures is one of the main natural forces that encourages software to be small and simple. Software tends to grow over time, both in size and complexity, especially when more people are involved. Left unchecked, I fear this tendency would have no natural counterbalance.
Put another way, I'm actually a bit glad that Eclipse is horribly slow, because it's an easy-to-observe symptom of the fact that it's a horribly complex stack of software. If software like Eclipse could get by with acceptable performance because it had fast access to vast swaths of memory, it would be that much harder for an upstart competitor in this space (like Light Table) to be disruptive, because Eclipse would perform well enough that competing on speed wouldn't impress anyone.
I'm probably oversimplifying a bit, but I do think it's generally good for programmers when simpler software naturally has better performance.
But even if memristors could be as fast as SRAM, could you put enough of it close enough to the CPU to truly flatten the memory hierarchy completely? This is where I hit the limits of my knowledge of computer architecture.
Basically, we're never getting rid of the need for caches. You can't put enough memory close enough to the CPU: the muxes needed to select the data you want introduce too many FO4s[1] of fan-out delay, you're limited by the speed of light, and the arrays you need are too big to fit all of them close enough to the processor.
Another problem is write endurance. RRAM is expected to be able to take many orders of magnitude more writes than Flash before failing, but it's still limited. The fact that you have layers of SRAM cache between it and the processor, buffering against repeated writes to the same location, is why I'd still be comfortable using it as main memory. Otherwise you could write an infinite loop that could actually damage the memory.
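A toy illustration of that point (the endurance figure is assumed, purely for the arithmetic): a tight loop that increments one counter stores to the same address billions of times per second. With a write-back SRAM cache in front, those stores land in L1 and reach main memory only occasionally; mapped directly onto a limited-endurance cell, the same loop could plausibly wear the cell out.

    /* Sketch only: a spin counter hammering a single memory location.
     * Each iteration is a store to the same address; today the cache
     * hierarchy absorbs this, and the line is written back rarely. */
    #include <stdio.h>

    int main(void) {
        volatile unsigned long long counter = 0;   /* one hot memory location */
        const unsigned long long writes = 3ULL * 1000 * 1000 * 1000;  /* ~1s of a 3 GHz core */

        for (unsigned long long i = 0; i < writes; i++)
            counter++;                             /* every iteration writes the same cell */

        /* Assumed endurance figure, for the sake of the arithmetic only. */
        const double endurance = 1e8;
        printf("%llu writes to one location = %.0f x an assumed 1e8-write endurance\n",
               (unsigned long long)counter, (double)counter / endurance);
        return 0;
    }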
Quite true, but far out speculation about the future of computer architectures doesn't really seem germane to talking about a new form of fast non-volatile memory.
It's just a popular sweet nothing people have grown fond of parroting whenever memristors come up. It is impossible to really refute, because it speculates about the possibility of an architecture we haven't yet imagined. That, plus how cool "brand new architectures, completely new ways of computing!" sounds to the layman, means you hear it every time the word "memristor" hits the headlines.
The joke, of course, is that (if memory serves) modern computers actually use the Harvard architecture, not von Neumann.
When someone writes about the end of the Von Neumann architecture, I take it as dreaming that the poster's favorite language will someday be faster than C.
"You do realize that John von Neumann spent the last 10 years of his life singlehandedly developing a theory of computing based on cellular automata? The computer you're reading this blog rant on was his frigging prototype! He was going to throw it out and make a better one! And then he died of cancer, just like my brother Dave did, just like so many people with so much more to give and so much more life to live. And we're not making headway on cancer, either, because our computers and languages are such miserable crap."
With memristors it is not even remotely "far out". First, we can substantially modify the von Neumann architecture by dramatically improving the capabilities of reconfigurable FPGA-type devices, allowing them to run at the speeds of custom ASICs and reconfigure with a delay of only a few clock ticks. That alone is pretty astounding. Second, memristors could be used directly as both logic and memory, making it possible to programmatically transform blocks of memory into circuitry and back into memory, which would have implications beyond our current reckoning. And all of this is plausible in the near term, within the next few decades.
Wow, sorry, that sentence I constructed there was really confusing. There are really two separate effects. On the one hand, you can only make a cell of this memory so small, so you can't pack it as densely as you'd like. On the other, you have one cell of memory you'd like to access, and an output line you'd like the data from it to go out on, and you don't want the data from any of the other N billion cells to go out on that line. There are structures you use to retrieve that particular bit, but the more bits you're selecting between, the more delay is introduced. In most current CPU designs this is actually a bigger deal than the speed of light, at least for the various levels of on-chip cache.
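A back-of-the-envelope version of that selection-delay point (the per-level cost is an assumed number, not a real circuit model): picking one word out of N with a tree of 4:1 muxes takes roughly log4(N) levels, so the select delay grows with the log of the capacity you're choosing over.

    /* Sketch: mux-tree depth vs. capacity, with a made-up ~2 FO4 per level. */
    #include <stdio.h>
    #include <math.h>

    int main(void) {
        const double fo4_per_level = 2.0;   /* assumed cost of one 4:1 mux stage */
        const long long sizes[] = { 1LL << 15, 1LL << 20, 1LL << 25, 1LL << 33, 1LL << 40 };
        for (size_t i = 0; i < sizeof sizes / sizeof sizes[0]; i++) {
            double levels = ceil(log2((double)sizes[i]) / 2.0);   /* log4(N) mux levels */
            printf("%14lld words: %2.0f mux levels ~ %4.0f FO4 of select logic\n",
                   sizes[i], levels, levels * fo4_per_level);
        }
        return 0;
    }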
I don't quite understand the argument - if there's no problem with increased size, why should we worry about size? People from the 1960s would be shocked to hear how large our software is today already, but since we have the capacity to run it successfully, I'm glad that we've freed up programmers to work with more productivity.
As far as complexity: there will always be costs to complexity in terms of agility and reliability. And using less compact data structures often leads to less complex code, anyway.
> As far as complexity: there will always be costs to complexity in terms of agility and reliability.
Yes, but I worry about a world where only the programmers bear the cost of the complexity. If reduced complexity does not benefit the user, it becomes harder to justify time spent on reducing complexity.
> And using less compact data structures often leads to less complex code, anyway.
I disagree, unless you're talking about extreme cases like fancy structure packing. More compact data structures mean less program state. Less program state means less complicated state changes and less redundancy.
Litmus test: if your program has entered a bad state, how much data do you have to inspect to discover what the inconsistency is? And how much code do you have to inspect to figure out how it happened?
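A toy version of what I mean (hypothetical code, obviously): the second struct stores a derived value alongside the data it is derived from, so every mutation must keep two things in sync, and the litmus-test question "which field is wrong?" has more possible answers.

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>

    struct message { bool read; };

    /* Compact: the message array is the only state; the unread count is
     * derived on demand, so it can never be inconsistent with the data. */
    struct inbox_a { struct message *msgs; size_t n; };

    static size_t unread_a(const struct inbox_a *in) {
        size_t k = 0;
        for (size_t i = 0; i < in->n; i++)
            if (!in->msgs[i].read) k++;
        return k;
    }

    /* Redundant: a cached counter that every mutation must maintain. */
    struct inbox_b { struct message *msgs; size_t n; size_t unread; };

    static void mark_read_b(struct inbox_b *in, size_t i) {
        if (!in->msgs[i].read) {
            in->msgs[i].read = true;
            /* Forget this line in any one code path and the inbox enters a bad
             * state that only shows up later, far from the code that caused it. */
            in->unread--;
        }
    }

    int main(void) {
        struct message m[3] = { {false}, {false}, {true} };
        struct inbox_a a = { m, 3 };
        struct inbox_b b = { m, 3, 2 };
        mark_read_b(&b, 0);
        printf("derived unread=%zu, cached unread=%zu\n", unread_a(&a), b.unread);
        return 0;
    }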
> a world where only the programmers bear the cost of the complexity.
It is the users who pay the ultimate price of complexity. Unreliable, expensive software with long release cycles costs them money, time and happiness. A company that does not realize this is doomed. It doesn't matter if the unreliable software runs fast.
As a counterargument, when you are not restricted by performance, you can utilize your memory more effectively to remove complexity from your software.
The memory hierarchy can never completely flatten. Information takes physical space to store and the speed of light is finite. So the more information, the further things are apart, and the longer access times will take.
That being said, this theoretical limit might not be the main bottleneck right now. Parallelism can also bypass this limit, since not all information needs to be processed at the same location.
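For a sense of scale, a rough back-of-the-envelope (light in vacuum, a made-up 3 GHz clock; real on-chip signals travel well below c, so the actual budget is tighter):

    /* Sketch: how far a signal can travel, there and back, in one clock cycle. */
    #include <stdio.h>

    int main(void) {
        const double c = 3.0e8;            /* m/s, speed of light in vacuum */
        const double clock_hz = 3.0e9;     /* a 3 GHz core */
        double per_cycle = c / clock_hz;   /* distance light covers in one cycle */
        printf("one cycle at 3 GHz: light travels %.1f cm, so a round trip reaches %.1f cm\n",
               per_cycle * 100.0, per_cycle * 100.0 / 2.0);
        return 0;
    }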
> The memory hierarchy can never completely flatten. Information takes physical space to store and the speed of light is finite. So the more information, the further things are apart, and the longer access times will take.
Not necessarily. Memristors were shown to be capable of performing logical operations [1]. Given that, one can imagine computations migrating through the circuit to stay physically as close as possible to the data, thereby eliminating the need for caches.
Since we can use the same sort of MOSFETs found in SRAM to perform logical operations, you could equally well imagine computations migrating out into memory even without memristors. The problem is more one of computational model than physical process. People are certainly looking at things part-way there, like grid computing or whatever, but learning how to use these things efficiently is a hard computer science problem.
Micron built DIMM modules with computation on them as an experiment in this. They of course simply put a CPU inside the DIMM controller (imagine each of your DIMMs was a GUMSTIX rather than a regular DIMM), but the concept they were shooting for was the same. I did some preliminary VHDL work on a 'memory' controller which was basically a DIMM/CPU bus for a pseudo-cluster of these things.
The bottom line was that it was buildable, but there were a lot of challenges with the way compilers and whatnot generated code (or didn't) for these sorts of systems. I think a couple of projects at Umich and UCB got funded, but I don't know if they produced any results.
I guess it's similar in idea to something like the Cell[1] processor, with 'processing elements'/cores sitting on a blob of local memory/cache, with very NUMA-like access to everything else. Scaling the RAM per unit / number of units is probably where the complication lies. What sort of cluster sizes were you thinking of?
I dunno what the current capabilities are of compilers for these Cell-like architectures, but I recall that there were (still are?) serious problems getting good performance out of the PS3 due to the difficulty of its programming model. I'd hope they've advanced the state of the art some, though.
My (entirely naive) brain screams MPI, but I don't know how well that would work in practice.
Existence of cache comes from trade-off between speed and cost, so it will quite likely never disappear. Notice how there are many layers of cache in current systems: HDD <-> HDD Cache <-> RAM <-> L3 ... <-> Processor registers.
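Rough figures for those layers, to show how steep the trade-off is (orders of magnitude only, not measurements; exact numbers vary a lot by generation):

    /* Sketch: ballpark access latencies for each tier, relative to a register/L1 hit. */
    #include <stdio.h>

    int main(void) {
        struct { const char *level; double ns; } tiers[] = {
            { "register / L1 hit", 1.0 },
            { "L2 hit",            4.0 },
            { "L3 hit",           30.0 },
            { "DRAM",            100.0 },
            { "SSD read",     100000.0 },
            { "HDD seek",   10000000.0 },
        };
        for (size_t i = 0; i < sizeof tiers / sizeof tiers[0]; i++)
            printf("%-20s ~%10.0f ns (%9.0fx a register)\n",
                   tiers[i].level, tiers[i].ns, tiers[i].ns / tiers[0].ns);
        return 0;
    }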
If you get your mass storage as fast as registers, then perhaps.
As awesome as RRAM is, and as much as I've boosted it in the past[1], there are still lots of things that could go wrong. For instance, unless it's practical to manufacture in large quantities, this won't go anywhere.
I'm not sure why you left out R. Stanley Williams in your post there. Leon Chua postulated the existence of a "4th" electrical component but didn't make any progress on it himself.
Listen to Williams' main talk on YouTube. I'm pretty sure they've sorted out how to manufacture memristor-based RAM -- and are licensing the process out already.
Is there any info about discrete memristors? Will there ever be such things, or will all products be integrated solutions only? By a discrete memristor I mean a part comparable to an individual resistor or transistor. Something elementary, which the electronics geek in me could play with.
According to Williams, you can't make big memristors - the effect is due to micro mechanisms. Given that, it doesn't make much sense to make "single" memristors.
Memristor-like components have existed since 1960 (http://en.wikipedia.org/wiki/ADALINE). There is no demand for a discrete component, so I doubt anyone would start making them. You could make an analogous device with a microcontroller, but that kind of defeats the purpose. Anyway, it's an elementary circuit element, so its behavior is fully described and not super exciting.
Weren't those super-expensive? You know, one of the reasons resistors are basic elements is that they're cheap.
I think a simple SiO layer, which is what this newest memristor tech is based on, could be super cheap.
Even if you can't build "big memristors", you could just parallel thousands of them in a single package. Then, they would start being approachable.
Besides, tiny capacitance hasn't stopped anyone from using the likes of varicaps, and huge inductance never stopped anyone from using an inductor. They have their place.
I, for one, want to see a discrete Memristor in all shapes and sizes. If we don't try it, and don't experiment with it, we might be shutting out 1/4 of all electronics.
Luckily, a lot of electronics companies do feel their responsibility as educators. That's why we have samples (I'll teach you to use my chip, and if you become an important circuit designer you'll use my chips in your designs), that's why we have lots of antiquated chips still in production (stuff like OTAs, which are of interest only to minuscule hobbyist groups), and that's why we can still buy devices in single units.