They have tested it to 10 million cycles with no degradation, so that's where that figure comes from. It's not 10^7 cycles before failures, or 10^7 before failures at some particular rate. The assumption is that the real figure is somewhere higher, but you can't tell without more testing.
> The process was repeated five times, resulting in a little over 10^7 program/erase cycles applied to the device. As can be clearly seen in Figure 4d, there is no degradation of the ∆IS-D window throughout these tests, meaning that the endurance is at least 10^7.
The paper says they tested the durability of the RAM with a 5 ms program-read-erase-read loop, meaning each program-read-erase-read iteration takes 5 milliseconds.
A trillion (10^12) cycles would take over 150 years.
I'm guessing a silicon lab doesn't have "the rest of the computer" that would allow them to run this RAM at full speed constantly. This UltraRAM isn't something they can just slot into their motherboard.
5 ms is 200 cycles per second, and there are 3600 seconds in an hour, so that's 0.72 million writes per hour. Almost 40 million if I start it on Friday evening and stop it on Monday morning.
10 million is 14 hours. It takes longer than that to prepare your documentation. Something is rotten in Denmark. A skeptic could very, very reasonably assume that cherry-picking is going on here, and that 10 million cycles to degradation isn't far from the truth.
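The arithmetic in the last few comments can be checked with a few lines of Python. This is a minimal sketch that assumes back-to-back 5 ms program-read-erase-read loops and nothing else (no setup time, pauses or per-cycle overhead).

```python
# Back-of-the-envelope timing for an endurance test, assuming the 5 ms
# program-read-erase-read loop runs back to back with no overhead.

LOOP_MS = 5                      # one loop every 5 ms
MS_PER_HOUR = 3_600_000
HOURS_PER_YEAR = 365.25 * 24


def cycles_per_hour(loop_ms: int = LOOP_MS) -> int:
    """Number of complete loops that fit into one hour."""
    return MS_PER_HOUR // loop_ms


def hours_for(cycles: int, loop_ms: int = LOOP_MS) -> float:
    """Wall-clock hours needed to apply `cycles` loops."""
    return cycles * loop_ms / MS_PER_HOUR


def years_for(cycles: int, loop_ms: int = LOOP_MS) -> float:
    """Wall-clock years needed to apply `cycles` loops."""
    return hours_for(cycles, loop_ms) / HOURS_PER_YEAR


if __name__ == "__main__":
    print(f"{cycles_per_hour():,} cycles per hour")
    print(f"{hours_for(10**7):.1f} hours for 10 million cycles")
    print(f"{years_for(10**12):.0f} years for a trillion cycles")
```

Run as a script, it prints 720,000 cycles per hour, 13.9 hours for 10 million cycles and 158 years for a trillion cycles.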
These labs have their custom structures synthesized; adding a small circuit specifically for endurance testing would be trivial compared to what they have already achieved in designing and implementing those structures.
This is a common problem in memory. Often they accelerate the wear and tear through temperature, voltage, etc., and use models to extrapolate the lifetime.
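For temperature-driven mechanisms, a common way to turn accelerated-stress results into a lifetime estimate is the Arrhenius acceleration factor, AF = exp((Ea/k)·(1/T_use − 1/T_stress)), with temperatures in kelvin. This is a minimal sketch: the 0.7 eV activation energy and the 55 °C / 125 °C temperatures are illustrative assumptions, not figures from the UltraRAM paper, and voltage acceleration would need a separate model.

```python
import math

BOLTZMANN_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def arrhenius_af(ea_ev: float, t_use_c: float, t_stress_c: float) -> float:
    """Acceleration factor between a stress temperature and a use temperature.

    ea_ev is the activation energy of the failure mechanism; it is an
    assumption that has to come from characterising the actual device.
    """
    t_use_k = t_use_c + 273.15
    t_stress_k = t_stress_c + 273.15
    return math.exp((ea_ev / BOLTZMANN_EV_PER_K) * (1.0 / t_use_k - 1.0 / t_stress_k))


def equivalent_use_hours(stress_hours: float, ea_ev: float,
                         t_use_c: float, t_stress_c: float) -> float:
    """Hours at the use temperature represented by `stress_hours` under stress."""
    return stress_hours * arrhenius_af(ea_ev, t_use_c, t_stress_c)


if __name__ == "__main__":
    # Illustrative numbers only: 0.7 eV, 55 C use, 125 C stress.
    af = arrhenius_af(0.7, 55, 125)
    print(f"acceleration factor ~{af:.0f}")
    print(f"1000 h at 125 C ~ {equivalent_use_hours(1000, 0.7, 55, 125):,.0f} h at 55 C")
```

The extrapolation is only as good as the assumed activation energy, which is exactly where a skeptic would push back.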
> Assuming ideal capacitive scaling[33] down to state-of-the-art feature sizes, the switching performance would be faster than DRAM, although testing on smaller feature size devices is required to confirm this.
That's about the same durability as Intel Optane had, so the first use for it could be replacing Optane wherever Optane has been used.
Optane did inspire a lot of R&D into persistent data structures, databases and file systems that started to challenge the traditional model of local memory and persistent storage.
IMHO, a few of those projects were a bit overoptimistic and used NVRAM as DRAM without many restrictions. For NVRAM to be viable, I think it still needs overprovisioning, wear levelling, memory protection and transactions, provided by hardware and/or an OS, though not necessarily through traditional interfaces. It is mostly a matter of mapping it copy-on-write (CoW) via a paging scheme instead of directly, and it will still run at near-DRAM speed.
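A toy model of the copy-on-write mapping idea, assuming a small pool of persistent physical pages and a virtual-to-physical table. Writes land on a freshly allocated shadow page (the least-worn free page, a crude form of wear levelling), and commit() publishes the shadows. The class and method names are hypothetical; on real hardware the publish step would have to be a single atomic, persisted update (for example of a root pointer), which the Python dict update here only stands in for.

```python
PAGE_SIZE = 4096  # hypothetical page size in bytes


class CowPersistentMemory:
    """Toy copy-on-write mapping over a pool of 'persistent' pages."""

    def __init__(self, num_physical: int) -> None:
        self.physical = [bytearray(PAGE_SIZE) for _ in range(num_physical)]
        self.wear = [0] * num_physical         # writes seen by each physical page
        self.committed: dict[int, int] = {}    # virtual page -> physical page
        self.shadow: dict[int, int] = {}       # uncommitted virtual -> physical
        self.free = list(range(num_physical))

    def _alloc(self) -> int:
        # Crude wear levelling: reuse the least-worn free page.
        page = min(self.free, key=lambda p: self.wear[p])
        self.free.remove(page)
        return page

    def write(self, vpage: int, offset: int, data: bytes) -> None:
        if offset < 0 or offset + len(data) > PAGE_SIZE:
            raise ValueError("write crosses page boundary")
        if vpage not in self.shadow:
            new = self._alloc()
            old = self.committed.get(vpage)
            # Start from the committed contents, or zeros for a fresh page.
            self.physical[new][:] = self.physical[old] if old is not None else bytes(PAGE_SIZE)
            self.shadow[vpage] = new
        phys = self.shadow[vpage]
        self.physical[phys][offset:offset + len(data)] = data
        self.wear[phys] += 1

    def read(self, vpage: int, offset: int, n: int, committed_only: bool = False) -> bytes:
        table = self.committed if committed_only else {**self.committed, **self.shadow}
        phys = table.get(vpage)
        if phys is None:
            return bytes(n)
        return bytes(self.physical[phys][offset:offset + n])

    def commit(self) -> None:
        # Publish every shadow page, then recycle the pages they replace.
        for vpage, new in self.shadow.items():
            old = self.committed.get(vpage)
            self.committed[vpage] = new
            if old is not None:
                self.free.append(old)
        self.shadow.clear()

    def abort(self) -> None:
        # Throw away uncommitted writes; the committed mapping is untouched.
        self.free.extend(self.shadow.values())
        self.shadow.clear()
```

Because the committed pages are never overwritten in place, a crash before commit() leaves the previous state intact, which is the property the paging approach is meant to buy.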
That's basically the equivalent of a Flash Translation Layer, and having it removes the original selling point of making fsync() a no-op. At that point, persistent memory's only advantage over existing non-volatile storage is possibly higher performance.
To hell with fsync(), I'd want a proper commit()! ;)
The performance is so high that the assumptions that had led to the old file system interfaces don't apply any more. There is opportunity for something better.
There are software transactional memories (STMs) for nearly all programming languages. A project I worked on, shipping since the mid-2000s, uses an in-memory STM for nearly all operations.
Because it’s the best way to survive failures (such as loss of power). Transactions let you know that all your data structures are in a consistent state.
It would also enforce that writes happen only when a program explicitly declares that it is necessary.
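A deliberately tiny software transactional memory in the optimistic style: each transaction buffers its writes, remembers the version of every variable it read, and at commit re-checks those versions under a lock, retrying if anything changed. This is a sketch of the general technique, not the project mentioned above; names like TVar and atomically are borrowed from Haskell's STM vocabulary for illustration. It only gives atomic, isolated in-memory updates; surviving power loss additionally requires the committed state to be persisted.

```python
import threading
from typing import Any, Callable


class TVar:
    """A transactional variable: a value plus a version counter."""

    def __init__(self, value: Any) -> None:
        self.value = value
        self.version = 0


class _Conflict(Exception):
    """Raised at commit time when a value read by the transaction changed."""


_commit_lock = threading.Lock()


class Transaction:
    def __init__(self) -> None:
        self.reads: dict[TVar, int] = {}   # version seen at first read
        self.writes: dict[TVar, Any] = {}  # buffered new values, invisible to others

    def read(self, tv: TVar) -> Any:
        if tv in self.writes:              # read-your-own-writes
            return self.writes[tv]
        with _commit_lock:                 # read value and version consistently
            value, version = tv.value, tv.version
        self.reads.setdefault(tv, version)
        return value

    def write(self, tv: TVar, value: Any) -> None:
        self.writes[tv] = value            # nothing is written until commit

    def commit(self) -> None:
        with _commit_lock:
            for tv, seen in self.reads.items():
                if tv.version != seen:
                    raise _Conflict
            for tv, value in self.writes.items():
                tv.value = value
                tv.version += 1


def atomically(fn: Callable[[Transaction], Any]) -> Any:
    """Run fn in a transaction, retrying until it commits without conflict."""
    while True:
        tx = Transaction()
        result = fn(tx)
        try:
            tx.commit()
            return result
        except _Conflict:
            continue
```

Buffering writes until commit also matches the point about writes happening only when the program explicitly declares them.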
When writes are persistent and cause wear, the consequences of e.g. a common buffer overflow or use-after-free bug can be much higher than if they were not. Even an unoptimised loop that writes to NVRAM could be bad.
Maybe I'm confusing something, but to reach a trillion cycles in, say, a year, would take overwriting all your memory 30 times a millisecond. That doesn't sound right?
DDR RAM is refreshed every 64 ms (this varies by DDR generation and specific chip). Branch Education has an excellent video on this named "How does computer memory work?"[1]. It would still take an exceedingly long time to reach a trillion, but it's still pretty frequent.
You don't need to refresh non-DRAM memories though.
I agree that some regions risk being R/W more than others, so memory controllers should indeed perform some kind of wear levelling, but otherwise I find it hard to imagine trillions of overwrites across GBs (or TBs) of data. 1e6 cycles is definitely achievable, and on the low side even for flash devices. 1e9 is pretty good for general-purpose memories, and few applications require 1e12. Not even SRAM or DRAM has unlimited endurance, due to physical wear. It's hard to find a source on this, but I would probably hand-wave it at around 1e15 cycles for DRAM? That would be 30 years of operation at one access every microsecond.
Even at 1 GHz, a trillion (10^12) writes is only 1000 seconds of work for a modern CPU. OK, latency is a thing, so multiply by 100 and it takes about a day. This is for DRAM, where cells are individually addressed. For flash with wear levelling the numbers of course get bigger.
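The cycle-count arithmetic in this sub-thread (a trillion cycles in a year, a 64 ms refresh interval, 1e15 cycles at one access per microsecond, 10^12 writes at 1 GHz) as a quick Python check. It assumes every refresh, access or write counts as one cycle on the same cell and uses a 365.25-day year; the slowdown factor of 100 is illustrative, not measured.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600
SECONDS_PER_DAY = 24 * 3600


def overwrites_per_ms(target_cycles: float, years: float = 1.0) -> float:
    """Full-memory overwrites per millisecond needed to hit target_cycles in `years`."""
    return target_cycles / (years * SECONDS_PER_YEAR * 1000)


def years_at_interval(target_cycles: float, interval_s: float) -> float:
    """Years to accumulate target_cycles at one cycle every interval_s seconds."""
    return target_cycles * interval_s / SECONDS_PER_YEAR


def seconds_at_rate(target_cycles: float, rate_hz: float, slowdown: float = 1.0) -> float:
    """Seconds to issue target_cycles writes at rate_hz, stretched by a slowdown factor."""
    return target_cycles / rate_hz * slowdown


if __name__ == "__main__":
    print(f"{overwrites_per_ms(1e12):.1f} overwrites/ms for 1e12 cycles in a year")
    print(f"{years_at_interval(1e12, 0.064):,.0f} years for 1e12 cycles at one per 64 ms")
    print(f"{years_at_interval(1e15, 1e-6):.1f} years for 1e15 cycles at one per microsecond")
    print(f"{seconds_at_rate(1e12, 1e9):,.0f} s for 1e12 writes at 1 GHz")
    print(f"{seconds_at_rate(1e12, 1e9, slowdown=100) / SECONDS_PER_DAY:.2f} days with a 100x slowdown")
```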
The volatile qualifier requires the compiler to emit instructions that access the object. So if the object is in RAM, it will emit memory access instructions. However, on modern CPUs, those will still hit the cache. You need to either map the memory in as uncached, or flush the caches to force a memory access.
No, that won't work. You'd have to clflush after every store. And even then, the cache line might only ever get as far as the write pending queue (WPQ), and that you can't control.
I started to think about flipping a single bit in some process a million times per frame inside some loop, but that could only be done in cache…
Still, if you only changed the state of the memory once per frame, you would do it in RAM, not in cache. At 1000 FPS (we should consider the worst case, even if it's rare) that's 3 hours of playing a game to reach 10,800,000 reads/writes.
Now the question is what happens if that bit gets damaged. Perhaps the memory just marks it as damaged and uses another bit for that memory address from then on. Perhaps that makes UltraRAM slower over time as more bits (sectors) get damaged?
The clock frequency is in GHz, which is a billion cycles per second. There is at least one cache layer between the CPU and the RAM, but we are in the same ballpark. And yet it's OK for the typical lifetime of our computers.
> just by incrementing a shared counter (which cannot be cached)
That's not true: a shared counter (i.e., an atomic integer) is cached; in fact, there's no guarantee that its value is ever written back to system RAM.
You're probably thinking of non-cacheable memory: the kernel can set the MMU attributes of a memory page such that the CPU will avoid the cache when it accesses addresses in that page. This is completely independent of atomic accesses on memory locations [1].
[1] At least typically – there may well be CPUs which disallow atomic accesses on non-cacheable memory.
But then why would it have to be a shared counter? Any write to a non-cacheable memory location is transmitted to system RAM, it doesn't have to be shared with other cores, nor does it have to be a counter.
Wear levelling on RAM isn't in use today to my knowledge, but I don't think it is technically impossible.
You would probably go for some approach where most memory addresses are direct-mapped, and then the few that have been written most are redirected to new addresses.
The reading of the direct-mapped addresses would be super fast, since you can do the read in parallel with the lookup in the remapping table (just to check that this is a direct-mapped address). Reads of non-direct mapped addresses might take a couple of extra cycles, but that doesn't matter because they are very rare.
To do any of that, CPU memory controllers need to be able to handle per-request variable-latency RAM, which to my knowledge today they do not, although it would not be a big redesign to add.
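A sketch of the "mostly direct-mapped, a few lines redirected" idea, assuming a hypothetical controller with a small reserved spare region and a remap table. In hardware the table lookup would run in parallel with the direct read, as described above; Python can only show the logic sequentially.

```python
class RemappingMemory:
    """Direct-mapped memory with a small table redirecting worn lines to spares."""

    def __init__(self, num_lines: int, num_spares: int) -> None:
        self.cells = [0] * (num_lines + num_spares)   # the last num_spares entries are spares
        self.num_lines = num_lines
        self.free_spares = list(range(num_lines, num_lines + num_spares))
        self.remap: dict[int, int] = {}                # logical line -> spare line

    def translate(self, line: int) -> int:
        if not 0 <= line < self.num_lines:
            raise IndexError(line)
        # Common case: no entry, so the physical line equals the logical line.
        return self.remap.get(line, line)

    def read(self, line: int) -> int:
        return self.cells[self.translate(line)]

    def write(self, line: int, value: int) -> None:
        self.cells[self.translate(line)] = value

    def retire(self, line: int) -> int:
        """Redirect `line` to a fresh spare, copying its current contents."""
        if not self.free_spares:
            raise MemoryError("no spare lines left")
        spare = self.free_spares.pop(0)
        self.cells[spare] = self.cells[self.translate(line)]
        self.remap[line] = spare  # a previously used spare is abandoned as worn
        return spare
```

The same table would also cover the "damaged bit gets replaced" case raised earlier: once the spares run out, the device can no longer hide worn lines.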
Wear leveling RAM would be trivial with any MMU from the last 40 years. You can just fault on the write and do your wear leveling in the fault handler. This is how virtual memory already works.
No, you would keep a write counter for every (4 KB) DRAM page, and have the OS move the virtual page to a new physical one if the write count of a page grows much higher than the average.
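A sketch of that OS-side policy, assuming hypothetical per-frame write counters that hardware would have to provide. When a frame's count grows well past the average, the virtual page is moved to the least-worn free frame. A real kernel would also copy the page contents and shoot down TLB entries; this only models the bookkeeping, and the threshold and minimum count are arbitrary illustrative values.

```python
class WearTracker:
    """Per-frame write counters plus a migrate-when-hot policy (bookkeeping only)."""

    def __init__(self, num_frames: int, threshold: float = 4.0, min_writes: int = 16) -> None:
        self.writes = [0] * num_frames
        self.total = 0
        self.page_table: dict[int, int] = {}   # virtual page -> physical frame
        self.threshold = threshold             # "much higher than average" factor
        self.min_writes = min_writes           # ignore noise from the first few writes

    def map(self, vpage: int, frame: int) -> None:
        self.page_table[vpage] = frame

    def record_write(self, vpage: int) -> None:
        frame = self.page_table[vpage]
        self.writes[frame] += 1
        self.total += 1
        average = self.total / len(self.writes)
        if self.writes[frame] >= self.min_writes and self.writes[frame] > self.threshold * average:
            self._migrate(vpage)

    def _migrate(self, vpage: int) -> None:
        used = set(self.page_table.values())
        free = [f for f in range(len(self.writes)) if f not in used]
        if free:
            # Move the hot page to the least-worn unmapped frame.
            self.page_table[vpage] = min(free, key=lambda f: self.writes[f])
```

Comparing against the average over all frames is crude, since idle frames drag the average down; a decaying counter or a comparison against free frames only would likely spread wear more evenly.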
That assumes you still have DRAM. Since this is faster and higher capacity than RAM, it’s potentially viable as a RAM replacement. In that case, you wouldn’t have anywhere to store the counters (but presumably in that case you wouldn’t need to either). I’m not sure you’d need a write counter once this replaced RAM, even if it didn’t have the same write endurance. For storage nodes, there’s no value in RAM outlasting storage, and this already has better write endurance than NAND. So on a storage node, you could easily imagine using this as RAM, since the number of erases is going to be dominated by storage activity rather than by the ancillary memory writes that manage the storage.
Assuming a typical 5-year lifecycle, 10 million writes means 1 write every 15 seconds. That's more than enough for executable code, CDN content, or a database index. I can definitely see systems with 75% UltraRAM for read-heavy data and 25% traditional RAM for write-heavy pages acting basically as L4 cache.
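The "one write every 15 seconds" figure, plus a hypothetical placement rule for the 75/25 split: a page goes to UltraRAM only if its observed write rate, times a safety margin, fits within the endurance budget. This assumes every write to a page hits the same cells (no wear levelling inside the page); the tier names and the margin are illustrative.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def seconds_per_write(endurance_cycles: float, lifetime_years: float) -> float:
    """Minimum average spacing between writes to one cell to last the lifetime."""
    return lifetime_years * SECONDS_PER_YEAR / endurance_cycles


def place_pages(write_rates_hz: dict[str, float], endurance_cycles: float = 1e7,
                lifetime_years: float = 5.0, margin: float = 2.0) -> dict[str, str]:
    """Assign each page to 'ultraram' or 'dram' based on its write rate."""
    budget_hz = 1.0 / seconds_per_write(endurance_cycles, lifetime_years)
    return {page: ("ultraram" if rate * margin <= budget_hz else "dram")
            for page, rate in write_rates_hz.items()}


if __name__ == "__main__":
    print(f"{seconds_per_write(1e7, 5):.1f} s between writes")
    print(place_pages({"db index": 1 / 60, "hot counter": 1.0, "code": 0.0}))
```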
The current set up is based on separating volatile and non volatile memory and adding caches to paper over the slowness. Caches are getting bigger and bigger because of the huge speed disparity. I think you underestimate how much of a game changer this could be.
This is persistent and fast.
If this takes off and it does only last tens of millions of cycles, just use cache for fast-changing things and UltraRAM for everything else.
If it lasts trillions of cycles, it could completely change PC architecture. The '80s were the last time we had RAM/ROM that could keep up with the processors of the day. This potentially gets you an instant-on computer: no need for caches, no need for memory on the graphics card, no separate hard drives. Just one big, simple bucket of bytes for everything.
If the latency claims turn out to be true, it could still be worth it in various cases, e.g. with a bit of effort to reduce the number of writes, you could get a big hash table that you initialise once a day or so and that gives really fast lookups.
Things that don't last long are replaceable, though.
No one complains about not being able to replace the processor in their phone, because it 'never' breaks. Batteries, on the other hand, do, and to varying degrees they are replaceable.
The article claims a tenth of DRAM's latency at 100x lower power, but also says they're trying to fabricate at 20nm. Oh, and it's also persistent.
If they've done that, awesome. Make it, show that it works, license how to make the thing to semiconductor companies and retire wealthy. Or maybe the university owns the IP.
If they've done that, I think the concept of "turning off" a device goes away. You just unplug it, and the energy needed to dump the stuff in the pipeline to memory can be stored in a capacitor.
The OS can just always be loaded and ready to go; when power is restored it checks to see if the hardware has changed and just loads up the 64 MB of CPU cache. It could take just a few milliseconds. It takes on the order of a millisecond to charge the capacitors in a desktop PSU. "Restarting" becomes basically the same thing as reloading, and takes >100s of times longer than actually restarting the device. That's crazy to consider.
If boot time is 0, stuff will just unplug itself after it's been idle for a few seconds. I'd expect the hardware in phones/laptops to become more distributed, with basic vital functions handled by a separate processor. Probably the screen gets taken over by a very simple processor that can only display the time, battery % and cell info (or the current screen buffer, for a laptop), and user input causes the main CPU to wake up in between frames.
> If they've done that, I think the concept of "turning off" a device goes away.
> ...The OS can just always be loaded and ready to go; when power is restored it checks to see if the hardware has changed and just loads up the 64 MB of CPU cache.
The idea, called “Orthogonal Persistence” way back when, has been around for quite a while. Here’s my (probably spotty) idea of the history:
Researchers wanted instant-on for their early visions of tablets. To make sure security and networking would still work properly, there was an idea to use Capabilities (which had been around since the 1960s) to support this and solve the chicken-and-egg problems that were thought to arise.
Capabilities later became widely adopted just for better security, but Orthogonal Persistence never took off, because never rebooting would have required much higher levels of reliability, which would have been expensive to achieve. So today’s devices still reboot, but also have a fast "wake from sleep."
So I’m not sure if we will ever have true “Orthogonal Persistence.” We might have much slicker “wake from sleep” instead.
> I'd expect the hardware in phones/laptops to become more distributed, with basic vital functions handled by a separate processor.
No matter how fast the device itself is, addressing into a large pool will always be slower than a smaller pool. Both because of increased travel distance, and because every time you double the size of the pool, you add one additional mux on the path between the request and the memory.
This is why CPUs have multi-level caches, even though the transistors in L1 and L2 cache are typically the same. The difference in access latency is not because L2 is made of slower memory, but because L1 is a very small pool very close to the CPU with the load/store units built into it, and L2 is a bit further away.
However, if main memory latency is suddenly a lot lower, it might change what is the most efficient cache level layout. The currently ubiquitous large L3 cache might go away. That would of course require very high bandwidth to the memory chips, because L3 does bandwidth amplification too.
> In all of the above tests, the program and erase states were set using between 1 and 10 ms voltage pulses, two times longer than the switching times used in our recent report of ULTRARAM on GaAs substrates.[15] In both cases, the devices operate at a remarkably high speed for their large (20 μm) feature size. Assuming ideal capacitive scaling[33] down to state-of-the-art feature sizes, the switching performance would be faster than DRAM, although testing on smaller feature size devices is required to confirm this.
> Why do you even need a cpu cache?
Cell read time is entirely different from latency and throughput. This stuff still reads in rows like RAM and can't just be accessed freely like registers.
You’d probably still need an L1 cache. L2 and L3 might be superfluous or you could have massive L2/L3 caches made with this rather than traditional SRAM that sit internally within the CPU to avoid the memory bus. Contention for the memory bus could also be a reason to still have SRAM caches that are slower than main memory.
Shifts like this are so impactful that it’s hard to predict exactly what good designs will look like until the industry has had 5-10 years of hands-on experience to shake out what the hardware topology should look like (maybe more, since hardware development cycles prevent fast iteration and testing of ideas).
1/100 power, 1/10 latency... in a through-hole chip carrier?? How do they get enough of it close enough to the CPU at those low powers and latencies, at DRAM clocks? Electricity travels 10 cm in about a third of a nanosecond, best case. And it uses quantum...
They are talking about the speed of the new type of memory cell, not of the physical implementation they have it in.
If this actually pans out, it will be worthwhile to stack a lot of it on the same package as the CPU. The reason memory is so far away in current systems is mainly that having it closer wouldn't meaningfully help, because almost all the latency is reading data from the DRAM array anyway. If they suddenly get an economical new memory type with a tenth of DRAM's access latency, they are going to figure out how to get it close enough that signal travel isn't a meaningful part of the total latency.
This sounds too good to be true. When Apple buys up all the production capacity for this and makes it available exclusively in Macs and iPads, we'll know it's viable. Till then, my optimism is tempered with caution.
Will this be used for Harvard architecture where programs are run straight from their storage instead of first being read into RAM? Maybe we can use data stored on this instead of having to stream it from storage to RAM?
Optane made loading the OS super fast, but the OS still has to fill up RAM. No matter what, loading up 2+ GB of ram will always take noticeable time. Even flat out, Optane takes >1 second to boot, and several seconds to restore a session.
Cost permitting, this stuff would replace RAM, not the drive. No more loading into ram; now the bottleneck is loading into cache and that will always be trivially fast just because cache is so small.
Even if it's too expensive to replace RAM, if it can fit the minimum bits of an OS then I think cold boot time still drops to tens of milliseconds. It might take a couple of years, but interactivity doesn't need to wait for RAM to be filled.
> No matter what, loading up 2+ GB of ram will always take noticeable time.
Barely so. NVMe sequential throughput is measured in gigabytes per second. So you can get this under 300ms. And you can optimize the order in which things are loaded so that the important ones arrive first, not all in-memory data is hot.
What makes booting take time are serial dependencies between boot stages, timers (boot prompts for humans, but also for hardware to power up), careful device enumeration and initialization and stuff like that.
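A rough model of the loading argument, assuming a sustained read speed of 7 GB/s (roughly what fast PCIe 4.0 NVMe drives advertise; treat it as an assumption) and loading segments strictly in priority order. The segment names and sizes are made up for illustration.

```python
from typing import NamedTuple

THROUGHPUT_BYTES_PER_S = 7e9  # assumed sustained sequential read speed


class Segment(NamedTuple):
    name: str
    size_bytes: int
    priority: int  # lower number = needed sooner


def ready_times(segments: list[Segment],
                throughput: float = THROUGHPUT_BYTES_PER_S) -> dict[str, float]:
    """Seconds after start at which each segment is fully loaded, hot ones first."""
    elapsed = 0.0
    ready = {}
    for seg in sorted(segments, key=lambda s: s.priority):
        elapsed += seg.size_bytes / throughput
        ready[seg.name] = elapsed
    return ready


if __name__ == "__main__":
    segments = [
        Segment("cold caches", 1_500_000_000, 2),
        Segment("kernel + shell", 200_000_000, 0),
        Segment("open apps", 300_000_000, 1),
    ]
    for name, t in ready_times(segments).items():
        print(f"{name:>15}: {t * 1000:6.1f} ms")
```

With these assumptions the full 2 GB takes under 300 ms, but the first segment is usable after roughly 29 ms, which is the point of ordering the hot data first.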
Yep. Keep it all hot. North/southbridge, controllers, everything. Skip POST. If something has changed, we can just restart once it's warmed up. Anything that still needs a traditional boot (e.g. a disk drive) we just assume hasn't changed until proven otherwise. Fuck the MBR. Fuck ROM and BIOS and CMOS. If you don't need to do it between RAM frames, you don't need to do it until you're told to reboot.
If you've been unpowered all day, or if your hardware has changed, or you're worried about security, then you can choose to boot from scratch. The only other reason, IMO, is because the computer has just been put together. If all those parts can restore their previous configuration, all they have to do is signal "yep, I'm still in the same configuration" and we should be able to pick up where we left off (again, except for disk drives/RAM/networking etc).
I like the clean slate we get from boot-from-scratch. Some pieces of hardware have subtle state corruption issues that can accumulate over uptime (kinda like Windows) that get flushed out by a reinitialization.
Instead as much as possible of the boot process should be taken off the critical path and be treated more like hotplug/optional peripherals. Those HDDs? They can spin up while I'm already logged in (assuming they're not the boot drive).
Similar claims have been made about MRAM, FeRAM and similar devices for many years, hailed as replacement and unification of both storage and DRAM. MRAM isn't completely vaporware, but it's not available at the prices or densities of DRAM.
So, will it scale down? Will it be cheap to manufacture?
Currently, performance is hypothetical. This and DRAM both work by charging up a little capacitor; this tech uses tunnelling so that the capacitor can be very highly isolated. That's why it doesn't discharge.
The smaller the capacitor, the faster it can charge/discharge. This tech has only been tested at sizes ~1000x larger than the state of the art, and the speed advantage assumes it scales perfectly with the scaling laws. Reality is never that kind, but it might be mostly that kind.
It's still theoretical, though. There might be some manufacturing quirk that makes it not work as well at small sizes. Defects that don't matter now might be huge at that scale. If power requirements creep up, they may kill longevity, which may require them to sacrifice speed... everything has to go right, or it can become a balancing act.
Assuming everything goes great, it's still somewhat more complex than DRAM: more layers. It will certainly cost more than conventional RAM, but with ICs in particular it's very hard to know whether that will be 10x more or 0.1% more.
It's going to be very expensive (lower densities than NAND and a somewhat exotic process for making it), and it hasn't been proven at geometries smaller than 20 μm; it will only be faster than RAM if it continues to scale.
Or rather, the silicon "in mice" equivalent: in a test sample 1000x the scale, with only hopes and wishes that things won't change too much when they scale down.
All the cool mice these days are running around with memristor-based brain implants. This will be a huge upgrade for them. They'll be able to spend a small fraction of their usual daily time running in the hamster wheel, charging up their symbiote brains.
Not being made in a way that is usable in current systems, not having a commercial scale manufacturing process yet, and not being proven for long term use yet.
Rebrand it, as the current name is misleading... However, the technology looks interesting for storage devices if it indeed exceeds SLC flash specifications.
However, after the Violin Systems boondoggle one may find it significantly harder to find growth capital.
>"Moreover, the UltraRAM researchers asserted that the new memory tech is expected to be capable of 1ns write operations, which is about 10x faster than DRAM."
That'll be really nice if they can get it into production...
It would likely be a replacement for applications where low latency and non-volatility are required but size is less important. A microcontroller with a good-sized chunk of UltraRAM could allow for a type of Harvard architecture where program code runs directly from where it's stored. With NV memory, you can have the microcontroller shut down completely and start right back up where it left off: just write the registers to memory right before shutdown and load them back on boot. You can have very power-efficient devices that never really have an off state, because they are always hibernating when they aren't doing anything.
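A sketch of the save-registers-before-shutdown idea, using a bytearray to stand in for a region of non-volatile memory and a hypothetical 16-register CPU. The important detail is ordering: invalidate the old checkpoint, write the registers, and only then write the validity marker, so a power cut partway through is detected on the next boot instead of restoring half-written state. On real hardware each step would also need a barrier or cache flush so the writes reach the NV memory in that order.

```python
import struct
from typing import Optional

NUM_REGS = 16                  # hypothetical register file size
VALID_MARKER = 0xC0FFEE01      # hypothetical "checkpoint complete" value
HEADER = struct.Struct("<I")   # 32-bit marker at offset 0
REGS = struct.Struct(f"<{NUM_REGS}I")
NV_SIZE = HEADER.size + REGS.size


def save_state(nv: bytearray, regs: list[int]) -> None:
    """Checkpoint registers into the NV region before power is removed."""
    HEADER.pack_into(nv, 0, 0)               # 1. invalidate the previous checkpoint
    REGS.pack_into(nv, HEADER.size, *regs)   # 2. write the register values
    HEADER.pack_into(nv, 0, VALID_MARKER)    # 3. only now mark it complete


def restore_state(nv: bytearray) -> Optional[list[int]]:
    """Return saved registers, or None if there is no complete checkpoint."""
    (marker,) = HEADER.unpack_from(nv, 0)
    if marker != VALID_MARKER:
        return None                          # cold boot: nothing trustworthy saved
    return list(REGS.unpack_from(nv, HEADER.size))
```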
It doesn't need to have consumer-wide adoption to be a success.
They can cater to a niche business market to which UltraRAM can add ultra-high value (pun intended) for particular data processing or persistence needs.
One idea that comes to mind is stock markets. Automated traders have taken them over, and their fight is on the sub-millisecond scale.
Imagine how much an investment bank would pay for UltraRAM if it allows them to process real time data much faster and make ultra-money with it? (again, intended and not sorry about that!)
Closer to home, I can think of a few competitive video game players with more money than sense who would spend thousands on a DRAM replacement that lets them go from 150 FPS to 300 FPS. Depending on the workload, DRAM latency generally ends up being the bottleneck at such high frame rates.
Assuming it can be as dense physically as current flash and RAM, there is a pretty nice market to target in mobile devices that's always looking for things that can lower battery use.
Conveniently that's also a market where Apple and Google have enough control over the software side to make things work with a new, weird memory scheme (RAM slower than permanent storage, but still needed because of durability).
Well, on the other hand GDDR and HBM are two competing technologies both of which are still reasonably alive, with HBM being a promising candidate to replace GDDR for good eventually.
> Databases are automatically obsolete. A file system is enough and performance is improved by not using a database.
This makes no sense at all. Databases are much more about what data structures are used internally and the high-level interfaces provided to access said data than the underlying technology used to persist data.
There is also the matter of how much data can/needs to be persisted, which is not addressed at all.
Locking and data structures are more or less a solved problem.
Persistence is not yet solved.
Our programming models are currently heavily influenced by the way we store and query data and the underlying registers/memory/cache/storage HW. With PRAM, we can simplify programming models using Persistence Ignorance (PI).
Well, that's one hell of an undersell of what a database like PostgreSQL, MSSQL or MongoDB does.
It's not just that people "can't write original applications"; in fact, people shouldn't always write their own bespoke single-purpose databases for each application. Getting ACID, MVCC, efficient storage, indexing, backups etc. all at the same time is hard, really damn hard. You might get some of them, e.g. efficient storage, from hardware, but there's no free lunch on those topics.
A database is like a library: you can always write it from scratch (and sometimes you even should), but in 99% of cases you should rely on the tried, battle-tested existing solution.
> Databases are a means to query and index data with greater performance than without.
I don't think you understand your own point. The querying, indexing, and performance bit are tied to the data structures used internally by the database, not the technology used to persist data.
What exactly will become obsolete? IMO, file systems are already a degenerate form of databases, with a graph structure, large enough directory entries becoming B-trees, etc. But even with fast storage you'd still need indexes for any access pattern different from what's encoded in the directory tree.
I believe the author means database as disk persistence. But we could all host SQLite DBs in /dev/shm today if we wanted... I guess it's just excitement about persistence at speed :)
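To illustrate that last point, a toy store that keeps files in a nested-dict "directory tree" and maintains one secondary index by author. Lookups by path follow the tree; queries by author either walk every file or use the index, which has to be kept in sync on every write. The class and the author attribute are hypothetical.

```python
from collections import defaultdict
from typing import Iterator


class TreeStore:
    """Files in nested dicts (the 'directory tree') plus one secondary index."""

    def __init__(self) -> None:
        self.root: dict = {}
        self.by_author: defaultdict[str, set[str]] = defaultdict(set)

    def put(self, path: str, author: str, data: bytes) -> None:
        path = "/" + path.strip("/")
        *dirs, name = path.strip("/").split("/")
        node = self.root
        for d in dirs:
            node = node.setdefault(d, {})
        previous = node.get(name)
        if previous is not None:                 # keep the index in sync on overwrite
            self.by_author[previous[0]].discard(path)
        node[name] = (author, data)
        self.by_author[author].add(path)

    def get(self, path: str) -> bytes:
        """Lookup by path: follows the structure the tree already encodes."""
        node = self.root
        for part in path.strip("/").split("/"):
            node = node[part]
        return node[1]

    def _walk(self, node: dict, prefix: str) -> Iterator[tuple[str, tuple]]:
        for key, child in node.items():
            path = f"{prefix}/{key}"
            if isinstance(child, dict):
                yield from self._walk(child, path)
            else:
                yield path, child

    def scan_by_author(self, author: str) -> list[str]:
        """Without an index: visit every file in the tree."""
        return sorted(p for p, (a, _) in self._walk(self.root, "") if a == author)

    def find_by_author(self, author: str) -> list[str]:
        """With the secondary index: no tree walk needed."""
        return sorted(self.by_author.get(author, set()))
```

Fast storage makes the scan cheaper, but it still grows with the number of files; the index does not.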
The original post is deleted, but in a proper ACID database, persistence and speed are quite highly related. The DB can't truly move on until it has confirmed that a set of writes has truly hit the persistent storage, and the time required for that is vastly larger than the time required to process a transaction in volatile memory. We cheat in every possible way, notably with battery-backed NVRAM, so in actual practice the cost isn't always visible. But if you need truly persistent transactions, then fast persistent memory is a godsend. (Especially if it's big enough to put the WAL into. Though even a couple of bytes to store the last committed transaction ID can be useful.)
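A minimal write-ahead log showing where that cost sits: a transaction counts as committed only once its log record has been flushed and os.fsync() has returned, and recovery replays only complete records. This is a sketch (JSON lines, no checksums, no group commit, no checkpointing), not how any particular database implements its WAL; the fsync call is the step that fast persistent memory would make cheap.

```python
import json
import os
from typing import Optional


class TinyWal:
    """Append-only log of committed transactions, one JSON record per line."""

    def __init__(self, path: str) -> None:
        self.path = path

    def commit(self, txid: int, writes: dict) -> None:
        record = json.dumps({"txid": txid, "writes": writes})
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(record + "\n")
            f.flush()                 # push Python's buffer to the OS
            os.fsync(f.fileno())      # wait until the OS reports it durable

    def recover(self) -> tuple[dict, Optional[int]]:
        """Rebuild state from the log; returns (state, last committed txid)."""
        state: dict = {}
        last_txid = None
        if not os.path.exists(self.path):
            return state, last_txid
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                if not line.endswith("\n"):
                    break             # torn final record from a crash mid-append
                try:
                    record = json.loads(line)
                except json.JSONDecodeError:
                    break
                state.update(record["writes"])
                last_txid = record["txid"]
        return state, last_txid
```

The "couple of bytes to store the last committed transaction ID" in the comment corresponds to last_txid here.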
I come from the future and I can tell you that the 1’000 years were greatly exaggerated, as UltraRAM doesn’t handle radiation fallout from World War III well.
I also come from the future, and the reason for the war was to finally get rid of the imperial system. Despite billions of people having died, we all think it was worth it.
You can disagree with the tests or interpretation but it's easy to find the paper rather than just assuming no testing.
Endurance figures come from actual testing with their 20 μm version. Retention is based on looking at the decay over 14 h. Since it decays at first and then plateaus, they fit a line to it from some time before the plateau (otherwise the answer is "infinite years"), which gives 10^7 hours.
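A sketch of that kind of extrapolation: fit a straight line to the decaying part of the curve against log(time) and solve for when the window would reach a threshold. Fitting against log(time) is an assumption here (it's a common choice for retention plots), and the sample points are synthetic, generated from a known line so the fit can be checked; they are not the paper's measurements.

```python
import math


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def extrapolate_crossing(hours: list[float], window: list[float], threshold: float = 0.0) -> float:
    """Hours at which a log-linear fit of `window` reaches `threshold`."""
    slope, intercept = fit_line([math.log10(h) for h in hours], window)
    if slope >= 0:
        return math.inf  # no decay in the fitted region: no finite answer
    return 10 ** ((threshold - intercept) / slope)


if __name__ == "__main__":
    # Synthetic points on window = 1.0 - 0.1 * log10(hours), purely illustrative.
    hours = [0.1, 0.5, 1, 2, 5, 10]
    window = [1.0 - 0.1 * math.log10(h) for h in hours]
    print(f"window closes after ~{extrapolate_crossing(hours, window):.3g} hours")
```

Including the plateau in the fit flattens the slope toward zero, which is why the choice of fitting range drives the headline number.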
I would assume they have had a couple of such devices running from publication until now. Instead of >24 hours, they probably know the retention after about 500 days by now, allowing for more convincing extrapolations towards 10^7 hours.
It is unclear from the article whether these plots are tests of individual memory cells or of a large collection of cells. Any serious attempt would involve an array of cells, so that a million such graphs can be plotted together, etc.
> 10 million write/erase cycles
This is not going to compete with DRAM, which needs to endure trillions of write/erase cycles in its lifetime.
Unless they grossly underestimated its durability, a name like UltraFlash would seem more appropriate?!