So, it's been a while since I tinkered with hardware, but I'm making the blind assumption that although the bus speed is twice as fast as DDR4's, the read latency in bus cycles also doubles, as has always been the case with DDR memory, meaning the net read latency improvement is basically nil.
Of course we expect every generation of Double Data Rate memory to double the data rate. But it's always been a bit misleading to say "twice as fast".
Latency is limited by the speed of electricity, which is a large fraction of the speed of light, so unless you move DDR closer to the CPU it physically can't get 2x as fast... ever.
light speed * 0.951 (electricity) / 2 (round trip) / 2 feet (CPU to furthest RAM chip along a wire) ~= 250 million cycles per second.
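Running the same numbers in Python (keeping my assumed 0.951 velocity factor and 2 ft trace, both of which are rough guesses):

    # Back-of-envelope: max "cycles" limited by round-trip signal propagation.
    # Assumed inputs: 0.951 velocity factor, 2 ft CPU-to-furthest-chip trace.
    C = 299_792_458              # speed of light, m/s
    VF = 0.951                   # assumed propagation speed along the trace
    trace_m = 2.0 * 0.3048       # 2 ft in metres

    round_trip_s = 2 * trace_m / (C * VF)
    print(round_trip_s * 1e9)        # ~4.3 ns round trip
    print(1 / round_trip_s / 1e6)    # ~234 million cycles per second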
However, cache is now huge, so latency is usually less of an issue than sequential reads/writes and total size.
Out of curiosity, why did you ballpark the length of the wire at "2 feet" (close to an order of magnitude high, way under 1 digit of precision) but specify the wave propagation speed along that wire to one part in a thousand (way more precision than appropriate: line capacitance changes significantly with trace width, PCB thickness to ground plane, PCB composition, etc.)?
Your point isn't invalid, but it's sort of misapplied. In fact the latency of DRAM on modern systems is dominated almost entirely by precharge time within the memory IC, which is about an order of magnitude slower than signal propagation. And it's not changing, either. DRAM cycles have been sitting at around 60 MHz for a decade and a half. We're dropping voltage as we shrink the cells, and so there's no net increase. Realistically signal propagation will never be the limit.
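To put rough numbers on that (the 60 MHz array figure is mine from above; the propagation delay is from the back-of-envelope calculations elsewhere in this thread):

    # Sanity check: DRAM array cycle time vs. signal propagation delay.
    array_cycle_ns = 1 / 60e6 * 1e9    # ~16.7 ns per ~60 MHz array cycle
    propagation_ns = 2.0               # ~2-4 ns round trip, per the thread
    print(array_cycle_ns / propagation_ns)   # array time dominates, ~4-8x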
I very much rounded the output to represent minimal accuracy, and while 9.51e-1 is not accurate to 3 digits, 1 digit of accuracy * 1 digit of accuracy = less than 1 digit of accuracy.
Anyway, if you actually trace the path from the furthest bit of a RAM chip to the CPU, it's longer than you might think: more than 1 foot but less than 3 feet.
As to my point, tCL is a round-trip latency and actually not that far from optimal. DDR is not designed for pure random access so much as cheap access to lots of RAM, so yes, there are many trade-offs, but they are more reasonable when you're close to hard limits.
Traces only go from CPU pin to module; the latency path runs from the memory controller to the CPU pin, CPU pin to PCB, PCB to module, module to RAM chip edge, and RAM chip edge to the actual memory location.
I keep hearing this, but it doesn't jibe with the RAM timings I see. tRAS is usually 50ns or so, and propagation delay is 2ns or so (see below). What gives?
My suspicion is that amplifying the differential voltages produced by femtocoulombs of charge at 100% reliability is a harder problem than moving the DRAM chips closer to the CPU, and that speed-of-light has gotten the blame in "pop architecture" due to sloppy overgeneralization.
2*2 feet is long for a memory bus, but it's partially cancelled out by the high velocity factor (sorry, "speed of electricity") of 0.951 in the parent's calculations. Instead, I'm going with 0.5*2 feet at a velocity factor of 0.5, for a 2 ns round-trip propagation delay.
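Or, as a quick script (my assumptions: 0.5 ft one-way trace, velocity factor 0.5):

    # Revised back-of-envelope with more conservative assumptions.
    C = 299_792_458            # speed of light, m/s
    trace_m = 0.5 * 0.3048     # assumed 0.5 ft one-way trace, in metres
    round_trip_s = 2 * trace_m / (C * 0.5)    # velocity factor 0.5
    print(round_trip_s * 1e9)  # ~2.0 ns round-trip propagation delay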
A major factor in DRAM latency is moving these charges around the chip (between the array and the sense amplifiers). The wires in the array have finite conductivity and current-handling capability, so this cannot happen instantaneously.
On the other hand, sequential access bandwidth is limited almost entirely by the external interface and has more to do with economic factors (pin count is a major part of IC price) than any physical limitations.
tCL is (very) roughly the RAM equivalent of sequential access time. I'm talking random access time, tRAS, which includes closing the last row (precharging the diff amps) and opening the next (waiting for the amps to stabilize). It's called Random Access Memory, so I think it's more fair to judge it by its random access time.
DDR is not really set up for random access, because it's set up for talking to L2/L3 cache, not registers. tRAS ~= tCL + tRCD + tRP, which is kind of, but not exactly, what you want for random access.
Yes, tCL is a round trip time for accessing an already open row, and it puts a sharp upper bound of ~5ns on propagation delay. Random access time, approximated by tRAS, is still ~50ns. You contend that tCL is the bigger problem, I contend that tRAS is the bigger problem. The key statistic is <#CAS/#RAS>, which I haven't been able to find easy values for.
I don't understand your argument that the L2 and L3 caches tip the balance in favor of <#CAS>. If anything, I'd expect them to do the opposite, because the SRAM and DRAM both aim to exploit locality, meaning that better L2 and L3 caches would reduce #CAS faster than #RAS. Of course, armchair reasoning about such complex systems as memory controllers and caches can only go so far, so I wouldn't be shocked to be wrong, but I would certainly demand actual statistics before changing my view.
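To make the disagreement concrete, here's a hedged sketch of how the row-hit ratio (#CAS/#RAS) trades tCL against tRAS; the 10 ns and 50 ns figures are the rough numbers from this thread, not measurements:

    # Average access latency as a function of row-hit ratio.
    # Assumed figures from the thread: ~10 ns open-row hit (tCL-bound),
    # ~50 ns full random access (tRAS-bound).
    T_HIT_NS, T_MISS_NS = 10.0, 50.0

    for hit_ratio in (0.9, 0.5, 0.1):
        avg = hit_ratio * T_HIT_NS + (1 - hit_ratio) * T_MISS_NS
        print(hit_ratio, avg)   # 0.9 -> 14 ns, 0.5 -> 30 ns, 0.1 -> 46 ns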
All this means is that we should stop thinking of this stuff as RAM. Only the L1 cache is really RAM. Everything else is just a kind of fast, volatile, solid state disk that just happens to share an address space with the RAM.
Yes, I always considered those inflated bus speeds a scam.
It is like having a subway line running trains every 2 minutes, but with only 1 train in 15 taking passengers, the rest going completely empty.
If you look at CAS latencies [1], it's clear that memory speeds stagnated a long time ago.
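As a rough illustration (typical module timings from memory; treat the exact CL values as assumptions):

    # Absolute CAS latency barely moves across generations.
    modules = [
        ("DDR-400",   400,  3),    # (name, MT/s, CAS latency in cycles)
        ("DDR2-800",  800,  6),
        ("DDR3-1600", 1600, 11),
        ("DDR4-3200", 3200, 16),
    ]
    for name, mts, cl in modules:
        clock_mhz = mts / 2             # bus clock is half the transfer rate
        print(name, cl / clock_mhz * 1000)   # 15, 15, 13.75, 10 ns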
That's not a very accurate analogy, since sequential reads do improve. CAS latency is one limiting factor on random reads, and it has been constant or only slightly improving for a very long time; the last significant improvements happened around the SDR-to-DDR transition (i.e. 20 years ago).
This is a central reason why cache size per thread has not changed much in the last 20 years (besides, of course, fabbing/yield limits). (And because caches are subject to an effect similar to Dennard scaling.)
(btw, I wonder if it would have been smarter of the memory industry to specify latencies in absolute numbers (or relative to the memory array clock, not the bus clock) rather than in bus cycles, since the rising latency numbers have been a marketing issue in every generation.)
If CAS latency hasn't changed for 20 years, and the memory wall in general is a huge pain, does anybody know why some sort of 2.5D memory (even using low-resolution lithography and carriers cheaper than silicon) hasn't entered the market?
If heat is the problem, and since the memory wall is a 20-year problem, has there been some new cooling tech? Or is it just standard industry inertia (and being satisfied with half solutions)?
No, it's a fundamental issue with the technology. Roughly, the ratio of power density to capacity-that-can-hold-information-for-a-refresh-interval stays more or less constant, independent of fab node. In fact, newer generations (DDR3, DDR4) have been pushing it here (cf. rowhammer).
I doubt we will see a drastic improvement in DRAM latency unless the underlying process changes a lot (i.e. different physical storage mechanism or very different manufacturing).
"Analysts didn't expect DDR5 to be developed"
Are those the same analysts who state that Apple will have an event at the same time as last year?
If an improvement can be done cheaply and easily, it will be done. Implementing a memory controller for DDR5 is probably a trivial job, very similar to current DDR4.
Integrated GPUs have been on the rise for a long time. This is good news for reusing current memory controller designs while providing higher bandwidth.
JEDEC officially specs DDR4 RAM up to 2133 MHz. But manufacturers seem to have left this far behind already, with Corsair and G.Skill shipping modules at 4200 MHz. I don't see why JEDEC doesn't just add a few more rows to the table of official speeds. The only real advantage for a new spec is to bump the density.
But that was the end of 2015; we are into 2017, and I have yet to see this again on the market. And they are likely very expensive.
I really hope DDR5 bumps the spec to 512GB rather than 256GB. Unfortunately, the DDR memory price per GB is actually on the rise. The only thing I could see changing the market is when all the fabs from China go online in 2018, with China continuing to pour tens of billions into it.
I have the impression JEDEC constantly underestimates what manufacturers can actually do...
Last year I bought DDR3 RAM, 2400 MHz.
So, when DDR4 started to be sold, capped at 2133, DDR3-2400 was already fairly easy to find, and it is faster than the DDR4 too, due to smaller latency. The few times I decided to enter my computer in some OC competition that relied on RAM, it ended up in the top 1% of the air-cooled machines, despite being neither an exceptional machine nor one prepared for OC. Intel XMP runs crazy fast on my machine; the OC people I talked to all said it was because of my fast DDR3 RAM.
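The latency arithmetic behind that (CL values are assumed typical kit timings, not my actual modules):

    # First-word latency: DDR3-2400 vs. early JEDEC DDR4-2133.
    kits = [("DDR3-2400", 2400, 11),   # assumed typical XMP kit timing
            ("DDR4-2133", 2133, 15)]   # assumed JEDEC timing
    for name, mts, cl in kits:
        ns = cl / (mts / 2) * 1000     # CAS cycles -> nanoseconds
        print(name, round(ns, 1))      # ~9.2 ns vs ~14.1 ns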
Right, but it's only an "overclock" because the official spec only goes to 2133. And that means that compatibility is very hit-or-miss. It could benefit from being more standardized.
Would the equivalent amount of DDR5 actually feel any different to DDR4 to the end user? It seems like hardware is getting better but basic software (web browser etc) is getting more complicated/bloated at the same rate, so things don't actually feel 10x faster than they did a few years ago.
Computers with good SSDs and not too much bloatware (relevant on Windows) do feel faster to me than computers used to be.
I think people can get nostalgic accidentally and overestimate how fast things were in the past. I remember multi-minute loads for games on my Commodore 64. I remember multi-minute Windows 3.1 bootups. I remember watching a JPEG progressively creep in on the early web. I remember watching my school projects in C++ take noticeable time to compile despite what we would now consider their absurdly simple nature. I remember when I was reluctant to click on a 1MB download.
But you do want to avoid having a modern Windows machine on a slow spinning-rust hard drive. Yeow.
Yeah I think you're right - moving from mechanical to SSD made more difference in perceived speed than more/better RAM and faster CPU. Boot times are for sure a happy thing of the past, even opening up programs like Photoshop used to take long enough for me to alt-tab into the web browser for a bit while it started up, whereas now it's instant.
I've recently switched from 8GB MBA to 16GB MBP and to be honest, I'm not sure I've noticed any more speed above what's to be expected from any brand new machine before it gets bogged down with shit. My MBA used to be able to handle multiple big programs running in the background - 2 of the Adobe suite + Chrome + small stuff, so I haven't felt much use for the extra juice day-to-day, though I imagine the RAM is useful for video editing etc.
"Only cheapskates and the stubborn would pass on an SSD."
Even a year ago I was still going "eeehhhhhh, weeelllll, maaayybee..." but sometime in the last year I'd say we crossed over for any professional. It's no longer a matter of paying $100 for a big-enough spinning rust or $600 for a too-small SSD, now it's more like "Do I want a really fast 512GB or a slow 4TB for the same price?", which are both as of right now ~$150 give or take $30 depending on your quality needs. (I just checked.) That's still quite the spread on size, but you can fit a lot in 512GB.
The people I had in mind, the ones asking me, would not know what an SSD is (the less technically literate): the people who go to Dell, HP, etc., trying to decide on a "good" laptop.
>Would the equivalent amount of DDR5 actually feel any different to DDR4 to the end user?
Most likely not at all. Browser speed has more to do with network speed and browser/website design than your hardware.
Browsers and websites still make poor use of multiple cores, and making sites perform well is a never-ending chase between designers adding more ads, images, and visual effects; browsers trying to figure out tricks to cope with that; and developers implementing those tricks.
Is the image of a DDR5 DIMM? Because from the look of it, it has some serious power management and storage components; it looks more like an NVRAM DIMM to me.
Edit: the image name has 8GB NVDIMM in it, so yeah this isn't DDR5.
How much difference is there between GDDR and regular DDR? I think that GDDR runs at higher voltages and temps than regular DDR. I'm assuming that if GDDR5 is available, then it's kind of inevitable that regular DDR5 would be coming at some point.
I thought the main difference with graphical RAM was that the monitor output DAC could read from memory at all times, regardless of what the GPU is doing with it. And GDDRn is not the same generation as DDRn; the last GDDRs (4,5) have been based off DDR3, I think.
Graphics RAM was dual-ported (=two readers at the same time) in the early days, but hasn't been for a long time. The concurrent accesses are managed by the GPU internally now.
Because LPDDR4 wasn't available in sufficient quantities, or wasn't supported by the platform, Skylake only supported LPDDR3, and the current Kaby Lake SKUs also do not support LPDDR4.
That's due to integrating the memory controller onto the CPU: you can't switch the memory controller to use a newer spec; you're stuck with whatever the CPU supports.
Games are a special breed of application since they delegate a lot of work to the GPU and often are GPU-bound, not CPU-bound. Other workloads often end up with the CPU waiting for memory accesses. The most obvious example, besides stream-processing, would be garbage-collected languages. Simply scanning the heap will benefit from all the memory bandwidth it can get.
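A rough way to see the effect yourself (a numpy array standing in for a heap; absolute numbers vary wildly by machine, the gap between the two is the point):

    # Sequential streaming vs. random gathers over the same buffer.
    import time
    import numpy as np

    N = 20_000_000
    data = np.ones(N, dtype=np.int64)
    idx = np.random.permutation(N)     # random access order

    t0 = time.perf_counter()
    data.sum()                         # sequential, bandwidth-bound
    t1 = time.perf_counter()
    data[idx].sum()                    # random gather, latency-bound
    t2 = time.perf_counter()
    print(t1 - t0, t2 - t1)            # random is typically several x slower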
Even if speeds don't improve perceptibly because the CPU is simply waiting for the human 99% of the time, it may still benefit laptop battery life due to increased power efficiency, with the CPU completing that 1% sooner and returning to low-power states.
> Games are a special breed of application since they delegate a lot of work to the GPU and often are GPU-bound, not CPU-bound.
Yeah, and the GPU (at least a dedicated one) has its own VRAM, so accesses for which RAM speed would otherwise be important end up relying on VRAM speed, I assume.