I dug into this for an HN comment months ago, but I think it's a 2-3 order-of-magnitude difference in latency. RAM is measured in nanoseconds, PCIe is measured in microseconds.
I don't think you've checked those numbers. SSD access is on the order of 10-20 microseconds (10,000-20,000 ns) and memory bus access is ~10-15 nanoseconds.
Here's the comment I made a couple months ago when I looked up the numbers:
I keep hearing that, but it's simply not true. SSDs are fast, but they're several orders of magnitude slower than RAM, which in turn is orders of magnitude slower than CPU cache.
Samsung 990 Pro 2TB has a latency of 40 μs
DDR4-2133 with CAS 15 has a latency of ~14 nanoseconds.
DDR4 latency is 0.035% of one of the fastest SSDs', or to put it another way, DDR4 is ~2,857x faster than an SSD.
L1 cache is typically accessible in 4 clock cycles; in a 4.8 GHz CPU like the i7-10700, L1 cache latency is sub-1 ns.
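The arithmetic behind those figures is easy to check. A quick sketch (using the same quoted numbers: DDR4-2133 CAS 15, a ~40 µs SSD, and a 4-cycle L1 at 4.8 GHz):

```python
# Back-of-envelope check of the latency figures quoted above.
DDR4_2133_CLOCK_HZ = 2133e6 / 2   # DDR transfers twice per clock, so the
                                  # command clock is 2133 MT/s / 2 = 1066.5 MHz
CAS_CYCLES = 15

cas_latency_ns = CAS_CYCLES / DDR4_2133_CLOCK_HZ * 1e9
ssd_latency_ns = 40_000           # ~40 us quoted for the Samsung 990 Pro

print(f"CAS latency: {cas_latency_ns:.1f} ns")                    # ~14.1 ns
print(f"SSD/DRAM ratio: {ssd_latency_ns / cas_latency_ns:.0f}x")  # ~2800x

l1_latency_ns = 4 / 4.8e9 * 1e9   # 4 cycles at 4.8 GHz
print(f"L1 latency: {l1_latency_ns:.2f} ns")                      # ~0.83 ns
```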
I have absolutely checked those numbers, and I have written PCIe hardware cores and drivers before, as well as microbenchmarking CPUs pretty extensively.
I think you're mixing up a few things: CAS latency and total access latency of DRAM are not the same, and SSDs and generic PCIe devices are not the same. Most of an SSD's latency is in its firmware and accesses to the backing flash memory, not in the PCIe protocol itself - which is why the Intel Optane SSDs were so fast. Many NICs advertise sub-microsecond round-trip times, for example, and those are PCIe devices.
Most of DRAM access latency (and a decent chunk of access latency to low-latency PCIe devices) comes from the CPU's cache coherency network, queueing in the DRAM controllers, and opening of new rows. If you're thinking only of CAS latency, you are missing the vast majority of the latency involved in DRAM operations. CAS latency is the best-case scenario: you only get it if you are hitting an open row on an idle bus with a bank that is ready to accept an access.
https://en.wikipedia.org/wiki/CAS_latency is how long it takes to read a word out of the active row. Async DRAM had "row access strobe" and "column access strobe" signals.
Synchronous DRAM (SDR and DDRx) still has a RAS wire (row address strobe) and CAS wire (column address strobe), but these are accessed synchronously - and also used to encode other commands.
DDR DRAM is organized into banks (for DDR4/5 these are grouped into bank groups), rows, and columns. A bank has a dedicated set of read/write circuits for its memory array, but rows and columns within a bank share a lot of circuitry. The read/write circuits are very slow and need to be charged up to perform an access, and each bank can only have one row open at a time. The I/O circuits are narrower than the memory array, which is what divides each row into columns.
The best-case scenario is that a bank is precharged and the relevant row is open, so you just issue a read or write command (generally a "read/write with auto-precharge" to get ready for the next command) and the time from when that command is issued to when the data bus starts the transaction is the CAS latency. If you add in the burst size (which is 4 cycles for a double-data-rate burst length of 8) plus the signal propagation RTT to the memory, you get your best-case access latency.
The worst-case scenario for DDR is then that you have just issued a command to a bank and you need to read/write a different row on that bank. To do that, you need to wait out the bank precharge time and the row activation time, and then issue your read or write command. This adds a lot of waiting where the bank is effectively idle. Because of that wait time, memory controllers are very aggressive about reordering things to minimize the number of row activations, so you may find your access waiting in a queue behind several other accesses, too.
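The best-case/worst-case gap can be sketched with rough DDR4-3200-class timings (the 22-cycle CL/tRCD/tRP values below are illustrative JEDEC-ish numbers, not from any specific datasheet, and signal propagation RTT and queueing are left out):

```python
# Rough DDR4-3200-style timings, in nanoseconds. Illustrative values only;
# check a real datasheet for exact numbers.
CLOCK_NS = 1 / 1.6        # 1600 MHz command clock (3200 MT/s) -> 0.625 ns/cycle
CL    = 22 * CLOCK_NS     # CAS latency: read command -> first data on the bus
tRCD  = 22 * CLOCK_NS     # row activate -> read/write command allowed
tRP   = 22 * CLOCK_NS     # precharge: time before a new row can be opened
BURST = 4 * CLOCK_NS      # burst length 8 at double data rate = 4 clock cycles

# Best case: bank precharged, right row already open -> just CAS + burst.
best_ns = CL + BURST
# Worst case: wrong row open -> precharge, activate the new row, then read.
worst_ns = tRP + tRCD + CL + BURST

print(f"best case  ~{best_ns:.1f} ns")   # ~16.2 ns
print(f"worst case ~{worst_ns:.1f} ns")  # ~43.8 ns
```

Even before queueing, the row-miss path is roughly 3x the row-hit path, which is why controllers reorder so aggressively.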
Also, processors generally use a hash function to map physical addresses to DRAM channels, banks, rows, and columns, and they set it up to optimize sequential access: address bits map roughly (from least to most significant) to memory channel -> bank -> column -> row. It's more complicated than that in reality, but it's not a bad way to think about it.