Because SRAM is essentially a flip-flop. It takes at least four transistors to store a single bit in SRAM; some designs use six. And current must continuously flow to keep the transistors in their state, so it's rather power hungry.
One bit of DRAM is just one transistor and one capacitor. That's a massive density improvement; all the complexity is in the row/column circuitry at the edges of the array. And it only burns power during accesses or refreshes. If you don't need to refresh very often, you can get the power very low. If the array isn't being accessed, the refresh interval can be double-digit milliseconds, perhaps triple-digit.
Which of course leads to problems like Rowhammer, where rows disturbed by accesses to adjacent rows don't get the additional refreshes they should (because this has a performance cost -- any cycle spent refreshing is a cycle not spent accessing), and you end up with the RAM reading out different bits than were put in. That is the most fundamental defect conceivable for a storage device, but the industry is too addicted to performance to tap the brakes and address correctness. Every DDR3/DDR4 chip ever manufactured is defective by design.
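To make the mechanism concrete, here is a deliberately crude toy model, not how any real chip behaves: every activation of an "aggressor" row adds one unit of disturbance to its two neighbours, a neighbour "flips" once it reaches a threshold, and a refresh resets the count. The function name, the threshold and the row counts are all made up for illustration.

```python
def hammer(aggressor_row, activations, n_rows, threshold, refresh_every=None):
    """Return the set of victim rows whose disturbance count reaches the threshold."""
    disturbance = [0] * n_rows
    flipped = set()
    for i in range(1, activations + 1):
        # Each activation disturbs the physically adjacent rows a little.
        for victim in (aggressor_row - 1, aggressor_row + 1):
            if 0 <= victim < n_rows:
                disturbance[victim] += 1
                if disturbance[victim] >= threshold:
                    flipped.add(victim)
        if refresh_every and i % refresh_every == 0:
            # A refresh recharges every row, clearing accumulated disturbance.
            disturbance = [0] * n_rows
    return flipped


THRESHOLD = 50_000  # hypothetical number of activations needed to flip a neighbour

no_extra_refresh = hammer(5, 100_000, 16, THRESHOLD)
with_refresh = hammer(5, 100_000, 16, THRESHOLD, refresh_every=10_000)
print(no_extra_refresh, with_refresh)
```

In this sketch the rows next to row 5 flip when nothing refreshes them in time, and survive when a refresh lands often enough. The cost is exactly the trade-off described above: every one of those refreshes is a cycle not spent on accesses.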
A nitpick: if the chip is manufactured in CMOS technology (as is typical), then no, current does not have to flow to keep the transistors' state (it's sufficient that a potential difference is maintained), only to change it. There is a tiny leakage current, however, which adds up over a few billion transistors.
The key point is that the refreshes do not need to happen very often. Something like once per 20 ms for each row was doable even by an explicit loop that the CPU had to periodically execute.
And this task soon moved to memory controllers, or at least got done by CPUs automatically without need for explicit coding.
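A rough back-of-the-envelope for why that was affordable. Only the 20 ms figure comes from the comment above; the row count and the per-row cycle time are assumptions picked for illustration, not taken from any datasheet.

```python
def refresh_overhead(n_rows, row_cycle_s, refresh_interval_s):
    """Fraction of time the array is busy refreshing.

    Every row must be touched once per refresh interval, and each touch
    occupies the array for one row cycle.
    """
    return n_rows * row_cycle_s / refresh_interval_s


# Hypothetical chip: 128 rows, 500 ns per row cycle, each row refreshed every 20 ms.
overhead = refresh_overhead(n_rows=128, row_cycle_s=500e-9, refresh_interval_s=20e-3)
print(f"{overhead:.2%} of the time is spent refreshing")
```

With these made-up numbers the overhead is a fraction of a percent, which is why a periodic software loop could get away with it.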
I have always had some questions about these low level details.
Back when it needed to be explicit code, what exactly was the code doing? I tried to find an example of what it might look like online, but the search results are so muddy.
DRAM has destructive reads and is arranged in pages. When you read from a page, the entire contents of the page are read into an SRAM buffer inside the memory chip, the bit(s) selected are written out to the pins, and then the entire contents of the SRAM buffer are written back into DRAM.
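A toy model of that read cycle might look like the sketch below. The class and method names are invented for illustration, and a drained cell is represented as None; real hardware does this with analog charge, not Python lists.

```python
class DramArray:
    """Toy DRAM: opening a page drains its cells; closing it writes them back."""

    def __init__(self, n_rows, n_cols):
        self.cells = [[0] * n_cols for _ in range(n_rows)]
        self.row_buffer = None

    def _open(self, row):
        # Sensing copies the whole page into the buffer and destroys the stored charge.
        self.row_buffer = list(self.cells[row])
        self.cells[row] = [None] * len(self.row_buffer)

    def _close(self, row):
        # Write-back restores the entire page, which is also what refreshes it.
        self.cells[row] = list(self.row_buffer)
        self.row_buffer = None

    def read_bit(self, row, col):
        self._open(row)
        bit = self.row_buffer[col]
        self._close(row)
        return bit

    def write_bit(self, row, col, value):
        self._open(row)
        self.row_buffer[col] = value
        self._close(row)


dram = DramArray(n_rows=4, n_cols=8)
dram.write_bit(2, 5, 1)
first = dram.read_bit(2, 5)
second = dram.read_bit(2, 5)  # still there, because every read writes the page back
```

The point of the sketch is the side effect: any access to a page rewrites the whole page, so reading one bit refreshes all of its neighbours in the same page.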
For old DRAM, usually half the bits in an address selected the page, and the other half selected the word from the page (actually, often a single bit, and this was extended to a full word by accessing multiple chips in parallel). Set your address lines so that the page address is in the low-order bits, and any linear read of length 2^(log2(DRAM chip size)/2), i.e. the square root of the chip size, is sufficient to refresh all RAM. Many early computers made use of this to do the refresh as a side effect; as an example, IIRC the Apple II was set up so that the chip updating the screen would also refresh the RAM.
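As a sanity check on that arithmetic, here is a small sketch. It assumes a square array whose page (row) address is the low-order half of the address; the 16 Kbit chip size and the start address are just examples, and the helper names are made up.

```python
def rows_for_chip(chip_bits):
    """Pages in a square array: 2^(log2(chip_bits)/2), i.e. sqrt(chip_bits)."""
    address_bits = chip_bits.bit_length() - 1  # log2 for a power of two
    return 1 << (address_bits // 2)


def rows_touched(start, length, n_rows):
    """Pages hit by a linear read when the page address is the low-order bits."""
    return {addr % n_rows for addr in range(start, start + length)}


CHIP_BITS = 16 * 1024                # hypothetical 16 Kbit chip
n_rows = rows_for_chip(CHIP_BITS)    # 128 pages

# Any run of n_rows consecutive addresses touches every page exactly once,
# regardless of where it starts.
touched = rows_touched(start=0x1234, length=n_rows, n_rows=n_rows)
all_refreshed = touched == set(range(n_rows))
print(n_rows, all_refreshed)
```

So an explicit refresh routine only had to read that many consecutive addresses (and throw the data away) within each refresh interval. Anything that already scans memory linearly, like video output, does the job for free.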
The inventor of DRAM, Robert Heath Dennard, just died a few months ago and I was reading his obit and his history.
I think the long and short of it is that DRAM is cheap. DRAM needs one transistor per data bit. Competing technologies needed far more. SRAM needed six transistors per bit for example.
Dennard figured out how to vastly cut down complexity, thus costs.
(I think I more or less know, but I’d rather talk about it than look it up this morning.)