Your understanding's right: PDFs that are text + fonts are easy and fast. I'm concerned about the other kind: scanned pages. Any sheet music from Petrucci / imslp.org, for example. That kind is a sequence of raster images, stored in compressed-image formats that most people aren't familiar with because they're specialized for bi-level (1-bit, black and white) images, a separate class from photo-type images. The big two seem to be JBIG2 [0] and CCITT Group 4 [1], which was standardized for fax machines in the 1980s (and still works well!).
Personally, I have the impression that CCITT Group 4-compressed PDFs are displayed very quickly, unless they are scanned at 3000 DPI... Can't say the same for JBIG2 or JPEG/JPEG 2000-based ones.
I'd assume that the photo-type image decoder is optimized, right? If so, how does the optimized photo-type decoder compare to the apparently unoptimizable JBIG2 decoder?
I'm not knowledgeable enough to speak to that, but just to clarify: the low-hanging fruit in libpng I mentioned is in simple, vectorizable loops, i.e. conversions between pixel formats in buffers. Not in its compression algorithm (which isn't part of libpng; it calls out to zlib for that).
[0] https://en.wikipedia.org/wiki/JBIG2
[1] https://en.wikipedia.org/wiki/Fax#Modified_Modified_READ
(You can examine this stuff with pdfimages(1), or just rg -a for strings like /JBIG2Decode or /CCITTFaxDecode and poke around.)
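If you'd rather script the "grep for filter names" approach, here is a minimal Python sketch. It just counts filter-name tokens in the raw bytes of the file, so it's a rough heuristic, not a PDF parser. The function name count_image_filters and the list of filters are my own choices for illustration; it assumes the names appear literally (no #-escaped characters, no abbreviated inline-image filter names), and it will also count a filter name that happens to appear somewhere other than an image dictionary.

```python
import re
from collections import Counter

# Filter names the PDF format uses for the image codecs discussed here:
# JBIG2 and CCITT fax (bi-level), DCT (JPEG), and JPX (JPEG 2000).
FILTERS = (b"JBIG2Decode", b"CCITTFaxDecode", b"DCTDecode", b"JPXDecode")

# Match "/Name" followed by a word boundary, so "/DCTDecodeFoo" is not counted.
_PATTERN = re.compile(rb"/(" + b"|".join(FILTERS) + rb")\b")


def count_image_filters(data: bytes) -> Counter:
    """Count occurrences of each known filter name in raw PDF bytes.

    This is a byte-level scan (like `rg -a`), not a real parse of the file.
    """
    return Counter(m.group(1).decode("ascii") for m in _PATTERN.finditer(data))
```

To try it on a file (scan.pdf is a placeholder name), something like count_image_filters(open("scan.pdf", "rb").read()) should give a quick picture of which codec the scanned pages use.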