This is done with OCR, especially when digitizing printed material.
The Internet Archive does this a lot. There was an old 90s book that I needed, but no electronic edition was ever made. So the Internet Archive scanned and digitized it with character recognition to make a selectable PDF like this, as well as an EPUB that can be read on your phone like any other ebook.
Now that book is available for anyone to borrow and read. (It's still copyrighted, so you gotta access it via their DRM-controlled app/website, but that can be easily broken and it's better than not having access to that book at all.)
He means that Acrobat Pro includes an OCR system that you can use to add a searchable text layer to scanned documents. Readers like Acrobat Reader and PDF.js do not perform OCR. You won't be able to use them to search scanned documents if the document creator did not run OCR.
Google runs its own OCR pass on scanned PDF documents in order to index them better. It can be annoying when you get a 50-page scanned document as a search result and then find out that it doesn't include a text layer, so you need to run your own OCR or skim the whole thing to find the relevant parts.
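If you want to check up front whether a scanned PDF has a text layer, something like the rough sketch below could work. It assumes the third-party pypdf package is installed, and the function names, the `min_chars` threshold, and the `scan.pdf` filename are all made up for illustration. It only tells you which pages have little or no extractable text; it does not run OCR itself.

```python
from typing import Iterable, List


def pages_without_text(page_texts: Iterable[str], min_chars: int = 20) -> List[int]:
    """Return 0-based indexes of pages with too little extractable text.

    min_chars is an arbitrary threshold (an assumption, tune it yourself):
    pages with fewer non-whitespace-trimmed characters are treated as
    image-only and probably in need of OCR.
    """
    return [
        index
        for index, text in enumerate(page_texts)
        if len((text or "").strip()) < min_chars
    ]


def extract_page_texts(pdf_path: str) -> List[str]:
    """Extract the text of each page using pypdf (pip install pypdf)."""
    # Imported lazily so pages_without_text() stays usable without pypdf.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    return [page.extract_text() or "" for page in reader.pages]


if __name__ == "__main__":
    # "scan.pdf" is a hypothetical file name.
    missing = pages_without_text(extract_page_texts("scan.pdf"))
    print("Pages that probably need OCR:", missing)
```

An empty list means every page already has some text you can search. A list covering every page means the document is almost certainly image-only and needs an OCR pass first.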
What PDF.js is showing is an invisible text layer overlaid on top of the original image. It does not do OCR, which can take 1-2 seconds per page; that would be too slow, and it would require a large-ish neural net if you care about accuracy.