
Why don't PDFs just render the text and store it as some form of vector image file?



PDFs come from PostScript. PostScript was invented in 1984. The Apple LaserWriter, introduced in 1985, was the biggest selling early PostScript printer. It had a 12 MHz processor, 1.5 MB of memory, and communicated at 0.225 Mbps.

To make this work effectively, printers would cache fonts. That saved on overall file size, which was important for storage and transmission. But the real driver was rendering speed. Most documents are pages of text set in a small number of sizes, drawn from a small number of distinct letters.

If you're going to print 300 lower-case "e" characters, all in 10.5 point Times New Roman, it would have been ridiculous to do the hard work of rendering the bitmap from vector each time. You render it once, cache the bitmap, and then just plop the bitmap in the right spot.

I know this because circa 1993 one client had me build a custom font that varied letterforms slightly to mimic a hand-lettered effect. They ran these weekly newspaper ads for their big wine store, and they were paying a guy to hand-draw the whole thing. They wanted to keep the casual look, but save on the cost. (And I presume the guy was kinda tired of writing the same things over and over, but I never met him.)

I learned enough PostScript to make it happen, decomposed each letter into strokes, and then drew the strokes with slightly different alignment each time. It worked fine in simple tests, but the first time I rendered a page, I thought I had broken the printer. Instead of the printer's top speed of 8 pages/min, a full page took over 15 minutes.

So as usual with "why don't they just" questions, the answer is, "because 'just' is sweeping some things under the rug". It was harder than it looked at first glance.


Fonts can have different glyphs/hinting at different sizes and you don't know how big a PDF will be rendered.


That also takes more space. Understandably, when both memory and disks were counted in kilobytes, file formats were designed differently than they are today.


Usually you store the text anyway, so you can select and search it, or have it read by accessibility tools.

And most of the time, the letter spacing is the default given by the font, so you just have to encode the position of the beginning of a run of text. So it is pretty space efficient, too.
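A rough Python sketch of why that is compact: given only the start of a run and the font's advance widths, every glyph position can be recomputed. The `ADVANCES` table and `glyph_positions` are made up for illustration (units of 1/1000 em, which is an assumption here); real widths come from the font's own metrics.

```python
# Made-up advance widths in 1/1000 em; real values come from the font's metrics.
ADVANCES = {"H": 722, "i": 278, "!": 333}


def glyph_positions(text, start_x, start_y, size_pt, advances=ADVANCES):
    """Derive each glyph's origin from the run start and default advance widths."""
    positions = []
    x = start_x
    for ch in text:
        positions.append((ch, x, start_y))
        # Move the pen right by this glyph's advance, scaled to the point size.
        x += advances[ch] * size_pt / 1000
    return positions


# Only the string, the font size, and one (x, y) pair need to be stored.
run = glyph_positions("Hi!", 72.0, 720.0, 10.0)
for ch, x, y in run:
    print(ch, round(x, 2), y)
```

So the file carries the string plus one coordinate pair per run, instead of one coordinate pair per letter.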


A font is some sort of vector image file.


Not always. Bitmap fonts are a thing.



