This leads to an obvious question: "why did the programmers not always just embe...

Theodores · on Sept 16, 2017

So here is how it happens...

Graphic designer designs cheque. For the design to be signed off he/she includes 'lorem ipsum' placeholder text for the special numbers at the bottom.

Design gets signed off, a template is made for the programmers to use.

In the code a third party library is used to make it 'easy' to create a PDF. This process consists of opening the template, adding a line of text to it and printing it as a PDF to another file, ready for the printer.

A little while later the graphic designer edits the template to make a few amends. The file is re-saved, and this fresh copy no longer contains any fonts that are not used in the document. With the placeholder text long gone, its font is not saved. The other fonts for the cheque are still in use, so they moved across and were catered for by the software in the updated template.

The software runs exactly as before, just the template file has been updated. However now the font is not found unless installed or cached on the computer or printer.

The programmer never had to embed the font; his/her third party library abstracted that requirement away. The programmer had worked with the library before and knew that it was best to use Helvetica, because PDF treats it as a built-in font and so the document does not need to be bloated with the default fonts. Any other font would add megabytes to the document. So there was probably no oversight by the programmer.

However there may have been a micro-manager that was 'responsible' for micro-managing the update to the template. This probably involved meetings and conference calls and deadlines for 'the project' on the whiteboard. Not wishing to overstretch programmer resources, the micro-manager took it on himself to make sure the programmer was not 'interrupted' so he got another lackey to upload the new cheque template. This all worked fine initially.

Had there not been a micro-manager, the graphic designer would have had to work directly with the programmer, without the micro-manager or his lackey in between. The programmer would have picked up on the smaller file size, as this would be a noticeable change. Instinctively the programmer would have made a test run and, not having the fonts on his/her dev box, would have detected the problem right away. Instead, the lackey, with no knowledge of things like version control, just uploaded the template as told, blindly unaware of the requirement to 'check your work'.

Why didn't the micro-manager check whether the font was still embedded? That is what I want to know.

ianai · on Sept 16, 2017

That one hit way too close to home. I'm the lackey in that story 40 hours a week.

MBCook · on Sept 15, 2017

Maybe because PDF has been around a long time and it used to be that embedding even a basic font would have been a MASSIVE increase in file size?

tedunangst · on Sept 16, 2017

While we're on the subject, PDF software typically supports embedding just a subset of the font, containing only the characters used in the document, to save space.
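
As a rough illustration of the subsetting idea, the sketch below works out which distinct characters a piece of text actually uses, which is the set a subsetter would need to keep glyphs for. The function name `glyphs_needed` and the sample cheque line are made up for this example; they do not come from any PDF library, and real subsetters also have to deal with ligatures, composite glyphs and the font's internal tables.

```python
def glyphs_needed(text: str) -> list[str]:
    """Return the distinct non-whitespace characters used in text, sorted."""
    return sorted({ch for ch in text if not ch.isspace()})


# Hypothetical line of text printed on a cheque.
cheque_line = "PAY 1234.56 TO A. N. OTHER"
subset = glyphs_needed(cheque_line)

print(len(cheque_line), "characters on the line")
print(len(subset), "distinct glyphs to embed:", "".join(subset))
```

A full font carries hundreds or thousands of glyphs, so keeping only the handful that appear is where the space saving comes from.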

MBCook · on Sept 16, 2017

Really? Neat.

KekDemaga · on Sept 15, 2017

Why don't PDFs just render the text and store it as some form of vector image file?

wpietri · on Sept 16, 2017

PDFs come from PostScript. PostScript was invented in 1984. The Apple LaserWriter, introduced in 1985, was the biggest selling early PostScript printer. It had a 12 MHz processor, 1.5 MB of memory, and communicated at 0.225 Mbps.

To make this work effectively, printers would cache fonts. That saved on overall file size, which was important for storage and transmission. But the real driver was rendering speed. Most documents are pages of text at a small number of sizes and there are a small number of letters.

If you're going to print 300 lower-case "e" characters, all in 10.5 point Times New Roman, it would have been ridiculous to do the hard work of rendering the bitmap from vector each time. You render it once, cache the bitmap, and then just plop the bitmap in the right spot.

I know this because circa 1993 one client had me build a custom font that varied letterforms slightly to mimic a hand-lettered effect. They ran these weekly newspaper ads for their big wine store, and they were paying a guy to hand-draw the whole thing. They wanted to keep the casual look, but save on the cost. (And I presume the guy was kinda tired of writing the same things over and over, but I never met him.)

I learned enough PostScript to make it happen, decomposed each letter into strokes, and then drew the strokes with slightly different alignment each time. It worked fine in simple tests, but the first time I rendered a page, I thought I had broken the printer. Instead of the printer's top speed of 8 pages/min, a full page took over 15 minutes.

So as usual with "why don't they just" questions, the answer is, "because 'just' is sweeping some things under the rug". It was harder than it looked at first glance.
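
The cache behaviour described above can be modelled with a toy sketch. This is only an illustration under stated assumptions: `rasterize`, `cached_glyph` and `draw_page` are hypothetical names, the "bitmap" is a placeholder tuple, and nothing here reflects how a real PostScript interpreter is built. It just counts how often the expensive step runs when every "e" is identical versus when every occurrence is a slightly different letterform.

```python
from functools import lru_cache

rasterize_calls = 0  # how many times the expensive step has run


def rasterize(char: str, size_pt: float, variant: int) -> tuple:
    """Stand-in for the slow outline-to-bitmap step; it only counts calls."""
    global rasterize_calls
    rasterize_calls += 1
    return (char, size_pt, variant)  # placeholder for a real bitmap


@lru_cache(maxsize=None)
def cached_glyph(char: str, size_pt: float, variant: int) -> tuple:
    """Rasterize each distinct (char, size, variant) combination only once."""
    return rasterize(char, size_pt, variant)


def draw_page(text: str, size_pt: float, jitter: bool) -> int:
    """Draw text and return how many rasterizations the page cost."""
    global rasterize_calls
    cached_glyph.cache_clear()
    rasterize_calls = 0
    for index, char in enumerate(text):
        # With jitter, every occurrence is a unique letterform,
        # so no cached bitmap can ever be reused.
        variant = index + 1 if jitter else 0
        cached_glyph(char, size_pt, variant)
    return rasterize_calls


page = "e" * 300
print("plain font:   ", draw_page(page, 10.5, jitter=False), "rasterizations")
print("jittered font:", draw_page(page, 10.5, jitter=True), "rasterizations")
```

With identical glyphs the expensive step runs once per distinct letter; with per-occurrence variation it runs once per character on the page, which is the kind of blow-up that turns a page into a very long wait.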

MBCook · on Sept 15, 2017

Fonts can have different glyphs/hinting at different sizes and you don't know how big a PDF will be rendered.

cbhl · on Sept 15, 2017

That also takes more space. Understandably, when both memory and disks were counted in kilobytes, file formats were designed differently than they are today.

captainmuon · on Sept 16, 2017

Usually you store the text anyway, so you can select and search it, or have it read by accessibility tools.

And most of the time, the letter spacing is the default given by the font, so you just have to encode the position of the beginning of a run of text. So it is pretty space efficient, too.
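
A minimal sketch of why one position per run is enough, assuming the font supplies an advance width for each character. The `glyph_positions` function and the widths in `advances` are invented for illustration and ignore kerning and any explicit spacing adjustments.

```python
def glyph_positions(run_start_x: float, text: str, advances: dict[str, float]) -> list[float]:
    """Derive each glyph's x position from the run start and the font's advance widths."""
    positions = []
    x = run_start_x
    for ch in text:
        positions.append(x)
        x += advances[ch]  # the font, not the document, supplies the spacing
    return positions


# Made-up advance widths in points; a real font provides these.
advances = {"c": 5.0, "a": 5.5, "t": 3.0}

# The document only stores the start position (72.0) and the string "cat".
print(glyph_positions(72.0, "cat", advances))
```

The document stores one number and a string; everything else is recomputed from the font at render time.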

kps · on Sept 15, 2017

A font is some sort of vector image file.

yellowapple · on Sept 16, 2017

Not always. Bitmap fonts are a thing.

zaxomi · on Sept 16, 2017

> This leads to an obvious question: "why did the programmers not always just embed the font?".

Historically the size of hard disks was very limited, so it was important to save a file with as few bytes as possible.

Also the serial transfer speed to the printer was very low, so it was important to limit how much data was sent for each print job.
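
For a sense of scale, here is a back-of-the-envelope calculation using the 0.225 Mbps LaserWriter link speed quoted elsewhere in this thread. The payload sizes are illustrative assumptions rather than measurements, `transfer_seconds` is a name made up for this sketch, and it ignores protocol overhead and treats 1 Mbps as 1,000,000 bits per second.

```python
def transfer_seconds(size_bytes: int, link_mbps: float) -> float:
    """Ideal time to push size_bytes over a link, ignoring protocol overhead."""
    bits = size_bytes * 8
    return bits / (link_mbps * 1_000_000)


LINK_MBPS = 0.225  # figure quoted in the thread for the LaserWriter

# Assumed payload sizes, purely for illustration.
payloads = [
    ("10 KB text-only job", 10_000),
    ("1 MB of embedded font data", 1_000_000),
]

for label, size in payloads:
    print(f"{label}: {transfer_seconds(size, LINK_MBPS):.1f} s")
```

Under these assumptions, font data on the order of a megabyte adds roughly half a minute of transfer time to every job before any rendering starts.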