Because a) first fit, best fit or something else is trivial to implement, whereas a good total-fit implementation that works in a lot of circumstances is not (I have re-implemented a simpler version of the algorithm in Ruby, using a hyphenation library). b) You can ask the same question: why does almost no word processor implement this algorithm? With word processors it is much more important! The answer is simple: most word processors do many things badly, so a total-fit h&j (hyphenation and justification) algorithm would make the output only slightly better, and still much inferior to good book typography.
> why does almost no word processor implement this algorithm?
In TeX the line breaking depends on the whole paragraph. I imagine it could be visually quite distracting if the line-wrapping jumped around in a WYSIWYG word processor as you added text to a paragraph.
Edit: ah. I see this is noted in the posted article’s To-do section:
> Figure out how to deal with dynamic paragraphs (i.e. paragraphs being edited) as their ratios will change during editing and thus visibly move around.
Try editing paragraphs in InDesign. In practice, it’s really not much of an issue, because paragraphs don’t actually jump around all that much, and to the extent that they do, it’s really not particularly distracting.
I tried it in Photoshop which since CS2 (or so) has an implementation of the Knuth & Plass algorithm. I found it quite distracting myself. Granted, I was on the lookout for text jumping around and had hyphenation disabled.
What about a menu (ribbon) item called "apply very nice typography to your document" which recalculates/reformats the paragraphs? Would this be a solution?
This is exactly why I prefer authoring documents in LaTeX vs WYSIWYG tools: I can concentrate on semantics and not worry about representation. Writing stuff in Word forces you to be distracted by its appearance on the page and can drive me crazy with micro-adjustments. With LaTeX, I just write the document and the computer does what computers are good at doing.
> I just write the document and the computer does what computers are good at doing.
Which is: interpret your instructions very literally, so you still need to go over your document carefully to fix problems like URLs running off the edge of the page?
Smart-arse joke aside, I agree with you that LaTeX is generally great, but it's not perfect, and sometimes it is difficult to make it do what you want.
Maybe. But there are probably all sorts of annoying details to deal with.
For example, the pagination of the document could change, and so that would lead to further editing to fix it. Then how does line-breaking work during that editing phase, or in general after applying the menu item?
I don’t actually use a WYSIWYG editor any more, so I’ll leave my speculation there.
> Or perhaps: the users don't raise their voices to have something better?
I think that probably most people don't realize there is something better, and can't even really tell the difference between typical word processor output and nice TeX-style optimal line breaks. It seems to me most people don't care that much about typography.
Alas, typography is one of those fields where most people only ever notice when you get it wrong.
A document presented with good typography might be easier to read for lengthy periods without losing concentration. There might be fewer distractions, like rivers of space running through the text, hard-to-read shapes due to poor kerning, or chunky CAPITALS and long numbers where small caps and old-style figures would not have disrupted the flow. There might be subtle visual cues to help the reader understand the material more quickly, like moving captions and headings closer to the subject material or spacing out bullet lists a little so their items are clearly separated.
But in a world where most people using word processors don't know what a stylesheet is and emphasis tends to consist of centring text, setting it in bold, capitals, and double-underlined, and then hitting enter a few times either side to space it out a bit, I think we can safely assume that most people either don't know about the subtleties of good typography or just don't care. The world might be a slightly better place if leading word processors all adopted better typographical conventions by default, and a few of us would appreciate the ability to produce better quality results, but I'm afraid it's never going to be a selling point for most people.
Word processors may be a lost cause, but we're in the middle of an e-book war (apparently). Surely someone is going to display standard EPUB books better than anyone else and steal that all-important typography snob demographic?
(b) is not correct. The time complexity of the total-fit algorithm (as described in Knuth & Plass's paper) is almost the same as that of the simple (first-/best-fit) method. They use a technique called dynamic programming to divide the problem into pieces, so that the optimal line breaks of a sub-paragraph remain valid for the optimal breaking of the whole paragraph.
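A minimal sketch of that dynamic-programming idea, in Python. This is not the published Knuth & Plass algorithm: it assumes fixed-width characters, single spaces between words, no hyphenation, and a made-up cost of squared leftover space per line (with the last line free). The name `total_fit` and the cost function are illustrative choices only.

```python
def total_fit(words, width):
    """Pick line breaks that minimise the summed squared slack per line.

    Assumptions (not from Knuth & Plass): fixed-width characters, one
    space between words, no hyphenation, last line costs nothing.
    """
    if any(len(w) > width for w in words):
        raise ValueError("every word must fit within the line width")

    n = len(words)
    # best[i] is the minimal cost of setting words[i:]; best[n] = 0.
    best = [float("inf")] * n + [0.0]
    # next_break[i] is the index where the line starting at word i ends.
    next_break = [n] * (n + 1)

    # Work backwards so best[j] is already known for every j > i.
    for i in range(n - 1, -1, -1):
        length = -1  # cancels the space counted before the first word
        for j in range(i + 1, n + 1):
            length += 1 + len(words[j - 1])
            if length > width:
                break  # longer lines starting at i cannot fit either
            cost = 0 if j == n else (width - length) ** 2
            if cost + best[j] < best[i]:
                best[i] = cost + best[j]
                next_break[i] = j

    # Follow the recorded breaks from the start of the paragraph.
    lines, i = [], 0
    while i < n:
        j = next_break[i]
        lines.append(" ".join(words[i:j]))
        i = j
    return lines


print(total_fit("aaa bb cc ddddd".split(), 6))
```

With a width of 6 this prints `['aaa', 'bb cc', 'ddddd']`. Filling each line as full as possible would instead give `aaa bb`, `cc`, `ddddd`, which leaves a much looser second line; the sketch accepts a shorter first line to avoid that.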
It is definitely slower. The computational complexity of the Knuth-Plass algorithm might be of similar (or maybe even the same) order as just fitting as much as possible on each line, but it definitely at least has a different constant out front. I’m not exactly sure what the speed difference is, but it’s non-negligible.... though on modern hardware either way is absurdly fast, and there’s no reason the Knuth method couldn’t be done in real-time even for quite large documents.
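For comparison, "fitting as much as possible on each line" can be sketched in a few lines of Python. The function name `first_fit` is illustrative, and the sketch assumes fixed-width characters and single spaces rather than real font metrics.

```python
def first_fit(words, width):
    """Greedy line breaking: put as many words as fit on each line.

    Assumes fixed-width characters and a single space between words.
    A word longer than `width` is placed on a line of its own.
    """
    lines, current, length = [], [], 0
    for word in words:
        # Line length if this word is appended (plus a space if needed).
        needed = length + len(word) + (1 if current else 0)
        if current and needed > width:
            # The word does not fit: close the line and start a new one.
            lines.append(current)
            current, length = [word], len(word)
        else:
            current.append(word)
            length = needed
    if current:
        lines.append(current)
    return [" ".join(line) for line in lines]


print(first_fit("aaa bb cc ddddd".split(), 6))
```

This prints `['aaa bb', 'cc', 'ddddd']`. Each word is visited exactly once and no earlier decision is ever revisited, which is where both the speed and the occasional ugly short line come from.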
If you don't know the constant, how can you say that it is non-negligible? I doubt that one can measure the difference in speed rendering an average webpage with either algorithm.
And your remark about real time: even the old (and slow!) TeX can do line breaking for several hundred pages of text in a second, including writing the PDF. So you are probably right about real-time h&j even for quite large documents.
Knuth-Plass takes O(kn) where n is the number of words and k the number of words per line. That’s definitely slower than the naive line-by-line method.
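To make the O(kn) figure concrete, here is a rough Python count of how many candidate lines a simplified total-fit scan has to consider, assuming fixed-width characters and single spaces. `total_fit_candidates` is a hypothetical helper; it counts candidates only and says nothing about the constant factor of the real box/glue/penalty machinery.

```python
def total_fit_candidates(word_lengths, width):
    """Count the (start, end) word ranges that fit on one line.

    Assumes fixed-width characters and one space between words.
    """
    n = len(word_lengths)
    count = 0
    for start in range(n):
        length = -1  # cancels the space counted before the first word
        for end in range(start, n):
            length += 1 + word_lengths[end]
            if length > width:
                break  # at most k words fit, so this loop runs ~k times
            count += 1
    return count


n, k = 100, 5
lengths = [1] * n          # 100 one-letter words
width = 2 * k - 1          # exactly k such words fit on a line
print(total_fit_candidates(lengths, width), "candidates vs", n, "greedy steps")
```

For 100 one-letter words with 5 words per line this reports 490 candidate lines, just under k·n = 500, against 100 steps for a greedy pass that looks at each word once.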
But the better algorithm can supposedly be improved with the method in this paper (I haven’t read it and don’t have time just now, so I’m not sure how much slower it would then be than the naive way).
I haven't really wrapped my head around the paper yet, but I think it only holds if you take out some of the elements of the original Knuth & Plass algorithm (the box, glue and penalty model). Nevertheless, linear time seems possible for a total-fit algorithm if you're willing to give up that model.
Nobody denied that it is slower in theory, but I still claim that you won't be able to measure the difference on an average web page. So time is not an excuse for not implementing this (and this was the original question).
Yeah, I never would have disagreed with that. Which is why I said “Both seem like pretty weak excuses to me.”
By “non-negligible” I meant something like 2–10 times slower (you’d have to test to be sure... I tried searching around but couldn’t find any direct speed comparisons). For typical pages it’s not going to matter, but if you have a page like the HTML5 spec, where it already takes half a second to resize a window, it’s going to be noticeably slower with a more sophisticated paragraph composer.
Keep in mind, Safari doesn’t even do kerning, because they’re afraid it’d be too slow. (Also a ludicrous excuse for desktop hardware and typical web pages, IMO.)
And the Safari guys also use the same excuse (“it’s too slow”) for not doing real color management with CSS/HTML colors.
Before we make fun of the WebKit folks for not doing the proper work because "it's too slow," remember that these are the same folks that we praise for making the fastest dang HTML rendering engine on the planet. (Or near enough.) Their culture of performance has significantly raised the standard of web browsing performance, and that's not something to take lightly.
I imagine it would have to make a fairly significant difference in rendering quality in order to make any speed losses worthwhile. Death by a thousand cuts and all that.
I would like to have two browsers of the same type, one with high(er) quality typography and one as it is now. And then compare the speed. That would be interesting.
Oh, sorry. I think maybe these are ones that have full text available if you arrive via a Google search? I don’t think I’m currently logged into anything special.
Speed is a pretty important reason, and users not caring is also a pretty important reason. Users do care about speed, and don't care about absolutely ideal line breaking; so why trade off something they do care about for something they don't?
Well, it was an important reason in 1990, maybe, or even 1995. But we’ve had a couple of orders of magnitude improvement in computing speed since then. There’s no reason to favor some marginal speed advantage (tenths or hundredths of a second) over better layout.
Tenths of a second can be pretty damn significant. Every tenth of a second extra load time leads to approximately a 1% drop in sales, according to research done by Amazon.
One of the big points of competition between browsers these days is in speed. Part of the reason people switch from IE to one of the more modern browsers is because they are just that much snappier.
Maybe. Though the only time there’s going to be a noticeable slowdown is when you’re dealing with huge amounts of text, like an amount that will take 20 minutes or an hour to read. At that point, an extra tenth of a second to render in return for a much more pleasant reading experience seems like a no-brainer trade-off.
On a typical Amazon page the difference is going to be unnoticeable.
You can just do the fast algorithm first, and do the more sophisticated one when your browser is idle. (The text will jump around a bit, but it does so anyway while loading.)
In the end, most of the big browsers are open-source, so I guess you could try submitting a patch, but I agree with most of the answers here about performance, laziness, and the fact that users don't care.