Using hyphenation and justification on the web (alistapart.com)
44 points by duck on Sept 7, 2010 | 31 comments



This is the sort of thing that belongs in the browser itself, not specified in the HTML for each page. We should be able to write text in HTML as ASCII, and if the HTML says "justify this text," the browser should know the hyphenation points for all of the words. It should be the responsibility of the rendering engine, not the content.


I don't know if hyphenation should be automatically on just because justify is, but a separate CSS property, or a hyphenate option on the "word-wrap" property, would certainly be a more elegant solution than all of these scripts.
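What those page-level scripts effectively do is bake hyphenation points into the content by inserting soft hyphens (U+00AD, `&shy;` in HTML), which the browser shows only if it actually breaks a line there. A minimal Python sketch, assuming the break positions come from some dictionary or pattern source; here they are supplied by hand, and `insert_soft_hyphens` is a hypothetical helper name:

```python
from typing import Iterable

SOFT_HYPHEN = "\u00ad"  # U+00AD: invisible unless the line actually breaks here


def insert_soft_hyphens(word: str, breaks: Iterable[int]) -> str:
    """Return `word` with a soft hyphen at each index in `breaks`.

    Indices outside 1..len(word)-1 are ignored; duplicates are collapsed.
    """
    parts = []
    prev = 0
    for i in sorted(set(breaks)):
        if 0 < i < len(word):
            parts.append(word[prev:i])
            prev = i
    parts.append(word[prev:])
    return SOFT_HYPHEN.join(parts)


marked = insert_soft_hyphens("hyphenation", [2, 6])
print(marked.replace(SOFT_HYPHEN, "-"))  # hy-phen-ation
```

The markup then ships with every page, which is exactly the burden the comments above would rather move into the rendering engine.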


Justification implies hyphenation elsewhere. Personally, I think justified text is fugly without it.


TeX justification really needs to be ported to browsers. Are those hyphenation algorithms really that hard to implement?


> Are those hyphenation algorithms really that hard to implement?

Yes: see (for example) http://en.wikipedia.org/wiki/Hz-program .

A really good H&J algorithm is doing a significant amount of work. It has to think about things like successive hyphens (which get distracting quickly), rivers (the unpleasant vertical lines of whitespace that appear especially in over-spaced text), and so on. This involves lots of backtracking and image processing, and it fires every time the layout changes. It’s all possible to do in a browser someday, but it’s not trivial.
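To make the "successive hyphens" point concrete: the Knuth-Plass approach used by TeX scores every candidate line with demerits, so loose spacing, hyphens, and especially stacked hyphens all cost something. A simplified Python sketch; the default constants are modelled on plain TeX's `\linepenalty`, `\hyphenpenalty` and `\doublehyphendemerits`, and rivers and Hz-style glyph scaling are not modelled at all:

```python
def badness(r: float) -> float:
    """Knuth-Plass badness of a line with adjustment ratio r.

    r > 0 means the spaces are stretched, r < 0 means they are shrunk;
    shrinking past the allowed minimum (r < -1) is not permitted.
    """
    if r < -1:
        return float("inf")
    return 100 * abs(r) ** 3


def line_demerits(r: float, hyphenated: bool, prev_hyphenated: bool,
                  line_penalty: float = 10, hyphen_penalty: float = 50,
                  double_hyphen_demerits: float = 10000) -> float:
    """Cost of one line; a paragraph breaker minimises the sum over all lines."""
    d = (line_penalty + badness(r)) ** 2
    if hyphenated:
        d += hyphen_penalty ** 2            # any hyphen costs something...
        if prev_hyphenated:
            d += double_hyphen_demerits     # ...and a stack of them costs far more
    return d


print(line_demerits(0.0, False, False))  # 100.0
print(line_demerits(0.0, True, True))    # 12600.0
```

Summing these per-line costs over every possible set of breaks is where the work (and the re-run on every layout change) comes from.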


> TeX really needs to be ported to browsers

FTFY

Seriously though, browsers need better typography in general. http://nitens.org/taraborelli/latex


No, the challenge here is typically with having a high-quality hyphenation dictionary.


I don’t think implementation is the problem. Standards are. This really should be done by the browser but it’s nowhere to be seen in the standards process.

It wouldn’t be strange to have a CSS property that controls hyphenation; ‘text-overflow:ellipsis’ (which adds an ellipsis where the text overflows [+]) also changes the text at that level.

[+] http://dev.w3.org/csswg/css3-text/#text-overflow


Lack of standards hasn't stopped browsers from implementing features in the past.

True hyphenation requires:

0. A property to enable or disable hyphenation
1. Hyphenation algorithm (mostly a solved problem)
2. Hyphenation dictionaries: One per language, and need to be high quality -- OpenOffice has the best open source one, but generally these are not very common
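Item 1 really is well understood: TeX uses Frank Liang's pattern algorithm, in which short letter patterns with interleaved digits score each gap in a word, and odd scores mark legal breaks. A Python sketch; the nine patterns are hand-picked to cover one example word and are not a real dictionary (item 2, the hard part), and the `left_min`/`right_min` defaults mirror TeX's usual English settings:

```python
# A tiny, hand-picked set of Liang-style patterns; real dictionaries have thousands.
# A digit between letters scores the gap there: odd allows a break, even forbids it.
PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"]


def parse(pattern):
    """Split 'hy3ph' into letters 'hyph' and gap scores [0, 0, 3, 0, 0]."""
    letters = "".join(ch for ch in pattern if not ch.isdigit())
    scores = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            scores[pos] = int(ch)
        else:
            pos += 1
    return letters, scores


TABLE = dict(parse(p) for p in PATTERNS)


def hyphenate(word, left_min=2, right_min=3):
    """Return the indices in `word` where a hyphen may be inserted."""
    padded = "." + word.lower() + "."     # '.' marks word boundaries in patterns
    points = [0] * (len(padded) + 1)      # points[k]: score of the gap before padded[k]
    for i in range(len(padded)):
        for j in range(i + 1, len(padded) + 1):
            scores = TABLE.get(padded[i:j])
            if scores is not None:
                for k, s in enumerate(scores):
                    points[i + k] = max(points[i + k], s)
    # padded[k] is word[k - 1], so the gap before padded[k] is a break at word index k - 1
    breaks = [k - 1 for k, s in enumerate(points) if s % 2 == 1]
    return [b for b in breaks if left_min <= b <= len(word) - right_min]


print(hyphenate("hyphenation"))  # [2, 6], i.e. hy-phen-ation
```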

I'm not convinced that 0 or 1 are the blockers here.

There could be another issue: speed. Doing proper hyphenation and justification slows layout.


Look what I found: http://www.w3.org/TR/css3-text/#hyphenate (That’s the current CSS 3 Working Draft.)

It seems I was a bit too hasty with my assertion. Hyphenation is coming (at some point in the indefinite future), but the implementation details do not yet seem to be agreed upon, not even in a rudimentary way, which is probably why no browser dares to implement it.

The Working Draft links to another Draft where a possible implementation is detailed [1]. Looking around the public mailing list it seems that some have been barking up that tree for a long time.

Standards really don’t seem to be the holdup. At least some people have campaigned for the inclusion of hyphenation in the past.

I still think that implementation isn’t the problem, either. This seems to be one of those eccentric typographic details that is ignored and can safely be ignored because the web doesn’t depend on it (kind of like kerning and ligatures, which are only now slowly finding their way into browsers). It’s also not exactly flashy like rounded borders, drop shadows or gradients. That might be the reason why browser vendors are dragging their feet. They could implement it, but it’s not super-trivial, so they would rather wait and see.

[1] http://www.w3.org/TR/2007/WD-css3-gcpm-20070205/#hyphenation


A dictionary is insufficient for languages that use compound words such as German and Dutch. You also need some algorithm for taking apart such compound words.


Definitely -- but the algorithms are well-known here, right?


Read a Dutch newspaper for a while, and you will no longer make that claim. I do not know how well LaTeX works here, but I would guess that it only inserts 'safe' hyphens, using a word list. Without a word list, and without understanding the semantics of the text, there is simply no way to know whether e.g. 'verstoren' is 'ver-storen' (a verb meaning 'to disturb'), 'vers-toren' (could be a tower to store fresh stuff in or to sing from, but AFAIK is not an existing word), or 'verst-oren' (AFAIK a meaningless word).
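The ambiguity is easy to reproduce with the naive approach of splitting on known dictionary words. A sketch with a hypothetical four-word lexicon ('vers' fresh, 'toren' tower, 'verst' farthest, 'oren' ears): both decompositions it finds are lexically valid, and neither is the correct 'ver-storen', which is a prefixed verb rather than a compound at all:

```python
# Hypothetical toy lexicon; a real splitter would need a large, curated word list.
LEXICON = {"vers", "toren", "verst", "oren"}


def splits(word, lexicon, min_part=3):
    """Every way to write `word` as a concatenation of lexicon entries."""
    results = [[word]] if word in lexicon else []
    for i in range(min_part, len(word) - min_part + 1):
        head = word[:i]
        if head in lexicon:
            for rest in splits(word[i:], lexicon, min_part):
                results.append([head] + rest)
    return results


print(splits("verstoren", LEXICON))  # [['vers', 'toren'], ['verst', 'oren']]
```

Picking between the candidates needs knowledge the splitter does not have, which is the commenter's point.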


Just rip it out of LaTeX.


There are also opinions that text justifying is a "crime".

Some (http://line25.com/articles/10-usability-crimes-you-really-sh...) say that justified text is hard to read for dyslexic users.

Others (http://www.v7n.com/forums/web-usability/42975-justify-not-ju...) claim that a ragged right edge can help with "keeping place" in the text. (But then why are books justified?)

Anyway, because of that, and because justification is troublesome to implement correctly, I think adding those soft hyphens isn't worth the trouble.


It’s not so bad if you use hyphenation appropriately. Which, as the submission explains, is sort of doable on the web but has serious downsides.

What you would really want is a fancy justification algorithm like the one InDesign uses (it takes into account not just the current line but the whole paragraph and it avoids, among other things, white rivers in the text) with lots of knobs to turn.
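The "whole paragraph" idea can be shown with a much simpler cost than a real composer (this is not InDesign's actual algorithm): choose all breaks at once by dynamic programming so that the sum of squared leftover space is minimised. A Python sketch, assuming monospaced text where a word's width is its character count, that every word fits within `width`, and that the last line's slack is free:

```python
def break_paragraph(words, width):
    """Choose all line breaks at once, minimising the total squared slack.

    Assumes monospaced text (a word's width is its length) and that every word
    fits on a line by itself. The last line's slack is free, as in print.
    """
    n = len(words)
    best = [float("inf")] * (n + 1)  # best[i]: cheapest way to set words[i:]
    nxt = [n] * (n + 1)              # nxt[i]: index of the first word on the next line
    best[n] = 0
    for i in range(n - 1, -1, -1):
        length = -1                   # cancels the space counted before the first word
        for j in range(i, n):
            length += len(words[j]) + 1
            if length > width:
                break
            slack = 0 if j == n - 1 else width - length
            cost = slack ** 2 + best[j + 1]
            if cost < best[i]:
                best[i], nxt[i] = cost, j + 1
    lines, i = [], 0
    while i < n:
        lines.append(" ".join(words[i:nxt[i]]))
        i = nxt[i]
    return lines


print(break_paragraph(["aaa", "bb", "cc", "ddddd"], 6))  # ['aaa', 'bb cc', 'ddddd']
```

A greedy line-at-a-time breaker would set the same words as `aaa bb` / `cc` / `ddddd`, leaving a nearly empty middle line; looking at the whole paragraph trades a little space on the first line for evenness throughout.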

That’s currently just not possible on the web which is why justification is often not such a good idea even if you can find workarounds. But it’s certainly no crime. (All books, newspapers and magazines do it.)


I'm not crazy about hyphenated, justified text in the first place. It may look pretty, but I get briefly jolted out of the flow of the text at every truncated word.

Maybe it's a relic of printing processes that can be left behind.


A good typesetting program like TeX minimizes the occurrence of broken words. The sample page the article linked to (http://readableweb.com/ala/booklook/lanhamvolatilehard.htm) looks pretty awful to me, way too many hyphens. You don't see that many broken words in professionally typeset text.


TeX is a very good typesetting program.


One of the most irritating things about the Kindle, apart from the overpriced books and lack of second-hand sales (which would apply some pricing pressure), is its lack of hyphenation.

The text looks absolutely hideous at large font sizes. I almost think it would be better non-justified.

I don't get jolted out of the flow of text with good hyphenation. It's more distracting to me to see irregular spacing in the line. The extra spaces when the justification is working very hard feels slower-paced, and the dense lines feel higher-paced, giving this weird rhythm, something like: "sooo tthheee mmmaaannn walkedacrossthe roooaaad".


I'm much more a fan of using justified text with a minimal amount of hyphenation and word-splitting used only for really long words to prevent jarringly large spaces between words. Justification in general looks and reads fantastically well, but I agree that a lot of hyphenation and word-splitting makes it worse in both cases than a ragged right edge.
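That policy can be written down as a small rule: hyphenate only long words, and only when leaving the word whole would stretch each space by more than some limit. A sketch in which the thresholds (`min_word`, `max_extra_per_gap`) are hypothetical knobs, not values from any real typesetter:

```python
def needs_hyphen(word, extra_space, gaps, min_word=8, max_extra_per_gap=1.0):
    """Decide whether to hyphenate the word that no longer fits on a justified line.

    extra_space: width (in characters) the line would be short by if `word`
                 moved whole to the next line
    gaps:        inter-word spaces on the line that would absorb that width
    """
    if len(word) < min_word or extra_space <= 0:
        return False            # never split short words, or when nothing needs fixing
    if gaps == 0:
        return True             # a lone word cannot be justified by stretching spaces
    return extra_space / gaps > max_extra_per_gap


print(needs_hyphen("internationalization", 10, 3))  # True
print(needs_hyphen("extraordinary", 2, 4))          # False
```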


I never notice it when reading books. You do?


More so in newspapers and magazines than books, but those, too.


That's surprising to me. The only place I see un-justified text is online. Books, magazines, newspapers, even the research papers I read all have justified text with hyphenation.


How on earth did you get from your question and my response to that? You're not merely making a non sequitur, you're misrepresenting what I said.


My line of thought: You notice hyphenated words when reading. I had to ask myself, "Do I?" I don't recall noticing, but it's hard to determine that you don't notice something. So I then had to think about my reading material. Online, most text is not hyphenated - that is the genesis of the discussion, so I knew that immediately. Then I realized that the majority of my reading material is hyphenated - books, newspapers, magazines and research papers. (I even had to check some research papers to verify it.) So I concluded that I don't notice hyphenation, since I a) can't recall noticing it when reading, and b) I encounter it a lot because most of my reading material has it.

I was surprised that you did notice it, since most books are hyphenated. And now I get to ask: how is what I said a non sequitur, and how have I misrepresented what you said?


"TypeSet[1] is an implementation of the Knuth and Plass line breaking algorithm using JavaScript and the HTML5 canvas element. The goal of this implementation is to optimally set justified text in the new HTML5 canvas element, and ultimately provide a library for various line breaking algorithms in JavaScript."

[1]: http://www.bramstein.com/projects/typeset/
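For context on what TypeSet is porting: Knuth and Plass model a paragraph as boxes (words), glue (spaces that can stretch or shrink) and penalties (optional breaks such as hyphens). Each candidate line gets an adjustment ratio, which feeds the badness and demerits the algorithm minimises. A Python sketch of just that step; penalties are left out, glue at the break is assumed to be dropped already, and the widths are arbitrary units:

```python
from dataclasses import dataclass


@dataclass
class Box:
    """Fixed-width material, e.g. a word."""
    width: float


@dataclass
class Glue:
    """A space with a natural width that may stretch or shrink."""
    width: float
    stretch: float
    shrink: float


def adjustment_ratio(items, line_width):
    """How far a line's glue must stretch (r > 0) or shrink (r < 0) to fill line_width."""
    natural = sum(item.width for item in items)
    if natural < line_width:
        stretch = sum(g.stretch for g in items if isinstance(g, Glue))
        return (line_width - natural) / stretch if stretch > 0 else float("inf")
    if natural > line_width:
        shrink = sum(g.shrink for g in items if isinstance(g, Glue))
        return (line_width - natural) / shrink if shrink > 0 else float("-inf")
    return 0.0


space = Glue(5, 3, 2)
line = [Box(30), space, Box(25), space, Box(40)]  # natural width 105
print(adjustment_ratio(line, 111))  # 1.0
print(adjustment_ratio(line, 101))  # -1.0
```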


I think the deal breaker here is that search only works on FF.


I seem to remember that justification makes it harder to read much in the way all caps do. Ragged edges supposedly help you stay in the right line when reading. Does anyone have any info on this?


It is a "well-known fact" on the internet, based on my searching, but I couldn't find a definitive source on the matter, even though I'd swear I've seen one. (Perhaps I too am simply remembering the "well-known fact".) I even found several people asking for the definitive source and coming up empty.

The best justification (no pun intended) for using "ragged text" I found was under the heading "Text Alignment" on http://community.infragistics.com/ux/articles/text-treatment... .

It is also my opinion that full justification may make the text block as a whole prettier but makes the text significantly harder to read. However, soft hyphens and zero-width spaces still have a role to play on the web even so, because justified and ragged-right text both have problems with very, very long words, and on the internet multi-hundred-character words actually show up with some frequency, mostly as URLs.
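For URLs a zero-width space (U+200B) is usually the better fit, since it permits a break without showing a hyphen that could be mistaken for part of the address. A sketch that adds break opportunities after common URL punctuation and inside very long unbroken runs; the punctuation set and `max_run` are assumptions chosen for illustration:

```python
ZWSP = "\u200b"  # zero-width space: a line-break opportunity with no visible mark


def add_break_opportunities(token: str, max_run: int = 20) -> str:
    """Insert zero-width spaces after URL punctuation and inside very long runs."""
    out = []
    run = 0
    for ch in token:
        out.append(ch)
        run += 1
        if ch in "/.-_?&=" or run >= max_run:
            out.append(ZWSP)
            run = 0
    return "".join(out)


wrapped = add_break_opportunities("http://example.com/a/very/long/path")
print(wrapped.replace(ZWSP, "|"))  # http:/|/|example.|com/|a/|very/|long/|path
```

One caveat: the invisible characters are part of the text, so they can end up in copied URLs; HTML's `<wbr>` element marks the same break opportunity without adding a character.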


I've heard the "line identification hypothesis" mentioned a few times in regards to preferring ragged-right, but I'm pretty sure that's just a guess. It seems readers do find fully-justified text harder to read on the web sometimes, but I'd be willing to bet that it has a lot more to do with column width, leading and word/letter spacing. (Without appropriate hyphenation, word spacing can vary greatly in justified text, especially when lines are short-ish and words are long-ish.) I've found that a good screen-purposed serif typeface set with extra leading (1.2 em seems to be about right) and a line length of 35-40 ems works very well as justified text for "publications". Decent automated hyphenation ought to make things a lot better.
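The arithmetic behind "word spacing can vary greatly" is simple: justification spreads a line's leftover width over its inter-word gaps, so a short line holding a few long words gets a few very wide gaps. A sketch assuming monospaced text (width = character count); the sample lines are made up:

```python
def justified_gap(words, width):
    """Width of each inter-word space when `words` are justified to `width`.

    Assumes monospaced text, so a word's width is its character count.
    """
    if len(words) < 2:
        raise ValueError("need at least two words to distribute space")
    return (width - sum(len(w) for w in words)) / (len(words) - 1)


print(justified_gap(["typographical", "considerations", "matter"], 40))            # 3.5
print(justified_gap(["text", "reads", "well", "when", "spaces", "stay", "even"], 40))  # 1.5
```

Same column width, but the first line's spaces are more than twice as wide; hyphenating one of the long words is what would let more material onto that line.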



