Why It's So Hard to Design Arabic Typefaces (wired.com)
65 points by qzervaas on Oct 26, 2015 | 62 comments



I'm Arab. And almost nothing in this article is accurate.

Arabs are far behind in IT, but in fonts and design I can say they are among the best in the world. There's a huge number of open source Arabic fonts on Arab websites.

It's not so hard to design an Arabic typeface; it's only more work. You need to draw every letter three times, since its shape depends on where the letter sits in the word (first, between two letters, or last). Most letters only need small edits for each position, but a few are totally different. Note that you write Arabic letters "glowed?" to each other.
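The position-dependent shapes can be seen in Unicode's legacy presentation forms. Below is a minimal sketch in Python using only the standard `unicodedata` module; the letter beh (U+0628) is just an illustrative choice. Unicode counts four forms (isolated, final, initial, medial) rather than three, and modern fonts pick glyphs through shaping tables instead of these code points, so treat this as a way to look at the idea rather than how fonts are built.

```python
import unicodedata

# U+FE8F..U+FE92 are the legacy presentation forms of beh (U+0628).
forms = {}
for cp in range(0xFE8F, 0xFE93):
    ch = chr(cp)
    # decomposition() returns e.g. "<initial> 0628": a position tag plus the base letter
    tag, base = unicodedata.decomposition(ch).split()
    forms[tag.strip("<>")] = (ch, base)
    print(f"U+{cp:04X} {unicodedata.name(ch)} -> {tag} U+{base}")
```

All four entries map back to the same base letter, which is why the text itself only ever needs to store U+0628.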

I think the things standing in the way of wider use of Arabic fonts are that people are used to Arial/Tahoma and that browsers don't yet support downloading fonts smoothly.

Also if all software used UTF-8 by default it would be great. I see a lot of software trying to handle Arabic with some stupid hacks.


I am also wondering whether this is a submarine PR piece for the startup mentioned in the article, TPTQ Arabic. The founder was the primary source for pretty much the entire article. If there were other people quoted (like actual Arab newspaper editors, bloggers, web designers, app devs, or even Arab readers) I would be less skeptical.


Sometimes authors try to exaggerate to get that "wow!" from the reader. It happens a lot when an author is telling you about a culture or climate you don't know about.


I am not concerned as much by the loftiness of the claims (I honestly don't have much reference to judge them by) as I am worried about the lack of sources cited for the claims.


> "glowed?"

"Glued" was the word you were looking for; "attached" or "connected" is the word you actually are looking for. It's a familiar concept to anyone who's written in cursive, which would include most of Western Europe actually (though in Latin-alphabet cursive scripts, letters don't change that much based on their position, just the connection).


We used to have the long-s (ſ) as a special variant but I think that went out of style a hundred years ago. In German it's mostly seen in old Fraktur texts, so these days it's still sometimes found in newspaper logos set in Fraktur -- incidentally, contrary to what most people seem to think, the nazis actually tried to get rid of Fraktur and Sütterlin (the handwriting style at the time), there was just a lot of old signage around during WW2, which is why it has entered popular culture as "that weird typeface the nazis used".

German still has an sz-ligature (ß), but like the Dutch ij-ligature that has actually ceased to be considered a ligature and become an actual character (which is why the ij-ligature in Dutch sometimes appears as ÿ or Ü and why you never see German sz-ligatures separated into the actual letters "s" and "z" -- in fact, the ligature is instead rendered as "ss" if it can not be printed as a ligature and actually resembles an ss-ligature in most fonts, maybe except for Berlin road signs, which also sport a tz ligature).

Oh, and, because I grew up with computers I never adopted proper cursive handwriting and to this day still write in a slurred print without connectors (which is probably why it took me ages to develop a distinct signature).


>> incidentally, contrary to what most people seem to think, the nazis actually tried to get rid of Fraktur

Wiki says they were for it before they were against it: The Fraktur typefaces were particularly heavily used during the time of Nazism, when they were initially represented as true German script, the press scolded for its frequent use of "Roman characters" under "Jewish influence" and German émigrés urged to use only "German script".[6] However, in 1941 Fraktur was banned in a Schrifterlass (edict on script) signed by Martin Bormann as so-called Schwabacher Judenlettern ("Schwabacher Jewish letters").[7]


Yeah, it's pretty hard to figure out what to ban as "too Jewish" when the criteria are so generic they can easily apply to everything the same.

Folk knowledge has it that the nazis got rid of Fraktur and Sütterlin because they figured it'd be easier to adapt to the modern script when you rule the world than to get the world to adopt your weird local script. I'm sceptical as to whether there's any truth to that claim -- although Germans are known for ruthless efficiency, the nazis had a tendency to put their ideology first.


Unicode disagrees about IJ and ij… although there are code points for them, they are "compatibility decomposable characters", which in practice means their use is discouraged in favor of the plain letter sequence. The rendering software should handle proper collation, capitalization, etc. without using the digraph glyphs (e.g. Firefox gets this right).

(A similar situation applies for English di- and trigraphs such as "ffi" and friends.)
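For anyone who wants to check the compatibility decomposition directly, here is a small sketch using Python's standard `unicodedata` module. The three sample characters are my own picks (capital and small IJ, plus the "ffi" ligature); NFKC is the normalization form that applies compatibility mappings.

```python
import unicodedata

samples = ["\u0132", "\u0133", "\ufb03"]  # Ĳ, ĳ, ﬃ

normalized = {}
for ch in samples:
    # NFKC replaces a compatibility character with its plain-letter equivalent
    normalized[ch] = unicodedata.normalize("NFKC", ch)
    print(unicodedata.name(ch), "|", unicodedata.decomposition(ch), "->", normalized[ch])

# Canonical normalization (NFC) leaves the digraph code point untouched.
unchanged = unicodedata.normalize("NFC", "\u0133")
```

So the single code points survive NFC but collapse to ordinary letters under NFKC, which is what "compatibility decomposable" amounts to in practice.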


> German still has an sz-ligature (...) that has actually ceased to be considered a ligature and become an actual character

Can you explain why this happened? As a non-german speaker it always struck me as a stupid confusion between orthography and typography, but I assume there must be some historical reason for it?


In the modern (post-1997) orthography, ß and ss have different pronunciations and ß acts like a single letter (doubled consonants make the preceding vowel short, but ß does not)


I'm not sure when exactly it happened but the sz-ligature (ß) has been treated as a single character for decades if not centuries. However, it doesn't always behave like one.

For uppercase, instead of using the actual uppercase sz-ligature glyph (ẞ, see https://en.wikipedia.org/wiki/Capital_%E1%BA%9E) Germans will actually replace the "ligature" with SS. You could argue that this makes it an obvious ligature (although it would then be mislabeled as it's obviously an ss-ligature, not an sz-ligature) but Germans also write out umlauts as the base letter followed by e (ü -> ue, ä -> ae, ö -> oe) and these wouldn't normally be considered ligatures. The sz-ligature is also often called "sharp s", further showing it's not really thought of as a ligature.
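The uppercase behavior is easy to observe in Python, whose built-in string methods follow the Unicode case mappings. A minimal sketch; the example words are just the ones that came to mind, not anything canonical.

```python
small_eszett = "\u00df"    # ß
capital_eszett = "\u1e9e"  # ẞ

upper_of_small = small_eszett.upper()      # default uppercase mapping of ß
lower_of_capital = capital_eszett.lower()  # ẞ maps back down to ß
round_trip = "Maße".upper().lower()        # information is lost on the way back

print(upper_of_small, lower_of_capital, "Maße".upper(), round_trip)
```

Note the round trip: uppercasing and lowercasing again turns "Maße" into "masse", so the mapping is not reversible, and the capital ẞ is never produced by default even though it lowercases to ß.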

According to Wikipedia the ß has been around since as early as the fourteenth century but its history seems a bit nebulous. I would think that it just used to be a common ligature before orthography became standardized in German and later became a character of its own when orthography became more standardized. Modern German orthography actually has a strict rule for picking between ß and ss: short vowels are followed by ss and long vowels are followed by ß (eliminating the ambiguity between Masse (mass) and Maße (metrics/dimensions)).

But another thing ß and the umlauts have in common is that they're not considered part of the alphabet. Children learn the same "Latin" alphabet as in English (except for the names of the letters, obviously) and if anything, Ä, Ö and Ü are added to the end on charts (but not typically in mnemonic songs or rhymes). Incidentally, ß is rarely added in those cases at all (likely because it has no accepted uppercase variant and only occurs in the middle or end of words, unlike A-Z and the umlauts).

And don't get me started on sorting. There are actually two legitimate ways to sort words in German: either you treat umlauts as vowel+e and ß as ss, OR you list ä after a, ö after o and so on. I think the former has become the norm for most intents and purposes but the other one can still be found in various places (including, I think, print phone books).
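To make the two orderings concrete, here is a rough sketch in Python. The function names `expand_key` and `base_letter_key` are made up for this example, and the second one is only one possible reading of "ä after a" (umlaut treated as its base vowel, with the umlauted spelling losing ties); real collation would normally go through a locale-aware library.

```python
# Variant 1: umlauts count as vowel + e, ß as ss.
EXPAND = str.maketrans({"ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"})
# Variant 2 (assumed reading): umlauts count as the base vowel, ß as ss.
BASE = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})

def expand_key(word):
    return word.lower().translate(EXPAND)

def base_letter_key(word):
    lowered = word.lower()
    # The second tuple element breaks ties so that "ä" lands after "a".
    return (lowered.translate(BASE), lowered)

words = ["Ofen", "Öl", "Oase", "Oder"]
by_expansion = sorted(words, key=expand_key)
by_base_letter = sorted(words, key=base_letter_key)
print(by_expansion)
print(by_base_letter)
```

With this word list, "Öl" sorts before "Ofen" under the first scheme (as "oel") and after it under the second (as "ol").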

So, yes it's confusing, but German "Eszett" is not just a ligature and German umlauts are not just vowels with diaereses (unlike e.g. ë in names like "Zoë" where the diaeresis just indicates that the "oe" should be read as two separate vowels). Nobody really knows where this entire mess started but at least we generally have come up with consistent rationalisations for how we use them.


Marginally interesting, the logotype for Stuttgarter Hoffbrau uses the Fraktur version of the capital S. When I moved to southern Germany, I'll admit it took longer than I'd like to figure out that those bottles didn't actually say "Guttgarter Hoffbrau".


I'm from Cologne. The local newspaper is called "Kölner Stadtanzeiger".

Foreigners and children are always terribly confused why the title reads "Rölner Gadt-Unzeiger". Fraktur capital letters are pretty insane.

EDIT: Also, Hofbraeu -- umlauts are "escaped" by putting an e after the base character. This only adds ambiguity (and only in some cases anyway) rather than altering the meaning altogether (Bräu -> brew, i.e. beer; "Brau" doesn't mean anything, although if used as a prefix it can mean "brewing", e.g. "Braukultur" -> "brewing culture", but "Bräukultur" would be something else and doesn't exist as a word).


RE Edit: Yeah, I'm just too lazy to care if there isn't a confusion point when I'm on a PC (I'll go find an o-umlaut for schoen if I'm feeling adventurous). Apple makes i18n support like that much more convenient, I wish Microsoft would as well. Memorizing ASCII codes for characters isn't particularly nice.


As a native Dutch speaker I would disagree with what you wrote about the 'ij'-ligature. It's definitely not considered a single letter and I don't recall ever seeing it written as 'ÿ' or 'Ü'.


I'm not sure whether it's a Belgian or Dutch thing and what the context is, but I've seen it written as something very similar to ÿ (both in handwriting and in print) and I've also seen variants of a broken Ü as uppercase ligature.

I'm not saying it's particularly widespread or widely accepted, but I've seen it in real-world use by native speakers. For all I know it was just a stylistic choice or very experimental, but the same could be said about German uppercase SZ (ẞ -- vs the lowercase ß) and that one even has its own Unicode codepoint (although most people pretend it doesn't exist).

It doesn't seem too surprising either. German umlauts for example evolved a lot throughout fairly recent history (by European standards) -- we used to have a tiny superscript "e" instead of the two dots.

EDIT: To clarify: I was apparently wrong about the uppercase version (doesn't seem to have dots) but the case for ÿ seems pretty solid.


I was definitely taught to write it as a single letter, and that seems to be fairly traditional: https://en.wikipedia.org/wiki/IJ_%28digraph%29#/media/File:L...


Those combinations are phonemes, nothing more.

Addendum: The letters 'i' and 'j' look distinct (though very close) to me in that image. At first I thought it was just weird 'kerning', but the caption seems to say it's a single glyph. That doesn't need to mean it's a single letter though; it can merely be a ligature.

Upon further reflection, I'm not sure I understand what you mean by being taught to write them as a single letter. I was only taught cursive in school and I'm not sure what difference it would make to write them as one or two letters in cursive.


I've never seen it with the dots, but you do see y and a broken-U fairly often. Examples, including y with dots, at https://en.wikipedia.org/wiki/IJ_%28digraph%29


Agreed. I may have remembered the uppercase form wrong. You're probably right that it was a "broken" U rather than a U with dots.


Check out any of Dijkstra's handwritten papers. He uses this variation when signing his own name.


I couldn't immediately find an image of that, but in cursive 'ij' would look quite similar to 'ÿ', I guess.


Ligatures. We have lots of them in old German fonts (Fraktur), too.


Ligatures are more specific to things such as œ, æ, etc.


In German you could argue uppercase letters are word-starting versions of lowercase letters.


Only if the word is a noun. Also, by the same reasoning, uppercase letters are sentence-starting in English (not to speak of capitalized names and 'I' and all that stuff).


I tried writing English-style German. It might look like this:

    Palm Ström, etwas schon an jahren,
    wird an einer straßen beuge
    und von einem kraft fahr zeuge
    über fahren.

    »Wie war« (spricht er, sich erhebend
    und entschlossen weiter lebend)
    »möglich, wie dies unglück, ja –:
    daß es über haupt geschah?

    Ist die staats kunst an zu klagen
    in bezug auf kraft fahr wagen?
    Gab die polizei vorschrift
    hier dem fahrer freie trift?

    Oder war viel mehr verboten,
    hier lebendige zu toten
    um zu wandeln, – kurz und schlicht:
    durfte hier der kutscher nicht –?«

    Ein gehüllt in feuchte tücher,
    prüft er die gesetzes bücher
    und ist also bald im klaren:
    Wagen durften dort nicht fahren!

    Und er kommt zu dem ergebnis:
    »Nur ein traum war das erlebnis.
    Weil«, so schließt er messer scharf,
    »nicht sein kann, was nicht sein darf.«


Yeah thanks. I write French that way too.


I have to disagree with you on the point that designing an Arabic font is an easy feat. Unfortunately, it isn't. For a run-of-the-mill, cookie-cutter typeface this could be true, but if you'd like to bring a certain style to the Arabic script, you face the usual constraint that Arabic letters are not as malleable and flexible as their Latin counterparts, especially if you're aiming for 100% style consistency and uniformity across the set.

This is a fixed monospaced Arabic font [0] and I'm pretty sure it took the author a whole lot of time and effort to pull this great work off.

[0]: http://makkuk.com/kawkab-mono/


There's almost no use for a fixed monospaced Arabic font in Arabic, and even so there are plenty. Most designed Arabic fonts I see today are meant to be used in other designs, like in Photoshop, and they show more freedom and coolness than a good font you would want to publish a book with or use on your blog.

I think it's the same reason Japan is still sticking with its old-school-looking websites: we are still sticking with our old-school fonts.

I also think that when there aren't many Arabs at big IT companies, you never have that Arab guy who can advise that shipping good Arabic fonts with the next OS would be good (except for Linux(s)... Arabic fonts are fine there). I also notice that the Arabic versions of popular websites normally have poor word choices.


I have no idea why this comment has downvotes--what's wrong with it? If there is actually an inaccuracy in Mimick's argument, I'm eager to hear it, as a naif in Arabic typography.


It's best to ignore early downvotes.


Downvotes should be deactivated. If someone really disagrees, they should do better than just throw zero bits at people without explanation. Downvotes only cause confusion and don't improve the conversation any.


Down-votes are a way of distinguishing between posts that actively degrade the environment versus those that are just passive noise (like redundant comments.) Fanatical behavior thrives without them.


To better understand what actual Arabic looks like, take a look at this phrase, which is also available as a single Unicode code point, next to the same thing written using regular characters:

    ﷽ vs بِسْمِ ٱللّٰهِ ٱلرَّحْمٰنِ ٱلرَّحِيمِ

In my understanding, before computers even when people were writing Kufi they never used the same shape for a letter across their writings.
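The single-code-point claim above can be checked with a few lines of Python and the standard `unicodedata` module. This is only a sketch: the spelled-out string is copied from the comment, and the exact counts depend on which diacritics were typed, so I only compare sizes rather than quote numbers.

```python
import unicodedata

ligature = "\ufdfd"
spelled_out = "بِسْمِ ٱللّٰهِ ٱلرَّحْمٰنِ ٱلرَّحِيمِ"

# Combining marks (vowel signs, shadda, ...) have a non-zero combining class.
marks = sum(1 for ch in spelled_out if unicodedata.combining(ch))

print(f"U+{ord(ligature):04X}", unicodedata.name(ligature))
print("code points:", len(ligature), "vs", len(spelled_out))
print("combining marks in the spelled-out form:", marks)
```

A font has to draw the first as one enormous glyph, while the second relies on the shaping engine to join letters and stack the marks.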


For the curious, https://en.m.wikipedia.org/wiki/Bismillah_ar-Rahman,_ar-Rahe...

If people want to learn to read Arabic a bit, at just the signboard level, to get by if you happen to be stranded in the Middle East or something, grouping the characters into similar patterns can make it a lot easier: http://gituser1357.github.io/arabic-alphabet.html


Does Arabic have a similar enough set of phonemes to English to be able to write English using the Arabic script? Not that it's a particularly useful thing to do in and of itself, but the script itself is beautiful, and I'd be interested to play with it.


You wouldn't be able to get some very basic English sounds with only the Arabic letters used to write Arabic. (For example, /v/, /p/, and the distinction between /dʒ/ and /g/.) But the Arabic script has been used to write a whole lot of languages (some of them completely unrelated to Arabic)

https://en.wikipedia.org/wiki/Arabic_script#Languages_curren...

and sometimes by adding additional letters or reinterpreting letters to represent phonemes that Arabic doesn't have

https://en.wikipedia.org/wiki/Arabic_script#Special_letters

It's much like the way the Hebrew alphabet was modified to write Yiddish (a Germanic language), including the addition of new letters

https://en.wikipedia.org/wiki/Yiddish_orthography#The_Yiddis...

One example in the Arabic script is the letter pe (پ), which is very basic in Persian (an Indo-European language written with the Arabic script) but doesn't occur in the original Arabic alphabet.

https://en.wikipedia.org/wiki/Pe_%28Persian_letter%29

https://en.wikipedia.org/wiki/Persian_alphabet#Changes_from_...


Theoretically, Arabic no but Persian yes. I can say with confidence that Persian has phonemes such as /v/ and /p/, and a clear distinction between /j/ and /g/. You might find it funny that Standard Arabic doesn't have a /g/ phoneme, while in other varieties such as Egyptian, Yemeni, and Omani it features prominently. When people want to denote it, Levantine speakers especially usually resort to writing it as غ, as in «Google غوغل». That letter is the equivalent of the Parisian /r/, so in my opinion this is still wrong, very wrong actually.

It's safer for you to go with Persian rather than Arabic if the existence of these equivalent phonemes is a priority for you.


It worked for Afrikaans[0], so I suppose it would also work for English.

[0]https://en.wikipedia.org/wiki/Arabic_Afrikaans


Here's an example of using Arabic script to write a European language:

https://en.wikipedia.org/wiki/Arebica


English script doesn't, and we use it for English language, so why not?


Yes, you can. You'll have to use an extended version of the alphabet (the characters that have 3 dots on them) and you'll also be kinda forced to use diacritics.


Is that long letter what they call a kashida?


Yes.


It is also a problem of software support for RTL. For example, I use Linux systems a lot and love console interfaces. Sadly, only a few terminals have good support for RTL + BiDi (bidirectional text) + Arabic script: mlterm and konsole. Even the GNOME terminal has problems with RTL + BiDi. [1] And those are only a small part of all the terminals in the *nix world. Compare, for example, with this list [2], which I made for tracking support of True Color (24-bit color). I wish all of these terminals (the ones that are actively maintained) would support RTL languages in general, and Arabic in particular.

[1] https://bugs.launchpad.net/ubuntu/+source/vte/+bug/263822

[2] https://gist.github.com/XVilka/8346728
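As a rough illustration of what BiDi support starts from: every character carries a bidirectional class, and a renderer has to reorder runs based on those classes. The sketch below only reads the classes with Python's standard `unicodedata` module. The helper names `bidi_classes` and `has_rtl` are invented here, and this is nowhere near the full Unicode Bidirectional Algorithm that a terminal would need.

```python
import unicodedata

# Strong right-to-left classes: "R" (e.g. Hebrew) and "AL" (Arabic letters).
RTL_CLASSES = {"R", "AL"}

def bidi_classes(text):
    """Return the bidirectional class of each character in text."""
    return [unicodedata.bidirectional(ch) for ch in text]

def has_rtl(text):
    """True if any character is strongly right-to-left."""
    return any(cls in RTL_CLASSES for cls in bidi_classes(text))

mixed = "a\u0628\u05d01 "  # Latin a, Arabic beh, Hebrew alef, digit, space
print(bidi_classes(mixed))
print(has_rtl("ls -l"), has_rtl("echo \u0645\u0631\u062d\u0628\u0627"))
```

A terminal that ignores these classes simply lays out cells left to right, which is the failure mode described above.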


So, if I'm not mistaken (my wife's family is of Lebanese descent, so I did a short course on Arabic years ago; I also had to learn a little when I was working on LibreOffice), the difficulties with Arabic are:

- Each character can have a final, medial or initial form; it all depends on where the character is located in the word

- it's a cursive script, so letters join together...

- it uses ligatures in (what appears to me at least) complex ways

- kashida elongates letters, but there are typographic and aesthetic rules for when and how it should be used.

- there are complexities galore around diacritics [1] and text justification [2]

1. http://arxiv.org/pdf/1107.4734.pdf

2. http://www.tug.org/tugboat/tb27-2/tb87benatia.pdf


I love square kufi even though I can't read it, I assume it's impractical for a typeface? http://www.sakkal.com/instrctn/sq_kufi_hashem.html


I often think about the strange coincidence that English is the easiest language to represent in a computer. Only 26 characters, with no accents, all independently placed in series. The only awkwardness is the existence of both capital and lowercase. (How much simpler would things be if case-sensitivity wasn't a consideration?) But at least capitalization is one-to-one in English, unlike German's "ss" rule. All this makes it easy to make a keyboard for English typing, too. Every other language and script I can think of is more difficult to handle in software. Perhaps Cyrillic is similarly computer-compatible, but I don't know enough about it to be sure.

It is just strangely convenient!


So are fonts for Japanese and Chinese.

Font designers have to literally author each letter and there are thousands of them...

Fortunately, there is enough interest in Japanese fonts that there is a variety of them. The problem is that many of them are fairly expensive to license, perhaps because of the work involved in making them. (Thus, most people use the ones that come with the OS; in the case of OS X, premium fonts are bundled. The Japanese government also distributes free fonts. [0]) I don't know the situation for Chinese, but it is probably similar.

[0]: http://ipafont.ipa.go.jp/


I know nothing about font design; as far as I thought, I assumed that the computer read some given vector data from a font file, and then displayed them, with maybe a bit of logic to take care of things like kerning.

How does modern software render fonts that have letters that change drastically based on their position, like Arabic?



Have a read of http://behdad.org/text/ (at least for Unix based systems).
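To give a feel for the "shaping" step, here is a toy sketch in Python. It is not how real engines work: they read substitution tables from the font and handle ligatures, marks and much more. This version only picks one of the legacy Unicode presentation forms per letter, using joining behavior inferred from which forms exist. The names `FORMS`, `joins_forward`, `joins_backward` and `shape` are all made up for the example.

```python
import unicodedata

TAGS = ("<isolated>", "<final>", "<initial>", "<medial>")

def build_forms():
    """Map base letter -> {position: presentation-form char} from Unicode data."""
    table = {}
    for cp in range(0xFE70, 0xFF00):  # Arabic Presentation Forms-B
        parts = unicodedata.decomposition(chr(cp)).split()
        if len(parts) == 2 and parts[0] in TAGS:  # skip multi-letter ligatures
            base = chr(int(parts[1], 16))
            table.setdefault(base, {})[parts[0].strip("<>")] = chr(cp)
    return table

FORMS = build_forms()

def joins_forward(ch):
    # A letter with an initial form can connect to the letter after it.
    return "initial" in FORMS.get(ch, {})

def joins_backward(ch):
    # A letter with a final form can connect to the letter before it.
    return "final" in FORMS.get(ch, {})

def shape(word):
    out = []
    for i, ch in enumerate(word):
        forms = FORMS.get(ch)
        if not forms:
            out.append(ch)  # not an Arabic letter we know about
            continue
        prev_ok = i > 0 and joins_forward(word[i - 1]) and joins_backward(ch)
        next_ok = i + 1 < len(word) and joins_forward(ch) and joins_backward(word[i + 1])
        if prev_ok and next_ok:
            pos = "medial"
        elif prev_ok:
            pos = "final"
        elif next_ok:
            pos = "initial"
        else:
            pos = "isolated"
        out.append(forms.get(pos, ch))
    return "".join(out)

shaped = shape("\u0643\u062a\u0627\u0628")  # كتاب "book"
print([f"U+{ord(c):04X}" for c in shaped])
```

In the example word, kaf comes out initial, teh medial, alef final, and the last beh isolated, because alef never connects to the letter that follows it.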


another plug for the Noto typeface - one of the few with Naskh Arabic and Urdu/Persian Nastaliq typefaces (https://github.com/googlei18n/noto-fonts/issues/39)


>Typeface design has a western normativity problem

Why does it seem like everything has a western normativity problem? What things have an eastern normativity problem?


It's a silly statement. Type, composed of little chunks of metal placed adjacent to each other, has a problem with scripts that aren't practically decomposable into little chunks placed adjacent to each other. But that's equally true of ‘western’ cursives and equally false of ‘eastern’ non-cursives, including Semitic scripts like Hebrew and Ethiopic, as well as South Arabian¹, which would have been perfectly suited to type but was displaced by the cursive Arabic script.

¹ https://en.wikipedia.org/wiki/South_Arabian_alphabet


uhm, you do realize South Arabian alphabet was last in active use more than a thousand years ago. It now bears no resemblance to modern Arabic type. At all.

I'm reading your argument as suggesting that over a quarter billion native Arabic speakers should have learned a new script? Not sure if that was sarcastic. Either way, if you are making this argument, it would be easier for these speakers to learn English instead.


The point is that the practicality of movable type for a script has nothing to do with whether or not the script is ‘western’.


Have you considered the trajectory of history over the last 500 years or so?


You'll have to enlighten me.


not the gp, but I think the parent commenter means that most technological advances of the last century took place in the West and thus suffer from Euro/Western-centrism.

A thousand years ago, when the Islamic caliphate was a center of science and research, Farsi and Arabic were the major languages of scientific publications and inquiry, and there was likely Eastern-centrism in many of its aspects.

Software, programming languages, web design, typography, etc. were all developed for the West, and internationalization was not a major concern. Consider, for example, the timetable for the development and standardization of ASCII vs UTF-8, as well as trends in the use of UTF-8 on the web (vs. ASCII) and browser support for UTF-8. Western-centrism is not the result of malice, but it still happens.

Take Medium, for example, which launched a few years ago without proper support for RTL languages and still doesn't do a good job at that. This is a classic trade-off you make when building an "MVP". The trouble is that there is an entire ecosystem where reasonable software-engineering tradeoffs lead to one culture or set of cultures being prioritized over others. Short of people of RTL cultures creating their own Twitter, Medium, etc., there will always be a gap.

Of course, Arab developers, like many others, do try hard to bridge the gap. Edraak is a fork of the edX MOOC platform that added proper UTF-8 and RTL support to edX in the last year or so.



