Who reads digitised Malay manuscripts? (blogs.bl.uk)
56 points by diodorus on Oct 20, 2021 | 28 comments



I made a police report in Thailand many years ago at the local police office.

The lady at the desk insisted that, rather than writing my home address down, I pronounce it so that she could transcribe it into Thai script.

I always wondered what they would use my home address in Thai letters for.

I am sure the Swedish postal service would have a hard time if they ever decided to write me a follow-up letter.

I also wonder what it would look like if they realized that it needed to be in the Latin alphabet and had to transliterate it back before sending.


The Swedish Post Office used to have, at least, a department for unreadable addresses that took great pride in finding the correct recipient. I have sent a letter addressed to "the red house with blue door, if you take the first road right from the ferry stop to this island in the Stockholm Archipelago", and received a letter with the address as a rebus.

It might have changed now, but many years ago I guess the letter addressed to you in phonetic Thai would have been delivered!


I once received a letter addressed to: Dotan, Sarid, Israel

That letter crossed the Atlantic Ocean and the European continent, and found my one-room lodging slightly over 10,000 km away. I like to think that this is some kind of record for character count versus distance in modern times.

Though this might be the most impressive feat of deciphering a letter, translating handwritten KOI8-R displayed as Latin-1: http://mottr.am/2012/09/18/letter-to-russia/
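
For anyone curious how that kind of garbling arises and is undone, here is a minimal Python sketch. It assumes the text was encoded as KOI8-R and then displayed as if it were Latin-1; the sample word and the function names `garble` and `repair` are illustrative and not taken from the linked post.

```python
def garble(text: str) -> str:
    """Simulate the mistake: KOI8-R bytes displayed as if they were Latin-1."""
    return text.encode("koi8_r").decode("latin-1")


def repair(mojibake: str) -> str:
    """Undo it: recover the original bytes, then decode them as KOI8-R."""
    return mojibake.encode("latin-1").decode("koi8_r")


original = "Москва"  # illustrative sample, not from the linked letter
broken = garble(original)
print(broken)          # accented Latin letters instead of Cyrillic
print(repair(broken))  # the original Cyrillic text again
```

The repair works because Latin-1 maps every byte to exactly one character, so no information is lost when the wrong decoding is applied.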


Some years ago while living & working in South Asia, I tried this with a letter to my family back in England, writing the address in Devanagari script. (Another time, I also tried it with Urdu.) The UK post office managed to decipher and deliver them just fine.


That was probably many years ago.

One of the doubtful benefits of tax-financed government monopolies.


This comment got a -1 rating.

To clarify I am not a die-hard neo-liberal (in the original meaning). I am Swedish after all.

It's just that the team whose hobby was playing mail detective could instead have been employed by the government as nurses or caregivers.

Or they could even have improved the package delivery services in the postal service (very much needed).


The reason for these departments is to maintain the secrecy of correspondence ("brevhemlighet"). This department is the only one within the postal service that has the right to open mail and try to work out who the recipient is. Rebuses and other shenanigans are not the majority of its letters; most have unreadable handwriting, or are wet or otherwise damaged, so the recipient cannot reliably be established.


Opening mail to preserve secrecy of correspondence sounds rather counter-intuitive.

Why not just shred them?


This is an interesting question in general. Many languages have undergone a change in writing method (generally towards Romanization): Turkish, like Malay, has gone from an Arabic-inspired to Latin script, Vietnamese from Chinese characters to Latin script, and various ex-Soviet languages like Moldavian and Azerbaijani have moved from Cyrillic to Latin script. What happens to historically important works written in the old style? Are they republished with the new alphabet?


What is mentioned on the website is basically the Jawi script.

I am Singaporean and married my wife, who is based in Kelantan, Malaysia, and I had a culture shock when I saw Jawi writing on every shop here in Kelantan.

Shop signboards in Jawi are also common in Terengganu. Even my marriage certificate is in Jawi script. For most religious education, Jawi is still a script students must master before they can become Islamic scholars. I think the same applies in Indonesia.

The Jawi script in Malaysia is preserved because, by decree of the King/Sultans, it is an official script used for education and official matters.

Writing a keyboard layout for Jawi script is relatively easy when you map Latin letters to the same-sounding Jawi (Arabic alphabet) letters (e.g. A = alif, B = ba).

I wrote keyboard mappings for Windows, UNIX (xkeyboard) and iOS/Android; at that time there wasn't any built-in support, but now there is (called "Malay Arabic").
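
To illustrate the idea of a phonetic mapping, here is a rough Python sketch. The table `PHONETIC_TO_JAWI` and the function `to_jawi` are hypothetical and cover only a handful of letters; this is not the actual layout described above, and it maps keystrokes to letters without attempting correct Jawi spelling.

```python
# Hypothetical phonetic layout: Latin keystrokes -> Jawi letters.
# Illustrative subset only; not the mapping of any real keyboard.
PHONETIC_TO_JAWI = {
    "a": "\u0627",   # alif
    "b": "\u0628",   # ba
    "t": "\u062A",   # ta
    "p": "\u06A4",   # pa (letter added for Malay)
    "ng": "\u06A0",  # nga (letter added for Malay)
}


def to_jawi(keys: str) -> str:
    """Map keystrokes to letters, trying two-letter keys such as 'ng' first."""
    out = []
    i = 0
    while i < len(keys):
        for size in (2, 1):
            chunk = keys[i:i + size].lower()
            if len(chunk) == size and chunk in PHONETIC_TO_JAWI:
                out.append(PHONETIC_TO_JAWI[chunk])
                i += size
                break
        else:
            out.append(keys[i])  # pass unmapped keys through unchanged
            i += 1
    return "".join(out)


print(to_jawi("AB"))  # alif followed by ba, as real Unicode characters
```

The output is ordinary Unicode text, so it displays in any font that covers these characters.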


It's pleasing to see this kind of...is arcana the right kind of word?

We once visited an SMK Agama school because they were having trouble explaining their issue to the support desk. They were creating Word documents with a custom Jawi TTF that was...weird (sometimes it was one-to-one with sensible ASCII equivalents, but ligatures seemed to be implemented in completely random parts of the Unicode space). Nobody really seemed to know where the TTF came from either, it just sort of bounced around the aether via email, Dropbox, and ancient Google results.

They were naturally confused that copying and pasting perfectly "legible" Jawi script from Word into our <textarea> was rendering it into gibberish.

But half of us had never even heard of Jawi, we just assumed it was Arabic (but it isn't, apparently, it's more subtle than that).

We didn't quite want to commit to supporting the font directly, because it opens all sorts of questions about what to do if the receiving user doesn't have that font installed, how many fonts do we support, what if there's another font that uses different mappings (which there was)?

We did, however, produce a document that pointed to https://www.pendidik2u.my/cara-betul-install-jawi-di-kompute... with some custom explanatory text and steps, to support users who wanted to write Jawi directly in a more Unicode-friendly way (we didn't call it that, we just explained that other users "wouldn't need the font", which they approved of).

I'm not sure how many users this ultimately helped, or if they just gave up and embedded Word docs instead of using native platform text, and it's unfortunate that there isn't a more formal, one-click-install keyboard for this on the Microsoft Store.

(I also seem to remember a Tamil user who said that there might be a section of Unicode called "Tamil", but it's missing a tonne of characters they would like to use, so Unicode still has some way to go I guess)

But it is one of my favourite bug reports :)


Thanks @criag0990 for sharing your problems with the Jawi Keyboard. Didn't know about Microsoft Store.

I'm using macOS so I didn't notice; macOS also already has its own "Malay Arabic - Jawi" keyboard built in, so thankfully I didn't need to make one for macOS.

But I'll try to upload my version to the Microsoft Store (it is not the SIRIM layout but is based on phonetics from the Latin alphabet, which makes typing more natural: you don't need Arabic stickers or an Arabic keyboard, and you can use your normal QWERTY keyboard).

You can download it from here: https://jawikey.com but I will try to get it into the Microsoft Store. :-)

I am not sure about the font, because we use the Ubuntu fonts and they work perfectly fine (maybe the "va" character is missing).

As for the font, as long as it supports those Jawi-specific Unicode characters, everything should be fine. Things have gotten a lot better with Jawi compared to 6-7 years ago.

The version from your URL is actually the official Malaysian SIRIM Standard which is based on the Arabic keyboard.


Ah, I see, so yours is using a different key mapping, but still outputs the correct Unicode characters? I remember the stickers on the keyboards :)

By "weird TTF font" earlier, I meant they had a Jawi font that basically worked like Wingdings - they were typing a genuine ASCII "A" (0x41/0x61), but the font was "rendering" an aleph (I think). So it looked OK with the font. But it wasn't "proper" Unicode.

I'll keep https://jawikey.com in mind for the future though :)
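
A rough way to tell the two cases apart in code, as a Python sketch: check whether the stored characters actually sit in the Arabic Unicode blocks. The function name `arabic_script_ratio` is made up for illustration, and the check only covers the Arabic and Arabic Supplement blocks.

```python
# Arabic (U+0600..U+06FF) and Arabic Supplement (U+0750..U+077F) blocks.
ARABIC_RANGES = ((0x0600, 0x06FF), (0x0750, 0x077F))


def arabic_script_ratio(text: str) -> float:
    """Fraction of non-space characters that fall in the Arabic blocks above."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    hits = sum(
        any(lo <= ord(c) <= hi for lo, hi in ARABIC_RANGES) for c in chars
    )
    return hits / len(chars)


font_encoded = "AB"            # looks like Jawi only when the custom font is applied
unicode_text = "\u0627\u0628"  # alif, ba stored as real Unicode characters

print(arabic_script_ratio(font_encoded))
print(arabic_script_ratio(unicode_text))
```

Text typed with the Wingdings-style font scores zero here because the underlying characters are still ASCII, whatever the font draws.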


> (I also seem to remember a Tamil user who said that there might be a section of Unicode called "Tamil", but it's missing a tonne of characters they would like to use, so Unicode still has some way to go I guess)

Tamil Unicode is perfectly usable; this sounds like someone who didn't understand the encoding model, and expected to see a separate Unicode character for each consonant/vowel combination.
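
A small Python sketch of that encoding model, using the syllable "ki" as an arbitrary example: the combination is stored as a consonant followed by a vowel sign rather than as one dedicated character.

```python
import unicodedata

syllable = "\u0B95\u0BBF"  # Tamil "ki": base consonant followed by a vowel sign

# List each code point that makes up the single visible syllable.
for ch in syllable:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

print(len(syllable))  # number of code points, not of visible glyphs
```

Rendering the pair as one shape is the job of the font and the text shaping engine, which is why the code charts look sparse to someone expecting one entry per syllable.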


That is entirely possible. As somebody who doesn't read/write anything beyond English, I didn't want to disagree.

The "virtual keyboards" I've seen in use for non-Latin writing systems seem cumbersome to me as an outsider, so perhaps they found it difficult to produce the right combinations to achieve what they wanted.

I'm always a bit disappointed that my response to the issue of trying to digitise someone's culture is to kinda shrug and say "that's the best we've got".


Jawi is basically Arabic script; it just has a few extra glyphs to handle phonemes that don't exist in standard Arabic, for example /p/ or /ng/ (velar nasal).

Anybody who can read the Quran and speak Malay can also read Jawi with minimal effort, which is most of the population of the Malay archipelago.

A similar but distinct script was used for languages of Indonesia, e.g. Javanese and Sundanese.


That would be Pegon, https://en.wikipedia.org/wiki/Pegon_script.

However, both Javanese and Sundanese also have their own distinct writing systems that are not based on Arabic (https://en.wikipedia.org/wiki/Javanese_script, https://en.wikipedia.org/wiki/Sundanese_script) and would not be at all readable to an Arabic reader.


Not sure how true this is, but in Singapore I've heard that the only use left for Jawi is in the weekly Friday sermon (I just checked: khutbahs are uploaded weekly in Jawi). It's a very critical aspect of keeping the language relevant. Unfortunately I doubt this will continue for long, especially as senior religious leaders (who grew up reading Jawi) eventually pass.


I've never seen a sermon in Singapore (usually shown from a projector) in Jawi. Maybe the imams there read the same copy in Jawi while the congregants read it in the Latin alphabet, because I don't think the masses can read Jawi.

Maybe I should check MUIS's (Singapore's Muslim Religious Authority) website to see if there are Jawi copies of the sermons.

On another note, there is Arwi, an Arabic-Tamil script, which is in a more dire state than Jawi. Unlike Jawi in Malaysia, Arwi is not preserved (or protected by any authority) in any way, and most of the corpus written in Arwi is now being eaten by termites. It is really sad that knowledge specific to Indian Muslims is being lost to time and is not being digitized.


That's interesting!

My mother grew up in Perak (40s and 50s) and I just asked her about this and she said she doesn't particularly remember seeing Jawi (she was from a different community). Of course all those pre-independence documents like my parents' wedding license are in English!

I don't remember it much from various 60s/70s visits as a kid and it's the kind of thing I used to look out for. Times have changed!


Perak is on the west coast and was one of the first states to come under British influence. The states the grandparent mentions are on the east coast and tend to be developmentally uh at an earlier stage.


Definitely going to agree with that “uh” but things in the 60s and 70s were really different.


Even among languages that have always been written in, say, the Latin alphabet, there's a wide variety of scripts.

Reading documents written in Roman cursive or Renaissance handwriting requires quite a bit of training.


A lot of languages change their script for political reasons, rather than for convenience, and it shows.

If your language has more than 0 diacritics or if you have eight pronunciations for words that have an "ough" in them, then please re-evaluate how you write stuff down.

I can't imagine the billions of human attention-hours that have been lost to learning/editing/being confused by awkward writing rules. Obviously not only the script, but the spelling too.

Mrs. Delaney made me cry in first grade on my 6th attempt to spell "elephant" and I'll never forgive her.


    … diacritic symbols such as acute accent for high tone and grave
    accent for low tone …
  
    All too often, tone orthographies are established by fiat and
    defended by anecdote. Whether or not tone is marked, the most
    frequently cited justifications offered by the designers are either
    linguistic analysis, or socio-political factors, or an
    impressionistic evaluation that ‘we tried it and it seemed to work
    fine’. This article presents objective evidence that an existing
    tone orthography for an African tone language actually hinders
    fluent reading and writing.
http://cogprints.org/2173/3/lgsp42.pdf ( also http://cogprints.org/1446/5/identity.pdf )


I understand your gripe about inconsistent spelling, but what do you have against diacritics? Diacritics typically serve to make orthography more consistent in some manner, typically more phonetically consistent.


Old Turkish uses the same dictionary (with a few exceptions) as current Turkish; so an automated, one to one mapping between the writing systems is not that hard to implement. My ex's dad actually did charity work for the government to do OCR for Ottoman archives and they were quite successful with their results. I see no reason why they can't automate transpiling the letters since the digitized version of the original text already exists.


Well, in the case of Turkish, the old works were still in Turkish, except they were written in the Arabic alphabet. They were just republished in the Latin alphabet.



