What character was removed from the alphabet? (2020) (dictionary.com)
261 points by paulkrush on June 17, 2023 | 195 comments



I thought this would be about Old→Middle→Modern English, in which case the lost letters are Ƿ, ð, þ and æ.

https://en.wikipedia.org/wiki/Middle_English#Alphabet


I think it's interesting that despite our access to this sort of information, a huge number of people still don't understand that "Ye Olde" was not pronounced "Yee" but rather "The".


I know it's pronounced 'the', but 'yee' is more fun


“Yee oldy hat shoppy” is the most fun.


Milliner doesn't have the same ring.


What if it was a hatter and not a milliner?


Just mercurial madness


You forgot to throw “timey” in there too ;)



It is because the Y was a character called 'thorn', and its symbol was a Y drawn as a stick, pronounced like the 'th' in 'the' and 'thorn'. Also, s got reversed and became the z we know today. This article was written by someone who knows noYing.


Actually, the article ends with "The ampersand isn’t the only former member of the alphabet. Learn what led to the extinction of the thorn and the wynn." (which links to https://www.dictionary.com/e/letters-alphabet/)


Not thorn. Thorn is the vertical bar with circle: unvoiced th.

It’s eth (voice on, like in pronouncing “that”).

Icelandic still uses both, didn’t replace both with ambiguous “th”.

https://en.m.wikipedia.org/wiki/Eth


If you're speaking, and don't pronounce it "yee", nobody will know that you're making a reference to the "Ye Olde" orthography.


I only learned that in university. I did a joint major in English and history, but it actually came up in a history module in an offhand comment by the lecturer. I’m not surprised it’s little known.


Old English is just generally not a priority. There are only so many hours in the schoolday and so many schooldays in the year.


> Old English is just generally not a priority

“Ye Olde” is Middle English (or maybe even Early Modern)

Old English doesn’t look like English at all. To my eyes, it looks Scandinavian, even though I know it is closer to Dutch than to Swedish/Danish/Norwegian. Actually, the closest living relative to Old English (apart from its descendants) is Frisian, a minority language of the Netherlands and northern Germany.


Need to teach kids to be self learners. I knew these facts about old English, but I wasn’t taught them in school. I learned them on my own.


Other kids know facts that you don't because people care about different things.


My immature side just thinks it's fun to write þorn.


To mean this song, I presume: https://www.youtube.com/watch?v=xSZBIs0gs0E


That’s eth (voiced th), which was the Greek letter delta with a hash across the top, still visible in Icelandic, and some typographer used a Y at some point and overlooked the loop at the bottom.

Typos evidently perpetuate!


Well, the world is huge and the amount of stuff you might not know is endless.

https://xkcd.com/1769/


[flagged]


It's very fortunate that we have a place like this where we can indulge in «intellectual curiosity» and gain relative protection from said «people».

I understand you are judgemental about «people» like Chris O'Dowd's character, Roy, in "The IT Crowd" (https://www.youtube.com/watch?v=fZv_TARX3lI). There is no need to focus on them. You can rest on these refreshing notions we are discussing.


I think it's really interesting, both as a view into how quickly English was changing around the advent of the printing press, and as a way to appreciate how quickly common knowledge can be totally lost or supplanted.


I'd think HN, of all places, with so many software developers would actually be a pretty good audience for pedantry.


I feel the same way about fonts, but if the HN front page is any indication, I'm probably in the minority.


Agreed. Pronunciation is almost always the least interesting thing about the evolution of language.


Also yogh https://en.m.wikipedia.org/wiki/Yogh

There’s a politician in the UK called Menzies Campbell, whose name is pronounced “mingis” because that “z” actually used to be a yogh.


That explains why his nickname is "Ming"!


Very interesting! One of the most famous Australian prime ministers was known as "Ming" and now I know why.

> Menzies was proud of his Scottish heritage, and preferred his surname to be pronounced in the traditional Scottish manner (/ˈmɪŋɪs/ MING-iss) rather than as it is spelled (/ˈmɛnziz/ MEN-zeez). This gave rise to his nickname "Ming", which was later expanded to "Ming the Merciless" after the comic strip character


ESL learners would probably appreciate the visual split between voiced ð & voiceless þ from the th digraph that isn’t a dental fricative (e.g. Thailand, Thomas, Thames). With such a reform we could replace “the” with standalone “ð” which would bring symmetry with the single-character indefinite article, “a”.


Well, Ƿ & þ are covered in a separate article: https://www.dictionary.com/e/letters-alphabet/


They mention those with links at the bottom of this article as well



"It would have been confusing to say “X, Y, Z, and.” So, the students said, “and per se and.” Per se means “by itself,” so the students were essentially saying, “X, Y, Z, and by itself and.” The term per se was used to denote letters that also doubled as words, such as the letter I (for “me”) and A. By saying “per se,” you clarified that you meant the symbol and not the word.

Over time, “and per se and” was slurred together into the word we use today: ampersand."


I’ve heard some people say that as a child they thought the letter before P was “Elemeno.” So it certainly tracks that if you ask kids to recite “and per se and” that they might think the whole phrase is the name of the letter.


This is why I taught my kids to sing the alphabet forwards and backwards. If you sing it forwards it can sound like you are saying "elemental pee" which sounds scientifically interesting but doesn't help anyone learn what that letter is supposed to look like.

We used wooden alphabet puzzles as a guide so it could reinforce the idea that you are saying L-M-N-O pretty fast.

Since the ABC song is one of three songs that use the same tune it is easy to teach an infant or toddler their alphabet as you sing them to sleep.

The ABC song, Baa-Baa Black Sheep, Twinkle, Twinkle, Little Star all use the same tune.

Add in the backwards ABC's and you get four songs sung with the same tune so that the whole thing becomes tonally monotonous and soothing and it will tend to relax your child as you rock and sing.

With practice I got pretty good at mixing verses from these four songs on the fly so that the song eventually morphed into a single tune with disconnected lyrics. I used it as a challenge to keep myself awake while I rocked them to sleep.

For grins, the Backward ABC lyrics are:

Z, y, x, w, v, u, t...

S, r, q, p, o, n, m...

L, k, j, i, h, g, f...

E, d, c, b, and a.

Now I've sung them backwards to you,

Can you sing them backwards too?

Pretty easy to see how each letter breaks free of the original forward limitations.


For anyone else considering this method, I don't recommend it. I, unfortunately, taught myself the alphabet backwards one day when I was REALLY sick and needed to lie down completely still in a silent room (so I was too bored to do anything else).

Now, I sometimes get confused about what letter comes next. If I'm at `G`, am I supposed to say GFEDCBA, or am I supposed to go from my forwards anchor of ABCDEFGH? This happens when I'm at a nice anchor in my backwards memorization that's a non-anchor in my forwards memorization.

But I mean, yeah, neat party trick that I can say ZYXWVUTSRQPONMLKJIHGFEDCBA (and also type it really quickly).

Interestingly, my breaks are different from the parent commenter's:

ZYXWVUT

SRQP

ONMLK

JIH

GFEDCBA (this part was easy from the start because I played piano)

No alphabet song involved at all.


> Now, I sometimes get confused about what letter comes next.

Reminds me of this classic, the reverse bike:

https://youtu.be/MFzDaBzBlL0


By the way, there is a really irritating Japanese version of the English alphabet song whose verses don't end on the "ee" letters: G, P, V, Z. (Z being "zee" in the USA). So there is no rhyme, and less variety in rhythm.

It goes something like this:

A B C D E F G

H I J K L M N

O P Q R S T U

V W and X Y Z

Y and Z

Here is an example: https://www.youtube.com/watch?v=UlZNXUWh9Do

*facepalm*


I can't watch the video now, but the British version is a bit like that:

A B C D E F G

H I J K L M N

O P Q

R S T

U V W X Y Z

Yes it doesn't rhyme but it scans so much better that, once you're used to it, the American LMNOP version sounds ridiculous.


The LMNOP is just rhythmic variations: you have a run of 16th notes.

Such a pattern of 16ths occurs in "have you any wool" in the English nursery rhyme "Baa Baa Black Sheep", which is sung to the same tune.


I agree that sort of flourish is a lot of fun, if it's not in a song trying to teach you something so fundamental.


You're not wrong. That is distressing.


My whole family can do this. Dad would drill us after supper on summer evenings. We also knew the entire list of Presidents (up to the 1970s, anyway). To be different, I did the Vice Presidents.

The only one that ever got any mileage out of it was my little sister. At a poker party she was going into the kitchen for some soda. Somebody asked as she entered 'Cindy, can you say the alphabet backward?' As she opened the fridge, got the 2-litre and poured, put it away and left the kitchen she sang the alphabet song backwards. Never stopped to ask why they wanted to know. Just did it and left, cool as a cucumber. My sister is so cool.


Your kids will beat the field sobriety test!


Something I've always wondered is how people actually do the backwards alphabet test. I don't think I could efficiently do it sober, but I do have an O(n^2) algorithm for doing it; sing the ABCs in your head, and only say the letter before the one you've already said.

So like, abcdefghijklmnopqrstuvwxyZ, abcdefghijklmnopqrstuvwxY, ...

You can also use this algorithm to reverse a linked list, though you will not get the job if you do. I'm wondering if the cops are as picky as Google interviewers.
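For the curious, here is a rough Python sketch of that "hum from A and only say the letter before the one you last said" method, plus the linked-list analogue the joke refers to. It assumes each recited letter (or each pointer hop) costs one step; all function and class names are hypothetical, made up for illustration. The quadratic version walks from the head to the k-th node for every k, while the usual in-place pointer reversal is linear.

```python
import string
from dataclasses import dataclass
from typing import List, Optional, Tuple


def recite_backwards_quadratic(alphabet: str) -> Tuple[List[str], int]:
    """Reverse a sequence using only forward recitation from the start.

    For each stop position, recite from the beginning up to that position
    and say only the final letter reached. Total work is
    n + (n - 1) + ... + 1 = n(n + 1) / 2 steps, i.e. O(n^2).
    """
    spoken: List[str] = []
    steps = 0
    for stop in range(len(alphabet), 0, -1):
        last = ""
        for letter in alphabet[:stop]:  # hum the song forwards
            steps += 1
            last = letter
        spoken.append(last)  # only the last letter reached is said aloud
    return spoken, steps


@dataclass
class Node:
    value: str
    next: "Optional[Node]" = None


def from_iterable(items) -> Optional[Node]:
    """Build a singly linked list with the same order as items."""
    head = None
    for item in reversed(list(items)):
        head = Node(item, head)
    return head


def to_list(head: Optional[Node]) -> List[str]:
    out = []
    while head is not None:
        out.append(head.value)
        head = head.next
    return out


def reverse_quadratic(head: Optional[Node]) -> Optional[Node]:
    """Copy the list in reverse by walking from the head to node k each time."""
    n = len(to_list(head))
    new_head = tail = None
    for k in range(n - 1, -1, -1):
        node = head
        for _ in range(k):  # O(k) walk from the start, like re-singing the song
            node = node.next
        copy = Node(node.value)
        if new_head is None:
            new_head = copy
        else:
            tail.next = copy
        tail = copy
    return new_head


def reverse_linear(head: Optional[Node]) -> Optional[Node]:
    """The standard O(n) in-place reversal: flip each next pointer once."""
    prev = None
    while head is not None:
        nxt = head.next
        head.next = prev
        prev = head
        head = nxt
    return prev


letters, cost = recite_backwards_quadratic(string.ascii_uppercase)
print("".join(letters))  # ZYXWVUTSRQPONMLKJIHGFEDCBA
print(cost)              # 351, which is 26 * 27 / 2
print(to_list(reverse_linear(from_iterable("ABCDE"))))  # ['E', 'D', 'C', 'B', 'A']
```

Both produce the same answer; only the step count differs, which is presumably what the interviewer (or the officer) would be grading.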


A few years back for fun, I learned to say the alphabet backwards as quickly as I could. I used basically the same technique I use as a musician to learn complex phrases: start with a short sequence (like "z y x w") and repeat it as slowly as you need, increasing the speed as you get better at it; then either extend it a few more letters (e.g. "z y x w v u t") using the same strategy, or learn the sequence starting with the next letter separately ("v u t s r q") and then "glue" them together when you're proficient with both. It took me only a few minutes of practice to get the whole alphabet this way, and it's stuck fairly well despite my not really ever practicing it other than to occasionally show people as a party trick. Strangely, the letters seem to work better in that order for me; I can actually say the alphabet faster backwards than forwards, although it's hard for me to tell whether that's because the sounds objectively blend better or because I tend to have a bit of trouble enunciating clearly in general and the "backwards" route skips some learned bad habits that the forwards route uses.

Since I don't drink any alcohol and don't drive, I can't imagine I'd ever have a chance to use this in a field sobriety test, but I also suspect that a cop who pulled me over wouldn't find it particularly amusing, so I wouldn't be eager to try it anyways.


I’ve always assumed a gotcha here is when the driver says “officer, I don’t think I could do this sober” and now they’ve admitted guilt.


Look up the field sobriety test Steve Martin does in LA Story - hilarious


Is this online anywhere? I took a look but couldn't find a snippet on youtube.


Funny you mention that since the first time I heard them sung backwards involved a field sobriety test that the driver passed.

I decided that it was a useful skill. Later, when our kids were born and the long early-morning hours were filled with rocking-chair noises, diaper changes, feeding and burping, I took the opportunity to add that version to my song list. Sometimes I ended up singing every song I knew any words to, so when I was tired it helped a lot to focus on one tune.


If you don't care about rhyming you might as well just change the original song.

abcdefg, hijklmn, opq, rst, uvw xyz


I like this post, but I cannot get your backwards ABC rhythm to make sense.


Yeah I can’t get it to fit the usual melody. Did you happen to put it on YouTube?


No I haven't recorded it for YouTube. I do have it on some home video somewhere, maybe. I would probably have to dig for that and maybe do a format conversion from 8mm tape.

I diagrammed it out in the post above. I hope that helps.


This works for me:

Z, y, x, w, v, u, t...

S, r, q, p, o, n, m...

L, k, j,

I, h, g,

F, e, d, c, b, and a.


Yes this also works but the reason I went with "L-K-J-I-H-G-F" over this deals with the fact that I can get each letter distinctly enunciated whereas if I mash the last line as you did there can be some muddying of the first two letters and you end up with another "elemental pee" problem in teaching the letters.

I think F and E are too easy to turn into "effy" when you sing it out.

It's a personal preference. Yours works too.


Oh, you're right. That is a good point. And after looking at your detailed breakdown [0], your original line breaks do make sense. It does work better singing as "Baa Baa Black Sheep" with different lyrics than as "The Alphabet Song" with backwards lyrics.

[0] https://news.ycombinator.com/item?id=36373590


You have to do it like you handle the switch from Twinkle, Twinkle, Little Star to Baa, Baa, Black Sheep.

NOTES = 0

ABC = 1

BWABC = 2

TTLS = 3

BBBS = 4

[0] C, C, G, G, A, A, (long G)

[1] A, b, c, d, e, f, (g)...

[2] Z, y, x, w, v, u, (t)...

[3] Twin, -kle, twin, -kle, lit, -tle, (star),

[4] Baa, baa, black, sheep, have, you, (any wool)?

[0] F, F, E, E, D, D, (long C)

[1] H, i, j, k, l, m, n, o, (p)... (the elemental problem rises because they crammed an extra letter into this line and distinguishing each individual letter gets muddy because they used the two D notes for four letters - l, m, n, o. This is not optimum)

[2] S, r, q, p, o, n, (m)... (gives each letter an opportunity to be clearly enunciated)

[3] How, I, won, -der, what, you, (are)...

[4] Yes, sir, yes, sir, three, bags, (full)

[0] G, G, F, F, E, E, (long D)

[1] Q, r, s, t, u, (v)... (pause on the F at S, or use a long F between S and T so that the third and fourth notes merge on S)

[2] L, k, j, i, h, g, (f)... (fits the flow and allows each letter to have a distinct sound)

[3] Up, a, -bove, the, world, so, (high)

[4] One, for the, mas, -ter, one, for the, (dame)

[0] G, G, F, F, E, E, (long D)

[1] W, x, y, and, (z)... (Both notes G, F are merged to handle the single letter they are sounding and the E includes "Y and" to get you to the last letter)

[2] E, d, c, b, and (a). (This handles the rhythm like we see in [1] treatment of the stretching or pause on the third letter of the series)

[3] Like, a, dia, -mond, in, the, (sky)

[4] One, for the, lit -tle, girl, who, lives, down the, (lane)

[0] C, C, G, G, A, A, (long G)

[1] Now, I've, sung, my, A, B, (C's)

[2] Now, I've, sung, them, back, -wards, (to you),

[3] Twin, -kle, twin, -kle, lit, -tle, (star)

[4] Baa, baa, black, sheep, have, you, (any wool)?

[0] F, F, E, E, D, D, (long C)

[1] Tell, me, what, you, think, of, (me).

[2] Can, you, sing, them, back, -wards, (too)?

[3] How, I, won, -der, what, you, (are)

[4] Yes, sir, yes, sir, three, bags, (full)

That's all I got. That's how I sing it. I tried to break out the notes as they flow in each song. Commas separate each note in the song and the parentheses around the last word or letter denote a long note. Where you see two words behind one comma those two words use the same note - "for the" is an example. It all fits for me though I guarantee that there is more than one way to skin this cat. This way works for me.


It took me a bit to get it, but this is an excellent explanation. Thank you :)


Are there any Colemak enthusiasts teaching their kids the alphabet in Colemak order, as read from the keyboard?


I for one will teach my kids Dvorak.

Single-quote/quote comma/less dot/greater p y … a o e u i d h t n s …


That reminds me of how Big Bird (the child proxy in Sesame Street before Elmo came along) thought the alphabet was one long word, pronouncing it "Ab-kuh-def-ghee-jeckle-manop-kwer-stoov-wixizz", and sang a song pondering what it might mean: https://www.youtube.com/watch?v=qTvhKZHAP8U

Those Sesame Street people really understood how a kid's mind works.


I thought there must be two versions of the letter P: regular P and elemeno P.

I wasn't sure why there would be two kinds or when to use each kind, but I figured they'd explain later.


My kids thought Elmo was part of the alphabet for an embarrassing amount of time.


Reminds me of a silly skit one of my old coworkers did a while back (one of many):

https://www.youtube.com/watch?v=cH_ynnZzJjg


And this is how windows are opened, skeptics are born, and deep thinkers who are not afraid to challenge conventional wisdom are created. Pull back the curtains to reveal how everything really works.


Pre-kindergarten, I thought it was elemeno!


Similarly 'Saint Nicholas' becoming 'Santa Claus'


I have seen this explanation printed so many times, and unattributed, that I wonder if students of that era actually said "and per se and".


From The Frumentary by William King (1699)[0]:

  U’s conversation ’s equal to his wine,  
  You sup with W, whene’er you dine:  
  X, Y, and Z, hating to be confin’d,  
  Ramble to the next Eating-house they find;...   
  And Per Se And alone, as Poets use...

See also an elaborate classroom game described in the Documents of the Board of Education of the City of New York (1861)[1]: "One [student] represents &—called ‘And per se and’—as being appended to the alphabet, but not belonging to it....The merriment of this pastime turns upon the endeavor of An’ per s’and to take precedence of Z, and so get fairly into the alphabet..."

[0] A 1781 printing: https://babel.hathitrust.org/cgi/pt?id=njp.32101068156031&vi...

[1] https://babel.hathitrust.org/cgi/pt?id=nyp.33433075984876&vi...


These are great. Did you just know about these already or did you research them just now? If the latter, how did you find these in such short order?

Always impressed by people who can find primary sources for things quickly!


In this case I just did a full-text search of HathiTrust's catalog for "and per se and" (quotes of course are part of the query in this case). These are two results of many.


Not just a fisherman, but a teacher. Thanks!


Me too, but given certain educational methods of the past, I can also imagine it happening. Here's an example of teaching Latin in the 19th century. Read the first 1/3 from the link from today's front page: https://news.ycombinator.com/item?id=36354213. Latin was taught by rote, by repeating without understanding. One of those head masters must once have thought it would be correct to call it "and per se and," and had the power to make generations do it.

On the other hand, why it would have become so widespread as to become a word is what makes me doubt the story.


Some probably did, the same ones that addressed their dad as "pay-ter".


I'm sorry, can you explain this one?


In British families of a certain class, at least on TV, the son would address the father in Latin as "pater" but with a long A sound. It comes off now as an absurd affectation, much like using "per se" regularly in the name of a letter.


The reason I’m inclined to accept this narrative is that not only have I not seen any plausible alternative etymology, there is no alternative etymology available, period.


I would recommend against that methodology. A lot of etymology is not easy to find, but plausible etymology is easy to make up.


You must know my wife. She's quite confident that the idiom "balls to the wall" has something to do with a person being put up against a wall to be executed by gunshot.


A and I used to sometimes get 'per se' to clarify that people were referring to the letter, not the one-letter word.


Yep, I learned this from the History of English podcast. I highly recommend it for anyone who likes this sort of trivia about the evolution of English.


And here I thought ampersand had something to do with Ampere, the unit of electric current…


I really like this as an alt etymology. If it got popularized to save characters in telegrams, it could have been the “electric and”. Another way it could be from Ampere is that his son was a philologist and could have conceivably promoted the idea of an ampersand.



I think you might be confusing the ampersand with the Tironian et, which looks like a 7 (Unicode code point U+204A). It is still visible in Ireland on the old Post 7 Telegraph boxes. Tiro invented a shorthand system, and his Tironian et represented the sound "et". I recently dived into this whole subject, so it's kinda fresh in my mind.
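The two signs really are separate characters. A quick lookup with Python's standard `unicodedata` module (a minimal check, using only the U+204A code point cited above and the plain ASCII ampersand) prints their official names:

```python
import unicodedata

# Ampersand vs. Tironian et: distinct code points with distinct names.
for ch in ("&", "\u204a"):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
# U+0026 AMPERSAND
# U+204A TIRONIAN SIGN ET
```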


Maybe the second link I provided has it wrong. You seem to know more, so I defer to you.


From the book “Shady Characters” by Keith Houston

"Among Tiro’s notae was an innocuous character representing the Latin word et, or “and.” Though this was only one symbol among many (in their most elaborate medieval form, a system descended from Tiro’s original cipher comprised some fourteen thousand glyphs), the utility of Tiro’s system ensured that his et sign would considerably outlive both its creator and its sponsor. This was not, however, the storied ampersand: when Tiro created his so-called Tironian et, the ampersand was still more than a century away."

The book itself is worth a read for all the other punctuation marks it discusses.


"&" wasn't ever in the alphabet in the same capacity as the letters that represent sounds. It was there as a keyword signifying the end of the list. Then its name was expanded to clarify that it was not a normal letter but a keyword. Then that expanded name was misinterpreted by people who never needed the list termination signifier to begin with.

Lesson? KISS. They should've implemented it with brackets.


What, you don't use & in your everyday spelling of words? &, b&, c&le, d&y, gr&, p&a, r&om, v&al, w&er...


Let's not forget &c


$ cat /&c/passwd


[1] xxxxx

-bash: c/passwd: No such file or directory

$ cat: /: Is a directory


> It was there as a keyword signifying the end of the list.

As opposed to simply not continuing the list, or using the word "end" somewhere? This doesn't make sense to me. I find it difficult to believe.


I made an assumption based on how I understood the article. After reading your comment, I searched around for more info. Unfortunately, I can't find anything definitive. There are a lot of articles about when and why it was removed (alphabet song based on Twinkle Twinkle Little Star), but not much about why it was added.

The common speculation seems to be that they put it in there simply because it's a symbol. I think about the alphabet as representing a class of characters that represents sounds, which biased my interpretation of the story. However, others could have thought of it as a list of symbols to know in order to read (besides punctuation).


It's supported by the several references to old teaching materials people have posted.


Really, they should have had a null. That would create extra havoc with C strings if null needed to be a common char.
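To see why a common NUL character would wreak havoc: C strings end at the first NUL byte, so anything after it is silently dropped. A small sketch using Python's standard `ctypes` module, whose `char` buffers follow the same convention when read back as a string (the sample text is made up):

```python
import ctypes

# create_string_buffer copies the bytes and appends a terminating NUL.
buf = ctypes.create_string_buffer(b"and\x00per se and")
print(buf.value)  # b'and' (reading stops at the first NUL, like strlen())
print(buf.raw)    # b'and\x00per se and\x00' (the full underlying storage)
```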


Danes and Norwegians are way ahead of you: https://en.m.wikipedia.org/wiki/%C3%98


What does Ø have to do with null? I mean sure it has a similar shape to zero but so does O.


It’s also the alternate representation of an empty set in maths.


But it goes the other way. That's math using non-latin letters, which is a common practice. Not a weird coincidence.
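Whatever the history, Unicode keeps the Danish/Norwegian letter and the mathematical empty-set symbol apart, even though they look alike in many fonts. A quick check with Python's `unicodedata` shows different names and different general categories (uppercase letter vs. math symbol):

```python
import unicodedata

for ch in ("\u00d8", "\u2205"):
    print(f"{ch} U+{ord(ch):04X} {unicodedata.name(ch)} ({unicodedata.category(ch)})")
# Ø U+00D8 LATIN CAPITAL LETTER O WITH STROKE (Lu)
# ∅ U+2205 EMPTY SET (Sm)
```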


When I was very young, I remember AE being used in various printed materials (school books).

https://en.wikipedia.org/wiki/%C3%86

But by the time I was in High School it seemed to have disappeared.



If I didn’t have dyslexia before, I do now after seeing the awful attempts to make specific sounds stand out.

> Any advantage of the I.T.A. in making it easier for children to learn to read English was often offset by … being generally confused by having to deal with two alphabets in their early years of reading.


Japanese kids seem to manage learning 4 of them (5 if they're Korean-Japanese; and yes, I'm including the Roman alphabet), though it's true that early-reading books tend to stick to just hiragana & katakana.


They manage but that doesn't mean it's good for learning.

But also, multiple symbols for a sound is a lot simpler than having two alphabets that work in completely different ways. And even worse is the two alphabets sharing symbols and giving them different interpretations.


"AE" as you have it (with your link presumably you mean Æ) is the letter "ash" from the Old English alphabet.

Þ, called "thorn", is another, which typesetters replaced with a Y so as not to add another letter to their collections. It made the "th" sound, which is why "Ye Olde ..." is a spelling convention, but people of the time would never have said "yee" for it.

Other letters were lost too (eth, wynn), and this all contributes to why Modern English spelling conventions are kind of awful for ESL learners.

https://en.wikipedia.org/wiki/Thorn_(letter)


The advent of the standard typewriter and computer keyboard killed many a special letter/ligature.


The British spelling is supposedly encyclopædia, and Britannica still seems to use it (with and without the ae = æ ligature).


We still see this in French semi-regularly, such as œil and cœur


How popular are the ligatures in French in practice?


People don't usually use them informally but it's technically incorrect not to use them. You'll mostly see them in books and articles.


I live in Canada and all the egg cartons say œufs


ex æquo and curriculum vitæ are two common examples of æ in French (although they're obviously direct borrowings from Latin).


Ah, mondegreen

https://en.m.wikipedia.org/wiki/Mondegreen

I remember the Maxell tape ads (ibid) "Me ears are alight".

https://youtu.be/XEe0qqPAC6E


"Beelzebub has a devil for a sideboard"


The girl with colitis goes by


In the classic alphabet song & comes between Y and Z. It’s been hiding there all this time.


I naively thought to be "part of the alphabet" our ancestors would have needed to, you know, use "&" in words - the article weakly hints at this for "&c" as being equivalent to "etc" (saving keystrokes even before they had keyboards, I guess). But given that they didn't, say, write "sand" as "s&" I will politely refuse to accept it as a letter.


[Laughs in b& & v&]


I always thought amper was another word for merchant because in my native German the & symbol is called "Kaufmanns-Und", literally "merchants-and".


This was good planning: if it were still part of the alphabet, we couldn't use it as a URL query parameter delimiter.
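In practice a literal & inside a query value is percent-encoded as %26, so it never collides with the delimiter. A short sketch with Python's standard `urllib.parse` (the parameter names and values are made up for illustration):

```python
from urllib.parse import parse_qs, urlencode

query = urlencode({"word": "b&w", "lang": "en"})
print(query)            # word=b%26w&lang=en
print(parse_qs(query))  # {'word': ['b&w'], 'lang': ['en']}
```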


I think it's too bad that we didn't choose delimiters that had no chance of appearing in actual text. Commas are a big one: when parsing CSV there's always the problem of having commas in the text of a field. One hack I have used, if I don't want to delete them, is to swap them for a character I imagine will never be in the text, such as | (pipe). It all could have been avoided if we had some standard delimiters that were not part of common text.


https://en.m.wikipedia.org/wiki/C0_and_C1_control_codes#Fiel...

The problem is that any delimiter that has no chance of appearing in actual text will be hard to discover, and cannot appear on a standard keyboard (so it is not easily human-writable). So we are kind of stuck with the comma for human readable formats.


The standard characters are ASCII/Unicode field separator, group separator, record separator and unit separator.

https://en.wikipedia.org/wiki/C0_and_C1_control_codes#Field_...


I have actually used two of these in some hellish SQL query that needed to concatenate strings for the fastest possible path between database and frontend. I was surprised by how well it actually worked; I had assumed some step in the middle of the chain would surely break non-printable characters.

Not using these is wasting so many bits of one-byte character encodings, I don't understand why we even need to bother escaping CSV files if we could just use the appropriate control characters instead.
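A minimal sketch of that idea in Python, assuming no field ever contains the control characters themselves: put the ASCII unit separator (0x1F) between fields and the record separator (0x1E) between records, so commas, quotes and newlines in the data need no escaping. The `dump`/`load` helpers are hypothetical names, not a standard format or library.

```python
US = "\x1f"  # ASCII unit separator: between fields
RS = "\x1e"  # ASCII record separator: between records


def dump(rows):
    """Serialize rows of strings; assumes no field contains US or RS."""
    for row in rows:
        for field in row:
            if US in field or RS in field:
                raise ValueError("field contains a separator character")
    return RS.join(US.join(row) for row in rows)


def load(text):
    """Inverse of dump: split records on RS, then fields on US."""
    return [record.split(US) for record in text.split(RS)] if text else []


rows = [["name", "note"], ["Smith, J.", 'said "hi",\nthen left']]
print(load(dump(rows)) == rows)  # True
```

The catch, as the next replies point out, is that a field containing US or RS still has to be rejected or escaped somehow; the sketch simply refuses it.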


The tab character (U+0009) is a good candidate for this. Many CSV parsers already support it.


The more immediate problem with Tim Berners-Lee's choice of delimiters in URL syntax is that ampersand starts an entity reference in SGML default concrete syntax and thus <a href="bla&x=y"> will be rejected as a reference to an undeclared entity "x" in SGML (whereas in XML it will be rejected as incomplete entity reference missing a terminating ";" character).
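The usual workaround is to write the ampersand as the entity `&amp;` inside the attribute value; the parser turns it back into a plain & before the link is followed. Python's standard `html.escape` does exactly this (a minimal sketch reusing the example URL above):

```python
import html

href = "bla&x=y"
escaped = html.escape(href)
print(f'<a href="{escaped}">')         # <a href="bla&amp;x=y">
print(html.unescape(escaped) == href)  # True
```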


This doesn't work. There will always be that one guy thinking "what if we put a csv in another csv?!" So escaping it is.


I've done a lot of CSV parsing with shell scripts (usually using awk or cut). My rule of thumb is that if I know the CSV may have some commas in quoted text, I can just remove them and work with the shell script. If it's going to have newlines or anything more complicated, I need a real CSV parser.
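The difference is easy to see with quoted fields: a naive split on commas (roughly what `cut -d,` does) breaks them, while a real CSV parser, here Python's standard `csv` module, honors the quotes, including an embedded newline. The sample data is made up.

```python
import csv
import io

data = 'id,comment\n1,"Smith, J."\n2,"line one\nline two"\n'

# Naive approach: split into lines, then split each line on commas.
naive = [line.split(",") for line in data.splitlines()]
print(naive[1])  # ['1', '"Smith', ' J."'] (the quoted comma splits the field)

# A real parser keeps quoted commas and newlines inside one field.
rows = list(csv.reader(io.StringIO(data)))
print(rows)  # [['id', 'comment'], ['1', 'Smith, J.'], ['2', 'line one\nline two']]
```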


In the early 1970's, I saw the & appear in some of our older alphabet books. The absolutist child in me tried to work out if that bit of archaic data was authoritative or not.

Those books were likely printed in the 1950s.


A few years ago I was idly thumbing through old books at a used store. Looking over a remedial arithmetic book, I was surprised to find that they referred to zero as “the cipher”. Exponentiation and the process of finding a square root were “involution” and “evolution”, respectively. My math education was pretty ad-hoc (complicated story), so maybe these are better known than I realize, but I’ve never heard them. I ended up grabbing the book just for the vocabulary. These examples are just the ones that jumped out most in the first couple of pages.


Never would I have thought that "&" was ever part of an alphabet. It's more of a symbol, like "." or ";". HN is sometimes a source of curious things.


I always thought of it as a ligature for "et" (like ﬃ is a ligature for ffi) rather than a letter. But I suppose the sz ligature (ß) eventually became a letter, so there is precedent.


æ (æsc/ash) was a letter in English, too, but, like &, stopped being one even though it continued to be used (as a ligature).
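Unicode draws roughly the same line. The ffi ligature is a compatibility character that NFKC normalization folds back into plain "ffi", while æ and ß have no such decomposition and stay as single letters. A quick check with Python's standard `unicodedata` module:

```python
import unicodedata

for ch in ("\ufb03", "\u00e6", "\u00df"):
    folded = unicodedata.normalize("NFKC", ch)
    print(f"{unicodedata.name(ch)}: {ch!r} -> {folded!r}")
# LATIN SMALL LIGATURE FFI: 'ﬃ' -> 'ffi'
# LATIN SMALL LETTER AE: 'æ' -> 'æ'
# LATIN SMALL LETTER SHARP S: 'ß' -> 'ß'
```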


The difference is that ß is actually used to spell words. & has never been used as part of a word, it only ever st&s alone.


Spanish also treats multiple-letters as letters in their own right, doesn't it? And Dutch with ij.


Not really, depending on who you ask. In the 90s, the RAE declared that ‘ll’ and ‘ch’ were no longer letters, but they’re still taught as letters in many places. ‘rr’ was never a letter in its own right, but many people consider it to be, as a kind of parallel to ‘ll’ and ‘ch’.


That's good info to learn beyond the strips of letters at the tops of classrooms, thanks!


Treating ij as a single letter in Dutch makes no sense, unless one also treats au, ei*, eu, ou, and ui as single letters. And possibly sch as well.

* especially ei as it's the same sound: eis (demand) and ijs (ice) are homophones.


ß is a ligature for ss, right? ſs


It's called Eszett (S Z), and I believe it originally was a joined long s and a z, like ſ𝔷. (There doesn't appear to be a fraktur long s in Unicode, but I include the fraktur z to show where the shape comes from.) I believe these days it is typically written as "ss" when the ß character is not available.

Edit: The name points to sz, but it's possible that it replaced both ss and sz. I'm not an expert.
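The "ss" fallback is also what Unicode's case mappings produce. A small Python sketch using only built-in `str` methods, which apply Unicode's full case mappings; the example word is chosen for illustration.

```python
word = "straße"

# Uppercasing ß expands it to two letters, since the traditional capital form is "SS".
print(word.upper())     # STRASSE

# casefold() is meant for caseless matching and also maps ß to "ss".
print(word.casefold())  # strasse

# A dedicated capital sharp s exists (U+1E9E); it lowercases back to ß.
print("\u1e9e".lower())  # ß
```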


I was hoping this article would be about the letter C and be from the future.

The letter C is useless. It either makes the sound that S does or the sound K does. Instead of C we should just use an S or a K for every place a C is.

Some may argue that without C we wouldn't have CH. And then I say, "Why do we need two characters to represent a single sound? It should have its own letter."
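A minimal sketch of the "S or K" proposal, assuming the usual English rule of thumb that c is soft (/s/) before e, i or y and hard (/k/) elsewhere, and leaving "ch" untouched as a separate question. The function name `drop_c` is made up for illustration, and real English breaks the rule often (e.g. "cello" comes out as "sello", "soccer" as "sokser").

```python
def drop_c(word: str) -> str:
    """Replace each c with s or k using the soft/hard-c heuristic; keep 'ch'."""
    out = []
    for i, letter in enumerate(word):
        nxt = word[i + 1].lower() if i + 1 < len(word) else ""
        if letter.lower() != "c" or nxt == "h":
            out.append(letter)  # not a c, or part of the "ch" digraph
            continue
        new = "s" if nxt in ("e", "i", "y") else "k"
        out.append(new.upper() if letter.isupper() else new)  # preserve case
    return "".join(out)


for w in ["cat", "city", "accept", "cycle", "church", "Cinema"]:
    print(w, "->", drop_c(w))
# cat -> kat, city -> sity, accept -> aksept,
# cycle -> sykle, church -> church, Cinema -> Sinema
```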

Alas, English spelling is all kinds of messed up. And I'll have to resign myself to that fact.


For example, in Year 1 that useless letter "c" would be dropped to be replased either by "k" or "s", and likewise "x" would no longer be part of the alphabet.

The only kase in which "c" would be retained would be the "ch" formation, which will be dealt with later.

Year 2 might reform "w" spelling, so that "which" and "one" would take the same konsonant, wile Year 3 might well abolish "y" replasing it with "i" and iear 4 might fiks the "g/j" anomali wonse and for all.

Jenerally, then, the improvement would kontinue iear bai iear with iear 5 doing awai with useless double konsonants, and iears 6-12 or so modifaiing vowlz and the rimeining voist and unvoist konsonants.

Bai iear 15 or sou, it wud fainali bi posibl tu meik ius ov thi ridandant letez "c", "y" and "x" -- bai now jast a memori in the maindz ov ould doderez -- tu riplais "ch", "sh", and "th" rispektivli.

Fainali, xen, aafte sam 20 iers ov orxogrefkl riform, wi wud hev a lojikl, kohirnt speling in ius xrewawt xe Ingliy-spiking werld.


I know this is a popular joke, but with English spelling failing to adapt to the great vowel shift and pronunciation shifting in other ways as well (like no longer pronouncing the k in knight), you might as well apply these rules. They make no less sense than the current spelling, and perhaps the removal of unnecessary letters makes the situation even slightly better as well.

With some kids even learning to read English by learning the general shape of the words rather than the individual letter pronunciations (based on some flawed research), I wouldn't mind a general spelling reform to bring the situation under control.


Yeah yeah yeah but this starts to fall apart on the second wave. Why would you get rid of y? The last thing we need is fewer vowel glyphs.

This is just a fun little piece of writing and should not be used to imply anything.


> aafte

Uses long a instead of æ and removes the r from the end for vowel coloring? How would you separate pastor from pasta? I refuse to accept this reform based on its regional bias.


The further down you got, the more like German it sounded as I was mentally pronouncing it--but it was somehow a smooth gradient. Weird!


Which was exactly de point of de joke originally.


Did you transform the text to Esperanto orthography? Nice.


C is also used for sounds that sound like “ts”; you can’t make that sound with an s or k alone.

Sidenote: I see that some transliteration systems make Q represent “ch”.


In Slavic languages and in German, the 'c' is (almost?) always the 'ts' sound. That makes it as useful as the 'x', although that's also one we could do without.

The 'c' could also be considered useful exactly because it has the unique property of changing its sound depending on the surroundings. In Dutch, for example, the word for politician is "politicus", where the 'c' has a k sound, but the plural is "politici", where the 'c' has an 's' sound. It would be a lot less clean if we had to swap 'k' and 's'. Maybe we should only use the 'c' for words where that property matters.

I hope you all remember the various "language reform" essays that get progressively less readable as various redundant letters are dropped or replaced.

Although you could also argue we should add some. The 'e' can be pronounced in several different ways, and not being able to tell the difference can occasionally lead to confusion. In Dutch, for example, the words for "a" and "one" are the same: "een", but pronounced differently.


> That makes it as useful as the 'x', although that's also one we could do without.

I would like to further improve the 'x' by making its pronunciation 'sk' at the beginning of a phoneme, rather than 'z', 'ex', or maybe 'k'.

It sounds fine in words like 'xylophone' or 'xray', and there are plenty of words like 'xy', 'xate', 'xunk', or 'xeleton' so we could re-balance our alphabet.


> I hope you all remember the various "language reform" essays that get progressively less readable as various redundant letters are dropped or replaced.

Can you cite any that put legitimate effort and expertise into the letter choices, and don't pick a whole bunch of things that are half bad on purpose?

Also it's a bad idea to rate an orthography based on how it looks with 0 hours of practice.


I'm told that this is a consequence of the Romans absorbing the Etruscans and being so heavily influenced by the Greeks. There were just too many velar consonants to go around, and they were exchanged sometimes, like Caius and Gaius.


C, S, K and X. W and V and U. J and G. Q! The alphabet is a hot mess.


I keep saying that, by chance, it's a pity that the English alphabet has exactly 23 letters and doesn't have a single one more. Had it one more letter, say "ñ" or "ç" or some other, that'd make it perfect for base64: 24*2 + 10 = 64. Instead we now have a 62-character "base" alphanumeric alphabet, and two symbols that are disagreed upon since they have to be chosen depending on the context.


Wait, here I've been thinking the alphabet has 26 letters all this time?

(double checks keyboard)


Nope, it's 23, the same as the number of fingers and toes you have.


>the same as the number of fingers and toes you have.

You really only have 18 fingers and toes when you think about it - 10 toes + 8 fingers.

Those other two appendages are thumbs. Even my kids know this.

EDIT: You also have two sets of knees - your low knees, and your high knees.


I only have one heiney, but I'm curious about your second one!


I used the "Head, Shoulders, Knees and Toes" song to help teach my kids all the main body parts in it. Then I worked on showing them they had two sets of knees.

The low knees were the ones you pat with both hands in that song. (Your ordinary knees.)

Since they knew that song I taught them that they actually had two sets of knees, low knees and high knees (hiney, LOL). When I asked them to pat their knees they would use both hands to pat their (low) knees and then reach up behind and pat their high knees (hineys). The best part was that they would announce which knees they were patting. Pretty funny for me helping my toddlers attain true enlightenment.

I thought it was funny. My wife didn't at first but everyone who knows me understood why it worked out this way.

I kept up the message that they only had 8 fingers too, in the hopes that they would argue with their teachers about it and win by showing those teachers that two of those digits are actually thumbs. I made sure they understood that they have 10 digits but only 8 of them are fingers; the other two are thumbs. Ha ha, who's the smartest now, teacher?


wow, how did I mess up my post so badly? I didn't even do the proper math _now_; ofc the alphabet has 26 letters, and my math above only works for 26, not 23. I did check it back when I had this thought, many times. I guess I just had a brain fart: I copied what Google said while writing my post, and somehow Google said 23 in a quick/bad search, so I used that number without checking it.

26 * 2 + 10 = 62, almost 64

(26 + 1) * 2 + 10 = 64
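The corrected arithmetic is easy to check, and it also shows where the two extra symbols come from. A short Python sketch using the standard `base64` and `string` modules; the constant names are illustrative. The two 64-symbol alphabets shown are the standard and "URL and filename safe" ones from RFC 4648.

```python
import base64
import string

# The 62 "base" characters: 26 uppercase + 26 lowercase + 10 digits.
ALNUM = string.ascii_uppercase + string.ascii_lowercase + string.digits
print(len(ALNUM))  # 62

# Base64 needs 64 symbols, so two extra characters are appended.
STANDARD = ALNUM + "+/"  # standard base64 alphabet
URL_SAFE = ALNUM + "-_"  # base64url alphabet

# Bytes 0xFB 0xFF split into 6-bit groups 62, 63, 60, so the encoding
# uses both extra characters followed by one '=' of padding.
data = b"\xfb\xff"
print(base64.b64encode(data))          # b'+/8='
print(base64.urlsafe_b64encode(data))  # b'-_8='
```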


Wouldn't surprise me in the least if ChatGPT gave the alphabet written out as an example of a word with 23 letters.

I wonder what the history is behind why base-64 encoding became so widespread. Sadly, I can even recognise certain sequences now without having to decode them. But the fact that it's not even safe for use in URLs/filenames (due to + and /) would suggest it's not really fit for purpose. (And yes, I know base64url encoding exists; I once had to write code that dealt with strings that could use either encoding, and you could only tell which by looking for the presence of _ or -. That _ was not originally chosen as one of the 2 extra characters needed on top of A-Za-z0-9 is a bit baffling. "-" is often used as a delimiter, though; my other pick would have been #.)
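For the mixed-input situation described above, one possible approach (a sketch, not the code the commenter wrote) is to skip detection entirely. Map `-` and `_` onto `+` and `/` before decoding; a well-formed string in one alphabet never contains the other alphabet's extra characters, so the mapping is harmless for standard input. `decode_either` is a hypothetical helper name, and the sketch assumes `=` padding may have been stripped, as is common in URLs.

```python
import base64


def decode_either(s: str) -> bytes:
    """Decode base64 text written in either the standard or URL-safe alphabet."""
    # Translate URL-safe extras to standard ones; a no-op for '+' and '/'.
    s = s.translate(str.maketrans("-_", "+/"))
    # Restore any stripped padding so the length is a multiple of 4.
    s += "=" * (-len(s) % 4)
    # validate=True rejects characters outside the standard alphabet.
    return base64.b64decode(s, validate=True)


print(decode_either("+/8="))  # b'\xfb\xff'
print(decode_either("-_8"))   # b'\xfb\xff' (URL-safe, padding stripped)
print(decode_either("aGk"))   # b'hi'
```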


> 26 letters

J, U, and W are sometimes considered pretenders. Late additions, at least.

J is just a stylized I. U is a variation of V. W is of course just a double U.

G is also a relatively recent entry, if you go back to the Romans from which the Latin/English alphabet is derived.


Q also. K can be used instead.


I used to see typos like that and double check that I didn't shift universes. I eventually figured out that the shifts aren't that obvious.


You're right


Edit: I mistakenly wrote 23 _now in this post_ while I've had this thought many times with the correct alphabet number count, 26. In fact my math above only works for 26 (or 27 if it had one more letter).


The alternate question is "why are these 26 characters the American English alphabet?" It's fairly arbitrary, a collection of historical accidents and changes in orthography.

And it's incomplete. You can't really write American English with just the usual 26 letters. Ñ and the ʻokina are proper letters and necessary for writing a bunch of American words correctly. The various kahakō are helpful too but they are treated as diacritics and not full fledged separate letters.


Not just American English; lots of countries use the same alphabet. Even countries that have more characters than these 26, still often consider these 26 to be their alphabet.

Weirder still: we often call it the Latin alphabet, but Latin never had some of these characters. No 'j', 'u' or 'w' in Latin, for example.


> Ñ and the ʻokina are proper letters and necessary for writing a bunch of American words correctly

Depends what you mean by “correctly”, but in practice they’re not necessary to write American English, and most people don’t use them (except in very high-production-value writing, like professionally edited books and magazines).


Tell that to the people who live in Española, NM; the official name of the city includes the ñ. Or Hawaiʻi, although for the latter "Hawaii" is at least correct from an official government placename perspective.


at a push, you can transliterate ñ as ny or ni, or even use the Portuguese nh if you really want to push the boat out, but when an ñ isn't available, most people seem to just drop it entirely and let you figure it out from context
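Dropping the tilde is also what a common "strip accents" routine does in software. A minimal Python sketch with the standard `unicodedata` module; `strip_marks` is a hypothetical name. It decomposes text (NFD) and discards combining marks. The ʻokina survives because Unicode encodes it (U+02BB) as a modifier letter, not a combining mark.

```python
import unicodedata


def strip_marks(s: str) -> str:
    # NFD splits 'ñ' into 'n' + U+0303 COMBINING TILDE; then drop combining marks.
    decomposed = unicodedata.normalize("NFD", s)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


print(strip_marks("Española"))      # Espanola
print(strip_marks("Hawai\u02bbi"))  # Hawaiʻi (the ʻokina is kept)
```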


I was expecting this would be about the letter Ⱶ or one of the other Claudian letters of the Latin alphabet: https://en.wikipedia.org/wiki/Claudian_letters


In the early 1800s, school children reciting their ABCs concluded the alphabet with the &.

It would have been confusing to say “X, Y, Z, and.” So, the students said, “and per se and.” Per se means “by itself,” so the students were essentially saying, “X, Y, Z, and by itself and.”

So it was just some shorthand tossed in there


I think of Hacker News as a source of interesting things that I, as a nerd, did not know & this fits.


OP says 'per se' means 'by itself'. But in this case (the alphabet song) it makes more sense for 'per se' to have its other meaning: 'as such'. That is, the letter means the same as 'and'.


Q: What character was removed from the alphabet?

A:

   1> (diff (range #\a #\z) "the alphabet")
   (#\c #\d #\f #\g #\i #\j #\k #\m #\n #\o #\q #\r #\s #\u #\v #\w
    #\x #\y #\z)
Plus all non-alphabetic characters other than space.
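For anyone without that Lisp REPL handy, a rough Python equivalent of the set difference above (a loose port, not a statement about the Lisp function's exact semantics): the lowercase letters a to z that do not appear in the string "the alphabet", kept in alphabetical order.

```python
import string

# Letters of a..z absent from "the alphabet", in alphabetical order.
missing = [c for c in string.ascii_lowercase if c not in "the alphabet"]
print(" ".join(missing))  # c d f g i j k m n o q r s u v w x y z
```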


Ever wondered about those old 'U's that look like V?

Now you know why 'W' is so named.


Tom Bombadil


Related from 2022: https://news.ycombinator.com/item?id=32249465 ("The History of ‘Ampersand’ (2020)", 178 comments)


The Story of Ampersand, in The Phantom Tollbooth universe.

https://sharegpt.com/c/J1U3T7m


Imagine a roguelike using all of these obscure characters where key plot elements are the elucidation of their true histories.


Part of the English alphabet, but in which region, all English-speaking places? I'm curious where it comes from.


Given the dates involved I’d guess maybe England, what with English being the language of the English and all. ;)


English was spoken in several places outside of England by the 1800s.


Yes, that's true, but the article specifically refers to the ampersand already being part of the English alphabet for many years before that.


Why is M before N in the alphabet. It should be N first, then M!


&totse


ch, ll


Good old 'che' and 'elle', which used to confuse so many non-Spanish speakers when they looked up words in Spanish dictionaries before these letters were dissolved into their constituent parts 'c', 'h' and 'l'. Of course, that was possible because they are not technically characters, but letters that were removed from the alphabet.
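The dictionary confusion comes from collation: under the traditional ordering, "ch" sorted as its own letter after "c", and "ll" after "l". A simplified Python sketch, assuming lowercase words built only from unaccented a to z plus "ñ"; the word list and the names `TRADITIONAL`, `MODERN` and `sort_key` are made up for illustration, and real Spanish collation also deals with accents and other details.

```python
# Traditional Spanish alphabet with "ch" and "ll" as separate letters.
TRADITIONAL = ["a", "b", "c", "ch", "d", "e", "f", "g", "h", "i", "j", "k",
               "l", "ll", "m", "n", "ñ", "o", "p", "q", "r", "s", "t", "u",
               "v", "w", "x", "y", "z"]
# Modern ordering: "ch" and "ll" sort as plain c+h and l+l; "ñ" stays a letter.
MODERN = [unit for unit in TRADITIONAL if unit not in ("ch", "ll")]


def sort_key(word, alphabet):
    """Turn a word into a tuple of letter ranks under the given alphabet."""
    rank = {unit: i for i, unit in enumerate(alphabet)}
    digraphs = {unit for unit in alphabet if len(unit) == 2}
    key, i = [], 0
    while i < len(word):
        if word[i:i + 2] in digraphs:  # treat "ch"/"ll" as a single letter
            key.append(rank[word[i:i + 2]])
            i += 2
        else:
            key.append(rank[word[i]])
            i += 1
    return tuple(key)


words = ["llama", "cosa", "luz", "chico", "dama"]
print(sorted(words, key=lambda w: sort_key(w, TRADITIONAL)))
# ['cosa', 'chico', 'dama', 'luz', 'llama']
print(sorted(words, key=lambda w: sort_key(w, MODERN)))
# ['chico', 'cosa', 'dama', 'llama', 'luz']
```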

Fun fact though, there is one character that was removed from the Spanish alphabet and the whole of Spanish orthography but remains in use in many others. Even better, this character actually originated in mediæval Spanish, so it could be argued that it wasn't removed: it was let loose!

That character is the 'ç', which is now so closely associated with French, but remains in use in other close neighbours of Spanish such as Catalan and Portuguese and has been adapted for use in many other languages: https://en.wikipedia.org/wiki/ç


Google



