Disclosure: I'm not directly from the fields of the Sciences Of Angles And Ambiguously Crossing Lines, nor have I ever seen or used this symbol before. To me, however, it's pretty evidently supposed to be a "no right angle" symbol.
(A) It's in the math section,
(B) it's with angles,
(C) the thunderbolt ↯ is commonly used for "not", or more specifically for disproof, in this area, and
(D) at least in my 30-second internet search on a mobile phone, I couldn't find any other "no-angle" or "no-right-angle" symbol.
Someone could argue that you usually use a simple strikethrough, as in ≠ (unequal), ∉ (not-element-of) or ∅ (empty set), but I would say this form was chosen to avoid confusion. The angle itself (without the "no/not") consists of only two orthogonal lines, so it would be complicated to strike it through in any direction without ambiguity: the result would resemble a triangle, a fork or whatnot.
It's used in German mathematics education (secondary level), either to mark a contradiction in a proof or more generally to mark an erroneous statement.
But I have never seen it to mark negation of a condition, that's usually done with a slash (as in ≠ ≮ ≯ ≰ ≱ ≴ ≵ ⊄ ⊅ ⊈ ⊉ ⊊ ⊋ ∉ ∌ ∄ ∦, you get the idea).
So for "not a right angle" I'd have expected a "right angle" symbol with a slash through it.
Funny enough, I've only seen it at the Gymnasium (secondary level) and not in the University a few years later -- then indeed the usual symbols were the 'slashed' relations like you've described, or the bottom symbol: ⊥ in logic. Maybe it's an idiosyncrasy of a certain subset of math teachers.
But how would you position the slash to get a somewhat easy to decipher symbol? To me, the right angle symbol seems to lend itself more to this unorthodox negation through the contradiction symbol than to negation through the normal slash.
Same. Never seen that symbol in my life. I've seen ¬, ~, !, etc used for not/negation in computer science, math, logic, etc.
And some commenters said they used it to mark proof by contradiction, but why is there a need to mark it when you are showing it via proof? A canonical example of proof by contradiction is proving sqrt(2) is not rational. Never have I seen it marked with that symbol. Where would you even mark it? At the beginning with the assumption? Or at the end like QED?
I was taught it in extracurricular mathematics in Australia. We were taught that it goes at the end of a contradiction proof once the contradiction has been found. We used to write it extra large, like a lightning strike. I think of it like a proof mic-drop.
It's the first symbol referenced for symbols used in proof by contradiction to show contradiction [0]. I know that's not exactly "not" or "disproof" but I think that might be what the poster was getting at.
I submit to you that it's clearly not a thunderbolt but an arrow indicating changing directions; that being overlaid on top of a pair of axes is obviously useful in the study of non-Euclidean geometry to indicate the use of wibbly-wobbly dimensions.
I've thought that it would be cool to have a Wiki with an entry for each character, describing what it is, and its history. Although that wouldn't help for mystery characters like this one, there are a lot of characters with stories behind them.
I was just discussing :man-in-business-suit-levitating: with some friends earlier today. Also an interestingly cryptic background, albeit not an unsolved one.
The story behind MIBSL is definitely fascinating and some great trivia there. There’s a longer article about it here: https://www.newsweek.com/2016/05/06/secret-ska-history-man-b... that covers not just the inspiration for the emoji itself, but a brief history behind the inspiration behind the inspiration. Lots of levels of metaness to unpack.
Aside from the table describing each symbol, if you scroll to the bottom of the page, it links out to full articles related to each. For a full list see...
I like this idea. It would serve as a place to put a well-sourced answer to the question about this character, and the talk section could be used to discuss further investigation into the topic, or when new uses inevitably arise.
I don't see the contradiction. The only thing they used from the name is the "right angle" aspect. Given their argument is this is a composition of thunderbolt + X, for some X (and derived from their prior knowledge of thunderbolt's compositional meaning), deciphering the image as "thunderbolt + right angle" is trivial and consistent with the naming origin in TFA.
> In Unicode, the symbol for a right angle is U+221F ∟ RIGHT ANGLE. It should not be confused with the similarly shaped symbol U+231E ⌞ BOTTOM LEFT CORNER. Related symbols are U+22BE ⊾ RIGHT ANGLE WITH ARC, U+299C ⦜ RIGHT ANGLE VARIANT WITH SQUARE, and U+299D ⦝ MEASURED RIGHT ANGLE WITH DOT.[5]
> In diagrams, the fact that an angle is a right angle is usually expressed by adding a small right angle that forms a square with the angle in the diagram, as seen in the diagram of a right triangle (in British English, a right-angled triangle) to the right. The symbol for a measured angle, an arc, with a dot, is used in some European countries, including German-speaking countries and Poland, as an alternative symbol for a right angle.[6]
I think perpendicular most commonly refers to lines/vectors/planes etc., while the right angle symbol refers to angles. Also, there are often multiple symbols expressing the same thing.
I believe in German (possibly also other languages) the thunderbolt ↯ is commonly used to mean "this is a contradiction" in a mathematical proof, equivalent to what in English might be a kind of ⋕ rotated by 45°, or the symbol ※.
The symbol ⟂ on the other hand means "false" and is used in particular in formal logic.
These unicode characters feel like they were given to us from an alien species or something.
How did we end up with so many characters of unknown origin?
I had no idea what it meant or was used for, thus assigned it a “descriptive name” when collating the symbols for the STIX project. (I still have no idea, nor can supply an example of the symbol in use.) […] it is the case that ISO 9573-13 existed long before either AFII or the STIX project were formed. […] I once asked Charles Goldfarb what the source of these entities was, but remember that he didn’t have a definitive answer.
>These unicode characters feel like they were given to us from an alien species or something.
I worked at a large media company that had lots of differing icon sets in play across different media.
These icons were in SVG and had been optimized pretty intensely. In some cases, due to a bug in one of the optimizing tools, some types of Bézier curves got weird, so instead of, say, the round-headed person with their hand held up to say stop, you got a star-headed monstrosity pointing doom from the heavens. Because of how the icons were used and not used, these optimization errors sat around so long that nobody had examples of the original icons, although one could guess, because in some cases we had similar ones in other projects that had not been optimized.
So maybe a similar thing would be the source of these weird alien entities.
I would've thought they'd have a table of every icon and a description or something. Maybe at the time it was never taken very seriously, or seen as likely to take off the way it did, so people didn't bother. Like IPv4...
No it did not. Klingon was originally proposed in 1997 and rejected in 2001. A second proposal was made in 2016 with more optimistic noises. But AFAIK it has yet to be accepted.
It is also, like Tengwar and Cirth (which AFAIK remain unincluded even though they are on the BMP roadmap), held back on IP grounds. To my knowledge, the IP issues remain fully unresolved.
Klingon is included in the ConScript registry, but that is unrelated to Unicode itself; it performs ad hoc, non-standard allocations in the Private Use Areas.
> Notably, it appears that anyone could register a glyph with the AFII for a fee of $5 to $50 (about $8.60 to $86, accounting for inflation). Even if the International Glyph Register can be found, it likely merely contains another table with the glyph, the identifier, and the short description. To know its origins would require the original registration request that added the character, but it's unlikely that such old documents from a now-defunct non-profit organization in the 90s would have been kept or digitized.
Could be any random kid who found out about this and wanted the cool symbol they made up registered.
In some sense, you still can! The Ideographic Variation Database [1] essentially allows a definition of new CJK ideograph [sic] as a glyphic subset of existing characters, with a possible processing fee.
Something similar exists in JIS, called 幽霊文字 (ghost characters): kanji of mysterious origin with no real-world usage that somehow made their way into the JIS character set. After some investigation, most of them turned out to be mistranscriptions of kanji from old historical materials.
Due to this thorough investigation, the committee was able to pare down the number of kanji for which the source cannot be confidently explained to twelve, shown on the adjacent table. Of these, it is conjectured that several glyphs came about due to copying errors. In particular, 妛 was probably created when printers tried to create 𡚴 by cutting and pasting 山 and 女 together. A shadow from that process was misinterpreted as a line, resulting in 妛 (a picture of this can be found in the Jōyō kanji jiten).
I remember convincing a friend to build a Unicode pokedex extension that collected all the Unicode symbols he was exposed to via casual web browsing. Never followed up, but I think it'd be neat, or something along the lines of rare-Unicode browser bingo.
I suspect there's an entire alien alphabet (like Marain, for instance) in there someplace. There was a proposal to stuff Klingon into the Private Use Area, at least...
If you're willing to use a discontinuous subset you could probably find close enough glyphs to make a full Marain. Ordering would be messed up and require a lookup table though.
(Edited to upload the image to imgur and avoid spammy advertisements).
Here I'll date myself: I remember this as "diode with a gate". Back when we did circuit diagrams with stencils, you had the diode stencil which looks like a triangle with a line on top, and then with the electrical stencils you had "decorations".
The intention was to put down the original symbol on the paper, move the decorations stencil over top of it and then add the required decorations. It's why diode symbols look like this: https://imgur.com/a/0tSLV7O (notice "step recovery diode").
OK, so why do we have a separate decorator for a diode? Can't we just have a pocket full of stencils for diodes? Space was at a premium back then. It goes back to daisy wheels and typeballs: https://en.wikipedia.org/wiki/Printer_(computing)#Impact_pri... You would have one position for "diode" and one position for "decorator", and the printer would know that when it got one ASCII char it would print the diode, then send whatever the thin space is to advance the print carriage a small step, then print the decorator.
Someone should be able to find a daisy wheel or typeball dedicated to circuits and bear this out.
Ironically imgur is nowadays very, very user hostile.
You can't view an image without JavaScript. Once you enable it you get "f*ck your privacy" popups, and ads if you don't have a blocker. On mobile I can no longer view anything on imgur at all, only the top bar renders for some reason. There seems to be no way around this.
It was also recently (though after the decline had already set in) bought by a company that specializes in buying dying social media platforms and milking them dry with questionable ethics. How questionable? Well, they got into a Darknet Diaries episode: https://darknetdiaries.com/episode/93/
Could (more or less) fit that description and would make more sense as a symbol. Something like it even made it into Unicode (https://emojipedia.org/chart-decreasing/)
That to me immediately communicates a decreasing chart. I would have no idea that the right-angle lines represent right angles generally and not chart axes.
The article makes a decent case for the symbol to be a chart symbol that means "no right angle". The zig zag arrow apparently being a shorthand for "no" in that particular circle.
It looks like a symbol that someone added for completeness but isn't particularly useful even in the field.
I remember back in the day we used to find publicly exposed Windows FTP servers, create new folders using some messed-up Unicode characters, and upload pirated games and movies there to share with each other. The only way to open those directories was to type the exact path in Unicode; simply double-clicking on the folder in FileZilla or Windows Explorer resulted in an error. Sometimes the admins themselves couldn't delete them and just left them there. Good times.
I remember the days of people beginning to abuse ftp sites, all us admins shutting down our writable ftp upload folders, and thinking, "this is why we can't have nice things." It was the beginning of the end of the early, friendly internet.
I've heard a lot of people pronounce it like that, but I'm pretty sure that's not correct. It's clearly the English word "wares"[1] with the S replaced with a Z, similar to "hackz" and "cheatz", which were also common in that era. I think the "wah-rez" pronunciation came from people seeing the l33tspeak and not recognizing the original word behind it.
It's not a synonym for "goods", because only one type of thing was ever "wares": software. It's just a way of dividing up the sections of your piracy BBS, like "filez" (files: multi-kilobyte textfiles full of instructions on how to make bombs etc.), "imagez", "warez", etc.
Anyway, by 1990, in the piracy circles I distantly associated with, it was quite common to pronounce it like "juarez". Sort of semi-ironically, like, it's obviously the wrong pronunciation, but nonetheless everyone uses that pronunciation on purpose. So, what could be more correct than "the thing everyone does"?
Of course, pronunciation only happens in meatspace (or at least it did back before MP3s and before YouTube and so on), and of course I'm talking about clusters of teenagers separated by thousands of km. We had "meetupz" or "meetz" in my city, which is how I know how "everyone" pronounced it... but it's certainly possible that in most cities/whatever there was some other pronunciation rule.
> It's not a synonym for "goods", because only one type of thing was ever "wares": software. It's just a way of dividing up the sections of your piracy BBS, like "filez" (files: multi-kilobyte textfiles full of instructions on how to make bombs etc.), "imagez", "warez", etc.
Citation needed there.
I have always assumed it came from fleamarkets where people selling pirated VHS films and knock-off Rolexes would be described as “selling their wares”. Changing the s to a z was an obvious step in 90s internet culture.
Okay, so my citation is, I was there, I was a (fringe) participant in pre-internet piracy culture, starting in 1990.
Pirate BBSes would have various "goods" (in the sense you and GP mean) available for download, including images (hint: some of them may have involved ladies), text files, and software. Sometimes there would also be sections for various art media created by users, such as .mods or ASCII art or poetry or whatever. Those various "goods" would never be all slopped together, they'd be divided into categories. And the category called "warez" would never, ever, have anything in it other than pirated software.
I agree that the s-to-z thing is just classic hacker/leet culture, though it's not internet culture, because it predates the people in question having internet access. I'm saying that the "wares" that becomes "warez" is not "wares-as-in-goods", it's "wares-as-in-softwares". It's pluralized even though "software" is a non-count noun, because then it fits with "files", "images", and so on. And yes, ultimately the "-ware" in "software" is from the sense that you and GP are talking about; I'm saying that the etymology is not directly from there, because otherwise all the other kinds of pirated stuff would also be "warez", and it never, ever, was.
I'm not sure how you are missing it, but hardware and software both etymologically have ware (as in a manufactured article, product, or merchandise) built-in to them. Without ware, there would be no hardware or software, or warez. The root of these words, also silverware, cookware, courseware, Tupperware, Corningware, etc., indeed is "ware." And wares is merely the plural of ware.
I too have never seen "warez" used to refer to anything other than pirated software. You make a good point about the derivation; it probably is directly from "software". Adding a superfluous Z to the end of a plural mass noun was also a characteristic of l33tspeak, as I recall.
This reminds me: my friend and I were the only people we knew who'd even used the internet in the late 90s, so no one was around to correct us, and 3 of the apparently incorrect pronunciations we had agreed on were:
I do not get it. Did you have to shut it down? It does not make sense to complain that someone uploaded stuff to public, unprotected, writable storage. Wouldn't securing it with a set of credentials suffice?
I think you don't fully grasp the "early, friendly internet". Very few people do today. In my bubble (programming, for example), young people can't even imagine that there were times when you could focus on _things_ instead of writing layers of security code around them.
It makes me sad to think of all those simple little services we used to run on *NIX machines, like `finger` and `whois`. You'd never want to disclose that information now, but at the time it was quite nice to be able to see if a friend or colleague was around with a simple network query.
The GP is saying "I miss the days where I could easily exploit people" and the response was "I miss the days where we respected each other enough to not exploit each other". It wasn't naive or irresponsible, but reflective of a time with more trust, cooperation, and good intentions.
Reminds me of a few years ago, when I accidentally exposed my Domoticz install to the internet without authentication: I had missed something in my Nginx config with the X-Forwarded-For headers. After about a week, a foreign visitor apparently came by my install and decided to have some good fun, turning my lights on/off at random times. It took me about 3 days to realize what had happened, but in the meantime he didn't destroy my install, he only messed with me. Which was really sweet, because nuking the system would have been far easier than opening the webpage every night.
That was a good and fun security lesson though, and now I always check security from the outside with a mobile hotspot.
That’s like saying it would be naive and irresponsible for me to go outside without a life preserver today despite an unforeseen catastrophic global flood drowning the lands 10 years from now. It was a different world, with different expectations and frameworks.
That's like saying it's naive and irresponsible to go outside without locking your front door when you live in a tiny remote village with 40 other people you've known your whole life.
Some were open for uploads by design, in the spirit of sharing things: essentially, using the free space left over after the main purpose to provide friendly mirrors for things like new projects. I recall using Archie to find copies of open source software at the tail end of that era.
Some also were used as submissions for projects, long before sites like sourceforge started. Especially since plonking a bigger source dump on newsgroups wasn't exactly well received.
Sometimes people should be able to do nice things without it getting abused, no?
In The Netherlands, in the nicer neighborhoods, we have something called a 'buurtbieb', a 'neighborhood library': a weatherproof cabinet where people can put surplus books that other people in the neighborhood can borrow.
Of course you could take all the books or use the cabinet to store candy, but why would you?
We have these throughout many neighborhoods in my city in central Florida, USA. We’re a college town so I just assumed it was somehow connected to that. Neat that it’s an international thing!
True, though to be fair most people never get to use private libraries. Or they used a library at their University that was technically private, but that gave access to the public as well. Public libraries are ubiquitous and very normal, while private libraries are the exception.
In America, public schools all have private libraries, reserved for attending students. (Maybe some operate as public libraries, but I've never seen nor heard of it.)
Furthermore, public libraries are not necessarily free. In America virtually all are; the only fees are for late returns. But this is not globally true; in some parts of the world, libraries open to the public charge a fee for checking out books, or even require a fee for entry.
It's a normal elision, yes, we all picture a public library when we say "library". But "free library" isn't redundant or weird, because "public" is a modifier of library, not a trait.
People tend to call their personal library a "book collection" or the like, but it's a library, in just the same way that a Little Free Library is.
So most people who read have at least a small private library, whether they think of it in those terms or not.
There are two libraries near me that aren’t free - they charge an annual “membership” fee. One even operates more like an old blockbuster when it comes to newly released books. They charge a daily rental fee! It’s 25¢ a day, I believe.
Just keep using it as a generic name. They've already lost the generification war. Are they seriously going to track down and sue neighborhood libraries?
Good luck getting a jury to enforce the trademark.
Obligatory "Free as in beer vs. free as in freedom" comment. I have pulled stuff out of small community bookshelves that would never have seen their chance in a "professional-run" public library, both bad and good.
In Puerto Rico there are quite a few of these on the sidewalk and despite the rains they are generally always stocked with books. There are bars on everyone's windows and doors, but books piled up on the street.
There's one down the street from me, but instead of books it has canned food. It says "little free pantry" on it. It must have been around for a while, because the neighborhood it's in has long since been gentrified and is populated with very well-off residents vs. the working poor that used to live there.
I was doing the same on MS-DOS, keeping "secret" files on a floppy disk with a directory having a name ending with an invisible Alt+255... it was even impossible to look inside it with the Windows 3.1 file manager.
We did the same thing using the character for a non-breaking space, I think it was ALT+0160. It would sort last in the list, and just be an effectively-invisible entry unless you were really paying attention. Combined with an exploit we had to change users on the FTP servers behind most dialup ISPs hosting (the free couple Mb hosting you’d get with your dialup account that very few people cared about or used), meant we had pretty much unlimited file hosting, filling random families web hosting with hidden folders full of mp3s and warez.
You too, huh? This was my first foray into the "dark" side of the Internet as a kid, pre-Web, hanging out with pirates on IRC and get "hired" to go around the early 'Net and fuck up people's upload folders by creating hidden directories we could load with our group's warez. ^H^H^H^H
There are some kanji in the JIS character encoding that have no record of actual usage, and they were incorporated into Unicode as well. They're called "ghost characters" in Japanese.
I feel bad for the font designers who have to put all these inane characters in, have to draw them and hint them, and they have no purpose except they have to be there or someone will complain.
Fortunately there are only a handful of such cases. But unfortunately there are tons of commonly used CJKV ideographs; typical Chinese or Japanese fonts are of course not expected to have all Chinese characters (there are almost 100,000 of them while OpenType fonts can only have 65K glyphs), but they are expected to have thousands of commonly used characters.
The person who appears to have done the work of collecting this character (and others) for submission into the Unicode process back in 1997[0] (Barbara Beeton) has actually responded to the StackExchange question[1].
Unfortunately even she is not aware of what the symbol is actually for.
So Unicode has all these mysterious characters... but I would bet that it's still true that many people on the planet speaking common languages can't even type their name...
This post is from 2015, and I'd love to know if unicode has added better support for non-English languages since then.
I was very surprised by your comment and by the article you linked that the name Aditya cannot be represented in Unicode. I think it can be represented: আদিত্য.
I am not a Bengali-speaker, but I am familiar with the class of scripts to which the Bengali script belongs, abugidas. These scripts assume a vowel following every consonant. When two consonants occur one after the other in a word (a consonant cluster), this must be represented specially, because if you just wrote (consonant, consonant) it would be pronounced (consonant, inherent vowel, consonant).
The "ty" in Aditya is one such consonant cluster. The way this cluster is written is ত্য. This is represented as three code points (I think I am messing up the proper terms), one for the "t", one to "join", and one for "y".
Some people think of the special shape that the final "y" takes as a separate character in its own right. In fact, it has its own name (ya-phalā). I can understand why it would be confusing to see that the ya-phalā can't be typed as its own single character ("্য"), but it really has to do with a difference between how the input is implemented and how the person thinks about their own language.
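For the curious, here's a quick Python sketch that decomposes the name into its code points; the names come straight from the Unicode character database, and the "join" character is the virama:

    import unicodedata

    name = "আদিত্য"  # "Aditya" in Bengali script
    for ch in name:
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

    # U+0986 BENGALI LETTER AA
    # U+09A6 BENGALI LETTER DA
    # U+09BF BENGALI VOWEL SIGN I
    # U+09A4 BENGALI LETTER TA
    # U+09CD BENGALI SIGN VIRAMA   <- the "join"
    # U+09AF BENGALI LETTER YA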
There was a lot of discussion [0] of that point when the Model View Culture article was originally posted 7 years ago.
It's complicated, but the author of the piece seems to take issue with how the character set was designed by the language authorities the UTC delegated to.
I read that "I Can’t Write My Name" article when it came out and it's remarkably misguided. First, there are solid linguistic reasons why Unicode handles that character the way it does. Second, the article completely misunderstands how the Unicode Consortium works. Finally, the Unicode Consortium is remarkably open to character proposals from random people. The author could have written a proposal and fixed the problem in half the time it took to write the article. Source: I am a random person who got multiple characters added to Unicode.
The article presents it as if it were purely due to Western-centrism that these characters do not have distinct code points in Unicode. In reality the issue is much more subtle: a discussion of whether a certain glyph is a ligature of two characters or its own distinct character.
That publication was so good, I was really bummed when they shut down. Looks like they came back for a minute in 2020? I had no idea but I know what I'm doing tonight.
The name itself sounds like it should be a downward trend line on a graph.
I’m guessing the person who implemented it got this exact requirement wording in the Unicode definition and nothing else, didn’t make the logical connection, and just implemented it as close to literally as they could.
But if I read the article correctly, this glyph comes from a set of math symbols. I don't think "stock goes down" was ever used in any mathematical script.
I generally (perhaps naively) think that going forward knowledge loss won't be much of an issue compared to our history.
Surely the archeologists of the future won't have to wonder what some tool from our times was used for or what some symbol we currently use means… They will have Wikipedia and archive.org and whatnot!
But that fantasy is not compatible with a reality where we are already unable to find out the purpose of some characters in Unicode.
Even digital storage is not permanent. Important things will be copied and preserved, but I imagine many of the relics of everyday life will be deleted or will deteriorate at some point in the far future, such as this very comment.
That presumes humans can access our (electronic) media and understand it, some 8,000 years or more from now.
There's no saying that there'll be a society capable of reading bits and bytes by then. Not just a collapsed society (they'll hardly be interested in reading a random discussion on an orange forum for a niche group that lived 8,000 years ago), but maybe even societies that are vastly technically superior to our own but cannot fathom what things meant 8 millennia back. I mean, we have texts from some 600 years ago that we can read but cannot understand (e.g. the Rohonc Codex), even though our technology and knowledge are far superior to when it was written.
Electronics become unusable quickly, though. We can find stone tablets and clay pottery, but 10k years from now will they be able to find hard drives and extract useful data? It seems like it can easily go in the opposite direction.
There's a process for it. I'm not sure it costs anything but it's a bunch of paperwork. You have to justify what it's used for, why existing solutions don't work, etc. The working group is probably pretty reasonable, but I'm sure it's an involved process.
If you do, can you please tack on symbols for following external links, a space bar symbol, and all the other miscellaneous internet-adjacent characters I always have to reach to Fontawesome for?
Might we run out of Unicode code points, as we (seem to) have with IPv4 addresses?
As another comment mentions, once you add all these snowmen (with/without snow; male, female and gender-neutral; in a few skin colour options, plus neutral)... it adds up. Plus exponential growth once you consider families of snowmen (different numbers/genders/races of "parents", different numbers/genders/races of "children", and so on...).
There is no reason to believe the current rate (about 35,000 over the period 2010-2020) will change rapidly, so we are probably safe for this century. You should be aware that emoji gender and skin color are encoded as character sequences and modifiers rather than atomic characters, exactly in order to avoid that exponential growth.
And in the unlikely case that Unicode gets so many characters somehow, you can always extend it: http://ucsx.org/
The successful bitcoin sign proposal [1] explicitly deals with such a criticism:
> Will Unicode be flooded with symbols for many crypto-currencies?
> Most other crypto-currencies have learned from the difficulty that a non-Unicode symbol causes for Bitcoin, and use a symbol already in Unicode. For instance, Dogecoin uses Đ, Ethereum uses Ξ, Litecoin uses Ł, Namecoin uses ℕ, Peercoin uses Ᵽ and Primecoin uses Ψ. Some, like Ripple, use Roman capital letters (XRP), mimicking ISO 4217 currency codes.
> While it is possible another crypto-currency will have a non-Unicode symbol that is extensively used in text, this is unlikely.
I think this section was crucial for the eventual acceptance, because Unicode people do care (a lot) about long-term consequences of proposals.
It seems to me that this is something best handled with tag characters, like ¤XBT + (U+E007F) = ₿ (where the letters are from the tag block, U+E00xx). This mirrors one of the two systems for rendering national flags[0], just with a different starting codepoint, and can easily accommodate all the ISO 4217 currency codes and common unofficial extensions. If a system doesn't know how to render a particular glyph, it can just fall back to showing the Roman capital letters.
The downside of this approach is size: each tag codepoint (including the end marker) requires four bytes in UTF-8, plus two for ¤, so the sequence above is 18 bytes long.
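To make the byte math concrete, a quick Python check; the tag letters (U+E0058/U+E0042/U+E0054) and CANCEL TAG (U+E007F) are real code points, but their currency use here is only the hypothetical scheme above:

    seq = "\u00A4" + "\U000E0058\U000E0042\U000E0054" + "\U000E007F"  # ¤ + tag "XBT" + cancel tag
    print(len(seq))                  # 5 code points
    print(len(seq.encode("utf-8")))  # 18 bytes: 2 (¤) + 3*4 (tag letters) + 4 (cancel)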
That sounds interesting, but modern currency symbols are already fast-tracked anyway (they almost always get assigned in the next version of Unicode), and more than one currency symbol can exist for a given ISO 4217 code, so I don't think it would work.
> modern currency symbols are already fast-tracked anyway
For national currencies, perhaps. New national currencies aren't introduced all that often, and there is a lot of pressure to support them quickly as their use is often mandatory for anyone living in that jurisdiction. For new private currencies, including crypto-currencies, we don't see quite the same eagerness—the observation that new crypto-currencies were more likely to reuse existing Unicode symbols than invent new ones was a consideration in getting the Bitcoin symbol adopted, as they didn't want to open up the floodgates to large numbers of new currency symbols. The tag-based system offers a compromise.
> and more than one currency symbol can exist for a given ISO 4217 code, so I don't think it would work
That is a bit of a problem, but it could be handled with the variant selector codepoints, for example ¤MOP = MOP$, ¤MOP(VS1) = 圓, and ¤MOP(VS2) = 元, if the symbols have the same meaning. To save some space the VS could replace the end codepoint. For fractional units there could be a different prefix such as ¢ for 1/100 or ₥ for 1/1000 in place of the ¤, or incorporating one of the Unicode fraction codepoints for other ratios up to ⅞ (or ⅑ or ⅒). These would be rendered verbatim in the fallback version, like ¢USD.
> Might we run out of Unicode code points, as we (seem to) have with IPv4 addresses?
No. There are currently 144,697 codepoints allocated, out of a possible 1.1 million. And most updates allocate a few hundred. The large allocations (in the thousands at a time) overwhelmingly concern additions of CJK unified ideographs (see: 13.0 with 4,969 out of 5,930 new codepoints, 10.0 with 7,494 out of 8,518, 8.0 with 5,771 out of 7,716).
There have been large additions of historical scripts (9.0 added the entire Tangut script, 7.0 added 23 different scripts), but those occurrences have slowed down a lot.
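If you want to reproduce the allocation count, here's a rough Python sketch. It's approximate: it counts code points that have a name in the interpreter's bundled Unicode database, so it skips unnamed controls and the result depends on your Python version:

    import unicodedata

    assigned = sum(1 for cp in range(0x110000)
                   if unicodedata.name(chr(cp), None) is not None)
    print(unicodedata.unidata_version, assigned)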
The snowmen are in Unicode because they existed in a character set before the Unicode standard was created. Unicode was deliberately created as a superset of all existing character sets at the time.
Some of the glyphs you mention are combining sequences, i.e. multiple code points combined into a single displayed character. So you add a gender modifier and skin color modifier to change the appearance. You don't add multiple code points.
It's your device rendering these 2-3 code point sequences as single icons/emojis.
> So you add a gender modifier and skin color modifier to change the appearance. You don't add multiple code points.
FWIW that's true for the skin colors (there are 5 Fitzpatrick scale modifiers, U+1F3FB to U+1F3FF), but it's not true for the gender: the basic gendered characters (e.g. U+1F468 "MAN", U+1F469 "WOMAN") were part of the original set "merged" from Japanese emoji, so the gender-neutral equivalent (e.g. U+1F9D1 "ADULT") was added as a separate codepoint.
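A small Python illustration (all four code points below are real assignments):

    man, woman, adult = "\U0001F468", "\U0001F469", "\U0001F9D1"  # MAN, WOMAN, ADULT
    dark = "\U0001F3FF"   # EMOJI MODIFIER FITZPATRICK TYPE-6
    s = adult + dark
    print(s)              # renders as one glyph on emoji-capable fonts
    print(len(s))         # 2: still two code points under the hood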
We are nowhere close to running out of code points. Unicode as currently defined has 1.1 million, but even that could be increased if there was a need. There isn't, since only 114 thousand are defined.
There are not separate code points for all combinations of genders and skin colors; the characters are made as combinations.
Things like skin tone variations are not defined as individual code points. They are sequences of code points that combine to make the full, customized glyph. So you have one code point for "medical", one for "professional", one for "female", one for "brown skin", one for "blond hair", and from that you get a more specific picture of a doctor..
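One real sequence of this kind is "woman health worker: medium skin tone", which chains a base character, a skin tone modifier, a zero-width joiner, a medical symbol, and a variation selector:

    doctor = "\U0001F469\U0001F3FD\u200D\u2695\uFE0F"
    # U+1F469 WOMAN + U+1F3FD skin tone + U+200D ZWJ
    # + U+2695 STAFF OF AESCULAPIUS + U+FE0F variation selector
    print(doctor)                               # renders as a single glyph
    print([f"U+{ord(c):04X}" for c in doctor])  # but it's five code points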
We already did! That's what happened when the 16-bit code space was exhausted, which was never the original plan. Just like how the IPv4 internet degraded into a mess of hacks once addresses ran short (like NAT), so too did Unicode start becoming wildly more complex.
Amongst other things, hitting the limit of 16 bits meant the introduction of:
- The concept of "planes"
- UTF-16 surrogate pairs
- UTF-32
- The newfound desire to encode emoji using combining characters, which means many apparently simple emoji are actually hacked together out of a mini programming language (e.g. black man = man emoji + skin tone modifier). The same goes for flags, which are actually two English letters mapped into a different part of the code space and then combined, e.g. the British flag is G+B.
It's one reason why emoji broke so much software. Before emoji, nobody cared about characters beyond the Basic Multilingual Plane and simply ignored them. Then emoji came along and broke everything that assumed one UTF-16 code unit == one character.
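A quick illustration in Python, which indexes strings by code point, versus the UTF-16 code units that Java or JavaScript would report:

    s = "\U0001F600"                        # 😀, outside the BMP
    print(len(s))                           # 1 code point
    print(len(s.encode("utf-16-le")) // 2)  # 2 UTF-16 code units (a surrogate pair)

    flag = "\U0001F1EC\U0001F1E7"           # REGIONAL INDICATOR letters G + B
    print(flag)                             # renders as the British flag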
1) There are only ~150k Unicode values defined. If we assume a signed int for the available space, we have 2,147,333,647 of 2,147,483,647 remaining, more so if the int is unsigned. We're fine.
2) They use values that combine, like ligatures, to create the variants: a color modifier value, a sex modifier, and then the underlying symbol. There isn't a combinatorial explosion, because it's not a unique symbol for each combination.
IPv4 ran down because everything needs an IP to be on the net and there are more humans than available addresses, and more gear than humans.
We don't need different characters per human, only to document existing languages and to account for the slow growth of modern hieroglyphs.
But character encodings don't limit the number of codepoints. Unicode is just a big list of correspondences between an integer and a glyph. There's no limit to how many integers you can assign.
Unicode encodings are separate standards that give correspondences between Unicode code points (integers) and byte sequences. If Unicode changes in a way that invalidates an encoding, that just calls for a new encoding.
Yes, it could technically be extended, but the transition would be a massive undertaking, so in practice the encodings do limit the number of codepoints. UTF-16, which creates the limitation, is very widely used and required by major programming language standards like ECMAScript. A lot of software still can't cope with codepoints outside the BMP, and they were established with UTF-16 in 1996.
Besides the difference between the abstract and unlimited Unicode and the encodings, our current "modern" encodings, UTF-8 and the new UTF-16 are artificially restricted and can be trivially expanded into a huge number of codepoints just by removing those restrictions.
New UTF-16? I'm only aware of the original 1996 one, which uses all of its 20 surrogate-pair bits for the codepoint (unlike UTF-8 which can use bits to extend to more bytes). In my understanding, "just" removing that restriction would mean completely replacing the encoding, like UCS-2 being replaced with UTF-16. The new one may have some overlap, but transitioning to it would still be a huge undertaking, and far from trivial (quite a few programs today still use UCS-2, quarter of a century after UTF-16 was introduced to replace it).
Unicode has been limited to 21 bits for a while so that UTF-8 is guaranteed to encode no more than four bytes per code point. UTF-8 could support a 31-bit code space, but changing now would break a lot of validation code.
Note that as it is currently defined, the Unicode codespace ranges from U+0000 to U+10FFFF, with some reserved codepoints (eg to encode surrogate pairs), yielding a total number of 1,112,064 assignable code points.
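The arithmetic, for anyone who wants to check it (Python):

    print(0x110000)          # 1114112 code points, U+0000..U+10FFFF
    print(0x800)             # 2048 surrogate code points, U+D800..U+DFFF
    print(0x110000 - 0x800)  # 1112064 assignable code points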
I find it completely implausible that this will ever change: the current size is baked in too heavily.
• The abomination UTF-16, which is distressingly popular, cannot possibly support it. Replacing UTF-16 would be a massive upheaval in many ecosystems (e.g. JavaScript, Qt, Windows), and there’s no real prospect of most of those environments moving away from UTF-16, because it’s a massive breaking change for them by now. Rather, if the code space were running out, they’d devise something along the lines of second-level surrogate pairs. (And then we’d curse UTF-16 even more, because it’d have ruined Unicode for everyone again.)
• All code that performs Unicode validation (which isn’t as much as it should be, but is still probably a majority) would need to be upgraded. Any systems not upgraded would either mangle or more commonly fail on new characters.
• UTF-8 software would also need to be adjusted, since it's artificially limited to the 21-bit space; and it wouldn't be just a matter of flipping a few switches here and there to remove that limit: there will be lots of small places that bake in the assumption that representing a scalar value requires no more than four UTF-8 code units.
While UTF-8 was originally defined as able to encode 31 bits, because of the limitations of UTF-16, RFC 3629 explicitly restricted the Unicode code space to 21 bits (or about 1.1 million codepoints).
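You can see that four-byte cap at the boundaries with any UTF-8 encoder, e.g. in Python:

    for cp in (0x7F, 0x7FF, 0xFFFF, 0x10FFFF):
        print(f"U+{cp:06X} -> {len(chr(cp).encode('utf-8'))} bytes")
    # U+00007F -> 1 bytes
    # U+0007FF -> 2 bytes
    # U+00FFFF -> 3 bytes
    # U+10FFFF -> 4 bytes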
I think the current approach is to just invent yet another "meta layer" of characters and declare that this particular sequence of bytes/codepoints/surrogate pairs/grapheme clusters/extended grapheme clusters/zwj sequences/whatever else you can think of has a special meaning and does not behave like you think it does. See also Henri Sivonen's essay on unicode string length [1]
So in a way, Unicode is already long past the time where you invent NATs and other hacks to buy you time with the scarcity problem.
According to UTS #51, as of Unicode 14 (and its ~140,000 allocated codepoints) there are under 3,500 codepoints classified as emoji.
And do keep in mind that #, or ®, are classified as emoji.
And incidentally, U+2654 "white chess king" (♔) was in unicode 1.0. The moral panic around emoji is really tiring, it's absolute, utter nonsense, every single time.
It is a proofreader's mark in languages with long words. The L-shape means "split the word here", and the same shape with the arrow-squiggle on top means "do it at the next syllable or not at all". For example, the words "YÖ-KLUBI" and "YÖK-LUBI" have different meanings. Source: I have seen Finnish proofreaders' marks.
Here's DIN 16511 https://www2.informatik.hu-berlin.de/sv/lehre/korrekturzeich... for anyone interested. Perhaps someone in Finland could dig further? It might be a bit strange to have proofreader marks for proofreading marks, but maybe something slipped in.
"oikolukumerkit" found an image with more than just the DIN referenced marks, but not much more.
Hm, now that you mention it, I always thought of the external link symbol as being a box with an arrow coming from inside it and protruding out of the upper right hand corner, but I don't see that symbol anywhere in Unicode, and I'm not sure why I have that association.
There is the U+1F517 link symbol but I'm not sure that's communicating the same thing.
To me, it looks like a symbol you would use to denote electricity present. I'd say it was meant to say that an electrical box or some other piece of infrastructure had electricity present. It could even be a non-standard symbol for a ground.
edit: the right-angle portion of it looks like the symbol for 3-wire 2-phase electricity used here: https://www.conceptdraw.com/How-To-Guide/qualifying-symbols . Yes, it is just a right angle, but I could see the electricity symbol being overlaid to indicate that it was an electrical symbol.
To quote a reply from the StackExchange thread above:
"So, they added a snowman with snow AND a snowman without snow , so that the weather forecaster of this world can avoid the dull snowflake , but we will never get our missing superscript q‽"
I don't understand why Unicode must (should?) contain superscript and subscript glyphs at all. The declared goal of Unicode is to encode all characters used by all languages, past and modern. Subscript and superscript are not used by any language as separate characters; they're a typesetting property. It should be solved by other means, not by character/glyph encoding.
Should Unicode include ALL characters struck out? Underlined? Double-underlined? Small-caps variants of all letters, for languages whose typographic tradition uses small caps?
And, BTW, what do you mean by "all letters"? Should Unicode contain sub/superscript variants of Hangul or Devanagari or letters from hundreds of other non-Latin-alphabet languages? So Unicode must be approximately tripled, bar the hieroglyphic part (and why shouldn't hieroglyphics be sub/superscripted)?
This is probably an edge case, but I work on lab software that uses chemical symbols, and having sub- and superscript characters saves lots of headaches. I can just store "CO₂" in a database, query it, and display it back as a simple string, or display values in scientific notation like 1,3×10³, without having to use any formatting.
But to be honest, I'm not sure what the parent comment wants to see added, because at the moment having all the letters from A-Z, the numbers from 0-9, and the plus, minus and equals signs as both subscript and superscript seems to be enough.
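For instance, in Python both of the examples above are ordinary strings built from existing code points, with no markup layer involved:

    co2 = "CO\u2082"             # U+2082 SUBSCRIPT TWO
    value = "1,3\u00D710\u00B3"  # U+00D7 MULTIPLICATION SIGN, U+00B3 SUPERSCRIPT THREE
    print(co2, value)            # CO₂ 1,3×10³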
Upper-case subscripts are missing, for one: I'm not allowed to talk about the normal force F_N in plain-text email. Superscript and subscript Greek letters would also be nice to have, e.g. in the context of relativity.
Why not Devanagari, then? This Europe-centric point of view bothers me.
Sure: As I mentioned in another comment, I'd add markers to enable arbitrary super and subscripting.
However, the question I responded to was asking what specifically people were missing in practice, and the examples I gave are things I personally would have used if they had been available.
> Should Unicode contain sub/superscript variants of Hangul or Devanagari or letters from hundreds of other non-Latin-alphabet languages?
Nope, you'd use markers similar to U+200E (LEFT-TO-RIGHT MARK) and U+200F (RIGHT-TO-LEFT MARK) that already exist to indicate text direction (which is also a typesetting property).
They are relevant because Unicode had to define the bidirectional rendering and not every rendering can be automatically inferred from logical (abstract) characters. Unicode has no reason to define the general text rendering including subscripts and superscripts, so there is no reason for Unicode to define control characters for them.
Unicode defines characters, their semantics and (very flexible) guidelines for rendering them. Unlike, say, bold, italic or super/subscripts, bidirectionality is an intrinsic property of those characters and can't be easily refactored.
Unicode specifically states that it doesn't define the semantics of characters. That would seriously interfere with its purpose of defining characters.
There are some notable exceptions, and they are acknowledged to be mistakes.
> Unicode specifically states that it doesn't define the semantics of characters.
The Unicode Standard explicitly says otherwise:
> Characters have well-defined semantics. These semantics are defined by explicitly assigned character properties, rather than implied through the character name or the position of a character in the code tables (see Section 3.5, Properties). [1]
> The Unicode Standard associates a rich set of semantics with characters and, in some instances, with code points. The support of character semantics is required for conformance; see Section 3.2, Conformance Requirements. [2]
To be fair, it refers to "character" semantics, which are more or less abstracted by character properties. It is not the case that, for example, △ U+25B3 WHITE UP-POINTING TRIANGLE can only ever be used for denoting triangles. But it has defined semantics in the sense that the character has the properties expected of such a symbol.
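Those properties are queryable, e.g. in Python:

    import unicodedata

    ch = "\u25B3"  # WHITE UP-POINTING TRIANGLE
    print(unicodedata.name(ch))           # WHITE UP-POINTING TRIANGLE
    print(unicodedata.category(ch))       # So (Symbol, other)
    print(unicodedata.bidirectional(ch))  # ON (Other Neutral)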
That's a cop out. You could equally say that new emojis shouldn't be added because you should use inline images for those. Or RTL markers shouldn't be added because you should use dedicated text styling for that.
There are a ton of places that don't support superscript markup.
> You could equally say that new emojis shouldn't be added because you should use inline images for those.
If emojis weren't allocated out of compatibility concerns, this would have been exactly my opinion from day 1. To be honest, I'm still not happy with the current emoji assignments and semantics. Not even the Unicode people are satisfied; there are numerous proposals for replacing emoji with something else (example keyword: QID emoji).
> RTL markers shouldn't be added because you should use dedicated text styling for that.
> There are a ton of places that don't support superscript markup.
Unlike most text attributes, bidirectionality is an intrinsic property of abstract characters and thus absolutely within Unicode's scope. Ideally you can't and shouldn't make an LTR character behave like an RTL character or vice versa. Bidi control characters only exist to correct automatic rendering, and can be represented out of band (the Bidi specification is explicitly designed with this use case in mind [1]).
> You could equally say that new emojis shouldn't be added because you should use inline images for those.
Well, that's really a better solution. Or a unicode character that allows you to set a pixel on a 256x256 grid and one to compose them. Strike that. Better not give anyone bad ideas.
Should we also have slanted, bold, semi-bold, light and underlined versions of every code point? Versions with/without serifs? For monospaced text? Those are all presentational matters. That we have super/subscripts in Unicode in the first place seems to have been just a hack to help terminal emulator software deal with obsolete encodings like ISO-8859-1: https://www.unicode.org/L2/L2000/00159-ucsterminal.txt
Those are intended for maths, not for formatted text. Variables in mathematics are usually a single character, so there is a great variety of ways to format the characters to create different symbols. Diacritical marks, underlines, etc. are also used for this.
Fair enough, but general formatting codes would overlap with what is already supported in rich-text formats like HTML or LaTeX. Unicode is a standard for encoding characters, it is not supposed to be a rich-text document format itself.
While I absolutely enjoyed the historical research on such a minuscule mystery, I also liked how it took me two clicks from the front page of HN into an occult eBook about "khaos magick".
It looks like someone asked for a glyph that would look like a chart with a downward-trending zigzag, someone else got the instructions and drew this thing, and the request proceeded, bundled with other requests, through the process with no one adequately challenging whether the glyph really looked like what it was supposed to look like.
And yeah, a downward zigzag on an x/y plot glyph actually would be useful to have.
Like "chart with downwards trend" added to Unicode 6.0 in 2010, 25 years after "right angle with downward zig zag" was proposed and included.
This is like the definition of legacy baggage. And somewhere there's probably someone who will argue that if the symbol is not present in a typeface, then said typeface is not "compliant".
Eventually Unicode will think, "Hey, maybe bold, italic and underline aren't just decorative, but required formatting which conveys emphasis, and other information that needs to be contained within the text itself!"
Or, maybe not and we'll continue to lose formatting every time we copy and paste and be forced to use plain text for the rest of our lives. Also, we can color our emojis now, but that WARNING text can't be in red. Because colors don't matter?
Whichever person decided basic formatting shouldn't be in the spec was wrong, and we lose important details every day because of it.
> When writing in languages such as Danish and Norwegian, where the empty set character may be confused with the alphabetic letter Ø (as when using the symbol in linguistics), the Unicode character U+29B0 REVERSED EMPTY SET ⦰ may be used instead
The former is probably for the same reason that both plus-minus and minus-plus exist. The latter is commonly used for the "unordered" relation in partially ordered sets.
When I was learning statistical hypothesis testing, I once wrote notes that looked like "H_0: mu ⋚ a <--> p-value: P(T(X) ⋛ T(a))", although I didn't include the equal-to bar.
I think he would have typed things left to right. He only wrote in mirror because it was more ergonomic for him, but there's no such issue with a keyboard.
It can be a symbol for polarization of electromagnetic (EM) waves, with Electrical and Magnetic fields moving orthogonal to each other [1].
Unlike other waves, such as sound waves, EM waves have a polarization component.
In wireless communication, for example, polarization can be used as another component for diversity to increase the performance of the communication channel.
U-237 exists and has a half-life of about six days; I can't think of a valid modifier that would add that C on the end, though. (Unless you're talking about a very specific isotopic composition of uranium methanide, I guess.)
Translation: Nothing came up on a Google search, and going to the library and looking in a book is hard.
I see this more and more often these days. Bloggers claiming that there is no known origin for something, or inventing their own histories based on nothing more than internet searches.
The internet is vast, but 99.9% of the world's history and information is not online for free.