The bottom emoji breaks rust-analyzer (fasterthanli.me)
200 points by todsacerdoti on Feb 13, 2023 | 139 comments



This is a bug in lsp-mode, a third-party Emacs package which is an LSP client. Starting with version 29, Emacs will include in its standard distribution a different LSP client package, eglot, which does not seem to have this bug:

https://github.com/joaotavora/eglot/blob/e501275e06952889056...


Ha, as soon as he mentioned LSP I knew it was going to be about the insane fact that it sends all text as UTF-8 (and Rust uses UTF-8) but uses UTF-16 code units for column indexes. (I assume that is what it is anyway; I did not make it to the end, sorry.)

They've had an open bug about it for years but I think most people just ignore the issue because to handle it correctly you have to convert to UTF-16 and back (again after VSCode has already done it) and it's just not really worth dealing with until Microsoft fixes their end.

Actually, I just went back and double-checked, and Microsoft did fix it last year! You can now negotiate to get positions in Unicode code points instead. Rejoice!
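For anyone curious what that negotiation looks like: a minimal sketch in Rust using the serde_json crate (an assumption on my part; the `general.positionEncodings` / `positionEncoding` field names are from my reading of the LSP 3.17 spec):

  // The client advertises the encodings it supports in `initialize`,
  // and the server picks one in its InitializeResult.
  use serde_json::json;

  fn main() {
      let client_capabilities = json!({
          "general": {
              // In preference order; "utf-16" support remains mandatory.
              "positionEncodings": ["utf-8", "utf-32", "utf-16"]
          }
      });
      let server_capabilities = json!({
          "positionEncoding": "utf-8"
      });
      println!("{client_capabilities}\n{server_capabilities}");
  }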


Except, as you'll find people in these very comments arguing, if you don't implement utf16 anyways then you're not a compliant LSP.

Don't forget that UTF-16/UCS-2 is Embrace, Extend, Extinguish at its finest. Literally the whole point of UTF-16 - not necessarily the explicit one, but the point behind the motivations people had for choosing what they chose - was incompatibility with other software supporting Unicode back in the day.


So this all justifies my general stance of "find a Unicode expert" when questions of unicode come up. But that's pretty wilfully blind (although practical).

I would like to understand why UTF-32 didn't catch on as The Standard Unicode for the modern world. It seems that - albeit memory wasteful - it would sidestep a lot of these issues.


> why UTF-32 didn't catch on

> memory wasteful

The answer is in the question, really. If you've got a big pile of mostly-ASCII data, quadrupling memory/storage to encode it as UTF-32 is going to be a pretty tough sell.


How many "big piles of mostly-ascii data" are there though? (Does anyone want to write a script which searches /dev/mem and categorises pages of RAM into ascii-or-not-ascii so we can get some meaningful numbers? :P )

(If you’re doing number-crunching on giant CSVs, maybe I can see it being important, but all the ascii files on my desktop that I can think of are pretty trivial)


> How many "big piles of mostly-ascii data" are there though?

Well, the use case mentioned in the article is a pretty good one: program source code. Even if you're going to be writing in a foreign language, all of the fancy punctuation and whitespace that does useful stuff in the language ends up being ASCII, and a good hunk of the standard library is likely to have ASCII names for types and functions, etc.


>How many "big piles of mostly-ascii data" are there though?

You just posted this to one.


Hacker News is a super-text-heavy site, but even with this extreme example, the favicon alone means that 20% of the page weight is binary data. For any normal site, binary data outweighs ASCII by several orders of magnitude. In either case though, we’re talking about a few kilobytes, which I wouldn’t consider “big” — like even if you wanted to write an HN reader app for your esp32 microcontroller, HN being served in UTF32 instead of UTF8 probably wouldn’t be the biggest obstacle :P


> This however is not the case when Japanese text is mixed with ASCII control structures. For instance XML or HTML documents include enough in-line control data that is in the ASCII range that UTF-8 becomes more efficient as a format compared to UTF-16 (before compression). For instance the front page of the Japanese Wikipedia is 92KB in UTF-8 and 166KB in UTF-16.

https://lucumr.pocoo.org/2014/1/9/ucs-vs-utf8/

The favicon btw is cached and amortized across all HN pages whereas the text is not.

I forget where I read this, but about a decade or so ago I remember reading a paper or watching a video (maybe from the Azul folks?) that looked into JVM memory usage; a good chunk of it was strings, and the UCS-2 encoding was a problem. That’s why even languages that nominally have UTF-16/32 as the native type will frequently auto-detect and special-case Latin-1 strings (Python, JS, etc). The other piece of it is that strings are copied around more and processed differently from images. The knock-on effects of UTF-32 can be quite unfortunate (i.e. rendering your HTML document is meaningfully slower, which you care about even though by weight your images take longer to transfer and show).


> That’s why even languages that nominally have UTF-16/32 as the native type will frequently auto-detect and special-case Latin-1 strings (Python, JS, etc).

I feel that this is worth a blog post in itself. I remember years ago comparing Go and C# when processing some large mostly-ascii files. The C# program was faster, much to my surprise, despite storing strings natively in UTF-16. (I don't remember the implementation details, so it may have been an artifact of my implementations).


Pretty much every website, due to HTML being ASCII.


I feel like compression will do a better job than deciding on an encoding scheme ahead of time, no? Once gzipped, I wouldn't expect a difference.


I thought I'd read something about it, but when I googled, what I did find was this old HN comment: https://news.ycombinator.com/item?id=8514519

> UTF-8 + gzip is 32% smaller than UTF-32 + gzip using the HN frontpage as corpus.


In theory, yes. In practice, no. At my previous job, I wrote a short Python script that took /usr/dict/words and gzipped it, and also converted it to UTF-16LE (inserting a null byte before every character) and gzipped that. The information content is the same, but the compressed UTF-16LE ends up a bit bigger. IIRC, the difference was more than 1%, but less than 20%.

My use case was to show the flaw in the logic a colleague was using to assert that gzipped JSON should be the same size as gzipped MessagePack for the same data, because the information content was the same. It was a quick 5-minute script without having to deal with coming up with a suitable JSON corpus to convert to MessagePack.

Among other things, the zlib compression window only holds half as many characters if your characters are twice as big.


For anyone still reading, out of curiosity, I reran the experiment on my Debian box:

  $ </usr/share/dict/words  gzip --best | wc -c
  261255
  $ </usr/share/dict/words iconv -f utf-8 -t utf-16le | gzip --best | wc -c
  303404
A bit over a 16% size increase from converting the "wamerican" dictionary to UTF-16LE and then compressing.


For the large dictionary it’s a tiny bit worse, but still rounds to 16%:

  $ < /usr/share/dict/american-english-insane gzip --best | wc --bytes
  1778330
  $ < /usr/share/dict/american-english-insane iconv -f utf-8 -t utf-16le | gzip --best | wc --bytes
  2061457


You still have to decompress it on the other end, to actually parse and use it. At which point you have four times the memory usage, unless you turn it into some smaller in-memory encoding...such as UTF-8.


Most encoders let you define filters that restructure the data using out of band knowledge to achieve better rates. For example, if you have an array of floating point numbers, rearranging the exponent and mantissa can yield significant savings if you can arrange for a consecutive run of each separately because the generic compressor doesn’t know anything about the structure of the data. Compressors are great but out of band structural compression/reorganization + compressor will always outperform compressor alone.
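A minimal sketch of such a filter, assuming f32 input (the split point and the function name are my own invention): separate each float's high byte (sign plus most of the exponent) from the rest, so similar-magnitude values produce a long homogeneous run for the compressor.

  // Losslessly split f32s into two streams that each compress better
  // than the interleaved original: the compressor sees runs of nearly
  // identical "hi" bytes instead of structure it knows nothing about.
  fn split_floats(data: &[f32]) -> (Vec<u8>, Vec<u8>) {
      let mut hi = Vec::with_capacity(data.len());
      let mut lo = Vec::with_capacity(data.len() * 3);
      for &f in data {
          let b = f.to_bits().to_be_bytes();
          hi.push(b[0]);                 // sign + top 7 exponent bits
          lo.extend_from_slice(&b[1..]); // last exponent bit + mantissa
      }
      (hi, lo) // compress each stream separately, then concatenate
  }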


I thought HTML5, at least, was UTF-8 by default?


It is, for the exact reason why I brought it up: if it were in UTF-32, it would take up much more memory.

(UTF-8 is compatible with ASCII, but even if it was stored in some other encoding, conceptually it could be in ASCII, you know?)


If we’d design computing technology from scratch today, we might be using 32-bit bytes, or maybe even 64-bit ones. If memory usage is not a concern, there’s really no need to have smaller units.

Our world however runs on 8-bit bytes, so it makes some sense for text to be based on that.

But also, consider Base64 in UTF-32-encoded JSON. ;)


JavaScript, HTML, XML, JSON, and source code in general of all languages, to name a few.

Also it's important to look at the time period. The farther back in time you go the larger a percentage of all data was designed for direct human consumption. (This is why things like binary coded decimal existed over binary.)


> How many "big piles of mostly-ascii data"

Most databases. It might be compressed on disk, but no DBA wants all their column lengths quadrupled.


It doesn't matter, but programming pop culture says it's better to use a "fast" implementation rather than a "slow" one, regardless of what you're doing and what you need.


UTF-32 isn't even guaranteed to be a single code point.


No, UTF-32 code units are Unicode scalar values¹, always.

They are not grapheme clusters, such as the "family: man, woman, boy" emoji from TFA.

¹which is approximately what I think you're saying here. I.e., you're trying to say that a code point might span multiple UTF-32 code units; that is not correct. (It should be simple to see how a code point, which has the range [0, 0x10FFFF], can always fit into a u32.)
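A quick Rust illustration of the distinction (the escapes spell out the family emoji from TFA):

  fn main() {
      // man, ZWJ, woman, ZWJ, boy: five scalar values, one glyph
      let family = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F466}";
      // Each Rust `char` is one Unicode scalar value, i.e. exactly one
      // UTF-32 code unit, so this is also its length in UTF-32 units:
      assert_eq!(family.chars().count(), 5);
      assert_eq!(family.len(), 18); // and 18 bytes in UTF-8
  }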


*grapheme cluster


One of the root problems here is that a concept like "character" is extremely underdefined. You might want to count the memory a string will take up, indicate a particular spot in the string, map to arrow keys or the backspace key in a visual program, know how far to indent to drop a caret in a visual error report, or know the width of the string in a visual (especially fixed-width) text display. For ASCII text, you can use the same value to represent all of these slightly different definitions, but in non-ASCII, you need different definitions, and there's no one true definition that solves all use cases (no, not even grapheme clusters).


UTF-8 has no upper limit on the number of possible characters / emojis, now or in the future.

Everything else has.

And then there is UTF-16, which has all the pains of UTF-8 with none of the advantages of UTF-32.


That's not quite right. UTF-8 is not arbitrary length.

Officially, it's at most four bytes, of which 21 bits are usable for encoding codepoints - so that's an upper limit of 2^21 codepoints.

There is an initial byte encoding the length as a series of ones, so if you went ahead and extended the standard to simply allow more bytes, you could get up to 8 bytes, of which 48 bits would be usable.

I can see that a six-byte version with 31 data bits was previously standardised before they settled on four.

I guess you could extend it further by allowing more than one initial byte encoding the length, then it would be arbitrary length. But at that point I'm not sure if it loses its self-synchronising ability, and in any case it would be a different standard at that point.


> if you went ahead and extended the standard to simply allow more bytes, you could get up to 8 bytes

I think you'd only be able to go up to 7, since 10xxxxxx is still reserved for trailing octets. And even with 7, the entire first octet is consumed by the length indicator alone.

So you get 0xxxxxxx, 110xxxxx, 1110xxxx, 11110xxx, 111110xx, 1111110x, and 11111110 as the 7 different length-indicating head octets. In the last case, you'd have 36 usable bits for encoding a codepoint.


Ah, I forgot 10xxxxxx was not usable, but I also forgot 0xxxxxxx was. What about 11111111? If that's valid then it's 8, if I'm thinking straight.


11111111 is technically possible to use, but it would cause some problems. Sending it over the wire would break telnet, for example. Also since we already introduced 11111110 for 7-byte encodes, we're getting dangerously close to making the UTF-16 BOM character (11111111 11111110) accidentally show up in UTF-8 (this is also why 11111110 wasn't in the original maximum-6-byte UTF-8 spec). I still don't think it's possible to have the UTF-16 BOM show up in our hypothetical extended UTF-8, since 11111111 could never be immediately followed by 11111110 (or vice versa) in a well-formed UTF-8 stream.

Also note that if you did add 11111111 as a valid head octet representing an 8 octet long encoding, you'd still only have 42 usable bits (since the first byte is still entirely consumed by the length indicator)
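If it helps to see the arithmetic, here's a small Rust sketch for this hypothetical extension (the function is made up; `u8::leading_ones` is real std), matching the 36- and 42-bit counts above:

  // (sequence length in bytes, usable payload bits) for a head octet.
  fn sequence_info(head: u8) -> (u32, u32) {
      match head.leading_ones() {
          0 => (1, 7), // 0xxxxxxx: plain ASCII
          n @ 2..=8 => {
              // Bits left in the head after the run of 1s and its
              // terminating 0 (zero once the head is pure length).
              let head_bits = 8u32.saturating_sub(n + 1);
              // Each continuation byte (10xxxxxx) carries 6 bits.
              (n, head_bits + 6 * (n - 1))
          }
          _ => unreachable!("10xxxxxx is a continuation, not a head"),
      }
  }

  // sequence_info(0b1111_0000) == (4, 21)  standard UTF-8 maximum
  // sequence_info(0b1111_1110) == (7, 36)
  // sequence_info(0b1111_1111) == (8, 42)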


UTF-32 doesn’t completely solve this either, because grapheme clusters.


And endianness.


To be fair, UTF-32 is actually much better about this than most other encodings for which endianness issues are even a thing. The top byte of any UTF-32 code unit has to be zero, and NUL isn't a valid (i.e. text) character, so any ASCII characters anywhere in the file make the endianness unambiguous.

More generally, every code unit in the file has to have the form 00xxyy00, and the possible values for xx and yy must be in the range 0 thru 16, so there are only 17*17-1 = 288 unicode code points that can possibly occur in an endianness-ambiguous string/file.

And out of those, you have confusions like "Ā" (U+0100 LATIN CAPITAL LETTER A WITH MACRON) versus "𐀀" (U+10000 LINEAR B SYLLABLE B008 A) or "𠀀" (U+20000) versus "Ȁ" (U+200), most of which can only happen if you have no idea what language you're expecting. (Some, like "𐄀" (U+10100 AEGEAN WORD SEPARATOR LINE), are the same regardless.)

Whereas here's a valid bash script:

  ⌀℀ ⼀戀椀渀⼀戀愀猀栀਀猀甀搀漀 椀搀਀⌀ 2>/dev/null || true
  echo "Hello, World!" # ਀
(You can almost do this with C/C++ as well, but something needs to #define "⼀⼀" as "int" for it to work properly, and then only if the compiler accepts non-ascii characters (namely "⼀" aka U+2F00 KANGXI RADICAL ONE) in identifiers.)


Since Unicode only uses 21 bits, it would be possible to define an endianness-safe 32-bit encoding. E.g. shift left by one and set the LSB.
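A sketch of that idea in Rust (to be clear: a hypothetical encoding, not any real standard):

  // A scalar value is at most 21 bits, so (cp << 1) | 1 fits in 22:
  // the low byte of every code unit is odd and the high byte is zero,
  // so any single code unit reveals the byte order.
  fn encode(cp: char) -> u32 {
      ((cp as u32) << 1) | 1
  }

  fn decode(unit: u32) -> Option<char> {
      let unit = if unit & 1 == 1 {
          unit              // already in our native byte order
      } else {
          unit.swap_bytes() // low byte is 0x00: stream is byte-swapped
      };
      char::from_u32(unit >> 1)
  }

  fn main() {
      let u = encode('🥺');
      assert_eq!(decode(u), Some('🥺'));
      assert_eq!(decode(u.swap_bytes()), Some('🥺')); // either order works
  }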


oh, _bother_. :-/


It's pointless to have char32_t if you still need to pull several megabytes of ICU to normalize the string first in order to remove characters spanning multiple codepoints. UTF-32 is arguably dangerous because of this; it's yet another attempt to replicate ASCII, but with Unicode. The only sane encoding out there is UTF-8, and that's it. If you have to always assume your string is not really splittable without a library, you won't do dangerous stuff such as assuming `wcslen(L"menù") == 4`.
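To make the trap concrete, a small Rust example (both literals render as "menù"; one uses precomposed U+00F9, the other "u" plus combining U+0300):

  fn main() {
      let nfc = "men\u{f9}";   // 4 scalar values
      let nfd = "menu\u{300}"; // 5 scalar values
      // Same text to a reader, different lengths to the machine, in
      // any fixed-width encoding: counting code units of any width is
      // not counting characters.
      assert_eq!(nfc.chars().count(), 4);
      assert_eq!(nfd.chars().count(), 5);
  }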


This all justifies my stance of "don't add complexity unless you absolutely need it."

People are surprised, confused, and sometimes even offended by the fact that I do almost all my work with a plain text editor and a terminal. I have the same sentiment towards those who insist upon large complex fragile stacks of tools and then wonder why they spend so much time chasing down bugs in those rather than working on what they actually intended to.


This justifies my stance of "find the most normal setup and use it".

This is a bug in the less popular third-party LSP package for Emacs, which is already quite unpopular.

I use VSCode, an enormously complex system. But so many other people use it there tends not to be this sort of bug. And in the rare case there is one I just wait a few days until someone else solves it.


In this particular case, VSCode doesn't have the bug... because it's responsible for this mess in the first place. LSP was first introduced in VSCode, which is written in C++ and TypeScript, so they found it convenient to assume a UTF-16 encoding (as is native to JavaScript). If LSP had originated elsewhere, it'd probably use codepoint or UTF-8 byte offsets from the start.


Exactly. On a moral level that's obviously bad. But on a practical level it supports my decision to use VSCode - they got to define the standard such that everyone else has to work around their mess, giving themselves fewer problems.


Isn't the problem here that both UTF-8 and UTF-16 were being used at once, with incorrect conversion between offsets? I don't see how adding another encoding would help here.

Sure, if everyone used UTF-32 for everything then these problems would go away, but they would also go away if everyone used UTF-8, and most uncompressed files would be 4 times smaller.


To be fair, other operating systems took a while to get unicode right.


I gave up trying to read this. Where was the bug?


The LSP protocol sends indexes. Insanely, those indexes are in terms of UTF-16 code units. Emacs's LSP client implementation here is sending the wrong index: 8 for the emoji's index, but 9 for the index of the next "character". But an emoji spans two UTF-16 code units, so the next index is 10.

Rust-analyzer simply crashes here, but it's been fed hot garbage by the editor. One might argue it shouldn't crash. TFA digs into the details around that, too, because Amos leaves no stone unturned.
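For anyone wondering what a compliant client has to compute, a minimal sketch in Rust (the line and function are made up for illustration, not taken from the article; `char::len_utf16` is real std):

  // UTF-16 column for a character index into a UTF-8 line.
  fn utf16_col(line: &str, char_index: usize) -> usize {
      line.chars()
          .take(char_index)
          .map(char::len_utf16) // 1 for BMP scalars, 2 for e.g. 🥺
          .sum()
  }

  fn main() {
      let line = "let s = \"🥺\";"; // the emoji is the char at index 9
      assert_eq!(utf16_col(line, 9), 9);
      // The char after the emoji is 2 UTF-16 units later, not 1,
      // which is exactly the off-by-one lsp-mode gets wrong:
      assert_eq!(utf16_col(line, 10), 11);
  }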


I imagine this "UTF-16 code unit indexing" decision is just an artifact of the fact that that's how JavaScript works with strings, and LSP comes from VSCode.


Oh, TFA mentions exactly that.

… but still, encoding your language's idiosyncrasies into the protocol is … poor design. This bug was inevitable with such a choice (although even a UTF-8 byte offset, or a scalar value offset would probably be similarly fraught with error, but UTF-16 seems like begging the universe for it), though the actual conclusion here was a bit different than I thought it was going to be.

(And yes, I know other JS UTF-16 idiosyncrasies made their way into the very fabric of JSON … and those are ugly too.)


Nitpick: lsp-mode is not “Emacs's LSP client”. Emacs recently chose to include “eglot” as part of Emacs itself, and eglot must therefore be considered to be Emacs’ official LSP client, not “lsp-mode”.



Small mistake.

> High surrogates are D800-DB7F

Akshually, high surrogates extend all the way to DBFF.

https://unicode-table.com/en/blocks/high-surrogates/
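Quick std-Rust check of where the pleading face itself lands:

  fn main() {
      let mut buf = [0u16; 2];
      '\u{1F97A}'.encode_utf16(&mut buf);
      // 0x1F97A - 0x10000 = 0xF97A; the high 10 bits go to D800 + 0x3E,
      // the low 10 bits to DC00 + 0x17A:
      assert_eq!(buf, [0xD83E, 0xDD7A]); // D83E is indeed <= DBFF
  }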


story of my life with emacs.


Somewhat related, Dropbox fails to sync with certain emojis.


Misleading headline.

> the actual bug: let's add to our code.. an emoji! Any emoji.

> rust-analyzer adheres to the LSP spec. And lsp-mode doesn't.

So “emojis break an Emacs extension”.

Though for me the real takeaway is that LSP specifies UTF-16 offsets. That sounds unpleasant to work with.


> Though for me the real takeaway is that LSP specifies UTF-16 offsets. That sounds unpleasant to work with.

Speaking as someone who has written an lsp server: yes, it is indeed unpleasant to work with.


Thanks. For people looking for the bug, see https://github.com/emacs-lsp/lsp-mode/issues/2080


> Though for me the real takeaway is that LSP specifies UTF-16 offsets. That sounds unpleasant to work with.

It is beyond time for UTF-16 to die. And yet it looks like we're stuck with it for the foreseeable future.


We'll be stuck with UTF-16 as long as we have Unicode, it would seem. The character set itself has to have a hole in the middle of it for the sole purpose of working around UTF-16 limitations.


First time I've heard "pleading face" referred to as "the bottom emoji"


I chuckled at the headline, assumed it would be 'peach', and now I'm just confused? The article doesn't seem to address it, except to link https://unicode-table.com/en/1F97A/ as 'U+1F97A Bottom Face Emoji' (which that page doesn't say at all), so it seems very deliberate; must be some joke we're not in on (we're not on the 'right' subreddits, or whatever), I suppose.


In communities that enjoy sexual power play you have a top/dominant and a bottom/submissive.

This is particularly prevalent amongst trans people on Twitter (where I see it used a lot): you see a lot of people using this emoji under, for example, a powerful-looking selfie of someone who appears dominant, as a playful offer of submission.


That's kind of weird, to label it based on sexual roles by default. How would they go about explaining that label to a child? How about 'Puss in Boots face'?


The emoji has another name, “pleading face”. This is basically how it came to be used by bottoms. Tops don’t plead, but bottoms do.

There’s lots of emojis that the culture has given alternate meanings to, for example the peach and the eggplant.

You could ask “how would I explain this to a child” to anything adults talk about that is sexual in nature. Usually the answer is “don’t.”

I don’t think TFA was written for children, and the author wanted to include this common internet meme in their article title.


Perhaps the author assumes that a child with a deep interest in Rust Unicode edge cases has likely been on the internet before, and may well have been exposed to the existence of sex?


On the other hand, if anyone has a good way to explain variable-length integer encodings with constant-complexity backtracking to a child, please let me know.


Every sentence starts with a capital letter, and ends with a full stop. Now, imagine that every sentence is at most four words long. It might be one, two, three, or four words, but it won't have five.

Imagine you drop your finger randomly on a word. How can you find the start of the sentence it's in?

(After this, if the child were familiar with binary, I'd show the actual representation of UTF-8, perhaps colour-coded. It's really quite intuitive. No need to go for the abstract straight away: if the child can generalise, they can generalise, and if not, there's no point making it artificially confusing.)
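And the grown-up version of "find the start of the sentence", as a sketch (the function name is mine):

  // From an arbitrary byte index, back up over continuation bytes
  // (10xxxxxx) to the start of the current character. At most three
  // steps, hence constant-complexity backtracking.
  fn char_start(bytes: &[u8], mut i: usize) -> usize {
      while i > 0 && bytes[i] & 0b1100_0000 == 0b1000_0000 {
          i -= 1; // mid-"sentence": keep walking left
      }
      i
  }

  fn main() {
      let s = "a🥺b".as_bytes(); // the emoji occupies bytes 1..=4
      assert_eq!(char_start(s, 3), 1);
  }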


> How would they go about explaining that label to a child?

You wouldn't. Just like you probably wouldn't explain the sexual meanings behind the eggplant or peach emojis to a child.

Not sure why this needs to be a consideration. If you're writing for an audience that includes children, sure, use child-friendly terms and concepts. If you don't care about including children in your readership, go nuts.


But, to explain the categorical meaning of the term, you have to lean on the sexual meaning. That is a pretty big difference. And unnecessarily scopes the audience to people that wouldn't be offended by this. Especially when ⊥ exists. Why is that not available as an emoji?

Granted, I don't think everyone needs to be prudish, such that this is a bit of a tempest in a teapot. But it is very different than claiming that explicitly sexual reframing of other items is the same thing.


No no, that's why some people here are confused and out of the loop - the actual unicode name is not 'bottom face', but 'pleading face' or 'face with pleading eyes' (I'm not sure of a good reference for 'actual' names, got those from two different unicode dictionary type sites).


[flagged]


Top/bottom don’t even have anything to do with gender, and it wouldn’t matter if they did.


It's a fucking pleading face and they made it about sexual roles for no good reason... It's stupid.


The author didn't make it about sexual roles; it was already about sexual roles for a particular in-group that the author is a part of. The fact that you (or I) didn't know about it until now is irrelevant.

I'm sure there are plenty of seemingly-mundane ways that you or I communicate that others might find "stupid" (I know I'm consciously trying to begin fewer sentences with "I mean"). So what? Getting worked up about something like this, enough to post about it on an internet forum, seems a bit much.

It feels like you have some conscious or unconscious axe to grind (or at least some high level of discomfort) with people who have different sexual "culture" (for lack of a better word) than you do. Maybe ask yourself why?


Every meme is stupid. That’s the point.

Also, pleading is an actual part of this very common sexual dynamic. If you’re just made uncomfortable by sex, that’s fine, but you don’t have to turn that into judgment of people who aren’t.


> If you’re just made uncomfortable by sex that’s fine but you don’t have to turn that in to judgment of people who aren’t.

You accuse me of making assumptions and then make your own incorrect ones.

I find it annoying when people weave their personal non-technical agendas into technical discussions. I can anticipate your response now... there is no agenda. But go read the article and you will notice it is not even specific to this one emoji.

[edit]

Keep up the good work mr/ms downvoter, I've got a lot of points to burn and no shop to spend them in. Good to know I'm pissing off one person who is chronically offended by my skepticism.


> I find it annoying when people weave their personal non-technical agendas

This is the problem with people like yourself: just because someone makes a reference to something prevalent in their subculture in normal writing or conversation, you get all worked up and complain that they're trying to "push an agenda".

No, they're just speaking using their own subculture's jargon. I don't get why you seem to need to label it as something nefarious.

While I was aware of the sexual connotations of "top" and "bottom", I didn't know about this emoji's "alternate name" or its use here. It didn't detract from the article at all for me, and as a bonus, I learned something new about how people different from myself communicate sometimes.

About the only thing I can say in "agreement" is that sure, communicating in this manner can cause confusion and make such communication less clear. But the author probably doesn't particularly care about that, and has no obligation to do so.


> Keep up the good work mr/ms downvoter

It’s they/them downvoter actually.

But seriously, as far as I can tell it’s not possible to downvote people who have replied directly to me. It’s just other people downvoting you because what you’re saying is unpopular.

You’re certainly not pissing me off. That’s why I’m trying to calmly explain your errors. But it sounds like you’ve made up your mind and take disagreement as some kind of encouragement.


Yes, I know it's not you.

> But it sounds like you’ve made up your mind and take disagreement as some kind of encouragement.

I do find disagreement from silent onlookers with nothing to contribute irritating, yes. Life is too short to fear controversial or unpopular ideas, so I take it as encouragement. I'm happy to change my opinion in the face of compelling arguments, not thoughtless people.


If it were a "fucking pleading" face, it would pretty explicitly be about sexual roles.


> fucking pleading face...about sexual roles

Exactly.

(also, you are so worried about the kids, why are you making sex about anger in your anger over sex?)


I'm not worried about kids, and I don't care about sex, I'm making a point that this has been arbitrarily sexualised for no reason. Obviously you wouldn't give it that label by default, so what is the author trying to do here? This is supposed to be a technical problem.

I also find it funny that pointing out something is a weird and distracting choice is considered angry, and I can only assume by the onslaught of downvotes, uninclusive or something... or maybe my "what if" has been interpreted as a "think of the children", hard to know when people don't bother forming an argument.


The pattern: It's a fucking $X and they $Y

Idiomatically signifies exasperated frustration/anger in many contexts.

My entire comment was just a play on words, pointing out that the word "fucking" means "having sex" (I know it's one of those words with a ton of shades and nuance, so there are lots of uses for it, but for the play on words let's be literal; also easier when the context is sexual to begin with). When read in that light, "a having-sex pleading face and they made it about sexual roles" is amusing, no?


Fair enough lol, lost in the medium of text I think. "Fuck" is, funnily enough, one of, if not the, most adaptable English words.


People put references to hobbies/interests/trends in technical problem discussions all the time; that's what makes a shared culture. Look at e.g. "yeet" in the Rust expression discussions, or cheese/spam/etc. mentions in Python.

Why do you think this should be treated differently when that hobby/interest/trend is sexual? Why do you think something sexual is "weird/distracting" in a way that other interests aren't? Given that you explicitly said "How would they go about explaining that label to a child?", complaining that you're not making a "think of the children" argument and don't care about children seems more than a little disingenuous.


[flagged]


whoosh


[flagged]


Because your comment was ad hominem. I didn't flag it, but this is generally considered counterproductive to interesting discussion on HN and in general.


Oh I see. Yes, it does seem a bit odd, but it's their blog to do what they want with, I suppose; they don't need to explain it to a child. Also it's possibly just so familiar to them that they didn't think about it, or they don't know it as anything else.


Oh won't someone please think of the children reading articles on Rust internals??


It seems speaking of children is a trigger. My point is that it has been arbitrarily sexualised and doesn't seem relevant to the article.


Imagine my surprise in Latin American places where activo/pasivo is exclusively used to describe top/bottom. I kinda have to assume that those words mean more than their cognates in context, or else I kinda feel bad for gay Latin communities being shoe-horned into archaic roles.


[flagged]


The term “perverts” implies this behavior is abnormal, but it’s very common. In the USA, for example, the dominant culture pretends this doesn’t happen and encourages repression of these desires. I’d rather adults be free to express themselves in healthy ways than judge and label them as “perverts” despite their feelings being common and normal.


I don’t mind, but it’s definitely okay for people to be offended by the use of (what seems to be?) BDSM fetish community terminology. I don’t think this is a common term. If I referred to the cat face emoji as the pussy emoji or the cunt emoji in a Rust programming language article, I’d expect some pushback.


I think it also depends on the social group/context. In some places the word bottom has kinda lost some of its "sexual" connotation and become just a "fun" word for submissive/unassertive.

(To be clear, I also think it's okay not to like this kind of language.)


Well, I'd rather that creepy males didn't parade their sexual fetishes everywhere and pretend that it's totally normal and fine to do so.


It's used by lesbians just as often and giving vs receiving has nothing to do with fetishes. You can have the most vanilla sex imaginable and still have a top/bottom.


You really created a new account just to post a rude comment like this?



It’s a very weird choice of descriptions in this context, since it is neither about the domain in which that meaning applies, nor, even, anything really specific to that emoji in particular. It’s exactly like unnecessarily dropping U+1F346 AUBERGINE into a conversation that applies to any emoji…and calling it “the male genitalia emoji”.


Never before has a programmer joked about getting screwed by a bug, the audacity!


Was I the only one looking for ⊥?


You were not.


I expected something about type lattice representation.


Yes.


There was a discussion on HN recently about it: https://news.ycombinator.com/item?id=34454165


It's common in gay circles


Interesting. Is there a corresponding "top" emoji?


Tops are considered less likely to use emoji from my understanding. They're supposed to be more stoic.



Appropriately, it doesn’t exist


I appreciate the subtle humor here, but I suspect most of the HN crowd won't quite get it.


There's 𓂸, but I guess it's not an emoji.


First I rolled my eyes when some HN users had to explain to other HN users what that “puppy dog eyes” thing meant. But I felt a bit worse when I first heard that name for it, being used as if it was canonical.


Take it as a sign that you need to hang out with more diverse crowds!


I'm not really sure why I should take it as a sign of that. What crowd does this indicate I don't hang out with?


Queer people. This is an easily-inferred reference for anyone who's decently-good friends with at least one gay guy or lesbian.


I have a handful of very close friends who are gay, but I've never heard of this bottom-face emoji thing before today.

And I'm a little concerned why not knowing about something that seems like an esoteric piece of fetish-related in-group communication implies that I don't hang out with queer people? Is there an expectation that in order to hang out with people belonging to some group, I need to learn specific lingo regarding that group's sexual power dynamics?


Yes, otherwise you're a bigot trying to erase queer people. /s


[flagged]


That comes off as kind of unfair and misplaced when a person is not part of a particular group, nor participates in activities common to it. We may only see or know a particular side of a person. We can see people in certain settings for many years, yet not know various things about them, or they don't show aspects or share various details about themselves.

It's one thing to ask or expect others to be open-minded or have empathy for things outside of their regular circles; it's another to presume they must know details about cultures and subcultures they aren't a part of, or even condemn them for genuinely not knowing about something or asking.

It's alright to not know about everything, as nobody does, and ask questions as they come up. For those concerned, they can explain the history, details, or their views. It's a normal process of learning.


LGBT+ BDSM enthusiast.


Hacker culture is queer now, and queer, kinky (but SSC) sex is an integral part of that.

If it helps any, its official name is U+1F97A, FACE WITH PLEADING EYES.


That's an awful lot of time spent talking about your Emacs config...


Agreed, but I personally enjoyed the detour into "wow, our tools have terrible UX" territory. Totally get that some people would find it superfluous, but I thought it was fun.

I'm a vim user, and find the landscape to be pretty bad there too; it's nice to see that emacs is no better (and IMO worse, based at least on this one example).


Do they have terrible UX, though?

I'm also a vim user as my primary editor. My .vimrc does very little beyond the default .vimrc. I don't want to spend any time configuring my tools, I just want them to work. And that's what I get with vim, on any Linux system out there: a vim that works pretty much exactly the same as the config I use on my own machine and know well. If it's not already installed (and it often is), vim is just a package manager call away.

If you really think your tools have terrible UX, find/build better tools?

I'm not convinced time spent in my dotfiles isn't just time wasted. I don't want to hack my tools, I just want my tools to work. And vim does. Most of the tools I've come across for vim are already in vim with good enough UX. The exception is language-specific tooling for stuff like highlighting compiler/linter errors or failing tests, but it's a massive amount of work to get that stuff working, and I'm not sure that pays off when I can just run that stuff from the shell. Sure, it takes a second to switch to another command line tab to run the tool from shell, but how many 1-second switches does it take to add up to 4 hours spent debugging my .vimrc? It's not worth it.

And, by the way, I'm not throwing any shade here. If you like fiddling around with your config files, that's an entirely valid reason to spend any amount of time that you want, fiddling around with your config files. Do what you enjoy--you don't need my blessing, but you have my blessing.

Just be honest with yourself about why you're doing it, and do it on your own time. If you need to complete a task in a timely manner, it's extremely unlikely that any step in completing that task involves your editor configuration. There's absolutely no way that all that Emacs configuration was the fastest way to reproduce that bug in the OP.


The perils of combining executable config with long-lived sessions: I frequently try something out by evaling it first to see if I want to put it in my permanent config, then forget to actually add it. Mickens almost had it right about "perfect window placement"; it's just perfect editor state (including open files and caret position).


+1 appreciate the effort, but just get to the point thanks...


I liked the emacs config discussion.


Me too! Fixing Emacs + this whole `rls` strangeness was a yak whose shave I've been putting off for a while, and this just gave me quite a head start.

Plus, one of my biggest gripes with Emacs documentation is that it is very hard to find good articles that are contemporary + show the full configuration + come from the perspective of a user using a tool, rather than a programmer building their environment from scratch.

Yes, I know, Emacs is one of the quintessential "environments built from scratch", and I engage in that too – sometimes to my detriment. But some days I just need to get python-mode / Poetry.el / eglot+Pyright to all play nice together and an article like this would go a long way.


It's fasterthanlime. It always takes ages to get to the point.


I love the author's style. Not every blog need do it, but I appreciate that fasterthanli.me does it.

To me, it is intellectually honest: this shows a reader every step along the way, every painful trail that must be overcome from point A to point B. Nothing is omitted. And I think the sooner we all did this, as an industry, the sooner the very many problems and bugs that exist (that get hit before we can even "get to the point", as you say) would get dragged into the light, and maybe we'd progress, as a society, towards having computers that weren't shit.


> Nothing is omitted

To quote an internet reviewer: Brevity is the soul of wit. That means stop wasting my time. Keep it nice and simple.

Look, like what you want; I'm free to prefer a shorter form, and to point out that this is part of the author's style.


This submission went from #5 to #42 in a minute so I'm assuming it's been deranked. An explanation would be nice, in case a moderator is around.

edit: now it's no longer there at all. Welp.


(not a mod, just been on this website forever)

At this exact moment, there are 49 points but 45 comments. If comments >= points, you get a significant ranking penalty (the "flamebait detector", IIRC). I can't say for sure that that's happened, but given how close the two numbers are, I bet at some point that was true.


Ahhh, that sounds likely, thanks!


Given the controversy on-thread about the length of the intro, calling it the bottom emoji, and the title making it sound like the problem is specific to that emoji, I wouldn't be surprised if it has collected a number of flags as well as tripping the overheated-discussion detector.

FWIW I enjoy your articles in general and this was not an exception, although I could have done with a shorter exposition myself ;)


Could be automated, there's the flamebait detector that deranks if it gets a high comments/vote ratio.


Of all the languages to have a problem with the bottom emoji, Rust definitely ought to have the largest blast radius.


Next time just use Visual Studio Code for Rust development.


A non-core lsp library having a bug doesn't justify moving from emacs to vscode. Especially if you use org-mode, org-roam, or customize your workflow with elisp.



