It definitely gets a bit murky when dealing with mbcs, when you want characters ...

matheusmoreira · on Aug 27, 2022

Yeah, that's an important observation, especially in today's Unicode world. It just strengthens my point that these "string" functions are really just bytes/memory functions in disguise.

Honestly, "string" is a very harmful word that we've all grown used to. As an abstraction it sits somewhere between raw bytes and properly encoded text with proper Unicode functions such as those provided by ICU. Python 3 finally forced people to start thinking about this stuff and nobody liked it.

mort96 · on Aug 27, 2022

The str functions aren't ASCII-only; they work perfectly fine with multi-byte strings such as UTF-8-encoded strings. The "length" just isn't the number of "characters", but the definition of a "character" itself is murky and bytes are what you're usually interested in anyways.
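
To make the byte-vs-"character" distinction concrete, here is a minimal sketch in C. `utf8_codepoints` is a hypothetical helper (not part of the standard library) that assumes its input is valid UTF-8 and counts code points by skipping continuation bytes. Code points are only one possible definition of "character".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not a libc function.
 * Counts UTF-8 code points by counting every byte that is NOT a
 * continuation byte (bit pattern 10xxxxxx). Assumes valid UTF-8. */
size_t utf8_codepoints(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

For the UTF-8 bytes of "héllo" (`"h\xC3\xA9llo"`), `strlen` reports 6 because it counts bytes, while `utf8_codepoints` reports 5.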

johannes1234321 · on Aug 27, 2022

> and bytes are what you're usually interested in anyways.

Bytes are relevant when I have to allocate memory; otherwise some definition of "character" is often more relevant. Even if I trim text to fit in a buffer, I don't want to trim inside a "character" but to keep as many whole "characters" as will fit. Now, "characters" are of course complicated, as grapheme clusters are what is most useful for human interaction ... but those are quite out of scope for a "simple" string library ...
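
A rough sketch of that trimming case in C, under stated assumptions: `utf8_copy_trunc` is a hypothetical helper, the input is assumed to be valid UTF-8, and a "character" here means a code point only. It does not keep grapheme clusters (for example a base letter plus combining marks) together; that would need something like ICU.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not a libc function.
 * Copies as much of src as fits into dst (cap bytes, including the
 * terminating NUL) without cutting a UTF-8 sequence in half.
 * Assumes src is valid UTF-8. Returns the number of bytes copied,
 * not counting the NUL. */
size_t utf8_copy_trunc(char *dst, size_t cap, const char *src)
{
    if (cap == 0)
        return 0;

    size_t len = strlen(src);
    size_t n = len < cap - 1 ? len : cap - 1;

    /* If the cut lands on a continuation byte (10xxxxxx), back up
     * to the lead byte of that sequence and drop the whole sequence. */
    if (n < len) {
        while (n > 0 && ((unsigned char)src[n] & 0xC0) == 0x80)
            n--;
    }

    memcpy(dst, src, n);
    dst[n] = '\0';
    return n;
}
```

With `"h\xC3\xA9llo"` and a 3-byte buffer, a plain byte-wise truncation would keep `h` plus half of `é`; this version keeps only `h`.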