No, Python 3 indexes by character, not byte. $ python3 Python 3.4.3 (default, Oc...

pjscott · on June 19, 2016

It's important to be insufferably pedantic about this: they index by code point, which is almost but not quite what people expect a character to be.

    $ python3
    >>> "위키백과"[1]
    '키'
    >>> "위키백과"[1] # Should be identical, right?
    'ᅱ'
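
The two literals above look the same but presumably differ in normalization: one is precomposed (NFC), the other decomposed into conjoining jamo (NFD). Here is a minimal sketch that makes the difference explicit with the standard `unicodedata` module, assuming that is what is going on; the variable names are just for illustration.

```python
import unicodedata

word = "위키백과"

# Force both forms explicitly so the example does not depend on how
# the literal happened to be typed or pasted.
nfc = unicodedata.normalize("NFC", word)  # one code point per syllable
nfd = unicodedata.normalize("NFD", word)  # syllables split into jamo

print(len(nfc), len(nfd))      # 4 9
print(nfc == nfd)              # False, although they render the same
print(nfc[1])                  # the whole second syllable
print(hex(ord(nfd[1])))        # 0x1171, the vowel jamo of the first syllable
```

Indexing is by code point in both cases; only the number of code points per visible character changes.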

kbenson · on June 19, 2016

> It's important to be insufferably pedantic about this

That is perhaps the most succinct and accurate way I've heard to explain and justify sounding like a wet blanket to people who may not understand, while acknowledging that you know how you sound but have a reason for it. I expect to use this in the future.

schoen · on June 20, 2016

That's an awesome example! To show more of how it works for Western language speakers who might be confused, how about

    c = "é"
    c[0], c[1]

It's the same phenomenon with Latin characters. (Extra bonus: for me, the combining acute accent character then combines in the terminal with the apostrophe that Python uses to delimit the string!)

Another way to see the effect is "a" + "é"[1]. (The result is 'á'... and as in your examples, a precomposed "é" is also available, which doesn't exhibit any of these phenomena.)
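
Since the decomposed and precomposed forms are hard to tell apart once pasted, here is a sketch of the same experiment with the code points spelled out as escapes. The variable names are made up for the example.

```python
import unicodedata

decomposed = "e\u0301"   # 'e' followed by U+0301 COMBINING ACUTE ACCENT
precomposed = "\u00e9"   # U+00E9, a single code point

print(len(decomposed), len(precomposed))   # 2 1
print(unicodedata.name(decomposed[1]))     # COMBINING ACUTE ACCENT

# Move the accent onto a different base letter, as suggested above.
moved = "a" + decomposed[1]
print(unicodedata.normalize("NFC", moved) == "\u00e1")  # True

# Indexing the precomposed form past 0 fails: there is no second code point.
try:
    precomposed[1]
except IndexError:
    print("precomposed has only one code point")
```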

kazinator · on June 20, 2016

Same with English, sort of. What is

    "difficult"[2]?

Is it 'f'? Or the ligature `ffi`? :)
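
For what it's worth, a sketch of the ligature case, assuming the word is spelled with U+FB03 (the single-code-point "ffi" ligature) rather than three separate letters:

```python
import unicodedata

plain = "difficult"        # nine separate letters
ligated = "di\ufb03cult"   # same word using U+FB03

print(plain[2])                       # f
print(unicodedata.name(ligated[2]))   # LATIN SMALL LIGATURE FFI
print(len(plain), len(ligated))       # 9 7

# NFC leaves the ligature alone; only compatibility normalization
# (NFKC) expands it back into 'f', 'f', 'i'.
print(unicodedata.normalize("NFC", ligated) == ligated)   # True
print(unicodedata.normalize("NFKC", ligated) == plain)    # True
```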