
No, Python 3 indexes by character, not byte.

    $ python3
    Python 3.4.3 (default, Oct 14 2015, 20:28:29) 
    >>> s = "オンライン"
    >>> s
    'オンライン'
    >>> len(s)
    5
    >>> s[0]
    'オ'
    >>> s[1]
    'ン'
    >>> s[2]
    'ラ'
    >>> s[3]
    'イ'
    >>> s[4]
    'ン'
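To make the character-vs-byte distinction concrete, here is a small sketch that compares the length of the `str` with the length of its UTF-8 encoding. The variable names are just for illustration:

```python
s = "オンライン"

# A str is a sequence of Unicode code points, so len() and indexing work per code point.
print(len(s))   # 5
print(s[0])     # オ

# Bytes only appear once you explicitly encode; each of these katakana takes 3 bytes in UTF-8.
b = s.encode("utf-8")
print(len(b))   # 15
print(b[0])     # 227, the first byte of オ (indexing bytes gives an int, not a character)
```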



It's important to be insufferably pedantic about this: they index by code point, which is almost but not quite what people expect a character to be.

    $ python3
    >>> "위키백과"[1]
    '키'
    >>> "위키백과"[1] # Should be identical, right?
    'ᅱ'
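The second literal there is presumably the same word stored in decomposed form (conjoining jamo), which renders identically. Since the exact code points of a pasted snippet are hard to verify, this sketch builds both forms explicitly with `unicodedata.normalize` and uses escapes for the precomposed syllables:

```python
import unicodedata

word = "\uc704\ud0a4\ubc31\uacfc"          # "위키백과" as four precomposed Hangul syllables (NFC)
jamo = unicodedata.normalize("NFD", word)  # the same text as conjoining jamo

print(len(word), len(jamo))         # 4 9
print(word[1])                      # 키
print(jamo[1], hex(ord(jamo[1])))   # ᅱ 0x1171, the vowel jamo of 위

# The two strings look the same but only compare equal after normalizing.
print(word == jamo)                                 # False
print(word == unicodedata.normalize("NFC", jamo))   # True
```

Note that normalization only makes the two encodings comparable; neither NFC nor NFD turns indexing into "user-perceived character" indexing.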


> It's important to be insufferably pedantic about this

That is perhaps the most succinct and accurate way I've heard to justify sounding like a wet blanket to people who may not understand, while acknowledging that you know how you sound and that there is a reason for it. I expect to use this in the future.


That's an awesome example! To show Western-language speakers who might be confused how it works, how about

    c = "e\u0301"  # "é" typed as e + combining acute accent
    c[0], c[1]

It's the same phenomenon with Latin characters. (Extra bonus: for me, the combining acute accent character then combines in the terminal with the apostrophe that Python uses to delimit the string!)
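Whether `c[1]` even exists depends on how the "é" was entered, so here is a sketch that spells out the decomposed form with an escape and asks `unicodedata` what the second code point is:

```python
import unicodedata

c = "e\u0301"  # "é" as LATIN SMALL LETTER E + COMBINING ACUTE ACCENT

print(len(c))                       # 2
print(c[0])                         # e
print(unicodedata.name(c[1]))       # COMBINING ACUTE ACCENT
print(unicodedata.combining(c[1]))  # 230, a nonzero canonical combining class
```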

Another way to see the effect is "a" + "e\u0301"[1], i.e. "a" plus the combining accent taken from a decomposed "é". (The result is 'á'... and, as in your examples, a precomposed "é" (U+00E9) is also available, which doesn't exhibit any of these phenomena.)
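Both halves of that can be checked in a few lines. This sketch pins down which form is which with escapes and uses NFC normalization to fold a base letter plus combining mark into the single precomposed code point:

```python
import unicodedata

decomposed = "e\u0301"   # e + COMBINING ACUTE ACCENT
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE

moved = "a" + decomposed[1]   # the accent now attaches to "a"
print(len(moved))                                        # 2
print(unicodedata.normalize("NFC", moved) == "\u00e1")   # True, i.e. 'á'

# The precomposed form is one code point, so there is no [1] to split off.
print(len(precomposed))                                          # 1
print(unicodedata.normalize("NFC", decomposed) == precomposed)   # True
```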


Same with English, sort of. What is

    "difficult"[2]

Is it 'f'? Or the ligature 'ffi'? :)
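The ligature case exists in Unicode as a single code point, U+FB03 LATIN SMALL LIGATURE FFI. A sketch showing that the answer depends on which code points the string actually contains, and that only compatibility normalization (NFKC, not NFC) expands the ligature back into three letters:

```python
import unicodedata

word = "di\ufb03cult"   # "difficult" spelled with the single-code-point ffi ligature

print(len(word))                   # 7
print(word[2])                     # ﬃ
print(unicodedata.name(word[2]))   # LATIN SMALL LIGATURE FFI

print(unicodedata.normalize("NFC", word) == word)           # True, NFC keeps the ligature
print(unicodedata.normalize("NFKC", word) == "difficult")   # True, NFKC expands it
print("difficult"[2])              # f
```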



