Hacker News
GPT-3 and Arithmetic (nostalgebraist.tumblr.com)
3 points by luu on June 12, 2020 | 1 comment



The article is talking about something called BPE (byte pair encoding), but it's not defined anywhere. Chasing links, I see it's about tokenization and how it's "weird".

> BPE tries to be efficient, so it doesn’t waste token slots on spaces if it doesn’t have to. A word is almost always preceded by a space, so instead of representing “ Example text” as four tokens (space, “Example”, space, “text”), it represents it as two:

> [(' Example', 17934), (' text', 2420)]

Apparently this is a problem because, when you type a prompt into the model yourself, the same text gets tokenized differently than it would mid-document, so you get different results.

> So far, seems innocuous, right? But what if you’re feeding a prompt into GPT-2? Unless you’re hip to this particular issue, you’ll probably type in something like

> “Example text”

> which becomes

> [('Example', 16281), (' text', 2420)]

> Compare this to the one above. Yes – instead of token #17934, with the preceding space, I’ve unwittingly fed in token #16281, without a preceding space.
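
To make this concrete, here's a quick sketch using the tiktoken library's GPT-2 encoding (my illustration, not from the article; any GPT-2 BPE tokenizer would do, and the exact IDs depend on the vocabulary):

    # pip install tiktoken  -- a BPE tokenizer library that ships the GPT-2 vocabulary
    import tiktoken

    enc = tiktoken.get_encoding("gpt2")

    # Mid-document, a word carries its leading space inside the token.
    with_space = enc.encode(" Example text")
    # A prompt typed at the start of a text box usually has no leading space.
    without_space = enc.encode("Example text")

    print(with_space)     # e.g. [17934, 2420]  -> ' Example', ' text'
    print(without_space)  # e.g. [16281, 2420]  -> 'Example',  ' text'

    # The two strings look identical to a human but start with different token IDs.
    assert with_space[0] != without_space[0]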

Maybe someone should figure out how to tokenize text with a neural network in a more natural way, since the tokenizer is built in a separate preprocessing step and is essentially a hyperparameter that cannot be optimized with gradient descent.
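
For context, the BPE vocabulary is learned, but by a greedy counting procedure rather than by gradient descent. A toy sketch of the merge loop (my simplification of the standard algorithm, ignoring word frequencies and end-of-word markers):

    from collections import Counter

    def learn_bpe_merges(corpus, num_merges):
        """Toy BPE: repeatedly merge the most frequent adjacent symbol pair."""
        # Start with each word split into characters.
        words = [list(w) for w in corpus.split()]
        merges = []
        for _ in range(num_merges):
            # Count adjacent symbol pairs across the corpus.
            pairs = Counter()
            for w in words:
                for a, b in zip(w, w[1:]):
                    pairs[(a, b)] += 1
            if not pairs:
                break
            best = max(pairs, key=pairs.get)  # greedy choice -- no gradients anywhere
            merges.append(best)
            # Apply the chosen merge everywhere in the corpus.
            new_words = []
            for w in words:
                merged, i = [], 0
                while i < len(w):
                    if i + 1 < len(w) and (w[i], w[i + 1]) == best:
                        merged.append(w[i] + w[i + 1])
                        i += 2
                    else:
                        merged.append(w[i])
                        i += 1
                new_words.append(merged)
            words = new_words
        return merges

    print(learn_bpe_merges("low lower lowest low low", 3))
    # e.g. [('l', 'o'), ('lo', 'w'), ('low', 'e')] -- frequency-driven and discrete

Because the vocabulary falls out of this discrete, frequency-driven loop before the model ever sees the data, the only way to tune it is to re-run it with different settings.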



