Isn't it more that they don't have ready access to the much-more-fundamental concept of decimal numbers?
My understanding was that numbers get tokenized into chunks, and the model learns associations between those chunks, the same as if it were breaking apart English words.
So "2+2=4" isn't being treated that differently from "all's well that ends well." This might lead to a kind of Benny's Rules [0] situation, where sufficient brute force can make a collection of overfitted, non-arithmetic rules appear to work.
[0] https://blog.mathed.net/2011/07/rysk-erlwangers-bennys-conce...