The name 3.5-turbo sounds to me like it implies distillation. The release notes at the time also hinted at it IIRC.



Well, that's why I said public. Personally, I don't think the release notes (https://help.openai.com/en/articles/6825453-chatgpt-release-...) hinted at any such thing, and I think quantization is more likely than distillation.


Turbo uses a different vocabulary (the same one as gpt-4). That indicates it's not the same model as the original 3.5, so I would be very surprised if it wasn't distilled.
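
For what it's worth, the tokenizer mapping is easy to check with OpenAI's tiktoken library (a minimal sketch; the exact model-to-encoding table is whatever your installed tiktoken version ships):

    import tiktoken

    # encoding_for_model() maps an OpenAI model name to its tokenizer.
    for model in ["davinci", "gpt-3.5-turbo", "gpt-4"]:
        enc = tiktoken.encoding_for_model(model)
        print(f"{model}: {enc.name} (vocab size {enc.n_vocab})")

    # davinci: r50k_base (vocab size 50257)
    # gpt-3.5-turbo: cl100k_base (vocab size 100277)
    # gpt-4: cl100k_base (vocab size 100277)

So davinci keeps the old GPT-2/GPT-3 vocabulary while turbo and gpt-4 share the newer cl100k_base one.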


Does the turbo API being 10 times cheaper than davinci imply anything? To me it implies more than just quantisation.
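
Quick arithmetic on that gap (launch prices from memory, so treat the numbers as assumptions: davinci at $0.02 per 1K tokens, gpt-3.5-turbo at $0.002 per 1K):

    davinci_per_1k = 0.02   # $/1K tokens, assumed launch price
    turbo_per_1k = 0.002    # $/1K tokens, assumed launch price
    print(f"price ratio: {davinci_per_1k / turbo_per_1k:.0f}x")  # -> 10x

Quantization alone (say 16-bit to 8-bit weights) buys roughly 2x, so a 10x gap points at a substantially smaller model on top of any quantization.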


"davinci" is the original GPT-3 (175B) which had too many parameters per Chinchilla scaling law. And parameter count is strongly correlated with inference cost. GPT-3.5 is likely Chinchilla optimal and much smaller than davinci.

Though this theory has a defect: GPT-4 is, I think, more expensive than GPT-3, yet as I recall it was considered unlikely that GPT-4 has more than 175 billion parameters. Not sure.



