The name 3.5-turbo sounds to me like it implies distillation. The release notes at the time also hinted at it IIRC.



Well, that's why I said public. Personally, I don't think the release notes (https://help.openai.com/en/articles/6825453-chatgpt-release-...) hinted at any such thing, and I think quantization is more likely than distillation.


Turbo uses a different vocabulary (the same one as gpt-4). That indicates it's not the same model as the original 3.5, so I would be very surprised if it wasn't distilled.
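
For what it's worth, the tokenizer mapping is easy to check with OpenAI's tiktoken library (a minimal sketch; the exact model-to-encoding table is whatever your installed tiktoken version ships):

    import tiktoken

    # encoding_for_model() maps an OpenAI model name to its tokenizer.
    for model in ["davinci", "gpt-3.5-turbo", "gpt-4"]:
        enc = tiktoken.encoding_for_model(model)
        print(f"{model}: {enc.name} (vocab size {enc.n_vocab})")

    # davinci: r50k_base (vocab size 50257)
    # gpt-3.5-turbo: cl100k_base (vocab size 100277)
    # gpt-4: cl100k_base (vocab size 100277)

So davinci keeps the old GPT-2/GPT-3 vocabulary while turbo and gpt-4 share the newer cl100k_base one.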


Does the turbo API being 10 times cheaper than davinci imply anything? To me it implies more than just quantisation.
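
Quick arithmetic on that gap (launch prices from memory, so treat the numbers as assumptions: davinci at $0.02 per 1K tokens, gpt-3.5-turbo at $0.002 per 1K):

    davinci_per_1k = 0.02   # $/1K tokens, assumed launch price
    turbo_per_1k = 0.002    # $/1K tokens, assumed launch price
    print(f"price ratio: {davinci_per_1k / turbo_per_1k:.0f}x")  # -> 10x

Quantization alone (say 16-bit to 8-bit weights) buys roughly 2x, so a 10x gap points at a substantially smaller model on top of any quantization.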


"davinci" is the original GPT-3 (175B) which had too many parameters per Chinchilla scaling law. And parameter count is strongly correlated with inference cost. GPT-3.5 is likely Chinchilla optimal and much smaller than davinci.

Though this theory has a defect: GPT-4 is, I think, more expensive than GPT-3, yet as I recall it was considered unlikely that GPT-4 has more than 175 billion parameters. Not sure.



