Turbo uses a different vocabulary (the same one as gpt-4). That indicates it's not the same model as the original 3.5, so I would be very surprised if it wasn't distilled.
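For what it's worth, you can check the tokenizer assignments yourself with OpenAI's tiktoken library (a minimal sketch, assuming tiktoken is installed; the vocab sizes in the comments are from memory):

    # Compare which tokenizer each model maps to.
    import tiktoken

    for model in ["gpt-3.5-turbo", "gpt-4", "davinci"]:
        enc = tiktoken.encoding_for_model(model)
        print(model, "->", enc.name, f"(~{enc.n_vocab} tokens)")

    # Expected (roughly):
    # gpt-3.5-turbo -> cl100k_base (~100k tokens)
    # gpt-4         -> cl100k_base (~100k tokens)
    # davinci       -> r50k_base   (~50k tokens)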
"davinci" is the original GPT-3 (175B) which had too many parameters per Chinchilla scaling law. And parameter count is strongly correlated with inference cost. GPT-3.5 is likely Chinchilla optimal and much smaller than davinci.
The defect in this theory is that GPT-4 is, I think, more expensive than GPT-3, yet as I recall it was considered unlikely that GPT-4 is larger than 175 billion parameters. Not sure.