Might as well have a quick discussion here. How's everyone finding the new models?
4-Turbo is a bit worse than 4 for my NLP work. But it's so much cheaper that I'll probably move every pipeline over to it. Depending on the exact problem it can even be comparable in quality/price to 3.5-turbo.
However, the fact that output is capped at 4096 tokens is a big asterisk on the 128k context.
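One workaround is to stitch long outputs together across multiple calls. A minimal sketch (the model name and continuation prompt here are just placeholders): when `finish_reason` comes back as `"length"`, feed the partial output back and ask the model to keep going.

```python
from openai import OpenAI  # openai>=1.0

client = OpenAI()

def generate_long(messages, model="gpt-4-1106-preview"):
    """Stitch together outputs longer than the 4096-token cap by
    re-prompting whenever generation stops on the length limit."""
    msgs = list(messages)
    parts = []
    while True:
        resp = client.chat.completions.create(
            model=model, messages=msgs, max_tokens=4096)
        choice = resp.choices[0]
        parts.append(choice.message.content)
        if choice.finish_reason != "length":
            break  # finished naturally, no continuation needed
        # feed the partial answer back and ask for a continuation
        msgs.append({"role": "assistant", "content": choice.message.content})
        msgs.append({"role": "user",
                     "content": "Continue exactly where you left off."})
    return "".join(parts)
```

The seams aren't always clean (the model sometimes restarts a sentence), so it's a mitigation rather than a real replacement for a larger output budget.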
It's probably a smaller, updated (distilled?) version of the gpt-4 model, given the price decrease, speed increase, and the "turbo" name. Why wouldn't you expect it to be slightly worse? We saw the same thing with 3-davinci and 3.5-turbo.
I'm not going off pure feelings either. I have benchmarks in place comparing pipeline outputs to ground truth. But like I said, it's comparable enough to 4 at a much lower price, which makes it a great model.
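The harness is nothing fancy. A minimal sketch of the idea (exact-match scoring is just a stand-in here; the real metric depends on the task):

```python
def score_pipeline(examples, run_pipeline):
    """examples: iterable of (input, ground_truth) pairs.
    run_pipeline: callable that runs the full pipeline on one input."""
    hits = 0
    for inp, truth in examples:
        out = run_pipeline(inp)
        # exact match after normalization; swap in whatever metric fits the task
        hits += int(out.strip().lower() == truth.strip().lower())
    return hits / len(examples)
```

Run the same examples through the 4 and 4-Turbo pipelines and you get a direct quality-per-dollar comparison instead of vibes.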
Edit: After the outage the outputs are better, wtf. Nvm, there's some variance even at temp = 0. I should use a fixed seed.
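Roughly what I mean (sketch; the model name is a placeholder): pin `temperature=0` plus a `seed`, and compare `system_fingerprint` across runs, since the API only promises best-effort determinism.

```python
from openai import OpenAI

client = OpenAI()

resp = client.chat.completions.create(
    model="gpt-4-1106-preview",  # placeholder; any 4-Turbo snapshot
    messages=[{"role": "user", "content": "Summarize: ..."}],
    temperature=0,
    seed=42,  # best-effort determinism, not a hard guarantee
)
print(resp.choices[0].message.content)
# if system_fingerprint differs between runs, the backend changed
# and outputs may differ even with the same seed and temperature
print(resp.system_fingerprint)
```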
4-Turbo is much faster, which for my use case is very important. Wish we could get more than 100 requests per day... Is the limit higher at a higher usage tier?
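In the meantime, retrying with exponential backoff at least smooths over the per-minute limits (a daily cap you just have to wait out). A sketch, with arbitrary retry counts and a placeholder model name:

```python
import time
from openai import OpenAI, RateLimitError

client = OpenAI()

def chat_with_backoff(messages, model="gpt-4-1106-preview", retries=5):
    """Retry on 429 rate-limit errors with exponential backoff."""
    delay = 1.0
    for attempt in range(retries):
        try:
            return client.chat.completions.create(
                model=model, messages=messages)
        except RateLimitError:
            if attempt == retries - 1:
                raise  # out of retries; surface the error
            time.sleep(delay)
            delay *= 2
```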