In all my GPT-4 API (Python) experiments, it takes 15-20 seconds to get a full response from the server, which is slow enough to kill every idea I've tried hacking up.
Has anyone fared better? I might be doing something wrong but I can't see what that could possibly be.
Streaming. If you're expecting structured data as a response, request YAML or JSONL so you can parse it progressively as the tokens arrive. Time to first byte can be milliseconds instead of 15-20 seconds. Obviously, this technique can only work for certain things, but I found that it was possible for everything I tried.
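A rough sketch of the progressive-parsing idea, using a simulated chunk stream in place of a real API call (the `fake_stream` generator and field names are made up for illustration):

```python
import json

def fake_stream():
    # Simulated model output arriving in small fragments, the way a
    # streaming API delivers it. Stand-in for the real token stream.
    text = '{"q": 1, "insight": "foo"}\n{"q": 2, "insight": "bar"}\n'
    for i in range(0, len(text), 8):
        yield text[i:i + 8]

def parse_jsonl_stream(chunks):
    # Buffer incoming text and emit each JSONL record the moment its
    # terminating newline arrives -- no waiting for the full response.
    buf = ""
    for chunk in chunks:
        buf += chunk
        while "\n" in buf:
            line, buf = buf.split("\n", 1)
            if line.strip():
                yield json.loads(line)

records = list(parse_jsonl_stream(fake_stream()))
```

With a real streaming client you'd feed each delta's text into the same buffer; the first record is usable as soon as its line completes, long before the response finishes.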
We use it to generate automatic insights from survey data at a weekly cadence for Zigpoll (https://www.zigpoll.com). This makes getting an instant response unnecessary but still provides a lot of value to our customers.