In all my GPT-4 API (Python) experiments, it takes 15-20 seconds to get a full response from the server, which is slow enough to kill every idea I've tried hacking up.
Has anyone fared better? I might be doing something wrong but I can't see what that could possibly be.
Streaming. If you're expecting structured data as a response, request YAML or JSONL so you can parse it progressively as the tokens arrive. Time to first byte can be milliseconds instead of 15-20 seconds. Obviously, this technique can only work for certain things, but I found that it was possible for everything I tried.
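A rough sketch of the progressive-parsing idea, using a simulated chunk stream in place of a real API call (the `fake_stream` generator and field names are made up for illustration):

```python
import json

def fake_stream():
    # Simulated model output arriving in small fragments, the way a
    # streaming API delivers it. Stand-in for the real token stream.
    text = '{"q": 1, "insight": "foo"}\n{"q": 2, "insight": "bar"}\n'
    for i in range(0, len(text), 8):
        yield text[i:i + 8]

def parse_jsonl_stream(chunks):
    # Buffer incoming text and emit each JSONL record the moment its
    # terminating newline arrives -- no waiting for the full response.
    buf = ""
    for chunk in chunks:
        buf += chunk
        while "\n" in buf:
            line, buf = buf.split("\n", 1)
            if line.strip():
                yield json.loads(line)

records = list(parse_jsonl_stream(fake_stream()))
```

With a real streaming client you'd feed each delta's text into the same buffer; the first record is usable as soon as its line completes, long before the response finishes.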
We use it to generate automatic insights from survey data at a weekly cadence for Zigpoll (https://www.zigpoll.com). This makes getting an instant response unnecessary but still provides a lot of value to our customers.