
In all my GPT-4 API (Python) experiments, it takes 15-20 seconds to get a full response from the server, which basically kills every idea I've tried hacking up because it runs so slowly.

Has anyone fared better? I might be doing something wrong but I can't see what that could possibly be.




Streaming. If you’re expecting structured data as a response, request YAML or JSONL so you can progressively parse it. Time to first byte can be milliseconds instead of 15-20s. Obviously, this technique can only work for certain things, but I found that it was possible for everything I tried.
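The progressive-parsing idea above can be sketched as follows. This is a minimal illustration, not the commenter's actual code: the network stream is simulated with fixed text fragments (with the real API you would set `stream=True` and read delta chunks), and `parse_jsonl_stream` is a hypothetical helper name.

```python
import json

def parse_jsonl_stream(chunks):
    """Progressively parse JSON Lines arriving as arbitrary text fragments.

    Yields each complete JSON object as soon as its line closes, so the
    caller can act on early results instead of waiting 15-20s for the
    whole response."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # A newline marks the end of one JSONL record; parse everything complete.
        while "\n" in buffer:
            line, buffer = buffer.split("\n", 1)
            if line.strip():
                yield json.loads(line)
    # Flush a final record that wasn't newline-terminated.
    if buffer.strip():
        yield json.loads(buffer)

# Simulated stream: fragments split mid-record, as streamed tokens would be.
chunks = ['{"item": 1}\n{"it', 'em": 2}\n', '{"item": 3}']
results = list(parse_jsonl_stream(chunks))
```

Because each record is emitted the moment its line completes, the first object is usable after the first newline arrives, long before the stream finishes.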


Run it in the background.

We use it to generate automatic insights from survey data at a weekly cadence for Zigpoll (https://www.zigpoll.com). This makes getting an instant response unnecessary but still provides a lot of value to our customers.
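A minimal sketch of the background approach, assuming a queue-and-worker setup: the slow API call happens off the request path, so no user ever waits on it. `slow_llm_call` is a stand-in for the real 15-20s request, not an actual API function.

```python
import queue
import threading

def slow_llm_call(prompt):
    # Hypothetical stand-in for the slow GPT-4 API request (no network here).
    return f"insight for: {prompt}"

def worker(jobs, results):
    # Drain the job queue in the background; callers never block on the API.
    while True:
        prompt = jobs.get()
        if prompt is None:  # sentinel: shut the worker down
            break
        results.append(slow_llm_call(prompt))

jobs = queue.Queue()
results = []
t = threading.Thread(target=worker, args=(jobs, results), daemon=True)
t.start()

# Enqueue work (e.g. a weekly batch of surveys) and return immediately.
for p in ["survey batch 1", "survey batch 2"]:
    jobs.put(p)
jobs.put(None)  # signal shutdown once the batch is queued
t.join()
```

The same pattern scales to a proper task runner (cron plus a job queue) when the results only need to be ready on a weekly cadence.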


Claude Instant (Anthropic) is the best LLM if you're looking for speed.





