Hacker News

If I were running this in a server context, would the 50 GB of RAM be needed for each request, or could the same memory serve multiple requests simultaneously?



I'm very late to this question, but I believe that amount is only required once, while a separate context tensor has to be created for each request. I haven't confirmed that, though.
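The "allocated once, plus a context per request" model described in the reply above can be turned into a back-of-the-envelope budget. This is a minimal sketch that assumes the 50 GB is shared read-only state loaded a single time, and that each concurrent request adds a fixed-size context buffer. The function names and the per-request size are hypothetical placeholders, not measured values.

```python
def total_memory_gb(shared_gb: float, per_request_gb: float, concurrent: int) -> float:
    """Total RAM if shared state is loaded once and each in-flight
    request adds its own context buffer (hypothetical model)."""
    if concurrent < 0:
        raise ValueError("concurrent must be non-negative")
    return shared_gb + per_request_gb * concurrent


def max_concurrent_requests(budget_gb: float, shared_gb: float, per_request_gb: float) -> int:
    """How many requests fit in a RAM budget under the same assumption."""
    if per_request_gb <= 0:
        raise ValueError("per_request_gb must be positive")
    free_gb = budget_gb - shared_gb
    if free_gb < 0:
        return 0  # the shared state alone does not fit
    return int(free_gb // per_request_gb)


if __name__ == "__main__":
    # Assumed 2 GB per request; a real figure would have to be measured.
    print(total_memory_gb(50, 2, 4))
    print(max_concurrent_requests(80, 50, 2))
```

If the assumption holds, the 50 GB is paid once and concurrency is limited only by the leftover memory; if instead each request needed its own full copy (as the next reply suspects), the per-request term would dominate.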


I'd assume the calculations for a single request would already use up that much memory, but I could be wrong!



