
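Yes, for performance reasons. I believe it's because the accumulation runs over parallel computations, and floating-point addition isn't associative, so the order in which the partial results get summed (and therefore the rounding) is at the mercy of the scheduler. I'm not familiar with the precise details, though.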

Edit: at 13:42 in https://www.youtube.com/watch?v=TB07_mUMt0U&t=13m42s there is an explanation of the phenomenon in the context of training, but I suspect the same kind of operation happens during inference.
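
To make the ordering point concrete, here is a minimal Python sketch. It is not what any real GPU kernel or scheduler does; the `accumulate` helper, the `partials` values, and the two "schedules" are all made up for illustration. It only shows that summing the same float64 numbers in a different order can give a different result.

```python
def accumulate(values, order):
    """Sum `values` left to right in the given index order."""
    total = 0.0
    for i in order:
        total += values[i]
    return total

# Hypothetical partial results, chosen so rounding is visible in float64.
partials = [1e16, 1.0, -1e16, 1.0]

# Two "schedules": same numbers, different arrival order (an assumption
# standing in for whatever order parallel workers happen to finish in).
schedule_a = [0, 1, 2, 3]  # 1e16 + 1.0 rounds back to 1e16, so that 1.0 is lost
schedule_b = [0, 2, 1, 3]  # the large terms cancel first, so both 1.0s survive

sum_a = accumulate(partials, schedule_a)
sum_b = accumulate(partials, schedule_b)
print(sum_a, sum_b)  # 1.0 2.0
```

In a real workload the differences are usually far smaller than this contrived case, but the mechanism I have in mind is the same: the arrival order decides where rounding happens.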



