It's for KV caching, which in most contexts means inference. But reinforcement learning involves sampling sequences from the model, and KV caching can speed up that sampling phase too, so that's one case where training gets a slight boost.
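To make the mechanism concrete, here's a minimal sketch in PyTorch of a toy single-head attention layer with a KV cache: each decoding step feeds in only the newest token and reuses the cached keys/values from earlier positions instead of recomputing them. The names (`TinyAttention`, `kv_cache`) are illustrative, not from any particular library.

```python
import torch

class TinyAttention(torch.nn.Module):
    def __init__(self, d_model: int):
        super().__init__()
        self.q_proj = torch.nn.Linear(d_model, d_model)
        self.k_proj = torch.nn.Linear(d_model, d_model)
        self.v_proj = torch.nn.Linear(d_model, d_model)
        self.scale = d_model ** -0.5

    def forward(self, x, kv_cache=None):
        # x: (batch, 1, d_model) -- only the newest token during decoding.
        q = self.q_proj(x)
        k = self.k_proj(x)
        v = self.v_proj(x)
        if kv_cache is not None:
            # Reuse keys/values computed at earlier positions.
            k = torch.cat([kv_cache[0], k], dim=1)
            v = torch.cat([kv_cache[1], v], dim=1)
        attn = torch.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        out = attn @ v
        # Return the updated cache so the next step can extend it.
        return out, (k, v)

# Decoding loop: per-step cost grows linearly with sequence length
# instead of recomputing attention over the whole prefix from scratch.
layer = TinyAttention(d_model=16)
cache = None
token = torch.randn(1, 1, 16)  # stand-in for an embedded prompt token
for _ in range(5):
    out, cache = layer(token, kv_cache=cache)
    token = out  # in a real model: more layers, then sampling the next token
```

The same loop is what runs during RL rollouts (e.g. PPO-style sampling), which is why caching helps there even though the overall procedure is training.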