Activation sparsity and packing sparse matrices will surely be important, so that covers one kind of performance (speed). The other kind, perplexity, still needs a good demonstration. That might require a big model, but nowadays you can fine-tune even a 30B model on a big cloud GPU box.
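For the speed side, the core idea is just to pack the nonzero activations into a sparse format and skip the work on the zeros. A minimal sketch in PyTorch, assuming a ReLU-style MLP block (the shapes and names are made up for illustration, not any particular library's implementation):

```python
import torch

torch.manual_seed(0)

# Toy MLP block dimensions (illustrative only).
d_in, d_hidden, d_out, batch = 512, 2048, 512, 8
W1 = torch.randn(d_in, d_hidden) * 0.02
W2 = torch.randn(d_hidden, d_out) * 0.02
x = torch.randn(batch, d_in)

# ReLU zeroes out a large fraction of the hidden activations
# (roughly half here with random weights; often far more in trained models).
h = torch.relu(x @ W1)
print(f"activation sparsity: {(h == 0).float().mean():.2%}")

# Pack the sparse activations and use a sparse-dense matmul,
# so the zero entries contribute no work in principle.
h_sparse = h.to_sparse()
y_sparse = torch.sparse.mm(h_sparse, W2)

# Dense reference for comparison.
y_dense = h @ W2
print("max abs diff:", (y_sparse - y_dense).abs().max().item())
```

Whether that packing overhead actually pays off depends on the sparsity level and the kernel, which is exactly why the perplexity side needs its own demonstration.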