Activation sparsity and packing sparse matrices will surely be important, so that covers one kind of performance (speed). The other kind, perplexity, still needs a good demonstration. That might require a big model, but nowadays you can fine-tune even a 30B model on a big cloud GPU box.
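For the speed side, the core idea is just to pack the nonzero activations into a sparse format and skip the work on the zeros. A minimal sketch in PyTorch, assuming a ReLU-style MLP block (the shapes and names are made up for illustration, not any particular library's implementation):

```python
import torch

torch.manual_seed(0)

# Toy MLP block dimensions (illustrative only).
d_in, d_hidden, d_out, batch = 512, 2048, 512, 8
W1 = torch.randn(d_in, d_hidden) * 0.02
W2 = torch.randn(d_hidden, d_out) * 0.02
x = torch.randn(batch, d_in)

# ReLU zeroes out a large fraction of the hidden activations
# (roughly half here with random weights; often far more in trained models).
h = torch.relu(x @ W1)
print(f"activation sparsity: {(h == 0).float().mean():.2%}")

# Pack the sparse activations and use a sparse-dense matmul,
# so the zero entries contribute no work in principle.
h_sparse = h.to_sparse()
y_sparse = torch.sparse.mm(h_sparse, W2)

# Dense reference for comparison.
y_dense = h @ W2
print("max abs diff:", (y_sparse - y_dense).abs().max().item())
```

Whether that packing overhead actually pays off depends on the sparsity level and the kernel, which is exactly why the perplexity side needs its own demonstration.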