I literally have DistilBERT models that can do this exact task in ~14ms on an NVIDIA A6000. I don’t know the precise energy cost per inference, but it’s really fucking low.
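For a sense of what that looks like, here's a minimal timing sketch with Hugging Face transformers - not my actual pipeline; the checkpoint, batch size, and input text are all placeholders:

```python
# Minimal sketch: time a DistilBERT sequence classifier on GPU.
# The checkpoint, batch size, and inputs are illustrative assumptions.
import time
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

name = "distilbert-base-uncased-finetuned-sst-2-english"  # stand-in public checkpoint
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name).to("cuda").eval()

batch = tok(["an example document to classify"] * 32,
            padding=True, truncation=True, return_tensors="pt").to("cuda")

with torch.inference_mode():
    model(**batch)                        # warm-up pass
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    preds = model(**batch).logits.argmax(dim=-1)
    torch.cuda.synchronize()
    print(f"{(time.perf_counter() - t0) * 1e3:.1f} ms for a batch of 32")
    print(preds.tolist())                 # predicted class ids
```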
I use LLMs to help build training data since they're great at zero-shot, but once the training corpus is built, a small, well-trained model will smoke an LLM in classification accuracy and is way faster - which means you get scale at a low carbon cost.
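Roughly, that second stage looks like the sketch below (Hugging Face transformers + datasets), assuming the LLM's zero-shot labels have already been dumped to a CSV. File names, label count, and hyperparameters are placeholders:

```python
# Fine-tune a small classifier on LLM-labeled data.
# llm_labeled.csv / holdout.csv have columns: text,label (integer class ids).
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

ds = load_dataset("csv", data_files={"train": "llm_labeled.csv",
                                     "eval": "holdout.csv"})
tok = AutoTokenizer.from_pretrained("distilbert-base-uncased")
ds = ds.map(lambda b: tok(b["text"], truncation=True), batched=True)

model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)   # set num_labels for your task

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="clf",
                           num_train_epochs=3,
                           per_device_train_batch_size=32),
    train_dataset=ds["train"],
    eval_dataset=ds["eval"],
    tokenizer=tok,                             # enables dynamic padding per batch
)
trainer.train()
print(trainer.evaluate())                      # eval loss; add compute_metrics for accuracy
```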
In my personal opinion there is a moral imperative to use the most efficient model possible at every step in a system design. LLMs are one type of architecture, and while they do a lot well, you can use a variety of energy-efficient techniques to do discrete tasks much better.
Thanks for providing a concrete model to work with. Compared to GPT-3.5, the number you're looking for is ~0.04%. I pointed out the napkin math because 0.00000001% was so obviously wrong, even at a glance, that it was hurting your claim.
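For anyone following along, here's the back-of-the-envelope version that lands in that ballpark. Both inputs are rough assumptions, not measurements: roughly 300 W of board power on the A6000 for the ~14ms call, and the widely circulated ~3 Wh estimate per GPT-3.5 query:

```python
# Napkin math: energy per DistilBERT classification vs. one GPT-3.5 query.
small_model_joules = 300 * 0.014     # ~300 W for ~14 ms ≈ 4.2 J
gpt35_joules = 3.0 * 3600            # ~3 Wh per query ≈ 10,800 J (rough estimate)
print(f"{100 * small_model_joules / gpt35_joules:.3f}%")   # ≈ 0.039%, i.e. ~0.04%
```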
And, yes, purpose-built models definitely have their place even with the advent of LLMs. I'm happy to see more people working on that sort of thing.