Feels like there should be two branches of system design: unit-profitable architectures for paying users vs. VC-backed architectures built to support non-paying users.
I'm building a transcription tool where your brand, product, other technical jargon, or heck, even your own name doesn't need a fine-tuned model every time. It'll be available both as a free native Mac app & as a SaaS tool for those who need to process in bulk. Hit me up at nebo@minusgreed.com to check it out; it will launch as FortuneSpeech.com once out of beta.
Thanks to this post, I kind of have another idea for a 'corrections to transcript' feature that Llama2-7B, even on CPU, can help with.
It's like a repeat of C++ vs. Python. If you can find folks & tooling for what you want to accomplish, then go for it. I don't think founders who actually have paying customers concern themselves with what works best per cent spent on compute; they care about what lets them improve the product, even if to an outsider it may seem that they could reduce costs by choosing X over Y.
I would suggest getting your feet wet with the HuggingFace Spaces free/Pro plan, & then moving on to their APIs once you get the hang of setting things up there. After that you can start setting up LangChain pipelines or direct vector DB queries to work out which columns or SQL queries to formulate (for the latter).
As for the former (the classifier), you can try zero-shot classification across n categories plus an "other" bucket. Models like Flan-T5/T5/Flan-UL2/DistilBART work for this (~7B-40B param LLMs can also do it, but would be overkill).
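Rough sketch of the "n categories + other" part, in case it helps. The function names, the 0.5 threshold and the toy keyword scorer are all made up for illustration; in practice the scorer would wrap whatever zero-shot model you pick and return one score per label.

```python
from typing import Callable, Dict, List

# Hypothetical scorer type: given text and candidate labels, return a score per label.
# A real one would wrap a zero-shot model; this is only an assumed interface.
Scorer = Callable[[str, List[str]], Dict[str, float]]


def classify_with_other(
    text: str,
    labels: List[str],
    scorer: Scorer,
    threshold: float = 0.5,      # assumed cutoff, tune on your own data
    other_label: str = "other",
) -> str:
    """Return the best-scoring label, or other_label if nothing is confident enough."""
    if not labels:
        return other_label
    scores = scorer(text, labels)
    best = max(labels, key=lambda label: scores.get(label, 0.0))
    if scores.get(best, 0.0) < threshold:
        return other_label
    return best


def keyword_scorer(text: str, labels: List[str]) -> Dict[str, float]:
    """Toy stand-in for a model: 1.0 if the label word appears in the text, else 0.0."""
    lowered = text.lower()
    return {label: (1.0 if label.lower() in lowered else 0.0) for label in labels}


if __name__ == "__main__":
    categories = ["billing", "shipping"]
    print(classify_with_other("Question about billing for my plan", categories, keyword_scorer))
    print(classify_with_other("hello there", categories, keyword_scorer))
```

The point of the threshold is that the model never has to be told about "other" as a category; anything that doesn't clearly match one of your n labels just falls through to it.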