I recently wrote a blog post exploring the idea of interfacing with local LLMs for ambiguous tasks like this. Doesn't that make more sense than coding the neural network yourself? Use something like llama.cpp, evaluate whether a small model solves your problem out of the box (and fine-tune if it doesn't), then interface with it programmatically via a wrapper of your choice. That seems more pragmatic to me.
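
To make that concrete, here's a minimal sketch using the llama-cpp-python bindings (just one wrapper option; the model path and prompt are placeholders). If the stock model's answers are good enough, you're done; if not, that's your signal to fine-tune:

```python
from llama_cpp import Llama

# Load a small local model in GGUF format (path is a placeholder).
llm = Llama(model_path="./models/small-model.gguf", n_ctx=2048, verbose=False)

# Hand the ambiguous task to the model and read back its answer.
result = llm(
    "Classify the sentiment of this review as positive or negative:\n"
    "'The product arrived late but works great.'\nAnswer:",
    max_tokens=16,
    stop=["\n"],
)
print(result["choices"][0]["text"].strip())
```

Swapping in a different model is just a matter of changing the path, which makes the "try a small model first" evaluation loop cheap.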