For what it’s worth, I’ve found using Perplexity Pro (with Claude 3 Opus) to exceed the accuracy of asking a non-expert armed with Google Search and a few minutes of time.
i'm really enjoying the resurgence of very minimal implementations of ml algorithms, because if you've recently tried performing inference on a sophisticated ml model in a way that's user friendly in any capacity, you know that it essentially involves pulling out your prayer book, rosary and incense, pulling like 20gb of python dependencies, 20 different frameworks, all of which break very easily, any minor difference in versioning is guaranteed to break the entire setup, with no hope of fixing it, it's just bindings on top of bindings on top of bindings, every other day a new library comes out that builds on top of existing libraries, introducing their new format, promising "deploy models with 15 lines of python", then "10 lines of python", then "1 line of python", which essentially calls into a black box N layers of python on top of each other, calling into an extremely complicated C++ autodiff library, the source code of which can only be acquired by an in person meeting with some sketchy software engineer from czechia, all of which only works on python 3.10.2, cuda v12.78.1298.777 with commit aohfyoawhftyaowhftuawot, only compiled with microsoft's C++ compiler, with 10 non-standard extensions enabled, all of this OF COURSE only if you have the most optimal hardware
point is, if your implementation is a simple C project that's trivial to build/integrate into your project, it's significantly easier to use on any hardware, not just retro (popularity of llama.cpp is a great testament to that imo)