> The standard practice for achieving fast inference is to rewrite the entire model inference loop in C++, as in FasterTransformer, and call out to special fused kernels in CUDA. But this means that any changes to the model require painfully reimplementing every feature twice: once in Python / PyTorch in the training code and again in C++ in the inference codebase. We found this process too cumbersome and error prone to iterate quickly on the model.
I am an AI novice, but why can't they automate this with AI? I thought the whole point of these tools was to automate tasks that are error-prone and require lots of attention to detail. Computers are great at that kind of stuff, so it's surprising they haven't applied AI techniques to automate parts of the AI pipeline, like converting code from Python to C++.
Automatic kernel fusion (compilation) is a very active field, and most major frameworks support some easy-to-use compilation (e.g. JAX's jit, or torch.compile, which iirc uses OpenAI's Triton under the hood). You can often still do better than the compiler by writing fused kernels yourself, either in CUDA C++ or in something like Triton (Python that compiles down to CUDA), but compilers are getting pretty good.
edit: not sure why op is getting downvotes, this is a very reasonable question imo; maybe the characterization of kernel compilation as "AI" vs. just "software"?
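For anyone curious what that "easy-to-use compilation" looks like in practice, here's a minimal sketch (my own example, assuming a CUDA device; exactly what gets fused depends on the PyTorch version and backend):

    import torch

    # A plain PyTorch function: several elementwise ops that the compiler
    # can potentially fuse into a single GPU kernel.
    def gelu_ish(x):
        return 0.5 * x * (1.0 + torch.tanh(0.79788456 * (x + 0.044715 * x ** 3)))

    # torch.compile traces the function and hands it to a backend
    # (TorchInductor, which emits Triton kernels on CUDA devices).
    compiled = torch.compile(gelu_ish)

    x = torch.randn(4096, 4096, device="cuda")
    y = compiled(x)  # first call compiles; later calls reuse the fused kernel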
Both AI and compilers are just software, and right now the optimizers are written manually, which is kind of weird, because the whole point of LLMs is to generate sequences of tokens that minimize some scalar-valued loss function. In the compiler case, the input is high-level Python code expressing tensor operations, and the output is a combination of GPU kernels that is formally equivalent to those tensor operations (in Python, or whatever higher-level language the tensor specification is written in) and executes as fast as possible. Everything in this loop has a well-defined input, a well-defined output, an associated scalar-valued metric (execution time), and even a normalization factor (output length, with shorter sequences being "better").
The whole thing seems obviously amenable to gradient-based optimization and data augmentation with synthetic code generators. It is surprising that no one is pursuing such approaches to improving the kernel compilation/fusion/optimization pipeline, because it is just another symbol game with much better-defined metrics than natural language modeling.
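To make the "scalar-valued metric" part concrete, here is a toy sketch (mine, assuming a CUDA device) of timing two formally equivalent candidates and reading off that scalar; the measurement is the easy part, the search over rewrites is where the work is:

    import time
    import torch

    def benchmark(fn, x, iters=100):
        # Warm up (triggers any compilation), then average wall-clock time.
        # This number is the scalar "loss" being described above.
        fn(x)
        torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            fn(x)
        torch.cuda.synchronize()
        return (time.perf_counter() - start) / iters

    def naive(x):
        # Eager mode: each op is a separate kernel launch.
        return torch.relu(x * 2.0 + 1.0)

    fused = torch.compile(naive)  # candidate rewrite of the same computation

    x = torch.randn(8192, 8192, device="cuda")
    print("naive:", benchmark(naive, x), "fused:", benchmark(fused, x))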
i don't know why people downvote, but writing highly performant gpu code across multiple languages is still something only a few people with a lot of the right experience can do well, and while ai can help assist those people, it's not a problem an ai can fully solve at this moment. maybe in a few years, with a large feedback loop of iterating, testing, benchmarking, and repeating. i guess one day, but not now.
If the bottleneck is writing performant code, then it seems like that's the first thing AI companies should solve with AI. If that's solved, then building applications on top of that foundation is very easy. Are there any companies working on this problem?
HN keeps throttling after 2 comments. Maybe they need more AI to detect bots instead of whatever algorithm they are using at the moment.
Back to the question at hand. How exactly do AI companies plan to build AGI if they cannot optimize the AI development loop with their own tools and techniques?
I would have to guess it has something to do with that task not actually being suitable for language models at their current stage. Even if they could be trusted to perform the task, it's actually not that much work to just... write code to handle keeping this kind of thing in sync. It's really, really not that much more work. You don't even need to do it: both training and inference can be done within PyTorch, or in C++.
If it was necessary for some reason... running a language model to keep something like this in sync over long-term training and iteration would likely be more expensive than a developer's time, AND it would block the researcher in a verification loop on output that still probably needs to be checked by the developer (they could be the same person, which will just deepen the frustration they experience).
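For what it's worth, a common low-effort way to keep the Python model definition and a C++ inference path in sync (a sketch of one standard approach, not what the article's team describes) is to export the trained module and load it from libtorch:

    import torch

    class TinyModel(torch.nn.Module):
        def __init__(self):
            super().__init__()
            self.linear = torch.nn.Linear(128, 128)

        def forward(self, x):
            return torch.relu(self.linear(x))

    model = TinyModel()
    # ... train in Python as usual ...

    # Serialize the same module so a C++ (libtorch) service can load it
    # without reimplementing the model by hand.
    scripted = torch.jit.script(model)
    scripted.save("tiny_model.pt")

On the C++ side you'd load it with torch::jit::load("tiny_model.pt"), so the model definition lives in exactly one place.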
The use of a lot of garbage accounts in this thread and the lack of model details also look pretty shady...
I'm confused. If these tools aren't good enough for AI research, then why would they be good enough for consumer applications? If language models cannot help with the AI development loop, then the technology is not going to be useful for consumer use cases. Code can be very easily verified by linters and type systems so the problem of verification is much simpler than in consumer use cases without linters and type systems.
>Code can be very easily verified by linters and type systems so the problem of verification is much simpler than in consumer use cases without linters and type systems.
you are confusing (syntactic) validation with verification. verifying code is an incredibly hard problem.
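a toy example of the gap (mine, not the parent's): this passes any linter and type checker, but it is still wrong:

    def average(xs: list[float]) -> float:
        # Lints and type-checks cleanly, but the logic is broken:
        # it divides by a constant instead of len(xs).
        return sum(xs) / 2

    # Verification is about meaning, not syntax:
    assert average([1.0, 2.0, 3.0]) == 2.0  # fails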
You can get a lot of value out of these models even if they are not capable of AI development, because most people aren't doing things that are as complicated as AI development.
I don't think AI development is complicated. It's just a bunch of functions with parameters that are optimized with gradient descent. The AI development loop is extremely simple, and most AI "research" is basically stacking standard tensor operations and seeing what works. It's surprising there is no company applying AI to AI development, since it is essentially a game with symbols and very well-defined, measurable outcomes.
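To be fair about what "extremely simple" means here, the skeleton of that loop really is small (a minimal sketch, not any particular lab's code):

    import torch

    model = torch.nn.Sequential(
        torch.nn.Linear(10, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1))
    opt = torch.optim.SGD(model.parameters(), lr=1e-2)

    for step in range(1000):
        x = torch.randn(32, 10)              # stand-in batch
        target = x.sum(dim=1, keepdim=True)  # stand-in labels
        loss = torch.nn.functional.mse_loss(model(x), target)
        opt.zero_grad()
        loss.backward()   # gradients of the scalar loss w.r.t. parameters
        opt.step()        # gradient-descent update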
I don't have an opinion. I am legitimately surprised that very easy problems in AI research have not already been solved with some foundation model. Translating and optimizing code from one formal language to another seems like a very obvious application of AI, and yet most of the work is still done manually.
I don't think you read what I said, or you don't know what you're talking about. Making language models useful for this is an active area of research, but as of right now _they are not ready, capable, or trustworthy enough_ to perform tasks without supervision. They're especially bad at complex code, and at code that doesn't have many examples in internet corpora, such as optimized CUDA kernels...