That's a decent point; it reminds me of a startup that posted on HN a couple of months ago doing AI background removal from images. They claimed this was now a mature market, with bulk pricing driving the cost down to a small margin over the price of compute. I suspect those kinds of models are small compared to the general-intelligence LLMs we are seeing and might reasonably be trainable on 250k clusters. There is likely a universe of low-hanging fruit in those kinds of problems for those who are capable. That's definitely not a market I would want to compete in, since once a particular problem is sufficiently solved it becomes a race to the bottom on cost.
But my (totally amateur, outsider-informed) intuition is that the innovative work will still happen at the edge of model size for the next few years. We only just got the breakthroughs in LLM capabilities around the 30B-parameter mark, and those capabilities seemed to accelerate with larger models. There appears to be such a gulf in capability between 7B and 70B parameter LLMs that I don't want to bother with LLMs at all unless I can get the higher-level performance of the massive models. But even to play around at 30B or so, I have to pay 15k-100k.
I think we are just in a weird spot right now where the useful model sizes for a large class of potential applications sit at a price point that many engineers will find prohibitively expensive to experiment with on their own.
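To make the hardware claim concrete, here's a back-of-envelope sketch of the GPU memory needed just to hold an LLM's weights (my own assumed figures, not from the article; it ignores KV cache and activation memory, which add more on top):

```python
# Approximate GPU memory needed to hold an LLM's raw weights.
# Assumption: dense model, no sharding overhead counted.
def weight_memory_gb(params_billions: float, bytes_per_param: int) -> float:
    """Memory in GB to store the weights alone (no KV cache, no activations)."""
    return params_billions * 1e9 * bytes_per_param / 1e9

for params in (7, 30, 70):
    fp16 = weight_memory_gb(params, 2)  # float16 / bfloat16
    int8 = weight_memory_gb(params, 1)  # 8-bit quantized
    print(f"{params}B params: ~{fp16:.0f} GB at fp16, ~{int8:.0f} GB at int8")
```

At fp16 a 30B model already needs ~60 GB of weights, which pushes you past single consumer GPUs into multi-GPU or datacenter-class cards, which is roughly where that price band comes from.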
For the first example, I think that was just due to the specific problem being solved. I can tell you there are a ton of areas that aren't "solved" yet, and that aren't trivial to solve either. One thing we haven't discussed in this conversation is the data itself, and cleaning up that data. Rad AI probably spent more money on staff cleaning up data than they did on model training. This isn't trivial: for medical-grade work you need physician data scientists to help out, and that field has only really existed since 2018 (the first time the title appeared in any job listing). The reason background removal is "mature" is that it's not that hard a problem and there's a good amount of data out there.
I also think you're way off on the second point. I'm not saying that to be rude; it does seem to be a popular opinion. It's just that if you read papers, most people publishing aren't using giant clusters. There's a whole field of people finding ways to shrink models down. Once we understand the models, we can also optimize them. You see this happen in all sorts of fields beyond "general intelligence": tasks that used to take entire clusters can run on your cell phone now. Optimization matters not just because it opens the work up to more people, but also because it cuts the costs these big companies are paying.
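One of the simplest shrinking techniques that field uses is post-training quantization. A toy sketch (the function names and the single-scale scheme are my own simplification; real toolchains use per-channel scales and calibration):

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Map float32 weights to int8 plus one scale factor (absmax scheme)."""
    scale = np.abs(weights).max() / 127.0
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from the int8 representation."""
    return q.astype(np.float32) * scale

w = np.random.randn(1024, 1024).astype(np.float32)
q, scale = quantize_int8(w)
err = np.abs(w - dequantize(q, scale)).max()
print(f"size: {w.nbytes} -> {q.nbytes} bytes (4x smaller), max error {err:.4f}")
```

Going from float32 to int8 is a straight 4x memory reduction with a small, bounded error per weight, which is why quantized models fit on hardware the full-precision versions never could.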
Let's think about this from another direction. ML models are loosely based on how the brain is thought to work. The human brain is capable of quite a bit, yet it runs on very little power: roughly 20 watts. It is clearly far better optimized than today's ML models, which means there's a huge efficiency gap we still have to close.
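The size of that gap is easy to eyeball with rough orders of magnitude (all three figures below are my own assumptions, not from this thread):

```python
# Order-of-magnitude comparison of training-cluster power draw vs the brain.
brain_watts = 20      # commonly cited estimate for the human brain
gpu_watts = 400       # one high-end training GPU under load (assumed)
cluster_gpus = 1000   # a modest training cluster (assumed)

cluster_watts = gpu_watts * cluster_gpus
print(f"cluster draw: {cluster_watts / 1000:.0f} kW "
      f"vs brain: {brain_watts} W "
      f"-> ~{cluster_watts // brain_watts:,}x more power")
```

Even a modest cluster under these assumptions draws on the order of tens of thousands of times more power than a brain, so there is clearly enormous headroom for optimization.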
> It's just that if you read papers most people publishing aren't using giant clusters.
There is a massive difference between what is necessary to prove a scientific thesis and what is necessary to run a profitable business. And what do you mean by "giant clusters" in this context? What is the average size of the clusters used in groundbreaking papers, and what do they cost? Is that a reasonable amount for a bootstrapped startup to experiment with, or are we in territory where only VC-backed ventures can even experiment?
> There's a whole field of people who are finding ways to shrink models down
Of course the cost of running models is going to come down. The very article we are responding to is a major part of that equation. But you seem to be making arguments about how the future will be as support for an argument against how the present is.
Presently, hardware costs are insanely high and not coming down soon (as per the article). Presently, useful models for a large set of potential applications require significant cluster sizes. That makes it presently difficult for many engineers to jump in and play around.
My opinion is that the cost has to come down to the point that hobbyist engineers can play with high-quality LLMs at the model sizes that are most useful. That doesn't mean there aren't model sizes suited to other use-cases that can be developed today. It doesn't mean the price of the hardware and the size of the models won't fall. It just means that dreaming of a business built around a capable LLM puts your realistic present-day costs in the tens of thousands at a minimum.