Hacker News

This was something we thought about a bit at Rad AI.

I think one of the problems this paper ignores is that the real question isn't whether you can get a general purpose model to beat a special purpose one, but whether it's really worth it (outside of the academic sense).

Accuracy is obviously important, and anything that sacrifices accuracy in a medical setting is potentially dangerous. So everything I'm about to say assumes that accuracy remains the same or improves for a special purpose model (and in general that is the case; papers such as this one treat shrinking models as a way of improving them without sacrificing accuracy: https://arxiv.org/abs/1803.03635).
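To make the shrinking idea concrete, here's a minimal one-shot magnitude-pruning sketch in NumPy. (This is a simplification: the lottery-ticket paper linked above uses iterative pruning with retraining from the original initialization, not a single pruning pass. The array here is random toy data.)

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.8):
    """Zero out the smallest-magnitude fraction of weights.

    A toy illustration of pruning: many weights can often be
    removed with little or no accuracy loss, yielding a much
    smaller effective model.
    """
    flat = np.abs(weights).ravel()
    k = int(len(flat) * sparsity)
    if k == 0:
        return weights.copy()
    # k-th smallest absolute value becomes the pruning threshold
    threshold = np.partition(flat, k - 1)[k - 1]
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

rng = np.random.default_rng(0)
w = rng.normal(size=(100, 100))       # stand-in for a weight matrix
pw = magnitude_prune(w, sparsity=0.8)
frac_zeroed = 1 - np.count_nonzero(pw) / pw.size
print(frac_zeroed)                     # ~0.8
```

In practice the pruned network is then retrained (or, in the lottery-ticket setup, rewound and retrained) to recover accuracy; the sketch only shows the pruning step itself.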

All things being equal for accuracy, the model that performs either the fastest or the cheapest is going to win. For both of these cases one of the easiest ways to accomplish the goal is to use a smaller model. Specialized models are just about always smaller. Lower latency on requests and less energy usage per query are both big wins that affect the economics of the system.
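A back-of-envelope calculation shows why the smaller model wins economically. All the numbers below (GPU price, throughput figures, model sizes) are made-up assumptions for illustration, not measurements:

```python
# Toy serving-cost estimate: smaller models sustain higher token
# throughput per GPU, so cost per query drops roughly in proportion.
def cost_per_million_queries(tokens_per_query, usd_per_gpu_hour,
                             tokens_per_sec_per_gpu):
    seconds = 1e6 * tokens_per_query / tokens_per_sec_per_gpu
    return seconds / 3600 * usd_per_gpu_hour

# Hypothetical throughput: a large general model vs. a specialized
# model that is small enough to decode ~50x faster per GPU.
big   = cost_per_million_queries(200, 2.0, 30)
small = cost_per_million_queries(200, 2.0, 1500)
print(big, small, big / small)
```

The exact ratio depends entirely on the assumed throughputs, but the shape of the argument holds: at equal accuracy, per-query cost scales down with model size, and at production volumes that difference dominates.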

There are other benefits as well. It's much easier to experiment with and compare specialized models, and there's less surface area for errors to leak in.

So even if it's possible to get a model like GPT-4 to work as well as a specialized model, if you're actually putting something into production and you have the data it almost always makes sense to consider a specialized model.




That's interesting; I'm currently working on an idea that assumes the opposite. Having built specialized models for years, I can say the cost of having a data science team clean the data and build a model is pretty high, and it can take quite a while (especially if part of the project is setting up the data collection).

For prototyping and for smaller use cases, it makes a lot of sense to use a much more general model. Obviously this doesn't apply to things like medicine, but for more general tasks (checking whether someone is on the train tracks, counting the people currently queuing in a certain area, detecting a fight in a stadium) I think multi-modal models are going to take over. Not because they're efficient or particularly fast, but because they'll be quick to implement, test, and iterate on.

The cost of building a specialized model, and keeping it up to date, will far exceed the cost of an LVM in most niche use cases.


I think it depends on how much you expect your model to be used, and how quickly the model needs to react. The higher either of those becomes, the more likely you'll want to specialize.

If you expect your model to be used a lot, and you don't have a way to distribute that pain (for instance, having a mobile app run the model locally on people's phones instead of remotely in your data centers), then it ends up being a cost balancing exercise. A single DGX machine with 8 GPUs is going to cost you about the same as a single engineer would. If cutting the model size down lets you reduce your number of machines, that makes increasing headcount easier. The nice thing about data cleaning is that it's also an investment: you can keep using most of the data for a long time afterward, and if you're smart you're building automated cleaning techniques that can be applied to new data as it comes in.
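The machines-versus-headcount tradeoff can be sketched as simple arithmetic. Every figure below (machine cost, throughput per machine, peak load) is a hypothetical assumption, not a real price or benchmark:

```python
# Toy fleet-sizing comparison: does shrinking the model free up budget?
MACHINE_COST_PER_YEAR = 250_000   # hypothetical: one 8-GPU box, amortized
QPS_PER_MACHINE_LARGE = 50        # hypothetical throughput, large model
QPS_PER_MACHINE_SMALL = 500       # hypothetical: specialized model, ~10x smaller

def machines_needed(peak_qps, per_machine_qps):
    # Round up: you can't provision a fraction of a machine.
    return -(-peak_qps // per_machine_qps)

peak_qps = 2_000
large_fleet = machines_needed(peak_qps, QPS_PER_MACHINE_LARGE)
small_fleet = machines_needed(peak_qps, QPS_PER_MACHINE_SMALL)
savings = (large_fleet - small_fleet) * MACHINE_COST_PER_YEAR
print(large_fleet, small_fleet, savings)
```

With these made-up numbers the smaller model frees up the equivalent of several engineers' worth of annual machine spend, which is the point of the comment above: fleet cost and headcount trade off against each other directly.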


I'm curious to know what Rad AI ended up doing. IIRC the initial problem was how to turn a set of radiology notes into a summary with a specific format. Is that right?

If you were approaching this problem anew today, you'd probably try with GPT-4 and Claude, and then see what you could achieve by finetuning GPT-3.5.
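For the finetuning route, the main preparation step is assembling training examples in the chat-format JSONL file that OpenAI's fine-tuning endpoint expects for gpt-3.5-turbo. A minimal sketch, with made-up stand-in data (the findings/impression text and the system prompt are illustrative, not Rad AI's actual format):

```python
import json

# Hypothetical (findings, impression) pairs standing in for real
# radiology data -- the content here is invented for illustration.
examples = [
    ("Findings: No focal consolidation. Heart size normal.",
     "Impression: No acute cardiopulmonary abnormality."),
]

# Chat fine-tuning format: one JSON object per line, each with a
# "messages" list of system/user/assistant turns.
with open("train.jsonl", "w") as f:
    for findings, impression in examples:
        record = {"messages": [
            {"role": "system",
             "content": "Summarize the radiology findings as a concise impression."},
            {"role": "user", "content": findings},
            {"role": "assistant", "content": impression},
        ]}
        f.write(json.dumps(record) + "\n")
```

The file is then uploaded and a fine-tuning job started through the OpenAI API; the resulting model is queried the same way as the base model, just under its fine-tuned model name.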

And, yes, for a given level of quality, the finetuned GPT-3.5 will likely be cheaper than the GPT-4 version. But for radiology notes, perhaps you'd be happy to pay 10x even if it were to give only a tiny improvement?


I guess a question to ask is "What is GPT-4?" Is it the algorithm, the weights, the data, or a combination of them all?

To put it another way, the researchers at Rad AI consumed every paper that was out there, including very cutting-edge stuff. This included reimplementing GPT-2 in house, as well as many other systems. However, we didn't have the same data that was used by OpenAI. We also didn't have their hyperparameters (and since our data was different, it's not a guarantee that those would have been the best ones anyway).

So with that in mind, it's possible that Rad AI could today be using their own in-house GPT-4, but specialized with their radiology data. In other words, using a specialized model and using GPT-4 wouldn't be contradictory.

I do want to toss out a disclaimer that I left there in 2021, so I have no insight into their current setup beyond what's publicly released. However, I have no reason to believe they aren't still doing cutting-edge work and building custom systems that take advantage of the latest papers and techniques.



