I've always thought it was abundantly clear how to make smaller models perform as well as large models: keep labeling data and build a human-in-the-loop support process to keep the model on track.
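
For the curious, here is a minimal sketch of the kind of loop I mean, using scikit-learn and uncertainty sampling. The toy texts, the two-round annotation budget, and the label_fn stand-in for the human annotator are all made up for illustration:

    # Minimal human-in-the-loop loop via uncertainty sampling (sketch only).
    # Toy data; label_fn stands in for the human annotator.
    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    labeled_texts = ["great product", "terrible support", "works fine", "awful bug"]
    labeled_y = [1, 0, 1, 0]
    unlabeled_texts = ["pretty good overall", "crashes constantly", "love it", "total junk"]

    def label_fn(text):
        # A human supplies this label in a real pipeline.
        return int(input(f"label for {text!r} (0/1): "))

    vec = TfidfVectorizer()
    for _ in range(2):  # annotation budget: 2 items per run
        X = vec.fit_transform(labeled_texts)
        clf = LogisticRegression().fit(X, labeled_y)

        # Ask the human about the example the model is least sure of.
        probs = clf.predict_proba(vec.transform(unlabeled_texts))
        idx = int(np.argmin(probs.max(axis=1)))

        text = unlabeled_texts.pop(idx)
        labeled_texts.append(text)
        labeled_y.append(label_fn(text))

The point is that each round of labeling goes where the model is weakest, which is what keeps a small model on track.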

My perspective is more pessimistic. I think people opt for huge unsupervised models because they believe that tuning a few thousand more input features is easier than labeling copious amounts of data. Plus (in my experience) supervised models often require a more involved understanding of the math, whereas there are so many NN frameworks that ask very little of their users.




People have tried (and continue to try) that human-in-the-loop data growth. Basically any applied AI company is doing something like that every day, if they're getting their own training data in the course of business. It helps, but it won't turn your bag-of-words model into GPT-3.

Companies like Google have even spent huge amounts of time and money on enormous labeled datasets. JFT-300M, for instance, is roughly 300M labeled images for computer vision tasks, as the name suggests. That effort creates value, but it creates more value for larger models with higher capacity.


I "have tried (and continue to try) that human-in-the-loop data growth" to enormous success, bringing logistic regression models to greater than 99% accuracy. And you can chain vectorization strategies to create more input features than simply a bag-of-words, like morphology, shape, etc. We (the software company that I work for) don't need GPT-3, because it is a specialized model geared towards generating human-like text. Most NLP problems are just parsing text for actionable information, and oftentimes, supervised models can be chained to create something far more effective towards your needs than trying to shoehorn a massive general-purpose unsupervised model into a specialized problem.


Supervised models would also require a lot more human labour, and the goal of most machine learning projects is to achieve cost-savings by eliminating human labour.


Up front, yes, but long term I wholly disagree: a model that performs at 95% or higher will eliminate far more human work than the labeling costs, no matter how many interns you enlist to label the data.



