You do realize how feasible it is to fine-tune a small model for a task like this (along with a hundred others in a similar vein) and run it at scale on your own hardware?
I've run hundreds of millions of tokens (150m so far, over a couple of weeks of non-continuous running as I tweaked things) through my 2x 3090s with a 13B Llama 2 model I fine-tuned on tasks like: summarization, knowledge graph generation, writing using the knowledge graph, grammar, spelling, and transcription correction, etc.
This type of stuff is going to be done at scale with a modest budget if you have the skills to tune more efficient and faster models to your use cases.
It's even easier than that. There's no need to even fine-tune an LLM to do it. Here's a screenshot[1] of a 4-bit quantised version of an off-the-shelf open LLM (WizardLM 13B v1.2) doing it on my Mac.
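If you want to try the same thing without a GUI, here's roughly what it looks like in code: a minimal sketch assuming llama-cpp-python is installed and you have a 4-bit quantised GGUF checkpoint on disk. The file name, prompt, and generation settings are placeholders, not a record of the exact setup in the screenshot.

```python
# Minimal sketch: run a 4-bit quantised open model locally with llama-cpp-python.
# The model path and prompt below are placeholders/assumptions.
from llama_cpp import Llama

llm = Llama(
    model_path="./wizardlm-13b-v1.2.Q4_K_M.gguf",  # any 4-bit quantised checkpoint
    n_ctx=2048,        # context window
    n_gpu_layers=-1,   # offload all layers to the GPU / Metal if available
)

prompt = (
    "Extract every email address from the following text and return them "
    "as a JSON list.\n\n"
    "Text: Contact jane.doe [at] example [dot] com or bob@test.org for details.\n"
    "Emails:"
)

out = llm(prompt, max_tokens=128, temperature=0)
print(out["choices"][0]["text"].strip())
```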
How does one efficiently learn to do such things, and what kinds of problems are such approaches fruitful for?
I find there to be a giant gap in learning about this stuff between material that boils down to "use magic words and system prompts to improve results from one of the big models" and "how do LLMs work from first principles".
I still haven't found a great resource that covers this middle ground, which seems to me to be where a lot of the power of these approaches is going to reside.
I described my approach to fine-tuning for a specific task to another user below, but I'll copy it here (with a rough sketch of the logging step after the quote):
> Design your tasks to be small, repeatable steps, call the OpenAI API and log all requests/responses.
> Filter out any bad responses and take a representative sample of the data you have collected from OpenAI, and train a Mistral or Llama2 model with the request/response pairs.
> Measure the quality of your model vs OpenAI for the same inputs, and then swap out the model in your workflow once happy with the results.
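To make the first couple of steps concrete, here's a rough sketch of the logging side using the current openai Python client. The wrapper name, model choice, and file path are just illustrative.

```python
# Sketch: call the OpenAI API for a small, repeatable task and log every
# request/response pair to JSONL so it can later be filtered and used as
# fine-tuning data. Names and paths here are illustrative, not prescriptive.
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def run_and_log(task_prompt: str, log_path: str = "pairs.jsonl") -> str:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": task_prompt}],
        temperature=0,
    )
    answer = response.choices[0].message.content
    with open(log_path, "a") as f:
        f.write(json.dumps({"prompt": task_prompt, "response": answer}) + "\n")
    return answer

# Later: filter out bad rows from pairs.jsonl, sample a representative subset,
# and use the remaining prompt/response pairs as supervised fine-tuning data.
```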
If you do this, be careful how (and whether) you publish weights trained on OpenAI output: if they look into how the model was generated and it becomes clear you broke the ToS, they'll most likely ban you from the platform.
You train your model, publish it on huggingface and then write in the README:
> This is how I made this model: Design your tasks to be small, repeatable steps, call the OpenAI API and log all requests/responses. Filter out any bad responses and take a representative sample of the data you have collected from OpenAI, and train a Mistral or Llama2 model with the request/response pairs.
If you're looking for a practical guide to getting started with fine tuning, I wrote one a couple of months ago that got pretty popular here on HN. Might be helpful if you're interested in playing around with it! https://news.ycombinator.com/item?id=37484135
The industry term for that middle ground is a “moat”, and the people who are most familiar with it are getting paid for what they know, so they’re not giving it away.
I think that may be right, but if so, that seems pretty unusual to me.
I've gone through a few of these "new kinds of software becoming useful" transition periods - most notably applications moving to the web, and then native smart phone applications - and in none of those transitions was there a dearth of resources on how to spin up on doing useful things due to this "moat" concern.
Nobody was protecting their iPhone app dev moat by not publishing books and training courses on Objective-C and Xcode...
> I still haven't found a great resource that covers this middle ground, which seems to me to be where a lot of the power of these approaches is going to reside.
I think this is the disconnect: It doesn't strike me that what I'm talking about has anything to do with "papers". So from your comment, I'm once again left wondering what you mean.
My sense is that I have a much better grasp of the foundational material here, having read in-depth books and papers about it, but I still can't quite wrap my head around the question of how people are actually "operationalizing" this into useful software.
But to your point about experimentation, it might just be the kind of thing where there is no path to enlightenment besides working on a project and running into and overcoming all the hurdles along the way.
But not at webscale. It's fine if you want to summarize something for personal use. A model of the size you're talking about is still way too large if you're trying to harvest millions of e-mail addresses from billions of webpages.
I'm also looking forward to what Apple Mail and other local clients are able to do. My laptop's CPU is idle most of the time; why not use that extra CPU time to do something cool, like filtering spam better?
Microsoft already does that, and its Antimalware agent is the bane of my existence. It will see idle machines spin their fans up to full and drain batteries in a few short hours. No thank you!
When plugged into the grid, it makes sense to spend a few cents of energy a day to filter out unwanted solicitations, harassment that you may not want to see, scam emails or texts, etc.
If I didn't have to worry about my grandparents getting scammed because 99.99% of it was effectively filtered or flagged at one layer or another before it actually became a problem... can you imagine how much you could reduce that type of fraud/abuse?
That would cost money and lower the profits of the people that own/control the grid. I sometimes wonder how much money these robber barons spend on lobbying and other PR campaigns to convince people that climate change isn't a problem and that the grid is just fine. It's one of those unanswerable questions, I'm sure, but how much progress could be made by redirecting that money toward actually improving the grid itself?
That was actually part of how I intended my "negotiated" to be taken. Part of it is a monetary negotiation where we invest more in base load and peak load, but some of it needs to be in the ability for the grid to request to shed load and for devices to react accordingly.
The power cord has to have a data link (USB? or just networking over the power line itself) through which the outlet can tell the computer how much the energy costs at any given time. This would be a very welcome but very expensive addition to the infrastructure.
My wall outlet supplies power from 3 different sources: the grid, solar on the roof, and/or a Powerwall, depending on the weather, grid status (which sucks where I live), and time of day. The computer only knows the time of day off the bat; everything else it has to learn in a complicated way.
If I had "cost" or, better, "status" information integrated into the power itself, smart appliances (like a computer) could decide what they can or cannot run. Right now I can start training my models on my 4090 at night, an outage hits, and the 4090 will happily drain the Powerwall, so I won't have A/C in the morning. The models can wait; they're stupid anyway, or at least I like the A/C better.
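The appliance-side logic would be trivial once that signal exists. Here's a purely hypothetical sketch: get_power_status() stands in for whatever the outlet or gateway would eventually expose (grid vs. solar vs. battery, remaining capacity), and the thresholds are made up.

```python
# Purely hypothetical sketch of the appliance side: poll a power-status signal
# and decide whether a deferrable job (like model training on the 4090) should
# run. get_power_status() and the thresholds are invented for illustration;
# no such API exists on a normal wall outlet today.
import time

def get_power_status() -> dict:
    """Hypothetical: returns e.g. {"source": "battery", "battery_pct": 42, "grid_up": False}."""
    raise NotImplementedError("would come from the outlet/gateway data link")

def ok_to_train(status: dict) -> bool:
    if status["source"] == "grid" and status["grid_up"]:
        return True                  # not draining local storage
    if status["source"] == "solar":
        return True                  # effectively free power
    # On battery: only run if there's plenty left for the A/C in the morning.
    return status["battery_pct"] > 80

def train_when_power_allows(train_one_chunk):
    while True:
        if ok_to_train(get_power_status()):
            train_one_chunk()        # checkpointable unit of training work
        else:
            time.sleep(600)          # the models can wait; the A/C cannot
```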
My guess is you wouldn't lower it by much, because there are more incentives for attackers than for defenders to invest in these approaches, so it's likely that by the time grandmas are running LLM-based anti-fraud tooling, the attackers will already be running LLM-based attacks as well.
You don't need a "model" for this - I remember a Coursera course on ML I did some years ago, and one of the exercises was email extraction. With some very basic algorithms, nothing more than a bunch of common Python libraries and a couple of days of work, it's possible to extract over 90% of emails hidden with commonly used tricks. I'm not sure the remaining number is worth building more complicated models for - the returns diminish quickly, and wasting time on spamming people who are clever enough to invent their own unique email hiding technique probably doesn't have a good ROI anyway.
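The "basic algorithms" approach amounts to something like the sketch below: normalise a few common obfuscation patterns, then run an ordinary regex. The trick list shown is just an example, not exhaustive.

```python
# Toy sketch of regex-based email extraction: undo a couple of common
# obfuscation tricks ("[at]", "(dot)", etc.) and then match with a plain
# pattern. The trick list is illustrative, not exhaustive.
import re

DEOBFUSCATE = [
    (re.compile(r"\s*[\[(]\s*at\s*[\])]\s*", re.I), "@"),
    (re.compile(r"\s*[\[(]\s*dot\s*[\])]\s*", re.I), "."),
]
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def extract_emails(text: str) -> list[str]:
    for pattern, replacement in DEOBFUSCATE:
        text = pattern.sub(replacement, text)
    return EMAIL.findall(text)

print(extract_emails("reach me at jane [at] example [dot] com or bob@test.org"))
# ['jane@example.com', 'bob@test.org']
```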
It's entirely possible without OpenAI doing anything else. Design your tasks to be small, repeatable steps, call the OpenAI API and log all requests/responses.
Filter out any bad responses and take a representative sample of the data you have collected from OpenAI, and train a Mistral or Llama2 model with the request/response pairs.
Measure the quality of your model vs OpenAI for the same inputs, and then swap out the model in your workflow once happy with the results.
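To give a feel for the training step, here's a rough sketch using LoRA via the huggingface transformers + peft libraries, assuming the request/response pairs were logged to a pairs.jsonl file with one {"prompt": ..., "response": ...} object per line. The base model name, prompt template, and hyperparameters are placeholders, not tuned values.

```python
# Sketch: LoRA fine-tune a small open model on logged request/response pairs.
# Model name, template, and hyperparameters are placeholders, not a recipe.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

base = "mistralai/Mistral-7B-v0.1"          # or a Llama 2 checkpoint
tok = AutoTokenizer.from_pretrained(base)
tok.pad_token = tok.eos_token

model = AutoModelForCausalLM.from_pretrained(base, device_map="auto")
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=32, task_type="CAUSAL_LM",
    target_modules=["q_proj", "v_proj"],
))

def to_text(example):
    # Simple instruction-style template; match whatever format you logged.
    return {"text": f"### Instruction:\n{example['prompt']}\n\n"
                    f"### Response:\n{example['response']}{tok.eos_token}"}

ds = load_dataset("json", data_files="pairs.jsonl", split="train").map(to_text)
ds = ds.map(lambda ex: tok(ex["text"], truncation=True, max_length=1024),
            remove_columns=ds.column_names)

Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="finetuned", num_train_epochs=3, learning_rate=2e-4,
        per_device_train_batch_size=1, gradient_accumulation_steps=8,
        logging_steps=10,
    ),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
).train()
```

From there, run the same held-out inputs through both the fine-tuned model and OpenAI, compare the outputs, and swap the model into the workflow once the quality is acceptable.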