Must say I'm genuinely thankful for what Meta has done on this front. Without their research release of LLaMA, I think things would have been substantially less democratic.
This could easily have gone in an "only orgs with billions can play" direction, with nobody else even trying, in the learned-helplessness sense. Instead we're ending up in a hybrid "OK, maybe we can't quite train from scratch, but we can still tinker" space, which is a lot healthier.
If they want to double down on that, then I applaud them.
Facebook isn't doing this out of the kindness of their hearts; it just makes business sense for them. AI seems to be in a smiling-curve situation right now, where only Nvidia on the hardware side and the consumer-facing products using AI as a feature are making money. The companies training the models and trying to sell API access look like they'll have a hard time not becoming replaceable commodities.
Zuckerberg talked about this in his interview with Lex Fridman.
From what I understand they already benefited from the OSS work on quantization, and they see themselves as well positioned to benefit from a world where there's a bunch of specialized AI models/assistants.
Does anyone remember Facebook M? I believe that was their first big foray into creating an intelligent chatbot/personal assistant. Although that may not have anything to do with LLaMA directly, it's still pretty cool to see that a crazy vision like that is so close to becoming reality.
I am genuinely excited about the positive impacts LLMs and their future derivatives can have in computing. We can now, for the first time, truly “program” a computer in natural language. It’s even intelligent enough to “fill in the gaps” using general intelligence. Just don’t rely on it for any niche topics without teaching it a thing or two, or you’ll get bull crap back.
Nope, it wasn't leaked... somehow the media latched on to that wording.
Initially it was behind a consent form for research purposes, i.e. you gave some basic deets and got access to the weights under a non-commercial license. FB shut that down after it got lots of attention.
That obviously got copy-pasted onto a torrent & grew legs from there. FB hassled some people with DMCA takedowns too, but it seemed pretty half-hearted & was too late at that stage.
[Sidequest: I believe the repo they used to distribute access also had a magnet link in it at one point, which further confused the narrative, but I'm not 100% sure of the precise details.]
Point is, at no stage was this 100% behind closed doors with someone leaking it in the "stolen" sense, as you & I would understand the word.
Legally it is still restricted to those who have been granted permission for research / non-commercial purposes. That license still applies. The fact that they have stopped actively enforcing it does not change the legal status. If Elon were to announce that Twitter was using it for commercial purposes, Meta could sue in court and would likely win.
I see. Thanks for explaining. Yeah I guess that re-distribution was a little rogue and could perhaps be called a leak. I personally dislike that interpretation but I can see it.
You can tell by Mark's wording and body language when he talks about it in the recent Lex Fridman episode. I got the impression from him that he would have released it in a manner closer to that of open source if there wasn't a question of legal liability.
Why couldn't they distribute it? They clearly could have. There's no law against it.
Perhaps you meant that they were nervous about companies using it commercially and either bringing them bad press or making money off their work? That's clearly why they only released it for researchers.
Don’t forget what you’re dealing with here: the faceless, amoral, infinitely ravenous maw of the most efficient personal-data succubus in history. Make no mistake, this is something like “goodwill capture” instead of “regulatory capture.”
I see no way that this diminishes Meta’s power in any way - arguably it strengthens it by making it easier to choose a Meta architecture instead of creating a competing FOSS architecture.
So arguably all this does is raise the FOSS bar technically while further entrenching Meta - AND, most importantly, it gets thousands of developers to prime their data architectures for Meta models that will eventually be served from a Meta account.
And once it’s widespread enough to lock you in, those commercial terms, whoops they changed!
A false dilemma, also referred to as false dichotomy or false binary, is an informal fallacy based on a premise that erroneously limits what options are available.[1]
These models cost millions to train. The only reason open-source LLMs have a heartbeat is that they’re standing on Meta’s weights. The only third path is a public option.
> The only reason open-source LLMs have a heartbeat is they’re standing on Meta’s weights.
Not necessarily.
RWKV, for example, is a different architecture that wasn't based on Facebook's weights whatsoever. I don't know where BlinkDL (the author) got the training data, but they seem to have done everything mostly independently otherwise.
disclaimer: I've been doing a lot of work lately on an implementation of CPU inference for this model, so I'm obviously somewhat biased, since this is the model I have the most experience with.
My personal bet is specialised models have a niche. Do you think one of these could compete with GPT if e.g. trained on a law firm’s correspondence and contracts?
Probably not, honestly. Because it's an RNN, old information gradually deteriorates as new information is fed into the model. That's undesirable compared to e.g. transformers, which can reference any part of the context without degradation but have a hard limit on context size. (RWKV can ingest a theoretically infinite number of tokens, but after around 16k it will start to degrade into madness until restarted, so practically it does sort of have a limit.)
(The reason it degrades is that a single internal state is updated in place per token, and the current models have only been trained with up to 8192 tokens of context, so once you get to roughly double that, the state starts to diverge from "sanity", with no known way to correct this. And then priming a new instance of the model with 8192 or so tokens of the new context takes a really long time, because you can't compute the next step of an RNN until you also have the previous one!)
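To make that serial-priming cost concrete, here's a toy sketch (not the actual RWKV code; rnn_cell and the state shape are made up purely for illustration) of why ingesting N tokens of context takes N sequential steps for an RNN:

    import numpy as np

    def rnn_cell(state, token_embedding):
        # Toy stand-in for one RWKV-style step: the single state vector is
        # updated in place for every token, so old information is gradually
        # overwritten as new tokens arrive.
        return np.tanh(0.9 * state + 0.1 * token_embedding)

    def prime(token_embeddings):
        # Priming a fresh instance on a long context is inherently serial:
        # step t cannot begin until step t-1 has produced its state.
        state = np.zeros_like(token_embeddings[0])
        for tok in token_embeddings:  # N tokens -> N sequential evaluations
            state = rnn_cell(state, tok)
        return state  # the only "memory" the model carries forward

A transformer, by contrast, attends over every stored context token at each step, which is why it can reference any part of its window without decay but pays for it with a hard window-size limit.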
With some fine-tuning (which, even that is ... still out of reach for most people, unfortunately, but I digress) it can be turned into a pretty good chat model, generate story completions, generate boilerplate code, etc., and the base model is already reasonably okay at most of these things.
I think it's definitely a competitor in some areas, though I don't remember whether there have already been benchmarks putting it up against the other models. I do know that it's better than the majority of other open-source models, including transformer-based ones, but that probably owes more to training data than architecture.
It is interesting how “catastrophic forgetting” is subtly different, technically, between these large-corpus LLMs and, say, a CNN, but the basic “the sequences you are looking for are not here” is the same.
Oh, you said trained. If trained, then the long-context issue may not be as severe. It might still go mad if you let it eat too much of a hundred-page lawsuit, but if you work with portions of it (like how transformers work; see the sketch below), RWKV can be vastly more economical than the larger models, requiring a much less powerful GPU, or even no GPU at all, thanks to rwkv.cpp.
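A minimal sketch of that chunked approach, assuming a generic model.forward(token, state) interface in the spirit of rwkv.cpp's Python bindings (the function names, the tokenizer, and the 8192 figure are illustrative placeholders, not the real API):

    # Hypothetical chunked ingestion: keep each pass well under the trained
    # context length so the state never drifts past "sanity".
    TRAINED_CTX = 8192               # context length the current models saw
    CHUNK = int(TRAINED_CTX * 0.75)  # stay comfortably below the danger zone

    def ingest_document(model, tokenizer, text):
        # Process a long document in independent chunks instead of one
        # ever-growing state; returns one final state per chunk.
        tokens = tokenizer.encode(text)
        states = []
        for start in range(0, len(tokens), CHUNK):
            state = None                         # fresh state for each chunk
            for tok in tokens[start:start + CHUNK]:
                _logits, state = model.forward(tok, state)
            states.append(state)                 # per-chunk "memory"
        return states

Each chunk can then be summarized or queried separately, much like the sliding-window workflows people already use with fixed-context transformers.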
rwkv.cpp in particular depends on a project that would not have existed in its current form without LLaMA, even though the project itself isn't LLaMA-specific. However, there are enough other implementations of CPU inference (at least two?) that I think RWKV could still exist even if LLaMA never had.
It kicks Google, a competitor for advertising dollars. Some people feel Google is under existential threat from AI (trawling through search results full of spam and ads sucks when an AI can just tell you the answer), and this release lets people build various forms of Google competitor without having to do the hard lifting of creating the LLM themselves.
It kicks OpenAI, too, though Microsoft is perhaps less obviously a competitor to Meta right now. But Microsoft has OpenAI, loads of money, loads of engineers, and lots of product lines, so they might leverage OpenAI's tech lead to _become_ more of a competitor to Meta. It's less of a risk to Meta if OpenAI doesn't have a tech lead anymore.
This is aimed squarely at OpenAI (and to a much lesser degree, Anthropic). Google is their own worst enemy in this space precisely because they are terrified of doing anything that will cannibalize their search business and the ad market built around it.
It also makes Meta more attractive for top research talent, because researchers _really_ like to publish and get credit for their work. As OpenAI and others batten down the hatches, this could give Meta an advantage.
How will AI affect Facebook? The number of fake news posts, even by ordinary people putting themselves in photos, etc., is going to really gut the platform of its value, won’t it? And what about when the flood of AI images hits Instagram? It’s going to be a weird decade or two for them.
It is certainly strategic to make their AI the most accessible platform for building with AI. Plus, with the reach of their social networks, AI improvements made elsewhere can flow back into their own models, then into their software, and on to end users. If company A finds a way to use Model X, that is more easily usable by Meta, since they know the model quite well, I would assume. Meta’s business thrives on free usage by billions of users; it needs people to keep using its platforms and not leave the networks. Maybe Google is the nearest competitor in terms of the ads business being so financially vital.
Given the choice between equal models from Meta and from the group that released Falcon (initially under a super shady royalty license that they then open-sourced, and whom nobody had ever heard of before), I'd personally go for Meta.
Of course, variety is good and I hope the UAE group continues to establish themselves as a credible model provider.
I'm surprised this opinion still persists. Royalty-based licenses have been used by major game engines [0] for a long time, so that's not unprecedented.
This isn't the first time I've seen this brought up. It's irrelevant here for so many reasons: it is unprecedented for ML models, the model was promoted as open source, and the terms were absurd (10% for anything related to it, plus some reporting requirements, for a foundation model that's not even tuned for anything).
In any event, bringing shitty practices from another industry into ML doesn't seem worth supporting.
> In any event, bringing shitty practices from another industry into ML doesn't seem worth supporting.
Why is this still an issue for you?
All players have made licensing blunders in the past, and the fine folks behind Falcon seem to have learned from their mistake by releasing their weights under Apache 2.0, a well-understood and respected permissive license.
Many major open source projects started as proprietary software that eventually went open source. Why hold a grudge against this project specifically? Yes, they made a mistake and learned from it. What more do you want?
See, when you have to convince corporate lawyers and security folks, that whole switching-licenses thing makes them uneasy. BD and legal are much happier dealing with Meta than with the UAE.
People relicense software all the time and the lawyers are usually fine with it (especially when the terms are more favorable). What am I missing here?
> It's not only free as in beer, but free as in libre
If by "it" you mean LLaMA 1, then I don't think the license allows using it for commercial projects, so it isn't really libre. That said, all indications are that LLaMA 2 will be fully FLOSS.
I love Meta. I thought they had turned to the dark side for the longest time. I've never been so wrong. When Meta released LLaMA it changed my life. I've never seen a company move more leverage to the edge at once. They must be taking notice of how much it's made folks like me adore them. So now we're getting commercial friendly models too? I didn't know Christmas could come twice in one year.
Meta is so big that I'm not even sure it's valid to refer to them as a whole. What I like in this context is Meta's ML research department, but to me, Meta as a whole is still the same old privacy-violating, dopamine-inducing, teenage-depression-and-suicide-causing company I've always known.
When LLaMA came out, I dropped everything I was doing to work on it. It's given me new hope for technological progress. Think about it. What were people focusing on before LLMs came out? The frontiers in software were cryptocurrency and wasm. Now we have something I can believe in and thanks to Facebook I'm able to actually use it on my own. I also got to feel like I was a part of its development when I changed llama.cpp to reduce its memory use by 2x and enable running multiple models in parallel.
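For those curious how a single change can both halve memory use and let multiple models run in parallel: one common mechanism (an assumption on my part, not necessarily the exact change, and llama.cpp itself is C++) is memory-mapping the weights file so the OS page cache backs them. The general idea, in an illustrative Python sketch with "weights.bin" as a placeholder path:

    import mmap

    # Mapping a weights file instead of read()-ing it avoids holding a second
    # heap copy alongside the OS page cache, which is roughly where a ~2x
    # saving can come from.
    with open("weights.bin", "rb") as f:
        weights = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

    # Pages are faulted in lazily on first touch and stay file-backed, so
    # N processes mapping the same file still cost ~1x the file size in RAM.
    header = weights[:16]   # touching a slice loads just those pages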
Is there any information about when the model will be available or what it is capable of? Anything like a position on a leaderboard or a score related to reasoning or code generation?
Will it be able to run on AMD's new MI300X? I keep hoping that will put a "chip" in Nvidia's dominance since it seems more efficient.
I am not sure businesses would trust Meta with their sensitive data. Facebook/Meta brand equity in the B2B domain is severely tarnished, given their long history of disregard for data privacy laws and lack of transparency.
I just don’t see how Meta could possibly turn into a success in the B2B software business. They are a great advertising company, but they’ve never been successful in their other ventures…
Honestly, this feels like a good reason for them to open-source it. You trust the local model and may try using it instead of one of their competitors like OpenAI. If they took LLaMA in the state they released it in and hosted inference, nobody would use it, due to the lower quality and data privacy issues.