Abacus always seemed to me like a 'we got a lot of VC money with inflated claims, now we gotta show we do everything' company. I don't really understand what they do; they seem to offer everything, but I don't see anyone talking about using their offerings in the real world. Ever. The only time I see mentions of the company are when I am targeted with ads or promoted posts of the founder.
Their CEO made a post[1] on Twitter claiming to have invented "the world's first commercially usable 32K long-context open-source LLM", which IMO is pure hyperbole.
It looks like the first OSS 13B Llama 2-based 32k-token-context model[2], but the first OSS and commercially usable 32k-token-context model was a 7B Llama 2-based model[3] from Together AI, who beat them by about a week[4].
This is just another fine-tuned LLaMA/Llama 2, of which there are already several. I doubt it will give seriously meaningful results for long-context inference.
A 32k context length sounds nice, of course, and it seems to be common to describe merely fine-tuned models that way. I think it is more of a marketing thing; we really should distinguish between the context length of the pre-trained model and that of the fine-tuned model, with the latter being the default meaning of "context length".
It’s not possible to have a license over an ML model trained on other people’s works, since such models are uncopyrightable. They’re more like a phone book: a collection of facts trained by an entirely un-creative process. https://news.ycombinator.com/item?id=36691050
This hasn’t been proven in court, but it seems the most likely outcome.
Not saying that this applies to LLMs, but if you describe them as "a collection of facts [collected and] trained by an entirely un-creative process" then it begins to sound like one could argue for a database right.
Llama 2 is open source-ish. Weights are freely available and can be commercially used, but only if you have less than 700m users and agree to some "don't do naughty things" terms.
Nope. The limit is 700M monthly active users at the time Llama 2 was released, a weird catch clause aimed at a handful of Meta's competitors. The license doesn't satisfy OSS requirements, but it is quite reasonable.
If the JSON license isn't considered open, due to requiring that "The Software shall be used for Good, not Evil.", then I don't see how tacking an additional financial threshold onto it makes it more open. I don't think Meta even released the training dataset, so you can't even replicate it (should you have the funds to do so).
There are other LLMs that don't have such restrictions, and publish their training data.
Open source is both a colloquial term for available/modifiable/distributable code (like Llama 2) and a strict OSI-approved list of licenses. I'd say open source-ish is a great fit here.
Edit: this is in fact a fairly interesting discussion, because LLMs are a new breed of digital product. Meta's terms are practical for limiting usage in commercial applications, and they are designed to protect the general population. It's not the worn-out "protecting us from ourselves"; it's actually preventing Llama users from harming non-users. Yes, we can be jaded and say it's about protecting the brand and dissociating from bad actors. My point is that it's hard to apply the usual arguments for open source and freedom of computing when you're defending the rights of people who want to harm other people.
Sure there is: BSD is bad because xyz, GPL is bad because zyx. That said, the Llama restrictions are rather harsh, and you are not allowed to improve other models with it. So no freedom there, just some OK beer.
From memory, the Llama 2 license does allow tuned models with suitable credit and license inclusion. They restricted using it to train other models, though (a bit like how people use GPT-4 to generate question/answer pairs to train their own models).
It's probably too new for anyone to have integrated this into text-generation-webui / Gradio? I've been looking for a large context LLM (self-hosted or not) for a project, and as a European I unfortunately don't have access to Anthropic's Claude API yet.
How much context do you need? There are a couple of 16K models out there now. Some people have their own 32K ones too, but the quality varies. It's worth trying them on Hugging Face. The easiest way is to track TheBloke's work to see any new models that come out.
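If it helps, here's a rough sketch of how you could keep an eye on new uploads, assuming the huggingface_hub Python package (nothing official from TheBloke, just the public listing API):

    from huggingface_hub import HfApi

    api = HfApi()
    # Most recently updated repos from TheBloke, newest first.
    for m in api.list_models(author="TheBloke", sort="lastModified", direction=-1, limit=20):
        print(m.modelId)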
There are a couple of models on Hugging Face that use NTK/linear RoPE scaling that you can play with. Vicuna and WizardLM both have a 16K-context model. The biggest issue is that if you go to really high context, it sometimes does these weird repetitions. But to be fair, I have only tried the quantized models and 13B (the highest I can run locally). Not sure if the repetitions are an artifact of the RoPE scaling, the quantization, or both.
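For anyone curious what "linear RoPE" actually does: it just compresses the position indices before computing the rotary angles, so a longer sequence maps back into the range of angles the model saw during pre-training (NTK-aware scaling tweaks the frequency base instead). A toy sketch, not taken from any particular repo:

    import numpy as np

    def rope_angles(positions, dim=128, base=10000.0, scale=1.0):
        # Standard RoPE frequencies; scale > 1 gives linear "position
        # interpolation". NTK-style scaling would enlarge `base` instead.
        inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
        return np.outer(positions / scale, inv_freq)  # shape (seq_len, dim // 2)

    orig = rope_angles(np.arange(4096))                   # what a 4k model was trained on
    stretched = rope_angles(np.arange(16384), scale=4.0)  # 16k positions squeezed into roughly the same range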
They are more resource-intensive (time and memory) in training and inference; that is their disadvantage. For a fair comparison you would have to compare an 8k to a 32k pre-trained model with otherwise similar hyperparameters.
OP is about a 32k sugar-coated Llama 2, so I would expect it to be similar in performance to other Llama 2 derivatives.
Is the increased resource usage inherent to the model, or does it only happen when using the extra context? Like, if your workflow currently fits in a 2k model, would an 8k model be objectively worse and only worth using once you've filled up the context of a smaller model? Or would it be worth always using an 8k-context model and just knowing it will get slower and more resource-hungry as your context grows?
Sorry for the random question, I've just been curious about this for a while and unable to find out and you seem knowledgeable about these extended models.
The big cost is training with large chunks, and you pay that regardless of how large the chunks you feed the model later are. At inference time you only pay for what you use.
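Back-of-the-envelope, assuming vanilla attention: the quadratic term grows with the sequence length you actually feed in, not with the advertised maximum. (Rough illustration only; FlashAttention-style kernels avoid materializing the full score matrix, but compute still scales the same way.)

    def score_matrix_bytes(seq_len, num_heads=32, bytes_per_elem=2):
        # Memory for one layer's (seq_len x seq_len) attention scores in fp16.
        return num_heads * seq_len * seq_len * bytes_per_elem

    for n in (2048, 8192, 32768):
        print(n, score_matrix_bytes(n) / 2**30, "GiB per layer")
    # 2048 -> 0.25 GiB, 8192 -> 4 GiB, 32768 -> 64 GiB: the cost tracks the
    # prompt you actually send, not the model's maximum context.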
I think the context length is not a parameter of the model in the sense of being set to a particular value; it is just the size of the chunks you feed in during training. The model will only ever be able to learn relationships within that length. In that sense it is an implicit property of the model.
At inference time you can happily query the model with chunks larger than what it was trained on, and it will answer without blinking. You just cannot expect the answers to make meaningful use of anything beyond the length the model was trained with.
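To make the "implicit property" point concrete, here's a toy single-head attention in plain numpy: nothing in it has a shape tied to a fixed sequence length, which is why a model will happily run on inputs longer than it was trained on; it just has never learned what those longer-range relationships mean. (Purely illustrative, not any real model's code.)

    import numpy as np

    def toy_causal_attention(x):
        # x: (seq_len, d). No weight or buffer here is sized by seq_len ahead of time.
        scores = x @ x.T / np.sqrt(x.shape[-1])
        scores = np.where(np.tri(len(x), dtype=bool), scores, -np.inf)  # causal mask
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)
        return w @ x

    toy_causal_attention(np.random.randn(512, 64))   # "trained" length
    toy_causal_attention(np.random.randn(4096, 64))  # longer input still runs, just untrained-for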