Abacus always seemed to me like a 'we got a lot of VC money with inflated claims, now we gotta show we do everything' company. I don't really understand what they do; they seem to offer everything, but I don't see anyone talking about using their offerings in the real world. Ever. The only time I see mentions of the company are when I am targeted with ads or promoted posts of the founder.
Their CEO made a post[1] on Twitter claiming to have invented "the world's first commercially usable 32K long-context open-source LLM", which IMO is pure hyperbole.
It looks like the first OSS 13B Llama 2-based 32k-token-context model[2], but the first OSS and commercially usable 32k-token-context model was a 7B Llama 2-based model[3] from Together AI, who beat them by about a week[4].
This is just another fine-tuned LLaMA/Llama 2, of which there are already several. I doubt it will give seriously meaningful results for long-context inference.
A 32k context length sounds nice, of course, and it seems to be common to describe merely fine-tuned models that way. I think it is more of a marketing thing; we really should distinguish between the context length of the pre-trained model and that of the fine-tuned model, with the latter being the default meaning of "context length".
It’s not possible to have a license over an ML model trained on other people’s works, since such models are uncopyrightable. They’re more like a phone book: a collection of facts trained by an entirely un-creative process. https://news.ycombinator.com/item?id=36691050
This hasn’t been proven in court, but it seems the most likely outcome.
Not saying that this applies to LLMs, but if you describe them as "a collection of facts [collected and] trained by an entirely un-creative process" then it begins to sound like one could argue for a database right.
Llama 2 is open source-ish. Weights are freely available and can be commercially used, but only if you have less than 700m users and agree to some "don't do naughty things" terms.
Nope. The limit is 700M monthly active users at the time Llama 2 was released, a weird catch clause aimed at a handful of Meta's competitors. The license doesn't satisfy OSS requirements, but it is quite reasonable.
If the JSON license isn't considered open, due to requiring that "The Software shall be used for Good, not Evil.", then I don't see how tacking an additional financial threshold onto it makes it more open. I don't think Meta even released the training dataset, so you can't even replicate it (should you have the funds to do so).
There are other LLMs that don't have such restrictions, and publish their training data.
Open source is both a colloquial term for available/modifiable/distributable code (like Llama 2) and a strict OSI-approved list of licenses. I'd say open source-ish is a great fit here.
Edit: this is in fact a fairly interesting discussion, because LLMs are a new breed of digital product. Meta's terms are practical for limiting usage in commercial applications, and they are designed to protect the general population. It's not the worn-out "protecting us from ourselves"; it's actually preventing Llama users from harming non-users. Yes, we can be jaded and say it's about protecting the brand and dissociating from bad actors. My point is that it's hard to apply the usual arguments for open source and freedom of computing when you're defending the rights of people who want to harm other people.
Sure there is: BSD is bad because xyz, GPL is bad because zyx. That said, the Llama restrictions are rather harsh, and you are not allowed to improve other models with it. So no freedom there, just some OK beer.
From memory, the Llama 2 license does allow tuned models with suitable credit and license inclusion. They restricted using it to train other models, though (a bit like how people use GPT-4 to generate question/answer pairs to train their own models).
It's probably too new for anyone to have integrated this into text-generation-webui / Gradio? I've been looking for a large context LLM (self-hosted or not) for a project, and as a European I unfortunately don't have access to Anthropic's Claude API yet.
How much context do you need? There are a couple of 16K models out there now. Some people have their own 32K ones too, but the quality varies. It's worth trying them on Hugging Face. The easiest way is to track TheBloke's work to see any new models that come out.
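If it helps, here's a rough sketch of how you could keep an eye on new uploads, assuming the huggingface_hub Python package (nothing official from TheBloke, just the public listing API):

    from huggingface_hub import HfApi

    api = HfApi()
    # Most recently updated repos from TheBloke, newest first.
    for m in api.list_models(author="TheBloke", sort="lastModified", direction=-1, limit=20):
        print(m.modelId)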
There are a couple of models on Hugging Face that use NTK/linear RoPE scaling that you can play with. Vicuna and WizardLM both have a 16K-context model. The biggest issue is that if you go to really high context, it sometimes does these weird repetitions. But to be fair, I have only tried the quantized models and 13B (the highest I can run locally). Not sure if the repetitions are an artifact of the RoPE scaling, the quantization, or both.
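For anyone curious what "linear RoPE" actually does: it just compresses the position indices before computing the rotary angles, so a longer sequence maps back into the range of angles the model saw during pre-training (NTK-aware scaling tweaks the frequency base instead). A toy sketch, not taken from any particular repo:

    import numpy as np

    def rope_angles(positions, dim=128, base=10000.0, scale=1.0):
        # Standard RoPE frequencies; scale > 1 gives linear "position
        # interpolation". NTK-style scaling would enlarge `base` instead.
        inv_freq = 1.0 / (base ** (np.arange(0, dim, 2) / dim))
        return np.outer(positions / scale, inv_freq)  # shape (seq_len, dim // 2)

    orig = rope_angles(np.arange(4096))                   # what a 4k model was trained on
    stretched = rope_angles(np.arange(16384), scale=4.0)  # 16k positions squeezed into roughly the same range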
They are more resource-intensive (time and memory) in training and inference; that is their disadvantage. For a fair comparison you would have to compare an 8k to a 32k pre-trained model with otherwise similar hyperparameters.
OP is about a 32k sugar-coated Llama 2, so I would expect it to be similar in performance to other Llama 2 derivatives.
Is the increased resource usage inherent to the model, or does it only happen when using the extra context? Like, if your workflow currently fits in a 2k model, would an 8k model be objectively worse and only worth using once you've filled up the context of a smaller model? Or would it be worth always using an 8k-context model and just knowing it will get slower and more resource-hungry as your context grows?
Sorry for the random question, I've just been curious about this for a while and unable to find out and you seem knowledgeable about these extended models.
The big cost is training with large chunks, and you pay that regardless of how large the chunks you feed the model later are. At inference time you only pay for what you use.
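Back-of-the-envelope, assuming vanilla attention: the quadratic term grows with the sequence length you actually feed in, not with the advertised maximum. (Rough illustration only; FlashAttention-style kernels avoid materializing the full score matrix, but compute still scales the same way.)

    def score_matrix_bytes(seq_len, num_heads=32, bytes_per_elem=2):
        # Memory for one layer's (seq_len x seq_len) attention scores in fp16.
        return num_heads * seq_len * seq_len * bytes_per_elem

    for n in (2048, 8192, 32768):
        print(n, score_matrix_bytes(n) / 2**30, "GiB per layer")
    # 2048 -> 0.25 GiB, 8192 -> 4 GiB, 32768 -> 64 GiB: the cost tracks the
    # prompt you actually send, not the model's maximum context.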
I think the context length is not a parameter of the model in the sense of being set to a particular value; it is just the size of the chunks you feed in during training. The model will only ever be able to learn relationships within that length. In that sense it is an implicit property of the model.
At inference time you can happily query the model with chunks larger than what it was trained on, and it will answer without blinking. You just cannot expect the answers to make meaningful use of anything beyond the length the model was trained with.
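To make the "implicit property" point concrete, here's a toy single-head attention in plain numpy: nothing in it has a shape tied to a fixed sequence length, which is why a model will happily run on inputs longer than it was trained on; it just has never learned what those longer-range relationships mean. (Purely illustrative, not any real model's code.)

    import numpy as np

    def toy_causal_attention(x):
        # x: (seq_len, d). No weight or buffer here is sized by seq_len ahead of time.
        scores = x @ x.T / np.sqrt(x.shape[-1])
        scores = np.where(np.tri(len(x), dtype=bool), scores, -np.inf)  # causal mask
        w = np.exp(scores - scores.max(axis=-1, keepdims=True))
        w /= w.sum(axis=-1, keepdims=True)
        return w @ x

    toy_causal_attention(np.random.randn(512, 64))   # "trained" length
    toy_causal_attention(np.random.randn(4096, 64))  # longer input still runs, just untrained-for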