I would care more about the LLaMA architecture when I can actually get my hands on it; honestly this project is more interesting and lightning fast even on a 2060 laptop: https://github.com/BlinkDL/RWKV-LM
If you can tell them which university you're with and the formal name of your research project, that would be a start. They'll reach out to the admin to confirm.
You have to split it up, which slows it down a lot. The 14B model doesn't fit fully on a 3090, though the 7B fits easily and is very fast. Other replies may have either meant this or thought the original comment was about LLaMA.
It would be interesting to see a version of RWKV[1] that takes some of the improvements in LLaMA (e.g. the SwiGLU activation function and the rotary embeddings, although I think some versions of RWKV have already tried rotary embeddings) as well as the same dataset and see how it does.
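For reference, a minimal sketch of the LLaMA-style SwiGLU feed-forward block in PyTorch (the dimensions here are illustrative, not LLaMA's actual sizes):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class SwiGLUFeedForward(nn.Module):
        # LLaMA-style FFN: down(SiLU(gate(x)) * up(x)); sizes are illustrative
        def __init__(self, dim: int, hidden_dim: int):
            super().__init__()
            self.gate = nn.Linear(dim, hidden_dim, bias=False)
            self.up = nn.Linear(dim, hidden_dim, bias=False)
            self.down = nn.Linear(hidden_dim, dim, bias=False)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return self.down(F.silu(self.gate(x)) * self.up(x))

    x = torch.randn(2, 16, 512)                 # (batch, seq, dim)
    ffn = SwiGLUFeedForward(dim=512, hidden_dim=1376)
    print(ffn(x).shape)                         # torch.Size([2, 16, 512])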
The dataset is interesting. It's not dissimilar to The Pile, which RWKV is already trained on, but it does seem to have quite a lot more preprocessing to increase the dataset quality.
Very interesting. Do you know of anything that would take advantage of the 128 cores of my Ryzen Threadripper even though I have a 2080 and a 3080 as well? (Or all three... lol)
This is not strictly true. GPUs fare better at these tasks for a few reasons:
* The largest contributor is the sheer number of cores.
* Memory bandwidth between the cores and VRAM.
* FP16 instructions.
128 cores isn't an insignificant fraction of the core count of a 1050 (about 640), and CPU cores are individually more powerful, so the core-count advantage alone is difficult to call. The top end of Genoa has 96 cores/192 threads, and you can put more than one of them on a single board.
AMD is throwing more and more memory into the CPU cache. That's very different to a direct path to GBs of HBM, but at some point the difference in performance might not matter to a novice/dabbler.
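If you want a rough feel for the gap on your own hardware, here's a quick (very unscientific) matmul timing sketch with PyTorch; numbers will vary wildly by machine:

    import time
    import torch

    def time_matmul(device: str, n: int = 4096, iters: int = 10) -> float:
        # Time an n x n fp32 matrix multiply, averaged over a few iterations.
        a = torch.randn(n, n, device=device)
        b = torch.randn(n, n, device=device)
        if device == "cuda":
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(iters):
            c = a @ b
        if device == "cuda":
            torch.cuda.synchronize()
        return (time.perf_counter() - start) / iters

    print(f"CPU : {time_matmul('cpu'):.3f} s per matmul")
    if torch.cuda.is_available():
        print(f"GPU : {time_matmul('cuda'):.3f} s per matmul")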
IMHO these chips designed for specialized workloads are looking more and more ridiculous. I expect GPU manufacturers to keep dragging them out for another decade or so, and Apple too as it explores offshoots of the M1. It all makes me feel very tired (the ultimate code smell).
A better design would be something like a 256+ core RISC-V with local memories in/by each core for data-locality and a content-addressable caching scheme for deduplication. Copy-on-write languages like Clojure and orchestrating processes under something like Docker would make it a breeze to program, although it would still support manually managed mutability like with Rust for innermost loops in games or whatever. It's fairly obvious how it would all work, but IP law and gatekeeping ensure that it will not happen anytime soon.
Then stuff like CUDA would be just another framework run on a symmetric multiprocessor, and we could get back to exploring alternatives like genetic algorithms, as we did in the 1990s. Thankfully nobody cares what I think, heck even I'm sick of reading my own complaints, so it's easy enough to just unsee this.
> Keep in mind that many of the GPU cores go unused, since they are dedicated to geometry or ray tracing or whatnot
I mean, “many” is not usually the case; an RTX 3090 has 10,496 CUDA cores, 328 Tensor cores, 82 RT (raytracing) cores, and 96 render output pipelines. ML apps will use the first and, depending on the software, the second set. The vast majority of the cores being CUDA cores is the norm.
I'm using a Ryzen 9 5950X to run some tests with Whisper (ASR), and since I have no GPU with more than 4 GB VRAM, I'm running it on the CPU. It is slow. It takes between 20 seconds and 2 minutes to transcribe 1 minute of audio using 8 cores. Adding more cores doesn't seem to improve the inferencing time.
ML is really something which should be left to a GPU.
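For reference, roughly what I'm running, using the openai-whisper package (the model size, thread count, and file name here are illustrative):

    import torch
    import whisper  # pip install openai-whisper

    torch.set_num_threads(8)                  # cap CPU threads; more didn't help me

    model = whisper.load_model("small", device="cpu")
    result = model.transcribe("meeting.wav")  # placeholder audio file
    print(result["text"])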
There is an example of multi-GPU use in the link. Outside of this, I have read about offloading to CPU/NVMe for 100GB+ models that don't fit in VRAM, though this comes at the expense of performance.
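For the offloading route, a rough sketch with Hugging Face transformers + accelerate (not from the linked repo; the model name and offload folder are placeholders):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    # device_map="auto" spreads layers across available GPUs and spills the rest
    # to CPU RAM / disk; requires `pip install accelerate`.
    model_name = "EleutherAI/gpt-j-6B"   # placeholder model
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        device_map="auto",
        offload_folder="offload",        # NVMe/disk spill directory
    )

    inputs = tokenizer("Hello, world", return_tensors="pt").to(model.device)
    print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))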
I don't really understand the benchmarking aspect researchers are touting. The public never cared about LLMs until they had a proper conversation with one. You can beat GPT3 at any benchmark you'd like, but if you can't get people that "feeling" when chatting with your model, is it worth anything?
In the future there's going to have to be a way to benchmark the "human-ness" or "intrigue" or "feistiness" of a model to show us if it's getting better at what we want.
Often articles show up on Hacker News that are meant for other researchers, not the general public. Not everything is a product.
Benchmarks are used as a way to show that a particular machine learning technique does better at some task. It’s a way for researchers to show they’re making progress that will be legible to other researchers. You can’t publish a paper saying “we tried it out and we think it’s better.”
The question is, where does this "human-ness" lie? In the initial neural network, the training data, or the supervised reinforcement?
In theory, a significantly smaller neural network that outputs at nearly as good a quality should be able to chew through training data, and its reinforcement process, much faster and cheaper, right? A more generalized, lower-parameter model is almost always preferred, as long as it works?
If the human feeling can all be bolted onto the neural net later, there is no reason to discount this approach as lacking the potential to exceed current models.
I think it's the feedback from RLHF that is mostly responsible, but it only works if the base model is large. I've never seen a small model hold a good conversation. They can do OK at classification and open-book question answering, but generating long-form coherent text is hard.
GPT-3 performed really well on synthetic benchmarks. It was later made palatable for general public consumption. You might say that an LLM needs to be good on synthetic benchmarks first before you can make public-facing chatbots with it.
The techniques to go from a basic GPT-like model to a conversational agent are largely published and should be reproducible; open-source base models are unencumbered, available starting points for that work.
This is important for researchers and implementers, not (immediately, at least) end users.
I guess one technique is to train a model on various language model outputs to classify these as good/bad/intriguing/robot-sounding/repetitive/etc. A human can tag the answers for the training dataset.
Then, we can use this model to compare different LLMs and optimize new models - could be with genetic algorithms or just a human tweaking the model - so that the LLM maximizes whatever we want.
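A rough sketch of the supervised version of that judge model with Hugging Face transformers (the labels, model choice, and examples are purely illustrative):

    from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                              Trainer, TrainingArguments)
    from datasets import Dataset

    # Tiny illustrative dataset of human-tagged LLM outputs:
    # 1 = "human-like", 0 = "robot-sounding"
    data = Dataset.from_dict({
        "text": ["As an AI language model, I cannot...",
                 "Ha, good question, here's my take..."],
        "label": [0, 1],
    })

    tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "distilbert-base-uncased", num_labels=2)

    def tokenize(batch):
        return tokenizer(batch["text"], truncation=True,
                         padding="max_length", max_length=128)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="quality-judge", num_train_epochs=1),
        train_dataset=data.map(tokenize, batched=True),
    )
    trainer.train()
    # The trained judge can then score outputs from different LLMs, or serve as
    # a reward signal when tweaking/optimizing a new model.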
You just rediscovered RLHF, reinforcement learning from human feedback. That's the last stage of training for ChatGPT, but it uses RL instead of supervised learning.
What is the purpose of this? The model from Meta is not available to the public. Neither this open source "LLaMA-based ChatGPT" nor the "open source" LLaMA can be downloaded or actually used by the public, because that would require the actual trained model weights.
I'm as much a Meta hater as anyone, and their policies have consistently disappointed me in almost every aspect of their business, but I must say I'm happy with their stance on this LLaMA project, and it seems to mark a turn for the better.
If they follow through on their promise of making the weights available and sharing the source code, that is a big step in the right direction for democratising this technology.
The weights are non-commercial, and while their code is GPLv3, they've only released inference code and have removed anything that would give away the training methods :)
Yeah, but their history (and accumulated goodwill) is similar to Microsoft's. They may say it is 'open' or sprinkle appropriate-sounding corporate speak all over the press release, but the actions will, at best, only temporarily prevent them from going 'full evil ahead'.
And I like that announcement. I just don't think they will actually follow through on this.
I am very far from an expert on this, but I think domain specific conversational AI would be much more useful than these large models. It's fun to ask an AI to compose a fresh 600bpm hip hop song about the relationship between materials science and the breeding habits of mosquitoes, but an open-source medical AI, application support AI, or many other applications would be much more practical, if they could be accurate enough. And especially if they could run "standalone." They could also consult with each other, as a network of specialized AI. Is work inching closer to more specific, more accurate applications? Or is this just a big gimmick/distraction phase around a maybe not so great idea of AI?
Smaller models are typically dumber. Sure, you could fine-tune a smaller model on, say, the medical domain, and it might even perform better on some benchmarks, but it won't reason or generalize as well. Domain-finetuned large models >>> domain-finetuned small models.
And because competence in one area bleeds over to other areas, you often need much less domain-specific data to finetune on than the smaller models do.
You can see instances of this with Minerva, where the finetuned 540B version beats the finetuned 62B version despite being finetuned on only a quarter as much data.
They're claiming the 13b model beats GPT-3 175b which is an extraordinary claim requiring extraordinary evidence. If that's true, though, it'd be interesting to see if that also applies to fine-tuning. Since the claim is predicated on the 13b model being better trained (amongst other things?), I wonder if limited fine-tuning data handicaps the 13b model even if the base model can outperform GPT-3 Davinci, given your point about large models handling fine-tuning better with limited data.
I mean, the benchmarks are there; can't exactly fake that. It should apply to fine-tuning, since fine-tuning works off the back of the weights. That's why even small instruction-finetuned models like T5 converge much faster on any additional fine-tuning or training than their non-instruct counterparts, as per the Flan paper. Honestly, what I'm taking from this paper is that even Chinchilla is undertrained: the 13B model was trained on 1T tokens.
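A back-of-the-envelope check, assuming Chinchilla's roughly 20-tokens-per-parameter rule of thumb:

    # Assumes Chinchilla's ~20 tokens/parameter heuristic
    params = 13e9
    chinchilla_optimal = 20 * params              # ~2.6e11, i.e. ~260B tokens
    llama_13b_tokens = 1e12                       # ~1T tokens, per the paper
    print(llama_13b_tokens / chinchilla_optimal)  # ~3.8x past the "compute-optimal" point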
My opinion is entirely the opposite: we need conversational AI for exploring the multitude of possibilities and identifying what works, what doesn't, and what is or isn't useful given the obvious limitations on accuracy, truthfulness and interpretability; but once we can identify a specific narrow use case, we can fine-tune an ML system for it that isn't conversational but provides better results with whatever domain-specific structure is required (linking to specific sources, including external structured data, providing certainty metrics, filtering results by domain-specific criteria instead of the conversational political-correctness filters which fail some domains, flagging treatment info which was correct but has become outdated, etc.), all of which can be done better in non-conversational systems.
Not necessarily: you don't need to make a domain-specific model from a general model. You can definitely make a pretrained domain-specific model from scratch by training it only on domain-specific data, which can result in a smaller and more efficient model.
Furthermore, when making task-specific models, an 'encoder' architecture (similar to BERT) often works better than a 'decoder' architecture (similar to GPTx), so you might want to use a similar-but-different architecture than the general model intended to be conversational/generative.
If you want to build a domain-specific classifier that determines whether an image is a dog or a cat, and you have 50 labeled images of dogs and cats, it's much better to start with a large model pretrained on millions of images, and then specialize it by training on 50 images of dogs and cats.
Try to start with a fresh NN and only 50 images of dogs and cats, and it won't work very well.
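Something like this with torchvision, to make it concrete (the dataset path and hyperparameters are placeholders):

    import torch
    import torch.nn as nn
    from torchvision import datasets, models, transforms

    # Start from an ImageNet-pretrained backbone and retrain only the final
    # layer on the ~50 labeled dog/cat images.
    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
    for p in model.parameters():
        p.requires_grad = False
    model.fc = nn.Linear(model.fc.in_features, 2)   # 2 classes: dog, cat

    tfm = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
    ])
    train = datasets.ImageFolder("dogs_vs_cats/train", transform=tfm)  # placeholder path
    loader = torch.utils.data.DataLoader(train, batch_size=8, shuffle=True)

    opt = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for epoch in range(5):
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            opt.step()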
Sure, that's correct, but that's absolutely unrelated to what we were talking about; your example is about the general concept of transfer learning to task-specific annotated data, not about domain-specific pretrained models.
For example, if you want a domain-specific model for the legal domain, then you can pre-train a large self-supervised model on every single legal-related document in the world you can get your hands on, instead of a general mix of news and fiction and blogs and everything else - and that might be a more efficient starting point for however many (or few) annotated examples you have for your task-specific classifier than the general model.
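A minimal sketch of that from-scratch domain pretraining with Hugging Face transformers (the corpus file is a placeholder, and I'm reusing a generic vocab to keep it short; a real run would train a domain tokenizer too):

    from transformers import (BertConfig, BertForMaskedLM, BertTokenizerFast,
                              DataCollatorForLanguageModeling, Trainer, TrainingArguments)
    from datasets import load_dataset

    # Freshly (randomly) initialized encoder, pretrained only on domain text.
    tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
    model = BertForMaskedLM(BertConfig(vocab_size=tokenizer.vocab_size))

    corpus = load_dataset("text", data_files={"train": "legal_corpus.txt"})["train"]  # placeholder file
    tokenized = corpus.map(
        lambda b: tokenizer(b["text"], truncation=True, max_length=256), batched=True)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="legal-bert-scratch", num_train_epochs=1),
        train_dataset=tokenized,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=True, mlm_probability=0.15),
    )
    trainer.train()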
Legal-related documents are a minuscule fraction of the corpus the large model is trained on. The resulting model won't have the conceptual fluency that the large model has. It's like training a human baby with legal briefs and expecting her to be a good lawyer.
These models are only capable of coherent conversation in the first place because they are so large. As soon as you shrink one down to be 'domain specific', its ability to form coherent sentences even within its own domain is greatly reduced.
The way it's typically done, AFAIK, is that you train these big models on a breadth of information, hoping that they pick up on the generalities of the information; in the case of LLMs, things like basic inference, for example. You then take these big, general models and "fine-tune" them for specific applications, with specific bits of data. This way, you get things like basic inference and logic, while still having something that can answer specific questions.
There are definitely very solid attempts at least to make LLMs that encode biomedical knowledge such as BioGPT which is trained on Pubmed and other domain specific areas. Source: https://arxiv.org/abs/2210.10341
I think you would still need a large model trained on general-purpose knowledge, and then train on domain-specific things, for the specialized knowledge to be truly useful. For example, if someone wanted domain-specific language "translated" for a neophyte, I doubt the model would be able to do so without having also been trained on a general-purpose dataset.
You can already do that. Take your task first to GPT-3 and collect a bunch of outputs. Then fine-tune a small model on them. Works well, but you need to extend the dataset to cover all edge cases because the small model can't draw on the vast knowledge GPT-3 has.
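A rough sketch of that loop, assuming the old openai Python client and a GPT-2-sized student (the prompts, model names, and hyperparameters are placeholders; OPENAI_API_KEY is assumed to be set):

    import openai
    from datasets import Dataset
    from transformers import (AutoTokenizer, AutoModelForCausalLM, Trainer,
                              TrainingArguments, DataCollatorForLanguageModeling)

    # 1) Collect teacher outputs from GPT-3 for your task prompts.
    prompts = ["Summarize: ...", "Rewrite politely: ..."]   # placeholder prompts
    pairs = []
    for p in prompts:
        out = openai.Completion.create(model="text-davinci-003", prompt=p, max_tokens=128)
        pairs.append(p + out["choices"][0]["text"])

    # 2) Fine-tune a small model on those prompt+completion pairs.
    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    tokenizer.pad_token = tokenizer.eos_token
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    ds = Dataset.from_dict({"text": pairs}).map(
        lambda b: tokenizer(b["text"], truncation=True, max_length=512), batched=True)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="distilled-student", num_train_epochs=3),
        train_dataset=ds,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()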
In what way is this a ChatGPT implementation or equivalent? It seems like a chatbot based on a different backend, so it has absolutely zero link to ChatGPT.
It is a different backend but it supposedly should be roughly comparable to ChatGPT. Also, looks like it's both open source and requires a lot less hardware to run and train.
It's not open source until the weights are available. I have the hardware I need to run it, but the required files are not available unless you receive special access.
You can't use what has been released unless you want to spend $500,000 on training.
With only a modicum of trolling here, I wonder what percentage of that training expense was used to identify and avoid "true things that must be muted because they offend someone".
Ignoring the subtext of "true things that must be muted because they offend someone", there's a whole section in the paper on how they didn't filter and the problems that causes. TL;DR:
> We observe that toxicity increases with the size of the model, especially for Respectful prompts.
It does outperform GPT3 slightly in terms of observed bias against protected groups (as in it is slightly less biased) but not substantially so.
It uses a different engine, so this is as related to ChatGPT as a Toyota Corolla is related to a BMW. This is an efficient and open-source chatbot, which is very good news, but the authors just wrote a clickbait title and they know it.
In formal analogies, : is pronounced "is to" and :: is pronounced "as".
The purpose is to use a known relationship to describe a relationship that is only partially known.
ChatGPT is to GPT3 as ChatLLaMa is to LLaMa. It uses the relationship between ChatGPT and GPT3 to extrapolate a relationship between an unknown and LLaMa.
Corolla:Toyota::3-Series:BMW. If you had heard of a Corolla, Toyota, and BMW, but not a 3-Series, you now roughly know that a 3-Series is BMW's equivalent of a Corolla.
I think I prefer the other commenter's point, referring to ChatGPT as a known learning paradigm for chatbots. But thanks for the little crash course on analogies ;)
Yes and no. You wouldn't say GPT to mean large language models or autoregressive language models. I would've thought the same to be true for ChatGPT instead of chatbots with RL from human feedback (RLHF); perhaps the field is moving towards adopting ChatGPT as a paradigm name. Note that the title doesn't say a ChatGPT-like model based on LLaMA; it outright says an open-source implementation of ChatGPT.
> You wouldn't say GPT to mean large language models or autoregressive language models.
In the analogy, that’s exactly what you are saying. Identical to Toyota and BMW meaning “the make of the car.”
Maybe reimplementation is a more precise word: a black-box re-engineering/cloning. In this case I inferred it by knowing it was a different LLM underneath, and that this group didn't have access to the ChatGPT source code.
The analogy is somewhat accurate, but also moot, since within the ML community "ChatGPT" can be used either as the product or the method (more specifically called RLHF) somewhat interchangeably. It's more like Google/Googling, where the largest/most popular provider becomes the de facto way to refer to a method.
As someone who develops DL models, the title seems quite apt.
Have we got any details on the benchmarks that show LLaMA's 13B architecture outperforming GPT-3? Because that seems kind of fantastical. Is it just a product of a very specific benchmark, or does it reflect real-world performance?
The GPT-3 they are comparing to is the one that was released in 2020. Since then, OpenAI has made a lot of improvements, and nowadays I believe GPT-3.5 is competitive with PaLM (540B). Still, LLaMA is in the same tier, with far fewer parameters.
You're right. Under equivalent scenarios the gap is smaller, about a 10-point difference. You can check the end of the Flan paper for some equivalent comparisons.
Just a heads-up on my comparison: under equivalent scenarios the gap is smaller; davinci-003 gets about 10 more points using five-shot (which is what the PaLM comparison does).
They list 7B, 13B, 33B, and 65B architectures. Presumably, they compare the 65B one to GPT-3 175B. The Chinchilla model, which is about 70B, outperformed the much larger GPT-3, so it's not that fantastical.
EDIT: I stand corrected. They do compare the 13B model with the large GPT-3 model, which is hard to believe without a bit more concrete evidence.
This whole debate (whether a 13B model can really be as good as GPT-3) would have been settled if we had a live demo. I am not sure their licence allows running public demos, even if you get the weights.
Can't have a Stable Diffusion moment if you refuse to release the weights to the general public. Stable Diffusion only got to where it is because 10,000 people with otherwise zero reputation were able to play around with the code and models.
Seeing the performance of implementations like FlexGen [1], I don't think it would be entirely unreasonable to run a 13B model on a single GPU for personal usage. You are not going to run a public service off of it, but it probably would be good enough to run your own ChatGPT or Copilot locally.
This obsession with locking up model weights behind a gate-keeping application form and calling it open source is weird. I don't know who the high priests are trying to fool.
If your model is really that good, unleash it into the open so that others can truly evaluate it, warts and all, and help improve it by identifying the flaws.
> This obsession with locking up model weights behind a gate-keeping application form and calling it open source is weird. I don't know who the high priests are trying to fool.
When they don't do it, people scream at them (see Galactica)
"Journalists" react like this:
> On November 15 Meta unveiled a new large language model called Galactica, designed to assist scientists. But instead of landing with the big bang Meta hoped for, Galactica has died with a whimper after three days of intense criticism. Yesterday the company took down the public demo that it had encouraged everyone to try out.
> Meta’s misstep—and its hubris—show once again that Big Tech has a blind spot about the severe limitations of large language models. There is a large body of research that highlights the flaws of this technology, including its tendencies to reproduce prejudice and assert falsehoods as facts.
> However, Meta and other companies working on large language models, including Google, have failed to take it seriously.
Not really: the LLaMA model is only available on request, and access is granted on a "case by case basis" [1], which for most of us is more or less as available as GPT-3 is.
I was mostly talking about access to the trained model weights. The OpenAI API is certainly better than nothing, but it is very restrictive and cost-prohibitive for many purposes. For instance, you have to adhere to the OpenAI usage policies, and while they offer fine-tuning services, that is likely not enough to implement techniques like RLHF, which is the basis for ChatGPT.
That said, if LLaMa can achieve performance competitive with GPT-3 with just 13B parameters, I imagine that it is only a matter of time until open source pre-trained models based on this architecture become available, which would render GPT-3 obsolete.
> LLaMA is creating a lot of excitement because it is smaller than GPT-3 but has better performance. For example, LLaMA's 13B architecture outperforms GPT-3 despite being 10 times smaller.
Exactly. Best part is that it is open-source.
That is worth getting excited about. Not an AI SaaS API owned by a so-called pseudo-non-profit company which struggles with API uptime and availability, just like GitHub.
This is the 'revolution' you are looking for that changes everything. Not ChatGPT.
Indeed.
All the weights for all the models will be available one way or the other very soon.
The proprietary nature of the weights is not going to be a bottleneck for more than a month, if I had to guess.
The other bottleneck to personal use, the hardware required to run (not train from scratch) the thing, is going to be gone within the year, I bet. I would assume some clever bloke is going to be able to prune the model or decrease the precision of the weights and discover you can get good-enough results with 1/10th of the memory.
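That precision trick is already starting to show up; here's a hedged sketch of 8-bit weight loading via transformers + bitsandbytes (the model name is a stand-in, since the LLaMA weights aren't generally available):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    # load_in_8bit quantizes the weights to int8 at load time (needs bitsandbytes
    # and a CUDA GPU), roughly halving memory versus fp16.
    name = "facebook/opt-13b"   # stand-in for a 13B-class model
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name, device_map="auto", load_in_8bit=True)

    inputs = tokenizer("The quick brown fox", return_tensors="pt").to(model.device)
    print(tokenizer.decode(model.generate(**inputs, max_new_tokens=30)[0]))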