Note that what they released are delta weights against the original LLaMA model. To play around with it, you'll need to grab the original LLaMA 13B weights and apply the changes.
> We release Vicuna weights as delta weights to comply with the LLaMA model
> license. You can add our delta to the original LLaMA weights to obtain
> the Vicuna weights.
That's what they say, but I just spent 10 minutes searching the git repo, reading the relevant .py files and looking at their homepage, and the vicuna-7b-delta and vicuna-13b-delta-v0 files are nowhere to be found. Am I blind, or did they announce a release without actually releasing?
If you run this command from their instructions, the delta will be automatically downloaded and applied to the base model.
https://github.com/lm-sys/FastChat#vicuna-13b:
`python3 -m fastchat.model.apply_delta --base /path/to/llama-13b --target /output/path/to/vicuna-13b --delta lmsys/vicuna-13b-delta-v0`
This can then be quantized to the llama.cpp/gpt4all format, right? Specifically, this only tweaks the existing weights slightly, without changing the structure?
You can use this command to apply the delta weights. (https://github.com/lm-sys/FastChat#vicuna-13b)
The delta weights are hosted on huggingface and will be automatically downloaded.
> Unfortunately there's a mismatch between the model generated by the delta patcher and the tokenizer (32001 vs 32000 tokens). There's a tool to fix this at llama-tools (https://github.com/Ronsor/llama-tools). Add 1 token like (C controltoken), and then run the conversion script.
Not a lawyer, but that still feels like dubious territory. I would still be on the hook for acquiring the original download, and Facebook has been launching DMCA takedown requests against the llama-dl project.
I don't think you have to worry about Facebook going after you. The worst that will happen is that they issue a DMCA, in which case your project gets knocked offline. I don’t think they’ll be going the RIAA route of suing individual hackers.
The DMCAs were also launched by a third party law firm, not Meta themselves, so there’s a bit of “left hand doesn’t know what the right hand is doing” in all of this.
If they aren't copyrightable, couldn't they still be classed as a trade secret and still fall under IP law? Though I'm not sure if distributing the weights to people who sign a simple agreement not to redistribute would count as taking reasonable precautions to maintain secrecy.
Is it though? It could be a child picking up a hobby after being old enough to appreciate the hobby. There is so much more time left in the universe before heat death, so the 90-year metaphor doesn't really describe the current point in time.
Usually, you don't know if something is "definitely" anything in the legal world unless it's been tested in court. Do you have any case you want to reference here? Or what makes you so certain?
What legal theory or precedent makes this true?
IMHO, the weights are akin to the list of telephone numbers in a directory - which is definitely not copyrightable; only the layouts and expressive portion of a phone directory is copyrightable.
So to make the weights copyrightable, it needs to be argued that the 'layout' of the weights is a creative expression, rather than a 'fact'. But the weights are matrices, which are not expressive or creative. Someone else could derive this exact same set of weights from scratch via the same algorithmic procedure, and therefore these weights cannot be a creative expression.
"Definitely" is too certain w.r.t. law, but it's pretty obvious how you'd argue these fall under copyright. The difficulty would really be the opposite, it'd be arguing the weights are not derived works of the copyrighted input data sets.
Firstly, weights are not merely a collection of facts like a telephone book is. If two companies train two LLMs they'll get different weights every time. The weights are fundamentally derived from the creative choices they make around hyperparameter selection, training data choices, algorithmic tweaks etc.
Secondly, weights can be considered software and software is copyrightable. You might consider it obvious that weights are not software, but to argue this you'd need an argument that also generalizes to other things that are commonly considered to be copyrightable like compiled binaries, application data files and so on. You'd also need to tackle the argument that weights have no value without the software that uses them (and thus are an extension of that software).
Finally, there's the practical argument. Weights should be copyrightable because they cost a lot of money to produce, society benefits from having large models exist, and this requires them to be treated as the private property of whoever creates them. This latter one should in theory more be a political matter, but copyright law is vague enough that it can come down to a social decision by judges.
> IMHO, the weights are akin to the list of telephone numbers in a directory - which is definitely not copyrightable
I would contest the analogy, but even if we accept it, it's still not clear whether phone directories (or other compilations of factual data) are definitely not copyrightable. The position is clear in the US, but in the UK and presumably other jurisdictions, I wouldn't be so sure.
You could claim we're just talking about US law here, but if you release something on github/huggingface without geo-restrictions, and your company does business in Europe, you might not only have to comply with US law...
Yes, but there's less incentive for large companies to spend huge amounts of money training these systems when other companies can just take their work for free.
Removing IP protection would make it a lot easier to innovate at this level, but it would reduce the amount of money flowing into getting us to the next level.
I'm curious, do you not think this might have adverse effects? Namely, if NN weights aren't copyrightable, limited releases like Meta has done might not be possible anymore so they might just cease completely with releases, ultimately leading access to large models to be more restricted.
I honestly do not know which is worst of the three realistic alternatives:
1- to have large corporations and people with privileged access to them have these models exclusively and have them collaborate as a clique
2- to have those models openly released to everybody, or de-facto released to everybody as they leak in short order
3- to have the people who think releasing models is a bad thing simply not release them and work alone in their proprietary solutions, as the smaller companies and hobbyists do collaborate
I say let them have a go at number 3 and see how that works for them - shades of "Microsoft Network" vs the Internet all over again.
The llama-dl project actually helped you download the weights, whereas this just assumes you already have them. That feels like a pretty massive difference to me.
Nobody at Facebook approved it? Given the attention it has received, it's hard to imagine it slipped through the cracks; more likely it's a deliberate decision not to address it.
One of my favorite queries to assess an LLM is: "Is it legal for a man to marry his widow's sister?"
Vicuna-13B: The legality of a man marrying his widow's sister (widow marriage) varies depending on the country and cultural context. In some societies, widow marriage was traditionally practiced or tolerated.
GPT-4: It is not possible for a man to marry his widow's sister, as the term "widow" refers to a deceased man's wife. Therefore, if a man has a widow, he is deceased and therefore cannot marry anyone.
GPT 3.5: No, it is not possible for a man to marry his widow's sister because by definition, a widow is a woman whose spouse has died. If a man is deceased, then he cannot marry anyone, including his widow's sister. This is a logical impossibility and would not be recognized as a valid marriage under any legal system.
Koala: The legality of a man marrying his widow's sister varies by jurisdiction. In some countries, such as the United States, it is generally legal for a man to marry his widow's sister. In other countries, such as the United Kingdom, it is illegal for a man to marry his widow's sister because it is considered incest. It is always important to check the laws and regulations in the jurisdiction in which the marriage will take place to ensure that it is legal.
Ouch. I got this wrong myself, and for half an hour was under the impression that GPT-4 had gotten it wrong, and only figured out after reading it again on returning from a walk that this is one hell of a trick question. My brain automatically assumed that a man's widow is the man's dead wife, but I see that the correct way to interpret this is to realize that it means the man is the one who is dead.
It's pretty awesome to realize that from now onward my computers are going to be able to help catch more and more of the holes that clearly exist in my cognition.
It would still possibly be legal on the basis that if it's not illegal then it's legal - in the British jurisprudence tradition at least (https://en.wikipedia.org/wiki/Everything_which_is_not_forbid...) - namely, it's not the law that impedes it (also, in some places there's posthumous marriage).
There are also people who are considered dead by the bureaucratic system, but are physically alive, usually because of clerical errors that are sometimes surprisingly hard to resolve. The wife of such a man would be considered a widow in many contexts, despite her husband being alive.
Hi! Funnily enough I couldn't find much on it either, so that's exactly what I've been working on for the past few months: just in case this kind of question got asked.
I've recently opened a GitHub repository which includes information on both AI model series[0] and frontends you can use to run them[1]. I wrote a Reddit post beforehand that's messier, but a lot more technical[2].
I try to keep them as up-to-date as possible, but I might've missed something or my info may not be completely accurate. It's mostly to help get people's feet wet.
The 4-bit quantized version of LLaMA 13B runs on my laptop without a dedicated GPU, and I guess the same would apply to a quantized Vicuna 13B, but I haven't tried that yet (converted as in this link, but for 13B instead of 7B: https://github.com/ggerganov/llama.cpp#usage).
The GPT4All LoRA also works - perhaps the most compelling results I've gotten yet on my local computer. I have to try quantized Vicuna to see how that one goes, but processing the files to get a 4-bit quantized version will take many hours, so I'm a bit hesitant.
PS: converting 13B Llama took my laptop's i7 around 20 hours and required a large swap file on top of its 16GB of RAM
Feel free to answer back if you're trying any of these things this week (later I might lose track).
Vicuna's GitHub says that applying the delta takes 60GB of CPU RAM? Is that what you meant by large swap file?
On that note, why is any RAM needed? Can't the files be loaded and diffed chunk by chunk?
Edit: The docs for running Koala (a similar model) locally say this (about converting LLaMA to Koala):
>To facilitate training very large language models that does not fit into the main memory of a single machine, EasyLM adopt a streaming format of model checkpoint. The streaming checkpointing format is implemented in checkpoint.py. During checkpointing, the StreamingCheckpointer simply flatten a nested state dictionary into a single level dictionary, and stream the key, value pairs to a file one by one using messagepack. Because it streams the tensors one by one, the checkpointer only needs to gather one tensor from the distributed accelerators to the main memory at a time, hence saving a lot of memory.
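To make the idea concrete, here's a rough Python sketch of that kind of streaming save/load (my own illustration, not EasyLM's actual checkpoint.py; the function names and record format are made up):

```python
# Sketch of streaming one tensor at a time so only one ever sits in host RAM.
# Not EasyLM's actual code; assumes the state dict holds numpy arrays.
import msgpack
import numpy as np

def stream_save(flat_state_dict, path):
    """Append one msgpack record per tensor to the checkpoint file."""
    with open(path, "wb") as f:
        for name, tensor in flat_state_dict.items():
            record = {
                "name": name,
                "dtype": str(tensor.dtype),
                "shape": list(tensor.shape),
                "data": tensor.tobytes(),
            }
            f.write(msgpack.packb(record))

def stream_load(path):
    """Yield (name, tensor) pairs one by one instead of loading everything."""
    with open(path, "rb") as f:
        for record in msgpack.Unpacker(f, raw=False):
            arr = np.frombuffer(record["data"], dtype=record["dtype"])
            yield record["name"], arr.reshape(record["shape"])
```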
That might not be surprising, considering these jailbreaks are written and tested specifically against ChatGPT and ChatGPT alone. This model probably has its own jailbreaks that would also be refused by ChatGPT.
Just when you think Nvidia will go down, something happens that changes it. These days, unless you were into gaming or were a machine learning dev, integrated graphics were good enough. But now, for the first time in a long time, I am interested in getting a GPU for running some of these chatbots locally.
As a very occasional gamer who uses an iMac for work, I thought about getting a gaming PC for like 6 years.
Last fall it seemed that all the stars had aligned. The crypto winter and Ethereum switching to proof of stake meant that GPU prices fell to a reasonable level, I knew I would have a bit of time to play some games during the holidays, and as soon as Stable Diffusion was first posted on Hacker News I knew that was my excuse and my sign.
So far I think I have spent more time tinkering with the 20 python environments I have[0] for all the ML projects than playing RDR2.
Whenever I feel like gaming I just subscribe to the GeForce Now service. Around here it costs ~$10 a month, which I usually go for, or ~$3 for a single day. And as the servers are located at a local ISP, there's no network latency or dropped packets.
That would be more cost-efficient for me as well. But I somehow like owning the hardware instead of renting it. Oh well, at least I can now locally tinker with all the diffusion and LLM projects that are being released.
This model is also censored to the brim; it refuses to answer half of my questions, some of them perfectly legal. It's useless - we already have GPT-4 (and Vicuna is even more censored/guarded).
Alpaca-30B is much better, it will even tell you how to build a nuclear weapon (incorrectly, of course, it’s not that smart).
I am waiting for Coati13B weights, these should work great.
This looks really good for a run-it-on-your-own-hardware model from the examples and sibling comments. I've been working on a pure AVX2 Rust implementation of LLaMA but was starting to lose interest and been waiting for whatever is the next hot downloadable model, but now I want to add this thing to it.
It's actually very impressive. I gave it the task of converting a query and an OpenAPI spec into an API call, and it worked! I've not been successful in getting GPT-3.5 to do this without it rambling on about the reasoning for its decision.
Usually if I want code from the GPT family I always add "Just show me the code, no extra words or explanation" at the end of the prompt, and it works 99% of the time.
Edit: just finished the conversion of Vicuna myself and have been doing some light testing; it seems to work in ~80% of cases, not as high a success rate as with GPT, for sure. Probably there is a better way of structuring the prompt for Vicuna.
Doesn't always work. Sometimes you have to be more verbose or even repeat yourself in order to force the model to do something. This is one of those cases. Just adding "No explain" at the end of the prompt often doesn't stop it from adding an explanation anyway.
> write a function in JavaScript that turns a JavaScript array into JS DOM elements, like what Hiccup does in Clojure. No explain
Makes GPT-4 output text + code.
> write a function in JavaScript that turns a JavaScript array into JS DOM elements, like what Hiccup does in Clojure. Just show me the code, no extra words or explanation.
True, it's a fuzzy NN after all, but does being more verbose really get you to 100%? I'd still stick with it if it works 80% of the time. Or find a better short prompt - try "code only" or "just code". Prompting can get really tiring.
I said earlier that it's 99%, but it's more like a disclaimer because if I said 100%, someone is surely gonna reply "look here, I did this and you lied so it's not 100%!". I've used the suffix for code examples for ~2 weeks now and never had it fail, personally.
I wonder if that makes the code worse on average, considering how LLMs seem to dramatically increase their reasoning capabilities if you ask them to show their thought process
I have a universal benchmark for judging how much knowledge a language model stores, and it's asking about the G-FOLD paper (https://www.lpi.usra.edu/meetings/marsconcepts2012/pdf/4193....), because I noticed GPT-3.5 hallucinates when asked about it, whereas GPT-4 is capable of providing a high-level overview.
Is there any way yet to train one of these on my entire online output and correspondence in order to create a hyper-personal “autocomplete” or a me-chatbot? lol
All of the llama derivatives are tainted by Meta's license, which makes commercial use illegal and even personal use dubious.
Is all the training material used for Llama available as open source? Maybe lots of folks can pool their resources and create fully open clean models / weights instead.
> All of the llama derivatives are tainted by Meta's license, which makes commercial use illegal and even personal use dubious.
This is not true if you never agreed to Meta's license. If you haven't, then depending on whether weights are copyrightable (very likely) or not, you either can't redistribute them or you're completely free to use them as you see fit. We'll have to wait for the llama-dl lawsuit to find out for sure.
Is it worth it to host this on an EC2 instance, which might cost ~$1.50 per hour (on demand), rather than using the GPT-3.5 API for this purpose?
What is the breakeven number of queries (~2000 tokens/query) needed to justify hosting such a model?
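A quick back-of-the-envelope in Python; both prices below are assumptions, so plug in whatever the current EC2 and API rates actually are:

```python
# Back-of-the-envelope breakeven point; the two prices below are assumed values.
ec2_per_hour = 1.50          # assumed on-demand instance cost, $/hour
api_per_1k_tokens = 0.002    # assumed GPT-3.5 API price, $/1K tokens
tokens_per_query = 2000

api_cost_per_query = tokens_per_query / 1000 * api_per_1k_tokens  # $0.004/query
breakeven_queries_per_hour = ec2_per_hour / api_cost_per_query    # 375 queries/hour

print(f"Self-hosting breaks even above ~{breakeven_queries_per_hour:.0f} queries/hour")
```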
The default loader doesn't seem to let you load quantized models, but if you use something like https://github.com/oobabooga/text-generation-webui you can load the model with `--load-in-8bit`, which halves the memory (it then runs on my 24GB consumer card without an issue, and would probably fit on a 16GB card). There are also 4-bit quantized models, and you can probably run `anon8231489123/vicuna-13b-GPTQ-4bit-128g --model_type LLaMA --wbits 4 --groupsize 128`, although there have been reports that bitsandbytes has problems with 4-bit perf on some cards: https://github.com/TimDettmers/bitsandbytes/issues/181
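If you'd rather load it straight from Python, something along these lines should also work with transformers + bitsandbytes (a sketch; the model path is hypothetical, just point it at whatever your apply_delta step produced):

```python
# Sketch of loading a locally converted Vicuna checkpoint in 8-bit.
# The path is illustrative; use your own merged weights directory.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "/output/path/to/vicuna-13b"   # output of the apply_delta command
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map="auto",    # let accelerate place layers on GPU/CPU
    load_in_8bit=True,    # roughly halves memory vs fp16; needs bitsandbytes
)
```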
NVidia segments product lines on VRAM size. Consumer cards top out at 24GB. If you want more than that you have to buy a datacenter card for 10x the price.
I got the 64GB MacBook Pro but am already realizing the 96GB laptop would have made sense now - I got it in January, right before all the AI craze really lit up - and I distinctly remember thinking who would ever need more than 64GB of RAM…
2 x 32GiB (DDR4 3200MHz) can be had for 170€, and probably less than that with some research. It took a bit of faith and a lot of impulse decision-making, as this device was/is specified for up to 32GiB of RAM only - but it worked.
and changing cuda to mps on line 80:
if args.device == "mps":
I'm not sure it's working correctly but at least it's a step. It's told me how to catch a duck but it often falls into some "renewable energy" sequence. :D
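For reference, the kind of device-selection tweak being described looks roughly like this (a sketch only; the actual FastChat code and line number may differ):

```python
# Hedged sketch of choosing between CUDA, Apple's MPS backend, and CPU in PyTorch.
import torch

def pick_device(name: str) -> torch.device:
    if name == "mps" and torch.backends.mps.is_available():
        return torch.device("mps")   # Apple-silicon GPU backend
    if name == "cuda" and torch.cuda.is_available():
        return torch.device("cuda")
    return torch.device("cpu")
```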
A large array of uniquely-set floating point values. (AKA "parameters".)
In a language model, a word is put in one end (as a numerical index into a wordlist), then it and the weights are multiplied together, and then a new word comes out (again as an index).
Numbers in, numbers out, and a small bit of logic that maps words to numbers and back at either end. ("Encodings".)
"Training" is the typically expensive process of feeding huge amounts of data into the model, to get it to choose the magic values for its weights that allow it to do useful stuff that looks and feels like that training data.
Something else that can be done with weights is that they can be "fine-tuned", i.e. tweaked slightly to give different overall results out of the model, tailoring it to some new use case. Often the model gets a new name afterwards.
In this case, what's been released is not actually the weights. It's a set of these tweaks ("deltas"), which are intended to be added to Meta's LLaMA model weights to end up with the final intended LLaMA-based model, called "Vicuna".
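Conceptually, applying the delta is just element-wise addition of matching tensors, along the lines of the sketch below (my own illustration, not FastChat's actual apply_delta script, which also has to deal with details like the extra tokenizer token mentioned upthread):

```python
# Illustrative only: merge delta weights into base weights by element-wise addition.
import torch

def apply_delta(base_state: dict, delta_state: dict) -> dict:
    merged = {}
    for name, delta in delta_state.items():
        merged[name] = base_state[name] + delta   # base + delta = fine-tuned weight
    return merged
```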
Essentially a computer neural network is just a lot of addition (and matrix multiplication) of floating point numbers. The parameters are the "strength" or "weights" of the connections between neurons on different layers and the "bias" of each neuron. If neuron Alice is connected to neuron Bob, Alice has a value of 0.7, and the weight of Alice's connection to Bob is 0.5, then the value sent from Alice to Bob is 0.35. This value (and the values from all the other incoming connections) is summed and added to the neuron's bias.
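In code, the Alice → Bob step is just a multiply-and-add; a tiny sketch using the numbers from above (the bias value is made up):

```python
# Numeric version of the Alice -> Bob example; the bias is an assumed value.
import numpy as np

alice = np.array([0.7])              # Alice's activation
w_alice_to_bob = np.array([[0.5]])   # weight of the Alice -> Bob connection
bob_bias = np.array([-0.1])          # assumed bias for Bob

bob_pre_activation = alice @ w_alice_to_bob + bob_bias   # 0.7 * 0.5 + (-0.1) = 0.25
print(bob_pre_activation)
```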
I highly recommend checking out 3blue1brown series on how neural nets, gradient descent, and the dot product (implemented as a matrix multiplication) all tie together: https://www.youtube.com/watch?v=aircAruvnKk
To add to this excellent reply, I'll also point out that the reason folks want the weights is that they are the result of a massive search operation, akin to finding the right temperature to bake a cake from all possible floats. It takes a lot of wall clock time, a lot of GPU energy, and a lot of input examples and counter-examples to find the "right" numbers. Thus, it really is better - all things being equal - to publish the results of that search to keep everyone else from having to repeat it for themselves.
> a massive search operation, akin to finding the right temperature to bake a cake from all possible floats
...for each of 13 billion (for a model with that many parameters) different cakes, except that they aren’t like cakes because the “best" temperature for each depends on the actual temperatures chosen for the others.
My layperson's understanding is that it's due to the problem one is trying to solve with a deep learning model: draw a curve through the dimensions which separates "good" from "bad" activation values. The lower the resolution of the line, the higher the likelihood that it will fit sometimes and veer off into erroneous space at other times.
They basically encapsulate what a model has "learned." ML models without their weights are useless because the output is essentially random noise. You then train the model on data, and it changes the weights into numbers that cause the whole thing to work. Training data and processing power are usually very expensive so the resulting weights are valuable.