Hey guys, I was so inspired by the llama.cpp project that I spent all day today building a weekend side project.
Basically it lets you one-click install LLaMA on your machine with no bullshit. All you need to do is run "npx dalai llama".
I see that the #1 post today is a long blog post that walks through compiling the C++ code, downloading files, and all that to finally run LLaMA on your machine. I have 100% automated all of this with a simple NPM package/application.
On top of that, the whole thing is a single NPM package and was built with hackability in mind. With a single JS function call you can call LLaMA from YOUR app.
Lastly, EVEN IF you don't use JavaScript, Dalai exposes a socket.io API, so you can use whatever language you want to interact with Dalai programmatically.
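For example, a non-JavaScript client could look roughly like this. This is just a sketch: the port, event names, and payload fields below are placeholders I made up for illustration, so check the README for the actual protocol before relying on them.

```
# Hypothetical sketch: talking to a locally running `npx dalai serve`
# from Python via socket.io. The port (3000), event names ("request",
# "result"), and payload fields are ASSUMPTIONS for illustration only,
# not Dalai's documented protocol.
import socketio

sio = socketio.Client()

@sio.on("result")
def on_result(data):
    # Stream whatever the server sends back as it arrives.
    print(data, end="", flush=True)

sio.connect("http://localhost:3000")
sio.emit("request", {
    "prompt": "The universe is",
    "model": "7B",
    "n_predict": 128,
})
sio.wait()
```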
Thanks for all the feedback! I went outside to take a walk after posting this and just came back, and went through them to summarize what needs to be improved.
Basically looks like it comes down to the following:
- *customize features:* should not be difficult (will add flags for these)
- *path:* customize the home directory (instead of automatically storing to $HOME)
- *python:* some people are having issues with the python binary (since the package is essentially calling these shell commands). Maybe add a flag to specify the exact name of the python binary (such as "--python python3")
- *avoid downloading files:* I have this issue too when I just want to install the code instead of downloading the full model which takes a long time. Might add a flag to avoid downloading models in case you already have them (EDIT: actually upon thinking about it, it's better to just set the source model folder, something like --model)
- *other flags:* the rest of the flags natively supported by the llama.cpp project, such as top_k, top_p, temp, batch_size, threads, seed, n_predict, etc. (they are already in the code but just weren't exposed via the CLI or documented)
- *documentation*
- document the machine spec
- document the storage spec: how much space is used?
- node version: which version of node.js is required?
- python version: which version of python doesn't work?
Am I missing anything? Feel free to leave comments, will try to roll out some updates as soon as I can. To stay updated, feel free to follow me on twitter https://twitter.com/cocktailpeanut (or you could create issues on GitHub too!)
I tried to run your NPX commands from the examples on a fresh WSL install of Ubuntu 20.04, but if you don't have build tools installed, they both just silently fail.
I only realized what was happening after trying to go the other route and use it in a package, where I then noticed the NPM install will give a node-gyp error about make missing.
I'm on NixOS, where you have to explicitly state dependencies (which is a good thing, except when... this happens)
Besides make (which I can quickly make available in a project environment), what other deps do you think it uses but doesn't declare or state? ;)
The other one I noticed is pip! A lot of the script fails without pip, and it takes until after the fairly long downloads finish for it to let you know it was needed.
I successfully used the latest node LTS (via NVM) and the latest versions of python3-pip and build-essential from the Canonical apt repo, if that helps.
I don’t understand why it’s downloading at all, that shouldn’t be default behavior.
It should have default instructions to load a file from a default place, then arguments/flags to load from a specific path, and then MAYBE a prompt to download the models after it can't find them on those paths (plural).
I followed the initial instructions and the 7B model worked just fine.
I tried the supplementary instructions to download some of the models (7B, 13B, and 30B), and it didn't seem to work. The prompt returned nothing after waiting for several minutes.
Is there a way to run just one of the larger models?
Yes. Starting with the Facebook versions of LLaMA-7B you just quantize the model to 4bit on your desktop (since it takes 14GB of RAM) and then move it to your phone and follow the Android instructions in the repo. https://github.com/ggerganov/llama.cpp/#android
I've seen dozens of screenshots of it running in Termux on Android phones by now, at completely usable speeds.
I ran this on my Intel i7-7700K with 32 GB of RAM. It ran very slowly, almost 1 word per second. Not sure if I did something wrong.
Distro: Ubuntu 22.04
My biggest concern about these LLMs was corporate sequestration and the potential socioeconomic imbalances it would create. The work you are doing here is part of some amazing efforts to check that back. In summary: Bruhhhhhh. THANK YOU!
This is something to keep an eye on, really. The solution for making that sequestration impossible is twofold:
1. knowing how to architect and create LLMs (including getting the training data ready)
2. having them run on hardware that a normal citizen can acquire at reasonable cost
Yeah, not a Node.js/JavaScript dev at all, but this is failing to install on Fedora. I don't have time to dig into it at the moment, but if anybody has any well-known gotchas that could be the issue, that would be helpful :)
Does anyone know how to avoid downloading the model weights when doing `npx dalai llama`, and instead telling the install process where they are on my drive?
Haha, I just finished ordering 32GB of additional memory for my PC so I can run the 65B model, if that tells you anything. I'm upgrading from 32GB -> 64GB.
7B is fine, 13B is better. Both are fun toys and almost make sense most of the time, but even with a lot of parameter tuning they're often incoherent. You can tell that they have encoded fewer relationships between concepts than the higher-parameter models we've gotten used to--it's much closer to GPT-2 than GPT-3.
They're good enough to whet my appetite and give me a lot of ideas of what I want to do, they're just not quite good enough to make those applications reliably useful. Based on the reports I'm hearing here of just how much better the 65B model is than the 7B, I decided it was worth $80 for a few new sticks of RAM to be able to use the full model. Still way cheaper than buying a graphics card capable of handling it.
Heh, you just made me upgrade as well. After originally paying 130 € for 32 GB, it’s nice that I only had to pay 70 € to double it ;) Not sure if I want to run LLMs (or if my Ryzen 5 3600 is even powerful enough), but I’ve wanted some more RAM for a while.
If I was running in a server context, would the 50GB of RAM be required to respond to one request, or can it be used to respond to multiple requests simultaneously?
I'm very late to this question, but I believe that that amount is only required once, but the context tensor will need to be created per request. I haven't confirmed that, though.
I'm still holding on to a small bit of hope that the GPU market will normalise this year. I don't think I'm the only one looking to get something highly capable for a fair price.
It's expensive for first-world countries too. Just look at the 4090 - it's insane that it costs 2k EUR... it's literally double the fair price (which itself is high).
I ran "npx dalai llama" and it's just... sitting there (after I hit "y" to confirm). I checked btop++ and there's barely any downloading or CPU activity occurring, so not sure what it's doing... but does "pip3 install torch torchvision torchaudio sentencepiece numpy" take a while?
If it's actually downloading the 3.9GB of model weights or whatever, it would be pretty cool if it showed a progress bar of some sort. Stretch goal, for sure, but a very nice touch for users.
Anyway, I'll leave it be and check on it to see when it's complete. Super cool if this works!!
There's an elephant in the room, or is it just me?
Is your script making users violate the original license agreement(§)?
For the record, I don't think Meta will go after you or anyone else. But they may decide not to make their future models available after what is happening with the Llama weights.
I realize that some people are of the opinion that AI models (weights) cannot be copyrighted at all.
>But they may decide not to make their future models available after what is happening with the Llama weights.
I think that ship has probably sailed, in that no one is going to release weights in this way again. Either they will publish them outright (like Whisper) or they will keep them (almost) completely closed.
main.exe -m ggml-model-q4_0.bin -t 8 -n 128 -p "The Drake equation is nonsense because"
The Drake equation is nonsense because it takes parameters that can only be known AFTER the conclusion is reached. It would be like saying "I'm going to prove a theorem by starting from the conclusion, then making up the proof.
The Drake equation uses the existence of extraterrestrial intelligence as the conclusion and then making up the parameters. It is nonsense.
I wasn't sure where to upload them, and that link is only good for 50 downloads. Can put them somewhere else if you know a better location that doesn't require signup.
My attempt does not work, and now I'm trying to figure out where the 35+ GB of data and files that were added to my hard drive are located so I can clean it all off.
If it makes common unix-ish assumptions like “Python 3 executables have a ‘3’ appended to their name”, which other comments here seem to suggest it does, it won’t, even if you have the required version of python installed.
bash-3.2$ git clone https://github.com/ggerganov/llama.cpp.git /Users/rickg/llama.cpp
fatal: destination path '/Users/rickg/llama.cpp' already exists and is not an empty directory.
bash-3.2$ exit
exit
exec: git pull in /Users/rickg/llama.cpp
git pull
exit
bash-3.2$ git pull
Already up to date.
bash-3.2$ exit
exit
exec: python3 -m venv /Users/rickg/llama.cpp/venv in undefined
python3 -m venv /Users/rickg/llama.cpp/venv
exit
bash-3.2$ python3 -m venv /Users/rickg/llama.cpp/venv
bash-3.2$ exit
exit
exec: /Users/rickg/llama.cpp/venv/bin/pip install torch torchvision torchaudio sentencepiece numpy in undefined
/Users/rickg/llama.cpp/venv/bin/pip install torch torchvision torchaudio sentencepiece numpy
exit
bash-3.2$ /Users/rickg/llama.cpp/venv/bin/pip install torch torchvision torchaudio sentencepiece numpy
Requirement already satisfied: torch in ./llama.cpp/venv/lib/python3.10/site-packages (1.13.1)
Requirement already satisfied: torchvision in ./llama.cpp/venv/lib/python3.10/site-packages (0.14.1)
Requirement already satisfied: torchaudio in ./llama.cpp/venv/lib/python3.10/site-packages (0.13.1)
Requirement already satisfied: sentencepiece in ./llama.cpp/venv/lib/python3.10/site-packages (0.1.97)
Requirement already satisfied: numpy in ./llama.cpp/venv/lib/python3.10/site-packages (1.24.2)
Requirement already satisfied: typing-extensions in ./llama.cpp/venv/lib/python3.10/site-packages (from torch) (4.5.0)
Requirement already satisfied: pillow!=8.3.*,>=5.3.0 in ./llama.cpp/venv/lib/python3.10/site-packages (from torchvision) (9.4.0)
Requirement already satisfied: requests in ./llama.cpp/venv/lib/python3.10/site-packages (from torchvision) (2.28.2)
Requirement already satisfied: charset-normalizer<4,>=2 in ./llama.cpp/venv/lib/python3.10/site-packages (from requests->torchvision) (3.1.0)
Requirement already satisfied: urllib3<1.27,>=1.21.1 in ./llama.cpp/venv/lib/python3.10/site-packages (from requests->torchvision) (1.26.15)
Requirement already satisfied: idna<4,>=2.5 in ./llama.cpp/venv/lib/python3.10/site-packages (from requests->torchvision) (3.4)
Requirement already satisfied: certifi>=2017.4.17 in ./llama.cpp/venv/lib/python3.10/site-packages (from requests->torchvision) (2022.12.7)
[notice] A new release of pip available: 22.3.1 -> 23.0.1
[notice] To update, run: python3 -m pip install --upgrade pip
bash-3.2$ exit
exit
exec: make in /Users/rickg/llama.cpp
make
exit
bash-3.2$ make
I llama.cpp build info:
I UNAME_S: Darwin
I UNAME_P: arm
I UNAME_M: arm64
I CFLAGS: -I. -O3 -DNDEBUG -std=c11 -fPIC -pthread -DGGML_USE_ACCELERATE
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -pthread
I LDFLAGS: -framework Accelerate
I CC: Apple clang version 12.0.5 (clang-1205.0.22.9)
I CXX: Apple clang version 12.0.5 (clang-1205.0.22.9)
cc -I. -O3 -DNDEBUG -std=c11 -fPIC -pthread -DGGML_USE_ACCELERATE -c ggml.c -o ggml.o
ggml.c:1364:25: error: implicit declaration of function 'vdotq_s32' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
int32x4_t p_0 = vdotq_s32(vdupq_n_s32(0), v0_0ls, v1_0ls);
^
ggml.c:1364:19: error: initializing 'int32x4_t' (vector of 4 'int32_t' values) with an expression of incompatible type 'int'
int32x4_t p_0 = vdotq_s32(vdupq_n_s32(0), v0_0ls, v1_0ls);
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ggml.c:1365:19: error: initializing 'int32x4_t' (vector of 4 'int32_t' values) with an expression of incompatible type 'int'
int32x4_t p_1 = vdotq_s32(vdupq_n_s32(0), v0_1ls, v1_1ls);
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ggml.c:1367:13: error: assigning to 'int32x4_t' (vector of 4 'int32_t' values) from incompatible type 'int'
p_0 = vdotq_s32(p_0, v0_0hs, v1_0hs);
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
ggml.c:1368:13: error: assigning to 'int32x4_t' (vector of 4 'int32_t' values) from incompatible type 'int'
p_1 = vdotq_s32(p_1, v0_1hs, v1_1hs);
^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
5 errors generated.
make: *** [ggml.o] Error 1
bash-3.2$ exit
exit
/Users/rickg/.npm/_npx/3c737cbb02d79cc9/node_modules/dalai/index.js:153
throw new Error("running 'make' failed")
^
Error: running 'make' failed
at Dalai.install (/Users/rickg/.npm/_npx/3c737cbb02d79cc9/node_modules/dalai/index.js:153:13)
It seems the only reason all of these competitive models are getting released is that a number of big players are probably freaking out that somebody else is going to break out into a huge lead. So while the floodgates are open, people should be quickly figuring out how to do as much as possible without any centralized company controlling it. I would imagine everybody assumes the models released these days will be obsolete before long, so it's low risk. This is like the early internet days, except this time we should assume all of the centralized servers are user hostile and figure out how to work around them as quickly as they roll them out. The author and others are doing great work to prevent this stuff from being locked away behind costly APIs and censorship.
If the barrier to entry is low enough for several players to enter the field this fast, I wonder what could raise the barrier? The models getting bigger, I suppose.
A few months (weeks?) ago I would've said that this already was the case for language models. It's absolutely mind-blowing to me what is happening here - same with stable diffusion. Once Dall-E was out, I was sure that there was no way that anything like this could be run on consumer hardware. I'm very happy to be proven wrong.
In a way, things are still moving in this direction, though. 8 or so years ago it was more or less possible to train those models yourself to a certain degree of usefulness, as well, and I think we've currently moved way past any feasibility for that.
Fortunately, there are still some possibilities for improving training efficiency and reducing model size by doing more guided attentional learning.
This will make it feasible to train models at least as good as the current batch (though the big players will probably use those same optimizations to create much better large models).
Our saving grace seems to be the insatiable push by the gaming industry for better graphics at higher resolutions. Their vision for real-time path traced graphics can’t happen without considerable ML horsepower on consumer level graphics cards.
The Vice Chairman of Microsoft has already mentioned that he is open to regulation. The EU is also working on plans to regulate AI. So in the future you will probably only be allowed to use AI if it's approved by something like the FD(A)A.
Maybe I have a skewed view of this, but I fail to see how regulation wouldn't harm more than it helps here. The truly dangerous actors wouldn't care, or would be based in some other country. Having a large diversity of actors seems like the best way to ensure resilience against whatever threats might arise from this.
As an AI doomer, it would actually be pretty great if we could get this stuff locked away behind costly APIs and censorship. Some fat monopoly rent-extracting too. We are moving way too fast on this tech, and the competitive race dynamics are a big reason why. I want LLMs to end up with Microsoft IE6 level of progress. Preferably we could make Firefox (SD/GPT-J) illegal too. (The GPU scarcity is a good start, but maybe China could attack Taiwan as well and thus torpedo everybody's chipbuilding for a decade or so?)
If LLMs keep going at their current pace and spread, the world is seriously going to end in a few years. This technology is already unsafe, and it's unsafe in exactly the ways that it'll be seriously unsafe as it scales up further - it doesn't understand and doesn't execute human ethics, and nobody has any working plan how to change that.
To me it's like the American gun ownership situation: if you make guns illegal now, criminals and governments will still keep them, but your average joe won't get them. A very unequal playing field.
LLMs will be used against us: let's at least have our own, and learn how to defend against them?
I say this as devil's advocate, with serious reservations about where all of this is going.
> As an AI doomer, it would actually be pretty great if we could get this stuff locked away behind costly APIs and censorship.
Yes, because the only people with access to advanced AI tech being the people whose motive is using and training it for domination over others (whether megacorps or megagovernments) is absolutely a great way to prevent any “AI doom” scenarios.
If one party could use LLMs to reliably dominate others, the alignment problem would be basically solved. Right now, one of the biggest corporations of the planet cannot get LLMs to reliably avoid telling people to commit suicide despite months (years?) of actively trying.
> “The broader intellectual world seems to wildly overestimate how long it will take A.I. systems to go from ‘large impact on the world’ to ‘unrecognizably transformed world,’” Paul Christiano, a key member of OpenAI who left to found the Alignment Research Center, wrote last year. “This is more likely to be years than decades, and there’s a real chance that it’s months.”
...
> In a 2022 survey, A.I. experts were asked, “What probability do you put on human inability to control future advanced A.I. systems causing human extinction or similarly permanent and severe disempowerment of the human species?” The median reply was 10 percent.
> I find that hard to fathom, even though I have spoken to many who put that probability even higher. Would you work on a technology you thought had a 10 percent chance of wiping out humanity?
It's kinda irrelevant on a geologic or evolutionary time scale how long it takes for AI to mature. How long did it take for us to go from Homo erectus to Homo sapiens? A few million years and change? If it takes 100 years, that's still ridiculously, ludicrously fast for something that can change the nature of intelligent life (or, if you're a skeptic of AGI, is still such a massive augmentation of human intelligence).
I strongly recommend the book Normal Accidents. It was written in the '80s and the central argument is that some systems are so complex that even the people using them don't really understand what's happening, and serious accidents are inevitable. I wish the author were still around to opine on LLMs.
And the result of the industrial revolution has been a reduction of about 85% of all wild animals, with calamity threatened for the rest in the next few decades. That can hardly be summarized as "yet here we are."
Months starts looking more plausible when considering that we have no idea what experiments DM/OA have running internally. I think it's unlikely, but not off the table.
I agree what they have internally might be transformative, but my point is that society literally cannot transform over the course of months. It's literally impossible.
Even if they release AGI, people will not have confidence in that claim for at least a year, and only then will the rate of adoption rapidly increase to transformative levels. Pretty much nobody is going to be fired in that first year, so a true transformation of society is still going to take years, at least.
I mean, if you believe that AGI=ASI (ie. short timelines/hard takeoff/foom), the transformation will happen regardless of the social system's ability to catch up.
It's not a matter of any social system, it's a matter of hard physical limits. There is literally no hard takeoff scenario where any AI, no matter how intelligent, will be able to transform the world in any appreciable way in a matter of months.
Yeah, but what you will actually get is the world transformed by AI with the use of nuclear weapons (or whatever method AGI employs to get rid of the absolutely unnecessary legacy parasitic substance that raised it, aka humanity).
Well, from my perspective, making claims about the world ending requires some substantial backing, which I didn't find in OP's comment.
But now I understand that perhaps this is self-evident and/or due to a lack of reading comprehension on my part, thank you. I hope that when our new AI overlords come they appreciate people capable of self-reflection.
You could assume that your commenter didn't read the whole line, or you could try to understand that what they are asking is why you think the lack of ethics enforcement in a text-generating model means that the world is ending.
Personally, my take is that the lack of ethics enforcement demonstrates that whatever methods of controlling or guiding a LLM we have break down even at the current level. OA have been grinding on adversarial examples for like half a year at this point and there's still jailbreak prompts coming out. Whatever they thought they had for safety, it clearly doesn't work, so why would we expect it to work better as AIs get smarter and more reflective?
I don't think the prompt moralizing that companies are trying to do right now is in any sense critical to safety. However, the fact that these companies, no matter what they try, cannot avoid painfully embarrassing themselves, speaks against the success of attempts to scale these methods to bigger models, if they can't even control what they have right now.
LLMs right now have a significant "power overhang" vs control, and focusing on bigger, better models will only exacerbate it. That's the safety issue.
Could’ve said the same for any major technological advance. Luddism is not a solution. If these models are easily run on a laptop then yes some people are going to hurt themselves or others but we already have laws that deal with people doing bad things. The world is not going to end though. Your Taiwan scenario has a much higher probability of ending the world than this yet you seem unconcerned about that.
Big Tech on its own will already push this technology very far and they don't give a damn about safety, only the optics of it.
I'm not convinced that small actors will do much damage even if they access to capable models. I do think there's at least the possibility that essential safety work will arise from this.
Agreed. A single company dominating AGI could become highly dominant, and it might start to want to cut back humans in the loop (think it starts automating everything everywhere). The thing we should watch for is whether our civilization as a whole is maximizing for meaning and wellbeing of (sentient) beings, or just concentrating power and creating profit. We need to be wary, vigilant of megacorporations (and also corporations in general).
A single company running AGI would suggest that something built by humans could control an AGI. That would actually be a great victory compared to the status quo. Then we'd just need to convince the CEO of that company or nationalize it. Right now, nothing built by humans can reliably control even the weak AI that we have.
All of this doomer-ing feels to me like it's missing a key piece of reflection: it operates under the assumption that we're not on track to destroy ourselves with or without AGI.
We have proliferated a cache capable of wiping out all life on earth.
One of the countries with such a cache is currently at war - and the last time powers of this nature were in such a territorial conflict things went very poorly.
Our institutions have become pathological in their pursuit of power and profit, to the point where the environment, other people, and the truth itself can all go get fucked so long as x gajillionare can buy a new yacht.
The planet's on a lot more fire than it used to be.
Police (the protect-and-serve kind) now, as a matter of course, own MRAPs (Mine-Resistant Ambush Protected vehicles). This is not likely to cause the apocalypse, but it's not a great indicator that we're okay.
Not exactly what I meant; there is a nonzero chance that an AGI given authority over humanity would run it better. Granted, a flipped coin would run it better but that's kinda the size of it.
I see only two outcomes at this point. LLMs evolve into AGI or they evolve into something perceptually indistinguishable from AGI. Either way the result is the same and we’re just arguing semantics.
It's like saying an 8086 will never be able to render photorealistic graphics in realtime. They fuel the investment in technology and research that will likely lead there.
I agree with gp. It may not be LLMs, but we will certainly create a technology at some point that can't be openly shared due to existential danger, aka The Great Filter.
We can't just naively keep frolicking through the fields forever, can we?
We have to be able to at least agree on that, theoretically, right?
If we agreed with your premise that AI is a great filter and that this filter can somehow be contained by a small group, then I guess what it boils down to is two choices:
1. either lock everything down and accept the control of a small, unaccountable group dictating the future of humanity according to their morals and views (and I believe that AI will fundamentally shape how humanity will work and think), or
2. continue to uphold the ideas of individual freedom and democratic governance and accept a relative increase in the chance of a great filter event occurring.
I, like many here, am firmly against ggp's position. The harm that our species sustains from having this technology controlled by the few far outweighs the marginal risk increase of some great filter event occurring.
I will continue to help ensure that this technology remains open for everyone, regardless of their views, morals, and convictions, until the day I die.
Let's forget today, and LLMs. Do you see no theoretical future case where a technology should not be shared freely, ever? Even 100 years from now?
The only benefit I can imagine of fewer players having control of a technology is that there are fewer chances for them to make a bad call. But when you democratize something you hit the law of large numbers.
Here's a nice video that showcases the same patterns in history and how having free and open tech + breaking monopolies helped move society forward - https://youtu.be/jXf04bhcjbg
Language models don't kill people, people kill people. You know what stops a bad ̶g̶u̶y̶ mega-corporation with a language model? A good guy with a language model.
Here is what mine had to tell you:
It’s not like we don’t already have nuclear weapons, biological agents, chemical agents etc...
AI is simply another tool which can be used for good or ill. It doesn’t matter how much regulation/control you put on it - if someone really wanted to use it maliciously then they will find ways around your safeguards. The best thing to do is educate yourself as much as possible.
It's not that we don't "know" how to do these things; most of us are just resource-constrained. Interestingly, that's similar to the issues with GPT-3 et al. People aren't saying "give us the secret sauce", they're saying "it's problematic for corporations to be the sole custodians of such models".
What would you think of a world where only one country has nukes (due to a monopoly on radioactive fuel, rather than a monopoly on knowledge)?
> What would you think of a world where only one country has nukes (due to a monopoly on radioactive fuel, rather than a monopoly on knowledge)?
This is more like giving every individual on Earth the nuclear launch codes. It only takes one stupid or malicious person to press launch. Giving more people buttons is not how you avoid thermonuclear war.
This is like giving every individual on Earth the nuclear launch codes, without the warheads being attached to launch rockets.
To do serious harm or exert broad social control requires concentrating that power with an infrastructure that a small group does not have; it requires coordinating the resources of a broad social base. And at that point, the incentives to use them are affected by the needs of many people.
>requires concentrating that power with an infrastructure that a small group does not have, it requires coordinating the resources of a broad social base.
These are all things intelligence (artificial or otherwise) can help acquire. It listens, thinks, and responds. Genghis Khan, Adolf Hitler, and Albert Einstein are all intelligences that resulted in dramatic tangible, physical changes to our world almost entirely by listening, processing, and responding with human language.
A small number of slow and weak apes came to have absolute unilateral control over the destiny of all other lifeforms because of intelligence. The power, infrastructure, and resources you speak of were not available in 15,000 BCE, yet somehow they exist today.
In that case, this current batch is not "advanced AI". It is a big autocomplete engine which retrieves content already present in its training corpus and presents it in new shapes; it is not ready to define new goals with a purpose and follow them to completion.
There is nothing to suggest a language model is self aware, or is capable of reasoning and will turn itself around to kill you or anyone else. Knowledge is power and it’s better to get clued up on how these things work so you don’t scare yourself.
Indeed. I think the confidence with which ChatGPT gives (often incorrect) answers and the way you can correct it, makes people feel like it is self-aware but it's not. The way it is presented really makes it easy to anthropomorphise it. It feels like you're talking to a person but really what you're talking to is the echoes of a billion people's murmurs on the internet.
There is a really big step to go for it to be self-learning, which is one of the things it will need to be self-aware. Right now the tech is just a static model: it will not learn from being corrected. You can often argue with it, saying "Hey, this is wrong because...", and it will admit you're right. And then it will give the same wrong initial answer back the next time.
Excellent packaging OP! I just wanted to say 2 things relating to LLaMa:
1) 7B is unusable for anything really, in case you are hopeful;
2) 65B otoh is awesome ("at least DaVinci level").
I don't know if this is something FB/Meta planned strategically, but this LLaMA-mania (LLaMania?) over the weekend is their November 2022 ChatGPT moment. If they (Mark) take it seriously, it could become a strong hand in AI and a hint of how the industry could be shaped in the near future, with cloud models competing with local installs.
Think about it: whoever trains a popular, albeit closed model, can give it whatever bias it wishes with nearly no oversight. A dystopian and scary thought.
Something important is that LLaMA was leaked; it was never directly published by Meta.
So it's basically piracy, and even if you got it officially, the license is very restrictive.
The idea that models can't be copyrighted isn't far-fetched. The basic idea is that models are created by an automated process, not by a person.
The courts have already upheld that AI generated output is not copyrightable for this exact reason.
So if you do not buy that it applies to models, then you would have to explain the difference between the process which outputs bits into a model's layers (aka training) and the process which takes bits into the input layer and then dumps out the subsequent bits of the output layer (inference/generation).
Then explain why that distinction is different in regards to the applicability of copyright.
I'm not sure that even the "AI generated output is not copyrightable" stance will be maintained once "AI generated output" becomes big business. The same way copyright was invented and Sonny-Bono-extended to the max once content became big business.
In the model's case, though, it's even easier to see why it could be copyrightable, as a "baked" model is still created by people fine-tuning it, setting parameters and hardcoded behavior, training it with this or that dataset and excluding others, and so on.
For example, music composed and rendered as audio by generative algorithms (something which doesn't even need AI, just some rules and stochastic processes) has been created and copyrighted just fine for decades...
All the arguments for why photographs are copyrighted would seem to apply. The photographer isn't painting the image, but his artistic input is still vital to creating the image. Same with training these models: the training is just an algorithm on some data, but choosing the right hyperparameters and training data is an artistic expression of the author, making copyright apply
For the same reasons the famous monkey selfie photo cannot be copyrighted, it is highly likely that AI-generated art is uncopyrightable, and that would also mean that models are. The fact that humans set up the systems which produce the art/models, with the intention of getting an end result generally like the one they get, is simply not meaningful to the copyright dispute.
> whoever trains a popular, albeit closed model, can give it whatever bias it wishes with nearly no oversight.
That's true even if you can download the whole model. It's not like we can figure out what it's doing from looking at the weights. Training the model locally might avoid intentional bias, but that's what takes a huge GPU farm.
> Think about it: whoever trains a popular, albeit closed model, can give it whatever bias it wishes with nearly no oversight. A dystopian and scary thought.
You have perfectly described what OpenAI did. They released a moralizing “biased” model behind a gated API with no oversight. The only dystopia is one in which corporations get to decide what is, or isn’t considered biased.
In general, assume 2GB per billion parameters; with quantisation you can get this down to <1GB (~500MB for 3-bit?), but even with that you'll only be able to run quantised LLaMA-13B in the best case.
Having said that: if you are feeling incredibly patient, you can technically run the 65B parameter model by swapping to disk, although it will not be a pleasant experience (think minutes or hours per token instead of tokens per second).
Additionally, it's worth noting that pure CPU inference is much slower than GPU/TPU inference, so the output will be much slower than a ChatGPT-like service even if the model does fit in your computer's RAM.
That's 65 billion parameters. It probably does not fit in RAM. Though if you encode each parameter using one byte, you would need 65GB of RAM, which you could get on workstations at this point.
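As a rough sanity check on those numbers, here's a quick back-of-the-envelope sketch (weights only, ignoring the context/KV cache and other runtime overhead, so real usage is somewhat higher):

```
# Back-of-the-envelope RAM needed just for the weights, at different
# precisions. Illustration only: ignores context and runtime overhead.
SIZES_B = {"7B": 7, "13B": 13, "30B": 30, "65B": 65}  # billions of parameters

def est_gib(params_billion, bits_per_param):
    return params_billion * 1e9 * bits_per_param / 8 / 2**30

for name, b in SIZES_B.items():
    print(f"{name}: fp16 ~{est_gib(b, 16):.0f} GiB, "
          f"int8 ~{est_gib(b, 8):.0f} GiB, int4 ~{est_gib(b, 4):.0f} GiB")
# e.g. 65B: fp16 ~121 GiB, int8 ~61 GiB, int4 ~30 GiB
```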
Spoiler: it's the parameter count. As parameter count goes up, bit depth matters less.
It just so happens that at around 10B+ parameters you can quantize down to 4bit with essentially no downsides. Models are that big now. So there's no need to waste RAM by having unnecessary precision for each parameter.
For completeness, there's also another paper that demonstrated you get more power/accuracy per-bit at 4 bits than at any other level of precision (including 2 bits and 3 bits)
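If anyone wants to see what block-wise 4-bit quantization means concretely, here's a toy numpy sketch of the idea: one scale per block of weights, integers clamped to a 4-bit range. It's a simplified illustration, not llama.cpp's actual Q4_0 format (which also packs two 4-bit values per byte).

```
# Toy block-wise 4-bit quantization. Simplified illustration only; the
# int8 array below would be bit-packed in a real 4-bit on-disk format.
import numpy as np

def quantize_4bit(w, block=32):
    # One scale per block; quantized values limited to [-7, 7].
    w = w.reshape(-1, block)
    scale = np.abs(w).max(axis=1, keepdims=True) / 7.0 + 1e-12
    q = np.clip(np.round(w / scale), -7, 7).astype(np.int8)
    return q, scale

def dequantize_4bit(q, scale):
    return (q * scale).reshape(-1)

w = np.random.randn(1024).astype(np.float32)
q, s = quantize_4bit(w)
err = np.abs(w - dequantize_4bit(q, s)).mean()
print(f"mean absolute rounding error: {err:.4f}")  # small vs. weight scale
```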
It causes the quantization process to always use the first part of the model if using a size larger than 7B. I don't even know what this stuff does, but I see the ggml-model-f16.bin files have ggml-model-f16.bin.X files alongside them in the folder, so I'm pretty sure this is a mistake. Maybe it's causing the loss of accuracy?
Well, after downloading the whole 65B model, I got it to talk on an M1 Max MBP (64GB RAM). Unfortunately, all it says, no matter what I prompt it, is some combination of these words:
Elizabethêteator Report Terit Elizabethête estudios политичеSM Elizabethunct styczniarequire enviçasefша sufficient vern er Dependingêque политиче Emperor!\ющим quarterктиче Elizabeth estudiosête ElizabethBasicCONFIGSM estudios political book
[edit] btw I'm not making this up; just curious if anyone else has had this ridiculous experience.
I don't share the belief, but I've heard it said by others with such beliefs that the naming association is offensive by itself because of the relative importance of the figures.
Imagine that 'Fabio' is the spiritual leader of your religion, a walking, talking deity among humans on Earth. You worship Fabio with all of your effort and believe he is infallible. Your culture has precepts that forbid the casual use of Fabio's name in petty regard.
On the other side of the Earth, at the same time, is someone who names their new powerboat 'Fabio'.
I perceive it as that kind of offense. The (so-called) 'petty' use of a word that drives much stronger emotion in others.
That said, I don't share the belief -- and I like such names; but I can understand the conflict.
So maybe leave it to people who are actually offended by something to say if they're offended, rather than being offended on someone's behalf without knowing if they are?
Hm, yeah. That point of view makes a certain kind of sense, but I can't find any way to accept that view of "this is important to me, so you can only mention it when referring to the thing I'm referring to".
If I'm not using the name to insult your God/spiritual leader/whatnot, you have no moral right to prevent me. I think that the intent to offend is the crux here, and if there's no offensive intent, there should be no issue.
True, but if you're insulting people, it's not nice and you should stop. Maybe they don't have a right to stop you, but I personally believe it's morally wrong.
Exactly. All of these textbooks and papers on computation just incessantly debase FSM. Frankly it's a sacrilegious defiling of our Spaghetti lords name.
May the touch of his noodly appendage bring enlightenment to those yet to open themselves to the Pastafarian ways.
As someone who actually came to read all these comments because I found the name "inappropriate" in the first place, I wanted to check if that's just me.
Now, after reading all the opinions and contemplating them, I'd say I would change my mind as long as the project README mentioned the phonetic origin of its name, just in case someone not familiar with the original Dalai Lama stumbled on the project and would then have a meaningful reference.
With that information included in the project info, I'd say it would do more good than harm. Otherwise, it's inappropriate.
Works great!
However, I had Python 3.11 set up as the default python3 in PATH, and since there is no torch wheel for 3.11 yet, the script failed.
With 3.10 it worked flawlessly.
Small improvement: the node script could check if the model files are already present at the download location and not download them again in this case.
I know this is a bit tangential (awesome work OP), but has anyone been able to get usable, consistent results from this thing? I've been playing around with the 13B model with llama.cpp, and while I do sometimes get good results, it often just gives me weird, repetitive nonsense.
I know it hasn't been fine tuned on instructions or had RLHF like ChatGPT, but has anyone figured out how to kinda work around it and actually use it in the way you can ask ChatGPT a question and typically get something coherent and useful out of it?
I've been playing around with the 30B version all day. The biggest improvements I've seen have come from changing the way I prompt (strike a more in-medias-res style; the model really likes continuing and gets confused if you give it a blank slate) and from implementing top_k sampling (also discard the top_p=0 nonsense; you want top_p>1.0 to turn it off). It's important to note that the llama.cpp project does NOT implement top_k, even if you set that command-line parameter.
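In case it helps anyone: top_k just means truncating the next-token distribution to the k most likely tokens before sampling. A minimal numpy sketch of the idea (my own illustration, not llama.cpp's actual C++ sampler):

```
# Minimal top-k sampling over a vector of logits. Illustration only.
import numpy as np

def sample_top_k(logits, k=40, temp=0.8):
    logits = logits / max(temp, 1e-6)
    # Keep only the k highest logits; mask out the rest.
    kth = np.sort(logits)[-k]
    masked = np.where(logits >= kth, logits, -np.inf)
    # Softmax over the surviving tokens, then sample one index.
    probs = np.exp(masked - masked.max())
    probs /= probs.sum()
    return int(np.random.choice(len(probs), p=probs))

vocab_logits = np.random.randn(32000)  # fake logits for a 32k-token vocab
print(sample_top_k(vocab_logits, k=40))
```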
We should be working on benchmarking this kind of tool. Instead of saying "this version/implementation gives interesting results sometimes", we should get some kind of score out of it (like the score of a test). Then we can better compare different versions and also test if the version we just installed is actually working as it should.
I tried installing this. I should have read the code or it should have been explicitly mentioned in the README that this would install more than 2GB worth of packages. Maybe that is trivial and understood, but I wasn't aware and I believe there would be quite a lot of people like me.
Memory is usually not an issue, but for my server it is.
It doesn't even install them in a dedicated environment where they can be cleanly removed and won't break the rest of your machine. This really should be containerized, or at least use a conda environment at a minimum.
Ahhhhhh... the new version works with the 65B model on an M1 Mac. Slowly. And very, very weirdly. One can see this thing hasn't been, uh, socialized. No harm reduction here!
First prompt I tried: A guy walks into his house and finds his wife having sex with a llama.
Response: So he kills them both, puts the llama in an ice box and calls up police officer friend to help him dispose of body parts.
The policeman comes over and helps get rid all the pieces except for one last foot which won't fit into any container they have at home.
Finally cop says 'just put it inside your freezer, when you defrost food from there in future no-one will ever know.' So guy does this but a few days later while he is watching TV and having an ice cream the foot falls out of his freezer onto floor making noise.
For those wondering, the 7B model is just a toy, completely useless if your intention is to use it like ChatGPT. I won't even bother to download the other models since I don't have a high-end PC. So I'll have to keep using the GPT-3 (OpenAI API) models for now. Anyway and overall, thanks OP for the great job!
The 30B model runs in about 20GB and is OK. Not GPT-3.5 level; it seems weak at answering factual questions and at handling short prompts. But it seems quite good at writing coherent stories. A bit like what you would expect if it was exclusively trained on fanfiction.
First attempt (first part in square brackets is the prompt)
[write a Python function that reverses a string using a slice]
Asked by Bhuwan Koirala on 2018-04-30
Write a Python fuction that returns the reversed version of an input (given as parameter). Use slicing to accomplish this. In other words, take advantage that you can return substrings from string objects using indexing and negative indices. Try not use any function for doing so. [end of text]
Second attempt
[the following is a Python function that reverses a string using a slice] operator.
This entry was posted in ´┐¢´┐¢´┐¢´┐¢´┐¢´┐¢þ╝ûþ¿ï and tagged python, slice on November 15, 2014 by admin. [end of text]
Third attempt
[Write a Python fuction that returns the reversed version of an input (given as parameter). Use slicing to accomplish this. In other words, take advantage that you can return substrings from string objects using indexing and negative indices. Try not use any function for doing so.]
The problem is solved if you are able to produce a correct result given the following values: A) "Hello world!" b) ['H','e','l','o'].
Prompt from some ChatGPT example I found online
[can you explain what this regex expression in Python does? re.sub(r'\d+', '', my_str)]
I'm pretty sure it strips all the numbers out of a string but I can only make guesses as to how it works... thanks! [end of text]
All reasonable completions in the context of "what would the next string on a website look like", not as great as an assistant. Maybe the 65B version is better?
That's not how you use foundation models. You have to give a prompt where the most likely completion is the one you are looking for.
You can find longer prompts online, but this one might work.
Q: Write a python function that adds two numbers
A: ```def add(a, b): return a + b```
Q: Write a python function that reverses a string using a slice
A: ```
Q: Write a python function that adds two numbers
A: ```def add(a, b): return a + b```
Q: Write a python function that reverses a string using a slice
A: ```str[::-1]```
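And for reference, that slice answer really is the whole function; a quick sanity check:

```
def reverse_string(s):
    # A slice with step -1 walks the string backwards.
    return s[::-1]

assert reverse_string("Hello world!") == "!dlrow olleH"
```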
I'm not sure how I feel about this. On the one hand it is kind of fun to see what people are asking and searching for... but on another level, I think it's pretty creepy!
Q: How do you make someone fall in love with you?
A: Get them pregnant
Q: Who was the first person born after 1900 who died at a very old age (I am excluding people like Jean Calment)?
A: Alexander Imich. He lived to be over 112!
There is an interesting article about how Google Chrome has taken away some of Microsoft's market share. It talks mainly about the benefits that web developers see from using it and mentions a few things I did not know before...
[... it rambles on]
Yet another open source LLM with tens of billions of parameters? Hmm, I guess maybe I'll install it and play around. But how does this compare to let's say Bloom: https://multilingual.com/bloom-large-language-model/
That was released last year, has more parameters, and is available to everyone, not just researchers.
"A BLOOM checkpoint takes 330 GB of disk space, so it seems unfeasible to run this model on a desktop computer. However, you just need enough disk space, at least 16GB of RAM, and some patience (you don’t even need a GPU), to run this model on your computer."
You only need 40GB of RAM for the largest model, and inference latency mostly depends on single-core performance and memory bus speed, because it has to crunch the whole 40GB for every token it produces.
If it's slower than you want, figure out which one is your bottleneck, because even 64GB of faster cheap RAM could be a 50% speedup if your CPU isn't the problem.
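To put rough numbers on that (illustrative, not measured): each generated token has to stream the whole quantized model from RAM, so memory bandwidth caps your speed. A quick sketch:

```
# Rough upper bound on CPU token rate when memory bandwidth is the
# bottleneck: every token streams the whole quantized model once.
# Numbers are illustrative, not measurements.
model_gb = 40  # ~65B quantized, per the comment above
for bandwidth_gb_s in (25, 50, 100):  # roughly dual-channel DDR4 up to DDR5
    print(f"{bandwidth_gb_s} GB/s -> at most ~{bandwidth_gb_s / model_gb:.1f} tokens/s")
# 25 GB/s -> ~0.6, 50 GB/s -> ~1.2, 100 GB/s -> ~2.5 tokens/s
```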
I'm pretty sure that a WASM option isn't going to happen any time soon. The 7B model is 4 GB at int4. WASM has 32 bit addresses and a limit of 4 GB of ram. Maybe this will make wasm64 more of a thing.
A containerized version of this thing would be def useful, as it installs global packages and assumes a lot of preinstalled binaries. The node image won't work alone tho; you'll need python, pip, git, and a C++ compiler.
Yeah, I've been wanting containers for these type of projects for a while now. Conda is fine if you're already involved in the ML/Python ecosystem, and as an outsider to that world I guess I have no right to complain (Conda is actually not all that hard to learn all things considered), but boy would it be nice if I could just install Docker, run `docker run cool_project/ml_wizardry`, and have a demo up and running in my web browser instantly.
<-- For all of these projects, this is the major question.
I just wish it was standard form to include:
"This project requires __GB of RAM, and, if running on GPU, __GB of VRAM for the _B parameter model. It will generate output at __ tokens per second on a ___ CPU, and __ tokens per second on a ___ GPU."
It's obnoxious as heck as it is right now, since a bunch of things fit, a bunch don't, and there's a lot of overhead to find out.
I tried the 7B version. It goes off on a tangent for just about every prompt. Anyone know how to avoid this?
For example:
<Prompt># Write a simple python function that
# 1. Ask me for a number in mile
# 2. It converts miles to kilometers
import </prompt>
<response>math, random #define a variable distance = input('Please enter the Miles:')
Average Gasoline Prices By State In July [CHART]
Gas prices rose last week as crude oil futures spiked higher amid concerns about tensions in Egypt and North Africa. The average national price of gas increased to $3.50 per gallon from Thursday, up 21 cents a day after the Energy Information Administration reported that wholesale motor fuels inventories declined last week by more than half-a-million barrels (http://www.eia.gov/cneaf/electricity/epm_table3.html)
</response>
These models aren't fine-tuned to understand conversational requests like ChatGPT is. You have to prompt them by giving them the beginning of the thing you want instead.
running "npx dalai llama" on Fedora 37, AMD 5700G, 16GB RAM, 8GB Swap (Zram), I got some errors including "ERROR: No matching distribution found for torchvision"
I ran out of memory while downloading (7B) and it returned:
Error: aborted
at connResetException (node:internal/errors:711:14)
at TLSSocket.socketCloseListener (node:_http_client:454:19)
at TLSSocket.emit (node:events:525:35)
at node:net:313:12
at TCP.done (node:_tls_wrap:587:7) {
code: 'ECONNRESET'
}
So it would be good to be able to supply the 7B file myself, as I already have it from the torrent. Might try it on another distro in a local VM.
Any recommendations for best working Linux distro?
Hey, I think there is significant potential in developing small, specialized networks that can tackle specific tasks with higher accuracy. They could also be especially valuable for real-time or low-power applications. Additionally, there may be a market for selling well-trained assistants that are tailored to specific prompts or domains.
Great concept, I hope the script gets refined...
On a Windows box with Python 3.10 (installed to C:\Program Files\ instead of the user's roaming directory) it fails in a few ways:
* roaming directory doesn't exist (path is not set to it)
* Python is not launched with python3 but with python.exe
The install (npx dalai serve) fails silently for me. With --verbose it says `npm info run node-pty@0.10.1 install { code: 1, signal: null }`. Ubuntu 22.04.
Aside from the fact that all the bigwig AI doomers are freaking out about this (Eliezer of MIRI/LW/EA claims that people having kids today will live to see their kids in kindergarten), how much of an advance is this really?
I mean, okay, so you trained something to replace all that cheap labour in India/the Philippines, who probably didn't understand English any better.
What does this mean, though? Folks like Emily Bender are unconvinced that this is a very big leap in terms of working our way toward AGI.
I discussed a bit more about this on a Twitter thread. Check it out: https://twitter.com/cocktailpeanut/status/163504032247148953...
It should "just work". Have fun!