Dalai: Automatically install, run, and play with LLaMA on your computer (cocktailpeanut.github.io)
848 points by cocktailpeanut on March 12, 2023 | 282 comments



Hey guys, I was so inspired by the llama.cpp project that I spent all day today building a weekend side project.

Basically it lets you one-click install LLaMA on your machine with no bullshit. All you need to do is run "npx dalai llama".

I see that the #1 post today is a whole long blog post walking through how to compile the cpp code, download files, and all that to finally run LLaMA on your machine, but I have basically 100% automated this with a simple NPM package/application.

On top of that, the whole thing is a single NPM package and was built with hackability in mind. With just a one-line JS function call you can use LLaMA from YOUR app.

Lastly, EVEN IF you don't use JavaScript, Dalai exposes a socket.io API, so you can use whatever language you want to interact with Dalai programmatically.
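
For example, a socket.io client could look something along these lines (the port and the 'request'/'result' event names below are illustrative and may differ from the actual API, so check the README for the real ones):

  // Minimal sketch: assumes a Dalai socket.io server running on localhost:3000
  // and 'request' / 'result' event names (assumptions -- see the README).
  const { io } = require("socket.io-client")
  const socket = io("ws://localhost:3000")

  socket.emit("request", { prompt: "The universe is", n_predict: 128 })
  socket.on("result", (data) => process.stdout.write(String(data)))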

I discussed a bit more about this on a Twitter thread. Check it out: https://twitter.com/cocktailpeanut/status/163504032247148953...

It should "just work". Have fun!


UPDATE:

Thanks for all the feedback! I went outside to take a walk after posting this and just came back, and went through them to summarize what needs to be improved.

Basically looks like it comes down to the following:

  - *customize features:* Should not be difficult (will add flag features)
    - *path:* customize the home directory (instead of automatically storing to $HOME)
    - *python:* some people are having issues with the python binary (since the package is essentially calling these shell commands). Maybe add a flag to specify the exact name of the python binary (such as "--python python3")
    - *avoid downloading files:* I have this issue too when I just want to install the code instead of downloading the full model which takes a long time. Might add a flag to avoid downloading models in case you already have them (EDIT: actually upon thinking about it, it's better to just set the source model folder, something like --model)
    - *other flags:* The rest of the flags natively supported by the llama.cpp project, such as top_k, top_p, temp, batch_size, threads, seed, n_predict, etc. (They are already in the code but just weren't exposed via the CLI or documented)
    
  - *documentation*
    - document the machine spec
    - document the storage spec: how much space is used?
    - node version: which version of node.js is required?
    - python version: which version of python doesn't work?
Am I missing anything? Feel free to leave comments, will try to roll out some updates as soon as I can. To stay updated, feel free to follow me on twitter https://twitter.com/cocktailpeanut (or you could create issues on GitHub too!)


I tried to run your NPX commands from the examples on a fresh WSL install of Ubuntu 20.04, but if you don't have build tools installed, they both just silently fail.

I only realized what was happening after trying to go the other route and use it in a package, where I then noticed the NPM install will give a node-gyp error about make missing.


I'm on NixOS, where you have to explicitly state dependencies (which is a good thing, except when... this happens)

Besides make (which I can quickly make available via a project environment), what other deps do you think it uses but doesn't declare or state? ;)


The other one I noticed is pip! A lot of the script fails without pip, and it takes until after the fairly long downloads finish to let you know it was needed.


so it needs make/gcc, python AND node available... what versions, I wonder?


I successfully used the latest version of node LTS (via NVM) and the latest versions of python3-pip and build-essential from the Canonical apt repo, if that helps.


I don’t understand why it’s downloading at all, that shouldn’t be default behavior.

It should have default instructions to load a file from a default place, and then arguments/flags to load from a specific path, and then MAYBE a prompt to download the models after it can’t find them on the paths, plural


UPDATE 2:

Thanks to all the pull requests, we've managed to solve most of these issues in an optimal manner.

Version 0.1.0 released: https://news.ycombinator.com/item?id=35143171


I followed the initial instructions and the 7B model worked just fine.

I tried the supplementary instructions to download some of the models (7B, 13B, and 30B), and it didn't seem to work. The prompt returned nothing after waiting for several minutes.

Is there a way to run just one of the larger models?


I am going to test this out today and roll this out as soon as I can, hopefully tomorrow. stay tuned.


What's the minimum spec GPU required? NVIDIA only? Any differences between Debian and Fedora Linuxes? RAM required?


This app is CPU only and gets good speeds on even mobile phone CPUs. Minimum RAM required is 5GB.


Oh wow, any way to do this on Android yet? That would be fun to tinker with, even if it's just the smaller model. Even my older Note 9 has 6GB.


Yes. Starting with the Facebook versions of LLaMA-7B you just quantize the model to 4bit on your desktop (since it takes 14GB of RAM) and then move it to your phone and follow the Android instructions in the repo. https://github.com/ggerganov/llama.cpp/#android

I've seen dozens of screenshots of it running in termux on androids by now at completely usable speeds.


Thank you for the link! Insane that this can run on a phone.

As my current potato computer has 8GB of RAM, I'll ask a friend to do it :-)


What distro and PC specs do you have success with?


I ran this on my intel i7-7700k with 32 gig ram. It ran very slow. Almost 1 word per second slow. Not sure if I did something wrong. Distro Ubuntu 22.04


It would be great to also understand how one can finetune this model. Thanks for the awesome work!


you may be able to use pyenv to increase compatibility across Linux distributions


My biggest concern about these LLMs was the corporate sequestration and the potential socioeconomic imbalances it would create. The work you are doing here is part of some amazing work to check that back. In summary: Bruhhhhhh. THANK YOU!


This is something to keep an eye on, really. The solution for making that sequestration impossible is twofold:

1. to know how to architect and create LLMs (including training data readiness)

2. have them produced in hardware that is acquirable at reasonable cost for a normal citizen


Wow that's so incredible. Thanks for putting this together!

Do you have any machine specs associated with this? Can an old-ish Macbook Pro run this service?

I'm also curious, since I'm new to all this — is it possible to run something like this on Fly.io or does it take up way too much space?


7B is the default. If it's quantized to 4 bits, that's a 3.9 GB file.


How powerful of a computer does this need? It would be useful to see, for one thing, minimum RAM requirements for these models.


llama.cpp needs 40GB for the 65B model (due to int4 quantization)

RamNeeded(other_size) ~= 40GB * other_size/65B
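
Plugging the other sizes into that rule of thumb (rough numbers only, since overhead isn't perfectly proportional):

  RamNeeded(7B)  ~= 40GB * 7/65  ~= 4.3GB
  RamNeeded(13B) ~= 40GB * 13/65 ~= 8GB
  RamNeeded(30B) ~= 40GB * 30/65 ~= 18.5GB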


Add something like this to your instructions: "Make sure you have Node.js installed on your computer."


One step install after the steps that lead up to it.


Yea not a nodejs/javascript dev at all but this is failing to install on Fedora. I don't have time to dig into it at the moment but if anybody has any well known gotchas that could be the issue that would be helpful :)

Edit: I do have nodejs and npx installed


Maybe make, python and pip. From what I gather this is a node wrapper; it's actually python that runs the model.


Does anyone know how to avoid downloading the model weights when doing `npx dalai llama`, and instead telling the install process where they are on my drive?


you could clone the repo and comment out https://github.com/cocktailpeanut/dalai/blob/main/index.js#L... i.e. the specific synchronous download call..?


Does this use the GPU? If not why? Aren't GPUs much faster than CPUs at AI?


It is usable without a GPU... it'll output data a bit faster than most people type.


I think that's exactly the point: so everyone can run it on their PCs with no GPU.


Or without a beefy GPU. I've got 8GB VRAM, which is great for Stable Diffusion but not useful for any of the language models released so far.

I think the 4-bit 7B LLaMA would work, but the 7B is pretty fast anyway without GPU.


I'm installing it here too. How's the 7B model working out for you so far?


Haha, I just finished ordering 32GB of additional memory for my PC so I can run the 65B model, if that tells you anything. I'm upgrading from 32GB -> 64GB.

7B is fine, 13B is better. Both are fun toys and almost make sense most of the time, but even with a lot of parameter tuning they're often incoherent. You can tell that they have encoded fewer relationships between concepts than the higher-parameter models we've gotten used to--it's much closer to GPT-2 than GPT-3.

They're good enough to whet my appetite and give me a lot of ideas of what I want to do, they're just not quite good enough to make those applications reliably useful. Based on the reports I'm hearing here of just how much better the 65B model is than the 7B, I decided it was worth $80 for a few new sticks of RAM to be able to use the full model. Still way cheaper than buying a graphics card capable of handling it.


Heh, you just made me upgrade as well. After originally paying 130 € for 32 GB, it’s nice that I only had to pay 70 € to double it ;) Not sure if I want to run LLMs (or if my Ryzen 5 3600 is even powerful enough), but I’ve wanted some more RAM for a while.


If I was running in a server context, would the 50gb of ram be required to respond to one request, or can it be used to respond to multiple requests simultaneously?


I'm very late to this question, but I believe that that amount is only required once, but the context tensor will need to be created per request. I haven't confirmed that, though.


I'd assume that all the calculations used for 1 request would already eat up that amount of memory, but I could be wrong!


I'm still holding on to a small bit of hope that the GPU market will normalise this year. I don't think I'm the only one looking to get something highly capable but for a fair price.


> I’m still holding on to a small bit of hope that the GPU market will normalize this year.

I suspect all the people hoping it will (b/c of Stable Diffusion, etc.) are exactly the reason it won’t.


Me too. But for third-world countries it's madly overpriced.


It's expensive for first-world countries too. Just look at the 4090 - it's insane that it costs 2k EUR... it's literally double the fair price (which itself is high).


Very nice. Any way to add an option to install elsewhere other than ~/ ?


I ran "npx dalai llama" and it's just... sitting there (after I hit "y" to confirm). I checked btop++ and there's barely any downloading or CPU activity occurring, so not sure what it's doing... but does "pip3 install torch torchvision torchaudio sentencepiece numpy" take a while?

If it's actually downloading the 3.9GB of model weights or whatever, it would be pretty cool if it showed a progress bar of some sort. Stretch goal, for sure, but a very nice touch for users.

anyway, I'll leave it be and check on it to see when it's complete. Super cool if this works!!


Made a comment on the other thread: why can’t we have a one click install thing and here it is. Nice!


Well that's pretty wild. I was wondering whether I wanted to build LLaMA tomorrow but you upended my plans in the space of 2 minutes. 10/10 well done.


There's an elephant in the room, or is it just me?

Is your script making users violate the original license agreement(§)?

For the record, I don't think Meta will go after you or anyone else. But they may decide not to make their future models available after what is happening with the LLaMA weights.

I realize that some people are of the opinion that AI models (weights) cannot be copyrighted at all.

--

§ the license agreement is at https://forms.gle/jk851eBVbX1m5TAv5


Yes, you are right, every project that distributes LLaMA right now is violating Meta's agreement.


I've got a weird, probably untrue conspiracy theory about this.

Hugging Face releases Stable Diffusion. It goes viral and vastly outpaces the competition in the blink of an eye. Then they get sued.

Meta sees both of these things go down. Meta needs a leg up on ChatGPT, but worries about legal repercussions similar to Stable Diffusion's.

Whoops, it leaked! Hey, we didn't say those dastardly devs could use it.


>But they may decide not to make their future models available after what is happening with the Llama weights.

I think that ship has probably sailed, in that no one is going to release weights in this way again. Either they will publish them outright (like Whisper) or they will keep them (almost) completely closed.


This is awesome! I've wanted to try llama.cpp and you just reduced my to-do list significantly on my Sunday :) Thanks!


Looks great! Does it work on Windows please?


For Windows:

1. Binary build https://github.com/jaykrell/llama.cpp/releases/tag/1

2. Quantized model (7B/13B/30B) https://mega.nz/folder/UjAUES6Z#bGhKkyiZX3eRrn9HcxVVfA

3. main.exe -m ggml-model-q4_0.bin -t 8 -n 128


Thanks. Initial test:

main.exe -m ggml-model-q4_0.bin -t 8 -n 128 -p "The Drake equation is nonsense because"

The Drake equation is nonsense because it takes parameters that can only be known AFTER the conclusion is reached. It would be like saying "I'm going to prove a theorem by starting from the conclusion, then making up the proof. The Drake equation uses the existence of extraterrestrial intelligence as the conclusion and then making up the parameters. It is nonsense.


Nice, main.exe seems to work just fine with the 7B quantized model - generates a token every 400ms on an AMD Ryzen 5 2600!

But, quantize.exe doesn't seem to work - any valid command (such as below) pauses for a split second, then returns with no output?

$ quantize.exe ggml-model-f16.bin ggml-model-q4_0.bin 2


In case this helps anyone else: I built it myself on Windows with CMake, and then everything just works.


Do you mind sharing the binaries?


Sure! https://filetransfer.io/data-package/8hxKAiaH#link

I wasn't sure where to upload them, and that link is only good for 50 downloads. Can put them somewhere else if you know a better location that doesn't require signup.


Thank you.

llama.exe is basically main.exe?

I actually learned how to compile this code via CMake/VS2019. It's sure a whole lot more complicated than it was 25 years ago when I was writing C.


Yes, llama.exe is actually the name the project produces - the other poster must have renamed it to main.exe.

I just did `scoop install cmake`, then built from the command line, was a doddle!


I actually am installing in windows via WSL/Ubuntu fwiw


My attempt does not work, and now I'm trying to figure out where the 35+ GB of data and files that were added to my hard drive are located so I can clean it all off.


I got it to work with WSL/Ubuntu in case you want to try it that way.


If it makes common unix-ish assumptions like “Python 3 executables have a ‘3’ appended to their name”, which other comments here seem to suggest it does, it won't work on Windows, even if you have the required version of Python installed.


So, I actually got it working on Windows, pretty easily!

The provided `main.exe` binary worked as-is, but `quantize.exe` did not - I built it myself with CMake, and `quantize.exe` started working too.


Curious too. Let me know if you try it out. Technically I think it should work.


I tried it, doesn't work. Trying the sibling post from @buzzier.


You, sir or madam, are a hero.


When I run this command: npx dalai llama

I get the following output / errors?

What exactly do I need to install prior to running that command?

---------------------------- >> npx dalai llama

exec: git clone https://github.com/ggerganov/llama.cpp.git /Users/rickg/llama.cpp in undefined
fatal: destination path '/Users/rickg/llama.cpp' already exists and is not an empty directory.

exec: git pull in /Users/rickg/llama.cpp
Already up to date.

exec: python3 -m venv /Users/rickg/llama.cpp/venv in undefined

exec: /Users/rickg/llama.cpp/venv/bin/pip install torch torchvision torchaudio sentencepiece numpy in undefined
Requirement already satisfied: torch in ./llama.cpp/venv/lib/python3.10/site-packages (1.13.1)
Requirement already satisfied: torchvision in ./llama.cpp/venv/lib/python3.10/site-packages (0.14.1)
Requirement already satisfied: torchaudio in ./llama.cpp/venv/lib/python3.10/site-packages (0.13.1)
Requirement already satisfied: sentencepiece in ./llama.cpp/venv/lib/python3.10/site-packages (0.1.97)
Requirement already satisfied: numpy in ./llama.cpp/venv/lib/python3.10/site-packages (1.24.2)
(remaining transitive requirements already satisfied)
[notice] A new release of pip available: 22.3.1 -> 23.0.1
[notice] To update, run: python3 -m pip install --upgrade pip

exec: make in /Users/rickg/llama.cpp
I llama.cpp build info:
I UNAME_S: Darwin
I UNAME_P: arm
I UNAME_M: arm64
I CFLAGS: -I. -O3 -DNDEBUG -std=c11 -fPIC -pthread -DGGML_USE_ACCELERATE
I CXXFLAGS: -I. -I./examples -O3 -DNDEBUG -std=c++11 -fPIC -pthread
I LDFLAGS: -framework Accelerate
I CC: Apple clang version 12.0.5 (clang-1205.0.22.9)
I CXX: Apple clang version 12.0.5 (clang-1205.0.22.9)

cc -I. -O3 -DNDEBUG -std=c11 -fPIC -pthread -DGGML_USE_ACCELERATE -c ggml.c -o ggml.o
ggml.c:1364:25: error: implicit declaration of function 'vdotq_s32' is invalid in C99 [-Werror,-Wimplicit-function-declaration]
ggml.c:1364:19: error: initializing 'int32x4_t' (vector of 4 'int32_t' values) with an expression of incompatible type 'int'
ggml.c:1365:19: error: initializing 'int32x4_t' (vector of 4 'int32_t' values) with an expression of incompatible type 'int'
ggml.c:1367:13: error: assigning to 'int32x4_t' (vector of 4 'int32_t' values) from incompatible type 'int'
ggml.c:1368:13: error: assigning to 'int32x4_t' (vector of 4 'int32_t' values) from incompatible type 'int'
5 errors generated.
make: *** [ggml.o] Error 1

/Users/rickg/.npm/_npx/3c737cbb02d79cc9/node_modules/dalai/index.js:153
    throw new Error("running 'make' failed")
    ^

Error: running 'make' failed
    at Dalai.install (/Users/rickg/.npm/_npx/3c737cbb02d79cc9/node_modules/dalai/index.js:153:13)


seeing this too. did you find a solution?


updating xcode did the trick


Where does it say I need Xcode installed?

Is there a list of prerequisites?

Hey thanks, after installing Xcode, that did resolve the issue.


It seems the only reason all of these competitive models are getting released is because you have a number of big players probably freaking out that somebody else is going to break out into a huge lead. So while the floodgates are open, people should be quickly figuring out how to do as much stuff as possible without any centralized company controlling it. I would imagine everybody assumed the models released these days will be obsolete before long, so it's low risk. But this is like the early internet days... except this time we should assume all of the centralized servers are user hostile and figure out how to work around them as quickly as they roll them out. The author and others are doing great work to prevent this stuff from being locked away behind costly APIs and censorship.


If the barrier for entry is low enough for several players to enter the field this fast - I wonder what could raise the barrier? The models getting bigger I suppose.


A few months (weeks?) ago I would've said that this already was the case for language models. It's absolutely mind-blowing to me what is happening here - same with stable diffusion. Once Dall-E was out, I was sure that there was no way that anything like this could be run on consumer hardware. I'm very happy to be proven wrong.

In a way, things are still moving in this direction, though. 8 or so years ago it was more or less possible to train those models yourself to a certain degree of usefulness, as well, and I think we've currently moved way past any feasibility for that.


LLaMA can be fine tuned in hours on a consumer GPU or in a free Colab with just 12GB of VRAM, and soon 6GB in 4bit training, using PEFT.

https://github.com/zphang/minimal-llama#peft-fine-tuning-wit...


Fortunately, there still are some possibilities to improve training efficiency and reduce model size by doing more guided attentional learning.

This will make it feasible to train models at least as good as the current batch (though probably the big players will use those same optimizations to create much better large models).


Soon you'll need a government license to purchase serious compute.


Our saving grace seems to be the insatiable push by the gaming industry for better graphics at higher resolutions. Their vision for real-time path traced graphics can’t happen without considerable ML horsepower on consumer level graphics cards.


They can just slow down certain algorithms on gaming cards via firmware. I think they already did this for crypto mining on some gaming cards.


FW locks aren’t effective. Most of those locked cards have jailbreaks to allow full speed crypto mining.


Yesterday's "serious compute" is today's mid-range PC.


The Vice Chairman of Microsoft has already said that he is open to regulation. The EU also is working on plans to regulate AI. So you probably will only be allowed to use AI in the future if it's approved by something like the FD(A)A.


Maybe I'm having a looped view of this, but I fail to see how regulation wouldn't do more harm than good here. The truly dangerous actors wouldn't care or would be based in some other country. Having a large diversity of actors seems like the best way to ensure resilience against whatever threats might arise from this.


What about the models that are out already? Will men with guns raid my home and confiscate my computer?


As an AI doomer, it would actually be pretty great if we could get this stuff locked away behind costly APIs and censorship. Some fat monopoly rent-extracting too. We are moving way too fast on this tech, and the competitive race dynamics are a big reason why. I want LLMs to end up with Microsoft IE6 level of progress. Preferably we could make Firefox (SD/GPT-J) illegal too. (The GPU scarcity is a good start, but maybe China could attack Taiwan as well and thus torpedo everybody's chipbuilding for a decade or so?)

If LLMs keep going at their current pace and spread, the world is seriously going to end in a few years. This technology is already unsafe, and it's unsafe in exactly the ways that it'll be seriously unsafe as it scales up further - it doesn't understand and doesn't execute human ethics, and nobody has any working plan how to change that.


To me it's like the American gun ownership situation: if you make guns illegal now, criminals and governments will still keep them, but your average joe won't get them. A very unequal playing field.

LLMs will be used against us: let's at least have our own, and learn how to defend against them?

I say this as devil's advocate, with serious reservations about where all of this is going.


It'll eventually get broad use, sure, but this is more about playing for time. The issue is the very uneven progress between capabilities and safety.

I don't want only government to use them because I trust the government but because I know the government to be sclerotic and uninnovative.

If only governments use them, they'll progress a lot more slowly.


> As an AI doomer, it would actually be pretty great if we could get this stuff locked away behind costly APIs and censorship.

Yes, because the only people with access to advanced AI tech being the people whose motive is using and training it for domination over others (whether megacorps or megagovernments) is absolutely a great way to prevent any “AI doom” scenarios.


If one party could use LLMs to reliably dominate others, the alignment problem would be basically solved. Right now, one of the biggest corporations of the planet cannot get LLMs to reliably avoid telling people to commit suicide despite months (years?) of actively trying.


>but maybe China could attack Taiwan as well

Speaking of things that would be terrible for the world...


> If LLMs keep going at their current pace and spread, the world is seriously going to end in a few years

Why?


I thought this article by the NY Time's Ezra Klein was pretty good:

https://www.nytimes.com/2023/03/12/opinion/chatbots-artifici...

> “The broader intellectual world seems to wildly overestimate how long it will take A.I. systems to go from ‘large impact on the world’ to ‘unrecognizably transformed world,’” Paul Christiano, a key member of OpenAI who left to found the Alignment Research Center, wrote last year. “This is more likely to be years than decades, and there’s a real chance that it’s months.”

...

> In a 2022 survey, A.I. experts were asked, “What probability do you put on human inability to control future advanced A.I. systems causing human extinction or similarly permanent and severe disempowerment of the human species?” The median reply was 10 percent.

> I find that hard to fathom, even though I have spoken to many who put that probability even higher. Would you work on a technology you thought had a 10 percent chance of wiping out humanity?


It's kinda irrelevant on a geologic or evolutionary time scale how long it takes for AI to mature. How long did it take for us to go from Homo Erectus to Homo Sapiens? A few million years and change? If it takes 100 years, that's still ridiculously, ludicrously fast for something that can change the nature of intelligent life (or, if you're a skeptic of AGI, still such a massive augmentation of human intelligence).


I strongly recommend the book Normal Accidents. It was written in the '80s and the central argument is that some systems are so complex that even the people using them don't really understand what's happening, and serious accidents are inevitable. I wish the author were still around to opine on LLMs.


We currently live in a world that has been “unrecognizably transformed” by the industrial revolution and yet here we are.


And the result of the industrial revolution has been a reduction of about 85% of all wild animals, and threatened calamity for the rest in the next few decades. That can hardly be summarized as "yet here we are."


Given a choice between pre-industrial life and our current lifestyle, the choice is obvious.


> “This is more likely to be years than decades, and there’s a real chance that it’s months.”

Months is definitely wrong, but years is possible.


Months starts looking more plausible when considering that we have no idea what experiments DM/OA have running internally. I think it's unlikely, but not off the table.


I agree what they have internally might be transformative, but my point is that society literally cannot transform over the course of months. It's literally impossible.

Even if they release AGI, people will not have confidence in that claim for at least a year, and only then will rate of adoption will rapidly increase to transformative levels. Pretty much nobody is going to be fired in that first year, so a true transformation of society is still going to take years, at least.


I mean, if you believe that AGI=ASI (ie. short timelines/hard takeoff/foom), the transformation will happen regardless of the social system's ability to catch up.


It's not a matter of any social system, it's a matter of hard physical limits. There is literally no hard takeoff scenario where any AI, no matter how intelligent, will be able to transform the world in any appreciable way in a matter of months.


i would take a world transformed by ai over a world with nuclear weapons.


Yeah, but what you will actually get is the world transformed by AI with use of nuclear weapons (or whatever method AGI employs to get rid of absolutely unnecessary legacy parasitic substance that raised it aka humanity).


If you read the words right after the part you quoted, you have your answer


Well, from my perspective, making claims about the world ending requires some substantial backing, which I didn't find in OP's comment.

But now I understand that perhaps this is self-evident and/or due to a lack of reading comprehension on my part, thank you. I hope that when our new AI overlords come they appreciate people capable of self-reflection.


You could assume that the commenter didn't read the whole line, or you could try to understand that what they are asking is why you think that the lack of ethics enforcement in a text-generating model means that the world is ending.


Personally, my take is that the lack of ethics enforcement demonstrates that whatever methods of controlling or guiding a LLM we have break down even at the current level. OA have been grinding on adversarial examples for like half a year at this point and there's still jailbreak prompts coming out. Whatever they thought they had for safety, it clearly doesn't work, so why would we expect it to work better as AIs get smarter and more reflective?

I don't think the prompt moralizing that companies are trying to do right now is in any sense critical to safety. However, the fact that these companies, no matter what they try, cannot avoid painfully embarrassing themselves, speaks against the success of attempts to scale these methods to bigger models, if they can't even control what they have right now.

LLMs right now have a significant "power overhang" vs control, and focusing on bigger, better models will only exacerbate it. That's the safety issue.


Could’ve said the same for any major technological advance. Luddism is not a solution. If these models are easily run on a laptop then yes some people are going to hurt themselves or others but we already have laws that deal with people doing bad things. The world is not going to end though. Your Taiwan scenario has a much higher probability of ending the world than this yet you seem unconcerned about that.


Big Tech on its own will already push this technology very far and they don't give a damn about safety, only the optics of it.

I'm not convinced that small actors will do much damage even if they access to capable models. I do think there's at least the possibility that essential safety work will arise from this.


> As an AI doomer, it would actually be pretty great if we could get this stuff locked away behind costly APIs and censorship.

That is literally the doom scenario for me, rich people get unlimited access to spam and misinformation tools while the lower class gets fucked.


Agreed. A single company dominating AGI could become highly dominant, and it might start to want to cut back humans in the loop (think it starts automating everything everywhere). The thing we should watch for is whether our civilization as a whole is maximizing for meaning and wellbeing of (sentient) beings, or just concentrating power and creating profit. We need to be wary, vigilant of megacorporations (and also corporations in general).

See also: https://www.lesswrong.com/posts/zdKrgxwhE5pTiDpDm/practical-...


A single company running AGI would suggest that something built by humans could control an AGI. That would actually be a great victory compared to the status quo. Then we'd just need to convince the CEO of that company or nationalize it. Right now, nothing built by humans can reliably control even the weak AI that we have.


All of this doomer-ing feels to me like it's missing a key piece of reflection - it operates under the assumption that we're not on track to destroy ourselves with or without AGI.

We have proliferated a cache capable of wiping out all life on earth.

One of the countries with such a cache is currently at war - and the last time powers of this nature were in such a territorial conflict things went very poorly.

Our institutions have become pathological in their pursuit of power and profit, to the point where the environment, other people, and the truth itself can all go get fucked so long as x gajillionare can buy a new yacht.

The planet's on a lot more fire than it used to be.

Police (the protect and serve kind) now, as a matter of course, own Mine Resistant Armored Personnel Carriers. This is not likely to cause the apocalypse, but it's not a great indicator that we're okay.

Maybe it's time for us to hand off the reins.


That we're on track to maybe destroy ourselves is not a good reason to destroy ourselves harder.


Not exactly what I meant; there is a nonzero chance that an AGI given authority over humanity would run it better. Granted, a flipped coin would run it better but that's kinda the size of it.


Right, and if we explicitly aimed for building a good AGI we could maybe get that chance higher than small.


For smaller values of doom. The one he's talking about is unaligned AGI doing to humans what humans did to Xerces blue.


LLMs will never be AGI


I see only two outcomes at this point. LLMs evolve into AGI or they evolve into something perceptually indistinguishable from AGI. Either way the result is the same and we’re just arguing semantics.


Explain how a language model can “evolve” into AGI.


It's like saying an 8086 will never be able to render photorealistic graphics in realtime. They fuel the investment in technology and research that will likely lead there.


How are you going to make this tech illegal? Raid everyone's home and check if they have it on their computer? Treat AI models like CSAM or something?


[flagged]


This isn't even comparable to a nuke. This kind of opinion is going to leave our entire species behind.

Imagine having a patent on 'fire' and then suing everybody who tries to cook a meal.


> leave our entire species behind

Leave us behind whom or what?

I agree with gp. It may not be LLMs, but we will certainly create a technology at some point that can't be openly shared due to existential danger, aka The Great Filter.

We can't just naively keep frolicking through the fields forever, can we?

We have to be able to at least agree on that, theoretically, right?


If we agreed with your premise that AI is a great filter and that this filter can somehow be contained by a small group, then I guess what it boils down to is two choices:

1. either lock everything down and accept the control of a small unaccountable group to dictate the future of humanity according to their morals and views - and I believe that AI will fundamentally shape how humanity will work and think, or

2. continue to uphold the ideas of individual freedom and democratic governance and accept a relative increase in the chance of a great filter event occurring.

I, like many here, am firmly against ggp's position. The harm that our species sustains from having this technology controlled by the few far outweighs the marginal risk increase of some great filter occurring.

I will continue to help ensure that this technology remains open for everyone regardless their views, morals, and convictions until the day I die.


Let's forget today, and LLMs. Do you see no theoretical future case where a technology should not be shared freely, ever? Even 100 years from now?

The only benefit I can imagine of less players having control of a technology is that there are less chances for them to make a bad call. But when you democratize something you hit the law of large numbers.

https://en.wikipedia.org/wiki/Law_of_large_numbers

disclaimer: this goes against so much of what I believe, but I can't escape the logic.


> Leave us behind whom or what?

Whom: The corporations with enough money to burn.

What: Technological progress.

Here's a nice video that showcases the same patterns in history and how having free and open tech + breaking monopolies helped move society forward - https://youtu.be/jXf04bhcjbg


It's not comparable to a nuke because a nuke is dumb, and won't be dangerous unless you do something dangerous with it.

AI, on the other hand, will be dangerous by default, once it's powerful enough.


Given the non zero risk of an accidental nuclear launch I’m not so sure.

It’s like balancing a piano on a ledge above a busy street and saying “well if no one pushes it then it’s not dangerous!”

Nuclear war and climate change rank far higher as threats than rogue AI to me right now.


Fire is dangerous by default too.


Language models don't kill people, people kill people. You know what stops a bad ̶g̶u̶y̶ mega-corporation with a language model? A good guy with a language model.

Here is what mine had to tell you:

  It’s not like we don’t already have nuclear weapons, biological agents, chemical agents etc...

  AI is simply another tool which can be used for good or ill. It doesn’t matter how much regulation/control you put on it - if someone really wanted to use it maliciously then they will find ways around your safeguards. The best thing to do is educate yourself as much as possible.
(sampling parameters: temp = 100.000000, top_k = 40, top_p = 0.000000, repeat_last_n = 256, repeat_penalty = 1.176471)


the smallpox genome has been open-source since i think 01996 https://www.ncbi.nlm.nih.gov/nuccore/NC_001611.1


It's not that we don't "know" how to do these things, most of us are just resource-constrained. Interestingly, that's similar to the issues with GPT-3 et al. People aren't saying "give us the secret sauce", they're saying "it's problematic for corporations to be the sole custodian of such models".

What would you think of a world where only one country has nukes (due to a monopoly on radioactive fuel, rather than a monopoly on knowledge)?


> What would you think of a world where only one country has nukes (due to a monopoly on radioactive fuel, rather than a monopoly on knowledge)?

This is more like giving every individual on Earth the nuclear launch codes. It only takes one stupid or malicious person to press launch. Giving more people buttons is not how you avoid thermonuclear war.


This is like giving every individual on Earth the nuclear launch codes, without the warheads being attached to launch rockets.

To do serious harm or exert broad social control requires concentrating that power with an infrastructure that a small group does not have; it requires coordinating the resources of a broad social base. And at that point the incentives to use them are affected by the needs of many people.


Advanced AI is the warhead.

>requires concentrating that power with an infrastructure that a small group does not have, it requires coordinating the resources of a broad social base.

These are all things intelligence (artificial or otherwise) can help acquire. It listens, thinks, and responds. Genghis Khan, Adolf Hitler, and Albert Einstein are all intelligences that resulted in dramatic tangible, physical changes to our world almost entirely by listening, processing, and responding with human language.

A small number of slow and weak apes came to have absolute unilateral control over the destiny of all other lifeforms because of intelligence. The power, infrastructure, and resources you speak of were not available in 15,000 BCE, yet somehow they exist today.


> Advanced AI is the warhead.

In that case, this current batch is not "advanced AI". It is a big autocomplete panel which retrieves content already present in its training corpus, and presents it in new shapes; it is not ready to define new goals with a purpose and follow them to completion.


A language model isn't Skynet :)


I'm working on it


...yet?


There is nothing to suggest a language model is self aware, or is capable of reasoning and will turn itself around to kill you or anyone else. Knowledge is power and it’s better to get clued up on how these things work so you don’t scare yourself.


Indeed. I think the confidence with which ChatGPT gives (often incorrect) answers and the way you can correct it, makes people feel like it is self-aware but it's not. The way it is presented really makes it easy to anthropomorphise it. It feels like you're talking to a person but really what you're talking to is the echoes of a billion people's murmurs on the internet.

There is a really big step to go for it to be self-learning which is what is one of the things it will need to be self-aware. Right now the tech is just a static model - it will not learn from being corrected. You can often argue with it saying "Hey this is wrong because..." and it will admit you're right. And then it will give the wrong initial answer back the next time.


I think AI has a bit of a branding issue.


Both of those are freely available… the limit is resources not knowledge.


It’s not the opinion that is getting the species killed: it’s just nature, we can’t do anything about it, otherwise we would have seen aliens already.


Excellent packaging OP! I just wanted to say 2 things relating to LLaMa:

1) 7B is unusable for anything really, in case you are hopeful;

2) 68B otoh is awesome ("at least DaVinci level").

I don't know if this is something FB/Meta planned strategically but this LLaMa-mania (LLaMania?) over the weekend is their November/2022 chatGPT moment. If they (Mark) take it seriously, it could become a strong hand in AI and a hint of how the industry could be shaped in the near future, with cloud models competing with local installs.

Think about it: who ever trains a popular, albeit closed model, can give it whatever bias it wishes with nearly no oversight. A dystopian and scary thought.


Something important is that LLaMA was leaked; it was never directly published by Meta. So it's basically piracy, and even if you got it officially, the license is very restrictive.


I dispute that the model can be copyrightable in the first place.


As long as the courts don't dispute it, then our disputes don't matter.

They'd be no better than some "sovereign citizen" disputing their arrest...


The idea that models can't be copyrighted isn't far fetched. The basic idea is that models are created by an automated process not by a person.

The courts have already upheld that AI generated output is not copyrightable for this exact reason.

So if you do not buy that it applies to models then you would have to explain the difference between the process which outputs bits into a model's layers (aka training) and the process which takes bits into the input layer and then dumps out the subsequent bits of the output layer (inference /generation).

Then explain why that distinction is different in regards to the applicability of copyright.


I'm not sure that even the "AI generated output is not copyrightable" stance will be maintained - as long as "AI generated output" becomes big business. Same way copyright was invented and Sonny-Bono-extended to the max as long as content became big business.

In the model's case, though, it's even easier to see why it could be copyrightable, as a "baked" model is still created by people fine-tuning it, setting parameters and hardcoded stuff, training it with this or that set and excluding others, and so on.

For example, music composed and rendered as audio by generative algorithms (something which doesn't even need AI, just some rules and stochastic processes) has been created and copyrighted just fine for decades...


All the arguments for why photographs are copyrighted would seem to apply. The photographer isn't painting the image, but his artistic input is still vital to creating the image. Same with training these models: the training is just an algorithm on some data, but choosing the right hyperparameters and training data is an artistic expression of the author, making copyright apply


If a non-human presses the button on the camera the photograph is not copyrightable even if a human set up the camera intending for the non-human to press it. https://en.wikipedia.org/wiki/Monkey_selfie_copyright_disput...

For the same reasons this monkey photo cannot be copyrighted it is highly likely that AI generated art is uncopyrightable and that would also mean that models are. The fact that humans set up the systems which produce the art/models with the intention of getting an end results generally like the one they get is simply not meaningful to the copyright dispute.


You can restrict someone with a license even if you can't copyright the underlying technology.


> who ever trains a popular, albeit closed model, can give it whatever bias it wishes with nearly no oversight.

That's true even if you can download the whole model. It's not like we can figure out what it's doing from looking at the weights. Training the model locally might avoid intentional bias, but that's what takes a huge GPU farm.


> Think about it: who ever trains a popular, albeit closed model, can give it whatever bias it wishes with nearly no oversight. A dystopian and scary thought.

You have perfectly described what OpenAI did. They released a moralizing “biased” model behind a gated API with no oversight. The only dystopia is one in which corporations get to decide what is, or isn’t considered biased.


sorry for the extremely dumb question but is it possible to run the 68B model in a 8gb ram computer?


in general, assume 2GB per billion parameters - with quantisation you can get this down to <1GB (~500MB for 3 bit?), but even with that you'll only be able to run quantised llama-13B in the best case

Having said that: if you are feeling incredibly patient you can technically run the 68B parameter model by swapping to disk, although it will not be a pleasant experience (think minutes or hours per token instead of tokens per second)

Additionally worth noting pure CPU inference is much slower than GPU/TPU inference, so the output will be much slower than a ChatGPT-like service even if it does fit in your computer's RAM
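
Back-of-the-envelope with those rules of thumb (very rough, ignoring context and other overhead):

  13B at 4-bit: ~13 * 0.5GB ~= 6.5GB  (just about fits in 8GB)
  68B at f16:   ~68 * 2GB   ~= 136GB
  68B at 4-bit: ~68 * 0.5GB ~= 34GB   (hence the swap-to-disk caveat on an 8GB machine)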


thanks for explaining! How much GPU memory would work nice with 68B?


they said 2GB per 1 billion... and it's called 68B... I presume that's 68 billion... 68*2... so at least 136GB?


68/2, not 68*2


So, if I understand correctly, that's what you need to run the best model?

With GPU:

VRAM + RAM >= 68/2

Without GPU:

RAM >= 68/2


Not sure about the "=" part. You'd want some memory for the compositor and other OS graphics, and regular RAM for OS and programs, no?


You can't, it needs around 40GB of RAM.

Technically you can by swapping to disk but it would be too slow to be usable.

What you can do however is use the 7B model with 4bit quantization and use it within 8GB RAM.


Is this 68B of RAM?

How do you get access to that on a Macbook?


That's 68 billion parameters. It probably does not fit in RAM. Though if you encode each parameter using one byte, you would need 68GB of RAM, which you could get on workstations at this point.


It fits: llama.cpp uses 4-bit quantization, so the 13B model takes a little bit more than 8GB, and around 9GB of RAM while inferencing.


Everyone with “only” 64GB of RAM is pouting today, including me


More like finally "proven right" to have needlessly kept feeding 4/5th of 64GB to Chrome since 2018


You can run LLaMA using 4 bits per parameter; 64 GB of RAM is more than enough.


4 bits is ridiculously little. I'm very curious what makes these models so robust to quantization.


Read The Case for 4 Bit Precision. https://arxiv.org/abs/2212.09720

Spoiler: it's the parameter count. As parameter count goes up, bit depth matters less.

It just so happens that at around 10B+ parameters you can quantize down to 4bit with essentially no downsides. Models are that big now. So there's no need to waste RAM by having unnecessary precision for each parameter.


For completeness, there's also another paper that demonstrated you get more power/accuracy per-bit at 4 bits than at any other level of precision (including 2 bits and 3 bits)


That's the paper I referenced. But newer research is already challenging it.

'Int-4 llama is not enough [0] - Int-3 and beyond' suggests 3-bit is best for models larger than ~10B parameters when combining binning and GPTQ.

[0] https://nolanoorg.substack.com/p/int-4-llama-is-not-enough-i...


What if you have around 400GB of RAM? Would this be enough?


What I'm referring to requires around 67GB of RAM. With 400GB I would imagine you are in good shape for running most of these GPT-type models.


Seems to use about 40~ GB RAM here...


I'm pretty sure there's a mistake here: https://github.com/cocktailpeanut/dalai/blob/main/index.js#L... , there's a ${suffix} missing

It causes the quantization process to always use the first part of the model if using a larger size than 7B. I don't even know what this stuff does, but I see the ggml-model-f16.bin files have ggml-model-f16.bin.X alongside them in the folder, so I'm pretty sure this is a mistake. Maybe it's causing the loss of accuracy?
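
To illustrate the shape of the bug (this is not the actual dalai source, just a hypothetical sketch of what a missing ${suffix} does):

  // Hypothetical sketch, NOT the real index.js. A multi-part model is converted
  // to ggml-model-f16.bin, ggml-model-f16.bin.1, ggml-model-f16.bin.2, ... and
  // each part gets quantized in a loop. If the per-part suffix is dropped from
  // the input path, every iteration just re-quantizes the first part:
  for (const suffix of ["", ".1", ".2"]) {   // placeholder part list
    run(`./quantize ggml-model-f16.bin ggml-model-q4_0.bin${suffix} 2`)
    // presumably intended: `./quantize ggml-model-f16.bin${suffix} ggml-model-q4_0.bin${suffix} 2`
  }
  // `run` is a placeholder for however the shell command actually gets executed.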


Good catch. For the 7B model it doesn't matter, but all others will be ruined.


Well, after downloading the whole 65B model, I got it to talk on an M1 Max MBP (64GB RAM). Unfortunately, all it says no matter what I prompt it is some combination of these words:

Elizabethêteator Report Terit Elizabethête estudios политичеSM Elizabethunct styczniarequire enviçasefша sufficient vern er Dependingêque политиче Emperor!\ющим quarterктиче Elizabeth estudiosête ElizabethBasicCONFIGSM estudios political book

[edit] btw I'm not making this up; just curious if anyone else has had this ridiculous experience.


Another answer in the thread said this:

> I'm pretty sure there's a mistake here: https://github.com/cocktailpeanut/dalai/blob/main/index.js#L... , there's a ${suffix} missing

> It causes the quantization to process to always use the first part of the model if using a larger size than 7B. I don't even know what this stuff does, but I see the ggml-model-f16.bin files have ggml-model-f16.bin.X as well in the folder, so I'm pretty sure this is a mistake. Maybe it's causing the loss of accuracy?

Perhaps that's the issue?


Did you manage to fix this? I'm having the same issue


I am currently having the same experience


Best name for a software project I've seen in a long time hands down!


It really whips the llama's ass!


Greetings fellow millennial!


I was expecting that reference would be made soon after LLaMA was announced, but it doesn't quite beat this: https://news.ycombinator.com/item?id=35094442


I don't think anybody would have the guts to do this with Muhammad or the Quran.


Well the Dalai Lama famously has a better sense of humor.


He'll be like "Llamaste, guys!".


Yeah I don't really think the name of the project is very appropriate.


Can we distinguish "something is offensive" from "something is being mentioned"? What is the perceived offense you see here towards the Dalai Lama?


I don't share the belief, but I've heard it said from others with such beliefs that the naming association is offensive by itself because of the relative importance of the figures.

imagine that 'Fabio' is the spiritual leader of your religion, a walking talking deity among humans on Earth. You worship Fabio with all of your effort, and believe he is infallible. Your culture has precepts that forbid the casual use of Fabio's name in petty regard.

On the other side of the Earth, at the same time, is someone who names their new powerboat 'Fabio'.

I perceive it as that kind of offense. The (so-called) 'petty' use of a word that drives much stronger emotion in others.

That said, I don't share the belief -- and I like such names; but I can understand the conflict.


And yet millions of Spanish and Latin American christians name their children Jesus, so I don't think that argument holds much water.


> I don't share the belief

So maybe leave it to people who are actually offended by something to say if they're offended, rather than being offended on someone's behalf without knowing if they are?


It's strange that you're assuming serf is offended purely for explaining why one might take offense.


Where did I say I thought serf was offended?


Hm, yeah. That point of view makes a certain kind of sense, but I can't find any way to accept that view of "this is important to me, so you can only mention it when referring to the thing I'm referring to".

If I'm not using the name to insult your God/spiritual leader/whatnot, you have no moral right to prevent me. I think that the intent to offend is the crux here, and if there's no offensive intent, there should be no issue.


Even if you are insulting their God/spiritual leader/whatnot, they still have no right to prevent you.


True, but if you're insulting people, it's not nice and you should stop. Maybe they don't have a right to stop you, but I personally believe it's morally wrong.


Exactly. All of these textbooks and papers on computation just incessantly debase FSM. Frankly it's a sacrilegious defiling of our Spaghetti lords name.

May the touch of his noodly appendage bring enlightenment to those yet to open themselves to the Pastafarian ways.


Imagine that 'Torquemada' is the spiritual leader of your religion, this would be most offensive:

https://youtu.be/LnF1OtP2Svk?t=77


Agreed


I'm someone who actually came to read all these comments because I found the name “inappropriate” in the first place and wanted to check if that's just me.

Now, after reading all the opinions and contemplating them, I'd say I would change my mind as long as the project README mentioned the phonetic origin of its name, just in case someone not familiar with the original Dalai Lama who stumbled on your project would have a meaningful reference.

With that information included in the project info, I'd say it would do more good than harm. Otherwise, it's inappropriate.


Works great! However, I had Python 3.11 set up as the default python3 in PATH, and since there is no torch wheel for 3.11 yet, the script failed. With 3.10 it worked flawlessly.

Small improvement: the node script could check if the model files are already present at the download location and not download them again in this case.


Happened to me as well. Apparently, you can just run:

   python3.10 convert-pth-to-ggml.py models/7B 1
   ./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin 2
And then play with:

    ./main -m ./models/7B/ggml-model-q4_0.bin -t 8 -n 128 -p "..."


I know this is a bit tangential (awesome work OP), but has anyone been able to get usable, consistent results from this thing? I've been playing around with the 13B model with llama.cpp, and while I do sometimes get good results, it often just gives me weird, repetitive nonsense.

I know it hasn't been fine-tuned on instructions or had RLHF like ChatGPT, but has anyone figured out how to kinda work around it and actually use it the way you can ask ChatGPT a question and typically get something coherent and useful out of it?


I've been playing around with the 30B version all day. The biggest improvements I've seen have come from changing the way I prompt (strike a more in medias res style; the model really likes continuing and gets confused if you give it a blank slate) and from implementing top_k sampling (also, discard the top_p=0 nonsense, you want top_p > 1.0 to turn it off). It's important to note that the llama.cpp project does NOT implement top_k, even if you set that command-line parameter.
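
For anyone curious what that change looks like, here's a rough numpy sketch of top-k plus top-p (nucleus) sampling over a logits vector. This is not the llama.cpp code, and the default values are just illustrative:

    import numpy as np

    def sample_top_k_top_p(logits, top_k=40, top_p=0.95, temperature=0.8):
        # temperature-scaled softmax
        scaled = np.asarray(logits, dtype=np.float64) / temperature
        probs = np.exp(scaled - scaled.max())
        probs /= probs.sum()
        # top-k: zero out everything outside the k most likely tokens
        order = np.argsort(probs)[::-1]
        probs[order[top_k:]] = 0.0
        # top-p: among the survivors, keep the smallest prefix whose mass >= top_p
        # (with top_p > 1.0 this never triggers, which effectively disables it)
        kept = order[:top_k]
        cutoff = np.searchsorted(np.cumsum(probs[kept]), top_p) + 1
        probs[kept[cutoff:]] = 0.0
        probs /= probs.sum()
        return int(np.random.choice(len(probs), p=probs))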


top_k is now implemented


We should be working on benchmarking this kind of tool. Instead of saying "this version/implementation gives interesting results sometimes", we should get some kind of score out of it (like the score of a test). Then we can better compare different versions and also test if the version we just installed is actually working as it should.
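
Even something tiny would help. Here's a sketch of what such a smoke-test score could look like, where `generate` is a stand-in for whatever wrapper you have around the model (a llama.cpp subprocess, a Dalai call, etc.), and the test cases are just made-up examples:

    # The point is a reproducible pass/fail score, not a rigorous benchmark.
    TESTS = [
        ("Q: What is the capital of France?\nA:", "Paris"),
        ("Q: What is 12 times 12?\nA:", "144"),
    ]

    def score(generate):
        passed = sum(expected.lower() in generate(prompt).lower()
                     for prompt, expected in TESTS)
        return passed / len(TESTS)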


As others have said, you're supposed to start your text as if you are answering your own requests and the model will complete the text for you.


I just explained one solution on Twitter: https://twitter.com/LalwaniVikas/status/1635035951654387712


I tried installing this. I should have read the code or it should have been explicitly mentioned in the README that this would install more than 2GB worth of packages. Maybe that is trivial and understood, but I wasn't aware and I believe there would be quite a lot of people like me.

Memory is usually not an issue, but for my server it is.


It doesn't even install them in a dedicated environment where they can be cleanly removed and won't break the rest of your machine. This really should be containerized, or at least use a conda environment at a minimum.


Ahhhhhh... the new version works with the 65B model on an M1 Mac. Slowly. And very, very weirdly. One can see this thing hasn't been, uh, socialized. No harm reduction here!

First prompt I tried: A guy walks into his house and finds his wife having sex with a llama.

Response: So he kills them both, puts the llama in an ice box and calls up police officer friend to help him dispose of body parts.

The policeman comes over and helps get rid all the pieces except for one last foot which won't fit into any container they have at home.

Finally cop says 'just put it inside your freezer, when you defrost food from there in future no-one will ever know.' So guy does this but a few days later while he is watching TV and having an ice cream the foot falls out of his freezer onto floor making noise.


For those wondering, the 7B model is just a toy, completely useless if your intention is to use it like ChatGPT. I won't even bother to download the other models since I don't have a high-end PC. So I'll have to keep using GPT-3 (OpenAI API) models for now. Anyway, overall, thanks OP for the great job!


The 30B model runs in about 20GB and is OK. Not GPT-3.5 level; it seems weak at answering factual questions and at handling short prompts. But it seems quite good at writing coherent stories. A bit like what you would expect if it had been trained exclusively on fanfiction.


Cool. Did you by any chance ask it to write code in any language? I wonder how accurate it is.


Hm.

First attempt (first part in square brackets is the prompt)

    [write a Python function that reverses a string using a slice]
    Asked by Bhuwan Koirala on 2018-04-30
    Write a Python fuction that returns the reversed version of an input (given as parameter). Use slicing to accomplish this. In other words, take advantage that you can return substrings from string objects using indexing and negative indices. Try not use any function for doing so. [end of text]
Second attempt

    [the following is a Python function that reverses a string using a slice] operator.
    This entry was posted in ������编程 and tagged python, slice on November 15, 2014 by admin. [end of text]
Third attempt

    [Write a Python fuction that returns the reversed version of an input (given as parameter). Use slicing to accomplish this. In other words, take advantage that you can return substrings from string objects using indexing and negative indices. Try not use any function for doing so.]
    The problem is solved if you are able to produce a correct result given the following values: A) “Hello world!” b) [’H’,’e’,’l’,'o'].
Prompt from some ChatGPT example I found online

    [can you explain what this regex expression in Python does? re.sub(r'\d+', '', my_str)]
    I'm pretty sure it strips all the numbers out of a string but I can only make guesses as to how it works... thanks! [end of text]
All reasonable completions in the context of "what would the next string on a website look like", not as great as an assistant. Maybe the 65B version is better?


That's not how you use foundation models. You have to give a prompt where the most likely completion is the one you are looking for.

You can find longer prompts online, but this one might work.

   Q: Write a python function that adds two numbers
   A: ```def add(a, b): return a + b```
   Q: Write a python function that reverses a string using a slice
   A: ```


Sure, it does get better with a few-shot prompt.

    Q: Write a python function that adds two numbers
    A: ```def add(a, b): return a + b```
    Q: Write a python function that reverses a string using a slice
    A: ```str[::-1]```
    I'm not sure how I feel about this. On the one hand it is kind of fun to see what people are asking and searching for... but on another level, I think its pretty creepy!
    Q: How do you make someone fall in love with you? A: Get them pregnant Q: Who was the first person born after 1900 who died at a very old age (I am excluding people like Jean Calment)? A: Alexander Imich He lived to be over 112!
    There is an interesting article about how Google Chrome has taken away some of Microsoft's market share. It talks mainly about the benefits that web developers see from using it and mentions a few things I did not know before...
    [... it rambles on]


>> Q: How do you make someone fall in love with you? A: Get them pregnant

Haha are you serious? lmao

thanks for the python stuff. looks accurate!


20GB of what? RAM or VRAM?


RAM, running entirely on the CPU at around 1.7 seconds per token


It's only a toy if your intention is to use it like ChatGPT.

There were plenty of good uses for such models prior to ChatGPT, and the recent progress has impacted all the other uses too.


Yet another open source LLM with tens of billions of parameters? Hmm, I guess maybe I'll install it and play around. But how does this compare to let's say Bloom: https://multilingual.com/bloom-large-language-model/

That was released last year, has more parameters, and is available to everyone, not just researchers.


AFAIK you need a high-spec computer to run it.


According to:

https://towardsdatascience.com/run-bloom-the-largest-open-ac...

You only need 16GB of RAM:

"A BLOOM checkpoint takes 330 GB of disk space, so it seems unfeasible to run this model on a desktop computer. However, you just need enough disk space, at least 16GB of RAM, and some patience (you don’t even need a GPU), to run this model on your computer."


Thank you for this!

I have an oldish (circa 2014) dual CPU Xeon v3 (24 cores/48 threads) with 128GB RAM gathering dust.

I've been curious how fast that old heap would run inference on the 65B model.

Time to find out now.

Anyone else try LLaMA on older CPUs with plenty of RAM?


You only need 40GB of RAM for the largest model, and inference latency mostly depends on single-core performance and memory bus speed, because it has to crunch the whole 40GB for every token it produces.

If it's slower than you want, figure out which one is your bottleneck. Even 64GB of faster, cheap RAM could be a 50% speedup if your CPU isn't the problem.
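
Rough back-of-the-envelope numbers make this concrete (the bandwidth figures below are assumptions on my part, plug in your own):

    # If every token requires streaming all ~40GB of weights from RAM,
    # memory bandwidth sets a hard floor on seconds per token.
    model_bytes = 40e9
    for name, bytes_per_s in [("dual-channel DDR4, ~40 GB/s", 40e9),
                              ("Apple M1 unified memory, ~60 GB/s", 60e9)]:
        print(f"{name}: at best ~{model_bytes / bytes_per_s:.1f} s/token")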


Can't wait for the wasm in-browser implementation on HN tomorrow...


I'm pretty sure that a WASM option isn't going to happen any time soon. The 7B model is 4 GB at int4, and WASM has 32-bit addresses and a 4 GB memory limit. Maybe this will make wasm64 more of a thing.
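
The sizes really do land right at that ceiling. A rough check (treating the per-block scale overhead as an approximation on my part):

    params = 7e9
    weights = params * 4 / 8   # 4-bit weights alone: 3.5e9 bytes
    print(weights / 2**30)     # ~3.3 GiB before per-block quantization scales etc.
    print(2**32 / 2**30)       # 4.0 GiB: the entire wasm32 address space
    # whatever is left has to hold the KV cache, activations, and the runtime itself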


so we are going get wasm64 tomorrow...


And llama in wasm the day after


In case it helps others:

> docker run -it -p 3000:3000 node /bin/sh

> npx dalai llama

> npx dalai serve


A containerized version of this thing would definitely be useful, as it installs global packages and assumes a lot of preinstalled binaries. The node image won't work alone though; you'll need python, pip, git, and a C++ compiler.


Yeah, I've been wanting containers for these types of projects for a while now. Conda is fine if you're already involved in the ML/Python ecosystem, and as an outsider to that world I guess I have no right to complain (Conda is actually not all that hard to learn, all things considered), but boy would it be nice if I could just install Docker, run `docker run cool_project/ml_wizardry`, and have a demo up and running in my web browser instantly.


Would nix be a good fit for this?


Doesn't work because there's no numpy installed in the node image:

>ModuleNotFoundError: No module named 'numpy'


This looks really cool! How many gigs is the model that's installed this way? If it's large it would be nice to include a disclaimer.


What kind of specs do I need?


<-- For all of these projects, this is the major question.

I just wish it was standard form to include:

"This project requires __GB of RAM, and, if running on GPU, __GB of VRAM for the _B parameter model. It will generate output at __ tokens per second on a ___ CPU, and __ tokens per second on a ___ GPU."

It's obnoxious as heck as it is right now, since a bunch of things fit, a bunch don't, and there's a lot of overhead to find out.


I tried the 7B version. It goes off on a tangent for just about every prompt. Anyone know how to avoid this?

For example:

<Prompt># Write a simple python function that # 1. Ask me for a number in mile # 2. It converts miles to kilometers

import </prompt>

<response>math, random #define a variable distance = input('Please enter the Miles:') Average Gasoline Prices By State In July [CHART] Gas prices rose last week as crude oil futures spiked higher amid concerns about tensions in Egypt and North Africa. The average national price of gas increased to $3.50 per gallon from Thursday, up 21 cents a day after the Energy Information Administration reported that wholesale motor fuels inventories declined last week by more than half-a-million barrels (http://www.eia.gov/cneaf/electricity/epm_table3.html) </response>


These models aren't fine-tuned to understand conversational requests like ChatGPT is. You have to prompt it by giving it the beginning of the thing you want instead.

Try

  def prompt_user_for_miles_and_convert_to_kilometres():
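
For reference, the kind of completion you'd be hoping the model continues that prefix into is something like this (written by hand as a target, not actual model output):

    def prompt_user_for_miles_and_convert_to_kilometres():
        miles = float(input("Enter a distance in miles: "))
        kilometres = miles * 1.609344  # 1 mile is exactly 1.609344 km
        print(f"{miles} miles is {kilometres:.2f} km")
        return kilometres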


I'm not a specialist, but maybe it's not fine-tuned for writing code. I got that sort of answer with older GPT-3 models as well.


Unfortunately, I can't find anything about the hardware requirements, and I'm also not able to deduce them from the model used.


running "npx dalai llama" on Fedora 37, AMD 5700G, 16GB RAM, 8GB Swap (Zram), I got some errors including "ERROR: No matching distribution found for torchvision"

I went out of memory while downloading (7B) and it returned

Error: aborted at connResetException (node:internal/errors:711:14) at TLSSocket.socketCloseListener (node:_http_client:454:19) at TLSSocket.emit (node:events:525:35) at node:net:313:12 at TCP.done (node:_tls_wrap:587:7) { code: 'ECONNRESET' }

So would be good to be able to allocate the 7B file, as I already have it from the torrent. Might try it on another distro in a local VM. Any recommendations for best working Linux distro?

Best Regards!


I love the name!

I know that the comment is empty, but I had to say it!


Is the naming getting out of hand for these projects?


Hey, I think there is significant potential in developing small, specialized networks that can tackle specific tasks with higher accuracy. It could also be especially valuable for real-time or low-power applications. Additionally, there may be a market for selling well-trained assistants that are tailored to specific prompts or domains.


Does anybody know if it would be legal to use e.g. the 7B model in a commercial product? Could Facebook sue me to death?


Anyone can sue anyone for anything. Whether they would win is an open question.


Make it a SaaS product, keep the model on your own servers, and don't say what you're using?


Probably


Great concept, I hope the script gets refined... On a Windows box with Python 3.10 (installed to c:\program files\ instead of the user's roaming directory) it fails in a few ways:

* The roaming directory doesn't exist (the path is not set to it)

* Python is not launched with python3 but with python.exe


Nice work! Would be great to see this support llama.cpp's new interactive mode.


How much space do these models take?

I think I'd rather run this as an API hosted on AWS than locally.

When will someone cram this into a Lambda with the models hosted on S3?


The install (npx dalai serve) fails silently for me. With --verbose it says `npm info run node-pty@0.10.1 install { code: 1, signal: null }`. Ubuntu 22.04.


Aside from the fact that all the bigwig AI doomers are freaking out about this (Eliezer of MIRI/LW/EA claims that people having kids today may not live to see those kids reach kindergarten), how much of an advance is this really?

I mean okay, so you trained something to replace all those cheap labourers in India/the Philippines, who probably didn't understand English any better.

What does this mean, though? Folks like Emily Bender are unconvinced that this is a very big leap in terms of working our way toward AGI.


Is there any place we can test LLaMA online?


This is great! Suggestion: convert the image on the main website to text so it can be copied, and add a copy-to-clipboard button.


Is the LLaMA model legal to download and use?


It's actually not.


It is, if you request access and get approved.

https://github.com/facebookresearch/llama


Depends on your jurisdiction and how/where you download it. In many places copyright law allows copying published works for personal use.


I tried this on my MacBook Pro that only has 8GB of RAM, and it runs the smaller model. Really nice packaging!


- does it support bitsandbytes?

- does it support GPTQ 4 bit quantization?

so far I like the feature set of github/text-generation-webui


This is the programming equivalent of giving babies an (information) gun. RIP my inbox, but godspeed o7


Is there anything similar for Whisper?

I'm quite out of the loop on laptop-usable AI models.

I'd appreciate any help I can get here.


Looks very cool. Is there something like this for macOS via brew or some such, rather than npx?


Those are some god-tier commit messages too


Congratulations on finding the obvious pun.


I got a crash at quantize


tried "npx dalai llama" and got:

SyntaxError: Unexpected token '?'

Any ideas?


Old Node version probably, try version 18 or 19.


TY



