Petals: Run 100B+ language models at home bit-torrent style (github.com/bigscience-workshop)
594 points by antman on Jan 2, 2023 | 155 comments



So NVLink/NVSwitch pools multiple GPU resources on a single (very expensive) system. A cheaper alternative to that is "offloading", a technique that splits the inference process into smaller steps so it can run on systems with far fewer resources available... and Petals is a 10x faster alternative to that.

Did I get that right?

This AI stuff is moving very fast and it's hard to keep up, but it's all fascinating.


You're right. This comment explains offloading in more detail: https://news.ycombinator.com/item?id=34216213


Offloading is when the computation is done on the CPU instead of the GPU. DeepSpeed is an example of this.


In case of offloading, the computations are usually still performed on GPU, but the model is hosted in RAM/SSD instead of the GPU memory (and its chunks are copied to the GPU memory when necessary).


A lot of computation is offloaded to the CPU, such as gradients and optimizer states. You are right though that quite a bit of computation is still done on the GPU.


I remember when GPUs were starting to support arbitrary computation and offloading meant shifting work away from the CPU.


Matches my understanding also. Can someone in the know confirm?


Your understanding is correct, but I can't vouch for the claim's accuracy. This could make running these models much more accessible to people who don't have 4x RTX 3090s or better in an ML or mining rig...


Is there an easy way to run a large language model and/or speech synthesis model locally or in Colab? Stable Diffusion is easily accessible and has a vibrant community around AUTOMATIC1111; it's super straightforward to run on a Google Colab. Are there similar open source solutions for LLMs/TTS? I believe I had GPT-2 running locally at one point, as well as ESPNET2? Not 100% sure, it's been a while. Wondering what the state of the art for FOSS neural LLMs and TTS is in 2023.


For LLMs, the closest thing that comes to mind is KoboldAI[1]. The community isn't as big as Stable Diffusion's, but the Discord server is pretty active. I'm an active member of the community who likes to tell others about it (you can see my previous Hacker News comment was about the same thing, haha).

Like Stable Diffusion, it's a web UI (vaguely reminiscent of NovelAI's) that uses a backend (in this case, Huggingface Transformers). You can use different model architectures, from ones as early as GPT-2 to newer ones like BigScience's BLOOM, Meta's OPT, and EleutherAI's GPT-Neo and Pythia models, as long as they're implemented in Huggingface.

They have official support for Google Colab[2][3]; most of the models shown are finetunes on novels (Janeway), choose-your-own-adventures (Nerys / Skein / Adventure), or erotic literature (Erebus / Shinen). You can use the models listed or provide a Huggingface URL.

[1] - https://github.com/koboldai/koboldai-client (source code)

[2] - https://colab.research.google.com/github/koboldai/KoboldAI-C... (TPU colab; 13B and 20B models)

[3] - https://colab.research.google.com/github/koboldai/KoboldAI-C... (GPU colab; 6B models and lower)


I'm squarely in Kobold's userbase but hadn't come across it until your post, so thanks for your efforts to spread awareness.


There is (for many, but not all, large models). Specifically, there is Huggingface's accelerate library, which lets you run the model partially on your GPU and partially on CPU/RAM, while what doesn't fit in RAM is cached in NVMe storage (a mirror of two fast drives is recommended).

I didn't have much luck with stock accelerate, but once the GPU is disabled (so it runs only on the CPU, offloading to NVMe storage where RAM is insufficient), it worked pretty well for me. (There is a small code change that has to be made, as the stock software refuses to run without a GPU; it's a simple change described in its GitHub issues.) My GPU has 8 GB of VRAM, but this way I managed to run 7B-parameter models. In principle I could run much larger ones, but of course it takes a lot more time. The 7B BLOOM takes 90s for one inference and an additional 60s to load the model (from a spinning-disk array) initially.
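
For reference, a minimal sketch of that setup, assuming recent transformers and accelerate are installed; the 7B BLOOM checkpoint name is my assumption based on the Hugging Face hub, and the offload folder is just an example path (ideally on NVMe):

    import torch
    from transformers import AutoModelForCausalLM

    # device_map="auto" lets accelerate place layers on the GPU first, then CPU RAM,
    # and finally spill whatever is left to disk in the "offload" folder.
    model = AutoModelForCausalLM.from_pretrained(
        "bigscience/bloom-7b1",        # assumed checkpoint name for the 7B BLOOM mentioned above
        device_map="auto",
        offload_folder="offload",
        torch_dtype=torch.float16,
    )

After loading, model.generate() works as usual, just slower, because the offloaded weights stream in from RAM/disk on each forward pass.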


Really large (GPT-3-sized) language models have many more parameters than diffusion models, so it's difficult to load them locally unless you have a server with 8x 3090 / 3x A100 GPUs. Petals is the only way to fine-tune and run inference on 100B+ parameter models from Colab, as far as I know.
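
For the curious, here is a rough sketch of what calling the public swarm from Python/Colab looks like; the import path and the "bigscience/bloom-petals" checkpoint name are assumptions from my reading of the Petals README and may have changed:

    from transformers import AutoTokenizer
    from petals import DistributedBloomForCausalLM   # assumed client class from the Petals README

    MODEL = "bigscience/bloom-petals"                # assumed name of the public-swarm checkpoint
    tokenizer = AutoTokenizer.from_pretrained(MODEL)
    model = DistributedBloomForCausalLM.from_pretrained(MODEL)

    # Each transformer block runs on a remote server; only small activation tensors
    # travel over the network, never the 100B+ weights.
    inputs = tokenizer("A cat sat on", return_tensors="pt")["input_ids"]
    outputs = model.generate(inputs, max_new_tokens=5)
    print(tokenizer.decode(outputs[0]))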


Interesting, how does that work with multiple GPUs? I'm not familiar with the internal workings of these models; is there anywhere I can get a brief rundown of how the processing is split? I imagine there can't be much swapping between GPUs, as that seems prohibitively slow. How is the model split such that it can be worked on in parallel by multiple GPUs without being bottlenecked by IO?


I think this is a relevant link for you: https://huggingface.co/transformers/v4.9.0/parallelism.html

For large LMs, people usually use tensor-parallelism (TP) or pipeline-parallelism (PP). TP involves lots of communication, but uses all GPUs 100% of the time and works faster. PP requires much less communication, but may keep some GPUs idle while they are waiting for data from others.

Usually, TP is used when you have good communication channels between GPUs (e.g., they are in one data center and connected with NVLink), while PP is used when communication is a bottleneck (like in Petals, where the data is sent over the Internet, which is much slower than NVLink).
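
A toy sketch of the pipeline-parallel idea, assuming two local CUDA devices are available; it only illustrates that activations (not weights) cross the stage boundary, and is not Petals code:

    import torch
    import torch.nn as nn

    # Split 8 transformer blocks into two pipeline stages on different devices.
    blocks = [nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
              for _ in range(8)]
    stage0 = nn.Sequential(*blocks[:4]).to("cuda:0")
    stage1 = nn.Sequential(*blocks[4:]).to("cuda:1")

    x = torch.randn(1, 16, 512, device="cuda:0")  # (batch, seq, hidden)
    h = stage0(x)                                 # runs on device 0
    h = h.to("cuda:1")                            # only ~32 KB of activations move
    y = stage1(h)                                 # runs on device 1
    print(y.shape)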


You can split the model across devices with huggingface accelerate library.

Check out the infer_auto_device_map method, which will work out a placement for your configuration (multi-GPU, RAM, NVMe), and then run dispatch_model with that device map.
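
A minimal sketch of that flow, assuming accelerate and transformers are installed; the memory limits are made-up numbers for an 8 GB GPU, and the checkpoint is the 7B BLOOM mentioned elsewhere in the thread:

    import torch
    from accelerate import init_empty_weights, infer_auto_device_map
    from transformers import AutoConfig, AutoModelForCausalLM

    checkpoint = "bigscience/bloom-7b1"
    config = AutoConfig.from_pretrained(checkpoint)

    with init_empty_weights():                       # build the model skeleton without allocating weights
        empty_model = AutoModelForCausalLM.from_config(config)

    # Decide which blocks go on the GPU, which stay in CPU RAM, and which go to disk.
    device_map = infer_auto_device_map(
        empty_model, max_memory={0: "8GiB", "cpu": "24GiB"})

    # from_pretrained then dispatches the real weights according to that map;
    # anything that fits nowhere is offloaded to the folder below.
    model = AutoModelForCausalLM.from_pretrained(
        checkpoint, device_map=device_map,
        offload_folder="offload", torch_dtype=torch.float16)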


You can read all the gory details here: https://arxiv.org/pdf/2207.00032.pdf


clarification: You can also use offloading on Colab, but inference with offloading is at least 10x slower (see other comment threads). So it can't really be used for interactive inference, but may be used for fine-tuning with large batches/sequence lengths.


Surprised no one mentioned GPT-J! Here is a colab link: https://colab.research.google.com/github/NielsRogge/Transfor...

Although you need a premium GPU. I admit it's not as good at zero-shot or one-shot as GPT-3, but if you provide examples, you can get output that's just as good. I feel like the team behind it needs better marketing.


Nice, this looks pretty good. I have Google Colab Pro+ so I can use the 40 GB GPUs there. Am I correct that I could also run this locally on 2x 11 GB 1080 Tis?


Not sure about TTS, but I've trained GPT-2 (a PyTorch implementation, I think) on my own data and it worked pretty well. I also tried EleutherAI's 6B model but couldn't figure out how to run it. As for an "easy way", I don't think a user interface like what Stable Diffusion has exists as of now.


This is a nice effort if it allows you to run BLOOM 170B at 1s per token. Just for comparison's sake: with a last-gen Ryzen CPU (16 cores), it takes me about 90s to run the model with 32 GB of RAM (the model spills a few GB to NVMe storage too, as 32 GB isn't enough RAM).

However, I wonder how they prevent abuse. The main page doesn't mention it. As they mentioned blockchain, I suspect there will be some sort of credits implemented. I'll definitely be watching where this project goes.

Edit: just to clarify, the 90s is not for the 170B-parameter model, it's the 7B BLOOM version. I forgot to mention that, and it puts the ability to run a 170B model at 1s/token in better perspective.


A Petals dev here. At the moment, we're working on a centralized incentive system, no blockchain involved. It will award points if someone is running a server that consistently stays online and returns correct results. Then, users will be able to spend these points for prioritized inference and (maybe) extra features like increased sequence length/batch size limits. This way, the swarm will prioritize people who actually contribute compute and serve others in the remaining idle time.


Is it possible to have the server shut down predictably when it finishes tasks periodically, and not get penalized? I would like my machine to run while I'm not using it.


Sure! People who disconnect for a while (not necessarily predictably) won't be penalized - it's okay if you suddenly decide to use your GPU for something else, then get back to running a server.


Maybe you could get Bram Cohen to work on this. Seriously, reach out to him, he loves to work on these game theory sorts of things.


I’d say his reputation suffered quite a bit after the whole Chia (that proof of SSD thrashing coin) BS.


This sounds (very narrowly) similar to the Enigma network, a blockchain-based technology that can be used for fully encrypted multi-party computation (MPC). It was one of the earlier blockchain projects that actually had an interesting use case and technology in this quite "overhyped" space. They rebranded to the Secret network [0] a few years back, and somehow I don't see this use case/promise anymore nowadays... the website screams all of the Web3 BS buzzwords, it seems :(

[0] https://scrt.network


Well yeah, the whole movement is founded in deliberate ignorance of all the existing, _working_ solutions we already have. Also, apparently none of them watched the HBO comedy Silicon Valley.


Just shows nobody actually needs blockchain even for decentralized systems


So this sounds like BOINC but specifically for language neural nets?

It's a very interesting concept, and I quite like the idea of a public, open compute cloud. I'd like to see more detail on security: if I'm going to donate time on my personal machine, I'd like some assurance that the workload is properly sandboxed and can't reasonably access my network or data.

Mostly out of interest, what's the advantage to this over just using the existing BOINC network? I've been running BOINC on and off since the dialup days, it's an extremely mature platform with all kinds of workload capabilities.


During the training, participants only exchange tensors (embeddings, gradients) and never send code to each other. No other peer can execute arbitrary code on your computer - they can only request you to run one of the pre-defined BLOOM layers. You can further isolate the Petals server from your machine by running it in a Docker container (see the command in the repo).

A client needs to communicate with multiple servers in a specific way to run the model, I'm not sure our communication model can be implemented with BOINC.


I think it is a great start, and I suppose there will be many iterations to ensure fair usage.

It would be interesting to reach a point similar to Docker, where you don't need to load every layer again, only your model-specific layers. The shared layers would already be loaded, so running multiple models at once would consume less GPU memory.


What an interesting concept - also makes me wonder how BitTorrent could be used for more de-centralizing of data, while keeping it accessible on-demand.


Sounds just like https://ipfs.tech/


rather, ipfs sounds just like bittorrent


It has a unique identifier for data (their main feature), a naming system, and some other features, which makes it quite different from bittorrent (as far as two p2p data-sharing networks can be different, obviously).


Right, but one works and has widespread adoption. The other does not. And they certainly cover similar ground.


I’ve always wanted to download an IPFS node and run it on my PC in the background, but I’m worried it will wear down my hard drives.


I tried the chat on http://chat.petals.ml, and it seems to struggle with the current load (as per the disclaimer at the top)

    Human: How is the weather today?
    
    AI: the
    AI theAI)aultAIAI ) course )
    . can?esterday to people?
    ? is to think thatified )
Really cool project though, I wanted to work on something similar.


It replied to me with:

  It is nice today.
Not garbled, but also extremely shallow.


To be fair, so was the question.

This is a language model, not an oracle or an interface to weather forecast data.


It varies from time to time. You can also switch to the few-shot mode to try machine translation, code generation, or other tasks involving longer responses.


Won't even load for me atm


> Fine-tuning and inference up to 10x faster than offloading

What is "offloading" in this context?


Offloading is another popular method for running large LMs when you don't have the GPU memory to fit the entire model. Imagine you have an A100 GPU with 80 GB memory and want to generate text with BLOOM, a 70-block transformer model with ~2.5 GB of weights per block. For each token, offloading will load the first 1/3 of the model (~27 blocks) from RAM/SSD to your GPU memory, run a forward pass through them, then free the memory and load the next 1/3, and so on.

It turns out, Petals is faster than offloading even though it communicates over the Internet (possibly with servers far away from you). That's because Petals only sends NN activations between servers (a small amount of data), while offloading copies hundreds of GB of NN weights to GPU VRAM to generate each new token.
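
To make that concrete, some back-of-the-envelope arithmetic using the numbers above; the bandwidth, hop count, and latency figures are assumptions for illustration, not measurements:

    # Rough per-token cost: offloading moves ~175 GB of weights to the GPU,
    # Petals moves only tiny activations but pays Internet latency per hop.
    blocks, gb_per_block = 70, 2.5
    weights_gb = blocks * gb_per_block            # ~175 GB per generated token

    ram_to_gpu_gb_s = 16                          # assumed effective RAM->GPU bandwidth
    offload_s = weights_gb / ram_to_gpu_gb_s      # ~10.9 s/token

    hops, rtt_s = 10, 0.1                         # assumed servers in the chain, latency per hop
    petals_s = hops * rtt_s                       # ~1.0 s/token; activation transfer is negligible

    print(f"offloading ~{offload_s:.1f} s/token vs. Petals ~{petals_s:.1f} s/token")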


Interestingly it sounds like offloading could be made quite efficient in a batch setting if you primarily care about throughput rather than latency. Though I guess for most current LLM applications latency is quite important.


It's mentioned in their paper: https://arxiv.org/pdf/2209.01188.pdf

  Several recent works aim to democratize LLMs by “offloading” model
  parameters to slower but cheaper memory (RAM or SSD), then running them
  on the accelerator layer by layer (Pudipeddi et al., 2020; Ren et al.,
  2021). This method allows running LLMs with a single low-end accelerator
  by loading parameters from RAM just-in-time for each forward pass.
  Offloading can be efficient for processing many tokens in parallel, but
  it has inherently high latency: for example, generating one token with
  BLOOM-176B takes at least 5.5 seconds for the fastest RAM offloading
  setup and 22 seconds for the fastest SSD offloading. In addition, many
  computers do not have enough RAM to offload 175B parameters.


Is a mobile device / edge device a possible participant / source of resources?


What a fascinating concept. I guess this won't be useful for any kind of realtime feedback system, though?


A Petals dev here. It is not real-time, but we think the speed of ~1 token/sec may be enough for some interactive apps such as chat bots (especially, if you show tokens to a user once they are generated). You can try one at http://chat.petals.ml (heads-up: it may be laggy right now due to lots of HN users trying out the system).

Of course, you could do better if you have enough high-end GPUs to host the entire model yourself (3x A100 or 8x 3090). But if you don't, 1 token/sec is much faster than what you get with other existing methods.


I have not read the technical details, apologies for ignorance, but is there an opportunity for caching?


Probably not, since you need to compute the activations of unknown inputs and there could be infinitely many variations of them


What are the speeds of other existing methods?


Theoretical best-case for RAM offloading is 5.5 sec/token, for SSD offloading - 22 sec/token. Implementations we've tested are not faster than 10 sec/token though. See details in our paper: https://arxiv.org/pdf/2209.01188.pdf


Why not?


How would one make a reliable realtime system that depends entirely on unknown network conditions? Perhaps inside a closed network it is possible.


That's orthogonal to a realtime system. You can infer at a fair speed, so realtime would be possible.


Guarantees are not orthogonal to realtime feedback, they are essential. If I write a query, it is not irrelevant whether it takes 1 second or 1 minute to return at any given moment.

You write that speed can be inferred, but the analogy that was used here is BitTorrent—and my experience with BitTorrent tells me that it certainly cannot be inferred.


If you read the article text and the response from the dev, then yes, inference can happen at 1 token/s or, if parallelised, more. I'm not sure what your parameters are for a realtime system. If you're talking about network reliability, that's a different issue. Yes, it can infer quickly; whether it can do so reliably is another matter.


Anyone participating in the swarm is able to potentially log the tokens that get processed by their node. Obviously a security concern. Is there any way to implement homomorphic computing to securely process the tokens?


A Petals dev here. Indeed, the public swarm should not be used for any kind of sensitive data (we have warnings about that in the instructions). If someone wants to process such data, we recommend setting up a private swarm among the orgs they trust (e.g., a couple of labs/small companies who don't have many GPUs themselves may set up a private swarm and collaborate to process their datasets).

Regarding homomorphic encryption (HE), I'm afraid the current methods to run neural networks in the HE fashion involve 10-100x slowdown, since they are mostly not designed for floating-point operations. We'd love to find a way to do it faster though, since privacy is obviously an important issue for many tasks.


Hi there, thanks for taking the time to answer questions! There are numerous use cases where even a 100x slowdown would be acceptable if it was demonstrably able to process sensitive data. Can you help me understand what kind of a slowdown that is? Could the 10-100x slowdown be overcome by more compute nodes, or would it require the nodes themselves to be 10-100x faster, for example?


If someone wants to process sensitive data and is okay with 10x slowdown, it's better to use offloading. This is another, slower method for running large LMs locally without high-end GPUs, see details here: https://news.ycombinator.com/item?id=34216213

In other words, if Petals nodes became 10-100x slower, Petals would lose its competitive advantage over simpler methods that don't communicate over the Internet.


Is the MIT license that this uses compatible with the RAIL license that Bloom uses? Or are there not issues with that?


BLOOM is a large LM, and Petals is a tool for running large LMs (not necessarily BLOOM). People using Petals should still follow the model's terms of use regardless of how the tool is licensed.


Thanks for the clarification


Any plans for releasing an API spec that would allow for access from languages other than Python?


There's a lightweight HTTP API for inference: https://github.com/borzunov/chat.petals.ml#http-api-methods


Are there basic stats on real-time contributors and latency?


What's the point?

So you can get predicted text that looks "coherent". Then what?

There is literally no place to add logic. Neural net-based language models are impressive, sure, but it's not hard to see how useless they are.

The only time their output is logically coherent is when they are lucky, and that seems to happen often because most of their input was logically coherent to begin with.


Whether or not the current technology is useless is an empirical question. How many people are using ChatGPT, Stable Diffusion, etc. for economically or personally valuable activities? We actually don't know.

Even if we assume the technology is useless in its current state, it is still incremental progress. Could we have predicted 10 years ago what neural networks would be capable of today? Now, tell me what neural networks will be doing in 10 years. If you think you know the answer with any degree of certainty, you're probably deluded.


My point is that ML-based NLP (like chatGPT) has a clear ceiling, and we seem to have reached that.

We can get coherent (understandable) output all day long, but we can never introduce logic.

ML-based NLP is a semantic word-guessing machine. It's based entirely on how often words show up near each other in the training datasets. There is no room to add logic.

The entire exercise is like a magic trick: impressive sure, but at the end of the day, a fool's errand.


You make very strong claims about things we know very little about. It's far from clear that we have reached a ceiling. Who can predict how systems with 10x the parameters and as-yet-undiscovered deep learning models will behave?

We don't understand how humans do logic. It's entirely possible that whatever structure in the human brain is responsible for handling logic can emerge in a neural network.

If we're talking about what it takes to get to true AGI in the near future, then I agree that a pure neural network approach might not cross the finish line first. I think Stuart Russell made this point in an interview, basically saying that a neural network is a very inefficient computational model and that we could do the same thing much more efficiently if we had the right "good old fashioned AI" algorithm. But fundamentally a neural network is just computing a function so there's nothing in principle preventing a neural network from doing whatever a symbolic system does. It's mostly a matter of efficiency and hardware availability.


But we do know plenty about it. It's right there in front of us. Pretending there is some understanding just out of reach is called mysticism.

What you are telling me is that I should place my expectations for the future, not on the reality in front of me, but on the hopes and dreams you have for the future. That's circular reasoning.

The very reason that I don't place credibility in your assertions is the lack of reason itself: in your assertions, and in what a neural network is.

Neural networks are like dreams. Wonderful only when your intention is to get lost in a swirl of memories. Useless if you want to actually accomplish something.

Knowing the difference is crucial, because that difference can never be taught to a neural network without completely redefining what a neural network is in the first place.

Knowing the difference is literally the thing neural networks are incapable of doing. They don't know anything. They just guess. That's literally the function. In the code. Guess what comes next.

There is no sense pretending sense itself will magically appear out of a guessing machine. Neural networks are nonsense generators, and that is what they are forever doomed to be.


You made two claims: (1) current language models are useless and (2) current language models have reached a ceiling. I said:

> How many people are using ChatGPT, Stable Diffusion, etc. for economically or personally valuable activities?

If (1) is true, then the answer to that question is "zero" or at least "close to zero". Do you really believe that?

If (2) is true, then it is also true to say that transformer models will never exceed today's capabilities by a significant amount at any time in the future. Do you really believe that?


Yes. The ceiling is the floor.

The limitation is inherent in the core design. There is no overcoming. This is not a hurdle or a wall. It's a design flaw.

Is it totally useless to everyone? No. Not completely. It's like a coherent search engine: a way to find data that is close to other data. But "close to" in this case is only "semantically", and never "logically", so that's that.

Is it going to get any less useless than it is? Only slightly. "It" will never get better. The only better version of "it" is a completely new ground-up redesign that doesn't resemble "it" at all.


Modern neural network architectures are Turing complete [1]. So I don't see any argument for a limit in principle unless you are arguing that a Turing machine can't achieve language understanding. If that's what you're saying, then I wonder who is espousing mysticism here.

[1] https://arxiv.org/abs/1901.03429


Are you forgetting the distinction between a program and a computer?

Language understanding doesn't magically spawn itself as a process on your computer! Someone has to write that program first.

And that's my point. ChatGPT transforms language, but it does not understand it. For that, we will need a different kind of program.


> Language understanding doesn't magically spawn itself as a process on your computer! Someone has to write that program first.

Do you think it's impossible for such a program to emerge as weights in a Turing complete neural network architecture?


Play with https://chat.openai.com/ to experience how powerful predicting text is.


I have.

And as I said, it's very impressive.

And it has some usefulness: essentially it's an alternative to reading through many pages/posts of StackOverflow and Wikipedia.

But it doesn't know anything. It has no clue whatsoever whether it is correct or incorrect. It only makes guesses. The only reason there is useful output is because that output is a transformation of useful input.

There is no logic. There is no way to introduce logic. There is no way to filter it through logic.

If some coherent mixture of the ML's training datasets already contains the answer to your question - like literary or code examples, definitions, etc. - then the output will be useful. Otherwise, it's just wrong, and sometimes unexpectedly so.

The output of chatGPT (or any other ML-based NLP) can only be as correct or knowledgeable as the data it is trained on; and it will practically never even match that level, because it is only mixing words by semantic popularity, never by logical relationship.


Chat bot interfaces are only a small part of what can be done with large LMs.

You can use and fine-tune them to solve almost all existing natural language processing tasks: machine translation, recommendation/search, text classification and summarization, code generation, etc.


False.

You can use them to transform already existing text and code (the training datasets); but you can never do more than that.

There is no room in the ML algorithm to introduce logic. It's doomed to forever be a guessing game; and the resulting guesses will always be limited by the information it is fed to begin with.

The only reason chatGPT is so impressive is that it is transforming human conversation that itself is impressive (except that we were already aware of it). The code generation, literature, and definitions, etc. it outputs are all just rephrasing the written code, literature, and definitions that it was given as training data.

It's effectively no more than a sleight-of-hand. Flashy and impressive, but never anything more.


You should read this: https://ai.googleblog.com/2022/11/characterizing-emergent-ph... .. and probably also the paper.

I find these emergent phenomena pretty interesting.


They are missing the forest for the trees.

The "emergent phenomena" can be trivially explained by the input they are giving it.

They are not using a dataset that contains an equal amount of "correct" and "incorrect" responses. They are using datasets of human communication, which are obviously filtering for "correct" data. We get things wrong occasionally, but that is quite rare relative to what we get right. We can't even structure a sentence without getting something correct!

If you feed a dog good food, is it really a surprise that dog is healthy? You never fed it poison!

The language model is only returning semantic relationships. The "emergent phenomenon" is that most semantic relationships in human communication just happen to also be logical relationships.

But the language model doesn't know that. In no way does it interact with logic. It only interacts with semantics.

If anyone actually bothered to train an instance of GPT or whatever on poisoned data (i.e., nonsensical stories), then you would see those emergent phenomena disappear. But no one is writing the nonsensical stories in the first place, so such a dataset does not exist.


You're at least a few weeks behind the state of the art.


Can anyone get it to write code? It just says it has written the code to the file system when I prompt it.


You can switch the chat bot (http://chat.petals.ml) to the "few-shot mode" and provide a couple of "task description & code" examples. Then you can add a new task description and it'll respond with code.

The underlying LM, BLOOM, had a few programming languages in its dataset, so it works at least with Python and C++.
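
For example, a few-shot prompt in the "task description & code" style could look like the sketch below; the exact delimiters the chat UI expects are an assumption on my part:

    import textwrap

    # Illustrative few-shot prompt: two worked examples, then the new task.
    few_shot_prompt = textwrap.dedent("""\
        Task: reverse a string in Python
        Code: def reverse(s): return s[::-1]

        Task: compute the factorial of n in Python
        Code: def factorial(n): return 1 if n <= 1 else n * factorial(n - 1)

        Task: check whether a number is prime in Python
        Code:""")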


Could this approach be used on other types of models, such as image models eg stable diffusion?


There is probably less motivation, as these models are much smaller. An M1 Mac can run inference in under a minute, a GPU in as little as 3-5 seconds. This is supposed to get as much as 20x faster when the distilled Stable Diffusion model is released.


People nowadays will use 100 GB in VRAM to run a model that taught itself how to do quicksort.


I guess that it could be used to create a private swarm, if one has a lot of hardware at home.


So... How can I mine crypto with this? :)


What is the copyright status of these models?


Petals runs BLOOM, an open-source, publicly released model of the same size as GPT-3. Here's a description of the data used to train this model: https://huggingface.co/bigscience/bloom (the "Training" section)


BLOOM is not open source. It has the RAIL license, which exists solely to place restrictions on the use of the software, as well as forcing people to update. Read more: https://bigscience.huggingface.co/blog/the-bigscience-rail-l...


Thanks. It contains copyrighted material and is therefore illegal to use.


exciting, will surely check the git repo


who is working on blockchain + web3 + AI inference => Decentralized AI besides me ?


Increasingly fewer people are interested in scamming people after the FTX debacle, from what I can tell


Can you elaborate on what you are doing?


I have been interested in 2 of those 3. Where are you working on them?


I would love for most of the Blockchain trend to be converted into efforts towards BitTorrent-style projects.

Distributed file sharing or computation without the whole tokenomics aspect that, while interesting, draws too much attention from scammers.


That's not going to happen, because "distributed" was always a misnomer when it came to blockchain things. "Massively redundantly replicated" would be better. If work is distributed, every participant has a little piece of the work to do, but in e.g. blockchain contracts, all the participants need to do the whole calculation.


That's changing. The high replication to verify untrusted peers is decreasing and there's a realistic prospect of it going away due to gradual adoption and development of new zk-proof techniques.

In zk proofs-of-computation-result, different nodes can perform different intensive parts of a calculation and send the results along with proofs that those are the correct results. Other nodes can accept the results and verify the proofs with remarkable efficiency, then use those partial results for further calculations. To me it still feels counterintuitive and almost magical that any large, arbitrary computation result can be easily verified without repeating the computation, without the verifier needing much memory or data.

For cryptocurrency blockchains this allows smart-contract (computational) transactions to be accepted with only one node having to execute the code, everyone else just efficiently verifies the proof to accept the state change. As proofs can be aggregated, this scales well: it isn't necessary for every node to run all the verifications, either.

For big, distributed calculations like the article's, the whole calculation can progress using those partial results without having to rely on trust and reputation, and everyone can have high confidence that the final result is what it should be, not undermined by subterfuge or subtly inaccurate contributions.

This is an offshoot of zero-knowledge proofs, as ironically zero-knowledge is not required for these types of applications. Just the efficient verifiability part.

(Fwiw, I am working on large, scalable zk-proofs-of-computation in my spare time, in optimised software and with hardware acceleration, if anyone is interested in discussing this stuff.)


> To me it still feels counterintuitive and almost magical that any large, arbitrary computation result can be easily verified without repeating the computation, without the verifier needing much memory or data.

Why counterintuitive? That’s kind of all of cryptography and most of computer science. Take factoring into primes (which has been done for forever): it’s really time consuming and expensive to determine what the prime factors for a number are, particularly if it’s a big number and you know it only has two. That’s because division is very very difficult and time consuming. Multiplication on the other hand is super cheap so once you tell me the prime factors, I can confirm much more quickly whether or not they’re factors.
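
A toy illustration of that asymmetry (the numbers here are tiny, so the gap is only conceptual; at cryptographic sizes it becomes astronomical):

    # Verifying a claimed factorization is one multiplication;
    # finding the factors requires a search (trial division here).
    p, q = 104729, 1299709             # two known primes
    n = p * q

    def find_factor(n):
        d = 2
        while n % d:
            d += 1
        return d                       # slow-ish search for a divisor

    assert p * q == n                  # fast verification of the claimed factors
    print(find_factor(n))              # rediscovers 104729 the hard way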

In computer science, one of the earliest identified computation classes is NP complete which has this property. Eg traveling salesman and knapsack packing problem are examples. It can be insanely difficult to find a path that exists between two cities in a graph under some cost. But if you give me a solution I can easily confirm whether it meets the criteria (global optimality testing is itself NP complete but if you give me a set of solutions you can verify which one is the cheapest).

I’m not claiming that factorization is NP btw. There are complexity classes beyond NP that share this property. https://cstheory.stackexchange.com/questions/159/is-integer-...

Anyway. ZK proofs themselves are super surprising and not intuitive but not because verification is fast but because verification reveals nothing to the verifier about the solution. That’s the mind blowing result.


It's really different from the cryptography you're describing, although they do have the theory of NP in common. Mathematical cryptography depends on the assumed hardness of just a very few problems, notably factorisation of numbers with two or more large factors, and discrete logarithm in some groups. The particular problems are very specific.

What I find remarkable is that zk-proof-of-computation works for any kind of computation. On the face of it, it might seem that some computations would resist being compressible that way, but no, it works with anything that can be run on any real computer.

It doesn't depend on what kind of computation, so it has nothing to do with which program, how it's written, the complexity class (linear time, P, NP-complete, superexponential etc), or even on the size of the problem. It doesn't even depend on how much memory the problem requires. You can have a computation that requires terabytes or exabytes of RAM to compute, and the world's largest supercomputer running for a decade: The proof that the output is correct, no matter how much complexity went into calculating it, is still small and fast to verify.

But you still have to do the computation somewhere to get the proof. That's why it's called "argument of knowledge", because the entity constructing the proof must have access to ("knowledge of") the computation.

So it's still about feasible computations. Usual zk-proof-of-computation can't be used to prove things larger than there's a computer able to compute.

That boundary is different from cryptography (and P vs NP), which is more about verifiability of problems requiring exponentially larger time and/or space to solve if you don't have the secrets, so if the parameters are suitable, these are about infeasible computations by any physically realisable computer.

The connection is that that zk-proofs-of-computation are about making proofs of feasible computations, while ensuring it's infeasible to compute a false proof, or to find the secret inputs if there are any (there don't have to be).

(By the way, you may be thinking of discrete logarithm, not division. Division is not difficult. In finite fields such as those used in cryptography, division can be computed by a constant exponentiation using Fermat's Little Theorem, and exponentiation takes logarithmic time in the size of the field using a repeated-squaring method. Division is slower than multiplication, but not prohibitively so; it's used in elliptic curve operations. The hardness of factorising certain numbers is for a different reason than division.)


BitTorrent then, under this perspective, is "less massively, less redundantly replicated", since people only hold onto and seed the torrents whose files they store locally; they don't just go and download and seed everything out there. Seems like a nice compromise to me, but it obviously risks data becoming irrecoverable in some cases.

Is this also how IPFS works?


I don't know much about it, but I would assume that some effort would be made by miners to look in different parts of the search space - you don't want 1000 machines checking the same hashes, in order, after all.


How would you incentivize nodes long-term without a reward/token system?


Bittorrent basically runs on social norms, and people's willingness to help their community. Or people go to private trackers, where contributing is a requirement for being in the community and having access to its resources.

Blockchains are built under the assumption that everyone is selfish and untrustworthy. Which is a decent assumption when building a crypto currency, but that doesn't mean that every system has to run like that.


As much as I like private trackers, very few use the ratio-less model to protect against serial leechers.

Typically on a tracker you’re given a currency (although not as sound as some e-coins) and can use that to influence your upload or download statistics, which in turn affect your ratio. Some trackers might employ rules where your user class has to have a certain ratio, or else you’ll lose privileges like certain forums or even the ability to download at all. (The trackers are private and can control which peers you can see)


A token system can incentivize cooperation in a hostile environment with selfish nodes. But in a friendly environment you might not need the same level of incentives.

I see some similarity here to the world of private torrent trackers. You want a Linux ISO, I want a Linux ISO, we're all working towards the same goal. So we're already incentivized to cooperate, without getting money involved. And trackers also have things like minimum seeding ratios to keep people honest. In the case of AI, you and I both want to generate images, so we're also working towards the same goal, so let's help each other out so both of our workloads finish faster. Maybe idealistic, but I think it could work.


In the heyday of BitTorrent sharing was the incentive, and still is I imagine. We don't need to financialize everything.


That's more cryptocurrency than blockchain.

Blockchain as such has nothing to do with the costs of a node and incentives to run one


"That's just the nose, see the nose can exist discretely without the finger to pick it."


That's a different question. But BitTorrent does fine with simple tit-for-tat rules, no transferable tokens required.


Why not just "decentralized"? I'm not sure I ever saw a blockchain that was posited as "distributed".


Blockchain is a distributed ledger.


Looking into it, you're correct, but only insofar as we're referring to two different meanings of distributed. Bitcoin is a distributed ledger, as in "fault tolerant". But the consensus mechanism is decentralized. Thus, the decision-making is not distributed.

A distributed consensus mechanism would segment the decisions amongst nodes, not poll for a unanimous response.


This is a very articulate and interesting way to put it.

"Massively redundantly replicated"


While I understand where you are coming from, this argument is basically "money attracts attention from scammers". As a counter I would say "Money attracts attention, period" and attention is an important resource to foster growth.

Decentralized tech would never be where it is today if it weren't for investor attention and the potential for gains. We just have to separate the wheat from the chaff, and remain vigilant for bad actors.


What I mean is that the current signal-to-noise ratio is way too weak.

This created a lot of bubbles. NFTs are already down by a lot, and now yield farming (https://www.bloomberg.com/news/articles/2022-04-25/sam-bankm...) just took a big hit from the FTX case. I see way too many "revolutionary" projects from fresh graduates. There is no way that tens of thousands of inexperienced people with barely enough CS education to pass programming interviews would magically create innovation just because VCs put a ton of money on them.

Also, can you tell me more about where decentralized tech is today? BitTorrent was a revolution as a way of information sharing, Onion was a revolution for privacy and Bitcoin was a revolution for decentralized ledgers.

Starting from that, IPFS is the continuation of BitTorrent with more features and Ethereum is a more efficient (especially since The Merge) and customizable (smart contracts are advanced checkers for write operations) ledger.

But what are the real world applications of those technologies? What are concrete use cases of Ethereum and IPFS besides payments, records and file sharing?

Surely there is exciting progress to be made on the technical side, like zk-SNARKs, but how useful will it be to society?

I think we already have all the technical blocks we need. If there is no real-world adoption maybe we should just wait another 10 years before pumping crazy amounts of money.


*blockchain tech would never be where it is today…

I’ll bet blockchain is only as popular as it is because of the money. But other forms of decentralization like Mastodon or Matrix are pretty separate from the whole crypto sphere


Matrix is separate from the crypto sphere, because it solves a different problem.

Federated platforms appeal to the privacy-oriented "f** big tech" mindset, which is pretty common in the hacker & FOSS crowds. I'd put it in the same category as VPNs, E2E messengers and TOR.


VPNs are really useful for businesses to link different locations over the internet instead of using (awfully costly) dedicated links... or to allow remote work.

So VPNs are not really in the same category.


VPN as a technology, yes. But I think "VPN" in this discussion is referring specifically to the myriad of consumer-oriented paid solutions (SurfShark, NordVPN, whatever) that are pitched as being about protecting your online security, pirating with impunity, and bypassing region-locks.


Totally agree on that.

Big corps only invest in blockchain because of the buzz words that are used as marketing by the consulting firms to sell their "expertise" and by VCs to sell their companies.

Sure they hope to gain some money, like luxury brands wanting to sell to crypto-billionaires. But crypto was a useful toy, then Ponzi scheme and now it's a closed loop. How long will the bubble last?


The "chaff" and the bad actors are in it for the money. Without them, "decentralized tech" indeed wouldn't be where it is today -- meaning, it wouldn't be overwhelmingly associated with crypto-adjacent grifts.

The real decentralized tech, the one that serves a purpose other than emptying the wallets of naïve crypto-enthusiasts, does just fine without a profit motive. You don't need get-rich-quick promises to get an audience if you're actually doing something useful.


Where exactly is decentralized tech today?

Nobody around me ever uses any of it. Old p2p networks (gnutella, kademlia, emule) had way larger impact on society 20 years ago.


BitTorrent isn't getting any smaller. Mainline DHT is still 10x bigger than Bitcoin.


Email and the web are more decentralized than they look. Just think that different FOSS and closed source user agents and servers interoperate without any problem, especially for email.


They were talking about crypto. Email and the web were born out of research & universities, not the “potential to attract investors”.


And in non-web space, there's Matrix (https://matrix.org).


> Decentralized tech would never be where it is today if it weren't for investor attention

That’s exactly why blockchains haven’t found Product Market Fit.

Investors != Users


> Distributed File Sharing or computation without the whole tokenomics

They went hand in hand even back in the day: private torrent trackers were all about tokenomics, where the tokens were the number of bytes you've seeded (uploaded) minus the bytes you've downloaded.

I'm not saying it's impossible to imagine distributed file sharing otherwise, but to "guarantee" the availability of (especially unpopular) content, you need some incentive mechanisms either built in to the protocol or externally imposed.


From the project's readme:

>Please do not use the public swarm to process sensitive data. We ask for that because it is an open network, and it is technically possible for peers serving model layers to recover input data and model outputs or modify them in a malicious way. Instead, you can set up a private Petals swarm hosted by people and organization you trust, who are authorized to process your data.

This is what blockchain and staking tokens is for. (Part of the reason, at least)

You act maliciously, the network slashes your stake. "pinky promise not to do bad stuff" only goes so far... and it's really not far at all. You can trust "trusted" organizations or private individuals, but they have no incentives to ensure that the service works as intended, regardless of intent.


A blockchain does not magically solve security issues.

In fact, it adds traceability. And data stored in it can never be deleted. Just to name a few issues.


> A blockchain does not magically solve security issues.

This is a weird statement. Blockchain security is real and it isn't "magic". Blockchain is specifically designed to secure decentralized applications.

> In fact, it adds traceability. And data stored in it can never be deleted. Just to name a few issues.

These aren't issues, these are part of the security model. Traceability is fine here because everything is pseudonymous, if you want to avoid that use a chain that has untraceable transactions with zero knowledge proofs (zero traceability).

> And data stored in it can never be deleted. Just to name a few issues.

Storing data on blockchain is extremely expensive. Only hashes are stored on chain, not the data itself. Hashes are much different from encryption because they're irreversible.


> A blockchain does not magically solve security issues.

No, but staking is certainly an improvement over "pinky promise", and it requires a public blockchain.

> issues

I'm fairly sure those are features, not issues. You are free to disagree.


Are you arguing for the processing of sensitive data on a public proof-of-stake blockchain?

First, automating the detection of malicious acts against sensitive data seems pretty difficult. So this can't be implemented to systematically occur, and has to be determined after the fact by an investigation. Then, if a malicious act has been detected, the stake is slashed (and the acts are reverted where possible).

Is my understanding sound so far?

Because this would mean in any case where a slashed stake is considered an "acceptable cost" to the bad actor, then the sensitive data is fairly accessible -- the stake is effectively a paywall. And raising the stake is a difficult decision because higher stake means less actors and higher risk of collusion.

I mean this is probably fine for a very large public blockchain where detecting malicious acts is not as difficult or where the malicious act is not very profitable, but sensitive data can, depending on its nature, be extremely profitable to exploit (and as I stated, I don't see how it could be easily detected).

With sensitive data, "trusting" an organization only means having a legal agreement or strategic alliance with a third party. In these circumstances the consequences are usually more serious for the malicious actor than the loss of an arbitrary amount of money.

I've seen suggestions to do sensitive (e.g. medical) data processing on the Ethereum blockchain from some enthusiasts, and I have never been able to understand this beyond assuming they have an insufficient threat model in mind for this kind of data.


I agree that unfortunately a lot of crypto projects are way too tokenomics centered, instead of utility centered.

BitTorrent style projects are far more restrictive for a lot of applications though. If something is without cost, then it becomes open to abuse.

Take domain names for instance. I would love to have a decentralized name registry, so that no country have censorship power on the _whole_ internet, as we've seen with recent US intervention at the tld level.

DNS is a good example because it's quite trivial to implement with a plain old DHT. The problem though is how do you prevent scammers and squatters in this model?

There needs to be a cost to writing to a distributed database; otherwise, after a year it will be fully squatted, used as free hosting, used to store illegal content, DDoS'd for fun, etc.

How do you set this cost, though, while keeping the distributed nature of this database? The simplest solution is to let the users decide, via the price of a token, sold by people running nodes and bought by people using the service.

Honestly I love this idea. The problem with crypto currently is that a whole bunch of parasites jump on these tokens to speculate on their price without giving a.. about the underlying utility. This completely screws the price optimum and creates an inflated price bubble, in turn preventing adoption.


> ... bunch of parasites jump on these tokens to speculate on their price without giving a.. about the underlying utility. This completely screws the price optimum and creates a inflated price bubble, in turn preventing ...

We have exactly the same problem with real life systems like food, raw materials and real estate.


Not quite as much, I would say, because crypto tokens are priced based on over-speculation about their potential long-term explosion rather than their more down-to-earth utility function. This is because their current utility is still to be discovered.

Take the DNS example for instance, this was implemented on Ethereum by "ENS", but the price of ETH/gas at the time made a single ".eth" domain name cost something like $500.


> Not quite as much I would say

Way more in terms of money involved, just slower.


Imho the "print-your-own-money" siren call is only one of the aspects that hampered the whole blockchain world from delivering the disruption it so much craved. The core architectures themselves are somehow too overengineered for broad applicability. Maybe that is what was needed to support the digital gold use case, but it is manifestly not needed for all sorts of other very relevant decentralized applications (bittorrent, fediverse, messaging, email etc)

It's a moot point whether the whole crypto/blockchain period was a net positive. It certainly made a noisy case for "re-decentralization" given the very real and mostly harmful status quo. One could also argue that it diverted vital resources to potentially dead-end or limited use areas. The recurrent scams may also give decentralization a bad name with an uninformed public that can't distinguish all the different versions.

What matters next is that projects that deliver real benefits to users get attention and traction. Worth keeping in mind that the real trouble starts when you get noticed by vested interests as a potential threat.


While superficially similar, BitTorrent and a blockchain are inherently different designs that target different problems. Blockchain is massive data replication, BitTorrent is massive data distribution (with some replication too).

That's why you can actually attack and shut down a bittorrent network by targeting the index servers, which are not massively replicated. E.g., The Pirate Bay is often down.

As a solution for this, I'll shamelessly plug my small project here, which combines bittorrent with the blockchain as an invulnerable Piratebay-like bittorrent index server, called Blockchain Bay: https://github.com/ortegaalfredo/blockchainbay

It's command line, and doesn't use any tokenomic scams. You pay the blockchain only for the data you need to upload, which is fortunately very little, as bittorrent magnet links are very small.


Well-designed tokenomics with strong utility are incredibly powerful tools to not only incentivize usage, but also direct governance and reissue profits as dividends to participants.

Of course its abused by shady operators out to make a quick buck, but issuing tokens, when done right, is a great innovation by itself.


https://ipfs.io sans the filecoin aspect that creates incentives for long term/proven storage of data is what you are likely asking for.


Anyone aware of an open source token-based system that allows users to pool hardware assets, but allow some sort of priority and fairness enforcement to reduce network abuse?


One can't scale without the other.


I sometimes tend to forget that you can use decentralization without all this crypto stuff.


So does literally the entire “web 3” movement.


Hm, so the employees never turn off their personal computers, please, and I have a chatbot? Makes sense.



