Petals: Run 100B+ language models at home bit-torrent style (github.com/bigscience-workshop)
594 points by antman on Jan 2, 2023 | 155 comments



So NVLink/NVSwitch pools multiple GPU resources on a single (very expensive) system. A cheaper alternative to that is "offloading", a technique that splits the inference process into smaller steps so it can run on systems with far fewer resources available... and Petals is a 10x faster alternative to that.

Did I get that right?

This AI stuff is moving very fast and it's hard to keep up, but it's all fascinating.


You're right. This comment explains offloading in more detail: https://news.ycombinator.com/item?id=34216213


Offloading is when the computation is done on the CPU instead of the GPU. DeepSpeed is an example of this.


In case of offloading, the computations are usually still performed on GPU, but the model is hosted in RAM/SSD instead of the GPU memory (and its chunks are copied to the GPU memory when necessary).


A lot of computation is offloaded to the CPU, such as gradients and optimizer states. You are right though that quite a bit of computation is still done on the GPU.


I remember when GPUs were starting to support arbitrary computation and offloading meant shifting work away from the CPU.


Matches my understanding also. Can someone in the know confirm?


Your understanding is correct, but I can't vouch for the claim's accuracy. This could make running these models much more accessible to people who don't have 4x RTX 3090s or better in an ML or mining rig...


Is there an easy way to run a large language model and/or speech synthesis model locally or in Colab? Stable Diffusion is easily accessible and has a vibrant community around AUTOMATIC1111; it's super straightforward to run on a Google Colab. Are there similar open source solutions for LLMs/TTS? I believe I had GPT-2 running locally at one point, as well as ESPNET2? Not 100% sure, it's been a while. Wondering what the state of the art for FOSS neural LLMs and TTS is in 2023.


For LLMs, the closest thing that comes to mind is KoboldAI[1]. The community isn't as big as Stable Diffusion's, but the Discord server is pretty active. I'm an active member of the community who likes to tell others about it (you can see my previous Hacker News comment was about the same thing, haha).

Like Stable Diffusion, it's a web UI (vaguely reminiscent of NovelAI's) that uses a backend (in this case, Huggingface Transformers). You can use different model architectures, from ones as early as GPT-2 to newer ones like BigScience's BLOOM, Meta's OPT, and EleutherAI's GPT-Neo and Pythia models, as long as they're implemented in Huggingface.

They have official support for Google Colab[2][3]; most of the models shown are finetunes on novels (Janeway), choose-your-own-adventures (Nerys / Skein / Adventure), or erotic literature (Erebus / Shinen). You can use the models listed or provide a Huggingface URL.

[1] - https://github.com/koboldai/koboldai-client (source code)

[2] - https://colab.research.google.com/github/koboldai/KoboldAI-C... (TPU colab; 13B and 20B models)

[3] - https://colab.research.google.com/github/koboldai/KoboldAI-C... (GPU colab; 6B models and lower)


I'm squarely in Kobold's userbase but hadn't come across it until your post, so thanks for your efforts to spread awareness.


There is (for many, but not all, large models). Specifically, there is Huggingface's accelerate library, which lets you run the model partially on your GPU and partially on CPU/RAM, while what doesn't fit in RAM is cached in NVMe storage (a mirror of two fast drives is recommended).

I didn't have much luck with stock accelerate, but once the GPU is disabled (so it runs only on the CPU, offloading to NVMe storage where RAM is insufficient), it worked pretty well for me. (There is a small code change that has to be made, as the stock software refuses to run without a GPU; it's a simple change described in its GitHub issues.) My GPU has 8 GB of VRAM, but this way I managed to run 7B-parameter models. In principle I could run much larger ones, but of course it takes a lot more time. The 7B BLOOM takes 90s for one inference and an additional 60s to load the model (from a spinning-disk array) initially.
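
For reference, a minimal sketch of that setup, assuming recent transformers and accelerate are installed; the 7B BLOOM checkpoint name is my assumption based on the Hugging Face hub, and the offload folder is just an example path (ideally on NVMe):

    import torch
    from transformers import AutoModelForCausalLM

    # device_map="auto" lets accelerate place layers on the GPU first, then CPU RAM,
    # and finally spill whatever is left to disk in the "offload" folder.
    model = AutoModelForCausalLM.from_pretrained(
        "bigscience/bloom-7b1",        # assumed checkpoint name for the 7B BLOOM mentioned above
        device_map="auto",
        offload_folder="offload",
        torch_dtype=torch.float16,
    )

After loading, model.generate() works as usual, just slower, because the offloaded weights stream in from RAM/disk on each forward pass.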


Really large (GPT-3-sized) language models have many more parameters than diffusion models, so it's difficult to load them locally unless you have a server with 8x 3090 / 3x A100 GPUs. Petals is the only way to fine-tune and run inference on 100B+ parameter models from Colab, as far as I know.
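
For the curious, here is a rough sketch of what calling the public swarm from Python/Colab looks like; the import path and the "bigscience/bloom-petals" checkpoint name are assumptions from my reading of the Petals README and may have changed:

    from transformers import AutoTokenizer
    from petals import DistributedBloomForCausalLM   # assumed client class from the Petals README

    MODEL = "bigscience/bloom-petals"                # assumed name of the public-swarm checkpoint
    tokenizer = AutoTokenizer.from_pretrained(MODEL)
    model = DistributedBloomForCausalLM.from_pretrained(MODEL)

    # Each transformer block runs on a remote server; only small activation tensors
    # travel over the network, never the 100B+ weights.
    inputs = tokenizer("A cat sat on", return_tensors="pt")["input_ids"]
    outputs = model.generate(inputs, max_new_tokens=5)
    print(tokenizer.decode(outputs[0]))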


Interesting, how does that work with multiple GPUs? I'm not familiar with the internal workings of these models; is there anywhere I can get a brief rundown of how the processing is split? I imagine there can't be much swapping between GPUs, as that seems prohibitively slow. How is the model split such that it can be worked on in parallel by multiple GPUs without being bottlenecked by IO?


I think this is a relevant link for you: https://huggingface.co/transformers/v4.9.0/parallelism.html

For large LMs, people usually use tensor-parallelism (TP) or pipeline-parallelism (PP). TP involves lots of communication, but uses all GPUs 100% of the time and works faster. PP requires much less communication, but may keep some GPUs idle while they are waiting for data from others.

Usually, TP is used when you have good communication channels between GPUs (e.g., they are in one data center and connected with NVLink), while PP is used when communication is a bottleneck (like in Petals, where the data is sent over the Internet, which is much slower than NVLink).
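
A toy sketch of the pipeline-parallel idea, assuming two local CUDA devices are available; it only illustrates that activations (not weights) cross the stage boundary, and is not Petals code:

    import torch
    import torch.nn as nn

    # Split 8 transformer blocks into two pipeline stages on different devices.
    blocks = [nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
              for _ in range(8)]
    stage0 = nn.Sequential(*blocks[:4]).to("cuda:0")
    stage1 = nn.Sequential(*blocks[4:]).to("cuda:1")

    x = torch.randn(1, 16, 512, device="cuda:0")  # (batch, seq, hidden)
    h = stage0(x)                                 # runs on device 0
    h = h.to("cuda:1")                            # only ~32 KB of activations move
    y = stage1(h)                                 # runs on device 1
    print(y.shape)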


You can split the model across devices with huggingface accelerate library.

Check out the infer_auto_device_map method, which will work out a placement for your configuration (multi-GPU, RAM, NVMe), and then run dispatch_model with that device map.
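
A minimal sketch of that flow, assuming accelerate and transformers are installed; the memory limits are made-up numbers for an 8 GB GPU, and the checkpoint is the 7B BLOOM mentioned elsewhere in the thread:

    import torch
    from accelerate import init_empty_weights, infer_auto_device_map
    from transformers import AutoConfig, AutoModelForCausalLM

    checkpoint = "bigscience/bloom-7b1"
    config = AutoConfig.from_pretrained(checkpoint)

    with init_empty_weights():                       # build the model skeleton without allocating weights
        empty_model = AutoModelForCausalLM.from_config(config)

    # Decide which blocks go on the GPU, which stay in CPU RAM, and which go to disk.
    device_map = infer_auto_device_map(
        empty_model, max_memory={0: "8GiB", "cpu": "24GiB"})

    # from_pretrained then dispatches the real weights according to that map;
    # anything that fits nowhere is offloaded to the folder below.
    model = AutoModelForCausalLM.from_pretrained(
        checkpoint, device_map=device_map,
        offload_folder="offload", torch_dtype=torch.float16)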


You can read all the gory details here: https://arxiv.org/pdf/2207.00032.pdf


clarification: You can also use offloading on Colab, but inference with offloading is at least 10x slower (see other comment threads). So it can't really be used for interactive inference, but may be used for fine-tuning with large batches/sequence lengths.


Surprised no one mentioned GPT-J! Here is a colab link: https://colab.research.google.com/github/NielsRogge/Transfor...

Although you need a premium GPU. I admit it's not as good at zero-shot or one-shot as GPT-3, but if you provide examples, you can get output that's just as good. I feel like the team behind it needs better marketing.


Nice, this looks pretty good. I have Google Colab Pro+ so I can use the 40 GB GPUs there. Am I correct that I could also run this locally on 2x 11 GB 1080 Tis?


Not sure about TTS, but I've trained GPT-2 (a PyTorch implementation, I think) on my own data and it worked pretty well. I also tried EleutherAI's 6B model but couldn't figure out how to run it. As for an "easy way", I don't think a user interface like what Stable Diffusion has exists as of now.


This is a nice effort if it allows you to run BLOOM 170B at 1s per token. Just for comparison's sake: with a last-gen Ryzen CPU (16 cores), it takes me about 90s to run the model with 32 GB of RAM (the model spills a few GB to NVMe storage too, as 32 GB isn't enough RAM).

However, I wonder how they prevent abuse. The main page doesn't mention it. As they mentioned blockchain, I suspect there will be some sort of credits implemented. I'll definitely be watching where this project goes.

Edit: just to clarify, the 90s is not for the 170B-parameter model, it's the 7B BLOOM version. I forgot to mention that, and it puts the ability to run a 170B model at 1s/token in better perspective.


A Petals dev here. At the moment, we're working on a centralized incentive system, no blockchain involved. It will award points if someone is running a server that consistently stays online and returns correct results. Then, users will be able to spend these points for prioritized inference and (maybe) extra features like increased sequence length/batch size limits. This way, the swarm will prioritize people who actually contribute compute and serve others in the remaining idle time.


Is it possible to have the server shut down predictably when it finishes tasks periodically, and not get penalized? I would like my machine to run while I'm not using it.


Sure! People who disconnect for a while (not necessarily predictably) won't be penalized - it's okay if you suddenly decide to use your GPU for something else, then get back to running a server.


Maybe you could get Bram Cohen to work on this. Seriously, reach out to him, he loves to work on these game theory sorts of things.


I’d say his reputation suffered quite a bit after the whole Chia (that proof of SSD thrashing coin) BS.


This sounds (very narrowly) similar to the Enigma network, a blockchain-based technology that can be used for fully encrypted multi-party computation (MPC). It was one of the earlier blockchain projects that actually had an interesting use case and technology in this quite "overhyped" space. They rebranded to the Secret network [0] a few years back, and somehow I don't see this use case/promise anymore nowadays... the website screams all of the Web3 BS buzzwords, it seems :(

[0] https://scrt.network


Well yeah, the whole movement is founded in deliberate ignorance of all the existing, _working_ solutions we already have. Also, apparently none of them watched the HBO comedy Silicon Valley.


Just shows nobody actually needs blockchain even for decentralized systems


So this sounds like BOINC but specifically for language neural nets?

It's a very interesting concept, and I quite like the idea of a public, open compute cloud. I'd like to see more detail on security: if I'm going to donate time on my personal machine, I'd like some assurance that the workload is properly sandboxed and can't reasonably access my network or data.

Mostly out of interest, what's the advantage to this over just using the existing BOINC network? I've been running BOINC on and off since the dialup days, it's an extremely mature platform with all kinds of workload capabilities.


During the training, participants only exchange tensors (embeddings, gradients) and never send code to each other. No other peer can execute arbitrary code on your computer - they can only request you to run one of the pre-defined BLOOM layers. You can further isolate the Petals server from your machine by running it in a Docker container (see the command in the repo).

A client needs to communicate with multiple servers in a specific way to run the model, I'm not sure our communication model can be implemented with BOINC.


I think it is a great start, and I suppose there will be many iterations to ensure fair usage.

It would be interesting to reach a point similar to Docker, where you don't need to load every layer again, only your model-specific layers. The shared layers would already be loaded, so running multiple models at once would consume less GPU memory.


What an interesting concept - also makes me wonder how BitTorrent could be used for more de-centralizing of data, while keeping it accessible on-demand.


Sounds just like https://ipfs.tech/


rather, ipfs sounds just like bittorrent


It has a unique identifier for data (their main feature), a naming system, and some other features, which makes it quite different from bittorrent (as far as two p2p data-sharing networks can be different, obviously).


Right, but one works and has widespread adoption. The other does not. And they certainly cover similar ground.


I’ve always wanted to download an IPFS node and run it on my PC in the background, but I’m worried it will wear down my hard drives.


I tried the chat on http://chat.petals.ml, and it seems to struggle with the current load (as per the disclaimer at the top)

    Human: How is the weather today?
    
    AI: the
    AI theAI)aultAIAI ) course )
    . can?esterday to people?
    ? is to think thatified )
Really cool project though, I wanted to work on something similar.


It replied to me with:

  It is nice today.
Not garbled, but also extremely shallow.


To be fair, so was the question.

This is a language model, not an oracle or an interface to weather forecast data.


It varies from time to time. You can also switch to the few-shot mode to try machine translation, code generation, or other tasks involving longer responses.


Won't even load for me atm


> Fine-tuning and inference up to 10x faster than offloading

What is "offloading" in this context?


Offloading is another popular method for running large LMs when you don't have the GPU memory to fit the entire model. Imagine you have an A100 GPU with 80 GB memory and want to generate text with BLOOM, a 70-block transformer model with ~2.5 GB of weights per block. For each token, offloading will load the first 1/3 of the model (~27 blocks) from RAM/SSD to your GPU memory, run a forward pass through them, then free the memory and load the next 1/3, and so on.

It turns out, Petals is faster than offloading even though it communicates over the Internet (possibly with servers far away from you). That's because Petals only sends NN activations between servers (a small amount of data), while offloading copies hundreds of GB of NN weights to GPU VRAM to generate each new token.
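
To make that concrete, some back-of-the-envelope arithmetic using the numbers above; the bandwidth, hop count, and latency figures are assumptions for illustration, not measurements:

    # Rough per-token cost: offloading moves ~175 GB of weights to the GPU,
    # Petals moves only tiny activations but pays Internet latency per hop.
    blocks, gb_per_block = 70, 2.5
    weights_gb = blocks * gb_per_block            # ~175 GB per generated token

    ram_to_gpu_gb_s = 16                          # assumed effective RAM->GPU bandwidth
    offload_s = weights_gb / ram_to_gpu_gb_s      # ~10.9 s/token

    hops, rtt_s = 10, 0.1                         # assumed servers in the chain, latency per hop
    petals_s = hops * rtt_s                       # ~1.0 s/token; activation transfer is negligible

    print(f"offloading ~{offload_s:.1f} s/token vs. Petals ~{petals_s:.1f} s/token")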


Interestingly it sounds like offloading could be made quite efficient in a batch setting if you primarily care about throughput rather than latency. Though I guess for most current LLM applications latency is quite important.


It's mentioned in their paper: https://arxiv.org/pdf/2209.01188.pdf

  Several recent works aim to democratize LLMs by “offloading” model
  parameters to slower but cheaper memory (RAM or SSD), then running them
  on the accelerator layer by layer (Pudipeddi et al., 2020; Ren et al.,
  2021). This method allows running LLMs with a single low-end accelerator
  by loading parameters from RAM just-in-time for each forward pass.
  Offloading can be efficient for processing many tokens in parallel, but
  it has inherently high latency: for example, generating one token with
  BLOOM-176B takes at least 5.5 seconds for the fastest RAM offloading
  setup and 22 seconds for the fastest SSD offloading. In addition, many
  computers do not have enough RAM to offload 175B parameters.


Is a mobile device / edge device a possible participant / source of resources?


What a fascinating concept. I guess this won't be useful for any kind of realtime feedback system, though?


A Petals dev here. It is not real-time, but we think the speed of ~1 token/sec may be enough for some interactive apps such as chat bots (especially, if you show tokens to a user once they are generated). You can try one at http://chat.petals.ml (heads-up: it may be laggy right now due to lots of HN users trying out the system).

Of course, you could do better if you have enough high-end GPUs to host the entire model yourself (3x A100 or 8x 3090). But if you don't, 1 token/sec is much faster than what you get with other existing methods.


I have not read the technical details, apologies for ignorance, but is there an opportunity for caching?


Probably not, since you need to compute the activations of unknown inputs and there could be infinitely many variations of them


What are the speeds of other existing methods?


Theoretical best-case for RAM offloading is 5.5 sec/token, for SSD offloading - 22 sec/token. Implementations we've tested are not faster than 10 sec/token though. See details in our paper: https://arxiv.org/pdf/2209.01188.pdf


Why not?


How would one make a reliable realtime system that depends entirely on unknown network conditions? Perhaps inside a closed network it is possible.


That's orthogonal to a realtime system. You can infer at a fair speed, so realtime would be possible.


Guarantees are not orthogonal to realtime feedback, they are essential. If I write a query, it is not irrelevant whether it takes 1 second or 1 minute to return at any given moment.

You write that speed can be inferred, but the analogy that was used here is BitTorrent—and my experience with BitTorrent tells me that it certainly cannot be inferred.


If you read the article text and the response from the dev, then yes, inference can happen at 1 token/s or, if parallelised, more. I'm not sure what your parameters are for a realtime system. If you're talking about network reliability, that's a different issue. Yes, it can infer quickly; whether it can do so reliably is another matter.


Anyone participating in the swarm is able to potentially log the tokens that get processed by their node. Obviously a security concern. Is there any way to implement homomorphic computing to securely process the tokens?


A Petals dev here. Indeed, the public swarm should not be used for any kind of sensitive data (we have warnings about that in the instructions). If someone wants to process such data, we recommend setting up a private swarm among the orgs they trust (e.g., a couple of labs/small companies who don't have many GPUs themselves may set up a private swarm and collaborate to process their datasets).

Regarding homomorphic encryption (HE), I'm afraid the current methods to run neural networks in the HE fashion involve 10-100x slowdown, since they are mostly not designed for floating-point operations. We'd love to find a way to do it faster though, since privacy is obviously an important issue for many tasks.


Hi there, thanks for taking the time to answer questions! There are numerous use cases where even a 100x slowdown would be acceptable if it was demonstrably able to process sensitive data. Can you help me understand what kind of a slowdown that is? Could the 10-100x slowdown be overcome by more compute nodes, or would it require the nodes themselves to be 10-100x faster, for example?


If someone wants to process sensitive data and is okay with 10x slowdown, it's better to use offloading. This is another, slower method for running large LMs locally without high-end GPUs, see details here: https://news.ycombinator.com/item?id=34216213

In other words, if Petals nodes became 10-100x slower, Petals would lose its competitive advantage over simpler methods that don't communicate over the Internet.


Is the MIT license that this uses compatible with the RAIL license that Bloom uses? Or are there not issues with that?


BLOOM is a large LM, and Petals is a tool for running large LMs (not necessarily BLOOM). People using Petals should still follow the model's terms of use regardless of how the tool is licensed.


Thanks for the clarification


Any plans for releasing an API spec that would allow for access from languages other than Python?


There's a lightweight HTTP API for inference: https://github.com/borzunov/chat.petals.ml#http-api-methods


Are there basic stats on real-time contributors and latency?


What's the point?

So you can get predicted text that looks "coherent". Then what?

There is literally no place to add logic. Neural net-based language models are impressive, sure, but it's not hard to see how useless they are.

The only time their output is logically coherent is when they are lucky, and that seems to happen often because most of their input was logically coherent to begin with.


Whether or not the current technology is useless is an empirical question. How many people are using ChatGPT, Stable Diffusion, etc. for economically or personally valuable activities? We actually don't know.

Even if we assume the technology is useless in its current state, it is still incremental progress. Could we have predicted 10 years ago what neural networks would be capable of today? Now, tell me what neural networks will be doing in 10 years. If you think you know the answer with any degree of certainty, you're probably deluded.


My point is that ML-based NLP (like chatGPT) has a clear ceiling, and we seem to have reached that.

We can get coherent (understandable) output all day long, but we can never introduce logic.

ML-based NLP is a semantic word-guessing machine. It's based entirely on how often words show up near each other in the training datasets. There is no room to add logic.

The entire exercise is like a magic trick: impressive sure, but at the end of the day, a fool's errand.


You make very strong claims about things we know very little about. It's far from clear that we have reached a ceiling. Who can predict how systems with 10x the parameters and as-yet-undiscovered deep learning models will behave?

We don't understand how humans do logic. It's entirely possible that whatever structure in the human brain is responsible for handling logic can emerge in a neural network.

If we're talking about what it takes to get to true AGI in the near future, then I agree that a pure neural network approach might not cross the finish line first. I think Stuart Russell made this point in an interview, basically saying that a neural network is a very inefficient computational model and that we could do the same thing much more efficiently if we had the right "good old fashioned AI" algorithm. But fundamentally a neural network is just computing a function so there's nothing in principle preventing a neural network from doing whatever a symbolic system does. It's mostly a matter of efficiency and hardware availability.


But we do know plenty about it. It's right there in front of us. Pretending there is some understanding just out of reach is called mysticism.

What you are telling me is that I should place my expectations for the future, not on the reality in front of me, but on the hopes and dreams you have for the future. That's circular reasoning.

The very reason that I don't place credibility in your assertions is the lack of reason itself: in your assertions, and in what a neural network is.

Neural networks are like dreams. Wonderful only when your intention is to get lost in a swirl of memories. Useless if you want to actually accomplish something.

Knowing the difference is crucial, because that difference can never be taught to a neural network without completely redefining what a neural network is in the first place.

Knowing the difference is literally the thing neural networks are incapable of doing. They don't know anything. They just guess. That's literally the function. In the code. Guess what comes next.

There is no sense pretending sense itself will magically appear out of a guessing machine. Neural networks are nonsense generators, and that is what they are forever doomed to be.


You made two claims: (1) current language models are useless and (2) current language models have reached a ceiling. I said:

> How many people are using ChatGPT, Stable Diffusion, etc. for economically or personally valuable activities?

If (1) is true, then the answer to that question is "zero" or at least "close to zero". Do you really believe that?

If (2) is true, then it is also true to say that transformer models will never exceed today's capabilities by a significant amount at any time in the future. Do you really believe that?


Yes. The ceiling is the floor.

The limitation is inherent in the core design. There is no overcoming. This is not a hurdle or a wall. It's a design flaw.

Is it totally useless to everyone? No. Not completely. It's like a coherent search engine: a way to find data that is close to other data. But "close to" in this case is only "semantically", and never "logically", so that's that.

Is it going to get any less useless than it is? Only slightly. "It" will never get better. The only better version of "it" is a completely new ground-up redesign that doesn't resemble "it" at all.


Modern neural network architectures are Turing complete [1]. So I don't see any argument for a limit in principle unless you are arguing that a Turing machine can't achieve language understanding. If that's what you're saying, then I wonder who is espousing mysticism here.

[1] https://arxiv.org/abs/1901.03429


Are you forgetting the distinction between a program and a computer?

Language understanding doesn't magically spawn itself as a process on your computer! Someone has to write that program first.

And that's my point. ChatGPT transforms language, but it does not understand it. For that, we will need a different kind of program.


> Language understanding doesn't magically spawn itself as a process on your computer! Someone has to write that program first.

Do you think it's impossible for such a program to emerge as weights in a Turing complete neural network architecture?


Play with https://chat.openai.com/ to experience how powerful predicting text is.


I have.

And as I said, it's very impressive.

And it has some usefulness: essentially it's an alternative to reading through many pages/posts of StackOverflow and Wikipedia.

But it doesn't know anything. It has no clue whatsoever whether it is correct or incorrect. It only makes guesses. The only reason there is useful output is because that output is a transformation of useful input.

There is no logic. There is no way to introduce logic. There is no way to filter it through logic.

If some coherent mixture of the ML's training datasets already contains the answer to your question - like literary or code examples, definitions, etc. - then the output will be useful. Otherwise, it's just wrong, and sometimes unexpectedly so.

The output of chatGPT (or any other ML-based NLP) can only be as correct or knowledgeable as the data it is trained on; and it will practically never even match that level, because it is only mixing words by semantic popularity, never by logical relationship.


Chat bot interfaces are only a small part of what can be done with large LMs.

You can use and fine-tune them to solve almost all existing natural language processing tasks: machine translation, recommendation/search, text classification and summarization, code generation, etc.


False.

You can use them to transform already existing text and code (the training datasets); but you can never do more than that.

There is no room in the ML algorithm to introduce logic. It's doomed to forever be a guessing game; and the resulting guesses will always be limited by the information it is fed to begin with.

The only reason chatGPT is so impressive is that it is transforming human conversation that itself is impressive (except that we were already aware of it). The code generation, literature, and definitions, etc. it outputs are all just rephrasing the written code, literature, and definitions that it was given as training data.

It's effectively no more than a sleight-of-hand. Flashy and impressive, but never anything more.


You should read this: https://ai.googleblog.com/2022/11/characterizing-emergent-ph... .. and probably also the paper.

I find these emergent phenomena pretty interesting.


They are missing the forest for the trees.

The "emergent phenomena" can be trivially explained by the input they are giving it.

They are not using a dataset that contains an equal amount of "correct" and "incorrect" responses. They are using datasets of human communication, which are obviously filtering for "correct" data. We get things wrong occasionally, but that is quite rare relative to what we get right. We can't even structure a sentence without getting something correct!

If you feed a dog good food, is it really a surprise that dog is healthy? You never fed it poison!

The language model is only returning semantic relationships. The "emergent phenomenon" is that most semantic relationships in human communication just happen to also be logical relationships.

But the language model doesn't know that. In no way does it interact with logic. It only interacts with semantics.

If anyone actually bothered to train an instance of GPT or whatever on poisoned data (i.e., nonsensical stories), then you would see those emergent phenomena disappear. But no one is writing the nonsensical stories in the first place, so such a dataset does not exist.


You're at least a few weeks behind the state of the art.


Can anyone get it to write code? It just says it has written the code to the file system when I prompt it.


You can switch the chat bot (http://chat.petals.ml) to the "few-shot mode" and provide a couple of "task description & code" examples. Then you can add a new task description and it'll respond with code.

The underlying LM, BLOOM, had a few programming languages in its dataset, so it works at least with Python and C++.
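
For example, a few-shot prompt in the "task description & code" style could look like the sketch below; the exact delimiters the chat UI expects are an assumption on my part:

    import textwrap

    # Illustrative few-shot prompt: two worked examples, then the new task.
    few_shot_prompt = textwrap.dedent("""\
        Task: reverse a string in Python
        Code: def reverse(s): return s[::-1]

        Task: compute the factorial of n in Python
        Code: def factorial(n): return 1 if n <= 1 else n * factorial(n - 1)

        Task: check whether a number is prime in Python
        Code:""")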


Could this approach be used on other types of models, such as image models eg stable diffusion?


There is probably less motivation, as these models are much smaller. An M1 Mac can run inference in under a minute, a GPU in as little as 3-5 seconds. This is supposed to get as much as 20x faster when the distilled Stable Diffusion model is released.


People nowadays will use 100 GB in VRAM to run a model that taught itself how to do quicksort.


I guess that it could be used to create a private swarm, if one has a lot of hardware at home.


So... How can I mine crypto with this? :)


What is the copyright status of these models?


Petals runs BLOOM, an open-source, publicly released model of the same size as GPT-3. Here's a description of the data used to train this model: https://huggingface.co/bigscience/bloom (the "Training" section)


BLOOM is not open source. It has the RAIL license, which exists solely to place restrictions on the use of the software, as well as forcing people to update. Read more: https://bigscience.huggingface.co/blog/the-bigscience-rail-l...


Thanks. It contains copyrighted material and is therefore illegal to use.


exciting, will surely check the git repo


who is working on blockchain + web3 + AI inference => Decentralized AI besides me ?


Increasingly fewer people are interested in scamming people after the FTX debacle, from what I can tell


Can you elaborate on what you are doing?


I have been interested in 2 of those 3. Where are you working on them?


I would love for most of the Blockchain trend to be converted into efforts towards BitTorrent-style projects.

Distributed file sharing or computation without the whole tokenomics aspect that, while interesting, draws too much attention from scammers.


That's not going to happen, because "distributed" was always a misnomer when it came to blockchain things. "Massively redundantly replicated" would be better. If work is distributed, every participant has a little piece of the work to do, but in e.g. blockchain contracts, all the participants need to do the whole calculation.


That's changing. The high replication to verify untrusted peers is decreasing and there's a realistic prospect of it going away due to gradual adoption and development of new zk-proof techniques.

In zk proofs-of-computation-result, different nodes can perform different intensive parts of a calculation and send the results along with proofs that those are the correct results. Other nodes can accept the results and verify the proofs with remarkable efficiency, then use those partial results for further calculations. To me it still feels counterintuitive and almost magical that any large, arbitrary computation result can be easily verified without repeating the computation, without the verifier needing much memory or data.

For cryptocurrency blockchains this allows smart-contract (computational) transactions to be accepted with only one node having to execute the code, everyone else just efficiently verifies the proof to accept the state change. As proofs can be aggregated, this scales well: it isn't necessary for every node to run all the verifications, either.

For big, distributed calculations like the article's, the whole calculation can progress using those partial results without having to rely on trust and reputation, and everyone can have high confidence that the final result is what it should be, not undermined by subterfuge or subtly inaccurate contributions.

This is an offshoot of zero-knowledge proofs, as ironically zero-knowledge is not required for these types of applications. Just the efficient verifiability part.

(Fwiw, I am working on large, scalable zk-proofs-of-computation in my spare time, in optimised software and with hardware acceleration, if anyone is interested in discussing this stuff.)


> To me it still feels counterintuitive and almost magical that any large, arbitrary computation result can be easily verified without repeating the computation, without the verifier needing much memory or data.

Why counterintuitive? That’s kind of all of cryptography and most of computer science. Take factoring into primes (which has been done for forever): it’s really time consuming and expensive to determine what the prime factors for a number are, particularly if it’s a big number and you know it only has two. That’s because division is very very difficult and time consuming. Multiplication on the other hand is super cheap so once you tell me the prime factors, I can confirm much more quickly whether or not they’re factors.
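
A toy illustration of that asymmetry (the numbers here are tiny, so the gap is only conceptual; at cryptographic sizes it becomes astronomical):

    # Verifying a claimed factorization is one multiplication;
    # finding the factors requires a search (trial division here).
    p, q = 104729, 1299709             # two known primes
    n = p * q

    def find_factor(n):
        d = 2
        while n % d:
            d += 1
        return d                       # slow-ish search for a divisor

    assert p * q == n                  # fast verification of the claimed factors
    print(find_factor(n))              # rediscovers 104729 the hard way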

In computer science, one of the earliest identified computation classes is NP complete which has this property. Eg traveling salesman and knapsack packing problem are examples. It can be insanely difficult to find a path that exists between two cities in a graph under some cost. But if you give me a solution I can easily confirm whether it meets the criteria (global optimality testing is itself NP complete but if you give me a set of solutions you can verify which one is the cheapest).

I’m not claiming that factorization is NP btw. There are complexity classes beyond NP that share this property. https://cstheory.stackexchange.com/questions/159/is-integer-...

Anyway. ZK proofs themselves are super surprising and not intuitive but not because verification is fast but because verification reveals nothing to the verifier about the solution. That’s the mind blowing result.


It's really different from the cryptography you're describing, although they do have the theory of NP in common. Mathematical cryptography depends on the assumed hardness of just a very few problems, notably factorisation of numbers with two or more large factors, and discrete logarithm in some groups. The particular problems are very specific.

What I find remarkable is that zk-proof-of-computation works for any kind of computation. On the face of it, it might seem that some computations would resist being compressible that way, but no, it works with anything that can be run on any real computer.

It doesn't depend on what kind of computation, so it has nothing to do with which program, how it's written, the complexity class (linear time, P, NP-complete, superexponential etc), or even on the size of the problem. It doesn't even depend on how much memory the problem requires. You can have a computation that requires terabytes or exabytes of RAM to compute, and the world's largest supercomputer running for a decade: The proof that the output is correct, no matter how much complexity went into calculating it, is still small and fast to verify.

But you still have to do the computation somewhere to get the proof. That's why it's called "argument of knowledge", because the entity constructing the proof must have access to ("knowledge of") the computation.

So it's still about feasible computations. Usual zk-proof-of-computation can't be used to prove things larger than there's a computer able to compute.

That boundary is different from cryptography (and P vs NP), which is more about verifiability of problems requiring exponentially larger time and/or space to solve if you don't have the secrets, so if the parameters are suitable, these are about infeasible computations by any physically realisable computer.

The connection is that that zk-proofs-of-computation are about making proofs of feasible computations, while ensuring it's infeasible to compute a false proof, or to find the secret inputs if there are any (there don't have to be).

(By the way, you may be thinking of discrete logarithm, not division. Division is not difficult. In finite fields such as those used in cryptography, division can be computed by a constant exponentiation using Fermat's Little Theorem, and exponentiation takes logarithmic time in the size of the field using a repeated-squaring method. Division is slower than multiplication, but not prohibitively so; it's used in elliptic curve operations. The hardness of factorising certain numbers is for a different reason than division.)


BitTorrent then, under this perspective, is "less massively, less redundantly replicated", since people only hold onto and seed the torrents whose files they store locally; they don't just go and download and seed everything out there. Seems like a nice compromise to me, but it obviously risks data becoming irrecoverable in some cases.

Is this also how IPFS works?


I don't know much about it, but I would assume that some effort would be made by miners to look in different parts of the search space - you don't want 1000 machines checking the same hashes, in order, after all.


How would you incentivize nodes long-term without a reward/token system?


Bittorrent basically runs on social norms, and people's willingness to help their community. Or people go to private trackers, where contributing is a requirement for being in the community and having access to its resources.

Blockchains are built under the assumption that everyone is selfish and untrustworthy. Which is a decent assumption when building a crypto currency, but that doesn't mean that every system has to run like that.


As much as I like private trackers, very few use the ratio-less model to protect against serial leechers.

Typically on a tracker you’re given a currency (although not as sound as some e-coins) and can use that to influence your upload or download statistics, which in turn affect your ratio. Some trackers might employ rules where your user class has to have a certain ratio, or else you’ll lose privileges like certain forums or even the ability to download at all. (The trackers are private and can control which peers you can see)


A token system can incentivize cooperation in a hostile environment with selfish nodes. But in a friendly environment you might not need the same level of incentives.

I see some similarity here to the world of private torrent trackers. You want a Linux ISO, I want a Linux ISO, we're all working towards the same goal. So we're already incentivized to cooperate, without getting money involved. And trackers also have things like minimum seeding ratios to keep people honest. In the case of AI, you and I both want to generate images, so we're also working towards the same goal, so let's help each other out so both of our workloads finish faster. Maybe idealistic, but I think it could work.


In the heyday of BitTorrent sharing was the incentive, and still is I imagine. We don't need to financialize everything.


That's more cryptocurrency than blockchain.

Blockchain as such has nothing to do with the costs of a node and incentives to run one


"That's just the nose, see the nose can exist discretely without the finger to pick it."


That's a different question. But BitTorrent does fine with simple tit-for-tat rules, no transferable tokens required.


Why not just "decentralized"? I'm not sure I ever saw a blockchain that was posited as "distributed".


Blockchain is a distributed ledger.


Looking into it, you're correct, but only insofar as we're referring to two different meanings of distributed. Bitcoin is a distributed ledger, as in "fault tolerant". But the consensus mechanism is decentralized. Thus, the decision-making is not distributed.

A distributed consensus mechanism would segment the decisions amongst nodes, not poll for a unanimous response.


This is a very articulate and interesting way to put it.

"Massively redundantly replicated"


While I understand where you are coming from, this argument is basically "money attracts attention from scammers". As a counter I would say "Money attracts attention, period" and attention is an important resource to foster growth.

Decentralized tech would never be where it is today if it weren't for investor attention and the potential for gains. We just have to separate the wheat from the chaff, and remain vigilant for bad actors.


What I mean is that the current signal-to-noise ratio is way too weak.

This created a lot of bubbles. NFTs are already down by a lot, and now yield farming (https://www.bloomberg.com/news/articles/2022-04-25/sam-bankm...) just took a big hit from the FTX case. I see way too many "revolutionary" projects from fresh graduates. There is no way that tens of thousands of inexperienced people with barely enough CS education to pass programming interviews would magically create innovation just because VCs put a ton of money on them.

Also, can you tell me more about where decentralized tech is today? BitTorrent was a revolution as a way of information sharing, Onion was a revolution for privacy and Bitcoin was a revolution for decentralized ledgers.

Starting from that, IPFS is the continuation of BitTorrent with more features and Ethereum is a more efficient (especially since The Merge) and customizable (smart contracts are advanced checkers for write operations) ledger.

But what are the real world applications of those technologies? What are concrete use cases of Ethereum and IPFS besides payments, records and file sharing?

Surely there is exciting progress to be made on the technical side, like zk-SNARKs, but how useful will it be to society?

I think we already have all the technical blocks we need. If there is no real-world adoption maybe we should just wait another 10 years before pumping crazy amounts of money.


*blockchain tech would never be where it is today…

I’ll bet blockchain is only as popular as it is because of the money. But other forms of decentralization like Mastodon or Matrix are pretty separate from the whole crypto sphere


Matrix is separate from the crypto sphere, because it solves a different problem.

Federated platforms appeal to the privacy-oriented "f** big tech" mindset, which is pretty common in the hacker & FOSS crowds. I'd put it in the same category as VPNs, E2E messengers and TOR.


VPNs are really useful for businesses to link different locations over the internet instead of using (awfully costly) dedicated links... or to allow remote work.

So VPNs are not really in the same category.


VPN as a technology, yes. But I think "VPN" in this discussion is referring specifically to the myriad of consumer-oriented paid solutions (SurfShark, NordVPN, whatever) that are pitched as being about protecting your online security, pirating with impunity, and bypassing region-locks.


Totally agree on that.

Big corps only invest in blockchain because of the buzz words that are used as marketing by the consulting firms to sell their "expertise" and by VCs to sell their companies.

Sure they hope to gain some money, like luxury brands wanting to sell to crypto-billionaires. But crypto was a useful toy, then Ponzi scheme and now it's a closed loop. How long will the bubble last?


The "chaff" and the bad actors are in it for the money. Without them, "decentralized tech" indeed wouldn't be where it is today -- meaning, it wouldn't be overwhelmingly associated with crypto-adjacent grifts.

The real decentralized tech, the one that serves a purpose other than emptying the wallets of naïve crypto-enthusiasts, does just fine without a profit motive. You don't need get-rich-quick promises to get an audience if you're actually doing something useful.


Where exactly is decentralized tech today?

Nobody around me ever uses any of it. Old p2p networks (gnutella, kademlia, emule) had way larger impact on society 20 years ago.


BitTorrent isn't getting any smaller. Mainline DHT is still 10x bigger than Bitcoin.


Email and the web are more decentralized than they look. Just think that different FOSS and closed source user agents and servers interoperate without any problem, especially for email.


They were talking about crypto. Email and the web were born out of research & universities, not the “potential to attract investors”.


And in non-web space, there's Matrix (https://matrix.org).


> Decentralized tech would never be where it is today if it weren't for investor attention

That’s exactly why blockchains haven’t found Product Market Fit.

Investors != Users


> Distributed File Sharing or computation without the whole tokenomics

They went hand in hand even back in the day: private torrent trackers were all about tokenomics, where the tokens were the number of bytes you've seeded (uploaded) minus the bytes you've downloaded.

I'm not saying it's impossible to imagine distributed file sharing otherwise, but to "guarantee" the availability of (especially unpopular) content, you need some incentive mechanisms either built in to the protocol or externally imposed.


From the project's readme:

>Please do not use the public swarm to process sensitive data. We ask for that because it is an open network, and it is technically possible for peers serving model layers to recover input data and model outputs or modify them in a malicious way. Instead, you can set up a private Petals swarm hosted by people and organization you trust, who are authorized to process your data.

This is what blockchain and staking tokens is for. (Part of the reason, at least)

You act maliciously, the network slashes your stake. "pinky promise not to do bad stuff" only goes so far... and it's really not far at all. You can trust "trusted" organizations or private individuals, but they have no incentives to ensure that the service works as intended, regardless of intent.


A blockchain does not magically solve security issues.

In fact, it adds traceability. And data stored in it can never be deleted. Just to name a few issues.


> A blockchain does not magically solve security issues.

This is a weird statement. Blockchain security is real and it isn't "magic". Blockchain is specifically designed to secure decentralized applications.

> In fact, it adds traceability. And data stored in it can never be deleted. Just to name a few issues.

These aren't issues, these are part of the security model. Traceability is fine here because everything is pseudonymous, if you want to avoid that use a chain that has untraceable transactions with zero knowledge proofs (zero traceability).

> And data stored in it can never be deleted. Just to name a few issues.

Storing data on blockchain is extremely expensive. Only hashes are stored on chain, not the data itself. Hashes are much different from encryption because they're irreversible.


> A blockchain does not magically solve security issues.

No, but staking is certainly an improvement over "pinky promise", and it requires a public blockchain.

> issues

I'm fairly sure those are features, not issues. You are free to disagree.


Are you arguing for the processing of sensitive data on a public proof-of-stake blockchain?

First, automating the detection of malicious acts against sensitive data seems pretty difficult. So this can't be implemented to systematically occur, and has to be determined after the fact by an investigation. Then, if a malicious act has been detected, the stake is slashed (and the acts are reverted where possible).

Is my understanding sound so far?

Because this would mean in any case where a slashed stake is considered an "acceptable cost" to the bad actor, then the sensitive data is fairly accessible -- the stake is effectively a paywall. And raising the stake is a difficult decision because higher stake means less actors and higher risk of collusion.

I mean this is probably fine for a very large public blockchain where detecting malicious acts is not as difficult or where the malicious act is not very profitable, but sensitive data can, depending on its nature, be extremely profitable to exploit (and as I stated, I don't see how it could be easily detected).

With sensitive data, "trusting" an organization only means having a legal agreement or strategic alliance with a third party. In these circumstances the consequences are usually more serious for the malicious actor than the loss of an arbitrary amount of money.

I've seen suggestions to do sensitive (e.g. medical) data processing on the Ethereum blockchain from some enthusiasts, and I have never been able to understand this beyond assuming they have an insufficient threat model in mind for this kind of data.


I agree that unfortunately a lot of crypto projects are way too tokenomics centered, instead of utility centered.

BitTorrent style projects are far more restrictive for a lot of applications though. If something is without cost, then it becomes open to abuse.

Take domain names for instance. I would love to have a decentralized name registry, so that no country have censorship power on the _whole_ internet, as we've seen with recent US intervention at the tld level.

DNS is a good example because it's quite trivial to implement with a plain old DHT. The problem though is how do you prevent scammers and squatters in this model?

There needs to be a cost to writing to a distributed database; otherwise, after a year it will be fully squatted, used as free hosting, used to store illegal content, DDoS'd for fun, etc.

How do you set this cost, though, while keeping the distributed nature of this database? The simplest solution is to let the users decide, via the price of a token, sold by people running nodes and bought by people using the service.

Honestly I love this idea. The problem with crypto currently is that a whole bunch of parasites jump on these tokens to speculate on their price without giving a.. about the underlying utility. This completely screws the price optimum and creates an inflated price bubble, in turn preventing adoption.


> ... bunch of parasites jump on these tokens to speculate on their price without giving a.. about the underlying utility. This completely screws the price optimum and creates a inflated price bubble, in turn preventing ...

We have exactly the same problem with real life systems like food, raw materials and real estate.


Not quite as much, I would say, because crypto tokens are priced based on over-speculation about their potential long-term explosion rather than their more down-to-earth utility function. This is because their current utility is still to be discovered.

Take the DNS example for instance, this was implemented on Ethereum by "ENS", but the price of ETH/gas at the time made a single ".eth" domain name cost something like $500.


> Not quite as much I would say

Way more in terms of money involved, just slower.


Imho the "print-your-own-money" siren call is only one of the aspects that hampered the whole blockchain world from delivering the disruption it so much craved. The core architectures themselves are somehow too overengineered for broad applicability. Maybe that is what was needed to support the digital gold use case, but it is manifestly not needed for all sorts of other very relevant decentralized applications (bittorrent, fediverse, messaging, email etc)

It's a moot point whether the whole crypto/blockchain period was a net positive. It certainly made a noisy case for "re-decentralization" given the very real and mostly harmful status quo. One could also argue that it diverted vital resources to potentially dead-end or limited use areas. The recurrent scams may also give decentralization a bad name with an uninformed public that can't distinguish all the different versions.

What matters next is that projects that deliver real benefits to users get attention and traction. Worth keeping in mind that the real trouble starts when you get noticed by vested interests as a potential threat.


While superficially similar, BitTorrent and a blockchain are inherently different designs that target different problems. Blockchain is massive data replication, BitTorrent is massive data distribution (with some replication too).

That's why you can actually attack and shut down a bittorrent network by targeting the index servers, which are not massively replicated. E.g., The Pirate Bay is often down.

As a solution for this, I'll shamelessly plug my small project here, which combines bittorrent with the blockchain as an invulnerable Piratebay-like bittorrent index server, called Blockchain Bay: https://github.com/ortegaalfredo/blockchainbay

It's command line, and doesn't use any tokenomic scams. You pay the blockchain only for the data you need to upload, which is fortunately very little, as bittorrent magnet links are very small.


Well-designed tokenomics with strong utility are incredibly powerful tools to not only incentivize usage, but also direct governance and reissue profits as dividends to participants.

Of course its abused by shady operators out to make a quick buck, but issuing tokens, when done right, is a great innovation by itself.


https://ipfs.io sans the filecoin aspect that creates incentives for long term/proven storage of data is what you are likely asking for.


Anyone aware of an open source token-based system that allows users to pool hardware assets, but allow some sort of priority and fairness enforcement to reduce network abuse?


One can't scale without the other.


I sometimes tend to forget that you can use decentralization without all this crypto stuff.


So does literally the entire “web 3” movement.


Hm, so the employees never turn off their personal computers, please, and I have a chatbot? Makes sense.



