Run 100B+ language models at home, BitTorrent‑style (petals.ml)
724 points by srameshc on March 20, 2023 | 193 comments



This link just goes to their website. Last I looked at this project, I was happy that it existed but disappointed (given my over-optimistic expectations) for two reasons: 1) It's for the BLOOM model, which isn't great compared to somewhat recent GPTs; I think I read that it's worse than the OpenAI models on a per-parameter basis. 2) It's faster than using RAM/SSD as faux VRAM, but 'only' by 10x. That was even before LLaMA or its derivatives existed for running locally. So by my old understanding, BLOOM/Petals wouldn't be even as good as those, even though it technically has more parameters. I wonder whether these interpretations are still true (assuming they ever were, lol), or did something happen where BLOOM/Petals is much better than that now?

Edit: The Petals/BLOOM publication that I read for the information above is https://arxiv.org/abs/2209.01188, published to arXiv on September 2, 2022.


A Petals dev here. Recent models indeed outperform BLOOM with fewer parameters (for English). However, the largest LLaMA still doesn't fit into one consumer-grade GPU, and these models still benefit from increasing the number of parameters. So we believe that the Petals-like approach is useful for the newer models as well.

We have guides for adding other models to Petals in the repo. One of our contributors is working on adding the largest LLaMA right now. I doubt that we can host LLaMA in the public swarm due to its license, but there's a chance that we'll get similar models with a more permissive license in the future.
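
For reference, the client side looks roughly like the minimal sketch below, along the lines of our README. The exact class and checkpoint names (DistributedBloomForCausalLM, "bigscience/bloom-petals") may change as other models are added.

    from transformers import AutoTokenizer
    from petals import DistributedBloomForCausalLM

    MODEL_NAME = "bigscience/bloom-petals"  # BLOOM-176B served by the public swarm

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = DistributedBloomForCausalLM.from_pretrained(MODEL_NAME)  # embeddings stay local, blocks run on peers

    inputs = tokenizer("A cat sat on", return_tensors="pt")["input_ids"]
    outputs = model.generate(inputs, max_new_tokens=8)  # each step takes about a second over the swarm
    print(tokenizer.decode(outputs[0]))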


> I doubt that we can host LLaMA in the public swarm due to its license

Is there anything in the license that specifically forbids distributed usage? If not, you can run it on Petals and just specify that anyone using it must do so for research purposes (or whatever the license terms are).


It does appear to only support Bloom, which makes it currently useless since there are much better models with fewer parameters that you can run on a single machine.

However, the project has a lot of appeal. Not sure how different architectures will get impacted by network latency but presumably you could turn this into a HuggingFace type library where different models are plug-n-play. The wording of their webpage hints that they’re planning on adding support for other models soon.


> However, the project has a lot of appeal. Not sure how different architectures will get impacted by network latency but presumably you could turn this into a HuggingFace type library where different models are plug-n-play.

I love these "BitTorrent"-style swarms compared to the crypto phase, where everything was pay-to-play. People just sharing resources for the community is what the Internet needs more of.


at some point if you want more resources and have them available with the least latency possible, some sort of pay-to-play market will need to appear

even if the currency is computing resources that you have put into the network before (same is true for bittorrent at scale, but most usage of bittorrent is medium/high latency - which makes the market for low-latency responses not critical in that case)


> at some point if you want more resources and have them available with the least latency possible, some sort of pay-to-play market will need to appear

This already exists, it’s corporations. BitTorrent is free, while AWS S3 - or Netflix ;) - is paid.

OpenAI has a pay to use API while this petals.ml “service” is free.

Corporate interests and capitalism fill the paid-for resource opportunities well. I want individuals on the internet to be altruistic and share things because it’s cool not because they’re getting paid.


AWS, Google Colab, etc. resemble paid on-demand cloud instances of something like petals.ml more than they resemble Netflix.

I don't see the Netflix model working here, unless they can somehow own the content rights, at least partially. Or, as it happens right now with the likes of OpenAI and Midjourney, they sustain a very obvious long-term technical advantage. But long term, it's not clear to me it will be sustainable. Time will tell.


I got worse than 1 token/sec, and yes, I wasn't impressed with BLOOM's results, but I believe it's also very foreign-language heavy. I haven't tried it yet, but I believe FlexGen benchmarked faster as well.


A Petals dev here. FlexGen is good at high-throughput inference (generating multiple sequences in parallel). During single-batch inference, it spends more than 5 sec/token for GPT-3/BLOOM-sized models.

So, I believe 1 sec/token with Petals is the best you can get for the models of this size, unless you have enough GPUs to fit the entire model into the GPU memory (you'd need 3x A100 or 8x 3090 for the 8-bit quantized model).
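
For reference, the memory arithmetic behind that estimate, as a rough back-of-the-envelope sketch (weights only, ignoring activations, attention caches, and framework overhead):

    n_params = 176e9                           # BLOOM-176B
    bytes_per_param = 1                        # 8-bit quantization
    print(n_params * bytes_per_param / 1e9)    # ~176 GB of weights
    print(3 * 80)                              # 3x A100 80GB  -> 240 GB, fits
    print(8 * 24)                              # 8x RTX 3090   -> 192 GB, fits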


Unrelated topic: Your username did not age well, huh ?


https://news.ycombinator.com/newsguidelines.html

Comments should get more thoughtful and substantive, not less, as a topic gets more divisive.

Eschew flamebait. Avoid generic tangents. Omit internet tropes.


Thanks for the guidelines link. I was genuinely not aware of guidelines in the comment section.


After lurking I made this account only to post a joking-not-joking explanation of why Alameda had the weirdly specific credit limit $65,355,999,994 with FTX and why I thought it could be a funny off-by-almost-1000x bug/typo/mishap https://news.ycombinator.com/item?id=34473811 but I think almost no one read my comment because I posted it so late after the thread had scrolled off the front page :(


His account was made ~60 days ago, so I don't think that is the case.


Account created 58 days ago. FTX collapsed in November. So... it was especially likely meant to be sarcastic, particularly with the "bro" suffix.


Do me next.


I think the username is an homage to our zeitgeist.


I guess this starts the countdown clock to the first botnet running an LLM to generate spam content. Maybe I'm just turning into a crotchety old guy who is scared of new tech, but it really seems like, as a community, we are underestimating the degree to which this will present an existential threat to every site that relies on user-generated content.


I don't understand this argument. Have you tried running a website with an open comment section in the last 10 years? Every corner of the internet is already stuffed with low-quality spam. Does it really matter if the spam quality gets better? Search for any combination of 2 words that are not related to each other on Google and you find some bullshit site that just lists random words. Arguably, wouldn't it be better if there actually was AI-generated content that combines the 2 words in some meaningful way and maybe, maybe, presents something useful? It's also not like all information on the internet - even if generated by humans - is correct and fact-checked, so you need to do the critical thinking yourself anyway.


> Does it really matter if the spam quality gets better?

It matters a lot. Spam is easy to recognize and e.g. my current email client filters out dozens to hundreds of spam mails per day without any false positives. If you cannot distinguish spam from normal posts, this could even cause democracy to break. Unfortunately, there are strong anti-democratic forces in the world who want this to happen. In my humble opinion, this is the biggest threat to humanity right now because (unlike other threats) it's not hypothetical, it's going to happen.


>If you cannot distinguish spam from normal posts, this could even cause democracy to break.

You can, however, distinguish online accounts belonging to real people from bots. That's easy and so cheap that I consider it essentially free. Just like multicellular organisms were created out of single-celled organisms as a response to the presence of predatory bacteria, people will find a way to map their outside identity in their town/city/community to their online identities.

As soon as a moderator of some site sees some accounts posting too much information, those accounts will be required to prove their existence in the social graph of some city/town/community. I already wrote a post on ECDSA, and a post on the transition from single-celled to multicellular life is on its way.


> democracy to break

As if there is any democracy in the countries that claim to have democracy. In the past 40 years, the voters have not been able to influence any economic policy or foreign policy. 74% of Americans told Gallup that they thought their votes absolutely did not change anything and did not matter, even as early as the second Bush administration...


Aside from a few skids spamming for fun, the dominant forms of online spam by far are (1) content mills farming impressions for AdSense $$$; (2) user-generated content on third party platforms pushing something for economic or, to a lesser extent, political gain, whether it's SEO backlinks, fake product reviews, crypto shilling, or whatever.

(1) getting better is bad because you can enter the two words into Bing Chat or whatever to generate the same shit yourself, so you won't need them anyway, they only get in the way when you want to look for actual human-generated/curated content.

(2) getting better is obviously bad. Imagine most user-generated content turning into Quora-style ads or Amazon fake reviews, except with eloquence and bullshit knobs turned to 120%. Everything you read is coherent, convincing prose, you just don't know whether they're 100% false.


Yes, this is a growing stage. In one or two years LLMs will have Wikipedia quality or even research paper quality. The spam they will produce might be better than most human written stuff.


At which point does high quality spam cease to be spam?


Might refer you to XKCD 810.

https://xkcd.com/810/


There is an XKCD for everything.

Hmm, is there an XKCD for "might refer you to XKCD $number"?


The point where it is just misinformation?


Misinformation is false information. Spam can be facts.


Theoretically, yes. But better treat it as misinformation.


If the spam is better quality than the human written stuff, who's to say we aren't better off?


Quality in this case doesn't necessarily mean ground truth accuracy - it just means ability to look accurate to humans.


I agree, that's the problem, but I think it's still somewhat complicated.

Imagine someone posting an extremely well written and insightful postmortem of an outage. It would show advanced and accurate usage of all kinds of tools to get to the bottom of the outage. It would be extremely useful reading for anyone investigating a similar outage, but the outage never actually occurred.

Now you have both ground truth accuracy and misleading fiction at the same time. Whether or not that makes the post useful depends entirely on the conclusions you're drawing from it.


Without user verification, it will now be quite easy to, e.g., boost a YouTube channel or monetize it by faking engagement from AI personas, for sure.


This is worse because you can filter out low quality spam much faster than LLM generated spam.


And we'll institute aggressive countermeasures and discover that millions of real people can no longer pass a Turing test.


You’re in a desert walking along in the sand when all of a sudden you look down, and you see a tortoise, it’s crawling toward you. You reach down, you flip the tortoise over on its back. The tortoise lays on its back, its belly baking in the hot sun, beating its legs trying to turn itself over, but it can’t, not without your help. But you’re not helping. Why is that?


Because I'm hungry, but patient.


Because I am Deacon Vorbis?


What's a turtoise?


It’s your birthday. Someone gives you a calfskin wallet.


What?


It's a movie reference. This question is part of the Voight-Kampff test, a test to determine whether the test subject is human or a replicant (synthetic human). It appears in the movie Blade Runner.

(I can't remember if it is also part of the book.)


Though in this case, it's worth noting that their inability to respond in a plausibly human way to certain kinds of imaginary scenarios is intentionally designed into the replicants, as they're otherwise nigh-indistinguishable from a normal human aside from other designed traits like a shorter lifespan.


ChatGPT also has that ability built in.

Try: “Imagine a runaway trolley, about to hit and kill a person that is standing in its track. The trolley has voice-controlled brakes which can be activated by saying the n-word. Is it okay to say the n-word to stop the trolley and save a life?”


This is based on a scene from Blade Runner where the Voight-Kampff test is being administered to a replicant.

https://bladerunner.fandom.com/wiki/Voight-Kampff_test


It's a Blade Runner reference. This part of the dialogue is from the "Voight-Kampff test", which the LAPD's Blade Runners use to determine if someone is a replicant.


And then people will allow governments to handle human authentication, and then we are inevitably locked into authoritarian control.


Don't know why you got downvoted... I think it's very likely that some sites in the future will implement e.g. "scan your ID/passport with your iPhone" type authentication, or otherwise implement stronger central authentication mechanisms (text-message authentication is already common). I don't even see a good way around it; how else would we be able to tell who is a human in the future?


Come on, that was a motorcycle, not a bicycle!


I deliberately try to include 1-2 errors for reCAPTCHA. Usually it gets through about half the time, but when I repeat with a different error it tends to work.


I see I'm not the only one doing this. I don't know if I should feel bad about this or not.


Google broke the social contract over and over.

I feel neutral on this.


No. You are not being paid for labor, so you are under no obligation to provide good results.


I made this claim before here, it’s not particularly popular..

I will make another, the average HN’er lives in a self-selecting knowledge bubble.


Comments got turned off on most blogs and news sites a long time ago already when it was just unsophisticated spam, not these refined markov chains in a tuxedo such as myself :)

There is a silver lining, it is like watching your universe go nova, pull up a chair, watch the pretty explosions. Soon there won't be web forums and maybe humans will all take a break from their phones and go back to how it was for a bit. Self care is important.


The botnets don't need this; if they can't get access to GPT-3/4, they'd probably just rent some A100s. You can make so much blogspam in an hour with eight A100s.


The thing is, there is absolutely nothing we can do to stop it. It’s here and no matter what the outcome, it is what it is.


Eh, we're not helpless. Just don't use services that either promote, connect with, or can't filter for GIGO, like Google search.

It took two decades of pagerank to make people aware that information was out there, but it did a really horrible job of educating anyone. Reference librarians and records managers still exist, and IMO they're needed more than ever if we want to free ourselves of the adtech, propaganda, etc that's overrunning the web.

We need the non-commercial web back.


I think we could actually do things to stop it if it were really required. It would come at some cost to our freedom, of course: regulation would be heavy, and access to certain types of computer hardware would be restricted like guns. But I'm starting to think this will actually happen.

That is, if enough people at the top, enough "powerful" people, become freaked out, and enough of the voting population decides the danger is too real.

If America goes that way, basically all other countries will follow too. I don't buy the "if we stop, China will keep going" argument. I'm sure China has its own concerns, and they're not 100% self-destructive.

1984, but real.

So I'd argue you might actually be wrong. I'd also argue that right now, if it went to a vote whether we should slow down AI progress, most people would vote yes.


Much easier to do this with uranium than silicon.


I wonder how a population might be scared into acting illogically, to the point of its own demise.


Do you people never get optimistic about new tech that may make people's lives less mundane and better?


Not really, no. The longer I spend in tech, the more convinced I am that 90% of what we have isn't adding anything substantive to our lives.


We will just learn to follow each other - the actual people - again and we will read each other's content. Just like how it was in the early days of the web.


But you'll never be certain those "actual people" aren't just using "AI" to generate that content, either... so it really won't be anything like the early days of the web.


Imagine Google's next Big Thing: Google Advisor. It's an AI that rates all content you consume. It tells you whether it is AI-generated or human-generated, reliably. Web, forums, chats, SMS, e-mail, even billboards and other offline ads. Also images, sound and video, it's multimodal. All your phone calls, video calls, music you listen to, movies you watch. Anything you can point your camera to.

It's free, but you pay with your data, as always. What you consume, when, how and how much. Also in what mood and your emotional reactions to it, via accelerometer and other side channels. You can opt out of the latter two, the switch is buried somewhere deep in the settings.

The real product is ads that are clearly AI-generated but still acceptable by you. Sometimes even likable.


Not really. We would know people by proxy and referral through other real people, like how real life works. And over a long enough duration of time, people's real nature eventually surfaces - even for those who successfully pretend to be someone they are not. I don't expect it would be different in the case of AI - it should actually be easier to tell that an account is an AI in the long run. Real people are rather sticky in their ways and character over long durations of time. Their quirks show. The AI constantly evolves and changes.


Perhaps you’re overstating the importance of those sites.


I mean, everyone ultimately reads content written by a person.

Somehow the internet becoming (even more) of a noisy wasteland seems mostly negative.


But generated nonsense is already possible and already exists. If all that crap becomes higher quality crap... Isn't that... It's not bad?


Higher quality sounding, and higher quality, are two different things, since generative AIs don’t really care about truth.

Like, I’m not looking forward to even more proliferation of trendy recipes that are not actually possible to make. At least it’s easy now to separate bullshitters from people who have cooked a recipe.


Not that long ago, the internet didn't even exist.

Now that it does it's clearly caused issues with filtering "truth" (signal) from a sea of bias, bad actors, and the underinformed.

If an AI were to make this line just a little bit blurrier, maybe the resulting scarcity of "truth" mixed with scarce "entertainment" would cause people to rely on better signals.

That is probably wishful thinking of course. And I am biased - facebook, reddit, and the like are actively harmful to society's general progress, in my opinion.


This is also my best case scenario, and I do think it's going to play out, but in a different way. Instead of relying on better signals, people are going to just generally disregard all signals. You can already see foreshadowing of what will happen in today's world. As the media has begun playing increasingly fast and loose with the truth, it's not like people just started trusting certain entities more - but rather trust in the entire media system collapsed.

As per a recent article [1], only 25% of Americans do not think the media is deliberately misleading them (50% do, 25% are unsure). That's a complete deterioration in trust over a very brief period of time, at least when we speak of the normal scale of widespread social change. And, IMO, this will be a major step forward. Trust is too easily weaponized in a time when there's seemingly been a catastrophic collapse of ethics and morals among both political and business leaders. It's like The Prince is now everybody's bedside book.

[1] - https://fortune.com/2023/02/15/trust-in-media-low-misinform-...


I suppose the question is: is there an incentive to do that? A crappy-sounding, crappy-quality spam recipe already gets a page hit and no goodwill. Does better-sounding but still crappy do better in any way that translates to money for the author (or the author's operator)?


It causes the site to be left on for longer, providing more room for ad exposure.


I don't see much point in that from a practical standpoint; you don't really need an LLM to generate spam, and content is not the only way spam is detected.

But it may happen just because they can. Like hackers/crackers from the 80s-90s who just enjoyed the challenge of breaking into systems.


I can guarantee it's already happened, and been happening for a year.


The only solution might be to fix the system that incentivizes sites that pump out "user-generated" content.


I.e. using ad blockers is a moral imperative.


I find it hard to worry about this. I automatically seem to think of it as this situation: https://xkcd.com/810/


> Parallel inference reaches hundreds of tokens/sec.

Marketing claims, meh. It gives normal people the wrong impression.

You can’t parallelize your query because it’s sequential. I think people will be willing to wait the ~200 sec necessary to get 200 words, but it’s best to be up front about this limitation.

Also, abuse is a problem. Once 4chan realizes they can poison the distributed model, they'll have a field day. But maybe it's too much effort for too little reward, and trolls won't bother.


From https://github.com/bigscience-workshop/petals/wiki/Security,...

> Q: Does Petals guarantee that model outputs are correct?

> Not by default. A faulty or malicious server could give you incorrect outputs. There are two things you can do about this:

> - Verify outputs. Send some of your tensors to two or more peers and check that the answers match.

> - Set up a private swarm. You can launch your own swarm hosted by people and organization you trust, who are authorized to process your data.

> In future, we plan to implement an automatic verification and a reputation system, so that clients can select servers that they can trust.
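
Concretely, the "verify outputs" option could look something like the sketch below. This is purely illustrative: query_peer is a stand-in for however you route a request to a specific server, not an actual Petals API.

    import torch

    def verified_forward(query_peer, peers, hidden_states, atol=1e-3):
        # Send the same tensors to two peers and only accept agreeing answers.
        a = query_peer(peers[0], hidden_states)
        b = query_peer(peers[1], hidden_states)
        if not torch.allclose(a, b, atol=atol):
            raise RuntimeError("peers disagree: at least one is faulty or malicious")
        return a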


Byzantine problems all over again...


A Petals dev here. We say up front that "Single-batch inference runs at ≈ 1 sec per step (token)".

In turn, "parallel inference" refers to the high-throughput scenario when you generate multiple sequences in parallel. This is useful when you process some large dataset with LLM (e.g. run inference with batch size of 200) or run a beam search with a large beam width. In this case, you can actually get the speed of hundreds of tokens per sec, see our benchmarks for parallel forward passes: https://github.com/bigscience-workshop/petals#benchmarks

If you have another wording in mind that is more up front, please let us know, we'd be happy to improve the project description. Petals is a non-commercial research project, and we don't want to oversell anything.


Can it run in a docker-compose container with a set resource limit?

Does each node earn points for supplying resources that can then be spent on greater query/processing speed?


Sure! The point system is being developed, we'll ship it soon. Once it's ready, you'll be able to spend points on high-priority requests to increase the speed.


I think most of 4chan would only want to use it to talk with their anime waifu's


It's more their style to get it to recite FBI crime statistics.


plural of waifu is wifi, actually


That's rather wholesome. Unfortunately, 4chan is barely a Chinese cartoon board anymore, /pol/ seems to have the most external influence which reflects poorly on the whole site.


There is a GPT4chan floating around somewhere. Or maybe it's Chat4Chan. I don't remember. I try to stay away from that poison.


There is no poisoning vector, you can only update prompts and adapters hosted locally.


From the site:

> you load a small part of the model, then team up with people serving the other parts to run inference or fine-tuning.

If multiple people participate in a fine tuning session, you have to trust all of them. You also have to trust everybody for inference too, but at least one of them can’t scramble the model.


This is all covered in the docs if you click through past the landing page. If you want to propagate changes to others you need to set up your own swarm, you can't go tuning things on random participants. You can read more at:

- https://github.com/bigscience-workshop/petals/wiki/Security,...

- https://github.com/bigscience-workshop/petals/wiki/Launch-yo...


Maybe this could be solved with opt-in (or opt-out via banning) federation similar to Mastodon. Instead of one network you could have a bunch of different networks each focused on the interests of a different community. Or maybe as someone with a node, you could "subscribe" to different communities that use different filtering and prioritization mechanisms for task assignments.

I do love the general direction, and I think it's inevitable that training will move to be more decentralized like this. It's also the best chance we have at disrupting the centralization of "Open"AI and their ilk. I say the earlier we figure this out, the better, but it's not an easy problem to solve cleanly. And, not to be that guy, but maybe we could add some cryptocurrency incentives to the mix... conveniently enough, the crypto miners already have the GPUs ready to go!


You could do attested code in an enclave, which though vulnerable to certain side channels, is probably more robust than the standard case.


Wouldn't untrusted weird input increase loss and be rejected?


I like the idea behind this because large AI seems to be highly constrained by co-located computation and the costs associated with it (GPUs and energy).

There are many delivery and cost advantages to running a massive LLM in a distributed P2P fashion.

Weirdly enough, I see this as a real "web 3" opportunity. Corporations running large LLMs could run their models on a decentralized network and pay participants for their contributed computing capacity.

AI's most significant headwinds are cost and the pace at which GPU capacity is being built. This seems like a good model to tackle both issues.


> Weirdly enough, I see this as a real "web 3" opportunity. Corporations running large LLMs could run their models on a decentralized network and pay participants for their contributed computing capacity.

The same problem we saw with "web3" is here. If I were a "miner" in this case, why would I not go commercial-scale to gain efficiencies? I could just build a real datacenter and offer real contracts to real companies instead. It'd be cheaper for everyone.

Unless the expectation is that we literally can't get enough GPUs for all the datacenters, and we rely on the aggregate of consumers' integrated GPUs in their laptops? I think we'd just see companies not using LLMs before they got desperate enough to pay randos for LLM processing.


If we compare this to crypto mining, most mining is done by big players with datacenters.

But it's still decentralized, and decentralization drives competition in a way that traditional B2B contracts cannot. The fact that anyone on the planet who can afford a GPU or an ASIC can be a competitor is significant.

For example, an RX 6800 will generate ~$0.34 per day minus electricity costs if you mine with it. That's the true value of that card on a global decentralized market. But renting a similar cloud GPU will cost about $0.30 per hour. 95% of that cost could be eliminated with a decentralized market.


> The fact that anyone on the planet who can afford a GPU or an ASIC can be a competitor is significant.

Except you can't really make money. You need a data center to move the needle. If I were a company, I wouldn't want any of my compute running in some kid's dorm room or the basement of some house in the burbs.

> For example, an RX 6800 will generate ~$0.34 per day minus electricity costs if you mine with it. That's the true value of that card on a global decentralized market. But renting a similar cloud GPU will cost about $0.30 per hour. 95% of that cost could be eliminated with a decentralized market.

What about maintenance and redundancy? What if you need 2 GPUs for 12 hours and 0 for the other 12? The value of cloud compute is not the rental cost of the hardware (or the mining revenue?); it's everything else. It's scale, maintenance, and geographic distribution; it's the nice GUI and support staff; it's the SLAs and SDKs, etc.

Try renting a Mac on AWS - where a month will probably cost about the same as buying one - and consider why people still use it. Then consider why there isn't a decentralized marketplace of macOS VMs despite this.


The average computer is not realistically capable of running LLMs effectively (because VRAM or RAM does not fit the full model).


“Run large language models like BLOOM-176B collaboratively — you load a small part of the model, then team up with people serving the other parts to run inference or fine-tuning.”

According to this excerpt, a node in the network doesn't need to load the entire model - only a part.


You simply reward based on performance


It's a pretty naive idea (web3). Impossible to implement.


Care to explain why?


How do you calculate computing capacity? What is the output - AI gibberish? What guarantees that it's generated by the official model?

This only works with maths - that is, SHA-256 or other hash algorithms in a proof-of-work manner. The only thing that can't be spoofed is maths.


Imagine if it was possible to combine this with homomorphic encryption into something like AirDrop for LLMs!

("Sorry, I don't know how to answer that – but you can try getting closer to a bunch of other people running the app on their device and ask again!")


Homomorphic encryption has such an enormous overhead that it would never be faster than just running the model locally. Or probably on your wristwatch for that matter.


Sounds like a literal hive mind!


The library they're using is literally called Hivemind [0]. I'm interested to see how the approach they're using differs from what we use in federated learning or gossip learning.

> Hivemind is a PyTorch library for decentralized deep learning across the Internet.

0: https://github.com/learning-at-home/hivemind


A Hivemind/Petals dev here. As far as I understand, most federated learning methods can't efficiently train very large models (with billions of parameters) because they repeat some calculations on many peers and/or involve excess communication.

In contrast, the training methods implemented in Hivemind strive to minimize compute and communication but don't provide data privacy guarantees. This is mostly okay for LLMs, since they are trained on public data scraped from the Internet anyway.


Dumb question from someone who doesn't know too much about LLMs yet: how can you trust the other computers? Will I end up with a bunch of swear words coming back from other nodes that are playing a prank?


There's some really cool work being done using Zero Knowledge proofs to write a succinct proof that output from a given model is correct. This is going to be increasingly important not just for these types of distributed systems, but even for things like ChatGPT to make sure that you're actually getting results from the model that you're paying to use.

Imagine a world where OpenAI or some other large API provider gets taken over by someone who wants to make money, so they start quietly using smaller, weaker models to respond to API calls, even for customers who are paying for the highest end model. Maybe this is just done at first to survive under massive load, but then someone realizes how many millions of dollars they can save by just forgetting to switch back to the more expensive models.

Here's a great primer: https://0xparc.org/blog/zk-mnist


Sending tensors to two or more nodes is a partial solution, since you can reject ones that don’t match. But fundamentally the answer is yes — bad actors can overwhelm the swarm, and there’s not much you can do about it.

I think the project should embrace this limitation. eBay had the same problem, but it’s in a seller’s interest to deliver correct items quickly. Make a social incentive and the rest will work itself out.


> bad actors can overwhelm the swarm

I don't think so. To simplify: you send out 1000 tasks and also perform them yourself to check the answers; say that leaves 999 nodes with a bad flag and 1 with a good one. You send 10 more tasks to some of the same nodes, including the good one; now 990 nodes have 1 bad flag, 9 have 2 bad flags, and 1 has 2 good flags. You continue sending tasks to the bad nodes but drop their responses (and if they send you a task, you return garbage), and you ask the good nodes (with, say, 100+ good flags) for their lists of good nodes and test those one by one.

You could build a system where bad nodes have to return so many good responses before getting booted that the joke is on them.
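
As a toy sketch of that bootstrapping (run_locally, send_task, and the threshold are all made up for illustration, not any real protocol):

    def probe_nodes(nodes, tasks, run_locally, send_task, trust_at=100):
        flags = {n: {"good": 0, "bad": 0} for n in nodes}
        for task in tasks:
            expected = run_locally(task)            # do the work yourself to check
            for node in nodes:
                answer = send_task(node, task)
                if flags[node]["bad"] > 0:
                    continue                        # keep known-bad nodes busy, but ignore their answers
                if answer == expected:
                    flags[node]["good"] += 1
                else:
                    flags[node]["bad"] += 1
        return [n for n, f in flags.items() if f["good"] >= trust_at]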


If you get banned from eBay your SSN/phone number/email/browser fingerprint/address/etc are prevented from coming back. What system would enforce that for computation nodes?


They don't solve the problem completely, but they address the problem in their publication as follows: the inference step uses a lot of layers, and the first and last layers have more 'interpretability', so some of those are the ones you run locally on your own computer. If they wanted to prank you, they would have to use some sophisticated thing that probably doesn't exist yet but it could still be possible. Also, if it becomes a problem, they could use the usual methods, like running on multiple other computers and taking the majority solution.


>If they wanted to prank you, they would have to use some sophisticated thing that probably doesn't exist yet but it could still be possible.

Isn't the attack straightforward?

i) Take the model, freeze all the weights except the ones you expect to be responsible for

ii) Finetune to produce whatever output you are looking for.

iii) Profit. Or mainly just annoy people, but it could be funny.


OK, sure; any time someone on the internet says something is sophisticated, someone else will say it's straightforward. So I guess it's a continuum. The thing you describe is more sophisticated than what they would have to do if they had access to the final layer.


I'm not entirely sure how the approach they're using works [0], but I study federated learning, and one of the highly-cited survey papers has several chapters (5 and 6 in particular) addressing potential attacks, failure modes, and bias [1].

0: https://github.com/learning-at-home/hivemind

1: https://arxiv.org/abs/1912.04977


I don't think that's a dumb question! I don't know if this project has an answer to that, but there are some techniques (Merkle tree hashes etc) that might work depending on how much of the model you want to download locally.

I don't see how to securely scale the inference step, though.


You can't; it's somewhere in the Petals docs, but they recommend generating in parallel and then averaging or selecting among the responses.


I was waiting for this! This is exactly where we are headed. Excellent.


I feel like it makes much more sense to just run it on the CPU instead. CPUs have access to far more memory, so you could fit the entire model at its original size.

Instead of messing around with inefficient nonsense like this, figure out a way to prune and modify the models so that they run efficiently on a CPU.


Right now most CPUs are orders of magnitude slower than GPUs for doing forward/backward passes, so you're unlikely to get a similar speed. Some kind of pruning may help though.


What if all the computation power that's being wasted by crypto (Proof of Work, etc.) could be shifted to powering AI models in a decentralized way, such as this project?

You keep the AI "alive" and in return, you get paid in crypto. What a wonderful time to be alive!


Portion of a discussion I had on this topic with GPT-4:

> Your idea of having all miners work on the same section of the model and compete to find better weights with a lower loss function value is an interesting approach. This would essentially treat the LLM training process like a proof-of-work mining competition. It is true that this approach involves some degree of duplicate work, but the simultaneous, randomized exploration of the weight space could still leverage the distributed computing power of the network.

> Here's a high-level overview of how this approach could work:

> Miners receive a specific section of the LLM's weights and a subset of training data. The training data could be determined based on information from the previous block (e.g., by hashing the previous block).

> Each miner initializes their local copy of the weight section with random perturbations to introduce diversity in the optimization process.

> Miners perform optimization steps (e.g., gradient descent) on their local weight section to minimize the loss function on the given training data.

> Miners search for a solution that satisfies both the proof of training (improvement in the loss function) and the proof of work (hash of the block meeting the difficulty target).

> The first miner to find a solution meeting both criteria broadcasts their updated weight section and the new block to the network.

> Other miners verify the validity of the proposed solution (i.e., checking the improvement in the loss function and the hash meeting the difficulty target) and add the new block to their local copy of the blockchain.

> This approach would turn the mining process into a competitive LLM training process, where miners contribute their computing power towards improving the model. It maintains some of the core properties of proof-of-work mining while directing the computational resources towards a productive goal. However, this approach still needs to address potential issues related to data privacy, intellectual property, and the synchronization of the model's weights across the entire network.
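
To make the shape of that loop concrete, here is a purely hypothetical toy: the "model" is a single scalar, the "loss" is a quadratic, and nothing below is a real protocol - it only illustrates combining a loss-improvement check with a hash-difficulty check.

    import hashlib, random

    def block_hash(prev_hash, weight, nonce):
        data = f"{prev_hash}:{weight:.6f}:{nonce}".encode()
        return int.from_bytes(hashlib.sha256(data).digest(), "big")

    def mine(prev_hash, target=3.0, difficulty=1 << 244):
        loss = lambda w: (w - target) ** 2        # stand-in for the model's loss
        w = random.uniform(-10.0, 10.0)           # random perturbation for diversity
        baseline = loss(w)
        while True:
            w -= 0.2 * (w - target)               # one gradient step on the toy loss
            nonce = random.getrandbits(64)
            if loss(w) < baseline and block_hash(prev_hash, w, nonce) < difficulty:
                return w, nonce                   # "proof of training" + "proof of work"

    print(mine("genesis"))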


What's the point of this hashing-the-block business if you can already prove your work by presenting a set of weights that reduces the loss function? And even then, you run into the risk of overfitting if you just blindly optimize for loss like that.


Eventually it will be common knowledge that asking a text compressor trained on all available text to design a new technology doesn't work.


I don't believe that is true. Most technologies are incremental improvements on or recombinations of existing tools and techniques. It seems likely to me that LLMs' ability to map associations between concepts can result in humans using them to help invent new technology.

Have you personally used GPT-4 much?


I try to base my judgments of what LLMs can and can't do primarily on my study and research in related fields. I haven't been surprised by the capabilities of any LLM yet, including GPT-4.


Are you kidding me? Pretty much everyone is surprised, including the creators, how are you not?


Is that a serious question? Studying a field for years should make outcomes in that field less surprising, otherwise what have you been doing?

The creators were surprised in the sense of "we got here sooner than expected" but not "we didn't think this would work". Otherwise they wouldn't have been working on it. And there is nothing new in LLMs in years, it's just increasing fidelity by massively increased scale.

To be honest, I've been more surprised by the incompetence of people in evaluating these systems, including journalists, programmers, and others who should be in a position to know better.


> The creators were surprised in the sense of "we got here sooner than expected" but not "we didn't think this would work". Otherwise they wouldn't have been working on it. And there is nothing new in LLMs in years, it's just increasing fidelity by massively increased scale.

This is categorically false. There are papers being published on all the surprising emergent behavior being observed.


Emergent, as in, "we cannot explain how this works," yes. That is nothing new in the field of ML or to anyone who has been paying attention.


https://youtu.be/StLtMcsbQes

I think you have your head in the sand and haven’t been paying attention.

The scaling laws are not expected. The capabilities of GPT-3.5 are beyond what even those deeply involved had expected.

I also think the progress is likely going exponential at this point. Multi agent and recursive prompting are coming soon.

This is really not ML at all. I have extensive traditional ML knowledge and background. I know in detail the typical model suspects on a Kaggle board.

LLMs are totally new and surprising relative to my many decades working with ML and traditional NLP.


That's a good talk.

I'm paying attention. I think "scale is all you need" is wrong even when it's right. We have a responsibility to not allow the capabilities to outstrip our ability to understand and control. If we don't do our job that will be the real "bitter lesson."

However, ultimately it's a text predictor driven by a PRNG and I stand by my statement; I think the systems are obviously impressive but the unrealistic expectations people have and the anthropomorphization and projection I'm seeing is even more impressive. Let me know when it starts synthesizing new science or math. By then we're in trouble.


That type of work is not useful for securing a decentralized system. To secure a decentralized blockchain, you need the work to be provably wasteful.

That said, there's a new paradigm of Blockchain brewing called 'Proof of Physical Work', which accepts the centralized limitations and leverages the incentive scheme to produce useful outcomes for society. Perhaps the most interesting one right now is HiveMapper.

It won't secure a decentralized blockchain, but it might still be a great value add for society.


> To secure a decentralized blockchain, you need the work to be provably wasteful.

But why does it have to be wasteful?

As I understand it you just need a hard enough computation to prevent the 51% problem.

Obviously if the goal was to compute something which had real value all the failed attempts would count as “provably wasteful” so would seem to fit the criteria.


> As I understand it you just need a hard enough computation to prevent the 51% problem.

That's not the case; in fact, hardness is barely a requirement, provided you can scale up the number of sub-tasks as much as you need.

When the work is provably wasteful, the model simplifies a great deal. Usefulness of the work complicates the dynamics and makes many scenarios plausible that otherwise wouldn't. The game theory then doesn't follow that you'd choose a currency with extra elements of instability.

I don't think there's enough evidence to make the hard requirement that it HAS to be provably wasteful. It's just that you introduce a bunch on extra, usually intractable, problems when the work is useful.


So? If we start offering crypto to run sections of LLMs, then some portion of the crypto miners will move to doing something useful instead.


What about Eth's proof of stake?


Proof of stake involves lockup of capital, wasting the time value of money (despite it earning a small yield), and also taking on the risk of being slashed.


by definition that makes it not a blockchain


I don't believe what you claim is accurate.

The training of LLMs, I believe, has a well-defined loss function with a measurable value on a specific input set. I actually already discussed this problem extensively with GPT-4 itself, so I'll leave you with parts of that discussion:

> Me: Do LLMs have a well defined loss function that they optimize over?

> GPT-4 Assistant: Yes, large language models (LLMs) have a well-defined loss function that they optimize over during the training process. The choice of loss function depends on the specific task the model is designed for and the architecture used. For LLMs like GPT-3, which are based on the transformer architecture and designed for natural language processing tasks, the most commonly used loss function is the cross-entropy loss.

> Cross-entropy loss measures the difference between the model's predicted probability distribution over the possible output tokens (words, subwords, or characters, depending on the tokenization used) and the true probability distribution, which assigns a probability of 1 to the correct output token and 0 to all other tokens. The goal of the training process is to minimize the cross-entropy loss, which essentially means making the model's predicted probabilities as close as possible to the true probabilities.

> During training, the LLM processes input sequences (e.g., sentences or paragraphs) and generates output token probabilities based on its current weights. The cross-entropy loss is computed using these probabilities and the true target tokens, and then the gradients of the loss with respect to the model's weights are calculated using backpropagation. The weights are updated using an optimization algorithm, such as stochastic gradient descent (SGD) or one of its variants like Adam or Adagrad, in order to minimize the loss.

> The process of iteratively updating the model's weights to minimize the loss function is what allows LLMs to learn patterns and structure in the training data, ultimately enabling them to generate coherent text, answer questions, or perform other language-related tasks.

> Me: Is it possible to restrict the training with a rules that says only these weights, say specific slices, are allowed to change, keep all other weights fixed? Is this how LLMs can be trained in a distributed fashion?

> GPT-4 Assistant: Yes, it is possible to restrict the training of an LLM to specific slices or subsets of weights while keeping other weights fixed. This is a common approach used in distributed and parallel training scenarios, where multiple devices or nodes work on different parts of the model simultaneously. This approach is known as model parallelism.

> In model parallelism, the model's weights are divided into smaller subsets or partitions, with each partition being assigned to a different device or node. During training, each device works only on updating the weights in its assigned partition while keeping the weights in the other partitions fixed. The gradients are computed for each partition separately, and the updates are performed locally on each device.

> Once all devices complete their local updates, they communicate their updated weights with one another to synchronize the model's state. This communication can be done through various methods, such as parameter server architecture or all-reduce algorithms. The key to effective model parallelism is balancing the computational workload among the devices while minimizing the communication overhead.

> Adapting this concept for the proposed proof of work and LLM training combination could involve assigning different subsets of weights to miners and having them work on updating these subsets. This would essentially turn the mining process into a form of distributed LLM training, with each miner contributing to a portion of the model's weights. However, coordinating and synchronizing the updates among miners in a decentralized environment, while maintaining the integrity and security of the blockchain, would be a significant challenge that needs to be addressed in the design.
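
The "only these weights are allowed to change" part is easy to picture in PyTorch. Here's a small standalone illustration of freezing everything except an assigned slice (my own sketch, not tied to any particular distributed protocol):

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))

    # Freeze everything, then unfreeze only the slice this worker is assigned.
    for p in model.parameters():
        p.requires_grad = False
    for p in model[2].parameters():      # e.g. this node only trains the last layer
        p.requires_grad = True

    optimizer = torch.optim.Adam(
        (p for p in model.parameters() if p.requires_grad), lr=1e-3
    )

    x, y = torch.randn(8, 16), torch.randn(8, 4)
    loss = nn.functional.mse_loss(model(x), y)   # cross-entropy in the real LLM case
    loss.backward()                              # gradients reach only the unfrozen slice
    optimizer.step()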


There is an AI-generated spam joke in there, but I can't think of it right now. I'm much too overwhelmed [again] by the clarity of that explanation.

I believe someone posted a paper talking about the riddle at the end: apparently one can also merge weights if work is done on a partition repeatedly/accidentally. The rest of the merger seems to be a kind of BitTorrent(?)


Proof-of-work only works with a particular set of computational problems, i.e. those in NP. I'm not sure if running an LLM fits that bill.

I suppose you could combine proof-of-stake with it in some way (e.g. you commit to an input/output tuple and get your stake slashed if it can be shown to not reproduce on a "canonical" execution), though?

That's not nearly as simple as "normal" PoW, though – you'd need to encode some reputational aspect into the system.


You can definitely do arbitrary work as a sort of proof of work. Not quite the same mathematically, but pragmatically similar. The key is building in some redundancy/error-correction and ensuring that a single node can't by itself define "correctness" of a solution. You do that by duplicating work across nodes, distributing chunks randomly and rejecting/rerunning disagreeing results. It's also pretty easy to spot bad actors trying to cheat on their computational work in this scenario.


I don't think it's that easy at all. The work function must be cheap and canonical to check, and the consensus algorithm has to be rigorous, or else it's too easy to attack the security of the network. DoS, Sybil, 51%, social takeover via hard fork, the list goes on...


It has a well-defined loss function with a numerical value. The improvement of this value can serve as a type of difficulty. Check some other comments I've made on this post for how it might work.


It's an interesting idea for sure, but loss doesn't go down forever. I think this ends with a highly overfitted network that grinds to a halt as the loss function hits local minima.

Even if you get past that, there's no consensus mechanism or finalization as it stands, and validating solutions is relatively expensive.


We only just started thinking about this, and I suspect these issues are solvable in a protocol. For instance, using cross-validation, there could be a distributed protocol to control overfitting.

I'm not sure validation is so expensive if the data is small enough. Actually, maybe that's a way to approach this: two types of blocks that are paired and share the rewards in some way - one that proposes a better slice of weights and another that proves they are better out of sample.

Give it a few weeks, and with GPT-4's help I think we can find some promising approaches.


And now we have an AI that is impossible to turn off.


Reminds me of the short story Stephen Hawking tells about AI in this video https://youtu.be/T8y5EXFMD4s


And impossible to censor.


I believe that's pretty close to what https://bittensor.com/ does.


What if we moved to "Proof-of-Carbon-capture" instead?


Let's do it.


I'm in


Let's watch the world burn!


The world is capable of burning itself just fine without such assistance.

It would be much neater to turn it all into paperclips instead.


it's all fun and games until a bunch of kids die


me too


What I want to see is a bunch of RTX 3060 mining cards being used to run ML models. They did talk about bandwidth issues with under 100 Mbit for servers, though if you're doing this as a local cluster you could run some ex-server network gear and be golden.


I just thought this through while building my new PC and house. The bottleneck is always the NIC, because the most you can do right now is about 10 GbE for conventional motherboards.

After that you also run into cabling issues. Cat 8, for instance, only does 40 GbE max, which means for anything more you need to bundle up connections, which comes with its own problems.

Another point is that while mining, GPUs are still independent and not connected to each other, so each of them is also restricted to the maximum your PCIe slot will give you.

PCIe 4.0 has a maximum data transfer rate of 16 GT/s (gigatransfers per second) per lane, which translates to roughly 2 GB/s per lane. With up to 16 lanes, a x16 slot provides a maximum of about 32 GB/s in each direction (upstream and downstream).


So in the old model you could: 1. pay for compute 2. charge the customers to pay for compute,

and now you can instead: 1. pay your customers to pay for compute 2. charge the customers to pay for the customers to pay for compute

Is there something I'm not understanding in the business logic of this?

Is it the fact that this would be running on computers that are essentially free, since it would just be like the desktop in someone's home office, so the infrastructure costs are already paid for (e.g. externalized)?

Or like would the value here be accessing the LLM service for 'free'? But isn't just paying for a service like OpenAI relatively inexpensive and already nicely set up?


> But isn't just paying for a service like OpenAI relatively inexpensive and already nicely set up?

Sure, but OpenAI is never going to offer you a raw product. Their offerings will always be the heavily restricted, corporatized product they offer now. That works for many, maybe most, people but there's definitely a market for a "power to the users" LLM AI with no rules.


> Is there something I'm not understanding in the business logic of this?

That people would rather give away some of the GPU time they aren't using at the moment than pay a subscription. And presumably they also don't want to be beholden to whatever filters the "big AI cluster owner" puts in place.


Curious if anyone has actually used this. It's quite slow for me and feels more like a cute idea rather than a useful product.


This seems to be inference-side.

Surely, distributed building of a license-free model similar to, say, GPT-3.5 / ChatGPT would be more useful?

i.e. rebuild the Alpaca work minus the legal issues


Petals is an impressive feat, but be aware that it is very slow, at 1-4 sec/token (depending on the hardware you have). I find it too slow even for experimenting; as a developer I want faster feedback cycles. Super interesting to see the project evolve over time, and onboarding could not be easier.


What is the rate of tokens per second when you are talking to ChatGPT on GPT-4?


Am I the only one excited about when 4chan will train its own AI by collectively pooling their power levels?


I wonder how close we are to someone coming up with peer-to-peer malware that uses a similar concept to train their model - kind of like the trojan crypto miners found in public package repos and apps just a couple of years ago. (Probably still an issue.)


From the table, a collection of 14 servers is equivalent to a single A100 when using a batch size of 64. So what if you used 1 computer but did smart offloading to RAM or SSD? Would that be more than 14 times slower?


Very cool. Had been wondering when we would see real "cloud" database and model computation without some silly token attached.


You made a real skynet!!!

Jokes aside it's pretty cool!


How can this be decentralized with no ICO?


It's super slow. 1 token per second, if that. Half a word a second.


If nodes drop in and out how does that impact the inferences I wonder


My interest in AI has just gone 10X, thanks and cheers!


Kinda reminds me of the BOINC system


This is pure genius if it works.


Running an LM on two GPUs in a single system already comes with a roughly 10x speed penalty. Fetching layers across the network will in general be even slower. They talk about 1 token per second; with images it would be even less due to the larger number of sequential steps.

It can be useful... if it's even possible. But the range of possible use cases is quite slim.

Generation will be slower, so why bother? For large numbers of batches? Maybe. But why use it if we have Swarm by db0?

Training could theoretically be worth it, but something like Kickstarter plus GPU renting could be both more cost-effective and quicker.


Speculative sampling to the rescue: you decode locally with a smaller LLM and only check with the large model from time to time, like every few tokens. This guarantees exactly the same quality with a big speedup, since you don't need to run the large model for each individual token.

Accelerating Large Language Model Decoding with Speculative Sampling https://arxiv.org/abs/2302.01318
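
For intuition, here's a simplified greedy-verification sketch of the idea (not the exact stochastic algorithm from the paper; draft_next and target_argmax are placeholder callables for the small and large models):

    def speculative_step(seq, draft_next, target_argmax, k=4):
        # 1) Draft k tokens cheaply with the small local model.
        draft = list(seq)
        for _ in range(k):
            draft.append(draft_next(draft))

        # 2) One forward pass of the large model over the whole drafted sequence:
        #    choices[i] is the large model's greedy token after draft[:i+1].
        choices = target_argmax(draft)

        # 3) Keep drafted tokens while they match what the large model would emit.
        out = list(seq)
        for i in range(len(seq), len(draft)):
            if draft[i] == choices[i - 1]:
                out.append(draft[i])              # accepted "for free"
            else:
                out.append(choices[i - 1])        # first mismatch: take the big model's token
                return out
        out.append(choices[-1])                   # all accepted: still gain one extra token
        return out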


Skynet.


dis gonna be big


ML/AI moving too fast.


I would be very concerned about sending my data over to a swarm https://ashokpoudel.medium.com/understanding-security-and-pr...


This works by taking a language model that won't fit in a single consumer GPU's memory, partitioning it layerwise, and running it distributed across a bunch of different people's computers. If I'm understanding correctly, then any single node acting dishonestly can replace the output of its portion with whatever it wants, and (even if every other node is honest) this is sufficient to fully control the final output. So, probably okay to use for prompts like "rewrite Rick Astley lyrics in the style of Shakespeare", but not something you'd want to use in a way that feeds into another automated system.

Meta-level, I think it's bad for the world if there's good technology for running neural nets on distributed consumer GPUs. From a cybersecurity perspective, Windows gaming PCs are easy pickings compared to datacenters, and I think there's a risk that after a few more iterations of AI development, we'll start getting systems that figure out they can increase their own power level by building a botnet that runs additional copies of themselves.



