
There is! See https://petals.ml/ for inference of models like BLOOM-176B over the internet, or https://arxiv.org/abs/2301.11913 and https://arxiv.org/abs/2206.01288, which show how to do pretraining from scratch in the same setting. Disclaimer: I'm a coauthor of these systems (including the one in the OP).
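
If you want to try the inference side, the Petals client mimics the usual Hugging Face interface. A minimal sketch, assuming the BLOOM-era Petals API (the import path, class name, and model id may have changed since; check the README for the current version):

    from transformers import BloomTokenizerFast
    from petals import DistributedBloomForCausalLM  # assumption: BLOOM-era client class

    MODEL_NAME = "bigscience/bloom-petals"  # public swarm checkpoint at the time
    tokenizer = BloomTokenizerFast.from_pretrained(MODEL_NAME)
    model = DistributedBloomForCausalLM.from_pretrained(MODEL_NAME)

    # Embeddings and logits run locally; the transformer blocks are executed
    # remotely on volunteer GPUs that each host a slice of the 176B model.
    inputs = tokenizer("A cat sat on", return_tensors="pt")["input_ids"]
    outputs = model.generate(inputs, max_new_tokens=5)
    print(tokenizer.decode(outputs[0]))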


Amazing work! If I had a GPU, I'd join. I know a similar project for text-to-image: https://aqualxx.github.io/stable-ui/


https://github.com/aqualxx/stable-horde-notebook

My only problem with Stable Horde is that its anti-CP measure involves checking the prompt for words like "small", meaning I can't use an NSFW-capable model with certain prompts ("holding a very small bag", etc.). That, and seeing great things in the image ratings and being unable to reproduce them because it doesn't provide the prompt.


There absolutely are! Check out hivemind (https://github.com/learning-at-home/hivemind), a general library for deep learning over the Internet, or Petals (https://petals.ml/), a system that leverages hivemind and lets you run BLOOM-176B (or other large language models) distributed over many volunteer PCs. You can join it and host some of the model's layers by running literally one command on a Linux machine with Docker and a recent enough GPU.

Disclaimer: I work on these projects; both are based on our research over the past three years.


Not directly related, but the Learning@home [1] project aims to achieve precisely that goal of public, volunteer-trained neural networks. The idea is that you can host separate "experts," or parts of your model (akin to Google's recent Switch Transformers paper) on separate computers.

This way, you never have to synchronize the weights of the entire model across the participants — you only need to send the gradients/activations to a set of peers. Slow connections are mitigated with asynchronous SGD and unreliable/disconnected experts can be discarded, which makes it more suitable for Internet-like networks.
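
To make the "discard unavailable experts" part concrete, here is a toy, purely local PyTorch sketch of the gating logic (not the actual hivemind API; in the real system each expert lives on a different peer and is called over the network):

    import torch
    import torch.nn as nn

    class ToyDMoE(nn.Module):
        # Gate over a set of experts; peers that failed to respond in time
        # are simply masked out of the softmax instead of blocking the step.
        def __init__(self, dim=16, n_experts=8):
            super().__init__()
            self.gate = nn.Linear(dim, n_experts)
            self.experts = nn.ModuleList([nn.Linear(dim, dim) for _ in range(n_experts)])

        def forward(self, x, alive):
            scores = self.gate(x)                               # (batch, n_experts)
            scores = scores.masked_fill(~alive, float("-inf"))  # discard dead experts
            weights = scores.softmax(dim=-1)
            outputs = torch.stack([expert(x) for expert in self.experts], dim=1)
            return (weights.unsqueeze(-1) * outputs).sum(dim=1)

    x = torch.randn(4, 16)
    alive = torch.tensor([True, True, False, True, True, False, True, True])
    print(ToyDMoE()(x, alive).shape)  # torch.Size([4, 16])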

Disclaimer: I work on this project. We're currently implementing a prototype, but it's not yet GPT-3 sized. Some issues like LR scheduling (crucial for Transformer convergence) and shared parameter averaging (for gating etc.) are tricky to implement for decentralized training over the Internet.

[1] https://learning-at-home.github.io/


Your project looks so interesting. Have you thought of putting the experts on a distributed market where their expertise and work can be exchanged for some token (obviously using a blockchain)?

This would encourage people to host experts in your network and would create value.


Thank you! This is definitely something we should look into in the future (hopefully with community help); as of now, training infrastructure and model convergence are the highest priorities. That said, we welcome any ideas for ways to motivate more volunteers to join the experiments, because the Learning@home team comes from a distributed DL background with limited volunteer computing expertise.

Also, I believe that for some projects (e.g. GPT-3 replication effort) people would want to join the network regardless of the incentive mechanism, as demonstrated by Leela Chess Zero [1].

[1] http://lczero.org/


How do you deal with adversarial/Byzantine updates that attempt to degrade performance or even install a backdoor? Do you use plain averaging, or some other aggregation algorithm like Multi-Krum?


For now, the only separation we have is that each worker is responsible for its own weights, since network security has not been our top priority. Still, we've been thinking about adding some security measures, like proof-of-work for each node and detection of anomalous inputs/gradients (or simply NaN values). Right now we're running experiments on internal hardware, but before a public launch we'll make sure that malicious participants can't let everybody else's work go to waste :)
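
For a rough feel of the "detect anomalous gradients" direction, something like the following (purely illustrative, not what we've implemented): drop NaN/Inf updates and obvious norm outliers before averaging.

    import torch

    def robust_average(updates, z_thresh=3.0):
        # Keep only finite updates, then drop clear norm outliers
        # (median absolute deviation test) and average the survivors.
        finite = [u for u in updates if torch.isfinite(u).all()]
        if not finite:
            return None
        norms = torch.tensor([u.norm().item() for u in finite])
        med = norms.median()
        mad = (norms - med).abs().median() + 1e-8
        kept = [u for u, n in zip(finite, norms) if abs(n - med) / mad <= z_thresh]
        return torch.stack(kept).mean(dim=0)

    peer_updates = [torch.randn(10), torch.randn(10),
                    torch.full((10,), float("nan")),  # broken or malicious update
                    100 * torch.randn(10)]            # obvious outlier
    print(robust_average(peer_updates))

This is much weaker than Multi-Krum-style aggregation, but it already catches NaNs and crude scaling attacks.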


This is also what I was thinking about. Considering that making up bad data doesn't require any GPU work, as opposed to honestly computing the updates, the model could degrade quickly without some measures to deal with adversarial nodes.

A draft solution would be for the central server to measure the goodness of each update and drop the ones that don't perform well. This could work, since inference is much cheaper than computing gradients.
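
A sketch of that idea with a hypothetical accept_update helper (update here is a list of per-parameter deltas proposed by a worker): tentatively apply it, keep it only if held-out loss doesn't get worse, roll back otherwise.

    import torch

    def accept_update(model, update, val_batch, loss_fn, tol=0.0):
        # Tentatively apply the proposed deltas; reject and roll back
        # if the validation loss increases by more than `tol`.
        x, y = val_batch
        with torch.no_grad():
            before = loss_fn(model(x), y).item()
            backup = [p.detach().clone() for p in model.parameters()]
            for p, delta in zip(model.parameters(), update):
                p.add_(delta)
            after = loss_fn(model(x), y).item()
            if after > before + tol:
                for p, b in zip(model.parameters(), backup):
                    p.copy_(b)
                return False
        return True

    model = torch.nn.Linear(4, 1)
    batch = (torch.randn(8, 4), torch.randn(8, 1))
    noise = [0.01 * torch.randn_like(p) for p in model.parameters()]
    print(accept_update(model, noise, batch, torch.nn.functional.mse_loss))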


Do you have a personal Twitter account I can follow? Your career is one I'd like to follow.


Sure! It's @m_ryabinin


Thanks! :D


Here is a recent paper (disclaimer: I am the first author) named "Learning@home" that proposes something along these lines. Basically, we develop a system that allows you to train a network with thousands of "experts" distributed across hundreds of consumer-grade PCs or more. You don't have to fit 700GB of parameters on a single machine, and there is significantly less network delay than with synchronous model-parallel training. The only thing you sacrifice is the guarantee that every batch will be processed by all the required experts.

You can read it on ArXiv https://arxiv.org/abs/2002.04013v1 or browse the code here: https://github.com/learning-at-home/hivemind. It's not ready for widespread use yet, but the core functionality is stable and you can see what features we are working on now.
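
For a flavor, a sketch of what joining a collaborative run might look like (the exact entry points are still evolving; the names below are illustrative, see the repo for the current interface):

    import torch
    import hivemind

    # Illustrative names; check the hivemind docs for the actual API.
    dht = hivemind.DHT(start=True)  # first peer; others connect via initial_peers=dht.get_visible_maddrs()

    model = torch.nn.Linear(16, 1)
    base_opt = torch.optim.SGD(model.parameters(), lr=0.01)

    opt = hivemind.Optimizer(
        dht=dht,
        run_id="demo_run",         # peers sharing this id train together
        batch_size_per_step=32,    # samples processed per local opt.step()
        target_batch_size=10_000,  # average with peers after this many samples globally
        optimizer=base_opt,
        use_local_updates=True,    # take local steps, average parameters in the background
        verbose=True,
    )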

