
I don't know if that's a blocker. Ordinary people commonly rent a $40k machine for 38 hours from companies like Avis and Hertz.

If training a large model now costs the same as driving to visit grandma, that seems like a pretty good deal.




That's a great comparison. For a real number, I just checked Runpod and you can rent a system with 8x A100s for $17/hr, or about $650 for 38 hours. Not cheap, but also pretty close to the cost of renting a premium vehicle for a few days. I've trained a few small models by renting a 1x A5000 system, and that only costs $0.44/hr, which is perfect for learning and experimentation.
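Back-of-envelope, in case anyone wants to plug in their own numbers (these are just the rates I saw today, and they drift constantly):

    # Rough rental-cost math; hourly rates are the ones quoted above and will vary.
    configs = {
        "8x A100 (Runpod)": 17.00,  # $/hr
        "1x A5000 (Runpod)": 0.44,  # $/hr
    }
    hours = 38  # the rental window from the parent's car analogy
    for name, rate in configs.items():
        print(f"{name}: ${rate * hours:,.2f} for {hours} h")
    # -> 8x A100 (Runpod): $646.00 for 38 h
    # -> 1x A5000 (Runpod): $16.72 for 38 h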


It would be great if a tradeoff could be made, though. For example, train at 1/10th the speed for 1/10th of the cost.

This could correspond to taking public transport in your analogy, and would bring this within reach of most students.


Slower training tends to be only a little cheaper, because most modern architectures parallelize well and the cost mostly comes down to the total number of FLOPs, not how fast you burn through them.

If you want to reduce cost, you need to reduce the model size, and you'll get worse results for less money.
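To make that concrete: a common rule of thumb is that training FLOPs ≈ 6 × parameters × tokens, so halving your speed while halving the hourly rate leaves the bill unchanged; only shrinking the model (or the data) shrinks the FLOPs. A rough sketch, with made-up throughput and price numbers:

    # Back-of-envelope training cost from total FLOPs (6 * N * D rule of thumb).
    # The throughput, utilization and price below are illustrative assumptions.
    def training_cost(params, tokens, gpu_tflops, utilization, usd_per_gpu_hour, num_gpus):
        total_flops = 6 * params * tokens
        flops_per_second = num_gpus * gpu_tflops * 1e12 * utilization
        hours = total_flops / flops_per_second / 3600
        return hours, hours * num_gpus * usd_per_gpu_hour

    # 1B params on 20B tokens, 8 GPUs at ~150 TFLOPs, 40% utilization, ~$2/GPU-hour
    hours, usd = training_cost(1e9, 20e9, 150, 0.4, 2.0, 8)
    print(f"~{hours:.0f} hours, ~${usd:.0f}")  # roughly 69 hours, ~$1100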


The problem with that is that, currently, available memory scales with the class of GPU... and very large language models need 160-320GB of VRAM. So there sadly isn't anything out there you can load a model this large onto except a rack of 8x+ A40s/A100s.

I know there are memory-channel bandwidth limits and whatnot, but I really wish there was a card out there with a 3090-sized die but 96GB of VRAM, solely to make it easier to experiment with larger models. If it takes 8 days to train vs. 1, that's fine. Having only two of them to get 192GB, still fit on a desk, and draw normal power would be great.
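For a sense of where numbers like 160-320GB come from, here's the weights-only math for a 175B-parameter model; activations, optimizer state and KV cache all come on top, so treat it as a floor:

    # Lower bound on memory just to hold the weights of a 175B-parameter model.
    params = 175e9
    for dtype, bytes_per_param in [("fp32", 4), ("fp16/bf16", 2), ("int8", 1)]:
        print(f"{dtype}: ~{params * bytes_per_param / 1e9:.0f} GB")
    # fp32: ~700 GB, fp16/bf16: ~350 GB, int8: ~175 GB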


Technically this is not true - there are a lot of techniques to shard models and store activations between layers, or even between smaller subcomponents of the network. For example, you can split the 175B-parameter BLOOM model into separate layers, load up one layer, read the previous layer's output from disk, and save this layer's output back to disk.
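A minimal sketch of that loop in PyTorch, with a hypothetical load_layer() helper standing in for whatever actually deserializes each shard (libraries like accelerate or DeepSpeed do a fancier version of this):

    import torch

    # Sketch: run a sharded model one layer at a time, keeping only the current
    # layer in GPU memory and streaming activations through disk.
    # load_layer(i) is a hypothetical helper that loads layer i's weights from disk;
    # activations_0.pt is assumed to hold the embedded input tokens.
    @torch.no_grad()
    def forward_through_shards(num_layers, load_layer, device="cuda"):
        for i in range(num_layers):
            layer = load_layer(i).to(device)                   # one layer resident at a time
            x = torch.load(f"activations_{i}.pt").to(device)   # previous layer's output
            y = layer(x)
            torch.save(y.cpu(), f"activations_{i + 1}.pt")     # hand off to the next layer
            del layer, x, y
            torch.cuda.empty_cache()                           # release that layer's VRAM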

And NVIDIA does make cards like you're asking for - the A100 is the fast-memory offering, the A40 the bulk, slower-memory one (though they added the 80GB A100 and did not double the A40 to 96GB, so this is less true now than in the P40 vs. P100 generation).

Oddly, you can get close to what you're asking for with an M1 Mac Studio - 128GB of decently fast memory with a GPU that is ~0.5x a 3090 for training.


Do you know if there's any work on peer-to-peer clustering of GPU resources over the internet? Imagine a few hundred people with 1-4 3080 Tis each, running software that lets them form a cluster large enough to train and/or run a number of LLMs. Obviously the latency between shards would be orders of magnitude higher than in a colocated cluster, but I wonder if that could be designed around?


Bloom-petals
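That's the Petals project for running BLOOM over a swarm of volunteer GPUs. From memory, the client side looks roughly like this, though the exact class and checkpoint names may have shifted between versions:

    from transformers import BloomTokenizerFast
    from petals import DistributedBloomForCausalLM

    # "bigscience/bloom-petals" is the public swarm's checkpoint name as I remember it.
    MODEL_NAME = "bigscience/bloom-petals"
    tokenizer = BloomTokenizerFast.from_pretrained(MODEL_NAME)
    model = DistributedBloomForCausalLM.from_pretrained(MODEL_NAME)  # layers served by remote peers

    inputs = tokenizer("A peer-to-peer GPU swarm is", return_tensors="pt")["input_ids"]
    outputs = model.generate(inputs, max_new_tokens=5)
    print(tokenizer.decode(outputs[0]))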


Amazing. Thank you.


No prob. I think it’s a great idea


I guess this would only become a reality if games started requiring these cards.


Well, if it used to cost you $1 for 1 hour at 1x speed, it will now take you 10 hours at 0.1x speed and, if my math checks out, still cost $1. You need to shrink the model.


But of course now you run it on your own computer instead of in the DC, which changes the numbers. Especially if your student dorm has a shared electricity bill :)


The good news is that, unlike vehicles, the rate for rented compute will continue to drop.


Let's not forget that rendering 3D animations in 3DSMAX or Maya used to take days for a single frame of a complex scene, and months for a few minutes of footage.


You have to gas it up and heaven help you if it gets a scratch or a scuff.


Great news! Cloud instances' energy usage is included in their price, and because they're remote and transient, it's impossible to permanently damage them.


I think the equivalent of not being careful and getting a dent, in this context, is leaving it open to the internet and having a bitcoin miner installed.


You free the instance and the miner is gone.


As you are paying for the resources you use, that's fine.

The closest equivalent would be using some kind of software bug to cause actual physical damage: certainly not impossible, but extremely unlikely compared with physically damaging a car.


A better fit would be if you had unlimited liability, like with AWS, and leaked your key pair. Then someone runs up a $100k bill spinning up mining instances.


But you still have to pay for network ingress/egress traffic.


Similarly, maybe we should only let people rent a NanoGPT box if they are over 25 and they have to get collision insurance.




