NanoGPT: The simplest, fastest repository for training medium-sized GPTs (github.com/karpathy)
114 points by ulrischa 3 months ago | 21 comments



It takes a bit of effort to set up and train a proper GPT-2 model, especially if you are not familiar with GPU drivers and Python environments.

Also don't try to train GPT-2 on your own machine as it takes days, even with a good gaming GPU.

If you are interested in trying it out but don't have the right GPU or OS, you can check out the guide I wrote on how I did it on Azure with a T4 GPU instance:

https://16x.engineer/2023/12/29/nanoGPT-azure-T4-ubuntu-guid...


Thanks a lot for sharing this.

How much does the net cost come out to for a run? (Am not a startup.)

Also, any smaller datasets you would recommend that still demonstrate some useful capability?

Thanks.


Update: added result section to my post with more details: https://16x.engineer/2023/12/29/nanoGPT-azure-T4-ubuntu-guid...

I don't think it can go any smaller and still demonstrate its capabilities. Even GPT-2 mostly generates nonsensical sentences that resemble English.

You might be looking at an n-gram or Markov model for simpler NLP capabilities.
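For context, a character-level Markov model is just a lookup table from recent characters to possible next characters; no GPU needed. A minimal sketch (the function names and toy corpus are made up for illustration):

```python
import random
from collections import defaultdict

def train_markov(text, order=2):
    """Map each `order`-length context to the characters that follow it."""
    model = defaultdict(list)
    for i in range(len(text) - order):
        context = text[i:i + order]
        model[context].append(text[i + order])
    return model

def generate(model, seed, length=50):
    """Repeatedly sample a random successor of the current context;
    stop early if the context was never seen in training."""
    order = len(seed)
    out = seed
    for _ in range(length):
        context = out[-order:]
        if context not in model:
            break
        out += random.choice(model[context])
    return out

corpus = "the cat sat on the mat. the cat ate the rat."
model = train_markov(corpus, order=3)
print(generate(model, "the", 40))
```

By construction, every generated (order+1)-gram appears somewhere in the training text, which is exactly why such models read locally plausible but have no long-range coherence.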


Thanks!


Previous discussion: https://news.ycombinator.com/item?id=34336386 (1532 points | Jan 11, 2023 | 320 comments)


Something from previous discussions about the whole

"Rather than paying $50k up front for 8 x A100's, you can just rent some GPU's for $1.2k to train the whole thing in 4 days"

that felt off to me is that it completely ignores the compute time spent exploring new ideas, failing, tweaking the training data, etc.
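As a sanity check on the quoted figure (taking the 8 x A100, 4-day, $1.2k numbers from the quote at face value; the implied per-GPU hourly rate below is my inference, not from the thread):

```python
gpus = 8
days = 4
total_cost = 1200                # USD, from the quote
gpu_hours = gpus * days * 24     # 768 GPU-hours
rate = total_cost / gpu_hours    # implied $/GPU-hour for rented A100s
print(f"{gpu_hours} GPU-hours at ~${rate:.2f}/GPU-hour")
```

That implied rate only covers the one successful run; every failed experiment or data tweak multiplies the GPU-hours, which is the point above.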


Link for the previous discussion? Which model, dataset, training strategy?


Would any GPU - even an old one - work for training these models?

I have a bunch of little home apps/ideas for some specific training I'd like to do, but don't currently have any recent GPU of note. Just a few old ones in my attic somewhere.

EDIT: ah someone has posted a blog post in the comments with some info

EDIT2: to be clearer, I was talking about fine-tuning. I assume that's quicker/less intensive. I really need to check out some online courses, I think.


Yes, fine-tuning is fast, cheap, and "easy"; you can do it on a not-so-expensive GPU without issues.


Keep in mind that instead of training a GPT-2 model from scratch, you can opt for fine-tuning. Fine-tuning allows you to take a pre-trained model and adapt it to your specific task with much less compute and time. It leverages the existing knowledge of the model, resulting in better performance and faster development cycles.
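nanoGPT supports this directly: a config file can set `init_from = 'gpt2'` to start from pretrained GPT-2 weights instead of a random init (see the repo's `config/finetune_shakespeare.py`). A rough sketch of such a config; the field names follow nanoGPT's convention, but the specific values here are illustrative guesses, not recommendations:

```python
# finetune_example.py -- illustrative nanoGPT config; values are starting
# points to tune. Run as: python train.py finetune_example.py
out_dir = 'out-finetune'
init_from = 'gpt2'               # load pretrained GPT-2 weights instead of 'scratch'
dataset = 'shakespeare'          # whichever dataset you prepared under data/
batch_size = 1
gradient_accumulation_steps = 32
max_iters = 2000                 # far fewer steps than a from-scratch run
learning_rate = 3e-5             # small LR so we don't clobber the pretrained weights
decay_lr = False
```

Since only a few thousand steps are needed, this is the kind of job that finishes in hours on a single modest GPU rather than days on a cluster.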


This guy is awesome


We need a simpler way for GPTs to reach mass adoption, and I don't think OpenAI's GPT Store will do that.

We need a unique interface with a standardized process to help people make use-case-specific GPTs.

Let's see what the future holds.


I actually built a purpose-made UI (desktop app) for coding with ChatGPT, because I found that a chat interface is not ideal for daily coding tasks.

Curious what you think about it: https://prompt.16x.engineer/


This looks super interesting. Thanks for sharing! I will share my thoughts in a couple of days.


So like aider, but with more gui.


Exactly.


See also, from the same author, https://github.com/karpathy/llm.c: "LLMs in simple, pure C/CUDA with no need for 245MB of PyTorch or 107MB of cPython."


The fact that Andrej Karpathy is part of OpenAI is giving me at least a bit of hope that the company is not going to ruin society as we know it…



he…isn’t?


Indeed, I think he left (for the second time) early this year.



