It takes a bit of effort to set up and train a proper GPT-2 model, especially if you're not familiar with GPU drivers and Python environments.
Also, don't try to train GPT-2 on your own machine, as it takes days, even with a good gaming GPU.
If you're interested in trying it out but don't have the right GPU or OS, check out the guide I wrote on how I did it on Azure with a T4 GPU instance:
https://16x.engineer/2023/12/29/nanoGPT-azure-T4-ubuntu-guid...
Would any GPU - even an old one - work for training these models?
I have a bunch of little home apps/ideas for some specific training I'd like to do, but don't currently have any recent GPU of note. Just a few old ones in my attic somewhere.
EDIT: ah someone has posted a blog post in the comments with some info
EDIT2: to be clearer, I was talking about fine-tuning. I assume that's quicker/less intensive. I really need to check out some online courses, I think.
Keep in mind that instead of training a GPT-2 model from scratch, you can opt for fine-tuning: take a pre-trained model and adapt it to your specific task with far less compute and time. Because it builds on the knowledge already baked into the model, you get better results with a much shorter development cycle.
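One reason fine-tuning is cheap is that you can freeze most of the pre-trained weights and only update a small task-specific part. Here's a toy PyTorch sketch of that idea; the "backbone" is randomly initialized and the data is dummy tokens (everything here is illustrative, not actual GPT-2):

```python
# Toy illustration of why fine-tuning is cheap: freeze the "pre-trained"
# backbone and train only a small task-specific head. The backbone here is
# randomly initialized; a real run would load actual GPT-2 weights instead.
import torch
import torch.nn as nn

torch.manual_seed(0)

backbone = nn.Sequential(          # stand-in for a pre-trained model
    nn.Embedding(100, 32),         # vocab of 100 dummy tokens
    nn.Flatten(),
    nn.Linear(32 * 8, 64),         # expects sequences of length 8
    nn.ReLU(),
)
head = nn.Linear(64, 100)          # the only part we actually train

for p in backbone.parameters():
    p.requires_grad = False        # frozen: no gradients, no optimizer state

opt = torch.optim.AdamW(head.parameters(), lr=1e-2)
x = torch.randint(0, 100, (4, 8))  # dummy token ids
y = torch.randint(0, 100, (4,))    # dummy labels

losses = []
for _ in range(50):
    opt.zero_grad()
    loss = nn.functional.cross_entropy(head(backbone(x)), y)
    loss.backward()
    opt.step()
    losses.append(loss.item())

trainable = sum(p.numel() for p in head.parameters())
total = trainable + sum(p.numel() for p in backbone.parameters())
print(f"trainable: {trainable} of {total} params; "
      f"loss {losses[0]:.2f} -> {losses[-1]:.2f}")
```

In nanoGPT terms, this corresponds to initializing from a GPT-2 checkpoint rather than training from random weights; even full-parameter fine-tuning (nothing frozen) is far cheaper than pre-training because it needs orders of magnitude fewer tokens and steps.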
See also (from the same author) https://github.com/karpathy/llm.c — "LLMs in simple, pure C/CUDA with no need for 245MB of PyTorch or 107MB of cPython."