This is a didactic implementation. If you read the HuggingFace repo, you'll find it much more abstracted, because it implements many models in the same codebase. This one is not fast or big, just easier to read and tweak.
minGPT prioritized being understandable above all else and was not very fast. This repo includes several optimizations, but it is still probably more understandable than any other open source implementation.