After two epochs on the IMDB dataset (a 133 MB corpus), the model only reached a loss of 1.1. The regularization is probably too high: I didn't tune the hyperparameters at all, and I assume the defaults set regularization fairly high to suit the small tinyshakespeare corpus. Either way, it did at least start to pick up more grammar:
Prompt: This is my review of Lord of the Rings.
> I can't tell why the movie is a story with a lot of potential the main reason I want to see a movie that is Compared to the Baseball movie 10 Both and I can say it was not just a bad movie.
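To put the loss of 1.1 in more intuitive terms, here is a small sketch that converts it to per-character perplexity and bits per character. It assumes the reported loss is the mean per-character cross-entropy in nats (natural log), which is typical of char-rnn-style training scripts but is not stated above; the function names are hypothetical.

```python
import math

# Assumption (not confirmed by the text): the reported loss is the mean
# per-character cross-entropy measured in nats (natural logarithm).

def perplexity(loss_nats: float) -> float:
    """Per-character perplexity: roughly the number of equally likely next characters."""
    return math.exp(loss_nats)

def bits_per_char(loss_nats: float) -> float:
    """Convert a natural-log cross-entropy into bits per character."""
    return loss_nats / math.log(2)

if __name__ == "__main__":
    loss = 1.1
    print(f"perplexity ~ {perplexity(loss):.2f}")     # ~3.00
    print(f"bits/char  ~ {bits_per_char(loss):.2f}")  # ~1.59
```

Under that assumption, a loss of 1.1 means the model is, on average, about as uncertain as if it were choosing uniformly among three characters at each step.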