It only had part of the internet. OpenAI is nowhere near as comprehensive at web scraping as Google, and I don't think they actually scraped at all for this; they used existing data like CommonCrawl.
The other thing you are not understanding is that it did not memorize these things; it built representations for predicting the most likely next token. This is why it hallucinates and makes up numbers, web links, and citations that do not exist.
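To make the "predicts the next token, doesn't memorize" point concrete, here is a rough toy sketch. It is a word-level bigram counter, not how a real LLM works internally (those use neural networks over subword tokens), and the corpus, `train_bigram`, and `generate` are all made up for illustration. The idea is just that always picking the most likely next word can produce a fluent sentence that never appeared in the training data.

```python
from collections import Counter, defaultdict

# Tiny made-up "training set" (hypothetical, for illustration only).
corpus = [
    "the paper was published in 2019",
    "the report was published in 2021",
    "the paper was cited in 2021",
]

def train_bigram(sentences):
    """Count, for each word, which words follow it and how often."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def generate(counts, start, max_tokens=10):
    """Greedily append the most frequent next word until none is known."""
    out = [start]
    while len(out) < max_tokens and out[-1] in counts:
        next_word = counts[out[-1]].most_common(1)[0][0]
        out.append(next_word)
    return " ".join(out)

model = train_bigram(corpus)
sentence = generate(model, "the")
print(sentence)            # the paper was published in 2021
print(sentence in corpus)  # False
```

The output reads like a fact, but no training sentence says the paper was published in 2021. Each step was locally the most likely continuation, and the combination is invented. That is roughly the same failure mode as a made-up citation or link, just at a much smaller scale.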