Yeah, it's amazingly accurate for now, but only because the training data makes up a high percentage of the actual data set, so it's probably classifying based on those insignificant differences. We'll see how it holds up...
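To make the overlap worry concrete, here's a rough sketch of how you could check it. This is not your classifier, just a hypothetical stand-in that memorizes its training items, and the titles and labels are made-up toy data. The idea is to score the model on items it was trained on and on items it has never seen, and compare the two numbers.

```python
from collections import Counter

def train_memorizer(examples):
    """'Train' by memorizing every (title, label) pair.

    Unseen titles fall back to the most common training label.
    This is a deliberately naive stand-in, not a real classifier.
    """
    table = dict(examples)
    majority = Counter(label for _, label in examples).most_common(1)[0][0]
    return lambda title: table.get(title, majority)

def accuracy(model, examples):
    """Fraction of examples whose predicted label matches the true label."""
    return sum(model(title) == label for title, label in examples) / len(examples)

# Hypothetical toy data: the label carries no signal a memorizer can generalize from.
data = [(f"title {i}", "good" if i % 2 == 0 else "bad") for i in range(200)]

# Keep the last 50 items out of training entirely.
train, held_out = data[:150], data[150:]

model = train_memorizer(train)
seen_accuracy = accuracy(model, train)         # items the model was trained on
held_out_accuracy = accuracy(model, held_out)  # items the model has never seen

print(f"accuracy on training items: {seen_accuracy:.2f}")
print(f"accuracy on held-out items: {held_out_accuracy:.2f}")
```

On this toy data the memorizer gets 1.00 on the items it has seen and 0.50 on the held-out ones, which is the kind of gap you'd expect if the high accuracy mostly comes from overlap with the training data.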
I think you're training it on the submission titles. I wonder whether training on the text of the websites themselves would be more accurate. It's certainly richer data. But it's quite possible that the submission titles work better.