Date also plays a huge part. Most Stack Overflow posts get very few votes in the month they are posted, then slowly pick them up as people search for a problem and find an answer. The oldest posts have the highest scores, so something like "How do I change branch in git" will get 4000 points, but posting the same or a similar question today will get you a negative score.
Generally, very short questions score badly because they tend to be things like "How do I make a video sharing website". But a superficially similar question like "How do I copy the current line in vim" will score well as long as it's not a duplicate.
Both of those questions look similar when you just look at the words and sentence structure. To tell them apart you have to understand what a video sharing website is and what kind of task copying a line in vim is, so you can tell which one is a reasonable question and which one is likely to help more future readers.
I did try to rule out question age as a major source of error in the model by training on a sample of ~1 million questions, all from ~2012. But the model didn’t do any better on those.
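For anyone who wants to try the same control, here is a rough sketch of how a single-year sample could be drawn. It is not the code I actually ran: the function name, the `creation_date` field, and the toy records are all made up for illustration, and a real run would read from the Stack Overflow data dump instead.

```python
import random
from datetime import datetime


def sample_questions_from_year(questions, year, n, seed=0):
    """Keep only questions created in `year`, then draw up to `n` at random.

    `questions` is assumed to be a list of dicts with an ISO-8601
    `creation_date` string (a hypothetical schema, not the real dump format).
    """
    same_year = [
        q for q in questions
        if datetime.fromisoformat(q["creation_date"]).year == year
    ]
    if len(same_year) <= n:
        return same_year
    # A seeded generator keeps the sample reproducible between runs.
    return random.Random(seed).sample(same_year, n)


# Toy records, invented for this example only.
questions = [
    {"title": "How do I change branch in git", "creation_date": "2009-06-01T12:00:00"},
    {"title": "How do I copy the current line in vim", "creation_date": "2012-02-10T08:30:00"},
    {"title": "How do I make a video sharing website", "creation_date": "2012-07-22T19:05:00"},
    {"title": "Why is my loop off by one", "creation_date": "2012-11-03T14:45:00"},
]

picked = sample_questions_from_year(questions, year=2012, n=2)
```

Holding the year fixed like this takes away most of the "older posts had more time to collect votes" effect, so whatever error is left has to come from somewhere else.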
And (theoretically) duplicates are deleted by the moderators.
(I added tags to the language model later.)