You can estimate the likelihood that a particular sentence is spam by calculatin...

drakaal · on April 25, 2013

Your comment would be marked as spam using your logic. Was that intentional?

mattj · on April 25, 2013

Note this is exactly how a smart spammer would generate text (sampling from a language model, built on a public ally available data set like google ngrams or Wikipedia). If you wanted to catch someone doing this, you're much better off using your own corpus to generate a language model, as a spammer would have to scrape all your data to reconstruct the same thing.

Then, run the model over your data and start playing whack-a-mole (and refining the model).