Hmm, maybe I've been overcomplicating the problem in my mind. You've given me some good ideas.

Plain bigrams, as your own example shows, are too simple: in both sentences, "car" gets paired with "a" rather than with "getting" or "driving".

Maybe I could run a dependency parser over all the sentences, build dependency bigrams from the parses, and score each sentence by frequency/inverse frequency and by length (shorter sentences are better).
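Something like the following rough sketch is what I have in mind. The parses are hand-written (head, relation, dependent) triples standing in for real parser output, and the names `PARSED`, `SKIP_RELATIONS`, `dependency_bigrams` and `score_sentences` are all made up for illustration. The weighting is only one possible reading of "frequency/inverse frequency": a smoothed inverse document frequency per bigram, summed and divided by sentence length.

```python
import math
from collections import Counter

# Hand-written (head, relation, dependent) triples: an assumption standing in
# for the output of a real dependency parser.
PARSED = [
    (["I", "am", "getting", "a", "car"],
     [("getting", "nsubj", "I"), ("getting", "aux", "am"),
      ("getting", "obj", "car"), ("car", "det", "a")]),
    (["I", "am", "driving", "a", "car", "to", "work", "today"],
     [("driving", "nsubj", "I"), ("driving", "aux", "am"),
      ("driving", "obj", "car"), ("car", "det", "a"),
      ("driving", "obl", "work"), ("work", "case", "to"),
      ("driving", "obl", "today")]),
]

# Relations treated as uninformative here (a hypothetical choice).
SKIP_RELATIONS = {"det", "aux", "case"}


def dependency_bigrams(triples):
    """Return (head, dependent) pairs, dropping function-word relations."""
    return [(head, dep) for head, rel, dep in triples if rel not in SKIP_RELATIONS]


def score_sentences(parsed):
    """Sum a smoothed IDF weight over each sentence's bigrams, divided by its length."""
    bigrams = [dependency_bigrams(triples) for _, triples in parsed]
    # Document frequency: in how many sentences does each bigram occur?
    doc_freq = Counter(b for bs in bigrams for b in set(bs))
    n = len(parsed)
    scores = []
    for (tokens, _), bs in zip(parsed, bigrams):
        idf_sum = sum(math.log((1 + n) / (1 + doc_freq[b])) + 1.0 for b in bs)
        # Dividing by token count penalizes long sentences.
        scores.append(idf_sum / len(tokens))
    return scores


if __name__ == "__main__":
    for (tokens, triples), score in zip(PARSED, score_sentences(PARSED)):
        print(" ".join(tokens), dependency_bigrams(triples), round(score, 3))
```

With this, "car" ends up paired with "getting" and "driving", and the ("car", "a") pair is dropped. Whether rare or frequent bigrams should count for more, and how strongly length should be penalized, would still need tuning on real data.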