
Hmm, maybe I've been overcomplicating the problem in my mind. You've given me some good ideas.

Bigrams, as your own example shows, are too simple: in both examples, "car" will get related to "a", instead of "getting" and "driving".
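To make that concrete, here is a minimal sketch of plain adjacent bigrams. The two sentences are my own stand-ins (an assumption, not the exact examples from upthread), and `adjacent_bigrams` is just a hypothetical helper name:

```python
def adjacent_bigrams(sentence):
    """Return pairs of neighbouring tokens (plain surface bigrams)."""
    tokens = sentence.lower().split()
    return list(zip(tokens, tokens[1:]))

# Hypothetical stand-in sentences, not taken from the thread.
sentences = ["I am getting a car", "I am driving a car"]

for s in sentences:
    # Keep only the bigrams that involve "car".
    car_bigrams = [bg for bg in adjacent_bigrams(s) if "car" in bg]
    print(s, "->", car_bigrams)
```

In both sentences the only bigram containing "car" is ("a", "car"), so the verb never gets paired with it.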

Maybe I could run all the sentences through a dependency parser, build dependency bigrams from the result, and score each sentence by bigram frequency/inverse frequency and by sentence length (shorter sentences are better).
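A rough sketch of that idea, under loud assumptions: the (head, dependent) pairs below are hand-written stand-ins for what a dependency parser might output, the sentences are made up, and the scoring formula (mean corpus frequency of a sentence's dependency bigrams, divided by its token count) is only one possible reading of "frequency and length". `heads_of` and `score_sentences` are hypothetical names.

```python
from collections import Counter

# Hypothetical hand-written parses: sentence -> list of (head, dependent) pairs.
# A real pipeline would get these from a dependency parser.
parsed = {
    "I am getting a car": [
        ("getting", "i"), ("getting", "am"), ("getting", "car"), ("car", "a"),
    ],
    "I am driving a car": [
        ("driving", "i"), ("driving", "am"), ("driving", "car"), ("car", "a"),
    ],
    "I am driving a car to the old airport": [
        ("driving", "i"), ("driving", "am"), ("driving", "car"), ("car", "a"),
        ("driving", "airport"), ("airport", "to"),
        ("airport", "the"), ("airport", "old"),
    ],
}

def heads_of(word, edges):
    """Return the heads that govern `word` in one parsed sentence."""
    return [head for head, dep in edges if dep == word]

def score_sentences(parsed):
    """Rank sentences: common dependency bigrams score up, length scores down."""
    # Count how often each dependency bigram occurs across the corpus.
    freq = Counter(edge for edges in parsed.values() for edge in edges)
    scores = {}
    for sentence, edges in parsed.items():
        mean_freq = sum(freq[e] for e in edges) / len(edges)
        length = len(sentence.split())
        scores[sentence] = mean_freq / length  # assumed formula, not canonical
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

for sentence, edges in parsed.items():
    print(sentence, "-> heads of 'car':", heads_of("car", edges))

for sentence, score in score_sentences(parsed):
    print(round(score, 3), sentence)
```

With dependency bigrams, "car" is attached to "getting" or "driving" rather than "a", and on this toy data the long sentence ranks last.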



