I used a similar technology stack for categorizing bookmarks (boilerpipe + gensim lda). Interesting that we wound up choosing the same tools.
In the interest of reporting on failed experiments, I also tried a k-means analysis written in php. It was slow and worthless, I wouldn't recommend anyone else going down that road.
In terms of next steps, I've been trying to use the open source HLDA software from David M. Blei's group [0] to do hierarchical clustering to avoid having to decide on the number of topic parameters. Haven't gotten it to compile on my machine yet though.
In the interest of reporting on failed experiments, I also tried a k-means analysis written in php. It was slow and worthless, I wouldn't recommend anyone else going down that road.
In terms of next steps, I've been trying to use the open source HLDA software from David M. Blei's group [0] to do hierarchical clustering to avoid having to decide on the number of topic parameters. Haven't gotten it to compile on my machine yet though.
[0] http://www.cs.princeton.edu/~blei/topicmodeling.html