Hacker News

Slightly off-topic, but I thought this would be a good place to ask.

Are there any word embedding tools which take a Lucene/Solr/ES index as input and output a synonyms file which can be used to improve search recall?
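For reference, the last step of such a tool is small once term vectors exist. This is a minimal sketch, assuming the term vectors were already trained somehow (for example by exporting the indexed text and running word2vec on it). The vectors, the threshold, and `synonym_lines` are hypothetical. It writes lines in the Solr `synonyms.txt` style, where a comma-separated line declares its terms equivalent:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def synonym_lines(vectors, threshold=0.9, max_syns=3):
    """Group each term with its nearest neighbours above `threshold` (hypothetical helper)."""
    lines, seen = [], set()
    for term, vec in sorted(vectors.items()):
        scored = sorted(
            ((cosine(vec, other_vec), other)
             for other, other_vec in vectors.items() if other != term),
            reverse=True,
        )
        syns = [other for score, other in scored[:max_syns] if score >= threshold]
        group = frozenset([term] + syns)
        # Skip empty groups and groups already emitted from another member's side.
        if syns and group not in seen:
            seen.add(group)
            lines.append(", ".join([term] + syns))
    return lines

# Made-up 3-dimensional term vectors, for illustration only.
vectors = {
    "laptop": [0.9, 0.1, 0.0],
    "notebook": [0.85, 0.15, 0.05],
    "banana": [0.0, 0.2, 0.95],
}
for line in synonym_lines(vectors):
    print(line)  # laptop, notebook
```

The threshold matters a lot in practice: embedding neighbours are often related terms rather than true synonyms, so a loose cutoff can hurt precision while improving recall.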




There are a few projects that use ES/Lucene as a backend or datastore once the feature engineering is done, but I haven't seen models that operate on the native indexes directly. Maybe the format is too different from a one-hot encoding (even after turning off stemming, stopwords, and other steps that lose information).

http://lucene.472066.n3.nabble.com/Where-Search-Meets-Machin...

https://news.ycombinator.com/item?id=11876542
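To make the one-hot comparison concrete: an inverted index stores, per term, the list of documents it appears in, and each such postings list can be read as a binary term-by-document vector. This sketch does not read Lucene's on-disk format; it just builds equivalent postings from a few made-up documents, with no stemming or stopword removal, and the helper names are hypothetical:

```python
from collections import defaultdict

docs = [
    "cheap laptop deals",
    "laptop and notebook reviews",
    "notebook deals this week",
]

def postings(docs):
    """Map each raw token to the set of doc ids containing it (no stemming, no stopwords)."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for token in text.lower().split():
            index[token].add(doc_id)
    return index

def one_hot_rows(index, num_docs):
    """Turn each postings list into a binary vector over all documents."""
    return {term: [1 if d in ids else 0 for d in range(num_docs)]
            for term, ids in index.items()}

rows = one_hot_rows(postings(docs), len(docs))
print(rows["laptop"])    # [1, 1, 0]
print(rows["notebook"])  # [0, 1, 1]
```

These sparse, document-sized vectors are what an index gives you directly; an embedding model would still have to compress them (or the raw text) into dense vectors.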


Not quite about creating synonyms, but in the same area there is Semantic Vectors: https://github.com/semanticvectors/semanticvectors

It processes a Lucene index and creates an embedded representation of it. You can then search over that representation for "semantic" matches.

When I last checked, about a year ago, the embedded collection of documents was kept in memory and search was implemented as a linear scan, so I suspect it can be slow on very large document collections.
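Roughly why a linear scan gets slow: every query is compared against every stored vector, so the cost grows as O(N · d) for N documents of dimension d. A minimal brute-force sketch of that kind of search, not Semantic Vectors' actual code; the vectors and function names are hypothetical:

```python
import heapq
import math

def normalize(v):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n else list(v)

def linear_scan(query, doc_vectors, k=2):
    """Score the query against every stored vector and keep the top-k doc ids."""
    q = normalize(query)
    scores = (
        (sum(a * b for a, b in zip(q, vec)), doc_id)
        for doc_id, vec in doc_vectors.items()
    )
    return [doc_id for _, doc_id in heapq.nlargest(k, scores)]

# Hypothetical in-memory document vectors, normalized once up front.
doc_vectors = {
    "d1": normalize([1.0, 0.0]),
    "d2": normalize([0.7, 0.7]),
    "d3": normalize([0.0, 1.0]),
}
print(linear_scan([1.0, 0.1], doc_vectors))  # ['d1', 'd2']
```

The heap keeps memory for results at O(k), but the time still touches all N vectors; an approximate nearest-neighbour index is the usual way to avoid scanning everything.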



