
I no longer work there, but Lucidworks has had embedding training as a first-class feature in Fusion since January 2020 (I know because I wrapped up adding it just as COVID became a thing). We definitely saw that even with just slightly out-of-band use of language - e.g. in e-commerce, things like "RD TSHRT XS", embedding search with open (and closed) models would fall below bog-standard* BM25 lexical search. Once you trained a model, performance would kick up above lexical search…and if you combined lexical _and_ vector search, things were great.
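(The comment doesn't say how Fusion combined the two result sets, but one common way to mix a lexical ranking with a vector ranking is reciprocal rank fusion. A minimal sketch, with hypothetical document IDs:)

```python
# Illustrative sketch: fusing a BM25 ranking and an embedding ranking
# via reciprocal rank fusion (RRF). The doc IDs below are made up.

def rrf(rankings, k=60):
    """Fuse several ranked lists of doc IDs into one ranking.

    Each document's fused score is the sum over lists of 1 / (k + rank),
    where rank is its 1-based position in that list. Documents that rank
    well in BOTH lists float to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# BM25 nails the terse SKU-style query; the embedding model catches synonyms.
lexical = ["red-tee-xs", "red-tee-s", "blue-tee-xs"]        # BM25 ranking
vector = ["crimson-shirt-xs", "red-tee-xs", "red-hoodie"]   # embedding ranking

fused = rrf([lexical, vector])
print(fused[0])  # -> "red-tee-xs", ranked by both lists
```

The constant `k` damps the influence of any single list's top hits; 60 is a conventional default, not anything specific to Fusion.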

Also, a member on our team developed an amazing RNN-based model that still today beats the pants off most embedding models when it comes to speed, and is no slouch on CPU either…

(* I'm being harsh on BM25 - it is a baseline that people often forget in vector search, but it can be a tough one to beat at times)




Heh. A lot of what search people have known for a while is suddenly being re-learned by the population at large, in the context of RAG, etc :)


The thing with tech is, if you're too early, it's not like you eventually get discovered and adopted.

When the time is finally right, people just "invent" what you made all over again.


Totally. And this has even happened in search. Open source search engines like Elasticsearch did this, Google etc did this in the early Web days, and so on :)


Sorry, what is it that people in search _have_ known?

I know nothing about search, but a bit about ML, so I'm curious


That ranking is a lot more complicated than cosine similarity on embeddings
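(To make that concrete: in a production ranker, cosine similarity is usually just one feature among several. A minimal sketch with hypothetical weights and signals, not any particular engine's formula:)

```python
import math

def cosine(a, b):
    """Plain cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def rank_score(query_vec, doc_vec, doc):
    # Hypothetical linear blend: semantic similarity is only one signal
    # alongside lexical match, popularity, freshness, etc.
    return (0.5 * cosine(query_vec, doc_vec)
            + 0.3 * doc["bm25"]          # normalized lexical score
            + 0.2 * doc["popularity"])   # normalized click/sales signal

score = rank_score([1.0, 0.0], [1.0, 0.0], {"bm25": 0.8, "popularity": 0.5})
```

Real systems typically learn such weights (learning to rank) rather than hand-tuning them, but the point stands: the embedding distance alone is rarely the whole ranking function.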


What’s the model?



