
FWIW sentence-transformers truncates the input to the model's max_seq_length (256 tokens for some common models such as all-MiniLM-L6-v2), so you might just be embedding the first paragraph or so.



I average the embeddings of each 512-byte chunk of the page text.


That might actually be making things worse for longer articles. It would probably be better to index the chunks separately and aggregate back to article level after the query.
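Rough sketch of what I mean, in case it helps. This assumes you already have one embedding per chunk stored next to its article id; `rank_articles`, `chunk_index`, and the toy 2-d vectors are all made up for illustration, and taking the max chunk score per article is just one possible aggregation (mean or top-n sum would also work).

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_articles(query_vec, chunk_index, top_k=3):
    """Score every chunk, then keep the best chunk score per article.

    chunk_index is a list of (article_id, chunk_vector) pairs
    (a hypothetical layout, not any library's format).
    """
    best = defaultdict(lambda: float("-inf"))
    for article_id, chunk_vec in chunk_index:
        score = cosine(query_vec, chunk_vec)
        if score > best[article_id]:
            best[article_id] = score
    # Highest-scoring articles first.
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


# Toy data: article "a" has two very different chunks, "b" has one mixed chunk.
chunk_index = [
    ("a", [1.0, 0.0]),
    ("a", [0.0, 1.0]),
    ("b", [0.7, 0.7]),
]
ranked = rank_articles([0.0, 1.0], chunk_index)
print(ranked)
```

In the toy data, averaging the two chunks of "a" would give a vector pointing the same direction as "b", so the query could not tell them apart. Scoring chunks separately lets the one relevant chunk in "a" win.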


Ah ok, that makes sense



