Hacker News
pizza
on Sept 2, 2023
on:
Wikipedia search-by-vibes through millions of page...
FWIW, sentence-transformers truncates the input to the model's max_seq_length (at most 256 tokens by default for models such as all-MiniLM-L6-v2); you might just be embedding the first paragraph or so.
lsb
on Sept 2, 2023
I average the embeddings of every 512-byte chunk of the page text.
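A rough sketch of what chunk-and-average could look like, for readers following along. This is not lsb's actual code: `chunk_bytes`, `mean_embedding`, and `toy_embed` are hypothetical names, and `toy_embed` is only a stand-in for a real embedding model. One assumption worth noting is that a character straddling a 512-byte boundary is simply dropped.

```python
from typing import Callable, List

Vector = List[float]


def chunk_bytes(text: str, size: int = 512) -> List[str]:
    """Split text into pieces of at most `size` UTF-8 bytes."""
    data = text.encode("utf-8")
    # errors="ignore" drops a multi-byte character that straddles a boundary.
    return [
        data[i:i + size].decode("utf-8", errors="ignore")
        for i in range(0, len(data), size)
    ]


def mean_embedding(
    text: str, embed: Callable[[str], Vector], size: int = 512
) -> Vector:
    """Embed each chunk and return the element-wise mean of the vectors."""
    vectors = [embed(chunk) for chunk in chunk_bytes(text, size)]
    if not vectors:
        raise ValueError("cannot embed empty text")
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]


def toy_embed(chunk: str) -> Vector:
    """Hypothetical stand-in for a real model: two crude text statistics."""
    return [float(len(chunk)), float(chunk.count(" "))]


# One vector per page, regardless of how many chunks the page has.
page_vector = mean_embedding("a" * 1000, toy_embed)
```

In practice `embed` would wrap whatever model is in use; the point is only that each page ends up as a single vector, however long it is.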
theolivenbaum
on Sept 4, 2023
That might actually be making things worse for longer articles. It would probably be better to index the chunks separately and aggregate back to the article level after the query.
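To make the suggestion concrete, here is a minimal sketch of per-chunk indexing with article-level aggregation after the query. Everything here is an assumption for illustration: the names `cosine` and `search_articles`, the toy two-dimensional vectors, and the choice of "best chunk score per article" as the aggregation rule (a sum or mean of the top few chunks would be equally plausible).

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_articles(
    query: Vector,
    chunk_index: List[Tuple[str, Vector]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Score every chunk, then keep each article's best chunk score."""
    best: Dict[str, float] = {}
    for article_id, vec in chunk_index:
        score = cosine(query, vec)
        if score > best.get(article_id, float("-inf")):
            best[article_id] = score
    # Rank articles by their best-matching chunk, highest first.
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


# Hypothetical toy index: one (article_id, chunk_vector) entry per chunk.
chunk_index = [
    ("Cats", [1.0, 0.0]),
    ("Cats", [0.0, 1.0]),
    ("Dogs", [0.7, 0.7]),
]
results = search_articles([0.0, 1.0], chunk_index)
```

In this toy data, the second "Cats" chunk matches the query exactly, so "Cats" ranks first. Averaging the two "Cats" chunks into [0.5, 0.5] first would score about 0.71 against the same query, the same as "Dogs", which is the kind of blurring that averaging can cause on long articles.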
pizza
on Sept 2, 2023
Ah ok, that makes sense