> p50/p99 retrieval times at realistic loads or it didn't happen.

Therein lies the problem: how do you generate genuinely realistic load for a search engine without a large number of people actually using it to search? Simply hitting it with random search terms isn't realistic.

Some users will be on slow connections, queries for something specific might spike in only a certain region (an earthquake, for example), and so on.

If your terms are too random, it'll perform worse than it should (results not in the cache), and if not random enough it will perform better than it should.


One practical solution is to replay historical search logs. Just because "random" is a bad answer doesn't mean people don't try to build reasonable reproductions of production load to replay and benchmark against. Caching is also a big factor.
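
Rough sketch of the kind of replay harness I mean (Python; the log format, endpoint URL, and concurrency level are all placeholder assumptions, not anything standard):

    # replay_log.py - replay historical queries against a search endpoint
    # and report p50/p99 latency. Log format, endpoint, and concurrency
    # are assumptions; adapt them to your own setup.
    import math
    import time
    import urllib.parse
    import urllib.request
    from concurrent.futures import ThreadPoolExecutor

    SEARCH_URL = "http://localhost:8080/search"  # placeholder endpoint
    LOG_FILE = "queries.log"                     # one query string per line
    CONCURRENCY = 8                              # rough stand-in for parallel users

    def run_query(q):
        """Issue one search and return its wall-clock latency in milliseconds."""
        url = SEARCH_URL + "?" + urllib.parse.urlencode({"q": q})
        start = time.perf_counter()
        with urllib.request.urlopen(url, timeout=10) as resp:
            resp.read()  # drain the body so transfer time is included
        return (time.perf_counter() - start) * 1000.0

    def percentile(samples, p):
        """Nearest-rank percentile over a list of latency samples."""
        ordered = sorted(samples)
        k = min(len(ordered) - 1, max(0, math.ceil(p / 100.0 * len(ordered)) - 1))
        return ordered[k]

    def main():
        with open(LOG_FILE) as f:
            queries = [line.strip() for line in f if line.strip()]
        # Submit queries in log order so repeated terms exercise the cache
        # roughly the way they did in production.
        with ThreadPoolExecutor(max_workers=CONCURRENCY) as pool:
            latencies = list(pool.map(run_query, queries))
        print(f"queries: {len(latencies)}")
        print(f"p50: {percentile(latencies, 50):.1f} ms")
        print(f"p99: {percentile(latencies, 99):.1f} ms")

    if __name__ == "__main__":
        main()

Replaying in log order (rather than shuffled) matters for exactly the cache-hit-rate reason mentioned above.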


I don't know if this is true for Elasticsearch, but at least with Solr, when you update the index the default is to re-run some of the queries from the old searcher's caches to warm up the new one.
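
Conceptually the warming step looks something like this toy sketch (Python, purely illustrative; the Searcher class, cache size, and method names are made up and this is not Solr's actual code):

    # Toy illustration of cache autowarming: when a new searcher replaces
    # an old one, re-run the most recently used cached queries against the
    # new index so the first real users don't all hit a cold cache.
    from collections import OrderedDict

    class Searcher:
        def __init__(self, index_version, autowarm_count=16):
            self.index_version = index_version
            self.autowarm_count = autowarm_count
            self.query_cache = OrderedDict()  # query -> results, in LRU order

        def search(self, query):
            if query in self.query_cache:
                self.query_cache.move_to_end(query)  # mark as recently used
                return self.query_cache[query]
            results = self._execute(query)  # the expensive part
            self.query_cache[query] = results
            return results

        def _execute(self, query):
            # Stand-in for actually running the query against the index.
            return f"results for {query!r} on index v{self.index_version}"

        def warm_from(self, old_searcher):
            # Re-run the N most recently used queries from the old searcher's
            # cache so they are already cached when this searcher goes live.
            recent = list(old_searcher.query_cache)[-self.autowarm_count:]
            for query in recent:
                self.search(query)

    # After an index update: build the new searcher, warm it, then swap it in.
    old = Searcher(index_version=1)
    old.search("p99 latency")
    old.search("earthquake news")

    new = Searcher(index_version=2)
    new.warm_from(old)       # runs the cached queries against the new index
    current_searcher = new   # only now does it start serving traffic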