
I'm by no means heavily experienced in this, but my current employer (e-commerce) has a big demand for it.

First, there's the Docker + Kubernetes architecture that ES lends itself to really well. Then (depending on your use case) there are concerns like hot/warm architecture, node types, and ETL/indexing processes. ES recently moved over to OpenJDK, so there are a couple of intricacies there (e.g. JVM heap size).
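
To make the node-type and heap points concrete, here's a minimal sketch of the classic hot/warm setup. This is configuration, not a drop-in recipe: the box_type attribute name is just the convention from Elastic's hot/warm write-ups, and the heap sizes are placeholders.

    # jvm.options -- min and max heap set equal, and kept under
    # ~32GB so the JVM can keep using compressed object pointers
    -Xms16g
    -Xmx16g

    # elasticsearch.yml on a "hot" data node (6.x/7.x-style roles)
    node.master: false
    node.data: true
    node.ingest: false
    node.attr.box_type: hot

    # New indices can then be pinned to hot nodes with an index
    # setting like: index.routing.allocation.require.box_type: hot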

Then, there's document/query structure. In no particular order:

- Do you have any parent/child relationships? (See the join-field sketch after this list.)

- Do you have stop-word lists developed? (The analyzer sketch after this list shows where they plug in.)

- Can search templates help your queries? (Also sketched after the list.)

- How will you interface with ES? It has REST APIs, but it's recommended not to expose ES directly to your applications.

- There are some advanced querying possibilities, like customizing tokenizers and normalizers, plus a bit of internationalization (the analyzer sketch below touches on these too).

- Oh, we haven't even discussed security yet.

- Also, ES isn't meant to be a primary data store. It's more of a "cache", though not quite like Redis. So you'll usually need a DB elsewhere.
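
To make the parent/child bullet concrete: since ES 6 this is modeled with a single join field per index, and children have to be routed to the parent's shard. A minimal sketch with the Python client (the index, field, and relation names are all made up):

    from elasticsearch import Elasticsearch

    es = Elasticsearch(["localhost:9200"])

    # One "join" field declares the parent/child relation.
    es.indices.create(index="catalog", body={
        "mappings": {"properties": {
            "relation": {"type": "join",
                         "relations": {"product": "review"}}
        }}
    })

    # Parent document.
    es.index(index="catalog", id="1",
             body={"name": "keyboard", "relation": "product"})

    # Children must live on the parent's shard, hence routing=parent id.
    es.index(index="catalog", id="2", routing="1", body={
        "text": "clacky but nice",
        "relation": {"name": "review", "parent": "1"},
    })

    # Find parents by matching their children.
    res = es.search(index="catalog", body={
        "query": {"has_child": {
            "type": "review",
            "query": {"match": {"text": "clacky"}}}}
    })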
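
For the stop-word and tokenizer/normalizer bullets: all of that lives in the index analysis settings. A hedged sketch (names invented; real stop lists are per-language and much longer):

    # Custom analyzer with a stop filter, plus a normalizer for
    # keyword fields (normalizers take filters but no tokenizer).
    es.indices.create(index="products", body={
        "settings": {"analysis": {
            "filter": {
                "my_stop": {"type": "stop",
                            "stopwords": ["the", "a", "an"]}
            },
            "analyzer": {
                "my_text": {"type": "custom",
                            "tokenizer": "standard",
                            "filter": ["lowercase", "my_stop"]}
            },
            "normalizer": {
                "my_norm": {"type": "custom",
                            "filter": ["lowercase", "asciifolding"]}
            }
        }},
        "mappings": {"properties": {
            "title": {"type": "text", "analyzer": "my_text"},
            "sku": {"type": "keyword", "normalizer": "my_norm"}
        }}
    })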
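
And on search templates: you store a Mustache-parameterized query server-side, so applications only send parameters instead of raw query DSL, which also helps with the "don't expose ES directly" point. Again a sketch with invented names:

    # Store the template once...
    es.put_script(id="title_search", body={
        "script": {"lang": "mustache",
                   "source": {"query": {"match": {"title": "{{q}}"}}}}
    })

    # ...then call it by id, passing only the params.
    res = es.search_template(index="products",
                             body={"id": "title_search",
                                   "params": {"q": "keyboard"}})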

All of this changes depending on whether you're using it for SIEM, e-commerce, AI/ML, etc. Also, Elastic now provides its own SIEM solution, a pre-built search solution (App Search + Search UI), and built-in security features. Check out the new ES 7.2 update; it's kinda nuts.




> ES recently moved over to OpenJDK, so there are a couple of intricacies there (e.g. JVM heap size)

My current employer uses ES - we're on 6.8, planning to move to 7 in a few months. Judging by the other replies here, I'd say we have a reasonably large cluster (150+ i3.2xlarge instances, billions of documents), so tuning the cluster is very relevant to us. Could you expand on how things have changed with the move to OpenJDK?

I've seen some claims online that, contrary to what Elastic recommends in their docs, a few machines with huge heaps (100+ GB) are the way to go, rather than many machines with 20GB heaps.


>I've seen some claims online that, contrary to what Elastic recommends in their docs, a few machines with huge heaps (100+ GB) are the way to go, rather than many machines with 20GB heaps.

Usually the recommendation is less than 32GB; this link has some more discussion about it: https://discuss.elastic.co/t/es-lucene-32gb-heap-myth-or-fac...
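
For context, the usual reasoning behind the <32GB figure is compressed ordinary object pointers, which the JVM can only use below roughly that threshold. The nodes info API reports whether each node still has them; a quick sketch with the Python client (adjust host/auth for your cluster):

    from elasticsearch import Elasticsearch

    es = Elasticsearch(["localhost:9200"])

    # Per-node JVM info includes the max heap and whether compressed
    # oops are in effect; heaps over ~32GB silently lose them.
    info = es.nodes.info(metric="jvm")
    for node_id, node in info["nodes"].items():
        jvm = node["jvm"]
        print(node_id,
              jvm["mem"]["heap_max_in_bytes"],
              jvm.get("using_compressed_ordinary_object_pointers"))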

It seems whether it's better or worse depends on your data set, but I would love to see tests of different kinds of workloads with larger or smaller heaps.




