One common query was to check the FTP server on our side for access from a customer server, so that we could help troubleshoot why they couldn't connect. It turned out that in many cases less technical customers had mistyped the hostname or left out the .com at the end.

# Grab the customer's 5 most recent hits within the last 10,000 lines of the FTP log
tail -n 10000 /var/log/ftp.log | grep -i "$CUSTOMER_IP" | tail -n 5




If that's your typical use case, I can see why that's as fast as (or faster than) Elasticsearch, but it's not clear why you'd have gone with Elasticsearch in the first place.

When I last used Elasticsearch, we indexed ~10TB of log data a day and kept 14+ days of it, and a typical query was looking for log records that matched a unique session ID over the past 10 days or so, which is not an easy task for grep. But we didn't pay $50K/month for that cluster; it was closer to $12K/month.
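For a sense of what that lookup looks like, it's roughly a single filtered search in ES - the index pattern and field names below are made up for illustration:

# Hypothetical ES query: all records for one session ID from the last 10 days.
# Assumes daily indices matching logs-* and an indexed keyword field "session_id".
curl -s -X GET "localhost:9200/logs-*/_search" -H 'Content-Type: application/json' -d '
{
  "query": {
    "bool": {
      "filter": [
        { "term":  { "session_id": "PASTE-SESSION-ID-HERE" } },
        { "range": { "@timestamp": { "gte": "now-10d" } } }
      ]
    }
  },
  "size": 100
}'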

Before we used ES, someone had written a parallel grep that would grep multiple files at once and run multiple greps at once through chunks of a file, but it could still take 30 minutes or longer to churn through the logs on a 32-core machine; ES took that query time down to 100 milliseconds. The ES cluster easily paid for itself in employee time savings.
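(The parallel grep was roughly in the spirit of this sketch, fanning one grep per log file across the cores - the real script also chunked individual files, which this doesn't. Paths and the variable are placeholders:)

# Sketch only: run up to 32 greps in parallel, one per log file
find /var/log/app -name '*.log' -print0 \
  | xargs -0 -P 32 -n 1 grep -H "$SESSION_ID"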


We used to use the ELK (Elasticsearch, Logstash, Kibana) stack hosted locally. So when we outgrew that, we just tried to find a bigger ELK provider.

We did evaluate the elastic.co Cloud - which I concur would have been cheaper than Amazon - but since their demonstration cluster failed to boot correctly and as a result suffered a data loss during their sales presentation, my boss didn't feel comfortable going with them.

That's how I was left with the decision to either scale up our old ELK stack on AWS or go with something proprietary.


If you always want to search by one thing, you can manually index by that thing. In your case, arrange your log files by the first 6 hex characters of the user ID (/var/log/xxx/xxx/date.log), and grep will then typically only have a few megabytes to scan.
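A rough sketch of that layout in shell (paths, field positions and variable names here are just illustrative):

# Shard a log into per-prefix directories keyed by the first 6 hex chars of the
# user ID, assuming the ID is the first field on each line and the
# /var/log/byuser tree already exists (e.g. pre-created with mkdir -p).
day=$(date +%F)
awk -v day="$day" '{
  p = substr($1, 1, 6)
  f = "/var/log/byuser/" substr(p, 1, 3) "/" substr(p, 4, 3) "/" day ".log"
  print >> f
  close(f)   # keep the number of open files bounded
}' /var/log/app/current.log

# A lookup for one user then only touches that user's small subtree:
grep "$USER_ID" "/var/log/byuser/${USER_ID:0:3}/${USER_ID:3:3}/"*.log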

If you need real indexes, or just want something industry-standard and maintainable rather than 'some guy's grep script', then Elasticsearch is probably the way to go.


That was just one thing we needed to search for (but by far the most common). The guy who wrote the parallel grep did try creating indexes of common fields to speed up searches, but quickly realized that he was reinventing the wheel (poorly).

Plus we made good use of Kibana dashboards for the service.


> it's not clear why you'd have gone with Elasticsearch in the first place.

Having been at a company with a similar problem (Hadoop/Spark, not ES), the issue is: you have a bunch of programmers hired to handle the 10TB of data a day. Then you need some work done processing data that is, e.g., 100GB once a week. Rather than evaluate the best way to process that data, the thought process basically goes "it doesn't fit in my laptop's memory, so we'll use the cluster."


How did they connect at all with the wrong hostname?


They didn't; that's why they called us to complain that our FTP server was offline...

Some of them also had one of those DNS-grabbing ISPs, so by mistyping the hostname they would accidentally connect to the wrong IP.

EDIT: I think I get what you meant now. When a customer says "my password is not working", the first thing we do is check in the logs whether the customer actually did connect to the correct server. That's like the number one issue: correct username and password, but wrong hostname.



