I think the author addresses your point one in the article:

> SQL is a perfect language for analytics. I love SQL query language and SQL schema is a perfect example of boring tech that I recommend to use as a source of truth for all the data in 99% of projects: if the project code is not perfect, you can improve it relatively easily if your database state is strongly structured. If your database state is a huge JSON blob (NoSQL) and no-one can fully grasp the structure of this data, this refactoring usually gets much more problematic.

> I saw this happening, especially in older projects with MongoDB, where every new analytics report and every new refactoring involving data migration is a big pain.

They're arguing that unstructured or variably structured data is actually a development burden, and that the flexibility it provides actually makes log analysis harder.

It seems that the "json" blobs are a symptom of the problem, not the cause of it.




I disagree with the author on that.

Yes, SQL is nicer for structured queries (“KQL” in Kibana is sort of a baby step toward querying data stored in Elastic).

But in Kibana, I can just type in (for example) a filename, and it will return any result row where that filename is part of any column of data.

Also, if I need more structured results (for example, counts of HTTP responses from an API grouped per hour), I can pretty easily build a visualization in Kibana.
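
That kind of breakdown is just an aggregation underneath, which is roughly what a Kibana visualization generates for you. A minimal sketch of the equivalent Elasticsearch query DSL (the logs-* index pattern and the @timestamp/status field names are assumptions, and calendar_interval needs a reasonably recent Elasticsearch):

    GET logs-*/_search
    {
      "size": 0,
      "query": { "range": { "@timestamp": { "gte": "now-24h" } } },
      "aggs": {
        "per_hour": {
          "date_histogram": { "field": "@timestamp", "calendar_interval": "hour" },
          "aggs": {
            "by_status": { "terms": { "field": "status" } }
          }
        }
      }
    }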

So yes, for 5% of use cases regarding exposing logging data, an SQL database of structured log events is preferred or necessary. For the other 95%, the convenience of just dumping files into Elastic makes it totally worth it.


Agreed here. More and more data is semi-structured and benefits from ES (or Mongo) making it easily exploitable. It's a big part of why Logstash and Elastic came to be.

One of the most beautiful use cases I've ever seen for Elasticsearch was a custom nginx access log format in JSON (with nearly every possible field you could want), logged directly over the network (syslog from nginx over UDP) to a fluentd server set up to parse that JSON plus host and timestamp details before bulk-inserting into Elastic.

You could spin up any nginx container or VM with those two config lines (sketched below) and every request would flow over the network (no disk writes needed!) and get logged centrally with the hostname automatically tagged. It was doing 40k req/s on a single fluentd instance when I saw it last, and you could query/filter every HTTP request from the last day (3+bn records...) in real time.
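
The two config lines are roughly an nginx log_format with escape=json plus an access_log pointed at syslog over UDP. A sketch from memory; the server address, port, and field list here are placeholders, not the exact originals:

    # one JSON object per request; escape=json handles quoting inside field values
    log_format json_log escape=json '{'
        '"time":"$time_iso8601",'
        '"remote_addr":"$remote_addr",'
        '"request":"$request",'
        '"status":"$status",'
        '"body_bytes_sent":"$body_bytes_sent",'
        '"request_time":"$request_time",'
        '"http_user_agent":"$http_user_agent"'
    '}';

    # ship each log line to fluentd over UDP syslog instead of writing to disk
    access_log syslog:server=fluentd.internal:5140 json_log;

On the fluentd side that pairs with a syslog source and a JSON parser in front of the elasticsearch output plugin.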

Reach out to Datadog and ask how much they would charge for 100bn log requests per month.


That argument would apply to production backend databases, but I don't see how it really applies to logs. It's like they just copied and pasted a generic argument about structured data without taking the context into account.

Logs tend to be rarely read but often written. They also age very quickly, and old logs are very rarely read. So putting effort into unifying the schemas on write seems very wasteful versus doing so on read. Most queries are also text searches rather than structured requests, so the chance of missing something on read due to bad unification is very low.



