I think the author addresses your point one in the article:

> SQL is a perfect language for analytics. I love SQL query language and SQL schema is a perfect example of boring tech that I recommend to use as a source of truth for all the data in 99% of projects: if the project code is not perfect, you can improve it relatively easily if your database state is strongly structured. If your database state is a huge JSON blob (NoSQL) and no-one can fully grasp the structure of this data, this refactoring usually gets much more problematic.

> I saw this happening, especially in older projects with MongoDB, where every new analytics report and every new refactoring involving data migration is a big pain.

They're arguing that unstructured or variably structured data is actually a development burden, and that the flexibility it provides actually makes log analysis harder.

It seems that the "json" blobs are a symptom of the problem, not the cause of it.




I disagree with the author on that.

Yes, SQL is nicer for structured queries (“KQL” in Kibana is sort of a baby step toward querying data stored in Elastic).

But in Kibana, I can just type in (for example) a filename, and it will return any result row where that filename is part of any column of data.

Also, if I need more structured results (for example, counts of HTTP responses from an API grouped per hour), I can pretty easily build a visualization in Kibana.
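
That kind of breakdown is just an aggregation underneath, which is roughly what a Kibana visualization generates for you. A minimal sketch of the equivalent Elasticsearch query DSL (the logs-* index pattern and the @timestamp/status field names are assumptions, and calendar_interval needs a reasonably recent Elasticsearch):

    GET logs-*/_search
    {
      "size": 0,
      "query": { "range": { "@timestamp": { "gte": "now-24h" } } },
      "aggs": {
        "per_hour": {
          "date_histogram": { "field": "@timestamp", "calendar_interval": "hour" },
          "aggs": {
            "by_status": { "terms": { "field": "status" } }
          }
        }
      }
    }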

So yes, for 5% of use cases regarding exposing logging data, an SQL database of structured log events is preferred or necessary. For the other 95%, the convenience of just dumping files into Elastic makes it totally worth it.


Agreed here. More and more data is semi-structured and benefits from ES (or Mongo) making it easily exploitable. It's a big part of why Logstash and Elastic came to be.

One of the most beautiful use cases I've ever seen for Elasticsearch was a custom nginx access log format in JSON (with nearly every possible field you could want), logged directly over the network (syslog from nginx over UDP) to a fluentd server set up to parse that JSON plus host and timestamp details before bulk-inserting into Elastic.

You could spin up any nginx container or VM with those two config lines (sketched below) and every request would flow over the network (no disk writes needed!) and get logged centrally with the hostname automatically tagged. It was doing 40k req/s on a single fluentd instance when I saw it last, and you could query/filter every HTTP request from the last day (3+bn records...) in real time.
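
The two config lines are roughly an nginx log_format with escape=json plus an access_log pointed at syslog over UDP. A sketch from memory; the server address, port, and field list here are placeholders, not the exact originals:

    # one JSON object per request; escape=json handles quoting inside field values
    log_format json_log escape=json '{'
        '"time":"$time_iso8601",'
        '"remote_addr":"$remote_addr",'
        '"request":"$request",'
        '"status":"$status",'
        '"body_bytes_sent":"$body_bytes_sent",'
        '"request_time":"$request_time",'
        '"http_user_agent":"$http_user_agent"'
    '}';

    # ship each log line to fluentd over UDP syslog instead of writing to disk
    access_log syslog:server=fluentd.internal:5140 json_log;

On the fluentd side that pairs with a syslog source and a JSON parser in front of the elasticsearch output plugin.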

Reach out to Datadog and ask how much they would charge for 100bn log requests per month.


That argument would apply to production backend databases, but I don't see how it really applies to logs. It's like they just copied and pasted a generic argument about structured data without taking the context into account.

Logs tend to be rarely read but often written. They also age very quickly, and old logs are very rarely read. So putting effort into unifying the schemas on write seems very wasteful versus doing so on read. Most queries are also text searches rather than structured requests, so the chance of missing something on read due to bad unification is very low.



