Some people are probably going to throw some shade on me for saying this since i...

BurritoAlPastor · on July 21, 2020

Nagios is the Jenkins of monitoring. It's popular because you can get it running in an afternoon, and it's easy to configure by hand.

It then rots within your infrastructure, because it resists being configured any way _except_ by hand. I've built two systems for configuration-management of Nagios (at different companies), and it's an unpleasant problem to solve.

Prometheus's metric format and query syntax are cool, but the real star of the design is simply this: you don't have to restart it, or even change files on your Prometheus server, when you add or remove servers from your environment.

znpy · on July 21, 2020

I have to use an Icinga instance from time to time (Icinga is a Nagios fork). I really can't see the value, beyond seeing if a service is up or down.

I'm surprised no one has named Zabbix. Zabbix is way better. I haven't had the chance to use Zabbix past 4.something but it's worth it.

I've been using Prometheus/Grafana and frankly the value I see is its out-of-the-box adaptability at capturing a mutating data source (example: metrics about ephemeral pods in a Kubernetes cluster).

hnarn · on July 22, 2020

> I really can't see the value, beyond seeing if a service is up or down.

This is an extreme oversimplification. The value is not in "seeing" if something is "up or down", the value is in the modularity of what a "service" can mean in the first place (anything you can script -- and the eco-system of plugins is huge), the fact that you don't have to "see" it (because notifications are extremely modular), the fact that escalations of issues can happen automatically if they are not resolved, and the fact that event-handlers in many cases can help you resolve the issue automatically without even having to raise an alert in the first place.

Nagios is a monitoring tool built with the UNIX philosophy in mind, and it's ingenious in its simplicity: decide state based on script or binary exit codes, relate dependencies between objects to avoid unnecessary troubleshooting, notify if necessary (again, with scripts/binaries) and/or try to resolve if configured. It hooks into a server frame of mind very well if you're a sysadmin.
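To make the exit-code idea concrete, here is a minimal sketch of a Nagios-style check written in Python. The exit codes 0/1/2/3 for OK/WARNING/CRITICAL/UNKNOWN are the usual plugin convention; the disk check itself, the 80%/90% thresholds, and the names `classify`, `check_disk` and `main` are assumptions chosen for illustration, not anything taken from a real plugin.

```python
import shutil

# Exit codes that Nagios-compatible plugins conventionally return.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(used_percent, warn=80.0, crit=90.0):
    """Map a disk-usage percentage to an (exit_code, label) pair.

    The thresholds are illustrative defaults, not Nagios defaults.
    """
    if used_percent >= crit:
        return CRITICAL, "CRITICAL"
    if used_percent >= warn:
        return WARNING, "WARNING"
    return OK, "OK"


def check_disk(path="/"):
    """Return (exit_code, status_line) for the filesystem holding `path`."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        # The check itself could not run, so the state is unknown.
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    used_percent = 100.0 * usage.used / usage.total
    code, label = classify(used_percent)
    return code, f"DISK {label} - {used_percent:.1f}% used on {path}"


def main():
    """Print one status line and return the exit code for the scheduler."""
    code, line = check_disk()
    print(line)
    return code

# When installed as a plugin, the script would end with:
#     raise SystemExit(main())
```

The scheduler only looks at the exit code and the first line of output, which is why anything you can script can be treated as a "service".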

Sure, if your main use case is "mutating data sources" and collecting metrics, any Nagios flavor won't be for you, because it's not what Nagios is made to do. There's a reason it's extremely popular in large enterprises: it was created for them. No monitoring solution is for everyone and solves every problem.

mekster · on July 22, 2020

> you can get it running in an afternoon, and it's easy to configure by hand.

This reads like a joke. Nagios looks like it's from the stone age, with files in a cgi-bin folder and unnecessarily complicated installation and management, unless they made it any better at some point.

hnarn · on July 22, 2020

While many people conflate "Nagios" with the corporate offering from the company Nagios, I personally mean the core monitoring component. There's no web interface to it (many are available, they're all ugly, but they're also not strictly necessary).

hnarn · on July 21, 2020

Yeah, I was in no way insinuating that Nagios is superior in general, or even to Prometheus, just that it does the job well for some use cases. Monitoring is tricky and you definitely need a tool box because each problem has a different optimal solution.

BurritoAlPastor · on July 21, 2020

Nagios and its forks for sure have a place in the monitoring ecosystem. They’re just not tools that tend to stick around once you’re big enough to have a dedicated DevOps or SRE team.

hnarn · on July 22, 2020

That depends completely on what type of business you're running. Anyone that has an environment that is unlikely to massively change (expansion excluded) within a few years benefits from the stability of Nagios. I know many large companies that use it, or derivatives of it, and even many state/military organizations.

egwor · on July 21, 2020

We run this at work, and I have (to put it politely) severe reservations. What performs these functions for you?

- Realtime GUI which works with Windows 10 (we have a web site and Nagstamon)
- aggregation of alerts / alert roll up
- sharing filters
- summary + description
- temporary downtime of alerting
- message rate suppression (to stop floods)
- filtering of columns/ordering etc.
- bulk actions for closing alerts

I've come from using:

- HPOV (great, but can't handle bursts of alerts)
- email (everyone has to have filters, can't handle bursts of alerts that well (runs everyone out of email space!), prone to failure/delays due to email)
- a home-grown solution

hnarn · on July 22, 2020

I'm not sure what you mean by "this" but I suspect you mean Nagios XI or some other corporate offering. I was referring to the core monitoring component and its forks/derivatives.

Other than that, I don't quite understand the point of your comment. You say you have "severe reservations", but many of the points you list are available even with Nagios core, and most of them are available in other Nagios variants.

thethethethe · on July 21, 2020

Possibly stupid question:

Can you use Nagios to stream metrics exported from your application's binary in real time?

For example, can you use Nagios to record each HTTP request processed by your application webserver, tagged with HTTP method, code, latency etc?

hnarn · on July 21, 2020

Nagios is not so much a "recorder" as it is a "state inspector": plugins run on a schedule and inspect that things are up to spec. The situation you describe may be better suited for something like the ELK stack, which can hook into your HTTPD logs.
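As a rough illustration of what "hooking into your HTTPD logs" means, the sketch below tallies requests per (method, status code) from access-log lines. It assumes the Common Log Format, which carries no latency field, so latency is left out; `LINE_RE` and `count_requests` are hypothetical names, and this is not how the ELK stack itself is implemented.

```python
import re
from collections import Counter

# Common Log Format, for example:
# 127.0.0.1 - - [21/Jul/2020:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 1024
LINE_RE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\S+)'
)


def count_requests(lines):
    """Count requests per (method, status) pair; lines that do not parse are skipped."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            counts[(match["method"], int(match["status"]))] += 1
    return counts
```

Feeding it an open access log, e.g. `count_requests(open(path))` with `path` pointing at your own log file, returns a `Counter` keyed by method and status. Every request is recorded, in contrast to a Nagios check that samples state on a schedule.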

brightball · on July 21, 2020

Ever looked at Zabbix?

hnarn · on July 21, 2020

I haven't, unfortunately, but it looks promising from just looking into it briefly. Open source monitoring is always an area that needs more competition.

orev · on July 21, 2020

I find the problem with monitoring is not a lack of options, but an overwhelming abundance of them. It’s almost impossible to evaluate all of them realistically, so you just wind up using the most popular one.

Twirrim · on July 21, 2020

Zabbix has been around for quite a long time. Easily 15 years now. I haven't looked at it since around 2013, but at the time it was placing quite some pressure on its MySQL backend. It looks like they've expanded to support more than MySQL as the backend these days.

znpy · on July 21, 2020

Zabbix has been growing A LOT lately, and in a good way. It's nice to see this kind of project evolving in a good direction instead of stagnating and then dying.

Twirrim · on July 24, 2020

It's great to hear. I liked Zabbix in general. At the time of its initial surge in popularity, Nagios and Cacti were the dominant forces in monitoring. Zabbix was a little quirky to get used to, but a breath of fresh air.

k-rus · on July 21, 2020

I noticed that Zabbix supports PostgreSQL and TimescaleDB as back-ends, and I just checked the list, which also contains Oracle and SQLite (DB2 support is experimental).