
> - mistake #1, assuming it's a technical thingy that some techie person should do and they should just get on with it and do it. Whatever it is. The mistake here is that without guidance this person is going to be selecting some random tools mostly focused on operational things (logging, infrastructure). These have low value for other stakeholders. Yes you need that but these are also commodities.

I'm honestly pretty much giving up on monitoring at work. I enjoy working on monitoring, and I think good monitoring and tracing are a strong edge you can have at a technical level. After a certain point, I'm pretty sure you will lose control without good monitoring, tracing and alerting.

But practically, whenever there is a glimmer of hope, aka spare time, to do some work on the monitoring, the entire product stack realizes: There is spare time. Let's dump on them any combination of (i) introducing entirely new tech into the stack, (ii) requesting weeks of analysis of whether postgres, or any underutilized upstream component, could be the cause of slowness for an individual database of a cluster on a group of postgres servers, (iii) elevating something to "critical, we cannot work and customers are crying just now" and (iv) some production outage caused by a shitty change, due to an architectural weakness the product has had for 3 years and that we've pointed out, but now we need a "task force" of "all qualified people on deck". Except that the issue isn't related to e.g. the database, and as such, the postgres DBAs are twiddling their thumbs. But "working together" means they must be part of "all hands on deck".

Oh, and then everyone starts yelling about why we can't just pile new topics on infra every 3 months, and why we can't have an up-to-date, modern monitoring stack at the same time from the same guys.

Monitoring and alerting just have zero visible value outside of infrastructure and operations imo and I'm pretty much done fighting for it. Sorry for being somewhat sour about this, I can get that way about things I really like.




I am the same, I love good monitoring. My place has been very focused on it, and we struggle to find time to work on it beyond basic «does it work». Yet we have so much monitoring that we are drowning in metrics.



