>Automatically create the online meeting spaces for collaboration
>Manage TODO items so nothing falls through the cracks
I work in incident response, and I feel a huge failing of incident response products is not understanding that companies already have established tools for collaboration and meetings and for capturing planned work.
I find adding these things is seen as nice and inclusive, and it is easier to sell a product that does a lot, but it turns into complete bloat, makes adoption harder, and makes the larger product harder to support.
This was a big learning for us when we were first building out Kintaba[1].
Re: task management specifically-- having previously been at FAANG companies that built all their own tools I had not realized just how prevalent Jira is. It. is. EVERYWHERE. and IT orgs at companies from 3 to 300,000 people are absolutely married to their carefully customized version of it as a system of record for everything that happens or will happen.
We see many on-premise implementations as well despite the announced sunsetting of that product.
I'm sure there's a #2 and #3 out there but honestly I almost never see them (we do see clubhouse/shortcut from time to time... but even those folks tend to move to Jira within 6 months).
OT but it really makes me doubly impressed that Slack was able to move into organizations so successfully from all corners, such that it was able to dodge what would traditionally be a pretty big Atlassian-owned barrier.
The on-prem version helps you ensure that it's running fast and secure, or you can end up fucking up the performance part. But on-prem also means systems that are firewalled from the internet might have access to it, which helps with integration.
I think the problem is trying to present an abstraction layer to management. We already have those same features (todo lists, recording information) in Jira, ServiceNow, and like a dozen other tools whose purpose is to coordinate and track work. They are often unpopular with developers because they end up trying to give the execs an abstraction layer to replace their management by spreadsheets, but unfortunately, as anyone who has worked in software for long enough can tell you, abstractions are leaky.
Hence the dissatisfaction with a lot of these tools.
What do you think is the solution - when an enterprise already has Jira, GitHub and Confluence, how do you think a product like Grafana Incident should integrate with these somewhat overlapping products?
This feels like a central question of post-cloud / post-SaaS outsourcing.
In the end, it boils down to two options: offer deep APIs into your product, or don't.
IMHO, what needs to happen to support the former is for every SaaS purchase to include full technical due diligence on external integration capabilities.
Integration needs to start being a headline feature in purchasing. And less an afterthought when a horrified engineer looks at some new enterprise product that's already being adopted.
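To make the "deep APIs" option a bit more concrete, here is a minimal sketch of the kind of glue a customer ends up writing when the incident tool exposes its data: it maps an incident record onto the request body that Jira's REST API expects for creating an issue. The `Incident` type, its fields, and the `OPS` project key are all hypothetical; only the `fields` / `project` / `summary` / `description` / `issuetype` shape comes from Jira. It builds the payload only and makes no network call.

```python
from dataclasses import dataclass


@dataclass
class Incident:
    """Hypothetical incident record, as an incident tool might expose it."""
    title: str
    severity: str
    summary: str
    url: str


def to_jira_issue(incident: Incident, project_key: str, issue_type: str = "Task") -> dict:
    """Build a Jira 'create issue' request body from an incident.

    The incident side is an assumption for illustration; the returned
    structure follows the usual Jira REST 'fields' layout.
    """
    return {
        "fields": {
            "project": {"key": project_key},
            "summary": f"[{incident.severity}] {incident.title}",
            "description": f"{incident.summary}\n\nIncident link: {incident.url}",
            "issuetype": {"name": issue_type},
        }
    }


if __name__ == "__main__":
    example = Incident(
        title="Checkout latency spike",
        severity="SEV2",
        summary="p99 latency above threshold for 20 minutes.",
        url="https://incidents.example.com/123",  # placeholder URL
    )
    print(to_jira_issue(example, project_key="OPS"))
```

The point is that the system of record stays where the company already keeps it, and the incident product only has to expose enough structure for a mapping like this to be written.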
Alert templating. Grafana is fussy about configuring alerts on dashboards that have variables.
What this means is that if you have 30 clusters and want to use a single dashboard with a drop-down variable selecting your cluster, you cannot define alerts on it. It will refuse to do it.
Alerts are also integrated tightly into dashboards. This forces alerts to be saved/backed up/imported as a single JSON blob. We want separate management of alerts so they can be defined as code and not in the dashboard blob of JSON!
What chagrins me is that, because of the above issues, we have to use Prometheus Alertmanager instead, while our colleagues absolutely LOVE Grafana itself! We can't duplicate alerts dozens of times. We don't want that management, nor do we want to teach our colleagues jsonnet/ksonnet to generate it. We also don't want permission problems.
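As a rough illustration of the "single JSON blob" problem, the sketch below pulls alert definitions out of a dashboard export so they can at least be diffed and reviewed on their own. It assumes the legacy export layout in which a panel carries its rule under an `alert` key and row panels nest further panels under `panels`; the sample dashboard is made up, and this is a workaround sketch, not something Grafana ships.

```python
import json


def iter_panels(panels):
    """Yield every panel, descending into panels nested inside rows."""
    for panel in panels:
        yield panel
        yield from iter_panels(panel.get("panels", []))


def extract_alerts(dashboard: dict) -> list:
    """Collect the alert blocks embedded in a dashboard's panels.

    Assumes each alerting panel stores its rule under an 'alert' key.
    """
    alerts = []
    for panel in iter_panels(dashboard.get("panels", [])):
        if "alert" in panel:
            alerts.append(
                {
                    "dashboard": dashboard.get("title"),
                    "panel": panel.get("title"),
                    "alert": panel["alert"],
                }
            )
    return alerts


# Made-up dashboard export, only for demonstration.
SAMPLE_DASHBOARD = {
    "title": "Cluster overview",
    "panels": [
        {"title": "CPU", "alert": {"name": "High CPU", "for": "5m"}},
        {"title": "Row", "panels": [{"title": "Memory"}]},
    ],
}

if __name__ == "__main__":
    # One JSON document per alert is easier to code review than the whole blob.
    for item in extract_alerts(SAMPLE_DASHBOARD):
        print(json.dumps(item, indent=2, sort_keys=True))
```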
The new Grafana alerts do absolutely nothing to help with this.
I'm at the point where I would pay 5 figures a year for something purely to do better alerting inside or alongside Grafana. Clicking alerts together is a nightmare when I have a ton of identical systems I need to configure. Same for dashboards - the limitations of the current mechanism are too severe.
I'd build my own templating mechanism for it, but I still want the alerts visible in Grafana itself. Zabbix has the power to do all this but with a UX that is not ideal....
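For what it's worth, a home-grown templating mechanism for identical systems can be quite small. The sketch below stamps out one Prometheus-style alerting rule per cluster from a single template and emits the result as JSON. The cluster names, the `cluster` label, the `InstanceDown` rule, and the per-cluster `for` overrides are assumptions for illustration; it does nothing to make the alerts visible inside Grafana, which is the part that still hurts.

```python
import json

# Hypothetical inventory: 30 identical clusters.
CLUSTERS = [f"cluster-{i:02d}" for i in range(1, 31)]

# Hypothetical per-cluster overrides for how long the condition must hold.
FOR_OVERRIDES = {"cluster-01": "1m"}


def render_rule(cluster: str) -> dict:
    """Render one alerting rule for a single cluster from the shared template."""
    return {
        "alert": "InstanceDown",
        # Assumes targets carry a 'cluster' label.
        "expr": f'up{{cluster="{cluster}"}} == 0',
        "for": FOR_OVERRIDES.get(cluster, "5m"),
        "labels": {"severity": "page", "cluster": cluster},
        "annotations": {"summary": f"Instance down in {cluster}"},
    }


def render_rule_file(clusters: list) -> dict:
    """Wrap the rendered rules in a rule-file style 'groups' structure."""
    return {
        "groups": [
            {
                "name": "per-cluster-alerts",
                "rules": [render_rule(c) for c in clusters],
            }
        ]
    }


if __name__ == "__main__":
    print(json.dumps(render_rule_file(CLUSTERS), indent=2))
```

The generated file can live in git and go through code review, so only the template and the cluster list are maintained by hand.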
Hey there! I work with alerting in general at Grafana - what are the pain points of dashboards and alerts as code you're currently experiencing? Would love to deliver / capitalise on the feedback.
I spent a solid day playing around with this to try to get it to work. Because of this, the alerts are impossible to code review or store in git. Which stinks, because Grafana's datasource APIs would be amazing to use for alerting. But they're either unusable because anybody can change them or the administrator could bork them at any given point (which has happened before), or just undocumented to the point where they are useless.
That's not even to begin on dealing with the "big blob of json" problem [1] that was clearly important enough to be given an entire spot at GrafanaCon, but even Grafonnet is not supported with Grafana 8. There is apparently some CUE way of doing this, but I can't seem to find any official documentation on that.
Anyways, I've moved back to alertmanager for the time being.
edit: is all of grafana labs downvoting the GP? this is very honest and candid feedback here.
Hoping to see cleaner ways to integrate across data sources, but developing that contract is going to take some time I think. In the meantime, should be able to get this supported with prometheus data source in a Grafana managed alert: https://github.com/grafana/grafana/pull/44865
Grafana would do well to look into the thinkorswim desktop platform and the ability to write code around metrics. They are entirely different use cases but I feel the desired goals are the same, which is making the most of an ocean of metrics. Financial world crushes at this for obvious reasons. Tech world? not so much.
It seems like this is a special case of project management software. If the existing products can't handle incidents then that software should be improved, not new software written. It's the best way to ensure that everybody on the team knows how to use the software when it's most urgently needed.
E.g. would you change your favorite editor to a different one, in case of an incident? Probably not. So why change project management systems?
While you certainly could cobble together incident response workflows in something like Jira, I think it makes more sense to extend the monitoring and paging tooling (in large part due to the reason you mention— familiarity with the tools that you're using as part of that response).
Did we watch a different presentation? ChatOps isn't new. What you're describing is what I would consider an antiquated practice. Nobody wants to go sniffing around a PM tool at 3 AM.
This is timely... I just started building out an internal "chatops" solution that leans heavily on OnCall. Looks like I may be able to set that aside.
If this is implemented as cleanly as OnCall, I have high hopes. It isn't without bugs, but it's already miles ahead of solutions like PagerDuty (in my opinion).
Please reach out to me, it would be awesome to learn about your experience of using our API and make sure we're aware of all the bugs you noticed (and fix them!), matvey.kukuy @grafana.com
Yeah, there are definitely already products in this space, but we're already invested in Grafana, so it makes sense to lean in that direction, even if it meant a little custom work on our end (though it looks like that may not be necessary now)
In most places I have been involved with, ServiceNow has been the core of incident management, from alert playbooks to follow-up on system/component uptime and daily/monthly/yearly SLA breaches.
Any system that is offered for enterprises should somehow integrate into that solution.
Generally speaking, I can say that ServiceNow is horrible to work with, use, manage; but it looks like it is the solution that is dominant in enterprises.
Would it be possible to have a split offering, with both on-prem and cloud? In my mind I would prefer to have things like Prometheus, logs, and metrics stored on-prem, mainly due to the volume of logs and metrics we create. Then use Grafana Cloud for Grafana dashboards, Loki logs, and incident management that pull directly from my on-prem data stores. I bring this up as it may be cost prohibitive for us to store our metrics in the cloud (we make so many metrics and logs!) but I would love to offload hosting the front end. Grafana Cloud takes care of managing and maintaining the Grafana dashboards and backend database, authentication, updates, etc. I'm fine hosting Prometheus and Loki locally, have been for a long time! I just get annoyed having to host Grafana and set it up: standing up the database, configuring auth, etc.
I'm curious about this part, and I can absolutely understand if you don't want to answer but I do have the following question:
Why is it tricky to ensure an application can run on a cloud-deployed system or a local Kubernetes/Docker Swarm/newfangled containerization mechanism of choice/etc. system?
Specifically I'm wondering what barriers you're running into that are pushing the focus to go cloud only.
Yeah, building for Grafana Cloud has big dev benefits too. We can iterate quickly, run live experiments, and build a more complicated stack (e.g. for ML tasks). We're going to be integrating more and more with the rest of Grafana too. All of this is much easier to do in one place.
13 years ago I was working on a SaaS eCommerce platform and it feels like this tool is a relatively minor improvement over what we had built on top of IRC.
That said, it’s pretty cool and I’m definitely going to evaluate it, as our current PagerDuty integration is not nearly as clean as this.
You'd do your job as a CEO better if you didn't spam competitors' HN threads with your own product, unless you have something relevant to bring to the table. This comment just looks like a shameless plug because you're in the same sector.
One way you could approach it is to highlight what you think is good about Grafana's implementation and what could be better, and then contrast that with your own offering, without sounding like a salesman.