Hacker News

I was a PD customer for 3 years at FarmLogs. There was literally zero innovation or development on that product during the entire time there. It was abandonware. I am shocked that they're dominating this space with such lackluster enthusiasm about their own product.

I'm not just referring to a lack of new features... it’s a lack of improving their core product. It's buggy and painful to use. I can’t tell you how many times acknowledging an issue (during the crisis around the issue) failed and caused cascading problems with other engineers being paged. That’s the one thing this product should do without fail.




I was also a customer for quite a while, and also didn't notice any new features or anything. But I also didn't need or desire any new features, and I can honestly say that of all third-party services I've used as a part of infrastructure across every company I've worked at, PagerDuty is the only one for which I've never experienced downtime.

To me, keeping the product stable and making sure that I never miss a page when I'm having downtime is by far the most important thing. I commend them for having designed and developed a system that appears to be one of the most robust services you could use. It means I'll continue to use them in the future without hesitation or lengthy debates on cost.


It works well, and has most of the features we need from a paging solution. Not sure what you were missing.


The logic/brains around the actual nucleus of the business (calling people when shit hits the fan) is bug ridden. At this valuation and after this amount of time I would expect that to be the most bulletproof and pleasant part of their entire program. It is not.


I agree with your assessment of zero innovation or development.

I help manage a fairly large on-call rotation and we've actively been finding gaps in their product where we think the product should be of assistance. Through support tickets or even hopping on a call with them they didn't seem all that interested in hearing about it or finding a solution for it. I mostly just got a sales pitch about upgrading our plan for new reporting features.

Some of the problems we've frequently encountered:

1. Anyone who is offboarded from the company just gets removed during the next LDAP sync. This moves everyone up a day for their next shift, with no notification, so visibility is poor. If you try to remove someone manually, there is a prerequisite that they first be removed by hand from any schedules they belong to.

2. Overrides do not coincide with a specific shift, but with a point in time. While I understand why overrides work the way they do, when combined with the problem outlined above it can be a real pain to fix the rotation if someone is taken out, since there is no Audit Log.

Here is a scenario that happens:

- Person A is on-call on 9/29

- Person B is on-call on 9/30

- Person C is on-call on 10/1

- Person D has an override for Person B since they couldn't make their shift. Person B now does not have a shift until it goes through the rotation again.

- Person A is offboarded after the override was configured.

- Person B's next on-call shift was moved up one day to 9/29. Person B now is on-call again, with no notification.

- Person C's next shift is now on 9/30, which has an override from Person D, so Person C is not actually on-call.

As you can see, it would be beneficial to have, at minimum, an automated notification of a gap in the rotation, or at least a way to tie some overrides to a specific person's shift rather than a point in time.
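For anyone who wants to see the scenario above play out, here is a toy model. It is only a sketch: the daily round-robin, the `on_call` and `schedule` helpers, and the year are my assumptions, not PagerDuty's actual scheduling logic.

```python
from datetime import date, timedelta

def on_call(rotation, start, day, overrides):
    """Who is on call on `day`: a point-in-time override wins,
    otherwise a plain daily round-robin over `rotation`."""
    if day in overrides:
        return overrides[day]
    return rotation[(day - start).days % len(rotation)]

def schedule(rotation, start, days, overrides):
    """Map each of the next `days` dates to the person on call."""
    dates = [start + timedelta(days=i) for i in range(days)]
    return {d: on_call(rotation, start, d, overrides) for d in dates}

START = date(2018, 9, 29)             # the year is arbitrary (assumption)
OVERRIDES = {date(2018, 9, 30): "D"}  # D covers what was B's 9/30 shift

before = schedule(["A", "B", "C"], START, 3, OVERRIDES)
after = schedule(["B", "C"], START, 3, OVERRIDES)  # A was offboarded

for day in before:
    print(day.isoformat(), before[day], "->", after[day])
```

In this model the override stays pinned to 9/30, so after A is removed it swallows C's shift instead of B's, and B ends up on call on 9/29 without anyone being told.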

We built a tool internally that every few minutes keeps a copy of the rotation and the order of its participants and detects any changes. If it does, it will open a GitHub issue with the user that was removed and the position in the rotation they were at before being offboarded. I often put myself in that spot to preserve the rotation from any breaking changes until the rotation goes through at which time I remove myself.
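The core of that kind of check is small. A minimal sketch, assuming each snapshot is just an ordered list of user names (the function name and data shape are hypothetical, and the GitHub issue creation and the API polling are left out):

```python
def removed_participants(previous, current):
    """Compare two rotation snapshots (ordered lists of users) and
    return (user, old_position) for everyone who has disappeared."""
    still_present = set(current)
    return [
        (user, position)
        for position, user in enumerate(previous)
        if user not in still_present
    ]

# Snapshot taken a few minutes ago vs. the snapshot taken now.
changes = removed_participants(["A", "B", "C"], ["B", "C"])
for user, position in changes:
    print(f"{user} was removed from position {position} in the rotation")
```

Keeping the old position is the useful part: it tells you where to slot a placeholder so the rest of the rotation does not shift.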

3. Assistance to find someone to take your shift. Say you get scheduled for 9/30 but find out you can't make it. With one click of a button, PagerDuty could email 4-5 people about your shift time and ask if they can take it. Someone could accept, and it would notify you that your shift has been covered and tell the others who received the request that it has been taken care of. It could factor in fatigue or the length of time someone has been on-call before including them in the pool of engineers to cover the shift.

4. Per schedule notification policies. Anyone can change their notification settings or how they're notified. For one particular rotation, we'd like to enforce certain minimums to make sure the push notification is sent immediately, and the engineer is called if not acknowledged in 5 minutes. Currently we cannot enforce that.

5. Audit Log. Who added X to the rotation? Who removed X from the rotation? Who changed the length of shifts? I could add more. For Enterprise level software an Audit Log would be great. They mention they have one internally but don't have plans to expose it for customers.

While their API allows you to build all these tools, having them as a first-class part of their product would be wonderful.
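As an example of the kind of tool one ends up writing, here is a rough sketch of the check for point 4. The data shape is an assumption: each notification rule is a hypothetical `(channel, delay_minutes)` pair that you would first have to fetch and normalize from the API yourself.

```python
# Minimums for the rotation described in point 4: push immediately,
# phone call no later than 5 minutes after the alert.
MINIMUMS = [("push", 0), ("phone", 5)]

def policy_violations(user_rules, minimums=MINIMUMS):
    """Return the (channel, max_delay) minimums that a user's
    notification rules fail to satisfy."""
    violations = []
    for channel, max_delay in minimums:
        delays = [delay for rule_channel, delay in user_rules
                  if rule_channel == channel]
        # Missing channel, or the earliest rule fires too late.
        if not delays or min(delays) > max_delay:
            violations.append((channel, max_delay))
    return violations

# This user gets the push right away but the call only after 10 minutes.
print(policy_violations([("push", 0), ("phone", 10)]))
```

This only reports violations; actually enforcing the minimums is the part that needs support in the product.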


I was a PD customer for about the same length of time at my last job, but I do recall a couple of 'innovations'. Incident timelines and incident tracking were pushed reasonably hard, and not at all anything I cared about.

Here's what I'm looking for as a user:

- primary / secondary escalations pulled from a single roster without overlaps

- add / remove people from roster without fucking up who's on call next

But it's pretty obvious they're more focused on listening to fence-sitters than paying customers. Which may be a viable strategy, considering their new valuation.


Honestly I think that's just a result of the fact that enterprise software is a sales driven business much more than it is a product oriented one. Additionally, especially in backend tech products, migrating can require a sizable number of man hours for limited results.


For those of you fed up with PD, you should check out https://pagertree.com

Fraction of the price, reliable, and not packaged with monitoring.


What did you find it lacking?


Aside from the lack of innovation, it's got lots of bugs. Sometimes it will continue to call you over and over when things have been acknowledged, or it won't acknowledge things, or you will try to acknowledge something that has already been acknowledged and it will error out.
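For what it's worth, the behavior one would expect here is an idempotent acknowledge. A minimal sketch of that idea, with a hypothetical `Incident` class that says nothing about how PagerDuty is actually implemented:

```python
class Incident:
    """Toy incident with an idempotent acknowledge."""

    def __init__(self, incident_id):
        self.incident_id = incident_id
        self.status = "triggered"
        self.acknowledged_by = None

    def acknowledge(self, user):
        """Acknowledging an already acknowledged incident is a no-op,
        not an error; the first acknowledger is kept."""
        if self.status == "triggered":
            self.status = "acknowledged"
            self.acknowledged_by = user
        return self.status

    def should_notify(self):
        """Only incidents nobody has acknowledged keep paging."""
        return self.status == "triggered"

incident = Incident("hypothetical-id")
incident.acknowledge("alice")
incident.acknowledge("bob")  # second ack: no error, no state change
print(incident.status, incident.acknowledged_by, incident.should_notify())
```

With this shape, a duplicate acknowledge from a second device or a second engineer cannot fail, and nothing that has been acknowledged is eligible to page again.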


So much this.

Because the things we monitor are heavily interrelated, and an outage or degradation in one service will often affect others, it's not uncommon for us to see multiple alerts at once.

The typical scenario goes like this:

1. Pile of alerts goes out.

2. I acknowledge all of them in the course of acknowledging the first.

3. Some of the alerts will decide, despite having state-changed to yellow in the app, that they somehow haven't been acknowledged, so the app will issue the notification tone — for, again, previously acknowledged alerts.

4. Some more alerts will decide they still haven't been acknowledged, and will play the notification tone again, to remind me that I have outstanding, "unacknowledged" alerts.

All of this while I'm trying to fix the broken thing and am being interrupted to attend to the feels my alerting tool has about having alerts.

That is to say: PagerDuty is getting in the way of my fixing the outage. No, I can't just ignore them. We have a global team, and whoever's secondary is probably asleep, because we've taken care in crafting our on-call rotations such that people shouldn't get alerted at 3am. If they sleep through the alerts, they will escalate up my management chain. Then I'm having to explain to my director or VP why there are "unacknowledged" alerts — while I should be fixing the fucking fire.

It's an abhorrently crap user-experience, in a way that is antithetical to the tool's very purpose, and I'm not the only person on my team who has either wanted to throw, or actually has thrown, their phone through the nearest wall because of it.


"Aside from the lack of innovation"

What exactly does this mean? A "lack of innovation" doesn't tell me anything. It does what it says on the tin. That's usually what I'm looking for from a product like this.


Not OP, but it lacks a proper, modern UI. Recently I needed to remove myself from a schedule and a team; it took me 10 minutes.


Also true. Basic things like cascading a delete or managing dependencies during a delete are a nightmare.

It truly feels like 6 months were invested into a fairly basic CRUD rails application with a Twilio gem installed and then forgotten.

Good for them to have generated so much revenue with such a weak product. We paid them for years even after we removed everyone from rotation. I am sure lots of companies do this because it is the only thing out there.



