I found PagerDuty to be grossly underwhelming when our team moved to it.
I'd focus on stupid-easy integration & some reasonable calendar management to differentiate.
PagerDuty Calendar/Holiday/Workday management and cross-regional scheduling were very poorly implemented.
The amount of manual schedule adjustment needed when someone actually wanted to take their one-week vacation was insane. We ended up with overrides on top of overrides.
I'd imagine common tools like Workday could give you integrations for that.
Taking a holiday became like a 12-step program: email your boss in advance, put in an HR system request, cross-check your PagerDuty schedule, negotiate a trade with a teammate for your PagerDuty rotation, decline meetings, set your Outlook calendar to out-of-office, set your Slack status & notifications, and re-remind everyone the week before you go.
It's almost like they wanted it to be easier to just not take time off?
The other problem with these tools is that the value you get out of them depends on how much time you spend on systems integration. If even a single system is not integrated & still sends emails/Slack messages/only updates a dashboard... then PagerDuty is simply yet another tool to monitor, rather than the single pane of glass.
For instance at Google, taking vacation in workday puts an OOO event on your Google calendar. Then the oncall scheduler takes that into account along with your oncall history when it schedules new rotations. You only have to get someone to cover if you are taking time off in the near future, and any surplus or deficit will be fixed by the scheduler going forward.
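As a rough illustration of that kind of scheduler, here is a minimal Python sketch. It assumes weekly shifts, OOO days already exported from the calendar, and "fairness" meaning "fewest shifts served so far"; the function and argument names are hypothetical and this is not Google's actual tool.

```python
from datetime import date, timedelta

def plan_rotation(engineers, ooo, history, start, weeks):
    """Assign one weekly on-call shift per week, skipping anyone out of office.

    engineers: list of names
    ooo: dict mapping name -> set of dates that person is out of office
    history: dict mapping name -> number of shifts already served
    start: first day of the first week (a datetime.date)
    """
    counts = {e: history.get(e, 0) for e in engineers}
    plan = []
    for w in range(weeks):
        week_start = start + timedelta(weeks=w)
        week_days = {week_start + timedelta(days=d) for d in range(7)}
        # Anyone with an OOO day inside this week is not eligible.
        available = [e for e in engineers if not (ooo.get(e, set()) & week_days)]
        if not available:
            raise ValueError(f"nobody available for the week of {week_start}")
        # Fairness: fewest shifts so far wins; ties broken alphabetically.
        pick = min(available, key=lambda e: (counts[e], e))
        counts[pick] += 1
        plan.append((week_start, pick))
    return plan, counts
```

Because the running counts start from past history, someone who skipped a week for vacation tends to pick up the next open slot, which is roughly the "surplus or deficit fixed going forward" behavior described above.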
It's kind of wild that an expensive product with tons of funding and employees is worse than a few scripts cooked up by some SREs.
I think outside some FAANG firms with strong engineering management, the purchase of these tools is a top-down affair.
So as cynical as my statement may seem ("It's almost like they wanted it to be easier to just not take time off?"), user ergonomics just doesn't enter into the conversation at all, because the users are internal devs... who cares.
I also worked at an org that gave each of 50+ large customers a dedicated slack channel with.. wait for it.. the entire internal 200+ member engineering org in channel with them. You can imagine the mayhem as each customer figured out how to get attention (@here/@channel/@theguythatfixedthislasttime) and "hack the phone tree" as I used to call it.
This also meant that every one of our 200+ engineers was a member of 50+ customer channels, which were filled with all levels of noise that 95% of the time had nothing to do with their app/team/function and could/should be ignored.
I am in management. I evaluated most of the on-duty tools that are available, at the cost of a couple of days of my personal time. At least they all have free trials. PagerDuty, Opsgenie, Splunk On-Call (ex VictorOps), AlertOps, Squadcast, TaskCall, etc. etc.
They are all surprisingly hopeless. For example, none of them have SCIM integration to make managing your teams in them automatic. All of them have clunky calendar overrides. None of them seem to integrate well with Outlook / Google Calendar, and in particular none take holidays into account. Many have no Terraform provider to manage them, and the ones that do are clunky at best, and hard to set up/manage. OIDC is hit-and-miss. For example, for PagerDuty you need to call up their support team and get them to manually tweak something that's not in the UI settings to get OIDC for Azure AD sign-ins to work.
It's not that management is apathetic. We genuinely don't want engineers spending their time working around vendor inadequacies and lashing this stuff together with barely-maintained scripting that they resent having to write in the first place. Why would anyone want that? Given there's seemingly no product out there that lets you avoid that, what should we do? When they're all rubbish, you either choose PagerDuty because everyone does, or Opsgenie as a protest vote, because at least both have Terraform providers and plug-ins for other things like Slack and Sentry, etc.
My point is not that management is bad & doesn't care, but that management has different priorities & cares about different things than the people using the app.
All of us have talked on this thread about all the things PD is bad at, but few have come to its defense with a list of things it's great at.
The sales pitch to management, I am sure, is more along the lines of metrics, dashboards, reports, "accountability", response times, yada yada... and because management writes the checks, I'm sure that's an area where they do deliver.
> the purchase of these tools is a top-down affair
This is absolutely it. PagerDuty don't sell on-call tools to engineers; they sell the idea of having an on-call rota to CTOs. The tools are an implementation detail.
Bottom up engineering tools take more time to start with, but almost always cost less in the long run, contribute to a good engineering culture, and build a sense of ownership.
That isn't universally true. At my medium sized company, the people on call are the ones responsible for choosing an on-call system. But we've rotated through most of the main options, because we haven't really been happy with any of them.
It's much, much, much easier to cook up a solution for internal use than to create a product for external use that you can sell, that caters to your customers' whims, and that is reasonably easy to use without requiring direct access to the engineering team for support.
You're right, but for $173mm in funding, you'd think you could build a product for external users that's at least at feature parity with a "cooked up" internal project.
What's kind of incredible, too, is that they still lose money.
The losses are growing faster than the revenue.
They've only had, what, 14 years to figure this out?
You wonder what their staff of 1000ish are working on.
Also, forgot this gem: "On January 21, 2023 PagerDuty CEO Tejada's layoff memo was criticized for insensitivity for inappropriately quoting Martin Luther King, announcing promotions of executives, and tone deafness." (from wiki)
I’m glad to hear I’m not the only one. I constantly felt like I was missing something. A tool this mature and popular should have been a little more polished by now.
It felt like a tool that was sold to CTOs, not users.
As a recipient of pages, I abhorred PagerDuty more than Slack, Teams, email or even phone calls.
At least my phone notifications & do not disturb mode can be tuned around some of those other apps. And if a support guy or teammate calls me on holiday, I can berate them such that they remember to check the holiday calendar / out-of-office status next time. No such luck with PagerDuty.
If your org is not mature enough to put in the time to have a single pane of glass / view of the world / alert state.. you shouldn't be buying any of these tools.
I tried to make managing the incident escalation policies as simple as possible. It currently works with a simple drag-and-drop mechanism. The way PagerDuty does this also seemed very cumbersome to me.
Of course, All Quiet is currently missing holiday / off-hours features. I'm still figuring out what the best approach is there.
Regarding your last paragraph, can you elaborate on this? I tried to make All Quiet integrate with basically every system via Email and Webhook integrations. Do you see this as a problem?
I think the problem with integration is probably a man-hours thing.
A lot of shops have a hodgepodge of 3rd-party tools which are not necessarily the latest-and-greatest Cloud/FAANG darlings: Geneos, Jira, ServiceNow, ControlM, Autosys, Tidal.
These tools vary by industry & org size, so maybe focusing on a particular slice, so that you can really serve that niche, would give you more traction, I dunno.
It just has to be dead simple to integrate, and it's worth considering some sort of professional services/customer success engineering layer.
The problem I've seen is that in every (financial services) org I have worked, the teams responsible for integrations to these types of tools are really operations type teams that do a little development in their spare time. So it never gets done well, or completely.
To me, the only benefit of a PagerDuty-type tool is when it becomes the single point of escalation. If it's just one of many, the product will not be sticky within a given org.
On pricing - I don't think it matters for small orgs the way it does for big orgs.
Let's say today I am in a role where I am effectively the CTO of a 5ish person startup, with only 3 technical. So whether I spend $200/year or $1500/year, I don't really care. Both are more than $0, and require me to review a contract, do a POC, have a lawyer or someone review things, etc.
My internal time to do all those things is more like $10-30k, easily.
Now, do I decide it's worth my time, knowing that my devs are still going to basically turn off their phones / go into do-not-disturb mode because its sense of working hours is non-existent (making it worse than PagerDuty & Slack notification tuning, both of which I already despise)?
'Taking a holiday became like a 12 step program - email boss in advance, put in HR system request, cross check your PagerDuty schedule, negotiate a trade with another teammate for PagerDuty rotation, decline meetings, update your outlook calendar out-of-office, set your slack status & notifications, and re-remind everyone the week before you go. It's almost like they wanted it to be easier to just not take time off?'
Most of this has nothing to do with PagerDuty, but instead is around the company's process.
The problem is top-down CTO purchasing of these magic bullet tools w/o budgeting/planning integration.
Instead of becoming "the place to look", it becomes "the Nth place to look" for alerts. Without thoughtful planning, it's entirely possible to be in worse shape for having purchased it.
I used PagerDuty for more than a decade at my previous job. I didn't care much for the UI. But you know why PagerDuty does so well? Basically bulletproof reliability. 99% uptime won't cut it. 99.9% uptime won't cut it. You need to be as close as possible to 100% uptime, no excuses. Pagerduty isn't perfect, but it was one of the most reliable services we ever used.
I sincerely wish you luck with allquiet. I just want to make very sure you are aware why people still pay for Pagerduty. To compete, you need to be looking at 99.99% uptime or better (ideally 99.999%, 5 minutes of downtime a year) where 'uptime' is defined as the ability to exercise the entire notification stack. The moment someone's site has an outage and you aren't able to deliver the notification, you lose the customer and everyone they talk to.
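For reference, the downtime budget behind each availability target is simple arithmetic on a 365-day year (525,600 minutes). This small Python helper is just a calculator, not a statement about any vendor's actual SLA:

```python
# Minutes in a 365-day year; leap years are ignored for simplicity.
MINUTES_PER_YEAR = 365 * 24 * 60

def downtime_minutes_per_year(availability_pct):
    """Allowed downtime per year, in minutes, for a given availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)

for pct in (99.0, 99.9, 99.99, 99.999):
    print(f"{pct}% -> {downtime_minutes_per_year(pct):.1f} min/year")
```

At 99.999% the budget works out to about 5.3 minutes per year, matching the "5 minutes" figure above; 99.99% allows roughly 53 minutes, and 99.9% nearly 9 hours.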
I also worry about in-app notifications, but that's well-covered by everyone else's comments.
Pagerduty is vulnerable. Their UI is garbage. But you need to have bullet-proof uptime to take them down. It's a tough challenge and I wish you luck!
Anyone who has done real engineering would realize that this problem is a consequence of a design flaw with PagerDuty (or other alternatives with a similar API, where alerting is only triggered directly by a webhook).
If your design requires that the alerting service can receive a one-off affirmative "something's broken" packet, then yes, you are asking an inherently unreliable distributed system (i.e. the Internet!) to reliably deliver a critical message at a time when you know something is broken. Good luck. :)
Instead, if you use something like a periodic heartbeat (also known as a dead man's switch, inbound liveness monitor, or outbound HTTP probe -- all of which we support at Heii On-Call https://heiioncall.com/ out of the box), you can tolerate some occasional lost messages, regardless of whose end they are on.
Real reliable systems (for example, embedded systems) use periodic heartbeats and watchdogs, and are usually designed to be lenient to the occasional missed heartbeat. If the system being monitored is truly down, then enough consecutive heartbeats will be missed that some threshold is reached and the on-call person can be alerted (or a watchdog timer can reboot a system, etc).
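A minimal Python sketch of the "tolerate occasional misses, alert after a threshold" idea, assuming a fixed heartbeat interval and an injectable clock so it can be tested. The class and parameter names are hypothetical and are not Heii On-Call's API.

```python
import time

class DeadMansSwitch:
    """Fire once when `threshold` consecutive heartbeat intervals have been missed."""

    def __init__(self, interval_s, threshold, clock=time.monotonic):
        self.interval_s = interval_s
        self.threshold = threshold
        self.clock = clock
        self.last_beat = clock()
        self.alerted = False

    def beat(self):
        # Called whenever a heartbeat arrives from the monitored system.
        self.last_beat = self.clock()
        self.alerted = False

    def missed_intervals(self):
        return int((self.clock() - self.last_beat) // self.interval_s)

    def check(self):
        """Return True exactly once when the miss threshold is first crossed."""
        if not self.alerted and self.missed_intervals() >= self.threshold:
            self.alerted = True
            return True
        return False
```

A single lost heartbeat only advances the miss count by one, so it never pages anyone on its own; only a sustained silence crosses the threshold.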
Also, the system at Google is not in the path of the first page (that comes directly from the alerting infra); the more complex system is only needed for escalation.
> But you know why PagerDuty does so well? Basically bulletproof reliability. 99% uptime won't cut it. 99.9% uptime won't cut it. You need to be as close as possible to 100% uptime, no excuses.
Ah, but as always, the million-dollar question is... What does the PagerDuty small print say?
My guess is the small print is not "bulletproof reliability" or five-nines. I betcha their contract is full of exclusions and get-outs.
It's a bit like the famous Verizon 100% SLA. Any idiot knows it's not technically feasible, but the reason you pay the Verizon tax is so they've got some cash to pay out the inevitable SLA claims.
We use PagerDuty, but we also have an internal email/SMS based "paging backup" service and I have seen it in action 4 or 5 times in 7 years. PagerDuty isn't bulletproof.
That's a very helpful comment. Thanks. I might consider adding SMS / calls to the notification channels. I hadn't looked at it from the perspective folks described here.
Regarding 99.99% uptime: that's very true and also a good hint.
My current stack is very robust - tried and tested in many other products that I worked in.
Some other HN user also suggested to include an uptime status page to create this kind of confidence.
> My current stack is very robust - tried and tested in many other products that I worked in.
For uptime you also need to consider the availability of your hosting provider, you might even have to have a fallback installation at a different provider, something along these lines.
The main problem that Pagerduty solved at the time was deliverability of alerts. It was hard, and still is, to solve the problem of making sure the right people actually get the alerts. It was so expensive because they have redundant hardware all over the place with different providers, and they ran it all themselves, including SMS and phone gateways.
One of their key differentiators was that they were not built on AWS, so when AWS had an outage, you still knew about it. That also made it expensive.
With Pagerduty, you're mostly paying for reliability. The peace of mind knowing that someone will get notified when there is an outage.
This looks interesting, but from your page I'm not sure how you're better than Pagerduty.
App only notifications looks like a disadvantage to me. What if push notifications are down? What if I'm on DND?
Where is your infrastructure built? Is it on a cloud provider? What happens if that provider has an outage? If you want to build a PD competitor you have to build it on your own hardware in multiple datacenters owned by different people with different interconnects. If you haven't done that how will you stay up when your customer's provider goes down so that they know about it?
FYI, PagerDuty is built on top of AWS. They used to do some multi-cloud stuff, but no longer the case (too expensive, too complex, causing more issues than it solved). Source: worked there for 2 years.
Alerting or control plane or both? If alerting is AWS only, I'm very sad. What happens if there is a global AWS outage? Never been one yet, but never say never.
If your service needs to survive a global AWS outage, you just can't run with any SaaS. So many of these companies are single-region in AWS. Auth0, Okta, Datadog, and many others put customers in a regional box, and if that region goes down, all of those customers go down.
> That's why All Quiet notifications are app only.
This is prone to loss. I've struggled with notification delivery reliability on Android for years. If I don't touch my phone for a few hours and then wake the screen, I get a flurry of notifications all at once from the last few hours. But I need to be aware of pages in real-time.
There should really be a call and text notification channel; otherwise I don't see this getting much traction over PD.
I haven't had a problem with notifications on Android yet, but I also see app-only as a disadvantage. What if I have no internet but could be reached via SMS/phone? (Still not that rare in Germany ;))
BTW: there is a helpful Android app, "missed notifications reminder", in the F-Droid store that re-triggers app-specific notifications if you do not view them within a minute.
Grafana OnCall is an OSS alternative (with a cloud offering) that works great out of the box if you are using Grafana/Grafana Alerting for monitoring your systems and want a pager-like system with phone/SMS/Telegram integrations plus its own app. Best of all, it's self-hostable as well, which keeps me completely in control of my infra.
Frustrated by expensive per-seat pricing and unfriendly contract terms, we looked for PagerDuty alternatives and found a fantastic open-source project called Uptime Kuma: https://github.com/louislam/uptime-kuma
Uptime Kuma is one of the few open-source projects that feels like a commercial product: polished user experience, frequent release cadence, and a rich set of features including monitoring PostgreSQL servers, Docker containers, and so much more. Its list of supported notification services is so long that I don't even recognize half of the options available. Truly impressive.
Side note and shameless plug… We love Uptime Kuma so much that we made it one of the cornerstone applications provided by Fortressa, which we think of as the “App Store for Open Source”: https://fortressa.com/
I used to work at Opsgenie, which is one of the main competitors of PagerDuty. I wish you good luck!
SMS is a very good fallback channel for push notifications, especially when you don't have good internet coverage for some unexpected reason. I also personally hate phone calls, but I find them more effective at waking people up at night.
Can it have a calmer way to deliver notifications? The PagerDuty app goes directly from zero to klaxon, and the klaxons are horrible sounds from a bizarrely limited menu.
Ideally there would be a configurable escalation-to-the-same-user policy. I might want: vibrate-only notification, then normal notification, then critical notification, then phone call, in that order, with configurable delays. Ideally this would interact well with focus/sleep mode, so I could get a calm critical notification before full klaxons.
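A minimal Python sketch of such a per-user ladder. The channel names, delays, and the `send` / `is_acked` / `sleep` callbacks are hypothetical placeholders for whatever the real notification backend provides; this is not how PagerDuty or All Quiet actually implement it.

```python
from dataclasses import dataclass

@dataclass
class Step:
    channel: str   # hypothetical channel name, e.g. "vibrate", "push", "phone_call"
    delay_s: int   # seconds to wait after the previous step before this one

def run_ladder(steps, send, is_acked, sleep):
    """Walk the ladder in order, stopping as soon as the page is acknowledged.

    Returns the list of channels that were actually used.
    """
    used = []
    for step in steps:
        sleep(step.delay_s)
        if is_acked():
            break
        send(step.channel)
        used.append(step.channel)
    return used

ladder = [
    Step("vibrate", 0),
    Step("push", 60),
    Step("critical_push", 120),
    Step("phone_call", 120),
]
```

In real use `sleep` would be `time.sleep` (or a scheduler), and `is_acked` would query the incident's acknowledgement state; passing them in keeps the escalation logic testable.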
Other feature request: a way to tell the app to shut up already. I’ve occasionally dealt with an issue causing notifications every minute or so. The last thing I need while fixing it is more notifications to ack or silence.
I have absolutely no idea why PD doesn't have a big button that stops all PD notifications the moment you press it. There are very few events in life more stressful than dealing with an incident, with people on the call and your phone alerting and ringing all the frigging time. I was once told to mute myself on the incident bridge because all that alerting was not only super irritating for me, but for others too. I was supposed to fix the problem, but I was so distracted by alerts that I wanted to snap the phone in two and punch a random PD manager in the face.
"Yes PagerDuty, I know our infra is melting, please shut the fuck up and let me do my job."
PD does have a feature, IIRC "intelligent alert grouping", which groups new alerts and doesn't display more of them. That makes it relatively easy to snooze a group of alerts for X period of time, so things get much quieter. Unfortunately, this is locked behind a paid add-on. But still, there needs to be a way to tell PD to stop alerting.
With All Quiet you can acknowledge an incident. It will then stop pinging you and your colleagues. Also incidents and their triggers are idempotent, so All Quiet won't create new incidents when they're triggered several times by your monitoring solution.
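One common way to get this kind of idempotency is to key open incidents on a deduplication key supplied by the monitoring system (PagerDuty's Events API v2 calls this field `dedup_key`). The sketch below is a hypothetical in-memory version, not All Quiet's actual implementation:

```python
class IncidentStore:
    """Hypothetical in-memory store: one open incident per dedup key."""

    def __init__(self):
        self.open = {}  # dedup_key -> incident dict

    def trigger(self, dedup_key, summary):
        """Create an incident, or bump the existing open one with the same key.

        Returns (incident, created).
        """
        incident = self.open.get(dedup_key)
        if incident is None:
            incident = {"key": dedup_key, "summary": summary, "count": 1, "acked": False}
            self.open[dedup_key] = incident
            return incident, True
        # Repeated trigger: no new incident, just record that it fired again.
        incident["count"] += 1
        return incident, False

    def acknowledge(self, dedup_key):
        self.open[dedup_key]["acked"] = True

    def should_notify(self, dedup_key):
        # Once acknowledged, stop pinging anyone about this incident.
        return not self.open[dedup_key]["acked"]

    def resolve(self, dedup_key):
        self.open.pop(dedup_key, None)
```

After `resolve`, the next trigger with the same key opens a fresh incident, which is usually what you want if the problem comes back.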
It might be plan dependent(?) but everything you’re describing is configurable on your PagerDuty profile. I have it configured to send me a notification immediately followed by a call if I don’t acknowledge it within a minute (yes, this is aggressive but I’m in a high urgency, quick response time SLA rotation)
Sure, but this doesn’t help. The “Push” notification type goes straight to Klaxons. I have hacked around it by delaying Push until after SMS, but that’s, at best, a kludge. There should be multiple classes of push.
> Don't lose alerts. We believe that crucial incidents need to have a dedicated context. Alerts sent out through Slack or Email can often be overlooked in those channels. That's why All Quiet notifications are app only.
Question - How does your app ensure incident notifications are received? I haven't used PagerDuty before, but for others I've used, we got a phone call for alerts and sometimes texts for warnings.
Recently, I've implemented Android notifications for an app. Even if you set "priority" : "high" (FCM level), some will still not be received right away, depending on battery level and polling frequency at device level. You may ask the user to disable battery optimizations for your app, but a call is still more reliable IMO.
I'm also not sure about why multiple notification channels can't be used. If there's an alert I'd like to be notified by all means possible (depending on severity)
This definitely seems like something that's more about development priority that they're trying to sell as a feature....
Realistically, since they just autoescalate if the incident isn't acknowledged it appears, one would think any/all notification methods would be a positive. Let the user (or their manager...) sort out what method(s) works best for them.
I cannot ensure that notifications are received. But neither can you with Email or SMS. Technically a Call might be more reliable, but still, your phone could be turned off or might not even have reception for a call.
Furthermore, if a call can't be placed, the originating system knows this immediately and can proceed accordingly (e.g. move to the next person in line). This isn't the case for any other notification method, which requires the absence of an explicit ack by the user to infer that the notification failed.
Pagerduty handles this very well. If you have the app it will first try to notify through the app. If there is no ack within a couple of minutes it calls you.
If you persevere with this project you'll quickly realize you need phone calls.
Feature request: "Quiet wakeup mode" that vibrates the user's Apple Watch for a minute before making any sound. Then start the sound at the lowest volume (inaudible) and slowly increase it over 1 minute. This would be supremely useful for being oncall while sleeping near another person.
Feature request 2: "Earplug mode" that slowly increases volume and vibration from zero to max, and stays at max for 1 minute. Support alternating various tones, sounds, and voices.
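Both requests boil down to a time-based ramp. A minimal Python sketch, assuming a linear ramp and treating volume and vibration as levels from 0 to 1; the function names and the 60-second ramp for earplug mode are assumptions, since the request doesn't specify the ramp length:

```python
def ramp_level(t, start_s, ramp_s):
    """Linear 0 -> 1 ramp that begins at start_s and reaches max ramp_s seconds later."""
    if t <= start_s:
        return 0.0
    return min(1.0, (t - start_s) / ramp_s)

def quiet_wakeup(t):
    # Request 1: watch-only vibration for the first 60 s,
    # then phone volume ramps from silent to max over the next 60 s.
    return {"watch_vibrate": t < 60, "volume": ramp_level(t, 60, 60)}

def earplug_mode(t, ramp_s=60, hold_s=60):
    # Request 2: volume and vibration ramp together from zero to max,
    # then hold at max for hold_s seconds (ramp_s is an assumed default).
    level = ramp_level(t, 0, ramp_s)
    return {"volume": level, "vibration": level, "done": t >= ramp_s + hold_s}
```

Here `t` is seconds since the page fired; an app would sample these functions on a timer and apply the levels to the device.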
If you ever start a business selling anything at all, the very first thing you will need to realize is that you never base prices on how much something costs.
You base prices on what people are willing to pay.
Like most business advice this sounds good as a one-liner, but isn't super accurate. There are lots of businesses that need to evaluate price elasticity and substitution which is more than finding the right point on the demand curve.
There’s a big gap between $5 and $21… for now. I have no doubt they’ll spend all their time building features so they can increase prices to close that gap.
A big argument against LVT that people (often landlords, but many "socialists" too) make is that rents will simply rise to reflect the extra cost to the landlord.
Same with increased interest rates.
It's certainly widely believed that "price = cost + %profit" rather than "price = level at which maximum profit will be achieved after factoring in market segregation"
I mean, LVT or not, that is probably true. If the price increase affects the entire market, landlords will exit the market until equilibrium is restored (or renters will increase their willingness to pay a higher rate).
If market price < cost + desired margin, some sellers will exit, reducing supply and increasing market price.
LVT also only applies to commodities, which software isn’t really.
> landlords will exit the market until equilibrium is restored
Thus increasing the supply of properties being sold, and thus lowering the price for those who want to buy, allowing more people to buy, reducing the demand for places to rent. Win-win.
With LVT (or higher interest rate on a variable rate mortgage) the parasitical landlord can't simply stop having a tenant, as they're still liable for the costs. They have to offer something more than the intrinsic cost of land they occupy to make it productive enough for someone to pay them.
> LVT also only applies to commodities, of which software isn’t really.
There's a strong argument that copyright (specifically when used to prevent people from using something) would count as "Land" in the economic sense: it drives rent-seeking and monopoly behavior - https://progressandpoverty.substack.com/p/possibility-space-...
Keeping the "I want it for 5 because my indie project makes no money" people happy is less lucrative than the "We are paying $1000/person in on call allowances, so what is $21 - nothing!" people.
My employer sure does. Unless you're telling me that devs in major US cities are actually 2x better than rural US coders, who in turn are multiple times better than devs in poorer countries.
Your employer couldn't care less about your costs of living. He is a buyer and will gladly pay exactly zero dollars for your services. Unfortunately for him, that's not something you and your friends will accept. The price is hiked until both parties can live with it.
People in different situations accept different levels of compensation. Cost of living only becomes a significant factor under competitive pressure. If you were the sole programmer for the region you would be paid millions. I'm sure you personally, being an upstanding citizen, would lower your fee to match your costs of living as to maximize your clients' profitability. But you are rare. Most will accept the highest bid.
All I'm saying is that nothing is fundamentally based on "cost" unless under pressure. If it earns you say 500 bucks, you take the 20 dollar hit. It doesn't matter to you if it costs them 1 cent to produce. Until a competitor comes and drops the prices.
Also note that this "value" thing is complex. Programmers who live close by and are sufficiently adjusted to the company's culture have more value than cheap, capable, but remote workers who don't "get" you. Rural programmers are indeed no less capable than devs in major cities, but it never was just about being capable.
This is an easy take to have on the labor market when you have bargaining power. The fact that highly paid professionals buy so readily into the moneyed class discourse disturbs me.
Here's hoping you don't get displaced by automation and suddenly discover yourself providing zero marginal gains to them. I mean, would you then sit quietly and starve?
Yes you are missing the fact that work units are not fungible and it is not trivial for a company to calculate the marginal value of employees.
The discourse of "we are paid what we are worth in the labor market" is deeply flawed due to these reasons. So when people buy into that discourse, which advances the capital owning class interests (take what we give you that's your market value) it's both sad and harmful.
Collective bargaining and regulations play a big role in compensation levels, and political capital often counts for more than some abstract and impossible-to-calculate market value of your skill set or work output. So please disabuse yourself of the notion that you have a fair market value as a worker and that you are paid that value. This is pure political fiction.
You've reversed the relationship. As the employee, you're the seller. You're selling your time. It seems that you've priced yourself based on cost of living, because that is what the market will bear.
Yup I went from being ambivalent about PagerDuty to actively hating them. They swindle you, make it hard to cancel or downgrade, and that seems to be their way of doing business.
They’ve adopted terrible dark patterns with account management and even their service tiers. Prices keep going up; this is an excellent time for competition.
Would love to see more disruption in this space. Like others have said it's actually amazing how bad the calendar/scheduling portion of PagerDuty is. Multiple places I've worked we ended up making our own interface to managing schedules because the UI at PagerDuty is so bad. Also, people mentioning the rock solid reliability of PagerDuty likely don't have good monitoring of PagerDuty. When I worked on monitoring in the past we regularly encountered delayed notifications and ingestion failures. Pulling up their status page right now confirms they still have multiple issues a month. At a past employer we had our notification system actually check outstanding alert status and fall back to direct SMS when the ingestion latency was too high.
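A minimal Python sketch of that kind of fallback check. The `provider_has_alert` and `send_sms` callables and the latency budget are hypothetical stand-ins (e.g. a query against the paging provider's API and a direct SMS gateway); they are not a real PagerDuty client.

```python
def route_alert(alert_id, sent_at, now, provider_has_alert, latency_budget_s, send_sms):
    """Decide whether the paging provider has this alert or we must fall back.

    provider_has_alert: callable(alert_id) -> True once the provider shows the alert
    send_sms: callable(alert_id) that pages someone directly, bypassing the provider
    Times are in seconds.
    """
    if provider_has_alert(alert_id):
        return "provider"
    if now - sent_at > latency_budget_s:
        # Ingestion is too slow (or the alert was lost): page directly.
        send_sms(alert_id)
        return "sms_fallback"
    return "waiting"
```

Run periodically over outstanding alerts; a real version would also remember which alerts already fell back so the SMS isn't repeated on every pass.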
I worked on pygerduty, a python client library for PagerDuty, at a past job so I spent a lot of time talking to people at PagerDuty but I think they stopped inviting me to their "Founder's Club" meetings when I told them I didn't care about AI Alerts and wanted better UIs for scheduling and better reliability.
At $FAANG, we have a surprisingly complex tool to handle oncall scheduling, and I don't think I have seen a public equivalent, so there is probably an opportunity there. Ideally you want to optimize around a number of factors: vacations (which can be auto-populated) for one, but also fairness, shift closeness (some labor laws apply here), holidays, or individual preferences (e.g. I like to hike on the weekends). You then also need a tool for short-term trades/overrides that basically polls the team to see who can take/trade the shift, to avoid the diffusion of responsibility of yelling into a Slack channel and the complexity of negotiating trades. What is nice about this is that we can set up a daily oncall, so if you have a recurring weekly obligation that is incompatible with being oncall, we can set it up so you never get scheduled for that day (provided there is enough slack for vacations and so on).
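A minimal Python sketch of a daily rotation with a few of those constraints (blocked weekdays, vacations, a minimum gap between shifts, and fairness by shift count). It is a greedy day-by-day assignment under stated assumptions, with hypothetical names, not the internal tool described above, which presumably optimizes more globally.

```python
from datetime import date, timedelta

def daily_rotation(engineers, blocked_weekdays, vacations, start, days, min_gap_days=2):
    """Assign one on-call person per day.

    blocked_weekdays: name -> set of weekday ints (0 = Monday) the person never takes
    vacations: name -> set of dates the person is away
    min_gap_days: assumed "shift closeness" rule, minimum days between two shifts
    """
    counts = {e: 0 for e in engineers}
    last_shift = {}

    def eligible(e, day):
        if day.weekday() in blocked_weekdays.get(e, set()):
            return False
        if day in vacations.get(e, set()):
            return False
        prev = last_shift.get(e)
        return prev is None or (day - prev).days >= min_gap_days

    plan = []
    for i in range(days):
        day = start + timedelta(days=i)
        candidates = [e for e in engineers if eligible(e, day)]
        if not candidates:
            # Not enough slack in the team: a human has to resolve this day.
            raise ValueError(f"no eligible engineer on {day}")
        # Fairness: fewest shifts so far wins; ties broken alphabetically.
        pick = min(candidates, key=lambda e: (counts[e], e))
        counts[pick] += 1
        last_shift[pick] = day
        plan.append((day, pick))
    return plan
```

The `ValueError` branch corresponds to the "provided there is enough slack" caveat: a greedy pass like this can paint itself into a corner that a proper solver would avoid.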
When I used one of these (we used VictorOps^) what it was really missing was 'fairness' as you say in the 'takes/trades'. We were rarely called and paid handsomely for it, so although nobody vocalised it, there was definitely a rush to grab it (because that was the 'system') if someone announced they had to put one up for grabs (due to holiday clash or whatever).
Really I think that should have automatically de-allocated some future rotation of the taker's, and definitely would've helped if it hooked into the holiday calendaring system, so anything booked far enough in advance was already avoided.
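A minimal Python sketch of that idea, assuming the schedule is a simple ordered list of names and that "fair" means handing the taker's next future shift back to the person who gave one up. The function name and that rule are assumptions, not a feature of VictorOps or any other tool.

```python
def take_shift(schedule, shift_idx, taker):
    """Give shift `shift_idx` to `taker` and, to keep counts fair, hand the
    taker's next future shift back to the original owner.

    schedule: list of names, one per shift, in chronological order.
    Returns a new list; the input is not modified. If the taker has no
    future shift, they simply end up with one extra.
    """
    giver = schedule[shift_idx]
    updated = list(schedule)
    updated[shift_idx] = taker
    for i in range(shift_idx + 1, len(updated)):
        if updated[i] == taker:
            updated[i] = giver
            break
    return updated
```

Each person's total number of shifts stays the same, so grabbing a shift no longer means quietly taking more of the paid rotation than everyone else.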
(^Not to slam VictorOps specifically, for all I know we just didn't have it configured well and it is capable of all that.)
Bravo. PagerDuty is long overdue to be replaced by a better service. Their pricing is exorbitant, and their sales / account team forces you into year long contracts to buy seats you don't need. We are currently being threatened with an $8k bill because they claim we didn't "cancel renewal" in time.
You mention price as a motivation to build allquiet. I have been happy the last couple of years with https://simplepush.io/ which is $12.49 per year (and similarly app-only).
> That's why All Quiet notifications are app only.
Does this mean that there are no text/call options? I don't have a smartphone to install apps, but haven't had problems with being on on-call rotations with some of the other tools mentioned.
This is a deal breaker for me. I understand your intention here, and from a pure alerting perspective you're correct, but I have use cases for other forms of notification.
dmattia noted the most important one. It’s a low probability situation but it does occur, and in some cases people reasonably object to installing an app because of a work requirement.
I also strongly prefer multiple notification channels making noise on my phone at night. A page usually wakes me up. A page plus a phone ringing always does, so far.
What are you supposed to do exactly when you are in a region with spotty internet and you get a page? Jump on an incident bridge, pull up the logs and dashboards, and start debugging? If you are going somewhere with spotty internet, presumably it didn't just happen all of a sudden and you knew in advance, and thus could have swapped on-call duty with someone else who has reliable internet?
I guess I just can't work for any company using All Quiet then, which is fair as I understand both my perspective and yours. But just adding this comment as a data point to consider, and I wish you the best.
Is there a notification system that tries to be a little smarter about escalations? That seems a difficult area. A developer doesn't know: should this be escalated, or should it not? Is it sufficiently impactful, or not? The same questions arise in customer success, where people generally err on the side of raising the escalation level at the expense of rest. This seems like an area that could be vastly improved with statistics (something like Robust Decisions) or machine learning. I haven't seen that, and wonder if others have?
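As a very small example of the "statistics" direction, one could estimate, per alert type, how often a page actually turned out to need human action and only wake someone up above a threshold. The sketch below swaps in a plain Beta-prior (Laplace-smoothed) estimate; it is not Robust Decisions or any ML method, and `AlertHistory`, `route` and the 0.5 threshold are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class AlertHistory:
    actionable: int = 0  # past firings where someone actually had to act
    noise: int = 0       # past firings that were dismissed as no-ops

    def p_actionable(self, prior_a: float = 1.0, prior_b: float = 1.0) -> float:
        # Posterior mean of a Beta(prior_a, prior_b) prior updated with past outcomes.
        # With the default Beta(1, 1) prior this is Laplace smoothing.
        total = self.actionable + self.noise
        return (self.actionable + prior_a) / (total + prior_a + prior_b)


def route(history: AlertHistory, threshold: float = 0.5) -> str:
    """Page a human now if this alert type is usually actionable, else file a ticket."""
    return "page" if history.p_actionable() >= threshold else "ticket"


flaky_disk_alert = AlertHistory(actionable=1, noise=18)  # p = 2/21, about 0.10
db_down_alert = AlertHistory(actionable=9, noise=1)      # p = 10/12, about 0.83
brand_new_alert = AlertHistory()                         # p = 1/2, no history yet
decisions = [route(a) for a in (flaky_disk_alert, db_down_alert, brand_new_alert)]
```

The prior matters: with no history, an alert sits exactly at 0.5 and still pages, so new alerts err on the side of waking someone up until there is data saying they are noise. The hard part in practice is getting honest "was this actionable?" labels recorded after each incident.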
I would recommend running through your docs with a native English speaker.
Congratulations on the launch! Be proud of what you've put together here, and it looks like you did it all by yourself too? The infra for a site, a backend, and polished mobile apps for both platforms, on your own.
From another "I'm doing it all myself" dude, very well done!
I worked for a company that used PagerDuty, but eventually moved to a service that provided actual physical pagers. Well, they looked like 1990s pagers, but connected to Wi-Fi. Really cool. Not sure why more companies don't do this.
Will you be using PagerDuty or another competitor to monitor your own service? I always wonder what these companies use... or if they rely on their own service. And if so, then if their service breaks how would they know?
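One common answer to "who monitors the monitor" is a dead man's switch: the alerting service sends a heartbeat to an independent checker on separate infrastructure, and the checker raises the alarm when heartbeats stop, rather than waiting for an error that a dead service can't send. A minimal sketch, assuming a hypothetical `DeadMansSwitch` class (not any particular library) and an injectable clock so it can be tested deterministically:

```python
import time


class DeadMansSwitch:
    """Trips when heartbeats stop arriving. Meant to run on infrastructure that
    is independent of the service it watches."""

    def __init__(self, timeout_s: float, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_beat = clock()

    def heartbeat(self) -> None:
        # Called by the watched service, ideally after pushing a synthetic alert
        # end-to-end through its own pipeline, not just from a liveness loop.
        self.last_beat = self.clock()

    def is_tripped(self) -> bool:
        return self.clock() - self.last_beat > self.timeout_s


# Usage with a fake clock so the example is deterministic.
now = [0.0]
switch = DeadMansSwitch(timeout_s=300, clock=lambda: now[0])
now[0] = 200.0
switch.heartbeat()
now[0] = 450.0
healthy = not switch.is_tripped()  # 250 s since the last heartbeat
now[0] = 600.0
tripped = switch.is_tripped()      # 400 s since the last heartbeat
```

The key design point is independence: if the checker shares a cloud region, DNS, or push provider with the service it watches, one outage can silence both.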
I am so glad someone is doing this. I’ve been using PD for 10 years and still can’t figure out the UI. Things that should be dead simple (IMO), like adjusting schedules, require a separate spreadsheet to figure out.
The main differences are:
- Escalation level management is super easy: drag & drop. No cumbersome management.
- All Quiet tries to empower teams to strengthen their self-organization. I believe that great teams have an inherent motivation to take responsibility for their software.
- All Quiet focuses on an app-only notification approach by design.
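For readers unfamiliar with escalation levels, here is a minimal sketch of the underlying model: an ordered list of levels, each with responders and an acknowledgement window, where an unacknowledged alert moves down the list as time passes. All Quiet's actual data model isn't described in this thread, so `Level`, `paged_after` and the cumulative-paging behavior are assumptions for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Level:
    responders: list[str]
    wait_minutes: int  # how long to wait for an ack before escalating to the next level


def paged_after(policy: list[Level], minutes_unacked: int) -> list[str]:
    """Everyone who has been paged once an alert has gone `minutes_unacked`
    minutes without an acknowledgement. In this sketch levels are cumulative:
    escalating adds responders rather than un-paging earlier ones."""
    paged: list[str] = []
    deadline = 0
    for level in policy:
        paged.extend(level.responders)
        deadline += level.wait_minutes
        if minutes_unacked < deadline:
            break  # still inside this level's ack window
    return paged


policy = [
    Level(["primary"], wait_minutes=5),
    Level(["secondary"], wait_minutes=10),
    Level(["team-lead"], wait_minutes=0),  # last resort
]
```

With this policy, only the primary is paged in the first 5 minutes, the secondary joins from minute 5, and the team lead from minute 15. In a model like this, a drag & drop UI is just reordering the `policy` list.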