Status pages have looked the same way since the mid 2010s. There are a few new ones, but they are paid. So I decided to build this using Svelte + SvelteKit. It has all the necessary features; a few are yet to be built. Do check it out.
In my view, a status page should have only one function: communicate to your users whether your service is up, how long it has been down, which parts are down, and maybe list the work you are doing to fix the issues. Updating it should be done automatically, and as simply as possible, as part of the incident response process.
A status page should not replace your internal monitoring, so including "batteries" is both not necessary and a bad idea - because of the next point.
A status page should not have dependencies, and if it does, they should have higher availability than your service. Otherwise, you need a status page for your status page. Node.js sounds like a liability in this case.
> A status page should not have dependencies, and if it does, they should have higher availability than your service.
IMHO this is often unnecessary. The critical thing is for the failure modes of your status page to be uncorrelated with the failure modes of your service, so that you're unlikely to break both at the same time. But you might have, eg, some public API with a 99.995% availability target, and a status page with a 99.95% target - it of course depends on your situation but those numbers wouldn't strike me as being intrinsically wrong as long as the status page is properly independent of your service.
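For a sense of scale (my own back-of-the-envelope numbers, not from the comment above), those two targets differ by roughly 10x in downtime budget:

```ts
// Downtime budget implied by an availability target, per 30-day month.
function monthlyDowntimeBudgetMinutes(availability: number): number {
  const minutesPerMonth = 30 * 24 * 60; // 43,200
  return minutesPerMonth * (1 - availability);
}

console.log(monthlyDowntimeBudgetMinutes(0.9995));  // ~21.6 min for a 99.95% status page
console.log(monthlyDowntimeBudgetMinutes(0.99995)); // ~2.16 min for a 99.995% API
```

So the status page can have an order of magnitude more downtime budget and, as long as its failures are uncorrelated with the API's, still almost never be down at the same time.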
I don't really know what people expect with status pages. Having it change automatically based on metrics can result in inaccurate status. Having it behind a manual gate can be inaccurate since that takes time with approvals and such.
So, what exactly is the expectation and how can you implement a perfect status system?
> Having it change automatically based on metrics can result in inaccurate status.
A status page's job is to inform users about potential issues. A user will seek out the status page specifically if they currently see issues on their end, but usually won't if they don't. Therefore underreporting is a huge issue, because you essentially tell your users that the issue must be on their end even when it is not, while overreporting issues when there are none hurts no one, and chances are high that no user even sees it.
Completely agree. I think the unofficial Steam Status[0] by xPaw[1] is a great example. I never go to the site unless I'm having issues. Between the service stats and the page views section it is really easy to confirm my suspicion that something is on fire at Valve. If it wasn't for this post I wouldn't have known that they had a minor connection issue a few hours ago.
This is actually a fair point. The batteries in this case are nice-to-haves, but to be fair I personally would not be using this for frontline infrastructure status reporting; more so for client/customer-facing status feedback.
that depends on what types of dependencies you're talking about
if you're talking about upstream servers/services, yep absolutely
but node.js dependencies (as in, libraries and packages) don't magically update by themselves. there's no reason node.js is a liability here unless you coordinate updating your service and status page dependencies at exactly the same time (which seems... idiotic?)
Interesting. Maybe I can add a gradient from red -> yellow -> green based on the affected minute count, normalized to a gradient threshold percentage. Let me try doing that.
I think it's worth deciding what you want to communicate. If it's "we have good uptime, but we may lie", then it's a green bar always. If it's "we're honest about our downtime", then the colors should be distinct so users can notice and inspect the days that are <100%. A gradient runs the risk of looking "all green" even when some minimal downtime happened. If that's not the goal, then I'd recommend a step function (green, yellow, red, for example).
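To illustrate the step-function idea (my own sketch with made-up thresholds, not anything from the project):

```ts
// A step function for the per-day bar color, with hypothetical thresholds.
type DayColor = "green" | "yellow" | "red";

function colorForUptime(uptimePercent: number): DayColor {
  if (uptimePercent >= 99.9) return "green"; // effectively no downtime
  if (uptimePercent >= 95.0) return "yellow"; // noticeable degradation
  return "red"; // significant outage
}
```

The point is that any day below the top threshold is visibly not green, so downtime can't hide inside a near-green gradient.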
A tangent: the goal that this style of status page, and most of these projects (like this one), try to achieve is displaying the uptime of a service based on some HTTP request. In how many cases is that actually the whole story? GitHub can serve HTTP content perfectly fine and still be broken if git over SSH is down. Amazon can have their site up, but if payments aren't being processed, that's worth nothing.
This is not a complaint against this specific project but am I the only one that feels that this style of status page that "has been around since the mid 2010s" rarely ever tells the full story of a service's health?
Well of course, there can still be bugs in the code even if 200 is returned.
I always implement a /~/healthcheck route which returns the exit code of each check, but also encodes it in the HTTP status.
if any error is detected the status will be 500
if any warning is detected it will be 200+<numberOfWarnings>
of course the checks still need to be written, e.g. code needs to verify that it can connect to ssh and is greeted with the correct login msg.
still, there could be problems with outgoing connections.
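A rough sketch of how I read that encoding (shapes and names are my own, not from the comment above): 500 if any check errors, otherwise 200 plus the number of warnings, e.g. 203 for three warnings.

```ts
// Aggregate individual checks into one HTTP status, per the scheme above.
type CheckResult = { name: string; level: "ok" | "warn" | "error"; detail?: string };

async function healthcheck(checks: Array<() => Promise<CheckResult>>) {
  const results = await Promise.all(checks.map((check) => check()));
  const errors = results.filter((r) => r.level === "error").length;
  const warnings = results.filter((r) => r.level === "warn").length;
  const status = errors > 0 ? 500 : 200 + warnings;
  return { status, body: JSON.stringify(results) };
}
```

One wrinkle with this encoding: at 100 warnings the status crosses into the 3xx range, so capping it (or just reporting the count in the body) might be safer.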
And I'd disagree that "batteries included" is a phrase used to describe "features"?
I would interpret "batteries included" to mean "you don't need to worry about shaving a yak to get this installed, it's all there and ready". Language is fun!
I'm pretty sure this phrase in tech circles was popularized by Python's motto [1], which meant to say that the language ships with many features you'd otherwise have to get on your own, so to me the phrasing here was clear - you will likely not need to pull in external deps to make it usable.
One thing I would like to see in the README is how this project differs from other popular similar projects such as "Upptime" which is already mentioned in the "Inspired from" section.
I think the incident.svelte file could use some love. Is it best practice to put part of the phrase somewhere else? Doesn't it increase the cognitive load? Like, there is a phrase somewhere, but part of the text is being calculated: https://github.com/rajnandan1/kener/blob/74ea57d6bbf6ac4dd3e...
Isn't it easier to understand what is going on just by calculating the condition at the top and putting the text in the markup based on that condition?
I feel like there are a few places where, in order to avoid duplicating part of the text, it's made extremely hard to tell what the text is going to be, because pieces of it live far away from each other (a sketch of what I mean is below).
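Something along these lines (hypothetical, not the actual incident.svelte code): derive the whole phrase in one place in the component's script, and keep the markup to a single interpolation.

```ts
// Hypothetical example: build the full sentence in one function instead of
// assembling fragments scattered across the markup.
type Incident = { resolved: boolean; affectedMinutes: number };

function incidentSummary(incident: Incident): string {
  if (incident.resolved) {
    return `Resolved after ${incident.affectedMinutes} minutes of impact.`;
  }
  return `Ongoing incident, ${incident.affectedMinutes} minutes of impact so far.`;
}
```

The markup then just renders {incidentSummary(incident)}, and the whole text is readable in one spot.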
I might be mistaken, but I always thought that status pages are supposed to be hosted by a third party.
Am I supposed to have this run on a separate (presumably) dedicated server? Otherwise what's the point of having this running if it becomes inaccessible the instant my server goes down?
It entirely depends on your reasons for and expectations of the service. Yes, it would be best to host it separately on entirely different infra. However, if you have one server and you want to check that the services it runs are functional (such as the web server, app, and database), then co-locating this on there would be OK, provided the server itself doesn't go AWOL.
However, running this on a dedicated virtual server or Raspberry Pi would likely always be better.
There's an "awesome status pages" GitHub repo that lists a bunch of these. One thing I was wondering: does anyone have a manual one? I could use one of these, but I don't want any automation. Just a manual status page updater. It could even be a static site generator with helper scripts.
Just make sure to host it on some other hosting provider. If you're on AWS, use GCP or Azure, etc..
A previous DevOps team I worked with brilliantly centralized everything into K8s, including the status page software. Then pushed a K8s update that broke everything...
This is a common mistake until the first incident. Even AWS did it: static files for their status page were hosted on S3, and when they had a fat-finger issue, the status page went down as well.
Indeed. I was expecting one of those "server is a Raspberry Pi running on a solar panel in a cornfield" posts but this time it was hosting status pages.
It's a small project with a clear and limited scope which makes it a good "hello world" project for developers. Just like we have a good selection of static site generators and blog frameworks used by one person only, nothing wrong with that.
In the end what makes a status page successful isn't the code of the status page itself, but the reliability, the integration with existing tools (PagerDuty etc.) and all the checkboxes needed to sell to bigger companies.
Absolutely agree.
I would be adding webhook integration with custom data transformation for popular providers like Slack/PagerDuty/Discord etc. (roughly along the lines of the sketch below).
Thanks for pointing this out
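For the Slack case, a minimal sketch of such a transformation (my own guess at the shapes; the webhook URL is a placeholder and the incident type is made up). Slack incoming webhooks accept a JSON body with a "text" field:

```ts
// Transform an incident into a Slack incoming-webhook payload and post it.
type IncidentEvent = { title: string; status: "down" | "degraded" | "resolved" };

async function notifySlack(webhookUrl: string, incident: IncidentEvent): Promise<void> {
  const payload = { text: `[${incident.status.toUpperCase()}] ${incident.title}` };
  await fetch(webhookUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  });
}
```

Discord and PagerDuty would each need their own payload shape, which is where the per-provider transformation comes in.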