> Monitoring
> Cloudflare has implemented a fix for this issue and is currently monitoring the results.
> Posted 1 minute ago. Dec 16, 2021 - 20:44 UTC
Fast feedback, communication and fix. Always impressed with them...
Hmm #1 was fast so network is routing OK. Hmm #2 was fast so TCP is OK. Hmm #3 was fast so I know (because I worked on that code) that this code path is good. Hmm #4 is slow so that means component X is slow but still working.
Of course, in parallel I'm in a conference call with about 40 other people who have actual access to monitoring and systems and other things who can see exactly where things are.
But I was damn close with four commands, and that gave me confidence in what people were saying. But, I have to say, Cloudflare's internal distributed tracing system is pretty cool because I got sent a trace and you could see right where the slowdown was.
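For anyone curious, the layer-by-layer elimination described above can be sketched roughly like this. The comment doesn't say which four commands were run, so the probe names, the threshold, and the `first_slow_layer` helper are all hypothetical; this only illustrates the "fast, fast, fast, slow" reasoning and is not anything Cloudflare actually uses.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def first_slow_layer(
    probes: Iterable[Tuple[str, Callable[[], None]]],
    threshold_s: float = 1.0,
) -> Optional[Tuple[str, float]]:
    """Run probes in order, lowest layer first.

    Returns (name, elapsed_seconds) for the first probe that takes longer
    than threshold_s, or None if every probe was fast.
    """
    for name, probe in probes:
        start = time.perf_counter()
        probe()  # hypothetical check, e.g. a ping, a TCP connect, one request
        elapsed = time.perf_counter() - start
        if elapsed > threshold_s:
            return name, elapsed
    return None


# Stand-in probes; real ones would hit the network (assumption).
probes = [
    ("routing", lambda: None),
    ("tcp", lambda: None),
    ("known-good code path", lambda: None),
    ("component X", lambda: time.sleep(0.05)),  # simulated slowdown
]
slow = first_slow_layer(probes, threshold_s=0.02)
```

Each probe that comes back fast rules out everything beneath it, so the first slow one is the likeliest culprit. A real trace will of course tell you far more than this.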
Second, "Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize. Assume good faith." (also in https://news.ycombinator.com/newsguidelines.html)
Third, punishing users for discussing something work-related is one of the worst things you can do to damage this place. We want knowledgeable comments, and for most people, our work is what we know the most about. Disincentivizing that is a losing proposition (https://hn.algolia.com/?query=disincent%20by:dang&dateRange=...) so please don't.
I'd suggest you are doing far more damage to the reputation of the board by conflating commentary with free marketing. The comment I replied to was the latter. If advertising is permitted where critique from actual paying users is openly punished, what is that saying about the board?
As for the extremely disappointing asshole comment, coming from a moderator is quite frankly astounding. "Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize. Assume good faith".
Suggest subscribing on https://www.cloudflarestatus.com/ as that was updated long before I came to HN to say anything (which was only after the incident was resolved).
That rule is great for one single person (a pilot). In corporate incident management we can do better and not block communication because we're overwhelmed with keeping the system up; in fact, proper incident management workflows designate different people with different roles and delegate communication duties to a dedicated person if possible.
> That rule is great for one single person (a pilot). In corporate incident management we can do better and not block communication because we're overwhelmed with keeping the system up; in fact, proper incident management workflows designate different people with different roles and delegate communication duties to a dedicated person if possible.
I agree with you, but Cloudflare seem to have done exactly what you're describing in this instance?
Thanks for the post. Yesterday you strongly rebutted me for saying that the widespread outages were also impacting Cloudflare [0], even though it wasn't obvious to me at that time who was being impacted by whom. Knowing the intricate connections of all the app and infrastructure and cloud providers is tricky! When stuff goes down, the blame gets spread around.
Yes text is indeed devoid of context and sentiment. But to the main point, I fear that the comments on tone are a bit beside the point anyways. So let's move past that to talk about what I really wanted to comment on.
To be direct - yesterday I spotted what seemed like an Internet-wide issue that was also impacting Cloudflare. You told me yesterday that no, in fact, there was no impact on Cloudflare. Today there is a post about a separate issue where there is an impact on Cloudflare. In my mind I connect these two events: on the one hand, the quick and direct denial on the first day that the issue was Cloudflare's, and today an acknowledgement of issues, even if they were a different set of problems.
It would be helpful on outages where Cloudflare is showing an outage when the problem doesn't originate with Cloudflare to put on your own error page an indication of where the error might be. I know this might be touchy to do so, but you should feel free to point fingers when you know that an outage to your client is caused by another party.
For example:
"Error. Cloudflare reports this site is down. Issues point to an outage with [AWS, Google, Azure, Oracle <-- just kidding] as being the source of that outage"
That would help make it clear that yes, there is an outage, and no, Cloudflare is not the proximal cause.
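As a rough sketch of what I mean (purely illustrative: `outage_message` and its `upstream` parameter are made up here and are not a Cloudflare API), the error page text could name the upstream provider only when one is actually known:

```python
from typing import Optional


def outage_message(upstream: Optional[str] = None) -> str:
    """Build the error-page text, naming an upstream provider if one is known."""
    base = "Error. Cloudflare reports this site is down."
    if upstream is None:
        # No attribution available: say nothing rather than guess.
        return base
    return f"{base} Issues point to an outage with {upstream} as being the source of that outage."


generic = outage_message()
attributed = outage_message("AWS")
```

Falling back to the generic message when the source is unknown avoids pointing fingers on a hunch.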
All this chatter about your use of words and my use of words kinda misses the main point of what I was trying to communicate.
In my experience, a lot of Americans interpret direct, no-wasted-word statements as aggressive or confrontational. Euphemism and indirect implication are the norm in American communication, much to my dismay.
This reminds me of the younger generation sometimes perceiving messages ending in a period as rude.
I'm also sometimes surprised by how effectively a simple statement like "I don't want to spend money on that" can shut down even a pushy salesman. Or even the simple "No." can work wonders.
>This reminds me of the younger generation sometimes perceiving messages ending in a period as rude.
I've never seen any comments regarding a single period, but I've seen comments (and sometimes agree with them) regarding the perceived rudeness when ending messages in ellipses.
"Good job..." seems almost sarcastic compared to "Good job.".
Interesting: IME, it's the Americans who are called rude and overly direct. Go to Japan and give it a try, for example.
Edit: Reading your link: First, that's well-written and insightful; thank you.
However, it seems like a common (young, if I dare guess) frustration with human communication, especially among geeks (if I dare guess, here on HN, and including myself as one): Communication is not transmission of information, but a social interaction. You have to think about all these other things (where many geeks feel out of their depth), and in fact those other things are more consequential than the information (with which many geeks feel very confident). In other words, it sucks to have all the information, to be a master at it, and find that it doesn't matter so much.
Tip: Don't try to dismiss it; it's human nature and won't change; learn the skills. 'Skill' #1: learn to not objectify the other party (they aren't an endpoint device in your communication network), and the best tools for that: curiosity about them - about their unique universe in their mind, their own wants and perspectives, completely unrelated to yours - and compassion: they have a difficult life too. (Of course, that's just my perspective! :) )
> a lot of Americans interpret direct, no-wasted-word statements as aggressive or confrontational.
I think this can be said of the British too. Though we would probably make the mistake of interpreting it as rude rather than aggressive. As someone who doesn't communicate particularly directly, I often make this mistake myself.
Though I'm not sure which side of "the pond" is worse in this respect.
It's possible it's cultural. Sometimes strong, absolute rebuttals come across as someone just trying to shut down a conversation and deny. Other times a direct answer is just a direct answer. The problem is that context and tone are helpful here.
as a native, I can say this is absolutely true and also horrible for productive communication. particularly the pacific and mountain west is really rough.
Especially when discussing politics it can be confusing as hell trying to figure out what somebody really believes/wants because the tiptoeing around eggshells can make the words impossible to decode.
You're engaging in it yourself right now by using euphemism/implication to suggest a certain type of person instead of just saying "whiny, overly sensitive children" (which I assume is what you mean).
You should remove the plank from your own eye, first.
You don't get to blame others for your voluntarily chosen courses of action.
All my sites running through Cloudflare Tunnel are very very slow, but still just about online.
As a side note, I'll take this opportunity to call out the superb Checkmk monitoring system which alerted me to this. I don't see Checkmk mentioned on HN that often...
CheckMK offers pretty solid monitoring out of the box, but I think it falls quite short of the mark when you want to add more than the default monitoring. It's still probably the easiest solution for monitoring a bunch of basic VMs, servers, etc. You can set it up and get a solid idea of how it works in only a few hours.
I don’t think I’d use it even if they have a self-hosted version. The way they set their pricing based on “number of services” seems like the kind of tactic you use when you want to intentionally make things confusing so you can extract as much value as possible.
What happened to honest businesses with fair, easy to understand pricing?
> What happened to honest businesses with fair, easy to understand pricing?
Well in my case the pricing is very easy to understand. It's free!
I only have < 25 hosts so I self-host the open source version on a $5/month DigitalOcean instance (ironically also reverse proxied through Cloudflare)
So I certainly don't think that's exactly dishonest or unfair. It's been rock solid since I've used it. I don't know how many services you'd need to monitor but the starting prices for Standard and Enterprise seem pretty reasonable to me?
It probably doesn't scale to a very large operation - but then it's not really "cloud first" monitoring akin to something like Prometheus, so perhaps their target audience isn't really likely to have a huge number of services to monitor.
It's part of our process. Here's the internal timeline:
T+0 Automatic comms thread created
T+1 XXX Is this a P0, do we need a status page?
@YYY
T+1 YYY Eyes on
T+4 ZZZ Yes
let's get super-generic status page up
@XXX / @YYY - you have one handy?
I see it now thx
I would love it if their customer service was so fast. They have been ghosting us for 7 days. Poking them on live chat results in "hey we'll look at this".
Truly loving the service but we had to "unproxy" our website. When it works, it brings so much value. I'm guessing our issue isn't trivial to solve though.