Cloudflare Experiencing Latency Issues (cloudflarestatus.com)
151 points by z0a on Dec 16, 2021 | 61 comments



> Monitoring: Cloudflare has implemented a fix for this issue and is currently monitoring the results.
> Posted 1 minute ago. Dec 16, 2021 - 20:44 UTC

Fast feedback, communication and fix. Always impressed with them...


Should be cleared up now. Sorry about that.


I really like it when people in your position at such a big company post on here, even if it is a brief comment like this. Thank you!


I would have posted faster, but I was too busy doing my own little bit of debugging, which consisted of:

    1. dig @1.1.1.1 jgc.org
    2. nc -v 104.22.11.223 80
    3. curl -v https://jgc.org/cdn-cgi/trace
    4. curl -v https://jgc.org/
Hmm, #1 was fast, so the network is routing OK. Hmm, #2 was fast, so TCP is OK. Hmm, #3 was fast, so I know (because I worked on that code) that this code path is good. Hmm, #4 is slow, so that means component X is slow but still working.
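The bottom-up reasoning in the four commands (DNS, then TCP, then a known-good HTTP path, then the real page) can be sketched in Python: time each layer and report the first one that exceeds a latency budget, since every layer below it has already been shown to work. This is a minimal sketch under stated assumptions: the helper names and the 1-second budget are hypothetical, and DNS goes through the system resolver via `socket.getaddrinfo` rather than querying 1.1.1.1 directly as `dig @1.1.1.1` does.

```python
import socket
import time
import urllib.request


def timed(fn, *args):
    """Run fn(*args) and return how long it took, in seconds."""
    start = time.monotonic()
    fn(*args)
    return time.monotonic() - start


def resolve(host):
    # Assumption: system resolver, not 1.1.1.1 as `dig @1.1.1.1` would use.
    socket.getaddrinfo(host, 443)


def connect(ip, port, timeout):
    # Like `nc -v ip 80`: open a TCP connection, then close it.
    with socket.create_connection((ip, port), timeout=timeout):
        pass


def fetch(url, timeout):
    # Like `curl -v url`: download the full response body.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()


def probe(host, ip, timeout=10.0):
    """Time each layer bottom-up, mirroring the four commands.

    A layer that fails outright raises instead of returning a timing.
    """
    return [
        ("dns", timed(resolve, host)),
        ("tcp", timed(connect, ip, 80, timeout)),
        ("trace", timed(fetch, f"https://{host}/cdn-cgi/trace", timeout)),
        ("page", timed(fetch, f"https://{host}/", timeout)),
    ]


def first_slow_layer(timings, budget_s=1.0):
    """Return the first layer over budget, or None if every layer is fast.

    Layers are ordered bottom-up, so everything before the returned
    layer has already been shown to work.
    """
    for name, seconds in timings:
        if seconds > budget_s:
            return name
    return None
```

Usage would look like `first_slow_layer(probe("jgc.org", "104.22.11.223"))`, using the host and IP from the commands above (that IP may well have changed since 2021). During the incident described, this would be expected to point at `"page"`.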

Of course, in parallel I'm in a conference call with about 40 other people who have actual access to monitoring and systems and other things who can see exactly where things are.

But I was damn close with four commands, and it gave me confidence in what people were saying. But I have to say, Cloudflare's internal distributed tracing system is pretty cool: I got sent a trace, and you could see right where the slowdown was.


Now you're just showing off :)


Allow an old man the fantasy that he still knows how the whole of the system works.


That's a huge problem for companies of almost any scale. Can you shed some light on tools used internally in Cloudflare for tracing?


sounds like you do. ;)


What was component X? Was it a buggy rollout?


A proxy that serves traffic. No, it was way more complicated than “buggy rollout”.


How often do you code these days?


Hardly ever. If I start a project, I don’t end up having time to finish or maintain it, which isn’t fair to the team.

If I write something, it’s for my own use. And I like to write things that test Cloudflare. Trust, but verify (Доверяй, но проверяй).


I see the comrades on your team have taught you well /jk


[flagged]


First, please don't be an asshole and/or be snarky on HN. We don't want that here. (See https://news.ycombinator.com/newsguidelines.html.)

Second, "Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize. Assume good faith." (also in https://news.ycombinator.com/newsguidelines.html)

Third, punishing users for discussing something work-related is one of the worst things you can do to damage this place. We want knowledgeable comments, and for most people, our work is what we know the most about. Disincentivizing that is a losing proposition (https://hn.algolia.com/?query=disincent%20by:dang&dateRange=...) so please don't.


I'd suggest you are doing far more damage to the reputation of the board by conflating commentary with free marketing. The comment I replied to was the latter. If advertising is permitted where critique from actual paying users is openly punished, what is that saying about the board?

As for the extremely disappointing asshole comment, coming from a moderator is quite frankly astounding. "Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize. Assume good faith".


We have whole teams of people to get status up and inform people and support them.


And yet I'm learning about this outage from HN. How did that happen?


Suggest subscribing on https://www.cloudflarestatus.com/ as that was updated long before I came to HN to say anything (which was only after the incident was resolved).


Maybe it happened because you're busy commenting on HN instead of checking their status page.


Aviate Navigate Communicate.

They did all three of those in the space of a few minutes (if you were checking their status page - which you can subscribe to by the way).


That rule is great for one single person (a pilot). In corporate incident management we can do better and not block communication because you're overwhelmed with keeping the system up; in fact, proper incident management workflows designate different people for different roles and delegate communication duties to a dedicated person if possible.
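The role split described above can be sketched as a small assignment function. This assumes a hypothetical three-role model (incident commander, operations, communications); real incident-management frameworks define their own roles, and the names here are illustrative only.

```python
# Hypothetical three-role model; real frameworks name and split roles differently.
ROLES = ("incident_commander", "operations", "communications")


def assign_roles(responders):
    """Give each role to a distinct responder where possible.

    With fewer people than roles, the incident commander picks up the
    leftover roles, so communications is never left unowned and the
    operations responder stays focused on fixing the system.
    """
    if not responders:
        raise ValueError("need at least one responder")
    assignment = {}
    for i, role in enumerate(ROLES):
        assignment[role] = responders[i] if i < len(responders) else responders[0]
    return assignment
```

The fallback is the design choice that matters: when short-staffed, communication doubles up on the coordinator rather than on whoever is hands-on with the outage, which is the single-pilot "aviate first" constraint the comment says teams can avoid.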


> That rule is great for one single person (a pilot). In corporate incident management we can do better and not block communication because you're overwhelmed with keeping the system up; in fact, proper incident management workflows designate different people for different roles and delegate communication duties to a dedicated person if possible.

I agree with you, but Cloudflare seem to have done exactly what you're describing in this instance?


Yes, my comment was about the Aviate Navigate Communicate rule, not about Cloudflare's response.


I'm always surprised you have the time to respond in these threads, jgrahamc.


Good to let people know what's going on if I have time.


Thanks for the post. Yesterday you strongly rebutted me for saying that the widespread outages were also impacting Cloudflare [0], even though it wasn't obvious to me at that time who was being impacted by whom. Knowing the intricate connections among all the app, infrastructure, and cloud providers is tricky! When stuff goes down, the blame gets spread around.

[0] https://news.ycombinator.com/item?id=29568319

edit: changed "scolded" to "strongly rebutted"


Sorry if that came across harshly. ASCII is a tough medium.


Yes, text is indeed devoid of context and sentiment. But to the main point, I fear that the comments on tone are a bit beside the point anyway. So let's move past that to what I really wanted to comment on.

To be direct: yesterday I spotted what seemed like an Internet-wide issue that was also impacting Cloudflare. You told me yesterday that, no, in fact there was no impact on Cloudflare. Today there is a post about a separate issue where there is an impact on Cloudflare. In my mind I connect these two events: on the first day, a quick and direct denial that the issue was Cloudflare's; today, an acknowledgement of issues, even if they were a different set of problems.

During outages where Cloudflare shows an error but the problem doesn't originate with Cloudflare, it would be helpful to put an indication of where the error might be on your own error page. I know this might be touchy, but you should feel free to point fingers when you know that an outage affecting your client is caused by another party.

For example:

"Error. Cloudflare reports this site is down. Issues point to an outage with [AWS, Google, Azure, Oracle <-- just kidding] as being the source of that outage"

That would help make it clear that yes, there is an outage, and no, Cloudflare is not the proximal cause.
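The suggestion can be sketched as a function that builds the error-page text. Cloudflare's 521 to 524 status codes already describe failures between its edge and the origin server; the `suspected_source` argument is a hypothetical input (for example, a provider name from a known-incident feed), and none of this reflects how Cloudflare's error pages are actually generated.

```python
# Cloudflare 52x codes for failures between the edge and the origin server.
ORIGIN_ERRORS = {
    521: "Web server is down",
    522: "Connection timed out",
    523: "Origin is unreachable",
    524: "A timeout occurred",
}


def error_page_message(status, suspected_source=None):
    """Build an error-page line that says where the failure likely is.

    suspected_source is hypothetical: e.g. a provider name taken from a
    known-incident feed. None means no upstream incident is known.
    """
    reason = ORIGIN_ERRORS.get(status)
    if reason is None:
        return f"Error {status}."
    msg = (f"Error {status}: {reason}. The edge is reachable; "
           "the failure is between it and the origin server.")
    if suspected_source:
        msg += f" Issues point to an outage at {suspected_source}."
    return msg
```

For example, `error_page_message(522, "AWS")` would produce a page that names AWS as the likely source, matching the example message in the comment.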

All this chatter about your use of words and my use of words kinda misses the main point of what I was trying to communicate.


I think we may have done that in the past. I'll bring it up internally.


Unless the comment was edited, I don't think that was a "scolding" :-)


Probably my interpretation. I should have said "strong rebuttal". Edited my comment above ;)


Maybe "corrected"?


Is there a particular reason you perceived their response as "scolding?" It just looks like a straightforward answer.


In my experience, a lot of Americans interpret direct, no-wasted-word statements as aggressive or confrontational. Euphemism and indirect implication is the norm in American communication, much to my dismay.

It can wrap around to extremes sometimes, too.

https://sneak.berlin/20191201/american-communication/


This reminds me of the younger generation sometimes perceiving messages ending in a period as rude.

I'm also sometimes surprised by how effectively a simple statement like "I don't want to spend money on that" can shut down even a pushy salesman. Or even the simple "No." can work wonders.


>This reminds me of the younger generation sometimes perceiving messages ending in a period as rude.

I've never seen any comments regarding a single period, but I've seen comments (and sometimes agree with them) regarding the perceived rudeness when ending messages in ellipses.

"Good job..." seems almost sarcastic compared to "Good job.".


    Good job.
seems almost patronizing in comparison with

    good job
Then again, I grew up on IRC.


Studies have shown the same.


Language is so weird.


Interesting: IME, it's the Americans who are called rude and overly direct. Go to Japan and give it a try, for example.

Edit: Reading your link: First, that's well-written and insightful; thank you.

However, it seems like a common (young, if I dare guess) frustration with human communication, especially among geeks (if I dare guess, here on HN, and including myself as one): Communication is not transmission of information, but a social interaction. You have to think about all these other things (where many geeks feel out of their depth), and in fact those other things are more consequential than the information (with which many geeks feel very confident). In other words, it sucks to have all the information, to be a master at it, and find that it doesn't matter so much.

Tip: Don't try to dismiss it; it's human nature and won't change; learn the skills. 'Skill' #1: learn to not objectify the other party (they aren't an endpoint device in your communication network), and the best tools for that: curiosity about them - about their unique universe in their mind, their own wants and perspectives, completely unrelated to yours - and compassion: they have a difficult life too. (Of course, that's just my perspective! :) )


> a lot of Americans interpret direct, no-wasted-word statements as aggressive or confrontational.

I think this can be said of the British too. Though we would probably make the mistake of interpreting it as rude rather than aggressive. As someone who doesn't communicate particularly directly, I often make this mistake myself.

Though I'm not sure which side of "the pond" is worse in this respect.


It's possible it's cultural. Sometimes strong, absolute rebuttals come across as someone just trying to shut down a conversation and deny. Other times a direct answer is just a direct answer. The problem is that context and tone are helpful here.


As a native, I can say this is absolutely true and also horrible for productive communication. The Pacific and Mountain West are particularly rough.

Especially when discussing politics, it can be confusing as hell trying to figure out what somebody really believes or wants, because all the tiptoeing around eggshells can make the words impossible to decode.


[flagged]


> I blame the participation trophy era of kids.

As one Gen Y/Zer once commented: Those trophies weren't handing out themselves, and they weren't the kids' idea. Whose idea were they?


You're engaging in it yourself right now by using euphemism/implication to suggest a certain type of person instead of just saying "whiny, overly sensitive children" (which I assume is what you mean).

You should remove the plank from your own eye, first.

You don't get to blame others for your voluntarily chosen courses of action.


There is a difference between euphemism and idiom, nicknames and handwave identifiers.


All my sites running through Cloudflare Tunnel are very very slow, but still just about online.

As a side note, I'll take this opportunity to call out the superb Checkmk monitoring system, which alerted me to this. I don't see Checkmk mentioned on HN that often...

https://checkmk.com

EDIT: Seems to be fixed. Good job!


CheckMK offers pretty solid monitoring out of the box, but I think it falls quite short of the mark when you want to add more than the default monitoring. It's still probably the easiest solution for monitoring a bunch of basic VMs, servers, etc. You can set it up and get a solid idea of how it works in only a few hours.


I don’t think I’d use it even if they have a self-hosted version. The way they set their pricing based on “number of services” seems like the kind of tactic you use when you want to intentionally make things confusing so you can extract as much value as possible.

What happened to honest businesses with fair, easy to understand pricing?


> What happened to honest businesses with fair, easy to understand pricing?

Well in my case the pricing is very easy to understand. It's free!

I only have < 25 hosts so I self-host the open source version on a $5/month DigitalOcean instance (ironically also reverse proxied through Cloudflare)

So I certainly don't think that's exactly dishonest or unfair. It's been rock solid since I've used it. I don't know how many services you'd need to monitor but the starting prices for Standard and Enterprise seem pretty reasonable to me?

It probably doesn't scale to a very large operation - but then it's not really "cloud first" monitoring akin to something like Prometheus, so perhaps their target audience isn't really likely to have a huge number of services to monitor.


I'm impressed they've updated their status page so quickly, unlike some of their cloud competitors


It's part of our process. Here's the internal timeline:

    T+0 Automatic comms thread created
    T+1 XXX Is this a P0, do we need a status page?
            @YYY
    T+1 YYY Eyes on
    T+4 ZZZ Yes
            let's get super-generic status page up
            @XXX / @YYY - you have one handy?
            I see it now thx


It's definitely a differentiator. cough


I would love it if their customer service were that fast. They've been ghosting us for 7 days. Poking them on live chat results in "hey we'll look at this".

Truly loving the service but we had to "unproxy" our website. When it works, it brings so much value. I'm guessing our issue isn't trivial to solve though.


Email me (jgc) the ticket number.


This impacted just about every API I regularly request for ~20 minutes, but it seems to be fixed now. Hopefully permanently.


Seems to be a regional thing, because I'm not experiencing any issues with Cloudflare-hosted things; I'm reaching the SYD PoP.
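To check whether a problem is regional, it helps to know which Cloudflare data center (PoP) is serving you. The `/cdn-cgi/trace` endpoint used earlier in the thread returns plain `key=value` lines, and its `colo` field holds the data center's airport-style code, such as `SYD`. A minimal sketch assuming that format; the helper names are hypothetical.

```python
import urllib.request


def parse_trace(body):
    """Parse the key=value lines of a /cdn-cgi/trace response into a dict."""
    fields = {}
    for line in body.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip blank or malformed lines
            fields[key.strip()] = value.strip()
    return fields


def fetch_colo(host, timeout=10.0):
    """Return the colo code of the Cloudflare data center serving this client."""
    url = f"https://{host}/cdn-cgi/trace"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        body = resp.read().decode("utf-8")
    return parse_trace(body).get("colo")
```

Comparing `fetch_colo(...)` results from users who do and don't see the slowdown is a quick way to tell a PoP-specific issue from a global one.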


CircleCI still very slow/returning 504s, not sure if related?


Is this maybe related to Telia (AS1299)?


Yeah seems to apply to all the sites I've tried so far.


Maybe because they're powered by Clickhouse?



