
Yes, not worldwide but a lot of places. Problem with our backbone. We know what. Rollbacks etc. happening. Bring it back up in chunks.

Should be back up everywhere.




We are not using Cloudflare, but our domain is also inaccessible. We use DigitalOcean's DNS service for propagating our IP. Does DigitalOcean's DNS service depend on Cloudflare?



That link isn't accessible from where I am right now.

Alanis Morissette agrees that this is ironic.


Yes, this is actual irony, unlike “rain on your wedding day.”


but in regards to singing a whole song about things that are not irony being irony: isn't it ironic?

don't you think?


Yes! IRL Alanis is smart and eloquent, and those that think that song's lyrics are evidence to the contrary are missing the joke.


Moved all my domains to DO specifically to stop donating traffic data to Cloudflare. Absolutely stunned I didn't notice this earlier.

It's bullshit all the way down.

Are there any companies left offering free DNS usable from Terraform that aren't part of the "Internet Five Eyes"?

edit: looks like Linode may be the next best 'not terrible' option


I have bad news for you, Linode's authoritative DNS service also uses Cloudflare DNS Firewall.

  $ dig +short ns1.digitalocean.com aaaa
  2400:cb00:2049:1::adf5:3a33
  $ dig +short ns1.linode.com aaaa
  2400:cb00:2049:1::a29f:1a63
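
A quick way to check those two answers, as a minimal Python sketch. It assumes `2400:cb00::/32` is one of Cloudflare's published IPv6 ranges (worth verifying against their current list); the constant and function names are made up for illustration.

```python
import ipaddress

# Assumption: 2400:cb00::/32 is treated here as a Cloudflare-announced prefix.
ASSUMED_CLOUDFLARE_V6 = ipaddress.ip_network("2400:cb00::/32")


def in_assumed_cloudflare_range(addr: str) -> bool:
    """Return True if addr falls inside the assumed Cloudflare prefix."""
    return ipaddress.ip_address(addr) in ASSUMED_CLOUDFLARE_V6


# AAAA answers copied from the dig output above.
for name, addr in [
    ("ns1.digitalocean.com", "2400:cb00:2049:1::adf5:3a33"),
    ("ns1.linode.com", "2400:cb00:2049:1::a29f:1a63"),
]:
    print(name, in_assumed_cloudflare_range(addr))
```

This only says the nameserver addresses sit in that prefix; it says nothing about how the traffic is handled behind it.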


So much for whoever downthread said "not all websites use cloud flare"


Same here, except DNSimple.


Yes.


CF SRE team need to rethink their published SLA of 100%. This is not reasonable. https://www.cloudflare.com/business-sla/


An SLA of 100% simply means you agree to compensate your customers (as specified, usually with credit) if your service is down at all, nothing more.

Also, SRE here but not for Cloudflare -- I've never seen SREs directly involved in externally published SLAs; they usually come from legal. We deal with SLOs on finer-grained SLIs than overall uptime.


> SLA […] SRE […] SLOs […] SLIs

I made it to SLA (which I believe stands for service level agreement). What do the other abbreviations stand for?


SRE - Site Reliability Engineer (a term Google came up with that's been adopted elsewhere). Google defined it approximately as what happens when you apply software engineering practices to what was traditionally an operations function.

SLO - Service Level Objective - the service level you strive for. If measured performance is higher than the objective, you have room for experimentation, etc.

SLI - Service Level Indicator - the actual metric(s) you use to measure a service level (latency, error rate, throughput, etc.)


SLA - correct. That’s the contract between the operator and the users which describes the penalties for not meeting the agreed-upon SLO.

SLO - service level objective, the stated availability (or latency or durability etc.) of the service. Usually expressed as a value over a period of time (e.g. 99.9% availability as measured over a moving 30-day average). The SLO is measured by the SLI.

SLI - service level indicator. Simply, the direct measurement of the service (i.e. metrics).

SRE - Site Reliability Engineer, usually a member of a team who is responsible for the continued availability of the service and the poor sap who gets paged when it breaches SLO or has an outage or other impactful event.


SRE: Site reliability engineer

SLI: Service level indicator (Metric to measure the health of a service. For example successful requests per interval / total requests per interval.)

SLO: Service level objective (what performance you expect eg. the previously mentioned SLI is >= 99.5%)

SLA: Service level agreement (legal agreement that defines what happens if an SLO is not met)
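
To make the SLI/SLO relationship concrete, here is a minimal Python sketch using the example above (successful requests / total requests, objective of 99.5%). The function names and the request counts are hypothetical, and treating an empty interval as fully available is an assumption, not a standard.

```python
def availability_sli(successful: int, total: int) -> float:
    """SLI: fraction of requests in an interval that succeeded."""
    if total == 0:
        return 1.0  # assumption: an interval with no traffic counts as available
    return successful / total


def meets_slo(sli: float, slo: float = 0.995) -> bool:
    """SLO check: is the measured SLI at or above the objective?"""
    return sli >= slo


# Hypothetical counts for one measurement interval.
sli = availability_sli(successful=99_620, total=100_000)
print(f"SLI={sli:.4f}, meets 99.5% SLO: {meets_slo(sli)}")
```

The SLA would then be the contract saying what the customer gets when `meets_slo` comes back false over the agreed period.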


Yep, I'd promise 99.95 at a stretch, never 100%.

They are not being honest with themselves here
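
For a sense of scale, a rough Python sketch of how much downtime a given uptime target allows. The 30-day window is an assumption (contracts define their own measurement period), and the function name is made up.

```python
def allowed_downtime_minutes(target: float, days: int = 30) -> float:
    """Minutes of downtime permitted in the window at the given uptime target."""
    total_minutes = days * 24 * 60  # 43,200 minutes in a 30-day window
    return total_minutes * (1.0 - target)


for target in (0.999, 0.9995, 1.0):
    minutes = allowed_downtime_minutes(target)
    print(f"{target:.2%} uptime allows about {minutes:.1f} minutes down")
```

At 99.95% that works out to roughly 21.6 minutes per 30 days; at 100% the budget is zero, so any outage at all is a breach.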


I'm not sure you and your parent understand what an SLA means. It's an agreement that, when broken, incurs a penalty.

They aren't saying they guarantee 100% uptime. They're saying they'll pay you for any downtime. It's literally the 3rd paragraph:

> 1.2 Penalties. If the Service fails to meet the above service level, the Customer will receive a credit equal to the result of the Service Credit calculation in Section 6 of this SLA.

(Most people I know consider them meaningless marketing BS that's really just meant to trick people or satisfy some make-work checkbox)


> They aren't saying they guarantee 100% uptime

> Cloudflare ("Company") commits to provide a level of service for Business Customers demonstrating: [...] 100% Uptime. The Service will serve Customer Content 100% of the time without qualification.

This is a legal commitment to provide 100% uptime. They are guaranteeing 100% uptime and defining penalties for failing to meet that guarantee. The fact that a penalty is defined does not stop it from being a guarantee.


No, this SLA is a legal commitment to give you credits when Service uptime falls below a certain threshold. The threshold could be anything - 99%, 50%, 100%, etc. Importantly, Cloudflare is not under a legal obligation to provide the Service at or above the agreed threshold, it's under a legal obligation to give you Credits when the Service uptime is below that threshold.

"Service Credits are Customer’s sole and exclusive remedy for any violation of this SLA."


> This is a legal commitment to provide 100% uptime. They are guaranteeing 100% uptime

I don't think you know what a guarantee is.

For example when you buy a new car you get a guarantee that it won't break down. Are they claiming it won't break down? No, of course not. What a guarantee means is that they'll fix it or compensate you if it does.


Looks like it supports the parent's opinion: commit - bind to a certain course of policy. It's a legal obligation, not a statement about guarantees in the physical world (like "this alloy won't melt below t°C").


I can completely understand your emotion. But even the top CDNs can have outages of one form or another. If site uptime is important, check out https://www.cdnreserve.com/ - it's built on the design principle that the likelihood of two separate platforms having an outage at the same time is close to zero.
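
The "close to zero" claim can be sketched in Python, but only under the assumption that the platforms fail independently. The availability figures below are hypothetical, and shared dependencies (for example, two providers fronted by the same DNS or network) would make the real number worse.

```python
def joint_unavailability(*availabilities: float) -> float:
    """Probability that all platforms are down at once, ASSUMING independence."""
    p_all_down = 1.0
    for availability in availabilities:
        p_all_down *= 1.0 - availability
    return p_all_down


# Hypothetical: two platforms, each available 99.9% of the time.
p = joint_unavailability(0.999, 0.999)
print(f"chance both are down at the same moment: {p:.2e}")
```

With those numbers the product is about one in a million, but the independence assumption carries all the weight here.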


that just means they're willing to pay for the marketing number, not that they will actually achieve it


Thanks for being here with timely updates! I knew to come to Hacker News once the alert triggered and a few users started complaining.


Good thing HN doesn’t use them then!


Enterprise support has been useless as is the status page. Got more info here


The status page shows about as much information as the post here.


It didn’t at the time.

Their phone line kept cutting us off and then the people there were not too helpful.


Status page was down for me in Sydney


Just wanted to also reiterate how thankful I am that you took your time to let us know. It speaks volumes.


Agreed- couldn’t figure out what was going on… finally checked here and - ah now I can sleep


Cloudflare going down is one of the things that keeps me awake. My main complaint about Cloudflare is that they are so good at everything they offer that we've become reliant on them for everything.


Happens to everybody sometime. AWS seemed to have a major outage a couple of times a year for a while there.


Exactly but the likelihood of two networks going down at the same time is close to Zero. Check out: https://www.cdnreserve.com/ We rolled it out to complement top CDNs.


True, they're usually due to issues with BGP routes.

It's common to see CF being the DNS/CDN for applications across AWS, GCP, Azure etc. So perhaps CF being down affects more applications than individual cloud platforms?


"do no evil" springs to mind -- once burned, etc.


Yeah, what's up with the competition to Cloudflare? What's the real barrier to entry?

It's not infrastructure anymore, as there is a new PaaS startup every week offering distributed hosting. So why is bundling DNS, DDoS detection and mitigation, cloud workers... with it so hard?


This is just my take, but Cloudflare looks to be building a "moat" to make entry hard. This is built around two things: 1. economies of scale, 2. a network effect.

-

https://en.wikipedia.org/wiki/Economies_of_scale

As Cloudflare gets bigger, they can provide services more cheaply. This is because (a) they can more fully utilise their data centres and other physical capital investments, (b) they can divide their fixed software costs over more users and (c) they get process efficiencies and discounts with scale.

A new entrant will struggle to match cost unless they're able to obtain similar scale. The bigger Cloudflare gets, the bigger the scale that a new entrant needs to hit before they can match them on cost.

-

https://en.wikipedia.org/wiki/Network_effect

Second, they're aiming to build a network effect through having a huge number of locations. The more locations, the more appealing to new customers, as they can be close to more users. A competitor will have to build a similar number of locations to match Cloudflare's proposition.

A new entrant cannot provide as much value, and therefore cannot charge as high a price, without building a similar sized network. This again requires the entrant to invest heavily before they can charge a similar price.

-

The combination of these two things means that when Cloudflare is operating at a large scale with a large network it can offer a more valuable service (and charge a higher price) than a new entrant, and earn more profit because it can operate at a lower cost.

Also, Cloudflare has the option of lowering its price and still being profitable due to lower costs at its scale, so it can deter entrants from trying to compete by the threat of being able to lower prices below what is profitable for new entrants.

The only players who can compete may be those who already have comparable size - Amazon, Google, Microsoft, Facebook, CDNs, etc, since they will already have addressed the issues of scale and network effects. However, they may not want to cannibalise their existing markets. It will be hard for other new entrants to compete.


There are many noteworthy players - Akamai, Fastly etc., and edge providers like ourselves (Zycada) who complement top CDNs like Akamai, Cloudflare, Fastly.


The main difference between Cloudflare and the others mentioned is the price: one can start with CF for a side project for free and continue to use it for free until it becomes a viable startup.

Others at best offer a limited trial plan, but most are just 'Speak to an expert / Contact us' for pricing, which means haggling with a sales rep while we could just be building things. Even CF's paid plans are reasonable compared with the others, and have better features.


It's hard at scale.


Isn't everything harder at scale? That's not a barrier for entry though.


Building a CDN absolutely is hard to do at scale.

You can't build a Cloudflare competitor in AWS/Azure/Linode/DO/etc. You need your own data centers. Multiple of them across the country, ideally around the world if you want to serve the whole world.

This is insanely hard.


for a global cdn... it quite literally is


Point taken. For a global CDN, scale is the point of entry.


hats off to you sir - would not want to be in your shoes right now but thanks for the updates


I don't have shoes on.


That's the point, two people can't fit in them.


In tumultuous times (fix wasn't implemented at this point), the Cloudflare CTO still has time for some wit. Love it!


It's back wooo!

And I'm saying this for the last time: no one type google into google!


Thanks for the update. Just curious if we will get a report on what happened? In as much detail as possible, of course - morbid curiosity mainly. I love the post-incident reports these events usually bring.


Cloudflare are usually pretty good with posting post mortems. https://blog.cloudflare.com/tag/postmortem/



Thank you! This little comment just saved me an hour of investigation. Good luck for getting the system back up asap.


All my sites behind Cloudflare had come back up, and have now gone back to serving 500 errors.

The Cloudflare Dashboard is also no longer fully loading.


Where are you located?


Raleigh, NC, USA

Sites are gradually reappearing as I type this. Some of my sites, and doordash.com, were returning 500 errors again just a minute ago. They just came back up, followed by the CF dashboard loading again.


I'm from Turkey, and I also have been seeing intermittent errors for the last 10 minutes. Seems ok now.


Should probably do slower rollouts next time.


Thanks for letting us know


Thank you for your very fast support!


Which undersea cable was cut ;)


Can't really roll back that change


Sure you can, but physically rolling back more cable takes a bit longer :P


DR much ?


I don't know what that means.


DR means "disaster recovery," it is a formal plan used to respond to and mitigate potential risks to the business. Things like having a communications plan for an incident, or a backup office outside of your main office natural disaster zone.


Ah. Just one more reason I hate acronyms. They obscure what the person is trying to say.



