This is one of those things I keep hearing, but in my case I've experienced the opposite. I have been using OVH for a long time, and their paid support is actually really good.
One of our projects used to get DDoSed, and as a small company the only reason we managed to survive was thanks to OVH. When the attackers kept changing their attacks, we were put in direct contact with their engineers and they helped fix the issue; all we had to do was send them a bunch of packet captures.
Over the years since then we had a few issues and every single time their customer support was really helpful.
I have used paid support from Google, Azure and Amazon, but I have had a better customer service experience with OVH overall.
Not really. From Octave's Twitter it seems that DDoS attempts have increased, which is usual right before an IPO. They were installing new gear because of the heavier DDoS traffic.
Could you please elaborate on this? I’ve never heard of that being related to an IPO before, so I’m wondering if this is a standard practice for competitors to get known groups to do this sort of thing, or if this is an economy destabilization measure that’s in the playbook of some nation state actors. Any insight there?
Like others said, it's blackmail for money. Essentially: "it would be a shame if your website were down during your IPO, maybe you should pay us so we don't DDoS you to hell".
I imagine this is especially common here since one of the services offered by OVH is DDoS protection. Hence being down from a DDoS would be even worse press.
It's more likely to be a "DDoS until you pay a ransom" sort of deal where the attacker thinks the company will value uptime more highly during that period.
You've clearly never run a web service of any sort... you can expect a DDoS in response to pretty much anything you do. Not seeing an increase in DDoS attacks before an IPO would be more concerning.
> A nation is a community of people with a common culture, language, ethnicity
Nitpick: the belief in the existence of that common ground actually matters more than its actual existence. France is considered a nation state even though there are several ethno-linguistic groups in France. A few illustrations:
- Less than half of the French population during the French Revolution spoke French as their native language.[1]
- The second Nobel Prize in Literature from France (Frédéric Mistral) didn't write his work in French, but in Occitan[2].
- My family has lived in France for as long as genealogy can trace back, yet half of my great-grandparents only learnt French at school; they didn't speak it at home.
[1]: I don't have an online source for this, nor the exact figure: I read it in Fernand Braudel's L'Identité de la France.
Following a human error during a network reconfiguration in our DC VH (US-EST), we have an issue on the whole backbone. We're going to isolate the DC VH, then fix the configuration.
Over the last few days, DDoS attack intensity has grown a lot. We decided to increase our DDoS mitigation capacity by adding new infrastructure in our DC VH (US-EST). A bad router configuration caused the network to go down.
I can ping the status IP, but no HTTPS is available from it, so routing is working externally even though the service isn't responding.
72 bytes from 5.135.138.70: icmp_seq=1 ttl=249 time=276 ms
72 bytes from 5.135.138.70: icmp_seq=2 ttl=249 time=1041 ms
72 bytes from 5.135.138.70: icmp_seq=3 ttl=249 time=313 ms
72 bytes from 5.135.138.70: icmp_seq=4 ttl=249 time=198 ms
72 bytes from 5.135.138.70: icmp_seq=5 ttl=249 time=226 ms
72 bytes from 5.135.138.70: icmp_seq=6 ttl=249 time=390 ms
72 bytes from 5.135.138.70: icmp_seq=7 ttl=249 time=865 ms
Exact same? I doubt it. The only similarity (that we know of) is that there was a router configuration change. In Facebook's case, there wasn't even supposed to be a configuration change, just a look at some data.
It looks like only IPv4 is affected. IPv6 is working fine to a couple of test endpoints.
3 2a02:c28:1:6506::106 2.131 ms 2.124 ms 2.117 ms
4 2a02:c28:11:6::100 15.695 ms 15.689 ms 15.682 ms
5 2a02:c28:1:1900::19 15.998 ms 15.991 ms 15.985 ms
6 2a02:c28:0:1819::18 15.259 ms 15.051 ms 16.975 ms
7 2a02:c28:0:1718::17 16.968 ms 16.403 ms 15.619 ms
8 2a02:c28:0:1731::31 15.406 ms 18.620 ms 18.554 ms
9 2001:7f8:4::3f94:2 12.025 ms * *
10 * * *
11 2001:41d0:aaaa:100::5 21.622 ms 38.813 ms 2001:41d0:aaaa:100::3 37.231 ms
12 * * *
13 * 2001:41d0::25f1 19.507 ms 2001:41d0::c68 20.892 ms
14 2001:41d0::513 19.685 ms 19.674 ms 2001:41d0::50d 18.248 ms
15 2001:41d0:0:50::5:10a1 19.092 ms 19.917 ms 2001:41d0:0:50::5:10a5 20.126 ms
16 2001:41d0:0:50::1:143f 19.375 ms 2001:41d0:0:50::1:143b 24.609 ms 2001:41d0:0:50::1:143d 18.933 ms
I was just going for a peaceful morning walk, trying to start the day with some sunlight. As soon as I sit down on the first sunny bench I can find, the dreaded message comes: "everything is down".
Thank you! I had almost forgotten about my beloved provider since the last outage :)
Our servers in Strasbourg were down for around 10 minutes; now they are fully operational, but it looks like some of their sites are still experiencing issues [1]. On the positive side: it was my second day on call and I already had the worst incident (no SSH, provider site down).
I don't think this counts as a worst incident. Since there is nothing you can do, you might as well take a nap and wait for it to pass.
On the other hand something like Gitlab's "We deleted the production database and our backups were not working for months" sounds much more stressful to me.
Azure is also having issues with deploying and starting virtual machines.
EDIT: To those having issues starting/deploying Windows virtual machines in Azure and are using ARM to do so: change the OS type to Linux instead of Windows. This seems to resolve the following issue:
> Error: No version found in the artifact repository that satisfies the requested version '' for VM extension with publisher 'Microsoft.WindowsAzure.GuestAgent' and type 'CRPProd'
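If it helps anyone, the workaround amounts to flipping the `osType` field under the VM resource's `storageProfile` in the template. A minimal sketch of the relevant ARM fragment (the resource name and API version are placeholders, not from the original report):

```json
{
  "type": "Microsoft.Compute/virtualMachines",
  "apiVersion": "2021-07-01",
  "name": "example-vm",
  "properties": {
    "storageProfile": {
      "osDisk": {
        "osType": "Linux",
        "createOption": "FromImage"
      }
    }
  }
}
```

Setting `osType` to `Linux` apparently sidesteps the broken `Microsoft.WindowsAzure.GuestAgent` extension lookup even for Windows images; treat it as a temporary hack, not a supported configuration.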
That may be an interesting effect. People test their failover solutions for one of their providers being down while the provider is not actually down. But when it actually is down, everyone else is executing their failover plan at the same time, competing for the other clouds' resources.
Do you have any numbers/sources to back that up? I find that hard to believe, mainly because us-east-1 goes down so much. It's never been hard-down across three AZs with no EC2, no VPC, no nothing, but S3 being down in the whole region is pretty freaking close!
For each cloud provider, spare capacity is a fraction of their actual capacity, so absorbing anything beyond GCP-scale demand is not even a possibility. And since most of that spare capacity is utilized as spot instances, there is not that much of it in reality: many spot workloads are actually cost optimizations and would shift to other instance types if spot instances weren't available.
Interesting indeed, but I doubt that "everyone else" is executing their failover plan, simply because most don't have a failover plan that includes running on another cloud provider.
Looks like funny routing: packets to ovh.nl go to Asia, and then suddenly find a faster way back to me.
3 0.ae21.xr4.1d12.xs4all.net (194.109.7.169) 7.324 ms
0.ae21.xr3.3d12.xs4all.net (194.109.7.173) 5.929 ms
0.ae21.xr4.1d12.xs4all.net (194.109.7.169) 5.864 ms
4 0.et-1-1-0.xr1.tc2.xs4all.net (194.109.5.7) 7.562 ms 5.487 ms
0.et-7-1-0.xr1.tc2.xs4all.net (194.109.5.5) 5.551 ms
5 asd-s8-rou-1041.nl.as286.net (134.222.94.216) 6.034 ms 5.759 ms 5.608 ms
6 ae11.cr6-ams1.ip4.gtt.net (213.200.117.178) 6.620 ms 7.938 ms 6.670 ms
7 80.231.85.162 (80.231.85.162) 7.348 ms 6.170 ms 6.320 ms
8 if-ae-45-2.tcore2.av2-amsterdam.as6453.net (80.231.152.50) 310.705 ms 257.678 ms 257.278 ms
9 if-ae-14-2.tcore2.l78-london.as6453.net (80.231.131.160) 257.217 ms 256.759 ms 259.044 ms
10 if-ae-2-2.tcore1.l78-london.as6453.net (80.231.131.2) 258.850 ms 256.723 ms 256.735 ms
11 if-ae-12-2.tcore2.mlv-mumbai.as6453.net (180.87.39.22) 257.349 ms 255.042 ms 257.338 ms
12 if-ae-16-2.tcore1.svw-singapore.as6453.net (180.87.12.226) 248.550 ms 257.559 ms 329.240 ms
13 if-ae-2-2.tcore2.svw-singapore.as6453.net (180.87.12.2) 307.149 ms 255.582 ms
be101.mrs-mrs1-sbb1-nc5.fr.eu (54.36.50.135) 181.205 ms
14 be101.mrs-mrs2-sbb1-nc5.fr.eu (54.36.50.159) 180.612 ms * 179.738 ms
15 * * par-gsw-sbb1-nc5.fr.eu (54.36.50.228) 186.377 ms
I don't know how OVH is doing on average nowadays with respect to downtime, but I worked for a company around 8 years ago that had part of its infrastructure running on OVH. That thing was a sad joke, and we ended up migrating everything to AWS. Pretty much every month we had issues with them.
Downtime due to incidents in their data centres. We used to joke, saying things like "oh, that must be the OVH guys having some fun in the datacenter again".
Hetzner is the cheapest, and their cloud offering is solid. Heavily underrated provider. Price to value, interface, documentation, API, libraries: everything top notch.
With them recently announcing that they will start charging tax on US sales, `ash1-speed.hetzner.com` resolving to an IP address (yet unroutable) under a different Hetzner ASN, and this one[0], I believe they are already preparing a US region, possibly in Ashburn, Virginia.
Thanks, it's about time this reputation for French sloppiness ends. What would people have said if the "move fast and break things" motto had come from a French company?
LOL. OVH has turned into the laughing stock of the industry. It's a shame really; I always used to gravitate towards their offerings over the competition because I liked the fact that they have servers literally everywhere... but what good are servers everywhere when you get a major outage every few months?
Perhaps I'd tolerate the constant outages if they were actually any good at communicating. Their status page sucks. They rarely update it properly, and when they do, it's hard to find the information you need and that you know relates to your servers.
Well, some products/businesses don't need 99.999% uptime, and that's fine. Hell, I'm working at a small shop; we had to update a service that is not used constantly by our customers, and when I suggested having a downtime of 5 minutes everybody looked at me as if I were a caveman. The alternative was to come up with some custom strategy to run two instances of the same service at the same time and kill one of them once we know the other is healthy. For a service that is constantly used that's a good idea, but for our scenario a 5 minute downtime is fine... we are not Google.
Minus the fire incident, can't recall any outage. Not the cheapest offering out there but been running many things on it on autopilot. Support is solid too, always had a good experience.
I'm not affiliated with them in any way, but historically Online.net (now rebranded as "Scaleway") has always been OVH's main competitor in France. They were known for their "Dedibox" when I was younger.
OVH kinda won because their VPS pricing was always dirt cheap compared to the cheap dedicated servers provided by Online.net/Scaleway. Since then, OVH VPSes have gotten expensive and Scaleway has started offering cheap VPSes, so the prices have converged. I know a few people who used to be OVH fanboys, and they have all migrated to Scaleway and Vultr over the last 4 years.
But I agree, the competition for unmetered reliable VPSes is getting thinner and thinner… People are even starting to consider metered VPS solutions like Vultr…
Rasmus Lerdorf (the author of PHP) did an objective, non-sponsored comparison[1] in 2019. I'm using IONOS personally, but since then strato.de has appeared (both are German). I'm thinking about adding another node from them to avoid relying on only one vendor.
Following a human error during the reconfiguration of the network in our DC VH (US-EST), we have a problem on the entire backbone. We will isolate the DC VH and then fix the config.
This seems to have brought down Snapchat as well... (not particularly the target audience here, but still relevant); I didn't expect Snapchat to rely on OVH at all, so it could be a coincidence.
As someone that runs an uptime monitoring service, it was wild to see so many sites go down that I wondered whether it was my monitoring that was screwing up!
Has OVH gotten substantially worse in the past couple of years? We ran a decent cluster in GRA-3 for a few years, and I can't recall having any real downtime. But lots of comments here lead me to believe that other people have very different experiences with OVH.
What's the rough downtime an email server can handle without losing any emails? Greylisting relies on most senders' servers retrying after a few minutes if the first attempt is rejected, so I'm guessing a short enough downtime has no impact?
Defaults depend on the application, and actual settings depend on the admin/postmaster. Retry queues will typically be somewhere between 3 and 7 days; high-volume sending servers may be shorter. It is rare for short outages to cause emails to be lost, unless the outage is due to a fire and the server holding your message melted.
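To make the greylisting point concrete, here's a toy model of the mechanism (not any particular MTA's implementation; the 300-second delay is illustrative): the first delivery attempt from an unknown (sender, recipient, ip) triplet gets a temporary SMTP failure, and a retry after the delay is accepted.

```python
import time


class Greylist:
    """Toy greylist: temporarily reject the first delivery attempt from an
    unknown (sender, recipient, ip) triplet; accept retries after a delay."""

    def __init__(self, min_delay=300):
        self.min_delay = min_delay  # seconds a sender must wait before retrying
        self.seen = {}              # triplet -> timestamp of first attempt

    def check(self, sender, recipient, ip, now=None):
        now = time.time() if now is None else now
        triplet = (sender, recipient, ip)
        first = self.seen.setdefault(triplet, now)
        if now - first >= self.min_delay:
            return "accept"
        return "tempfail"  # e.g. SMTP 451: sender should retry later


gl = Greylist()
assert gl.check("a@x", "b@y", "1.2.3.4", now=0) == "tempfail"
# A compliant sending MTA retries a few minutes later and gets through:
assert gl.check("a@x", "b@y", "1.2.3.4", now=600) == "accept"
```

The point for the question above: as long as the outage is shorter than the sending server's retry window (typically days, per the comment), greylisted or rejected mail is delayed rather than lost.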
What would be the best way to do a status page, apart from third-party services? I remember one project that used IPFS as a distributed status page. https://news.ycombinator.com/item?id=16273609
That third party service has still got to be hosted somewhere though. What happens when the cloud they're running on goes down?
The rule should be host your status page on your competitor's cloud. If you're AWS, host it in Azure, if you're Azure, host it in GCP, if you're GCP, host it in AWS. (Linode, Digital Ocean, OVH, etc can do their own dance.)
It seems like (only?) their name servers are down. I got my website back up by avoiding them (it's hosted on a dedicated server at OVH). They'll probably have the same kind of issues as Facebook did while trying to get things back up.
Does anyone have an alternative hobbyist VPS provider (around €5/month)? OVH keeps reducing their offering and bumping their prices. Plus they dropped support for Arch images.
I use Hetzner; their smallest VPS is around €2.xx. It has been very reliable for me over the last couple of years. I'm not sure whether they support Arch, though.
I'm using cheap Contabo servers and they've been surprisingly reliable for the price they ask. Their €5 offering (excluding VAT) comes with 4 cores, 8 GiB of RAM and either 50 GiB of NVMe storage or 200 GiB "SSD" storage. That's a lot better than anything I can find elsewhere.
The only problem I've found is that unlike most VPS providers, they don't seem to do gigabit speeds in their cheaper tiers. I'm content with the 100 Mbit offering (which is 200 Mbit if you sign up these days, I think), so that's not a problem for my use case.
If you only run very basic stuff (basically a Raspberry Pi in the cloud), then Oracle offers two free VPS servers (not "free for a year" or "free with a purchase", but actually "free forever", which was a huge surprise to me). You get very little in terms of performance, but for a website or small project that should be sufficient.
They have many locations and I used their VPS for a long while before I moved to colocation. Had no issues apart from when I accidentally deleted /etc/.
I've been using SSDNodes for some stuff for the past year. Had an issue with my VPS in Dallas, contacted support, they fixed it right away - response in a few minutes, fixed within half an hour.
Click through to actually ordering, and when you select your OS you're given a choice of only Fedora/CentOS/Debian/Ubuntu/Windows Server. When I was looking for a host that supported Gentoo, I found similar false advertising from them (and many others); they don't actually offer it. In theory some of them will give you a recovery shell so you can bootstrap whatever distro you want, but once you do that you're in very unsupported territory.
Prior to the outage I received an erroneous email from them about an old account, followed by an apology for it. It seems they were having other issues besides the router config.
Which is stupid. If I say I'm walking to the store and will be gone for 30 minutes, then get hit by a truck and spend a week in the hospital, nobody would say "30 minutes, huh?"
Well, that's a bit like saying SMTP or IMAP is always the issue when your email stops working.
If they screwed up routing, BGP was very likely to be involved, but the cause was probably human error, failure of process design, failure of system design, etc etc.
> Yes, because no network issue ever happened in the past with the "greybeards".
These issues, one might think, should have been turned into tomes of extensive information on why they happened and how to avoid them, and become an integral part of showing the ropes to new sysadmins or operations people, you name them. It seems, however, that by and large the knowledge of the actual working systems does get irretrievably lost once some Kevin retires.
P. S. Would you be just as cavalier if lives were lost as a result of such incidents?
Not gatekeeping, rather a lament that the next generation doesn't tend to dive deeper into the nuts and bolts of how these things actually work and what the failure modes are.
Maybe the greybeards don't do a good job at educating their successors, or maybe it's the "tl;dr, YOLO" attitude at work. I don't know. I suspect a combination of both.
[1]: https://ipo.ovhcloud.com/sites/default/files/2021-10/OVHclou...