InfluxDB Cloud shuts down in Belgium; some weren't notified before data deletion (influxdata.com)
443 points by PaulAA on July 9, 2023 | 266 comments



It seems this probably happened due to some regulation or other. The sunset date for the service should have been a month earlier, so that Influx could have legally kept the data until the 30th in case something like this happened.

They wanted to have the euros flowing in right until the last minute.

1. Flash messages on all user-facing consoles.
2. No new resources able to be created for the 6 months prior.
3. Emails.
4. The service end date should have been at least a month prior to the mandatory shutdown.
5. Anyone still running workloads in May should have had aggressive contact attempts made to ensure they were aware.
6. The console in the region should have switched to a final backup that can be exported by the user or moved to another region. This should have been available for 30 days.

You don’t do this because it’s fun, you do this because you need to save reputation. If I can’t trust you with business critical data then why would I use you for my critical business?

Also, as someone who works for a large enterprise, if you really believe email is the way to inform them of these changes, well I’d reconsider your beliefs.


There's no regulatory consideration involved as far as I can tell. On Slack at https://influxcommunity.slack.com/archives/CH8TV3LJG/p168894... they explain the shutdown thus:

> "The region did not get enough usage or growth to make it economically viable to operate, so it became necessary for InfluxData to discontinue service in those regions."

So it's worse than you believe. Yes, the handling is a scandal for all the reasons you say. But they weren't even pushed into this by some regulatory issue; it's pure cost-cutting.


Given they shut down two DCs half a world apart, it's not regulations. It's cost.


But it's a paid service, right? Is it a pricing issue? If it is, isn't it better to increase the price?


You wonder if we will see more of this from all these high-burn-rate SaaS startups, right? It saves money to shut down even paid services if they are cashflow negative. The difference between paid and unpaid services only matters if costs are below the prices they sell the services for.


It's a cost-cutting measure that reeks of a company trying to cut costs as fast as possible.


They still should have at the very least done a backup of each customer DB in those regions and created an option to download and/or restore to a new region and kept those for at least 30-90 days.

A scream test would have been a better option in addition to the above.


And make no mistake, some people will still miss the notification after all these warnings.


Perhaps service shutdown is also the only valid case where it can be okay to intermittently fail API requests?


companies generally want to be paid for their costs of holding your data liabilities, yes.


“But look, you found the notice, didn’t you?”

“Yes,” said Arthur, “yes I did. It was on display in the bottom of a locked filing cabinet stuck in a disused lavatory with a sign on the door saying ‘Beware of the Leopard.’”


It should not be understated how bad this is. Your #1 expectation as a cloud database provider is to keep data safe and recoverable.

I hope, at least for their sake, that they took a backup of everyone's DB that could be restored in another region, but based on the fact that they didn't do a scream test, I doubt they thought about this either.

This must have been forced on them by upper management, because there is no way that someone along the chain who actually had to delete the data didn't suggest a scream test. No way someone didn't say "this is a terrible idea, email is not reliable".

Adding Influx right next to GCP on my list of providers I'm never using. Self-hosting is the way; use ClickHouse.


In case anyone else is wondering:

> The Scream Test is simple – remove it and wait for the screams. If someone screams, put it back. The Scream Test can be applied to any product, service or capability – particularly when there is poor ownership or understanding of its importance.

https://www.v-wiki.net/scream-test-meaning/


Maybe this is the scream test…just done badly


that's not the scream test, that's the nuke test - nuke it and see if anyone complains, if they do, it's already too late


No, it's gone. For it to be a scream test they'd have to be able to retrieve the data somehow and they can not.


I agree. This should be an indication to all current users that they should no longer trust InfluxData with their business.

The CTO seems to have been checked out for a long time (just look at how little developer engagement there is on here) and the CEO seems to have no idea how to run a DBaaS. The fact that nobody else from the company has stepped in to try and defuse this should terrify anyone who has data on InfluxData's cloud.

This is the beginning of the end. It seems like all of the good people have left the company, and being willing to destroy credibility to cut costs is a clear sign that the company is running on fumes.

So, now is the time - find your alternative, whether it's Timescale, QuestDB, VictoriaMetrics, ClickHouse, or just self-hosting.


The CTO's blog post is pretty half-assed: https://www.influxdata.com/blog/update-from-influxdata-paul-...

It's the same "we 'tried'" message they have here. Even worse, this wasn't a regulatory shut-down, this was a lack of demand decision. They had 100% control over the timing and means of the shut-down. They didn't even keep backups! They just deleted everything.

Some highlights from the blog. It reads like a "cover my ass" to the board, rather than fixing problems for customers.

* > Over the years, two of the regions did not get enough demand to justify the continuation of those regional services.

  * In other words, they had no external pressure. They just shut this down entirely on their own accord.
* Immediately blames customers for not seeing notifications, explaining "how rigorous" their communication was.

* > via our community Slack channel, Support, and forums, we soon realized that our communication did not register with everyone

  * In other words, "we didn't look at any metrics or usage data. How could we have possibly known people were still relying on this?"
* > Our engineering team is looking into whether they can restore the last 100 days of data for GCP Belgium. It appears at this time that for AWS Sydney users, the data is no longer available.

  * That's literally unbelievable. They didn't even keep backups! They deleted those too! Even if the region is going down, I'd expect backups to be maintained for their SLA.
* Lastly, a waffling "what we could have done better" without any actual commitment to improvement. Insane.


This is pretty much corporate suicide. I really don't understand what they are trying to achieve with this and their attitude in this thread is baffling.


I completely agree with you regarding corporate suicide. The rest of my post is complete speculation.

The least nonsensical explanation I can think of is that they weren't paying their bills. They weren't paying rent, the landlord locked them out and repo'd their servers, or something similar. (Perhaps they were inspired by Elon Musk's recent antics?)

If that were the case, they would not disclose that that's what happened. If they disclosed that, all of their other customers would immediately begin migrating their data; not tomorrow, not next week, now.

If there were any excuse they would give it. "We were hacked!" "It was a disgruntled ex-employee!" "The datacenter burned down!" "It's those dirty EU data laws!" etc.

Shutting down the data center and deleting all the data (without migrating) at the exact same time, and that being Plan A? Nah, I don't believe that.


This was announced months in advance (albeit not in a way that could possibly guarantee that most customers would ever discover it) so I don't think your speculation is true. As best I can tell from the information publicly available, they really did shut down the data center and delete all data simply to cut costs with no external push whatsoever.


I agree with your comments about how Influx handled this shutdown.

The several things you might mean by self-hosting have their own pros and cons. The right choice is very context-specific, and assuming that it’s always the right choice is wrong. It certainly can be, though.

As for ClickHouse, that mention seems like a throwaway comment, unless you are advocating a boycott of even the open source InfluxDB due to its corporate author’s behavior and view ClickHouse as the closest alternative.

This incident has nothing to do with the comparison of the open source InfluxDB vs the open source ClickHouse, nor would it impugn the viability of InfluxDB hosted by a more responsible data custodian than Influx the company.

And GCP hasn’t done any similar inadequately notified shutdown of service with immediate and irreversible data loss, as far as I know.

(Disclosure: I have worked for Google in the past, including GCP, but not in over 8 years. I’m speaking only for myself here. I’ve never worked for Influx or ClickHouse.)


This kind of thing really does need a cooling off period.

Assume that your users won't see your emails. How do you help them avoid data loss when you shut down a service like this?

One option that I like is to take the service down (hence loudly breaking things that were depending on it) but keep backed up copies of the data for a while longer - ideally a month or more, but maybe just two weeks.

That way users who didn't see your messaging have a chance to get in touch and recover any data they would otherwise lose.

I'm not sure how best to handle the liability issues involved with storing backups of data for a period of time. Presumably the terms and conditions for a service can be designed to support this kind of backup storage "grace period" for these situations.


You start with reliability brownouts: first fail 0.1% of requests, then after a week 1%, then after a month 5%.
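
A rough sketch of what such a brownout gate might look like in the request path - the dates, percentages, and helper names below are purely illustrative, not anything Influx actually ran:

  import random
  from datetime import date
  from typing import Optional

  # Illustrative brownout schedule: fraction of requests to reject,
  # keyed by the (made-up) date each stage begins.
  BROWNOUT_SCHEDULE = [
      (date(2023, 4, 1), 0.001),  # fail 0.1% of requests
      (date(2023, 4, 8), 0.01),   # a week later, 1%
      (date(2023, 5, 8), 0.05),   # roughly a month in, 5%
  ]

  def current_failure_rate(today: date) -> float:
      """Return the reject fraction in effect on the given date."""
      rate = 0.0
      for start, fraction in BROWNOUT_SCHEDULE:
          if today >= start:
              rate = fraction
      return rate

  def should_brownout(today: Optional[date] = None) -> bool:
      """Randomly decide whether to reject this request with a loud, explanatory error."""
      today = today or date.today()
      return random.random() < current_failure_rate(today)

  # In the request handler: if should_brownout(), return a 503 that points at the
  # shutdown notice, so customers' own alerting fires long before any data is touched.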


Much better is to stop the service but add a "Resume" button that re-enables the service for two more weeks with no data loss. That way you give users an opportunity to gracefully migrate away.

Stopping the service and immediately deleting data is just callous.


When ovh sunsetted a class of VPSen and I'd completely failed to notice they were going to do that, I asked nicely in the support ticket I'd sent in and they turned it back on for a few days while I shifted the data to a replacement (which was still an ovh VPS, it had been Just Working long enough that I didn't feel like I'd been mistreated, more lulled into complacency by the lack of problems).

I think requiring a ticket might be a worthwhile trade-off compared to just adding the button, because that allows you to engage with customers to make sure they can (in a case like this) migrate to a different region of your own service, and the activation energy of sending a ticket means a customer's less likely to click 'Resume' and then forget about it again until it's too late.


I mean this is why you do these projects on two different timelines: The internal timeline and the external timeline.

Externally you communicate: different announcements each month, final notices at T+5M, system will be deleted at T+6M, data will be lost at that point, and so on.

Internally (at least at work) such a timeline is more that at T+6M, we cut access to the systems. Afterwards, systems not accessed for 2-4 weeks are removed periodically and the hard removal is planned for T+9M. Customer support and account managers can manage if systems need to be accessed. If a customer needs the system for a longer time, they can, but then they pay for it. Entirely with all necessary infrastructure, not renting a few licenses on the system.

Call it a bit callous, but this allows our customer support to appear nice and in control. And it leaves the customer happy and relieved that we have left some slack and leeway. But they've been shaken and woken up and can get to migrating.

The biggest challenge here is to stay on it and to not allow customers to become complacent again. This can be done by e.g. limiting the reactivation time to a week or so, so they have to get on it.


Yep, and in certain in house situations it's best to keep a backup around for ~13 months in case there's an obscure business process that only gets done once a year. (I'm aware that some people reading that sentence are going to go wtf at the idea that that's anybody's problem except whoever didn't tell you said business process even existed, but if it's a sufficiently critical finance or HR thing it tends to rapidly become everybody's problem so I like to have options)

Agree absolutely wrt complacency, I believe I asked for less than a week because I actively preferred a situation where I had to get on it immediately.


That seems like the worst of both worlds, during the brown-out you have to keep paying for the compute while your customers don't get a reliable service, even if they have a plan to migrate.

Also you probably can't keep charging customers for that period since you offer a crippled service on purpose.


If you are shutting it down, you can pay for the grace period. Period.


Just shut it down for real (after proper early warnings), so you save on compute and no one is confused about the state, and offer data retrieval for the grace period.

Brownouts are great for API changes, but not very useful before a full shutdown.


You assume warnings reach users. Some people miss emails. Fewer miss a service going offline. Keeping data after shutdown is a good backup.


That's why I'm saying to take it offline. Purposefully broken service is not very valuable, can't really be sold, and yet can still be missed; it also costs you money.


Hi, cofounder and CTO here. We notified everyone via email on February 23, April 6 and May 15th. We also offered to help migrate all users. I realize that it's not ideal that we've shut down this system, but we made our best efforts to notify affected users and give them options to move over to other regions. If you've been impacted by this, please email me personally and I will do my best to help out: paul at influxdata.com.


Paul: I'm surprised you didn't do a scream test. Not everyone is going to see those emails and even those that do may not understand what they are reading.

Internally at my company we always do scream tests as part of our EOL process because we know we can't reach everyone, even our own employees.

https://www.microsoft.com/insidetrack/blog/microsoft-uses-a-...

Fun story: my mortgage got sold last year. Not the first time. I got emails from the old mortgage company and the new mortgage company about the sale, but I skimmed them. I got letters via USPS from the old and new mortgage companies, but I mostly ignored those because 95% of what mortgage companies send me via USPS is junk. So I missed the fact that my automatic payments didn't transfer over. The new mortgage company let me get four months in arrears before they finally FedEx'd me something overnight. That got my attention. I was like: you guys should've FedEx'd me this in the first place. For all they knew, I wasn't getting their emails or letters in the first place because nothing had been sent signature required.


> Not everyone is going to see those emails and even those that do may not understand what they are reading.

If that's the case, these companies/people have no business using cloud services. Fair enough that you might not understand the ramifications; in that case you contact support. If you don't see those emails... that's on you. We operate out of a number of datacenters, and they all communicate via email, giving us one to three months' notice regarding service windows. If we fail to plan for an outage because we didn't see an email, that's our problem. I don't know why anyone would expect more from a SaaS company.

For really large customers, I would assume that they have a customer service representative, and yes, that person should have called. If you're just a small customer (even if you might be big in your own mind) and just have an account that gets billed to a company credit card each month, it's a little naive to think you'd get anything more than an email.

We've already seen a number of SaaS companies just shutting off customers for little to no reason; even AWS has done this. Running things in the cloud is a risk, and it's your job as the operations team to stay on top of things and have a backup plan, because you cannot expect cloud vendors to care about some random customer who just signed up using a credit card and a nondescript email. They should, but they don't.

A good rule is: Don't expect a SaaS/cloud company to put in more effort contacting you than you did signing up.


> > Not everyone is going to see those emails and even those that do may not understand what they are reading.

>

> If that's the case, these companies/people have no business using cloud services.

Cloud services are responsible for this. I've signed up to many cloud services where I purposefully unchecked all the newsletter/updates/... notifications.

But I still receive notifications for stuff unrelated to what I use. These emails are full of marketing/PR jargon, where it's unclear whether I'm affected by the change or whether there is even a change!

Cloud services are lazy, don't look at their customers' usage, spam everybody, and then blame their customers when they miss an important update due to notification fatigue.

This is the main reason why I stopped using SaaS.


Unless the cloud provider can provide proof that the person received and read such a notice, then they can still be sued for actual damages... and I'd be surprised if that doesn't happen in this case.

The fact is, there are many options from a cooldown, scream test, automated backup for migration/recovery... this organization did none of those things and absolutely deserves to lose massively as a result. This is a DATABASE as a Service... RETENTION should be one of the highest priorities.

For that matter, it would have been better if they auto-migrated in an OFF status, or otherwise backed up... just hitting the DELETE ALL DATA button is wrong. Several of the posters in the thread indicate they received no such emails.


> I was like: you guys should've FedEx'd me this in the first place. For all they knew, I wasn't getting their emails or letters in the first place because nothing had been sent signature required.

I love the scream test, but the analogy you bring up actually seems unfair. The cost of FedEx'ing everyone is astounding (for many businesses).

But I like the concept. Definitely a sort of "shut off the server for like an hour" and then see who yells.

Phone calls for any account that's still operating in the region.

100% agreed, 3 emails is... hardly anything.


When you buy a large, registered debt such as a mortgage the cost of Fedex'ing everyone should be factored into the sale. If that's too much money you shouldn't be buying such assets. Notifying those that are affected properly would seem to be the least you can do in such situations.


Exactly, and they’re fine sending junk mail all the time, and FedEx isn’t that expensive for large businesses like that.


At least here in Germany junk mail senders actually get reduced rates.


At the end of the day you are the one who pays for the fedex though.


Not really, no. Mortgage payments are interest and principal, not administrative fees beyond what you paid when you signed the original contract.


Whatever costs are imposed broadly on an industry are covered by the customers of that industry. If mortgages are more costly to buy, they’ll be more costly to originate.


The consumer doesn't bear 100% of the costs though. If mortgages cost $1mm to buy, they wouldn't cost $1mm to originate. They'd cost a little extra by being illiquid - the same amount as if they cost $2mm to buy.


The consumer nearly always bears 100% of the costs - because most companies sell at a profit.

The final price is a combination of all costs incurred + a profit percentage imposed on the consumer.

If all customers would receive fedex mail, then the costs of this fedex would be pushed on them somehow. Probably by making them pay slightly higher interest rates or by introducing some one off handling fee.


Profits aren't a fixed number.


Of course not, but financial products are priced and offered with a financial outcome (usually a margin) in mind. If you make mortgage processing more expensive, you'll find the offers for origination are worse than if mortgage processing were less expensive.


Sorry, it's a ridiculous argument. Mortgages get bought and sold all the time and clearly the buyers are on the hook for the communications costs and these do not pre-emptively get priced into future products by the sellers. I'm sure that there are situations where your argument has merit but this isn't one of those.


No one is saying that existing mortgages would be re-priced. Those are contracts and you can't unilaterally change them. What I and others are saying is that if you changed the obligations of buyers such that mortgages became more expensive to service, that those servicing cost increases would ultimately be borne by the mortgage borrowers rather than eaten by mortgage lenders out of the goodness of their hearts.

> these [communications costs] do not pre-emptively get priced into future products by the sellers

They 100% do get priced in. Whenever you buy a product, you're paying all the costs of that product. When someone originates a mortgage, they're aware of the secondary market for mortgages. If that secondary market is eroded by a significant increase in communications costs, that reduces the willingness of a secondary buyer to bid for your book of mortgages. That erosion reduces your projected profit on originating, so you take a little longer to lower your offered rate to 5.250%, or you charge a bit higher origination fee, or whatever to ensure you maintain a viable business. So long as these fees hit the entire market, the other originators are all making the same calculations.

It seems odd that you [seem to] think that money for these costs would just result in reduced profits for the financial services companies rather than in increased borrowing costs.


> What I and others are saying is that if you changed the obligations of buyers such that mortgages became more expensive to service, that those servicing cost increases would ultimately be borne by the mortgage borrowers rather than eaten by mortgage lenders out of the goodness of their hearts.

I got what you and others are saying but I've been trying - and failing, apparently - to point out that the costs for a mortgage in case of a sale are borne by the buyers, who are not even the same kind of institutions as the parties that sell them, and so they are in no position to charge the subjects, nor are the sellers going to price the mortgages any higher in the future because the costs aren't borne by them.

There is no such thing as 'projected profits' that go into this because mortgages can be sold (and are sold) more than once, the number of times is not known when they are issued first. And it is going to be only a small fraction of the audience that is going to be hard to reach for whatever reason. The presumption that there is some kind of free market mechanism that will ultimately pass those costs back to the original mortgage underwriters is not in any way evidenced by present day mortgage prices. On the sum total of mortgages out there and the - exorbitant - profits they create for the lenders we're talking about such small amounts that it will make zero difference.

A 'significant increase in communications costs' would translate into that being something that is some noticeable percentage of the total yield over the remaining time and it just simply isn't. Typical mortgage rates and amounts utterly dwarf the costs of a one time notification, especially if you don't have to notify everybody like that, and you can try cheaper channels first until you have a hit. Besides, the original mortgage 'service charge' is already a large multiple of the various costs and tends to be mostly pure profit for the initial lender.

It seems odd that you [seem to] think that mortgages are priced such that the mere cost of communications is going to show up in the prices, they are amongst the most profitable financial products.


> On the sum total of mortgages out there and the - exorbitant - profits they create for the lenders we're talking about such small amounts that it will make zero difference.

If it made zero difference, they would FedEx the documents, but they don't.


The point of the story was that if you have something critical to communicate, you can't do it using the same methods that are also used for low-priority mostly junk and expect it to be acted upon. Surely a mortgage servicing company can afford to FedEx an envelope. I wouldn't expect Influx to do so.


For that matter, registered mail is another option that isn't as expensive as FedEx... there are other options that are slightly more costly, and more noticeable, than sending via the same channel that junk mail goes through.

I expressly don't use automated payments for my mortgage and auto loan(s) as I don't want to have an account miss... I didn't know my mortgage was sold/changed a couple months ago until I went online to change. The same happened with an auto loan a few years back as well.


>I love the scream test, but the analogy you bring up actually - this seems unfair. The cost of Fedex'ing everyone is astounding

I wouldn't be surprised if the cost difference is negligible, maybe a couple bucks per mortgage? FedEx bulk discounts can be pretty massive


The price doesn't matter in this analogy since it still mostly works


> scream test

I didn't realize there was an actual name for this, so I'll add another for the thread that I haven't seen brought up: email is evil in operations (EIEIO, like the nursery rhyme)


I think the analogy is to station people at various floors in a big office building while you disconnect (unmarked) cables in a legacy patch panel to see who starts screaming. Then you plug it back in and label the cable...


Or the more general version, where you shut down a server/service and see if anyone screams. If it's still silent after $days, you can decommission it.


Something similar happened to me when I lived at a condo in Boston and the management company changed and I somehow missed the memo.

I kept sending checks to the old management company for a few months before the problem was discovered. Unfortunately, someone at the old management company apparently had a bit of a gambling habit and petty-thefted that money away, and I never saw it again even though she lost in a lawsuit.



Was the employee the defendant, not the company?


In this case, yes, it was the employee; I forgot the details as to how/why that was


You know, I don't really care that mortgages get sold, but the fact that I need to go out of my way every time to make sure I update my information in their system is astounding.


Technically that shouldn't even be your problem.


Neither should be a random third party defrauding a bank, but banks are masters at outsourcing work and responsibility onto their customers, and that's why it's called "identity theft" and suddenly I am at fault somehow.


I even (sort of) do this when I'm deprecating something which my team is the only user of, because sometimes it's hard to tell if something's really unused! First shut off the VPC access while leaving all the other infrastructure and data intact, wait a week or two and see if everything breaks, then get rid of everything else


There is another variant of this: if you can show that the code you are deleting never worked, there is no need to do a scream test. That is, if anyone cared about the code you are deleting, they would have already been screaming.


I would be careful with that. Maybe they did scream, but you haven't heard it, and they worked around the issue. Or maybe they did their workaround without saying anything. Or maybe you're wrong about your code not working. It actually may be working in some way that you don't know of, but is useful to someone.

To use an ecosystem analogy, once you expose your software to the world beyond your own dev environment, even internally, you'll eventually find that something colonized it - much like everything on this planet that isn't being actively and regularly scrubbed.

In my own career, I've seen cases of this. For example, once we were tweaking a little embedded database that supported a half-finished feature meant for internal use, and only then we (as in everyone in the dev team) learned that somehow, the QA & deployment support people got wind of it, and were scripting against exposed parts of that DB for a good year. And, it turns out, it wasn't the only part of the software that we thought of as incidental phenotype (or didn't think of at all), and the other team considered stable behavior.

See also the so-called Hyrum's Law: "With a sufficient number of users of an API, it does not matter what you promise in the contract: all observable behaviours of your system will be depended on by somebody."


Re: Hyrum's Law: this is why user-facing code should have (at least) two classes of tests: 1) is it doing what the developer intended it to do? 2) is it doing the same thing it did on the last release version with typical user requests?

That sounds the same, but it is not.

The first class is a set of simpler "happy paths" of intended specific behaviors.

The second class is like wargaming. A good way to do this is to replay user requests against your API and see that they return the same results release to release. You may also uncover interesting unintended behavior / conversations to have with users this way.
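
A minimal sketch of that second kind of test - replay previously captured user requests against the last release and the candidate, and diff the responses (the per-line JSON capture format and URLs here are made up for illustration):

  import json
  import urllib.request

  def send(base_url, req):
      """Re-send one captured request to a deployment and return the response body."""
      body = req.get("body", "")
      r = urllib.request.Request(
          base_url + req["path"],
          data=body.encode() if body else None,
          method=req.get("method", "GET"),
          headers=req.get("headers", {}),
      )
      with urllib.request.urlopen(r) as resp:
          return resp.read().decode()

  def replay(capture_file, old_url, new_url):
      """Return the captured requests whose responses differ between the two releases."""
      regressions = []
      with open(capture_file) as f:
          for line in f:
              req = json.loads(line)          # one captured request per line
              old, new = send(old_url, req), send(new_url, req)
              if old != new:                  # naive diff; a real harness normalizes timestamps, ids, etc.
                  regressions.append({"request": req, "old": old, "new": new})
      return regressions

Anything that shows up in the diff is either a regression or one of those "conversations to have with users".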


Clearly, it's easier to say sorry than do things right up front


LOL, have they even said sorry though?


Sounds like the most cost effective way, however.


Wow, that's pretty pathetic and your attitude "we can't help our customers" is even more damning. Email is not reliable enough to simply rely on a few email blasts for this.

I would expect:

* Those 3 "email blast" notifications. I'm guessing one of two things happened here:

  * You sent them as an "email blast" from a marketing-type email service. These hit email filters because they came from a known spam IP.

  * You sent them as a transactional email, but blasted them too quickly and got pegged for spam. Never hit the inbox.
* Increasingly common "you haven't migrated" emails if you still detect traffic on these instances. This is pretty critical since some companies might not realize they are affected. They should, but things get complex.

* Ideally, an automated transfer to another region with automated forwarding. It's okay to have poor performance, but it's not okay to go "poof" entirely.

* A soft-delete at the deadline, with 90 to 180 days to finalize migration. If this is costing you dearly, then drive prices up, but don't hard delete data.

Frankly, the last one is the real issue. It's literally unbelievable that a database provider didn't soft-delete. Further, I would expect that you'd be able to migrate these to another region to get customers back up and running.


Another problem is that service providers frequently poison the email channel with important sounding engagement dreck and we are now conditioned to ignore it.

* Important: migrate to this new feature immediately or you risk missing out!!

Vs

* Important: migrate your data immediately or you risk losing it!!


Plus, many companies abuse what I assume is an exemption to spam rules by pretending that their marketing emails are “service related”.

Xfinity is the worst about this. They’ll send me a so-called service-related email exhorting me to download their app. Same with Capital One and their monthly emails asking me to turn on automated texts.


You worded that better than I could have.

I ignore most of my vendor's emails because they're simply trying to spam me at this point.


I was also thinking that salesmen will cold-call/email me at least three times before they give up. People with whom I have no business relationship try harder than a corporation taking someone's money.


Welcome to the modern economy. This applies to B2C and, as we can see, B2B too. Getting a new customer is all the shitty vendors care about. Keeping existing customers is apparently just a cost.


I went to a conference in 2018, gave out my work email. Still get pestered by them


It's crazy the last email blast was ~6 weeks ago.

And the shutdown isn't even at a month/quarter end, so it seems even less like a billing-cycle thing.

There are so many more mature ways to do a graceful paid-service shutdown. Disable reads first so people get errors & contact support. Then disable writes as well. Somewhere a few weeks later you can consider deleting data.
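
As a sketch, that kind of staged decommission can be driven by nothing more than a dated phase table that gates each capability, with deletion as the very last step (dates are invented, and the ordering follows the reads-first sequence above):

  from datetime import date

  # Hypothetical decommission schedule for a region being retired.
  PHASES = [
      (date(2023, 5, 1),  {"reads": True,  "writes": True,  "data_retained": True}),   # notices only
      (date(2023, 6, 1),  {"reads": False, "writes": True,  "data_retained": True}),   # reads fail -> support tickets
      (date(2023, 6, 15), {"reads": False, "writes": False, "data_retained": True}),   # fully dark, data still exportable on request
      (date(2023, 7, 31), {"reads": False, "writes": False, "data_retained": False}),  # only now delete anything
  ]

  def region_state(today: date) -> dict:
      """Return which capabilities are still enabled for the region on a given day."""
      state = PHASES[0][1]
      for start, caps in PHASES:
          if today >= start:
              state = caps
      return state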

As others have said, I've worked at megacorps with internal systems that had more mature migrations than this. Honestly I have run internal apps at sub-1000 employee firms where we did various forms of scream tests and soft deletes before MONTHS later, even daring to delete data on disk.


> Ideally, an automated transfer to another region with automated forwarding. It's okay to have poor performance, but it's not okay to go "poof" entirely.

If the data is moving between countries then this is not an option. Your clients may have legal or contractual obligations with respect to data location.


We are talking about the EU here. Apart from some rare archaic remains of past rules, or genuinely "national security related" data (and for that I don't think anyone is using a cloud provider), clients should be free to store their data in any country inside the EU common market.


From the sounds of it, it was impossible for them to keep data after the 30th of June, so not sure a soft-delete at the deadline would work. Of course the better option probably would've been a soft delete before the deadline, but you know that some customers would've said "why'd you do this before the deadline, you could've waited an extra month!"


I’m certain they wouldn’t. They have no influence over it anyway. The only thing they can do is migrate.

Many people don’t have that option now since their data is dead and gone.


This is insane.

> We notified everyone via email on February 23, April 6 and May 15th. We also offered to help migrate all users. I realize that it's not ideal that we've shut down this system, but we made our best efforts to notify affected users and give them options to move over to other regions.

What other communication methods were attempted beyond just emails? Big, red obnoxious banners and warnings in various UIs? Phone calls?

Given that it seems as though quite a few customers didn't get your emails, what was the thought process when looking at the workloads that were clearly still active before nuking it from orbit? Or was there no check, and it was just assumed that people got the email and migrated?

Of the customers who were in that region, how many actually migrated? Was someone tracking these statistics and regularly reporting them to leadership to adjust tactics if there weren't enough migrations or shutdowns happening?

This screams either gross incompetence or straight up negligence. This is such a solvable problem (as many here have already mentioned various solutions), but I'm honestly just flabbergasted that this is a problem that is even being discussed here right now.

As a DBaaS, the data of your customers should be your number one priority. If it's not, y'all need to take a hard look at what the heck your value proposition is.

We weren't impacted by this directly, but you can be sure that this is going to be one of the topics for discussion amongst my teams this week. Mostly how we can either move off InfluxDB Cloud or ensure that our DR plans are up to date for the rug being pulled out from under us by you guys in the future.



It says "The UI was updated with a closure message for these regions."

Depends on where and how this message was added.

It also means that they had no monitoring in place to see how many people migrated.

Edit: They also say that this is reflected on the status page. Here's how their page looks: https://i.imgur.com/xlO4Ik2.png

Yup. It's literally a green status page that no one would give a second glance. That unreadable white on green? Oh. It's a deprecation message. It even has a subscribe link so that people would immediately and completely dismiss it as an ad due to ad/banner blindness.

Edit 2: Someone replied in the thread and added more context for the absolute lack of communication.


Yeah, why is the deprecation message in green? Why not in red? Who is picking colours over there and with what criteria?


Email only is not even close to best effort. I know it’s standard to only do email for tech companies, but all other types of companies usually do physical mail and phone calls on top of emails for important notifications.

I am not a customer, but it’s really annoying me how tech companies repeatedly think sending emails is somehow anything but the absolute minimum, most lazy option.


However, tech companies will often not have your physical address--unlike your bank. And I'd probably block phone calls from some tech company I was a customer of.


If they want business customers in Europe they need to create proper invoices, which contain the physical address.


We get an email address because we need to contact our customers. After that we make best efforts but if people can’t respond to vendors they pay money to, we’re really at a loss. I realize that shutting down a region isn’t good. It’s not what we would have preferred, but we had to do it for the business. And we made an honest effort to contact all customers to help move them.


As a CTO, you should make yourself aware of what a scream test is

https://www.v-wiki.net/scream-test-meaning/

Basically, you just turn stuff off and make people scream, while it can still be turned back on. You could have done this a month ago, as a critical warning of the impending termination of systems.

You didn't do that.

Instead, you sent a few emails, which isn't even guaranteed delivery. Again, as a CTO, you should know that email delivery is not guaranteed.


You can't turn a system off if the customer paid for it. That's a breach of contract.

If you keep customer data after the payment period, you're losing money.

So a scream test will cost you money.

They decided that this cost was not an appropriate price to pay to offset the possible reputation loss.

Whether that's a good business decision or not, time will tell.


>You can't turn system off if customer paid for it. That's the breach of contract.

It sounds like they did turn the system off for paying customers though, why would it be any worse to do a scream test a few days before they pulled the rug out?

If it's just a matter of billing cycles (does everyone's billing cycle end on the same day?), it seems like they could've handled it better. Just give impacted users a prorated and shortened final month (or even give it out for free as goodwill).


Good customer service costs money.

It takes years to build a good reputation and minutes to destroy it.


In this case we’re talking minutes to destroy your reputation for all current and future customers. That must be considered worth it.


The comments made by the company CTO here are like the opposite of good crisis management.

"We did our best - we have sent 3 emails". I wonder how shitty the product was when sending few emails is their best.

The guy sounds like either a full-fledged VC psychopath or someone very inexperienced.


Ignore for a moment what you think your customers should have done, and look at the actual outcome. Some customers did not know about the shutdown and deletion, and have now lost data. You telling them "well you should have read your email" is not going to satisfy them, even if you think it should.

All you've done is told your customers that their data isn't safe with your service. This was an easily-avoidable "own goal" situation.


> All you've done is told your customers that their data isn't safe with your service.

And not only Influx's current customers, but also their future customers. I really like Influx for my homelab. But with this attitude, I would be really hesitant about a real-world production deployment.


What future customers? After seeing this astoundingly terrible behaviour for a company with "DB" in their main product's name, I can't imagine anyone ever making the decision to trust InfluxData again. I know I certainly won't, nor will any company I work for.


Taking the region offline and making the data inaccessible at the same time was a big wrong call. I won't rehash the other good suggestions here, but I would throw out there that you should have turned off all data plane APIs for at least a month before deleting a byte of data. Nothing gets customers' attention like everything suddenly failing.

I think the attitude that “you pay us money so you better read every email we send” is at odds with reality.

1) why do you believe a single human being has that email address? As a company of any size I would never assign a human to a vendor email address. Turnover and rogue employee risks are too high for that. Usually these vendor emails are black holes only used to establish the account and recover credentials if needed. Or, it ends up in the hands of a vendor relations person who is more an accountant than engineer. Do you get the emails from GCP and AWS directly in your inbox?

2) because I pay you money I expect the opposite of the relationship you articulated. I’m not here to read your emails. If it’s really important then use my account manager to contact me. Make a phone call. Email is for spam - ESPECIALLY when it’s from a vendor. I view it as “I’m paying you money, vendor, so you need to go out of your way to give me excellent service” not “I’m paying you money so your emails are incredibly important to me I hang on your every dispatch”.

What baffles me is why on earth did you guys delete the data? I get you couldn’t afford to run the region for whatever reason. But you should have retained all the data. Storage isn’t that expensive.


> After that we make best efforts but if people can’t respond to vendors they pay money to, we’re really at a loss.

Using billing contacts for this is a mess. The billing contact could be an accounts payable department that will check the invoice against the contract or PO and pay it. It could be an outsourced office that has no idea what a database is. It could be someone who only catches up on email once every few weeks.

What is isn’t in a technical contact who knows what the shutdown of a database means.


I like that Zoom allows you to provide a developer email, so as a developer, I only receive emails about API changes. I've never missed an email from them.


As a buyer I have come to expect good vendors to design systems so that mistakes (my team's or yours) don't cost me sleep or cost you business.[3] i.e.

- they do soft-deletes before hard

- have robust access control systems and partitioning - so we don't have to give everyone in the org full r/w access to the object model

- don't instantly nuke the account if a payment goes astray or is delayed - try to reach out to a point of contact before pulling the plug; payment systems can be messy for all sorts of reasons, ask before assuming the worst.

- customer managers who can connect a couple of times a year, which usually benefits the vendor as upsells happen on a good % of those connects.

- also small things like training, certification

- Deprecation of a service is handled slowly (1 yr would be expected) and in multiple phases, with multiple modes of communication.

Not all companies can move fast enough to plan and execute a major change in location like this in 4 months; at a bare minimum you would have to consider:

  - End customers (your customer's customers) may need to be notified and may need to sign off

  - Compliance and GDPR DPA changes - both end customers and internal ones

  - DR, BCP concerns have to be planned for , not all GCP regions are equivalent.

  - Documentation and certifications like SoC, ISO, PCI, HIPAA etc usually mean ton of paperwork to modify

  - SRE/DevOps may have to move other services along with telemetry on InfluxDB, may need network whitelisting from their customers; things typically break when moving, so you need to plan dry runs, rollbacks and so on.

A better way to handle service closure would be to shut down but not delete on the planned date [1], and offer data export separately for a few weeks/months after [2].

You can definitely do better than shutting down the service and deleting the data at the same time.

[1] I would do this for internal customers let alone external paying ones

[2] You could have even charged for this to offset any costs, most customers wouldn't have a problem paying if they really needed it.

[3] Not trying to imply InfluxDB is doing these things, or isn't a good vendor, these are some criteria I have come to measure new vendors by.


> don't instantly nuke the account if a payment goes astray or delayed

Hetzner deleted my server just one week after my payment due date. My credit card failed the payment for some reason. I didn’t notice this because I was ill with Covid. They sent me one email (or at least, I received only one email) as a warning. I only realized the server was gone when my services stopped working. I’m not sure if such a short warning time is common practice among hosting companies, or if it’s unique to Hetzner.


I've had the exact same experience with them. After 10 years using that server, one payment failed, about ten days and they nuked the machine. German efficiency I suppose.


I had an almost identical experience with https://virmach.com/. I will never recommend them.

After 5 years, they deleted everything 2 weeks after the first payment failure.

Sure it was the cheapest VPS. But still, you don't just delete your customers' data.

I was away from emails and the service during those two weeks. As far as I can tell it might have been some race condition in their payment processing system. They couldn't figure it out. They had no backup. They refused to reinstate the service anew so I could restore my own backup.


It doesn't look too far fetched from their point of view, they saw a payment failure and they may have assumed that you decided to stop paying and didn't bother to send a cancellation request.

It looks like you had your own backup, which is always a good idea, hopefully you were able to restore your data elsewhere.


I've been 3 weeks late for a Hetzner payment (also for medical reasons) more than once, and my servers are still running. They sent several emails, one was a reminder to pay and another was a warning about what date they would shut down service. So I guess their notice system isn't as straightforward as one week for everyone.

Perhaps it's because I pay for several bare metal servers, or because I have a business account with them. Perhaps it's because I pay their invoices by bank transfer manually instead of by credit card. Who knows! You have made me wary of changing to a credit card now, because those do fail from time to time!

What worries me more is Hetzner's reputation for suddenly dropping customers with no warning and no way to retrieve data from the servers. That's always on the back of my mind.


> Hetzner deleted my server just one week after my payment due date.

Strange. I've been at least eight days late with a VPS payment at Hetzner (3 euro) and the server is still up.


Some companies don't mark you as a debtor if you are under a certain threshold (say 10 dollars), because the cost of processing this unpaid amount is not worth their time.

Also, it is smart to have the threshold set to at least 1 cent, because this way you don't ask someone to pay you a fraction of a cent due to some rounding error. There are those stories where a company sends you registered mail asking you to repay a fraction of a cent - which is impossible. The cost of the letter (snail mail) alone makes it not worth it. Even if you send an email, which is "free", you can't pay 0.0001 cents. I mean, you can pay a whole cent and then ask to get 0.9999 back, but the time required by the bookkeeper to process it and then pay it out (probably with a fee) is not worth it.


> After that we make best efforts but if people can’t respond to vendors they pay money to, we’re really at a loss.

I can empathize with this, but also would expect a good product organization to consider failure modes here and work around them.

Did anyone consider that bob@company.com left months ago, but since autopay still works, no one considered potential problems?

Did anyone consider Bob in accounting is paying the bill but ignores email that doesn’t have “balance due” in it?

… and a million other scenarios that are quite likely and need consideration.


> Did anyone consider that bob@company.com left months ago, but since autopay still works, no one considered potential problems?

Everyone knows it happens, meanwhile every single company with high turnover is like this (those I've had personal encounters with):

Datadog: our domain has changed, but I cannot change my login. I've changed the email address in my profile, but I'm not sure if my login (which is an email address) is just a name or it may be used as an email address in some context.

Intuit: good luck changing your name

Apple Developer: still addresses the account as Bob No-Longer-Working-Here. It's not very clear how to change that name.

Apple ID: no, you cannot change the email address that had been primary back when you created it. And it better be a valid email address.

Orange: my address has changed twice, they are aware of that, and they swore they updated my address everywhere; the invoices still come with my old old address in their headers despite everything. Good thing they at least send them electronically, so I do receive them.

You likely can change the data there if you really need to, but it's very involved.

Someone should tell the IT/CRM drones that sometimes people not only leave the company, but also get incapacitated or die. In their Teletubby universes it doesn't seem to happen to anyone ever.


Wow.

You've literally just told the world "you shouldn't rely on us for your data. When our business needs to drop you, we will and you might not receive notice."


I find your tone here quite condescending. We've never received those mails you've mentioned, and you make it out like it's our fault that we didn't react. I mean, you managed to send us marketing mails in April and failed to mention you're gonna discontinue the service. So much for honest effort...


> ... if people can’t respond to vendors they pay money to, we’re really at a loss.

You're hurting your company's reputation by denigrating your customers like that


I know that this is a stressful time and it’s all hindsight, but there are two different contact methods that don’t rely on email available to you in migrations like this:

Going read-only, waiting 2 weeks, and then deleting. The contact method is peoples’ alerting systems as writes stop working.

Putting a message on your service dashboard indicating the upcoming action. The contact method is exactly what it sounds like, and it’s the only other place you can stick text and know for sure all your customers can access it.

It will probably help customer relations if you don’t hide behind the defense of only having email - there are a few strategies for this that you can use in the future. Best of luck on the road ahead - I know this must be a particularly stressful time.


>After that we make best efforts but if people can’t respond to vendors they pay money to, we’re really at a loss

This is going to sound counterintuitive, like the Birthday Problem or Bayes' Rule, but at least for me it's true: most of the spam that gets into my inbox is from vendors I have a relationship with. Email isn't always ideal. Did you consider any other methods, like turning off writes a day or so before reads were disabled? That would trigger a much more immediate "oh shit" response than an email (unless the subject line is super clear and informative, and the email doesn't go to my spam folder).


> if people can’t respond to vendors they pay money to, we’re really at a loss

No, you're legally obliged to keep the service running. They are paying customers and even if you can't reach them through email there are other means of communications. If a business fails on account of your one-sided deleting of the data then you're going to be in for a very hard time, for instance a damage claim for gross negligence and breach of contract. This isn't just going to blow over. The onus for reliable communications is on you and if the channel you've got fails then you seek another one.


I once worked for a company that after a merger had a bank account running just to pay services it didn't know what they were for.

If you grow, have a merger, people moving, it is easy for email addresses to no longer be read. Yes there are best practices to prevent this, but most companies I have seen don't do that.

In one company important emails went to the email address (private!) of the founder, who left after M&A.


Well I guess all the bad publicity was worth it for the "business"


Should get a phone number too. Shit like that happens. Next time lay down the law to the marketing drones from UX that want to "reduce friction".

Explain carefully so their thick minds will understand that NOBODY is lazy enough to quit subscribing to your service just because you added an additional field to your onboarding.


> Should get a phone number too

Which most people would be reluctant to provide because everyone hates sales spam, and what else would a SaaS need your phone number for in regular times (impending data deletion is a good one)? On HN, making a phone number mandatory for signup is regularly criticised.


A UDP-style email contact is a very unreliable and careless way of communicating.

Please make sure you can implement TCP-style communication with your customers for this kind of critical change.
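
i.e. something like this entirely hypothetical acknowledgement tracking, where a notice keeps being "retransmitted" over louder channels until the customer explicitly confirms it:

  from datetime import datetime
  from typing import Optional

  # Hypothetical escalation ladder, loudest last.
  CHANNELS = ["email", "in-product banner", "account manager phone call"]

  class ShutdownNotice:
      def __init__(self, customer_id: str, deadline: datetime):
          self.customer_id = customer_id
          self.deadline = deadline
          self.acknowledged_at: Optional[datetime] = None
          self.level = 0

      def acknowledge(self) -> None:
          """Called when the customer clicks 'I understand' in the console - the 'TCP ack'."""
          self.acknowledged_at = datetime.utcnow()

      def next_action(self) -> Optional[str]:
          """Keep escalating until the notice is acknowledged."""
          if self.acknowledged_at is not None:
              return None                                      # ack received, stop retransmitting
          if self.level >= len(CHANNELS):
              return f"suspend writes for {self.customer_id}"  # last resort before the deadline
          channel = CHANNELS[self.level]
          self.level += 1
          return f"notify {self.customer_id} via {channel}"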


At my company we used to only ask for email address. No names, no phone numbers, no mailing addresses. Because we understood that technical people don't like spam and don't like to give out their data. So we don't ask for them.

We also didn't send any reminders for them to check whether their email address was up to date. No account update reminders. To prevent annoying people with spam.

So other than sending emails and hoping that they read it, there was nothing else we could do.


> So other than sending emails and hoping that they read it, there was nothing else we could do.

But there are other ways. You can put up a big red popup that can only be dismissed by typing "i agree" when the customer logs in, you can put the service into read-only mode, and even with email you can send daily reminders for the last 30 days with a subject like "your data will be deleted in 21 days", etc. There are so many things that could have been done.


What we sold was software that customers deploy locally. We don't have any of their data. But the software would stop working if their license is no longer valid, resulting in downtime. That already made people angry enough.

Now we have changed it so that the software never turns off even if the license has expired (though it will continue to nag an email address). Updates also cannot be installed.


Why do customers believe that they don’t need to read their emails?


I missed an email that a train I was scheduled for had changed schedules on a trip earlier in the year. Why? Because Eurostar sends me maybe weekly email marketing messages that get filtered to one of my Google tabs because I maybe take them every couple years. It's probably unreasonable to expect that I'll see a reasonably last minute update though I'm not sure what a good alternative is.

I get probably 100+ emails a day that hit my inbox in some form and occasionally fairly important ones get mixed in with the mostly dross though Gmail does a pretty good job overall.


At least you got a notification your schedule had changed… I was just 5-6 hours late.

Had the same issue with Amazon though. In the flood of "Information about your order" emails, one had slightly different content (but the same subject): "We haven't received your entire return. Please contact us in 14 days or we'll trash it and charge you."

When I contacted them a month later I was not very pleased.


Because the suppliers have abused this form of communication to the point that it's no longer useful for serious communication. I can't read your 1,000 marketing emails to find the one single important service-related one.


Lmao are you serious? What about emails buried in spam? What if contact x left the company and the emails are black holes? There are a million valid reasons for emails to go poof. "But we emailed you" is weak.


You actually need to check your spam folder, and if a company didn't bother to transition an employee out properly (i.e. figure out what their email address was attached to), why is that on the supplier?

Why do they need to move mountains so that you can avoid any seriousness about your own operations?


If a vendor can't properly notify me of major changes, I'm going to find a different vendor.

I have far bigger fish to fry than monitoring my inbox for shitty practices.


Because their reputation is the most important asset a cloud provider has. You're asking customers to run their business on your computers, after all.

Deleting their data first and then complaining that your customers don't run a serious enough operation is not the way to keep the best reputation.


Because I’m the customer.

> Why do they need to move mountains so that you can avoid any seriousness about your own operations?

You won’t stay in business for long with that attitude.


I guess you have never worked directly with a client, and I hope it stays that way.

This kind of attitude will ensure you lose your customers.


This attitude you have is not appropriate for a vendor. Clients do not care that you think your substandard practices are fair, they will find someone else with actual "seriousness."


I get slammed with emails. In fact, nearly all of my email is automated content or junk. I would hope that these emails would catch my attention, but I can easily see how I'd miss them.

Further, they might be going to some alias/group that's not frequently monitored. If a vendor is going to delete all of my data, I expect way more noise than 3 random email blasts.


Because it got marked as spam?

Because it's buried among lots of other similar-looking emails that are just marketing garbage?

Because the email wasn't delivered?

Because the customer was on vacation and the sender expected a response very quickly?

There are plenty of valid reasons to miss an email.


Because if a company has 20,000 engineers, which one is the one that gets the vendor's email? The answer is usually none, and email to that address goes to /dev/null.

Or suppose an employee did have their email on file and then left the company.

Or suppose people assume vendor emails are spam, because they almost always are.


What? Each and every one of our suppliers has a dedicated address their notifications are sent to. Those automatically go into the service desk as tickets and are read by the service desk team, which can either correct billing information if required, or escalate to the correct team if action is required or there's any doubt about the content of the email.

If you have 20,000 engineers (or even 200) you have a functional service desk, and I assure you that no individual engineer's email is given as the contact address for vendors. Even for large contracts where you have a preferred contact on each end, there's an escalation path.


Congratulations, but this is not how it works in many places. If you're providing services to companies you have to account for that.


Really? I mean, good on you, but I've never seen such an arrangement in my 30 years of working in various megacorps. Usually these contacts end up in the hands of vendor relations, but more typically in no one's hands, and the expectation is that the vendor works with us through TAMs. Most companies have their vendor relationship model based around negotiated licensing agreements and software delivery, and the SaaS delivery model is fairly recent. At a certain scale and age these things are pretty hard to change, so things like this aren't well accounted for. It gets more complex when we have a federated model where we have one global relationship with the vendor but teams use the SaaS individually. Then the email address on the account is subsumed either in some automation or in an onboarding process used to ensure no engineer has the ability to reset credentials unilaterally.

Your model is a smart one. It's smart enough that it tells me you're either a small company or a newer company, or both, or one of the rare companies whose vendor management team has its act together.


Are you trolling, or have you never used email?


[flagged]


Could you please stop posting unsubstantive comments and flamebait? You've unfortunately been doing it repeatedly. It's not what this site is for, and destroys what it is for.

If you wouldn't mind reviewing https://news.ycombinator.com/newsguidelines.html and taking the intended spirit of the site more to heart, we'd be grateful.


[flagged]


You can't attack another user like this here. If you'd please review https://news.ycombinator.com/newsguidelines.html and stick to the rules when posting here, we'd appreciate it. Note this:

"Please don't post insinuations about astroturfing, shilling, bots, brigading, foreign agents and the like. It degrades discussion and is usually mistaken. If you're worried about abuse, email hn@ycombinator.com and we'll look at the data."

Plenty of past explanation here:

https://hn.algolia.com/?sort=byDate&dateRange=all&type=comme...


> we made our best efforts to notify affected users

You call three emails (the last of which was almost 2 months ago) "best efforts"?

I had to read your message three times because it is so reality-defyingly preposterous that I assumed I must have missed something. How about warnings on the dashboard? How about an intentional error (or a limited service interruption) so that people would log in to their dashboard?


> How about warnings on the dashboard?

They did have a warning on the dashboard; the problem is that a lot of people don't check the dashboard because they don't need to, as they just view everything through Grafana, etc.

They also had a notice on the status page.


Getting those people's attention would be what the intentional errors are for.


Hi Paul, email is one-way communication and not guaranteed to be delivered. At a minimum you should have monitored who did and did not respond to the emails with some kind of action, and expended more effort to reach those who did not. Finally, you should have kept the data for a reasonable amount of time (say 90 days) post-shutdown so users who did not get the notification could download it. What you've done is super rude, and if I were still a customer in an unaffected region it would definitely be reason enough to leave, because it's pointless to sit and wait to see how you'll deal with my data when the time comes. Better to preempt that and leave while I still have control.


Paul, are you actually for real right now? Did you really just say "We deleted all your data, and it's your fault. We did whisper into the wind three times, you should have heard it. No, there is no chance of recovery"?

You might have literally deleted whole businesses: companies that employ real people, people with families, who now need to figure out how to carry on. Not least of which, your own. If the company survives until Christmas I will be shocked; no one can trust your company ever again. Your core business is storing other people's data, and you deleted it, for many, completely without warning.

I guess people still use Mongo even after finding it doesn't achieve any property of the CAP theorem, maybe some people will keep using a database provider with a track record of intentionally deleting their paying customers' data.

There just aren't enough adjectives for astonishment to adequately describe this situation.

I hope you offer Jay Clifford some support; he's clearly been put in the awful position of having to explain the decisions of others and deliver terrible news. If I were him, I would be in need of serious mental health support. This is an absolutely awful thing to be responsible for without any ability to rectify it.


Contrary to the majority of the thread here, I find this to be an architectural issue. For whatever reason the system was designed without a way to communicate important service and maintenance issues to the customer. That’s part of the good architectural design of a system – it must include human factors, communication among them.


I’ve worked at companies with fewer than 10 people that aren’t even in the tech sector, and brownouts were SOP.

This is just regular old incompetence/negligence/greed.


Multiple comments in the linked issue report not receiving an email.

Did you use the same email you use for spam/"marketing" for this notification?

The correct course of action is to shut down the service and give people time to fetch their data, not to erase the data as the first indication of shutdown.

A few emails are not sufficient if the end result is data loss, and a comment in documentation or release notes is not sufficient either (the only reference at least one person in the linked issue found).

Truly mind-blowing behavior.


Former Belgium user here. Checked my inbox, no emails from Influx since June 2022.

Then again, I was only using the free tier, so I guess I got what I paid for.


Why did you feel the need to send 3 emails and not just 1? Is it because you find email not reliable enough?


> I realize that it's not ideal that we've shut down this system

Not ideal???

You backed up everyone's DB and moved that to another region so they can just restore and change DB endpoints, right?

I don't believe that someone along the chain didn't suggest a scream test or similar. If they did, they must have been ignored.


If you are responsible for this the very least you can do is own up to it and apologize.

Trying to assert that you were doing what you thought was right only presents the image that your company is run poorly.

The correct thing to do is to admit that your best efforts were not aligned with best practices, and look into remediation.

Not “well, we tried”


You could have just responded with ¯\_(ツ)_/¯ and saved a lot of typing.


Lol, great attitude. Why would anyone pay money to any company you manage, now or in the future, if you deliberately trash user data and justify it with 'but we emailed you a couple of times first'?


Then the title is misleading. Not everyone saw the email, many expected more, e.g. a scream test.


Three emails are not "best efforts". 4.5 months notice is not "best effort".

My opinion on best effort: I founded, ran, and sold a SaaS company used by some of the most well known companies in the world. Our "best effort" was a minimum of 12 months notice, with a six month grace period afterwards. Emails weekly. Phone calls at least once a month. Reach out to customer leadership if no response. Then scream test as others suggested.


How many times have you said InfluxDB is about managing the data lifecycle?

It is astonishing that you have literally completely ignored one of the primary USPs of your product.


Judging from this link, if that's your best effort, I'm afraid to know what it's like when you're slacking :)


I should not have to say this, but "best efforts" is not enough and is very offensive to every user still relying on your services. You had a duty of results, not merely "best efforts", to reach every single one of your active users before hitting shift-delete on their data.


You did enough to help, just ignore these ungrateful whiners


Unfortunately we’ve been bitten by Influx operational issues a few times before. We adopted InfluxDB a long time ago and always had to deal with breaking changes on each upgrade, and every time we had an issue their answer would be: upgrade to the latest version and see if it persists.

Then recently they made a change to Telegraf that broke all our data collection, because they changed the environment variable replacer and their Jsonnet parser broke.

Now this. Shutting down a region with nothing but emails, and no brownouts, is not operationally acceptable.

We moved off InfluxDB a while ago and only rely on Telegraf now.


We self-host InfluxDB and have never had this problem.


Which problem? Of the massive breaking changes between 0.8 and later, and then between 1.x and 2.x? Not to mention InfluxQL to Flux?

Also, they did remove clustering in the open source version which was a very poor move from a PR perspective. And in my view, they have never recovered from it - years ago it was Prometheus vs InfluxDB for (non-SaaS) observability metrics, nowadays the only question is which backend for Prometheus to choose.


And now they've released 3.0.0 in their cloud, which they claim is backwards compatible, but let's see.

I sometimes wonder if vendors realize that they put their customers into a buying mode when they do this, when our options are:

- Upgrade to the new version of product X.

- Change to vendor or tooling completely as we're already changing everything.

We might pick another system if we feel like it is more stable.


If it's backwards compatible, why would they bump the major version?

Either there was a breaking change in there, or they don't understand semantic versioning.


Not sure why the downvotes here; a database service misusing semantic versioning is itself a red flag in my opinion. If major releases don't indicate a breaking change, I'm not as confident about what might be part of a minor or patch release.


To be fair, they have a bunch of associated tooling like their query language and UI, so it might have been those that have merited a bump to 3.x


Yeah, that could be, but I think that would still only merit the bump if there was a breaking change in the public API.


> they did remove clustering in the open source version which was a very poor move from a PR perspective. And in my view, they have never recovered from it

I still remember this. We were ready to standardize on InfluxDB when we got a taste of their business practices.


Same, we were just about to choose InfluxDB, and ended up using it only for a niche, low-criticality scenario (VMware vSphere metrics, mostly for troubleshooting). We were never going to purchase Enterprise though, so they haven't lost anything outside of mindshare and champions, which can be valued at anywhere between 0 and infinity.


Which Prometheus backend would you recommend?


VictoriaMetrics. I don't have experience with InfluxDB, but I did a rudimentary evaluation of the popular backends. VictoriaMetrics stood out mainly due to its comparatively low operational maintenance.


At AWS, the hierarchy of service priorities is crystal clear: Security, Durability, and Availability. In that order. Durability, the assurance that data will not be lost, is a cornerstone of trust, only surpassed by security. Availability, while important, can vary. Different customers have different needs. But security and durability? They're about trust. Lose that, and it's game over. In this regard, InfluxDB has unfortunately dropped the ball.

Deprecation of services is a common occurrence at AWS and many other tech companies. But it's never taken lightly. A mandatory step in this process is analyzing usage logs. We need to ensure customers have transitioned to the alternative. If they haven't, we reach out. We understand why. The idea of simply "nuking" customer data without a viable alternative is unthinkable.

The InfluxDB incident brings to light the ongoing debate around soft vs. hard deletion. It's unacceptable for a hard delete to be the first step in any deprecation process. A clear escalation process is necessary: notify the customer, wait for explicit acknowledgement, disable their APIs for a short period, extend this period if necessary, soft delete for a certain period, notify again, and only then consider a hard delete.
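
As a rough illustration only (not InfluxData's or AWS's actual process, and the waiting periods are arbitrary), such an escalation ladder can be made explicit in code so that a hard delete is structurally impossible to reach early:

  # Sketch of a deprecation ladder: hard delete is the last rung, never the
  # first, and each rung has a minimum dwell time. All numbers are examples.
  from enum import Enum

  class Stage(Enum):
      NOTIFIED = 1        # emails, banners, status page
      ACKNOWLEDGED = 2    # customer explicitly confirmed
      API_DISABLED = 3    # short, reversible brownout
      SOFT_DELETED = 4    # hidden but restorable
      HARD_DELETED = 5    # irreversible

  MIN_DAYS = {Stage.NOTIFIED: 30, Stage.ACKNOWLEDGED: 14,
              Stage.API_DISABLED: 14, Stage.SOFT_DELETED: 30}

  def next_stage(stage, days_in_stage, acknowledged):
      if stage is Stage.NOTIFIED and not acknowledged:
          return stage                  # escalate notifications, don't advance
      if days_in_stage < MIN_DAYS.get(stage, 0):
          return stage                  # dwell time not served yet
      return Stage(min(stage.value + 1, Stage.HARD_DELETED.value))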

The so-called "scream test" (https://www.v-wiki.net/scream-test-meaning/) is not a viable strategy for a cloud service provider. Proactive communication and customer engagement are key.

This incident is a wake-up call. It underscores the importance of data durability and effective, respectful customer communication in cloud services and platform teams. Communication is more than three cover-your-ass emails; it's caring about your customers.


> Security, Durability, and Availability. In that order.

The ordering of security and durability very much depends on the needs of the customer.

Some data is vastly more valuable to malicious actors than it is to you, e.g. ephemeral private keys. If lost you can simply replace them, but if (unknowingly) stolen it can be disastrous.

Other data is vastly more valuable to you than to malicious actors, e.g. photos of sentimental events.


> At AWS, the hierarchy of service priorities is crystal clear: Security, Durability, and Availability. In that order. Durability, the assurance that data will not be lost, is a cornerstone of trust, only surpassed by security. Availability, while important, can vary. Different customers have different needs. But security and durability? They're about trust. Lose that, and it's game over. In this regard, InfluxDB has unfortunately dropped the ball.

Interestingly, this is also how I'd allocate tasks to new admins. Like, sure, I'd rather have my load balancers running, but they are stateless and redeploy in a minute. The amount of damage you can do there in less critical environments is entirely acceptable for teaching experiences. Databases or filestores though? Oh boy. I'd rather have someone shadow for a bit first because those are annoying to fix and will always cause unrecoverable loss, even with everything we do against it. Hourly incremental backups still lose up to 59 minutes of data if things go wrong.

> The InfluxDB incident brings to light the ongoing debate around soft vs. hard deletion. It's unacceptable for a hard delete to be the first step in any deprecation process. A clear escalation process is necessary: notify the customer, wait for explicit acknowledgement, disable their APIs for a short period, extend this period if necessary, soft delete for a certain period, notify again, and only then consider a hard delete.

Agreed. At work, I'm pushing that we have two processes: First, we need a process of deprecating a service and migrating customers to better services. This happens entirely at a product management and development level. Here you need to consider the value provided for the customer, how to provide it differently - better - and how to decide to fire customers if necessary. And afterwards, you need a good controlled process to migrate customers to the new services, ideally supported by customer support or consultants. No one likes change, so at least make their change an improvement and not entirely annoying.

And then, if a system or an environment is not needed anymore, leadership can trigger a second process to actually remove the service. I'm however maintaining that this is a second process which is entirely operational between support, operations and account management. It's their job to validate that the system is load-free (I like the electrician's term here), or that we're willing to accept dropping that load. And even then, if we just see a bunch of health checks on the systems by customers, you always do a scream test at that point and shut it down for a week, or cut DNS or such. And only then do you drop it.

It's very, very careful, I'm aware. But it's happened 3-4 times already that a large customer suddenly was like "Oh no, we forgot thingy X and now things are on fire and peeps internally are sharpening knives for the meeting, do anything!" And you'd be surprised how much goodwill and trust you can get as a vendor by being able to bring back that thing in a few minutes. Even if you then have to burn it to turn up the heat and get them off that service, since it'll be around forever otherwise.


Wow, the incredibly callous 3-word explanation of the issue by pointing to a docs link with no other context. Really gives off "it's your fault for not reading the wiki." Is this how InfluxDB treats their customers?

Incidentally at work we've been evaluating a new hosted observability provider, looks like we can rule out Influx as an option.


> Really gives off "it's your fault for not reading the wiki." Is this how InfluxDB treats their customers?

I don't see any indication that the person who posted that is associated with InfluxDB. In fact, it doesn't seem like any staff member has posted in that forum in the past week. Up to you if you consider that better or worse.


It's the weekend, and I assume the people designated to answer questions on the forums don't work 24/7.

It could be them not sending the notification early enough or at all, it could be the notice getting stuck in spam or something, or it could be the person complaining not reading their emails.

I wouldn't jump to conclusions here.


I used Influx Enterprise at a previous startup. Support was so bad that after our first year we switched back to the OSS Influx and added HAProxy with manual replication and round-robin load balancing. It was so much smoother; wish we had done that from the start, since that was our original plan.


5% interest rates are breaking the tech companies. If you are dependent on a SaaS service for your infra, ensure that it is either

  - self-hosted
  - provided by a big deep-pocketed cloud infra
Otherwise, the service might shut down with 30 days' notice or so.


Hard-to-reverse actions need multiple safety switches: for example, turning off the machines in that region for two weeks before deleting them, which would surface support issues ahead of the no-going-back step of deleting data.


So many easy ways that this could have been avoided. sigh

- Phone calls

- Scream tests

- Monitor services still in use. Contact these customers individually

- ...

Not a single individual said "Gee, people are still using that DC, should we really destroy it?"

Either this shows Influx is really naive and inexperienced or... they are in deep trouble cash-wise and were working in panic mode to cut costs.


> According to the support, the notification emails to the users were sent on Feb 23, Apr 6 and May 15th. However, we did not receive those at all.

If true, this is concerning. One message getting lost in spam is understandable. But three over 6 months would imply they're being blacklisted and/or their mail sender is simply broken.

Do serious companies not have canaries or other checks in place to ensure their notifications are correctly delivered to customers?
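
A delivery canary is not hard to build, either. A sketch (the hosts and credentials are placeholders; a real check would also inspect the spam folder and bounce reports):

  # Sketch of a delivery canary: the announcement also goes to mailboxes we
  # control at the major providers, and we verify it actually reached the inbox.
  import imaplib

  CANARIES = [  # placeholder hosts and credentials
      ("imap.gmail.com", "canary@example.com", "app-password"),
      ("outlook.office365.com", "canary@example.org", "app-password"),
  ]
  SUBJECT = "Action required: your region shuts down on 2023-06-30"

  def landed_in_inbox(host, user, password, subject):
      imap = imaplib.IMAP4_SSL(host)
      imap.login(user, password)
      imap.select("INBOX")                       # a spam-folder hit won't match
      _, data = imap.search(None, 'SUBJECT', f'"{subject}"')
      imap.logout()
      return bool(data[0])

  for host, user, password in CANARIES:
      if not landed_in_inbox(host, user, password, SUBJECT):
          print(f"ALERT: shutdown notice not in the inbox at {user}")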


If a spam filter erroneously marks one email as spam, chances are it will also mark other very similar emails as spam, so it's not too surprising all three were caught.

For these kinds of automated emails, getting every message consistently delivered to everyone is really hard, almost impossible.

The problem here isn't really that emails weren't being delivered; it's that they seem to have tried only one method of contacting people, didn't check how successful that was (e.g. by seeing how many customers were still on those regions), and seemingly never tried anything else (such as notifications on the dashboard, a temporary brownout to alert people, etc.). "We tried one way to contact you and that didn't work, so we just deleted your service, sucks to be you lol kthxfuckitybye."


It could end up in the Promotions or Updates tab that no one checks.


If you send out spammy notices, yep.


Even if all three emails were properly delivered, that is not sufficient notice for a storage service. Why is there not also a reminder on the dashboard?

A financial service I use was recently purchased by another. The company has been aggressive about keeping me in the loop about what is upcoming. Maybe six months before the actual event, a heads-up. Again at two months. Then at one month. Then every week, along with countdowns to the deadline. "Are you ready? This is really happening. Here are relevant docs on how to ensure your transition goes smoothly."


Even the dashboard isn't really enough... if I'm running a one-off application or many, I may not log into every dashboard for every single thing regularly. A scream test would have been most appropriate, combined with a backup and at least 30 days retention for migration.


> Do serious companies not have canaries or other checks in place to ensure their notifications are correctly delivered to customers?

Even with a canary, you have no guarantee that a particular customer will receive your email. In this case, maximum CYA is a certified letter sent return receipt requested. There are services out there to mass mail those.


Mature ops use brownouts for this exact reason. Shut the service off for a few hours and you have the attention of anyone still using the service.
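
Something like this is cheap to put in place. A sketch of a scheduled brownout gate (the dates and wording are made up, and this is framework-agnostic Python rather than anyone's real middleware):

  # Sketch of a brownout: during pre-announced windows the API refuses requests
  # with a loud error, so anyone still using the region finds out well in advance.
  from datetime import datetime, timezone

  BROWNOUT_WINDOWS = [  # (start, end) in UTC, announced ahead of time
      (datetime(2023, 5, 15, 9, tzinfo=timezone.utc),
       datetime(2023, 5, 15, 13, tzinfo=timezone.utc)),
      (datetime(2023, 6, 1, 9, tzinfo=timezone.utc),
       datetime(2023, 6, 2, 9, tzinfo=timezone.utc)),
  ]

  def in_brownout(now=None):
      now = now or datetime.now(timezone.utc)
      return any(start <= now < end for start, end in BROWNOUT_WINDOWS)

  def handle_request(payload):
      if in_brownout():
          return 503, ("This region shuts down permanently on 2023-06-30. "
                       "This outage is intentional; see the migration guide.")
      return 200, "OK"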


A check on the expected migration progress and usage patterns in the region should have also rung a bell.


Newsflash: customers ignore emails. For our SaaS service it isn't uncommon for the customers who are administrative and technical contacts to spam-filter our emails and/or generally ignore emails on all varieties of topics. If they've supplied phone numbers, typically those don't work either.


> But three over 6 months would imply they're being blacklisted and/or their mail sender is simply broken.

Plus, some enterprises change their mail policy out of nowhere. I have a lot of subscribers at enterprise companies, but sometimes every mail delivered to some company starts coming back with something like "Your address is not on our whitelist. We only accept approved emails.", and every subscriber there is hard-bounced off the list and that's that. (They aren't paying me, though, so it's less of a big deal.)


It's tough to believe that turning off the service couldn't have involved at least a week of 'soak' time, where if you contacted them they would then help you move to another region. After all, the cost/benefit of keeping the VMs around (using no CPU or bandwidth) versus retaining a few customers indicates it is the right thing to do for both the customers and the business.


I've seen better communications around company-internal services that have been deprecated and for which a replacement exists that we need to migrate to. Heck, I've seen this a couple of times.

We had even tried out Influx a few different times. It was always OK, but never quite good enough. Now, with this, I think that seals the deal on me ever considering Influx again, either as a product or as a service.


>I've seen better communications around company-internal services

Our team maintains an internal CRM. When we plan to delete data or deprecate features, what we usually do (beyond sending emails):

- hide features/data from the UI without actually deleting them; if no one complains after a few weeks, proceed with the removal

- for critical data, make sure there are backups and store them for about a month; if no one requests them, delete them
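
In the same spirit, a soft-delete with a retention window is only a few lines (a sketch; the 30-day figure is just an example):

  # Sketch: data is hidden immediately but only purged after 30 quiet days,
  # so "someone complains" still has a happy ending.
  from datetime import datetime, timedelta, timezone

  RETENTION = timedelta(days=30)
  tombstones = {}   # dataset id -> when it was hidden

  def soft_delete(dataset_id):
      tombstones[dataset_id] = datetime.now(timezone.utc)

  def restore(dataset_id):
      tombstones.pop(dataset_id, None)           # the "someone complained" path

  def purge_expired(hard_delete):
      cutoff = datetime.now(timezone.utc) - RETENTION
      for dataset_id, hidden_at in list(tombstones.items()):
          if hidden_at < cutoff:
              hard_delete(dataset_id)            # the only irreversible step
              del tombstones[dataset_id]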


What did you choose over Influx?


GP here.

First time: we chose Timescale over Influx (and a few other competitors). I really liked Timescale. This was ~2.5 years ago and obviously Timescale was much less mature than today.

We also sent data to AWS Timestream. We're heavy AWS users and got to try out the product. I found it ok, but it was expensive even for us (Disney+).

Second time: the team migrated from Influx to ClickHouse for server and network metrics. Services had been relying on that data being correct and timely in order to route traffic, and well... there were issues. The simplest solution was simply to replace Influx with a product more suited to handling high volumes of metrics... yeah, ironic.


(not GP)

I've been bitten by the old Influx and had to migrate to something we could trust... (Influx basically tacitly admitted that the original architecture was pretty poo and they've since swapped out the engine (twice?), but it smells a bit like MongoDB trying to reinvent itself and distance itself from all the early web-scale claims, so I'm kinda skeptical.)

So we rolled our own with MySQL+TokuDB, but that's not a good choice for a new system as TokuDB is disappearing. When I tried to migrate to MyRocks we discovered that the newer kids like MyRocks don't work nearly so well for specifically this kind of use case.

Something I haven't personally tried, but have heard rave reviews of, is Timescale. It's a special storage engine for Postgres and it has a lot of nice features like auto-maintained rollups. And they have lots of deep technical blog posts that I find myself agreeing with, so it must be good! :D


I use Timescale and I can recommend it. The reason I'm still using Influx too (1.7) is that it's unmatched in its data storage efficiency and query performance. You can get close with Timescale, and its main strength is having the full query power of PostgreSQL, if you have room for the extra hardware resources it requires.


(My memory is that we kept on influx including 1.7, but it was a while ago now so memory might be fuzzy)

I guess Influx perf and efficiency really depend a lot on your data shape then :)

Our experience was that performance dropped off a cliff if you had too much data, too much tagset cardinality, or else your query was too broad. And when it failed, it lost data.

In fact, it lost data generally. When we were replacing it we dual-ran an ACID DB version (which, with tokudb, was fast enough to keep up (although we didn't index every tag column)). So we did a diff and discovered just small random holes in the influx data that we'd never noticed before.

We had other considerations when we went with MySQL, namely that we were already using it. If shopping for a standalone solution to start a new project on, I'm thinking Timescale is the go-to these days?


I’ve been self-hosting InfluxDB in the hundreds of GB range for several years. I wouldn’t say I’m super happy with it, but… let’s say we’ve reached an understanding, the software and I. We’re on the latest patch of 1.8 and content to stay there.

I agree with GP about storage efficiency, which is superb. Query performance is good as long as a single query doesn’t deal with more than ~dozens of series. And $deity help you if you want to do hourly roll-ups of all series for a short time range, as RAM usage is wildly unpredictable. Storage is optimized for long reads of a single series, not for short reads of many series (but in fairness, you have to choose one or the other, that’s just the physics of the thing).

If I were starting from scratch, I’d probably pick Timescale. Or maybe DuckDB… I wonder if it would work for our use case.


> And $deity help you if you want to do hourly roll-ups of all series for a short time range, as RAM usage is wildly unpredictable.

I think I went properly mad while trying to troubleshoot this. The same query sometimes pulls 5GB, sometimes 20GB, sometimes 50GB and sometimes OOMs at 200GB memory pulled beyond base load of the system. And there's no query planner, no execution log, no metrics to help you. And most documentation or threads about it can be summarized as "well sucks to be you, eh? Maybe less data would be an option I guess"

We don't do that anymore and just roll up a very small select number of metrics.

And yeah, we committed to Postgres as our main DB 2 years ago or so, and time is currently freeing up to start work with TimescaleDB. Zabbix is supposed to be great with it.


I have pretty much exactly the same experience.

However, I do feel that they are really trying to do the right thing with the new 3.0 architecture, addressing the deficiencies (most importantly performance and full-fledged SQL) while keeping the stuff that works (InfluxQL for simple and legacy queries). Also, leveraging open-source projects and contributing to their upstreams is a plus. Thus I'm hoping they succeed in delivering on that promise.


Agreed, embracing battle-hardened open source tech will be a win for them and for customers.

However, once your storage layer is parquet and your query layer is SQL, well... DuckDB is also basically parquet+SQL, and it won't be long before there's a nice Postgres wire protocol adapter in front of it. What's the advantage of continuing to use InfluxDB if you don't need clustering or HA?


It's unusual to read that InfluxDB is fast and efficient. Did you try VictoriaMetrics? It usually needs 10x less RAM than InfluxDB for the same workload, especially when the number of active time series is high. It also uses less CPU and disk space on the same production workload. [1]

[1] https://valyala.medium.com/insert-benchmarks-with-inch-influ...


I've heard of VictoriaMetrics before but haven't had time to play with it. InfluxDB is also now ingrained in a production system, so replacing it is not straightforward. The query language is also different, meaning everything that uses it will need to be updated too, and coming from a mainly SQL background, PromQL/MetricsQL looks oddly weird.


Agreed that PromQL and MetricsQL have limited querying capabilities compared to SQL or InfluxQL. But they cover most of the use cases for analyzing time series measurements, and allow writing much simpler queries than InfluxQL or Flux for those particular cases [1].

[1] https://valyala.medium.com/promql-tutorial-for-beginners-9ab...


I like Influx, and have been using it for all our TSDB needs for ~7 years. It sounds like there have been a lot of shake-ups internally, and all their hopes for Flux (their intended InfluxQL replacement language) have been abandoned. I'm wondering how they're doing.

This situation doesn't seem like it is specifically them doing anything wrong, if indeed they did send out multiple notifications over 6 months. It sounds like it caught many people by surprise though, which makes one wonder if there was a problem with their announcements of this change. It definitely seems like it could have been handled better than "deletion with no recovery", though; some sort of "shut down, wait a week or a month, then delete" would have been better.


3 emails over six months is a pretty lousy notification rate. At minimum, something sent in the month of the shutdown would be appropriate. Better still would be an aggressive notification push of increasing frequency as the shutdown approached. Was there any notification of the shutdown in the admin panels? The person with the ability to migrate data may not be the person receiving the messages.

EDIT: Also, does the company have a Sales team? Why is it not a top-line item for a representative to contact those who have active service in the region to emphasize the shutdown? Or was the Sales team similarly in the dark about the messaging?


I would be expecting hourly notifications for the last 7 days if traffic is still detected.

I literally get more notifications for DNS expirations (and those are scheduled and predictable).
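
That kind of cadence is trivial to encode; a sketch (thresholds arbitrary):

  # Sketch: reminder cadence ramps up toward the deadline, but only for tenants
  # whose traffic shows they are still actually using the region.
  from datetime import timedelta

  def reminder_interval(days_left):
      if days_left > 30:
          return timedelta(days=7)     # weekly
      if days_left > 7:
          return timedelta(days=1)     # daily
      return timedelta(hours=1)        # hourly in the final week

  def reminder_due(days_left, since_last_reminder, still_writing_data):
      return still_writing_data and since_last_reminder >= reminder_interval(days_left)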


It is pretty amazing. They are in the business of storing other people's data, and evidently notice of deletion was treated as a non-event, instead of as a Huge Big Deal that should have been the top-line communication to impacted customers until a migration plan had been enacted.


I received a response from their support and it’s hilarious.

TLDR: “We don’t plan to delete your data again in the foreseeable future.”

Full quote: “You can sign up for a new account here InfluxDB Cloud using a different region. We want to assure you that there are no more scheduled shutdowns planned. Therefore, once you have created the new account and begin writing to it, we do not foresee any data loss going forward”

No more information was provided in the mail.


This is just unbelievable. And plans do have a habit of changing unexpectedly. There are certain red lines that a data storage company should never cross. Total loss of credibility going forward, at least until fresh, unaware users flock in. Anyway, I’m migrating all workloads off the TICK stack, for I can’t fathom the consequences for InfluxDB OSS, particularly if they now lose some of their funding, given their expenditures on the rewrite for InfluxDB 3, for example. Lightbend Akka caused enough trouble already.


This could have been easily mitigated with a giant red ugly banner "YOUR DATA WILL BE LOST IN X DAYS. MIGRATE NOW".

Three emails clearly weren't enough, right? Now their name is in the dumps, customers are pissed, and my only exposure to InfluxDB is a negative one.

I hope other SaaS folks learn from this very expensive lesson.


According to other comments, apparently there was a banner, but it was tiny and green instead. Pure genius.


Amazing


Why would you even need to log in if your service is running swimmingly? I run some things off of Backblaze and I haven't seen their console in over 2 years.


What a terrible start for InfluxDB 3. And that incomprehensible justification on their side… what a disappointment… I’ve been anticipating InfluxDB 3 going GA later this year and was just about to subscribe to their cloud offering, since that’s currently the only place it’s available. And I was going to migrate more workflows to the TICK stack. But they’ve just nuked their credibility in my eyes. I hope they can still recoup the dev costs for InfluxDB 3, but I’m now going to be very cautious about that company going forward. I hope Influx OSS remains viable. Inconceivable…


A lesson for us and for the senior management of any company considering cloud hosting: if you are saving money by moving away from on-premise, or by using a managed service to reduce employee costs, then you need to factor in these disaster scenarios and have procedures in place. If the senior management folks at the time only thought about cost savings and not business continuity, then the blame for this fiasco should also land on their heads; they took credit for saving money, and now it's time to take credit for the data loss.


A total of 3 emails sent, lol.

Hope they didn’t have any big corp customers impacted. Some big corps would very easily use that to cancel contracts and void out payments. Then let the lawyers deal with it


For cases like this, email is not enough. You need to do trial shutdowns where you disable access to the DB until someone acknowledges via the web. Do it for maybe 24 hours so you know you catch everyone.


Well, that sucks big time. Thankfully we switched from InfluxDB to QuestDB, and this news just makes me more convinced that we made the right call.

We have a netflow analyzer with more than 350 B2B customers (regional ISPs) and we used to run InfluxDB as our TSDB. A few things were bugging us though:

1) InfluxDB is being rewritten for the third time in less than 5 years;

2) v2 in hindsight was actually a downgrade from v1 in terms of performance (e.g. the drop-shard mechanism);

3) Will they finally solve cardinality in v3? That was a major issue that was supposed to be solved by v2...

I was just not confident sticking with InfluxDB. Thankfully another OSS project, QuestDB, really surprised me in terms of performance and reliability.

We have now migrated more than 100 of our customers and hopefully will get all migrations done by the end of the year.

For my use case, there's one feature left to be added, which is an inet type, in order to store IP addresses more efficiently.


This seems really poor. They should have been able to see which customers were still using the service in that region, send them additional reminders that they needed to get off, and only remove access after several of those plus a grace period. Maybe even have a phase of read-only access before full removal.


How about: "How odd, many haven't migrated yet. Shouldn't we double-check that they are aware of this shutdown?"


Wow. That's amazingly bad.


What I really don't get is why people even put their data into these kinds of cloud services that have no proven track record.

What value proposition were they providing that was so great that you were willing to risk your critical data by putting it into their service, when they had done nothing to prove that they are worthy of your trust? What kind of thought process leads to this kind of decision? Someone, enlighten me. I really want to understand.


https://community.influxdata.com/t/getting-weird-results-fro...

>Unfortunately @RoyalBlock at this time it does look like the data is lost for Sydney. We are still looking into it but I am 90% sure this is sadly the case.

https://community.influxdata.com/t/getting-weird-results-fro...

>Data recovery is still in process. If you where part of the Belgium customer you will receive the last 90 days of your data. Support will email you directly once this process is ready.

Better than nothing for the Belgium tenants, I suppose.


I worked at a wireless telecom company 10 years ago. The local switches did not have a consistent decommissioning procedure. One switch removed two routers that were used for a critical service and had them on a truck ready to leave, all within a few hours. It would have taken a week or longer to fix if we had not been able to get them off the truck and back into the rack. We then forced all regions to adopt a policy of powering off a device 10 days before removing any cabling or the device itself.


The link to Slack in this post by "developer advocate" Jay Clifford is a rabbit hole worth diving into a bit: https://community.influxdata.com/t/getting-weird-results-fro...

Find your way into that Slack workspace via https://www.influxdata.com/blog/introducing-our-new-influxda... and you'll discover some interesting things:

* That the only reason being given in that thread for the shutdown is "The region did not get enough usage or growth to make it economically viable to operate, so it became necessary for InfluxData to discontinue service in those regions.". i.e. there's no regulatory issue here like other answers speculated - just pure cost-cutting.

* That on July 5th, a couple of hours before they started shutting everything down (based on the shutdown timeline at https://status.influxdata.com/), that same "Developer Advocate" announced that they were suspending their live "office hours" sessions for July.

* Multiple people are asking for help after finding that they can't connect and getting ABSOLUTELY NOTHING in the way of support from the company. It's literally falling to _other users_ to tell them that all their data is gone.

* One person chiming in, Matthew Allen, DID have a colleague who saw the notification email, but notes that...

- it was a pain for him to migrate his data to a different region due to InfluxDB Cloud's rate limiting, but he did it anyway

- ... but that the documented migration process doesn't seem to have worked properly anyway (given that some of his data points have ended up as nulls)

- plus on top of all of THAT, even after doing everything he was supposed to do, he still can't log into influx cloud after the migration because when he logs in he gets automatically redirected to the no-longer-existing cluster and hits an error screen

What a clusterfuck. Shame on everyone from Influx who had a hand in this: the CTO Paul Dix, who's turned up here on Hacker News to blame his customers for Influx's negligence; Jay Clifford the Developer Advocate, for a spectacular failure of developer advocacy; and anyone else on the team who was close to this and didn't push for brownouts, blog posts, a mention in the newsletter, retention of data for some window after the final shutdown date, or any of the other obvious measures that could've made this less of a catastrophe. The multiple people noting that they were able to receive the company's newsletter but did not receive the notification that it was about to delete all their data tell me everything I need to know about this company and its priorities. I will never willingly do business with them again (although we used them at a startup I worked at once) and will advocate against them any time I hear a colleague suggest using them.


Don't blame the developer advocate for the company's mistakes. Developer advocates rarely have any part in these kinds of decisions, but are left holding the bag whenever they go badly.


I can understand the outrage about deleted data, but has anyone figured out why they are shutting down in Belgium?


What I don't understand is why they did not auto-migrate all customer data to the closest region. Why ask the customers to do this migration instead of transparently handling it for them? Yes, the cost may be different, but this was their decision; they should absorb those costs until the customers decide.


Auto-migrating data to the closest region cannot be done without legal implications if that region is in another country.


In 99% of cases there are no legal implications to moving data to another country inside the EU. Amsterdam is the closest in the geographic and political sense. This is cloud data, already subject to EU policy and not country-specific. Those who really do have a single-country requirement can't even use a cloud provider; they need to rent/colocate servers (not services) in the country.


Last time I checked, Sydney is not in the EU.


Sydney is not, but this thread (at least the title) is about europe-west1 in Belgium, which is in the EU.


Given they shut down (sorry, discontinued) both the Belgium and Oceania DCs, I'm guessing "cost".

I know ap-southeast tends to be pricier; I'm assuming Belgium was the same.


There's a small increase in cost compared to the cheapest US regions, but the European regions are still significantly cheaper than the expensive Australia/Brazil regions.

https://www.concurrencylabs.com/blog/choose-your-aws-region-...


Yeah, for AWS, but the European region they closed down was on GCP; I'm not sure how the pricing compares.


I haven’t seen a reason yet but I suspect they’re reducing the regions they’re hosting in to save costs.


I suggest designing APIs so that they can return warnings and still return useful data, as opposed to returning either plain success with data or an error. (https://docs.influxdata.com/influxdb/cloud/api/#tag/Response...)

Out-of-band data, like emails, is bound to be ignored by some.

I'd incorporate warnings in the application-specific responses; you can also return a different response code to make sure many clients do not blindly ignore warnings. The HTTP standard includes a 207 Multi-Status response code, which is mostly used with WebDAV.
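
A sketch of what that could look like (Flask and the response shape here are purely illustrative, not the actual InfluxDB API):

  # Sketch: return data and warnings together; 207 instead of 200 makes it more
  # likely that clients which only check for "success" still surface the notice.
  from flask import Flask, jsonify

  app = Flask(__name__)
  REGION_WARNINGS = ["This region is scheduled for shutdown on 2023-06-30. "
                     "Migrate your data before that date."]

  @app.get("/query")
  def query():
      body = {"results": [{"series": []}],       # the usual query payload
              "warnings": REGION_WARNINGS}
      return jsonify(body), 207 if REGION_WARNINGS else 200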


The funniest thing about this to me is how they officially responded that if you want to find out the reason why this happened, you can look it up on their Slack chat.


Similar reports have also appeared on the InfluxDB Slack.


I wonder if they’re open in other EU regions? If you wanted to shut down a region, as a database provider, is it even possible to send snapshots to another region without user consent?

It feels like that could be a good practice, or not, depending on the laws in question.


Their migration guide does include help for customers to migrate data from one region to another EU region (Frankfurt).

They might not have been able to do it automatically, as the region name was hardcoded in the hostname.


For anyone interested, Self-Hosted InfluxDB3.0 IOx builds & containers are available here: https://github.com/metrico/iox-community


Reminds me of the time my bank got bought and sent me a single email saying they would shut down my account in x days…


Do the affected customers have legal recourse to get back their data AND a hefty compensation for the downtime?


Imagine if GitHub were fired upon from the Death Star. The disturbance in the Force would be epic, as millions of voices suddenly cried out in terror and were suddenly silenced.


Luckily git is actually distributed underneath. Unluckily the issues aren't in the git repo.


An utterly categorical communication failure. Wow.


It's almost like using bullshit hype-hype-hype databases, especially their cloud offerings, is a horrible idea.


Nobody is complaining about the database, they're complaining about poor communication. It could happen with anything that depends on a 3rd party.



Yeah, no: the people complaining about their deleted data clearly wanted to keep using it; so much so that they weren't expecting anything close to having to migrate to another location.



