InfluxDB Cloud shuts down in Belgium; some weren't notified before data deletion (influxdata.com)
443 points by PaulAA on July 9, 2023 | 266 comments



It seems this probably happened due to some regulation or other. The sunset date for the service should have been a month earlier, so that Influx could have legally kept the data until the 30th in case something like this happened.

They wanted to have the euros flowing in right until the last minute.

1. Flash messages on all user-facing consoles.
2. No new resources able to be created for the 6 months prior.
3. Emails.
4. The service end date should have been at least a month prior to the mandatory shutdown.
5. Anyone still running workloads in May should have had aggressive contact attempts made to ensure they were aware.
6. The console in the region should have switched to a final backup that can be exported by the user or moved to another region. This should have been available for 30 days.

You don’t do this because it’s fun, you do this because you need to save reputation. If I can’t trust you with business critical data then why would I use you for my critical business?

Also, as someone who works for a large enterprise, if you really believe email is the way to inform them of these changes, well I’d reconsider your beliefs.


There's no regulatory consideration involved as far as I can tell. On Slack at https://influxcommunity.slack.com/archives/CH8TV3LJG/p168894... they explain the shutdown thus:

> "The region did not get enough usage or growth to make it economically viable to operate, so it became necessary for InfluxData to discontinue service in those regions."

So it's worse than you believe. Yes, the handling is a scandal for all the reasons you say. But they weren't even pushed into this by some regulatory issue; it's pure cost-cutting.


Given they shut down two DCs half a world apart, it's not regulations. It's cost.


But it's a paid service, right? Is it a pricing issue? If it is, isn't it better to increase the price?


You wonder if we will see more of this from all these high-burn-rate SaaS startups, right? It saves money to shut down even paid services if they are cashflow negative. The difference between paid and unpaid services only matters if costs are below the prices they sell the services for.


It's a cost-cutting measure that reeks of a company trying to cut costs as fast as possible.


They still should have at the very least done a backup of each customer DB in those regions and created an option to download and/or restore to a new region and kept those for at least 30-90 days.

A scream test would have been a better option in addition to the above.


And make no mistake, some people will still miss the notification after all these warnings.


Perhaps service shutdown is also the only valid case where it can be okay to intermittently fail API requests?


companies generally want to be paid for their costs of holding your data liabilities, yes.


“But look, you found the notice, didn’t you?”

“Yes,” said Arthur, “yes I did. It was on display in the bottom of a locked filing cabinet stuck in a disused lavatory with a sign on the door saying ‘Beware of the Leopard.’”


It should not be understated how bad this is. Your #1 expectation as a cloud database provider is to keep data safe and recoverable.

I hope, at least for their sake, that they took a backup of everyone's DB that could be restored in another region, but based on the fact that they didn't do a scream test, I doubt they thought about this either.

This must have been forced on them by upper management, because there is no way that someone along the chain who actually had to delete the data didn't suggest a scream test. No way someone didn't say "this is a terrible idea, email is not reliable".

Adding Influx right next to GCP on my list of providers I'm never using. Self-hosting is the way; use ClickHouse.


In case anyone else is wondering:

> The Scream Test is simple – remove it and wait for the screams. If someone screams, put it back. The Scream Test can be applied to any product, service or capability – particularly when there is poor ownership or understanding of its importance.

https://www.v-wiki.net/scream-test-meaning/


Maybe this is the scream test…just done badly


that's not the scream test, that's the nuke test - nuke it and see if anyone complains, if they do, it's already too late


No, it's gone. For it to be a scream test they'd have to be able to retrieve the data somehow and they can not.


I agree. This should be an indication to all current users that they should no longer trust InfluxData with their business.

The CTO seems to have been checked out for a long time (just look at how little developer engagement there is on here) and the CEO seems to have no idea how to run a DBaaS. The fact that nobody else from the company has stepped in to try and defuse this should terrify anyone who has data on InfluxData's cloud.

This is the beginning of the end. It seems like all of the good people have left the company, and being willing to destroy credibility to cut costs is a clear sign that the company is running on fumes.

So, now is the time - find your alternative, whether it's Timescale, QuestDB, VictoriaMetrics, ClickHouse, or just self-hosting.


The CTO's blog post is pretty half-assed: https://www.influxdata.com/blog/update-from-influxdata-paul-...

It's the same "we 'tried'" message they have here. Even worse, this wasn't a regulatory shut-down, this was a lack of demand decision. They had 100% control over the timing and means of the shut-down. They didn't even keep backups! They just deleted everything.

Some highlights from the blog. It reads like a "cover my ass" to the board, rather than fixing problems for customers.

* > Over the years, two of the regions did not get enough demand to justify the continuation of those regional services.

  * In other words, they had no external pressure. They just shut this down entirely on their own accord.
* Immediately blames customers for not seeing notifications, explaining "how rigorous" their communication was.

* > via our community Slack channel, Support, and forums, we soon realized that our communication did not register with everyone

  * In other words, "we didn't look at any metrics or usage data. How could we have possibly known people were still relying on this?"
* > Our engineering team is looking into whether they can restore the last 100 days of data for GCP Belgium. It appears at this time that for AWS Sydney users, the data is no longer available.

  * That's literally unbelievable. They didn't even keep backups! They deleted those too! Even if the region is going down, I'd expect backups to be maintained for their SLA.
* Lastly, a waffling "what we could have done better" without any actual commitment to improvement. Insane.


This is pretty much corporate suicide. I really don't understand what they are trying to achieve with this and their attitude in this thread is baffling.


I completely agree with you regarding corporate suicide. The rest of my post is complete speculation.

The least nonsensical explanation I can think of is that they weren't paying their bills. They weren't paying rent, the landlord locked them out and repo'd their servers, or something similar. (Perhaps they were inspired by Elon Musk's recent antics?)

If that were the case, they would not disclose that that's what happened. If they disclosed that, all of their other customers would immediately begin migrating their data; not tomorrow, not next week, now.

If there were any excuse they would give it. "We were hacked!" "It was a disgruntled ex-employee!" "The datacenter burned down!" "It's those dirty EU data laws!" etc.

Shutting down the data center and deleting all the data (without migrating) at the exact same time, and that being Plan A? Nah, I don't believe that.


This was announced months in advance (albeit not in a way that could possibly guarantee that most customers would ever discover it) so I don't think your speculation is true. As best I can tell from the information publicly available, they really did shut down the data center and delete all data simply to cut costs with no external push whatsoever.


I agree with your comments about how Influx handled this shutdown.

The several things you might mean by self-hosting have their own pros and cons. The right choice is very context-specific, and assuming that it’s always the right choice is wrong. It certainly can be, though.

As for ClickHouse, that mention seems like a throwaway comment, unless you are advocating a boycott of even the open source InfluxDB due to its corporate author’s behavior and view ClickHouse as the closest alternative.

This incident has nothing to do with the comparison of the open source InfluxDB vs the open source ClickHouse, nor would it impugn the viability of InfluxDB hosted by a more responsible data custodian than Influx the company.

And GCP hasn’t done any similar inadequately notified shutdown of service with immediate and irreversible data loss, as far as I know.

(Disclosure: I have worked for Google in the past, including GCP, but not in over 8 years. I’m speaking only for myself here. I’ve never worked for Influx or ClickHouse.)


This kind of thing really does need a cooling off period.

Assume that your users won't see your emails. How do you help them avoid data loss when you shut down a service like this?

One option that I like is to take the service down (hence loudly breaking things that were depending on it) but keep backed up copies of the data for a while longer - ideally a month or more, but maybe just two weeks.

That way users who didn't see your messaging have a chance to get in touch and recover any data they would otherwise lose.

I'm not sure how best to handle the liability issues involved with storing backups of data for a period of time. Presumably the terms and conditions for a service can be designed to support this kind of backup storage "grace period" for these situations.


You start with reliability brownouts: first fail 0.1% of requests, then after a week 1%, then after a month 5%.
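
A rough sketch of what such a brownout gate might look like in the request path - the dates, percentages, and helper names below are purely illustrative, not anything Influx actually ran:

  import random
  from datetime import date
  from typing import Optional

  # Illustrative brownout schedule: fraction of requests to reject,
  # keyed by the (made-up) date each stage begins.
  BROWNOUT_SCHEDULE = [
      (date(2023, 4, 1), 0.001),  # fail 0.1% of requests
      (date(2023, 4, 8), 0.01),   # a week later, 1%
      (date(2023, 5, 8), 0.05),   # roughly a month in, 5%
  ]

  def current_failure_rate(today: date) -> float:
      """Return the reject fraction in effect on the given date."""
      rate = 0.0
      for start, fraction in BROWNOUT_SCHEDULE:
          if today >= start:
              rate = fraction
      return rate

  def should_brownout(today: Optional[date] = None) -> bool:
      """Randomly decide whether to reject this request with a loud, explanatory error."""
      today = today or date.today()
      return random.random() < current_failure_rate(today)

  # In the request handler: if should_brownout(), return a 503 that points at the
  # shutdown notice, so customers' own alerting fires long before any data is touched.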


Much better is to stop the service but add a "Resume" button that re-enables the service for two more weeks with no data loss. That way you give users an opportunity to gracefully migrate away.

Stopping the service and immediately deleting data is just callous.


When ovh sunsetted a class of VPSen and I'd completely failed to notice they were going to do that, I asked nicely in the support ticket I'd sent in and they turned it back on for a few days while I shifted the data to a replacement (which was still an ovh VPS, it had been Just Working long enough that I didn't feel like I'd been mistreated, more lulled into complacency by the lack of problems).

I think requiring a ticket might be a worthwhile trade-off compared to just adding the button, because that allows you to engage with customers to make sure they can (in a case like this) migrate to a different region of your own service, and the activation energy of sending a ticket means a customer's less likely to click 'Resume' and then forget about it again until it's too late.


I mean this is why you do these projects on two different timelines: The internal timeline and the external timeline.

Externally you communicate: different announcements each month, final notices at T+5M, system will be deleted at T+6M, data will be lost at that point, and so on.

Internally (at least at work) such a timeline is more that at T+6M, we cut access to the systems. Afterwards, systems not accessed for 2-4 weeks are removed periodically and the hard removal is planned for T+9M. Customer support and account managers can manage if systems need to be accessed. If a customer needs the system for a longer time, they can, but then they pay for it. Entirely with all necessary infrastructure, not renting a few licenses on the system.

Call it a bit callous, but this allows our customer support to appear nice and in control. And it leaves the customer happy and relieved that we have left some slack and leeway. But they've been shaken and woken up and can get to migrating.

The biggest challenge here is to stay on it and to not allow customers to become complacent again. This can be done by e.g. limiting the reactivation time to a week or so, so they have to get on it.


Yep, and in certain in house situations it's best to keep a backup around for ~13 months in case there's an obscure business process that only gets done once a year. (I'm aware that some people reading that sentence are going to go wtf at the idea that that's anybody's problem except whoever didn't tell you said business process even existed, but if it's a sufficiently critical finance or HR thing it tends to rapidly become everybody's problem so I like to have options)

Agree absolutely wrt complacency, I believe I asked for less than a week because I actively preferred a situation where I had to get on it immediately.


That seems like the worst of both worlds, during the brown-out you have to keep paying for the compute while your customers don't get a reliable service, even if they have a plan to migrate.

Also you probably can't keep charging customers for that period since you offer a crippled service on purpose.


If you are shutting it down, you can pay for the grace period. Period.


Just shut it down for real (after proper early warnings), so you save on compute and no one is confused about the state, and offer data retrieval for the grace period.

Brownouts are great for API changes, but not very useful before a full shutdown.


You assume warnings reach users. Some people miss emails. Fewer miss a service going offline. Keeping data after shutdown is a good backup.


That's why I'm saying to take it offline. Purposefully broken service is not very valuable, can't really be sold, and yet can still be missed; it also costs you money.


Hi, cofounder and CTO here. We notified everyone via email on February 23, April 6 and May 15th. We also offered to help migrate all users. I realize that it's not ideal that we've shut down this system, but we made our best efforts to notify affected users and give them options to move over to other regions. If you've been impacted by this, please email me personally and I will do my best to help out: paul at influxdata.com.


Paul: I'm surprised you didn't do a scream test. Not everyone is going to see those emails and even those that do may not understand what they are reading.

Internally at my company we always do scream tests as part of our EOL process because we know we can't reach everyone, even our own employees.

https://www.microsoft.com/insidetrack/blog/microsoft-uses-a-...

Fun story: my mortgage got sold last year. Not the first time. I got emails from the old mortgage company and the new mortgage company about the sale, but I skimmed them. I got letters via USPS from the old and new mortgage companies, but I mostly ignored those because 95% of what mortgage companies send me via USPS is junk. So I missed the fact that my automatic payments didn't transfer over. The new mortgage company let me get four months in arrears before they finally FedEx'd me something overnight. That got my attention. I was like: you guys should've FedEx'd me this in the first place. For all they knew, I wasn't getting their emails or letters in the first place because nothing had been sent signature required.


> Not everyone is going to see those emails and even those that do may not understand what they are reading.

If that's the case, these companies/people have no business using cloud services. Fair enough that you might not understand the ramifications; in that case you contact support. If you don't see those emails... that's on you. We operate out of a number of datacenters, and they all communicate via email, giving us one to three months' notice regarding service windows. If we fail to plan for an outage because we didn't see an email, that's our problem. I don't know why anyone would expect more from a SaaS company.

For really large customers, I would assume that they have a customer service representative, and yes, that person should have called. If you're just a small customer (even if you might be big in your own mind) and just have an account that gets billed to a company credit card each month, it's a little naive to think you'd get anything more than an email.

We've already seen a number of SaaS companies just shutting off customers for little to no reason; even AWS has done this. Running things in the cloud is a risk, and it's your job as the operations team to stay on top of things and have a backup plan, because you cannot expect cloud vendors to care about some random customer who just signed up using a credit card and a nondescript email. They should, but they don't.

A good rule is: Don't expect a SaaS/cloud company to put in more effort contacting you than you did signing up.


> > Not everyone is going to see those emails and even those that do may not understand what they are reading.

>

> If that's the case, these companies/people have no business using cloud services.

Cloud services are responsible for this. I've signed up to many cloud services where I purposefully unchecked all the newsletter/updates/... notifications.

But I still receive notifications for stuff unrelated to what I use. These emails are full of marketing/PR jargon, where it's unclear whether I'm affected by the change or whether there is even a change!

Cloud services are lazy, don't look at their customers' usage, spam everybody, and then blame their customers when they miss an important update due to notification fatigue.

This is the main reason why I stopped using SaaS.


Unless the cloud provider can provide proof that the person received and read such a notice, then they can still be sued for actual damages... and I'd be surprised if that doesn't happen in this case.

The fact is, there are many options from a cooldown, scream test, automated backup for migration/recovery... this organization did none of those things and absolutely deserves to lose massively as a result. This is a DATABASE as a Service... RETENTION should be one of the highest priorities.

For that matter, it would have been better if they auto-migrated in an OFF status, or otherwise backed up... just hitting the DELETE ALL DATA button is wrong. Several of the posters in the thread indicate they received no such emails.


> I was like: you guys should've FedEx'd me this in the first place. For all they knew, I wasn't getting their emails or letters in the first place because nothing had been sent signature required.

I love the scream test, but the analogy you bring up actually seems unfair. The cost of FedEx'ing everyone is astounding (for many businesses).

But I like the concept. Definitely a sort of "shut off the server for like an hour" and then see who yells.

Phone calls for any account that's still operating in the region.

100% agreed, 3 emails is... hardly anything.


When you buy a large, registered debt such as a mortgage the cost of Fedex'ing everyone should be factored into the sale. If that's too much money you shouldn't be buying such assets. Notifying those that are affected properly would seem to be the least you can do in such situations.


Exactly, and they’re fine sending junk mail all the time, and FedEx isn’t that expensive for large businesses like that.


At least here in Germany junk mail senders actually get reduced rates.


At the end of the day you are the one who pays for the fedex though.


Not really, no. Mortgage payments are interest and principal, not administrative fees beyond what you paid when you signed the original contract.


Whatever costs are imposed broadly on an industry are covered by the customers of that industry. If mortgages are more costly to buy, they’ll be more costly to originate.


The consumer doesn't bear 100% of the costs though. If mortgages cost $1mm to buy, they wouldn't cost $1mm to originate. They'd cost a little extra by being illiquid - the same amount as if they cost $2mm to buy.


The consumer nearly always bears 100% of the costs - because most companies sell at a profit.

The final price is a combination of all costs incurred + a profit percentage imposed on the consumer.

If all customers would receive fedex mail, then the costs of this fedex would be pushed on them somehow. Probably by making them pay slightly higher interest rates or by introducing some one off handling fee.


Profits aren't a fixed number.


Of course not, but financial products are priced and offered with a financial outcome (usually a margin) in mind. If you make mortgage processing more expensive, you'll find the offers for origination are worse than if mortgage processing were less expensive.


Sorry, it's a ridiculous argument. Mortgages get bought and sold all the time and clearly the buyers are on the hook for the communications costs and these do not pre-emptively get priced into future products by the sellers. I'm sure that there are situations where your argument has merit but this isn't one of those.


No one is saying that existing mortgages would be re-priced. Those are contracts and you can't unilaterally change them. What I and others are saying is that if you changed the obligations of buyers such that mortgages became more expensive to service, that those servicing cost increases would ultimately be borne by the mortgage borrowers rather than eaten by mortgage lenders out of the goodness of their hearts.

> these [communications costs] do not pre-emptively get priced into future products by the sellers

They 100% do get priced in. Whenever you buy a product, you're paying all the costs of that product. When someone originates a mortgage, they're aware of the secondary market for mortgages. If that secondary market is eroded by a significant increase in communications costs, that reduces the willingness of a secondary buyer to bid for your book of mortgages. That erosion reduces your projected profit on originating, so you take a little longer to lower your offered rate to 5.250%, or you charge a bit higher origination fee, or whatever to ensure you maintain a viable business. So long as these fees hit the entire market, the other originators are all making the same calculations.

It seems odd that you [seem to] think that money for these costs would just result in reduced profits for the financial services companies rather than in increased borrowing costs.


> What I and others are saying is that if you changed the obligations of buyers such that mortgages became more expensive to service, that those servicing cost increases would ultimately be borne by the mortgage borrowers rather than eaten by mortgage lenders out of the goodness of their hearts.

I got what you and others are saying but I've been trying - and failing, apparently - to point out that the costs for a mortgage in case of a sale are borne by the buyers, who are not even the same kind of institutions as the parties that sell them, and so they are in no position to charge the subjects, nor are the sellers going to price the mortgages any higher in the future because the costs aren't borne by them.

There is no such thing as 'projected profits' that go into this because mortgages can be sold (and are sold) more than once, the number of times is not known when they are issued first. And it is going to be only a small fraction of the audience that is going to be hard to reach for whatever reason. The presumption that there is some kind of free market mechanism that will ultimately pass those costs back to the original mortgage underwriters is not in any way evidenced by present day mortgage prices. On the sum total of mortgages out there and the - exorbitant - profits they create for the lenders we're talking about such small amounts that it will make zero difference.

A 'significant increase in communications costs' would translate into that being something that is some noticeable percentage of the total yield over the remaining time and it just simply isn't. Typical mortgage rates and amounts utterly dwarf the costs of a one time notification, especially if you don't have to notify everybody like that, and you can try cheaper channels first until you have a hit. Besides, the original mortgage 'service charge' is already a large multiple of the various costs and tends to be mostly pure profit for the initial lender.

It seems odd that you [seem to] think that mortgages are priced such that the mere cost of communications is going to show up in the prices, they are amongst the most profitable financial products.


> On the sum total of mortgages out there and the - exorbitant - profits they create for the lenders we're talking about such small amounts that it will make zero difference.

If it made zero difference, they would FedEx the documents, but they don't.


The point of the story was that if you have something critical to communicate, you can't do it using the same methods that are also used for low-priority mostly junk and expect it to be acted upon. Surely a mortgage servicing company can afford to FedEx an envelope. I wouldn't expect Influx to do so.


For that matter, registered mail is another option that isn't as expensive as FedEx... there are other options that are slightly more costly, and more noticeable, than sending via the same channel that junk mail goes through.

I expressly don't use automated payments for my mortgage and auto loan(s) as I don't want to have an account miss... I didn't know my mortgage was sold/changed a couple months ago until I went online to change. The same happened with an auto loan a few years back as well.


>I love the scream test, but the analogy you bring up actually - this seems unfair. The cost of Fedex'ing everyone is astounding

I wouldn't be surprised if the cost difference is negligible, maybe a couple bucks per mortgage? FedEx bulk discounts can be pretty massive


The price doesn't matter in this analogy since it still mostly works


> scream test

I didn't realize there was an actual name for this, so I'll add another for the thread that I haven't seen brought up: email is evil in operations (EIEIO, like the nursery rhyme)


I think the analogy is to station people at various floors in a big office building while you disconnect (unmarked) cables in a legacy patch panel to see who starts screaming. Then you plug it back in and label the cable...


Or the more general version, where you shut down a server/service and see if anyone screams. If it's still silent after $days, you can decommission it.


Something similar happened to me when I lived at a condo in Boston and the management company changed and I somehow missed the memo.

I kept sending checks to the old management company for a few months before the problem was discovered. Unfortunately, someone at the old management company apparently had a bit of a gambling habit and petty-thefted that money away, and I never saw it again even though she lost in a lawsuit.



Was the employee the defendant, not the company?


In this case, yes, it was the employee; I forgot the details as to how/why that was


You know, I don't really care that mortgages get sold, but the fact that I need to go out of my way every time to make sure I update my information in their system is astounding.


Technically that shouldn't even be your problem.


Neither should be a random third party defrauding a bank, but banks are masters at outsourcing work and responsibility onto their customers, and that's why it's called "identity theft" and suddenly I am at fault somehow.


I even (sort of) do this when I'm deprecating something which my team is the only user of, because sometimes it's hard to tell if something's really unused! First shut off the VPC access while leaving all the other infrastructure and data intact, wait a week or two and see if everything breaks, then get rid of everything else


There is another variant of this: if you can show that the code you are deleting never worked, there is no need to do a scream test. That is, if anyone cared about the code you are deleting, they would have already been screaming.


I would be careful with that. Maybe they did scream, but you haven't heard it, and they worked around the issue. Or maybe they did their workaround without saying anything. Or maybe you're wrong about your code not working. It actually may be working in some way that you don't know of, but is useful to someone.

To use an ecosystem analogy, once you expose your software to the world beyond your own dev environment, even internally, you'll eventually find that something colonized it - much like everything on this planet that isn't being actively and regularly scrubbed.

In my own career, I've seen cases of this. For example, once we were tweaking a little embedded database that supported a half-finished feature meant for internal use, and only then we (as in everyone in the dev team) learned that somehow, the QA & deployment support people got wind of it, and were scripting against exposed parts of that DB for a good year. And, it turns out, it wasn't the only part of the software that we thought of as incidental phenotype (or didn't think of at all), and the other team considered stable behavior.

See also the so-called Hyrum's Law: "With a sufficient number of users of an API, it does not matter what you promise in the contract: all observable behaviours of your system will be depended on by somebody."


Re: Hyrum's Law: this is why user-facing code should have (at least) two classes of tests: 1) is it doing what the developer intended it to do? 2) is it doing the same thing it did on the last release version with typical user requests?

That sounds the same, but it is not.

The first class is a set of simpler "happy paths" of intended specific behaviors.

The second class is like wargaming. A good way to do this is to replay user requests against your API and see that they return the same results release to release. You may also uncover interesting unintended behavior / conversations to have with users this way.
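
A minimal sketch of that second kind of test - replay previously captured user requests against the last release and the candidate, and diff the responses (the per-line JSON capture format and URLs here are made up for illustration):

  import json
  import urllib.request

  def send(base_url, req):
      """Re-send one captured request to a deployment and return the response body."""
      body = req.get("body", "")
      r = urllib.request.Request(
          base_url + req["path"],
          data=body.encode() if body else None,
          method=req.get("method", "GET"),
          headers=req.get("headers", {}),
      )
      with urllib.request.urlopen(r) as resp:
          return resp.read().decode()

  def replay(capture_file, old_url, new_url):
      """Return the captured requests whose responses differ between the two releases."""
      regressions = []
      with open(capture_file) as f:
          for line in f:
              req = json.loads(line)          # one captured request per line
              old, new = send(old_url, req), send(new_url, req)
              if old != new:                  # naive diff; a real harness normalizes timestamps, ids, etc.
                  regressions.append({"request": req, "old": old, "new": new})
      return regressions

Anything that shows up in the diff is either a regression or one of those "conversations to have with users".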


Clearly, it's easier to say sorry than do things right up front


LOL, have they even said sorry though?


Sounds like the most cost effective way, however.


Wow, that's pretty pathetic and your attitude "we can't help our customers" is even more damning. Email is not reliable enough to simply rely on a few email blasts for this.

I would expect:

* Those 3 "email blast" notifications. I'm guessing one of two things happened here:

  * You sent them as an "email blast" from a marketing-type email service. These hit email filters because they came from a known spam IP.

  * You sent them as a transactional email, but blasted them too quickly and got pegged for spam. Never hit the inbox.
* Increasingly common "you haven't migrated" emails if you still detect traffic on these instances. This is pretty critical since some companies might not realize they are affected. They should, but things get complex.

* Ideally, an automated transfer to another region with automated forwarding. It's okay to have poor performance, but it's not okay to go "poof" entirely.

* A soft-delete at the deadline, with 90 to 180 days to finalize migration. If this is costing you dearly, then drive prices up, but don't hard delete data.

Frankly, the last one is the real issue. It's literally unbelievable that a database provider didn't soft-delete. Further, I would expect that you'd be able to migrate these to another region to get customers back up and running.


Another problem is that service providers frequently poison the email channel with important sounding engagement dreck and we are now conditioned to ignore it.

* Important: migrate to this new feature immediately or you risk missing out!!

Vs

* Important: migrate your data immediately or you risk losing it!!


Plus, many companies abuse what I assume is an exemption to spam rules by pretending that their marketing emails are “service related”.

Xfinity is the worst about this. They’ll send me a so-called service-related email exhorting me to download their app. Same with Capital One and their monthly emails asking me to turn on automated texts.


You worded that better than I could have.

I ignore most of my vendor's emails because they're simply trying to spam me at this point.


I was also thinking that salesmen will cold-call/email me at least three times before they give up. People with whom I have no business relationship try harder than a corporation taking someone's money.


Welcome to the modern economy. This applies to B2C and, as we can see, B2B too. Getting a new customer is all the shitty vendors care about. Keeping existing customers is apparently just a cost.


I went to a conference in 2018, gave out my work email. Still get pestered by them


It's crazy the last email blast was ~6 weeks ago.

And the shutdown isn't even at a month/quarter end, so it seems even less like a billing-cycle thing.

There are so many more mature ways to do a graceful paid-service shutdown. Disable reads first so people get errors & contact support. Then disable writes as well. Somewhere a few weeks later you can consider deleting data.
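
As a sketch, that kind of staged decommission can be driven by nothing more than a dated phase table that gates each capability, with deletion as the very last step (dates are invented, and the ordering follows the reads-first sequence above):

  from datetime import date

  # Hypothetical decommission schedule for a region being retired.
  PHASES = [
      (date(2023, 5, 1),  {"reads": True,  "writes": True,  "data_retained": True}),   # notices only
      (date(2023, 6, 1),  {"reads": False, "writes": True,  "data_retained": True}),   # reads fail -> support tickets
      (date(2023, 6, 15), {"reads": False, "writes": False, "data_retained": True}),   # fully dark, data still exportable on request
      (date(2023, 7, 31), {"reads": False, "writes": False, "data_retained": False}),  # only now delete anything
  ]

  def region_state(today: date) -> dict:
      """Return which capabilities are still enabled for the region on a given day."""
      state = PHASES[0][1]
      for start, caps in PHASES:
          if today >= start:
              state = caps
      return state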

As others have said, I've worked at megacorps with internal systems that had more mature migrations than this. Honestly I have run internal apps at sub-1000 employee firms where we did various forms of scream tests and soft deletes before MONTHS later, even daring to delete data on disk.


> Ideally, an automated transfer to another region with automated forwarding. It's okay to have poor performance, but it's not okay to go "poof" entirely.

If the data is moving between countries then this is not an option. Your clients may have legal or contractual obligations with respect to data location.


We are talking about the EU here. Apart from some rare archaic remains of past rules, or genuinely "national security related" data (and for that I don't think anyone is using a cloud provider), clients should be free to store their data in any country inside the EU common market.


From the sounds of it, it was impossible for them to keep data after the 30th of June, so not sure a soft-delete at the deadline would work. Of course the better option probably would've been a soft delete before the deadline, but you know that some customers would've said "why'd you do this before the deadline, you could've waited an extra month!"


I’m certain they wouldn’t. They have no influence over it anyway. The only thing they can do is migrate.

Many people don’t have that option now since their data is dead and gone.


This is insane.

> We notified everyone via email on February 23, April 6 and May 15th. We also offered to help migrate all users. I realize that it's not ideal that we've shut down this system, but we made our best efforts to notify affected users and give them options to move over to other regions.

What other communication methods were attempted beyond just emails? Big, red obnoxious banners and warnings in various UIs? Phone calls?

Given that it seems as though quite a few customers didn't get your emails, what was the thought process when looking at the workloads that were clearly still active before nuking it from orbit? Or was there no check, and it was just assumed that people got the email and migrated?

Of the customers who were in that region, how many actually migrated? Was someone tracking these statistics and regularly reporting them to leadership to adjust tactics if there weren't enough migrations or shutdowns happening?

This screams either gross incompetence or straight up negligence. This is such a solvable problem (as many here have already mentioned various solutions), but I'm honestly just flabbergasted that this is a problem that is even being discussed here right now.

As a DBaaS, the data of your customers should be your number one priority. If it's not, y'all need to take a hard look at what the heck your value proposition is.

We weren't impacted by this directly, but you can be sure that this is going to be one of the topics for discussion amongst my teams this week. Mostly how we can either move off InfluxDB Cloud or ensure that our DR plans are up to date for the rug being pulled out from under us by you guys in the future.



It says "The UI was updated with a closure message for these regions."

Depends on where and how this message was added.

It also means that they had no monitoring in place to see how many people migrated.

Edit: They also say that this is reflected on the status page. Here's how their page looks: https://i.imgur.com/xlO4Ik2.png

Yup. It's literally a green status page that no one would give a second glance. That unreadable white on green? Oh. It's a deprecation message. It even has a subscribe link so that people would immediately and completely dismiss it as an ad due to ad/banner blindness.

Edit 2: Someone replied in the thread and added more context for the absolute lack of communication.


Yeah, why is the deprecation message in green? Why not in red? Who is picking colours over there and with what criteria?


Email only is not even close to best effort. I know it’s standard to only do email for tech companies, but all other types of companies usually do physical mail and phone calls on top of emails for important notifications.

I am not a customer, but it’s really annoying me how tech companies repeatedly think sending emails is somehow anything but the absolute minimum, most lazy option.


However, tech companies will often not have your physical address--unlike your bank. And I'd probably block phone calls from some tech company I was a customer of.


If they want business customers in Europe they need to create proper invoices, which contain the physical address.


We get an email address because we need to contact our customers. After that we make best efforts but if people can’t respond to vendors they pay money to, we’re really at a loss. I realize that shutting down a region isn’t good. It’s not what we would have preferred, but we had to do it for the business. And we made an honest effort to contact all customers to help move them.


As a CTO, you should make yourself aware of what a scream test is

https://www.v-wiki.net/scream-test-meaning/

Basically, you just turn stuff off and make people scream, while it can still be turned back on. You could have done this a month ago, as a critical warning of the impending termination of systems.

You didn't do that.

Instead, you sent a few emails, which isn't even guaranteed delivery. Again, as a CTO, you should know that email delivery is not guaranteed.


You can't turn a system off if the customer paid for it. That's a breach of contract.

If you keep customer data after the payment period, you're losing money.

So a scream test will cost you money.

They decided that this cost was not an appropriate price to pay to offset the possible reputation loss.

Whether that's a good business decision or not, time will tell.


>You can't turn system off if customer paid for it. That's the breach of contract.

It sounds like they did turn the system off for paying customers though, why would it be any worse to do a scream test a few days before they pulled the rug out?

If it's just a matter of billing cycles (does everyone's billing cycle end on the same day?), it seems like they could've handled it better. Just give impacted users a prorated and shortened final month (or even give it out for free as goodwill).


Good customer service costs money.

It takes years to build a good reputation and minutes to destroy it.


In this case we’re talking minutes to destroy your reputation for all current and future customers. That must be considered worth it.


The comments made by the company CTO here are like the opposite of good crisis management.

"We did our best - we have sent 3 emails". I wonder how shitty the product was when sending few emails is their best.

The guy sounds like either a full-fledged VC psychopath or someone very inexperienced.


Ignore for a moment what you think your customers should have done, and look at the actual outcome. Some customers did not know about the shutdown and deletion, and have now lost data. You telling them "well you should have read your email" is not going to satisfy them, even if you think it should.

All you've done is told your customers that their data isn't safe with your service. This was an easily-avoidable "own goal" situation.


> All you've done is told your customers that their data isn't safe with your service.

And not only Influx's current customers, but also their future customers. I really like Influx for my homelab. But with this attitude, I would be really hesitant about a real-world production deployment.


What future customers? After seeing this astoundingly terrible behaviour for a company with "DB" in their main product's name, I can't imagine anyone ever making the decision to trust InfluxData again. I know I certainly won't, nor will any company I work for.


Taking the region offline and making the data inaccessible at the same time was a big wrong call. I won't rehash the other good suggestions here, but I would throw out there that you should have turned off all data plane APIs for at least a month before deleting a byte of data. Nothing gets customers' attention like everything suddenly failing.

I think the attitude that “you pay us money so you better read every email we send” is at odds with reality.

1) why do you believe a single human being has that email address? As a company of any size I would never assign a human to a vendor email address. Turnover and rogue employee risks are too high for that. Usually these vendor emails are black holes only used to establish the account and recover credentials if needed. Or, it ends up in the hands of a vendor relations person who is more an accountant than engineer. Do you get the emails from GCP and AWS directly in your inbox?

2) because I pay you money I expect the opposite of the relationship you articulated. I’m not here to read your emails. If it’s really important then use my account manager to contact me. Make a phone call. Email is for spam - ESPECIALLY when it’s from a vendor. I view it as “I’m paying you money, vendor, so you need to go out of your way to give me excellent service” not “I’m paying you money so your emails are incredibly important to me I hang on your every dispatch”.

What baffles me is why on earth did you guys delete the data? I get you couldn’t afford to run the region for whatever reason. But you should have retained all the data. Storage isn’t that expensive.


> After that we make best efforts but if people can’t respond to vendors they pay money to, we’re really at a loss.

Using billing contacts for this is a mess. The billing contact could be an accounts payable department that will check the invoice against the contract or PO and pay it. It could be an outsourced office that has no idea what a database is. It could be someone who only catches up on email once every few weeks.

What is isn’t in a technical contact who knows what the shutdown of a database means.


I like that Zoom allows you to provide a developer email, so as a developer, I only receive emails about API changes. I've never missed an email from them.


As a buyer I have come to expect good vendors to design systems so that mistakes (my team's or yours) don't cost me sleep or cost you business.[3] i.e.

- they do soft-deletes before hard

- have robust access control systems and partitioning - so we don't have to give everyone in the org full r/w access to the object model

- don't instantly nuke the account if a payment goes astray or is delayed - try to reach out to a point of contact before pulling the plug; payment systems can be messy for all sorts of reasons, ask before assuming the worst.

- customer managers who can connect a couple of times a year, which usually benefits the vendor as upsells happen on a good % of those connects.

- also small things like training, certification

- Deprecation of a service is handled slowly (1 yr would be expected) and in multiple phases, with multiple modes of communication.

Not all companies can move fast enough to plan and execute a major change in location like this in 4 months; at a bare minimum you would have to consider:

  - End customers (your customer's customers) may need to be notified and may need to sign off

  - Compliance and GDPR DPA changes - both end customers and internal ones

  - DR, BCP concerns have to be planned for , not all GCP regions are equivalent.

  - Documentation and certifications like SoC, ISO, PCI, HIPAA etc usually mean ton of paperwork to modify

  - SRE/DevOps may have to move other services along with telemetry on InfluxDB, may need network whitelisting from their customers; things typically break when moving, so you need to plan dry runs, rollbacks and so on.

A better way to handle service closure would be to shut down but not delete on the planned date [1], and offer data export separately for a few weeks/months after [2].

You can definitely do better than shutting down the service and deleting the data at the same time.

[1] I would do this for internal customers let alone external paying ones

[2] You could have even charged for this to offset any costs, most customers wouldn't have a problem paying if they really needed it.

[3] Not trying to imply InfluxDB is doing these things, or isn't a good vendor, these are some criteria I have come to measure new vendors by.


> don't instantly nuke the account if a payment goes astray or delayed

Hetzner deleted my server just one week after my payment due date. My credit card failed the payment for some reason. I didn’t notice this because I was ill with Covid. They sent me one email (or at least, I received only one email) as a warning. I only realized the server was gone when my services stopped working. I’m not sure if such a short warning time is common practice among hosting companies, or if it’s unique to Hetzner.


I've had the exact same experience with them. After 10 years using that server, one payment failed, about ten days and they nuked the machine. German efficiency I suppose.


I had an almost identical experience with https://virmach.com/. I will never recommend them.

After 5 years, they deleted everything 2 weeks after the first payment failure.

Sure it was the cheapest VPS. But still, you don't just delete your customers' data.

I was away from emails and the service during those two weeks. As far as I can tell it might have been some race condition in their payment processing system. They couldn't figure it out. They had no backup. They refused to reinstate the service anew so I could restore my own backup.


It doesn't look too far fetched from their point of view, they saw a payment failure and they may have assumed that you decided to stop paying and didn't bother to send a cancellation request.

It looks like you had your own backup, which is always a good idea, hopefully you were able to restore your data elsewhere.


I've been 3 weeks late for a Hetzner payment (also for medical reasons) more than once, and my servers are still running. They sent several emails, one was a reminder to pay and another was a warning about what date they would shut down service. So I guess their notice system isn't as straightforward as one week for everyone.

Perhaps it's because I pay for several bare metal servers, or because I have a business account with them. Perhaps it's because I pay their invoices by bank transfer manually instead of by credit card. Who knows! You have made me wary of changing to a credit card now, because those do fail from time to time!

What worries me more is Hetzner's reputation for suddenly dropping customers with no warning and no way to retrieve data from the servers. That's always on the back of my mind.


> Hetzner deleted my server just one week after my payment due date.

Strange. I've been at least eight days late with a VPS payment at Hetzner (3 euro) and the server is still up.


Some companies don't mark you as a debtor if you are under a certain threshold (say 10 dollars), because the cost of processing this unpaid amount is not worth their time.

Also, it is smart to have the threshold set to at least 1 cent, because this way you don't ask someone to pay you a fraction of a cent due to some rounding error. There are those stories where a company sends you registered mail asking you to repay a fraction of a cent - which is impossible. The cost of the letter (snail mail) alone makes it not worth it. Even if you send an email, which is "free", you can't pay 0.0001 cents. I mean, you can pay a whole cent and then ask to get 0.9999 back, but the time required by the bookkeeper to process it and then pay it out (probably with a fee) is not worth it.


> After that we make best efforts but if people can’t respond to vendors they pay money to, we’re really at a loss.

I can empathize with this, but also would expect a good product organization to consider failure modes here and work around them.

Did anyone consider that bob@company.com left months ago, but since autopay still works, no one considered potential problems?

Did anyone consider Bob in accounting is paying the bill but ignores email that doesn’t have “balance due” in it?

… and a million other scenarios that are quite likely and need consideration.


> Did anyone consider that bob@company.com left months ago, but since autopay still works, no one considered potential problems?

Everyone knows it happens, meanwhile every single company with high turnover is like this (those I've had personal encounters with):

Datadog: our domain has changed, but I cannot change my login. I've changed the email address in my profile, but I'm not sure if my login (which is an email address) is just a name or it may be used as an email address in some context.

Intuit: good luck changing your name

Apple Developer: still addresses the account as Bob No-Longer-Working-Here. It's not very clear how to change that name.

Apple ID: no, you cannot change the email address that had been primary back when you created it. And it better be a valid email address.

Orange: my address has changed twice, they are aware of that, and they swore they updated my address everywhere; the invoices still come with my old old address in their headers despite everything. Good thing they at least send them electronically, so I do receive them.

You likely can change the data there if you really need to, but it's very involved.

Someone should tell the IT/CRM drones that sometimes people not only leave the company, but also get incapacitated or die. In their Teletubby universes it doesn't seem to happen to anyone ever.


Wow.

You've literally just told the world "you shouldn't rely on us for your data. When our business needs to drop you, we will and you might not receive notice."


I find your tone here quite condescending. We've never received those mails you've mentioned, and you make it out like it's our fault that we didn't react. I mean, you managed to send us marketing mails in April and failed to mention you're gonna discontinue the service. So much for honest effort...


> ... if people can’t respond to vendors they pay money to, we’re really at a loss.

You're hurting your company's reputation by denigrating your customers like that


I know that this is a stressful time and it’s all hindsight, but there are two different contact methods that don’t rely on email available to you in migrations like this:

Going read-only, waiting 2 weeks, and then deleting. The contact method is peoples’ alerting systems as writes stop working.

Putting a message on your service dashboard indicating the upcoming action. The contact method is exactly what it sounds like, and it’s the only other place you can stick text and know for sure all your customers can access it.

It will probably help customer relations if you don’t hide behind the defense of only having email - there are a few strategies for this that you can use in the future. Best of luck on the road ahead - I know this must be a particularly stressful time.


>After that we make best efforts but if people can’t respond to vendors they pay money to, we’re really at a loss

This is going to sound counterintuitive, like the Birthday Problem or Bayes' Rule, but at least for me it's true: most of the spam that gets into my inbox is from vendors I have a relationship with. Email isn't always ideal. Did you consider any other methods, like turning off writes a day or so before reads were disabled? That would trigger a much more immediate "oh shit" response than an email (unless the subject line is super clear and informative, and the email doesn't go to my spam folder).


> if people can’t respond to vendors they pay money to, we’re really at a loss

No, you're legally obliged to keep the service running. They are paying customers and even if you can't reach them through email there are other means of communications. If a business fails on account of your one-sided deleting of the data then you're going to be in for a very hard time, for instance a damage claim for gross negligence and breach of contract. This isn't just going to blow over. The onus for reliable communications is on you and if the channel you've got fails then you seek another one.


I once worked for a company that after a merger had a bank account running just to pay services it didn't know what they were for.

If you grow, have a merger, people moving, it is easy for email addresses to no longer be read. Yes there are best practices to prevent this, but most companies I have seen don't do that.

In one company important emails went to the email address (private!) of the founder, who left after M&A.


Well I guess all the bad publicity was worth it for the "business"


Should get a phone number too. Shit like that happens. Next time lay down the law to the marketing drones from UX that want to "reduce friction".

Explain carefully so their thick minds will understand that NOBODY is lazy enough to quit subscribing to your service just because you added an additional field to your onboarding.


> Should get a phone number too

Which most people would be reluctant to provide because everyone hates sales spam, and what else would a SaaS need your phone number for in regular times (impending data deletion is a good one)? On HN, making a phone number mandatory for signup is regularly criticised.


A UDP-style email contact is a very unreliable and careless way of communicating.

Please make sure you can implement TCP-style communication with your customers for this kind of critical change.
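
i.e. something like this entirely hypothetical acknowledgement tracking, where a notice keeps being "retransmitted" over louder channels until the customer explicitly confirms it:

  from datetime import datetime
  from typing import Optional

  # Hypothetical escalation ladder, loudest last.
  CHANNELS = ["email", "in-product banner", "account manager phone call"]

  class ShutdownNotice:
      def __init__(self, customer_id: str, deadline: datetime):
          self.customer_id = customer_id
          self.deadline = deadline
          self.acknowledged_at: Optional[datetime] = None
          self.level = 0

      def acknowledge(self) -> None:
          """Called when the customer clicks 'I understand' in the console - the 'TCP ack'."""
          self.acknowledged_at = datetime.utcnow()

      def next_action(self) -> Optional[str]:
          """Keep escalating until the notice is acknowledged."""
          if self.acknowledged_at is not None:
              return None                                      # ack received, stop retransmitting
          if self.level >= len(CHANNELS):
              return f"suspend writes for {self.customer_id}"  # last resort before the deadline
          channel = CHANNELS[self.level]
          self.level += 1
          return f"notify {self.customer_id} via {channel}"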


At my company we used to only ask for email address. No names, no phone numbers, no mailing addresses. Because we understood that technical people don't like spam and don't like to give out their data. So we don't ask for them.

We also didn't send any reminders for them to check whether their email address was up to date. No account update reminders. To prevent annoying people with spam.

So other than sending emails and hoping that they read it, there was nothing else we could do.


> So other than sending emails and hoping that they read it, there was nothing else we could do.

But there are other ways. You can put up a big red popup that can only be dismissed by typing "i agree" when the customer logs in, you can put the service into read-only mode, and even with email you can send daily reminders for the last 30 days with a subject like "your data will be deleted in 21 days", etc. There are so many things that could have been done.


What we sold was software that customers deploy locally. We don't have any of their data. But the software would stop working if their license is no longer valid, resulting in downtime. That already made people angry enough.

Now we have changed it so that the software never turns off even if the license has expired (though it will continue to nag an email address). Updates also cannot be installed.


Why do customers believe that they don’t need to read their emails?


I missed an email that a train I was scheduled for had changed schedules on a trip earlier in the year. Why? Because Eurostar sends me maybe weekly email marketing messages that get filtered to one of my Google tabs because I maybe take them every couple years. It's probably unreasonable to expect that I'll see a reasonably last minute update though I'm not sure what a good alternative is.

I get probably 100+ emails a day that hit my inbox in some form and occasionally fairly important ones get mixed in with the mostly dross though Gmail does a pretty good job overall.


At least you got a notification your schedule had changed… I was just 5-6 hours late.

Had the same issue with Amazon though. In the flood of "Information about your order" emails, one had slightly different content (but the same subject): "We haven't received your entire return. Please contact us in 14 days or we'll trash it and charge you."

When I contacted them a month later I was not very pleased.


Because the suppliers have abused this form of communication to the point that it's no longer useful for serious communication. I can't read your 1,000 marketing emails to find the one single important service-related one.


Lmao are you serious? What about emails buried in spam? What if contact x left the company and the emails are black holes? There are a million valid reasons for emails to go poof. "But we emailed you" is weak.


You actually need to check your spam folder, and if a company didn't bother to transition an employee out properly (i.e. figure out what their email address was attached to), why is that on the supplier?

Why do they need to move mountains so that you can avoid any seriousness about your own operations?


If a vendor can't properly notify me of major changes, I'm going to find a different vendor.

I have far bigger fish to fry than monitoring my inbox for shitty practices.


Because their reputation is the most important asset a cloud provider has. You're asking customers to run their business on your computers, after all.

Deleting their data first and then complaining that your customers don't run a serious enough operation is not the way to keep the best reputation.


Because I’m the customer.

> Why do they need to move mountains so that you can avoid any seriousness about your own operations?

You won’t stay in business for long with that attitude.


I guess you have never worked directly with a client, and I hope it stays that way.

This kind of attitude will ensure you lose your customers.


This attitude you have is not appropriate for a vendor. Clients do not care that you think your substandard practices are fair, they will find someone else with actual "seriousness."


I get slammed with emails. In fact, nearly all of my email is automated content or junk. I would hope that these emails would catch my attention, but I can easily see how I'd miss them.

Further, they might be going to some alias/group that's not frequently monitored. If a vendor is going to delete all of my data, I expect way more noise than 3 random email blasts.


Because it got marked as spam?

Because it's buried among lots of other similar-looking emails that are just marketing garbage?

Because the email wasn't delivered?

Because the customer was on vacation and the sender expected a response very quickly?

There are plenty of valid reasons to miss an email.


Because if a company has 20,000 engineers, which one is the one that gets the vendor's email? The answer is usually none, and email to that address goes to /dev/null.

Or suppose an employee did have their email on file and then left the company.

Or suppose people assume vendor emails are spam, because they almost always are.


What? Each and every one of our suppliers has a dedicated address their notifications are sent to. Those automatically go into the service desk as tickets and are read by the service desk team, which can either correct billing information if required, or escalate to the correct team if action is required or there's any doubt about the content of the email.

If you have 20,000 engineers (or even 200) you have a functional service desk, and I assure you that no individual engineer's email is given as the contact address for vendors. Even for large contracts where you have a preferred contact on each end, there's an escalation path.


Congratulations, but this is not how it works in many places. If you're providing services to companies you have to account for that.


Really? I mean, good on you, but I've never seen such an arrangement in my 30 years of working in various megacorps. Usually these contacts end up in the hands of vendor relations, but more typically in no one's hands, and the expectation is that the vendor works with us through TAMs. Most companies have their vendor relationship model based around negotiated licensing agreements and software delivery, and the SaaS delivery model is fairly recent. At a certain scale and age these things are pretty hard to change, so things like this aren't well accounted for. It gets more complex when we have a federated model where we have one global relationship with the vendor but teams use the SaaS individually. Then the email address on the account is subsumed either in some automation or in an onboarding process used to ensure no engineer has the ability to reset credentials unilaterally.

Your model is a smart one. It's smart enough that it tells me you're either a small company or a newer company, or both, or one of the rare companies whose vendor management team has its act together.


Are you trolling, or have you never used email?


[flagged]


Could you please stop posting unsubstantive comments and flamebait? You've unfortunately been doing it repeatedly. It's not what this site is for, and destroys what it is for.

If you wouldn't mind reviewing https://news.ycombinator.com/newsguidelines.html and taking the intended spirit of the site more to heart, we'd be grateful.


[flagged]


You can't attack another user like this here. If you'd please review https://news.ycombinator.com/newsguidelines.html and stick to the rules when posting here, we'd appreciate it. Note this:

"Please don't post insinuations about astroturfing, shilling, bots, brigading, foreign agents and the like. It degrades discussion and is usually mistaken. If you're worried about abuse, email hn@ycombinator.com and we'll look at the data."

Plenty of past explanation here:

https://hn.algolia.com/?sort=byDate&dateRange=all&type=comme...


> we made our best efforts to notify affected users

You call three emails (the last of which was almost 2 months ago) "best efforts"?

I had to read your message three times because it is so reality-defyingly preposterous that I assumed I must have missed something. How about warnings on the dashboard? How about an intentional error (or a limited service interruption) so that people would log in to their dashboard?


> How about warnings on the dashboard?

They did have a warning on the dashboard; the problem is that a lot of people don't check the dashboard because they don't need to, as they just view everything through Grafana, etc.

They also had a notice on the status page.


Getting those people's attention would be what the intentional errors are for.


Hi Paul, email is one-way communication and not guaranteed to be delivered. At a minimum you should have monitored who did and did not respond to the emails with some kind of action, and expended more effort to reach those who did not. Finally, you should have kept the data for a reasonable amount of time (say 90 days) post-shutdown so users who did not get the notification could download it. What you've done is super rude, and if I were still a customer in an unaffected region it would definitely be reason enough to leave, because it's pointless to sit and wait to see how you'll deal with my data when the time comes. Better to preempt that and leave while I still have control.


Paul, are you actually for real right now? Did you really just say "We deleted all your data, and it's your fault. We did whisper into the wind three times, you should have heard it. No, there is no chance of recovery"?

You might have literally deleted whole businesses: companies that employ real people, people with families, who now need to figure out how to carry on. Not least of which, your own. If the company survives until Christmas I will be shocked; no one can trust your company ever again. Your core business is storing other people's data, and you deleted it, for many, completely without warning.

I guess people still use Mongo even after finding it doesn't achieve any property of the CAP theorem, maybe some people will keep using a database provider with a track record of intentionally deleting their paying customers' data.

There just aren't enough adjectives for astonishment to adequately describe this situation.

I hope you offer Jay Clifford some support; he's clearly been put in the awful position of having to explain the decisions of others and deliver terrible news. If I were him, I would be in need of serious mental health support. This is an absolutely awful thing to be responsible for without any ability to rectify it.


Contrary to the majority of the thread here, I find this to be an architectural issue. For whatever reason the system was designed without a way to communicate important service and maintenance issues to the customer. That’s part of the good architectural design of a system – it must include human factors, communication among them.


I’ve worked at companies with fewer than 10 people that aren’t even in the tech sector, and brownouts were SOP.

This is just regular old incompetence/negligence/greed.


Multiple comments in the linked issue report not receiving an email.

Did you use the same email you use for spam/"marketing" for this notification?

The correct course of action is to shut down the service and give people time to fetch their data, not to erase the data as the first indication of shutdown.

A few emails are not sufficient if the end result is data loss, and a comment in documentation or release notes is not sufficient either (the only reference at least one person in the linked issue found).

Truly mind-blowing behavior.


Former Belgium user here. Checked my inbox, no emails from Influx since June 2022.

Then again, I was only using the free tier, so I guess I got what I paid for.


Why did you feel the need to send 3 emails and not just 1? Is it because you find email not reliable enough?


> I realize that it's not ideal that we've shut down this system

Not ideal???

You backed up everyone's DB and moved that to another region so they can just restore and change DB endpoints, right?

I don't believe that someone along the chain didn't suggest a scream test or similar. If they did, they must have been ignored.


If you are responsible for this the very least you can do is own up to it and apologize.

Trying to assert that you were doing what you thought was right only presents the image that your company is run poorly.

The correct thing to do is to admit that your best efforts were not aligned with best practices, and look into remediation.

Not “well, we tried”


You could have just responded with ¯\_(ツ)_/¯ and saved a lot of typing.


Lol, great attitude. Why would anyone pay money to any company you manage, now or in the future, if you deliberately trash user data and justify it with 'but we emailed you a couple of times first'?


Then the title is misleading. Not everyone saw the email, many expected more, e.g. a scream test.


Three emails are not "best efforts". 4.5 months notice is not "best effort".

My opinion on best effort: I founded, ran, and sold a SaaS company used by some of the most well known companies in the world. Our "best effort" was a minimum of 12 months notice, with a six month grace period afterwards. Emails weekly. Phone calls at least once a month. Reach out to customer leadership if no response. Then scream test as others suggested.


How many times have you said InfluxDB is about managing the data lifecycle?

It is astonishing that you have literally completely ignored one of the primary USPs of your product.


Judging from this link, if that's your best effort, I'm afraid to know what it's like when you're slacking :)


I should not have to say this, but "best efforts" is not enough and is very offensive to every user still relying on your services. You had a duty of results, not merely "best efforts", to reach every single one of your active users before hitting shift-delete on their data.


You did enough to help, just ignore these ungrateful whiners


Unfortunately we’ve been bitten by Influx operational issues a few times before. We adopted InfluxDB a long time ago and always had to deal with breaking changes on each upgrade, and every time we had an issue their answer would be: upgrade to the latest version and see if it persists.

Then recently they made a change to Telegraf that broke all our data collection, because they changed the environment variable replacer and their Jsonnet parser broke.

Now this. Shutting down a region with nothing but emails, and no brownouts, is not operationally acceptable.

We moved off InfluxDB a while ago and only rely on Telegraf now.


We self-host InfluxDB and have never had this problem.


Which problem? Of the massive breaking changes between 0.8 and later, and then between 1.x and 2.x? Not to mention InfluxQL to Flux?

Also, they did remove clustering in the open source version which was a very poor move from a PR perspective. And in my view, they have never recovered from it - years ago it was Prometheus vs InfluxDB for (non-SaaS) observability metrics, nowadays the only question is which backend for Prometheus to choose.


And now they've released 3.0.0 in their cloud, which they claim is backwards compatible, but let's see.

I sometimes wonder if vendors realize that they put their customers into a buying mode when they do this, when our options are:

- Upgrade to the new version of product X.

- Change to vendor or tooling completely as we're already changing everything.

We might pick another system if we feel like it is more stable.


If it's backwards compatible, why would they bump the major version?

Either there was a breaking change in there, or they don't understand semantic versioning.


Not sure why the downvotes here; a database service misusing semantic versioning is itself a red flag in my opinion. If major releases don't indicate a breaking change, I'm not as confident about what might be part of a minor or patch release.


To be fair, they have a bunch of associated tooling like their query language and UI, so it might have been those that have merited a bump to 3.x


Yeah, that could be, but I think that would still only merit the bump if there was a breaking change in the public API.


> they did remove clustering in the open source version which was a very poor move from a PR perspective. And in my view, they have never recovered from it

I still remember this. We were ready to standardize on InfluxDB when we got a taste of their business practices.


Same, we were just about to choose InfluxDB, and ended up using it only for a niche, low-criticality scenario (VMware vSphere metrics, mostly for troubleshooting). We were never going to purchase Enterprise though, so they haven't lost anything outside of mindshare and champions, which can be valued at anywhere between 0 and infinity.


Which Prometheus backend would you recommend?


VictoriaMetrics. I don't have experience with InfluxDB, but I did a rudimentary evaluation of the popular backends. VictoriaMetrics stood out mainly due to its comparatively low operational maintenance.


At AWS, the hierarchy of service priorities is crystal clear: Security, Durability, and Availability. In that order. Durability, the assurance that data will not be lost, is a cornerstone of trust, only surpassed by security. Availability, while important, can vary. Different customers have different needs. But security and durability? They're about trust. Lose that, and it's game over. In this regard, InfluxDB has unfortunately dropped the ball.

Deprecation of services is a common occurrence at AWS and many other tech companies. But it's never taken lightly. A mandatory step in this process is analyzing usage logs. We need to ensure customers have transitioned to the alternative. If they haven't, we reach out. We understand why. The idea of simply "nuking" customer data without a viable alternative is unthinkable.

The InfluxDB incident brings to light the ongoing debate around soft vs. hard deletion. It's unacceptable for a hard delete to be the first step in any deprecation process. A clear escalation process is necessary: notify the customer, wait for explicit acknowledgement, disable their APIs for a short period, extend this period if necessary, soft delete for a certain period, notify again, and only then consider a hard delete.
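
As a rough illustration only (not InfluxData's or AWS's actual process, and the waiting periods are arbitrary), such an escalation ladder can be made explicit in code so that a hard delete is structurally impossible to reach early:

  # Sketch of a deprecation ladder: hard delete is the last rung, never the
  # first, and each rung has a minimum dwell time. All numbers are examples.
  from enum import Enum

  class Stage(Enum):
      NOTIFIED = 1        # emails, banners, status page
      ACKNOWLEDGED = 2    # customer explicitly confirmed
      API_DISABLED = 3    # short, reversible brownout
      SOFT_DELETED = 4    # hidden but restorable
      HARD_DELETED = 5    # irreversible

  MIN_DAYS = {Stage.NOTIFIED: 30, Stage.ACKNOWLEDGED: 14,
              Stage.API_DISABLED: 14, Stage.SOFT_DELETED: 30}

  def next_stage(stage, days_in_stage, acknowledged):
      if stage is Stage.NOTIFIED and not acknowledged:
          return stage                  # escalate notifications, don't advance
      if days_in_stage < MIN_DAYS.get(stage, 0):
          return stage                  # dwell time not served yet
      return Stage(min(stage.value + 1, Stage.HARD_DELETED.value))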

The so-called "scream test" (https://www.v-wiki.net/scream-test-meaning/) is not a viable strategy for a cloud service provider. Proactive communication and customer engagement are key.

This incident is a wake-up call. It underscores the importance of data durability and effective, respectful customer communication in cloud services and platform teams. Communication is more than three cover-your-ass emails; it's caring about your customers.


> Security, Durability, and Availability. In that order.

The ordering of security and durability very much depends on the needs of the customer.

Some data is vastly more valuable to malicious actors than it is to you, e.g. ephemeral private keys. If lost you can simply replace them, but if (unknowingly) stolen it can be disastrous.

Other data is vastly more valuable to you than to malicious actors, e.g. photos of sentimental events.


> At AWS, the hierarchy of service priorities is crystal clear: Security, Durability, and Availability. In that order. Durability, the assurance that data will not be lost, is a cornerstone of trust, only surpassed by security. Availability, while important, can vary. Different customers have different needs. But security and durability? They're about trust. Lose that, and it's game over. In this regard, InfluxDB has unfortunately dropped the ball.

Interestingly, this is also how I'd allocate tasks to new admins. Like, sure, I'd rather have my load balancers running, but they are stateless and redeploy in a minute. The amount of damage you can do there in less critical environments is entirely acceptable for teaching experiences. Databases or filestores though? Oh boy. I'd rather have someone shadow for a bit first because those are annoying to fix and will always cause unrecoverable loss, even with everything we do against it. Hourly incremental backups still lose up to 59 minutes of data if things go wrong.

> The InfluxDB incident brings to light the ongoing debate around soft vs. hard deletion. It's unacceptable for a hard delete to be the first step in any deprecation process. A clear escalation process is necessary: notify the customer, wait for explicit acknowledgement, disable their APIs for a short period, extend this period if necessary, soft delete for a certain period, notify again, and only then consider a hard delete.

Agreed. At work, I'm pushing that we have two processes: First, we need a process of deprecating a service and migrating customers to better services. This happens entirely at a product management and development level. Here you need to consider the value provided for the customer, how to provide it differently - better - and how to decide to fire customers if necessary. And afterwards, you need a good controlled process to migrate customers to the new services, ideally supported by customer support or consultants. No one likes change, so at least make their change an improvement and not entirely annoying.

And then, if a system or an environment is not needed anymore, leadership can trigger a second process to actually remove the service. I'm however maintaining that this is a second process which is entirely operational between support, operations and account management. It's their job to validate that the system is load-free (I like the electrician's term here), or that we're willing to accept dropping that load. And even then, if we just see a bunch of health checks on the systems by customers, you always do a scream test at that point and shut it down for a week, or cut DNS or such. And only then do you drop it.

It's very, very careful, I'm aware. But it's happened 3-4 times already that a large customer suddenly was like "Oh no, we forgot thingy X and now things are on fire and peeps internally are sharpening knives for the meeting, do anything!" And you'd be surprised how much goodwill and trust you can get as a vendor by being able to bring back that thing in a few minutes. Even if you then have to burn it to turn up the heat and get them off that service, since it'll be around forever otherwise.


Wow, the incredibly callous 3-word explanation of the issue by pointing to a docs link with no other context. Really gives off "it's your fault for not reading the wiki." Is this how InfluxDB treats their customers?

Incidentally at work we've been evaluating a new hosted observability provider, looks like we can rule out Influx as an option.


> Really gives off "it's your fault for not reading the wiki." Is this how InfluxDB treats their customers?

I don't see any indication that the person who posted that is associated with InfluxDB. In fact, it doesn't seem like any staff member has posted in that forum in the past week. Up to you if you consider that better or worse.


It's the weekend, and I assume the people designated to answer questions on the forums don't work 24/7.

It could be them not sending the notification early enough or at all, it could be the notice getting stuck in spam or something, or it could be the person complaining not reading their emails.

I wouldn't jump to conclusions here.


I used Influx Enterprise at a previous startup. Support was so bad that after our first year we switched back to the OSS Influx and added HAProxy with manual replication and round-robin load balancing. It was so much smoother; wish we had done that from the start, since that was our original plan.


5% interest rates are breaking the tech companies. If you are dependent on a SaaS service for your infra, ensure that it is either

  - self-hosted
  - provided by a big deep-pocketed cloud infra
Otherwise, the service might shut down with 30 days' notice or so.


Hard-to-reverse actions need multiple safety switches: for example, turning off the machines in that region for two weeks before deleting them, which would surface support issues ahead of the no-going-back step of deleting data.


So many easy ways that this could have been avoided. sigh

- Phone calls

- Scream tests

- Monitor services still in use. Contact these customers individually

- ...

Not a single individual said "Gee, people are still using that DC, should we really destroy it?"

Either this shows Influx is really naive and inexperienced or... they are in deep trouble cash-wise and were working in panic mode to cut costs.


> According to the support, the notification emails to the users were sent on Feb 23, Apr 6 and May 15th. However, we did not receive those at all.

If true, this is concerning. One message getting lost in spam is understandable. But three over 6 months would imply they're being blacklisted and/or their mail sender is simply broken.

Do serious companies not have canaries or other checks in place to ensure their notifications are correctly delivered to customers?
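
A delivery canary is not hard to build, either. A sketch (the hosts and credentials are placeholders; a real check would also inspect the spam folder and bounce reports):

  # Sketch of a delivery canary: the announcement also goes to mailboxes we
  # control at the major providers, and we verify it actually reached the inbox.
  import imaplib

  CANARIES = [  # placeholder hosts and credentials
      ("imap.gmail.com", "canary@example.com", "app-password"),
      ("outlook.office365.com", "canary@example.org", "app-password"),
  ]
  SUBJECT = "Action required: your region shuts down on 2023-06-30"

  def landed_in_inbox(host, user, password, subject):
      imap = imaplib.IMAP4_SSL(host)
      imap.login(user, password)
      imap.select("INBOX")                       # a spam-folder hit won't match
      _, data = imap.search(None, 'SUBJECT', f'"{subject}"')
      imap.logout()
      return bool(data[0])

  for host, user, password in CANARIES:
      if not landed_in_inbox(host, user, password, SUBJECT):
          print(f"ALERT: shutdown notice not in the inbox at {user}")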


If a spam filter erroneously marks one email as spam, chances are it will also mark other very similar emails as spam, so it's not too surprising all three were caught.

For these kinds of automated emails, getting every message consistently delivered to everyone is really hard, almost impossible.

The problem here isn't really that emails weren't being delivered; it's that they seem to have tried only one method of contacting people, didn't check how successful that was (e.g. by seeing how many customers were still on those regions), and seemingly never tried anything else (such as notifications on the dashboard, a temporary brownout to alert people, etc.). "We tried one way to contact you and that didn't work, so we just deleted your service, sucks to be you lol kthxfuckitybye."


It could end up in the Promotions or Updates tab that no one checks.


If you send out spammy notices, yep.


Even if all three emails were properly delivered, that is not sufficient notice for a storage service. Why is there not also a reminder on the dashboard?

A financial service I use was recently purchased by another. The company has been aggressive about keeping me in the loop about what is upcoming. Maybe six months before the actual event, a heads-up. Again at two months. Then at one month. Then every week, along with countdowns to the deadline. "Are you ready? This is really happening. Here are relevant docs on how to ensure your transition goes smoothly."


Even the dashboard isn't really enough... if I'm running a one-off application or many, I may not log into every dashboard for every single thing regularly. A scream test would have been most appropriate, combined with a backup and at least 30 days retention for migration.


> Do serious companies not have canaries or other checks in place to ensure their notifications are correctly delivered to customers?

Even with a canary, you have no guarantee that a particular customer will receive your email. In this case, maximum CYA is a certified letter sent return receipt requested. There are services out there to mass mail those.


Mature ops use brownouts for this exact reason. Shut the service off for a few hours and you have the attention of anyone still using the service.
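
Something like this is cheap to put in place. A sketch of a scheduled brownout gate (the dates and wording are made up, and this is framework-agnostic Python rather than anyone's real middleware):

  # Sketch of a brownout: during pre-announced windows the API refuses requests
  # with a loud error, so anyone still using the region finds out well in advance.
  from datetime import datetime, timezone

  BROWNOUT_WINDOWS = [  # (start, end) in UTC, announced ahead of time
      (datetime(2023, 5, 15, 9, tzinfo=timezone.utc),
       datetime(2023, 5, 15, 13, tzinfo=timezone.utc)),
      (datetime(2023, 6, 1, 9, tzinfo=timezone.utc),
       datetime(2023, 6, 2, 9, tzinfo=timezone.utc)),
  ]

  def in_brownout(now=None):
      now = now or datetime.now(timezone.utc)
      return any(start <= now < end for start, end in BROWNOUT_WINDOWS)

  def handle_request(payload):
      if in_brownout():
          return 503, ("This region shuts down permanently on 2023-06-30. "
                       "This outage is intentional; see the migration guide.")
      return 200, "OK"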


A check on the expected migration progress and usage patterns in the region should have also rung a bell.


Newsflash: customers ignore emails. For our SaaS service it isn't uncommon for the customers who are administrative and technical contacts to spam-filter our emails and/or generally ignore emails on all varieties of topics. If they've supplied phone numbers, typically those don't work either.


> But three over 6 months would imply they're being blacklisted and/or their mail sender is simply broken.

Plus, some enterprises change their mail policy out of nowhere. I have a lot of subscribers at enterprise companies, but sometimes every mail delivered to some company starts coming back with something like "Your address is not on our whitelist. We only accept approved emails.", and every subscriber there is hard-bounced off the list and that's that. (They aren't paying me, though, so it's less of a big deal.)


It's tough to believe that turning off the service couldn't have involved at least a week of 'soak' time, where if you contacted them they would then help you move to another region. After all, the cost/benefit of keeping the VMs around (using no CPU or bandwidth) versus retaining a few customers indicates it is the right thing to do for both the customers and the business.


I've seen better communications around company-internal services that have been deprecated and for which a replacement exists that we need to migrate to. Heck, I've seen this a couple of times.

We had even tried out Influx a few different times. It was always OK, but never quite good enough. Now, with this, I think that seals the deal on me ever considering Influx again, either as a product or as a service.


>I've seen better communications around company-internal services

Our team maintains an internal CRM. When we plan to delete data or deprecate features, what we usually do (beyond sending emails):

- hide features/data from the UI without actually deleting them; if no one complains after a few weeks, proceed with the removal

- for critical data, make sure there are backups and store them for about a month; if no one requests them, delete them
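
In the same spirit, a soft-delete with a retention window is only a few lines (a sketch; the 30-day figure is just an example):

  # Sketch: data is hidden immediately but only purged after 30 quiet days,
  # so "someone complains" still has a happy ending.
  from datetime import datetime, timedelta, timezone

  RETENTION = timedelta(days=30)
  tombstones = {}   # dataset id -> when it was hidden

  def soft_delete(dataset_id):
      tombstones[dataset_id] = datetime.now(timezone.utc)

  def restore(dataset_id):
      tombstones.pop(dataset_id, None)           # the "someone complained" path

  def purge_expired(hard_delete):
      cutoff = datetime.now(timezone.utc) - RETENTION
      for dataset_id, hidden_at in list(tombstones.items()):
          if hidden_at < cutoff:
              hard_delete(dataset_id)            # the only irreversible step
              del tombstones[dataset_id]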


What did you choose over Influx?


GP here.

First time: we chose Timescale over Influx (and a few other competitors). I really liked Timescale. This was ~2.5 years ago and obviously Timescale was much less mature than today.

We also sent data to AWS Timestream. We're heavy AWS users and got to try out the product. I found it ok, but it was expensive even for us (Disney+).

Second time: the team migrated from Influx to ClickHouse for server and network metrics. Services had been relying on that data being correct and timely in order to route traffic, and well... there were issues. The simplest solution was simply to replace Influx with a product more suited to handling high volumes of metrics... yeah, ironic.


(not GP)

I've been bitten by the old Influx and had to migrate to something we could trust... (Influx basically tacitly admitted that the original architecture was pretty poo and they've since swapped out the engine (twice?), but it smells a bit like MongoDB trying to reinvent itself and distance itself from all the early web-scale claims, so I'm kinda skeptical.)

So we rolled our own with MySQL+TokuDB, but that's not a good choice for a new system as TokuDB is disappearing. When I tried to migrate to MyRocks we discovered that the newer kids like MyRocks don't work nearly so well for specifically this kind of use case.

Something I haven't personally tried, but have heard rave reviews of, is Timescale. It's a special storage engine for Postgres and it has a lot of nice features like auto-maintained rollups. And they have lots of deep technical blog posts that I find myself agreeing with, so it must be good! :D


I use Timescale and I can recommend it. The reason I'm still using Influx too (1.7) is that it's unmatched in its data storage efficiency and query performance. You can get close with Timescale, and its main strength is having the full query power of PostgreSQL, if you have room for the extra hardware resources it requires.


(My memory is that we kept on influx including 1.7, but it was a while ago now so memory might be fuzzy)

I guess Influx perf and efficiency really depend a lot on your data shape then :)

Our experience was that performance dropped off a cliff if you had too much data, too much tagset cardinality, or else your query was too broad. And when it failed, it lost data.

In fact, it lost data generally. When we were replacing it we dual-ran an ACID DB version (which, with tokudb, was fast enough to keep up (although we didn't index every tag column)). So we did a diff and discovered just small random holes in the influx data that we'd never noticed before.

We had other considerations when we went with MySQL, namely that we were already using it. If shopping for a standalone solution to start a new project on, I'm thinking Timescale is the go-to these days?


I’ve been self-hosting InfluxDB in the hundreds of GB range for several years. I wouldn’t say I’m super happy with it, but… let’s say we’ve reached an understanding, the software and I. We’re on the latest patch of 1.8 and content to stay there.

I agree with GP about storage efficiency, which is superb. Query performance is good as long as a single query doesn’t deal with more than ~dozens of series. And $deity help you if you want to do hourly roll-ups of all series for a short time range, as RAM usage is wildly unpredictable. Storage is optimized for long reads of a single series, not for short reads of many series (but in fairness, you have to choose one or the other, that’s just the physics of the thing).

If I were starting from scratch, I’d probably pick Timescale. Or maybe DuckDB… I wonder if it would work for our use case.


> And $deity help you if you want to do hourly roll-ups of all series for a short time range, as RAM usage is wildly unpredictable.

I think I went properly mad while trying to troubleshoot this. The same query sometimes pulls 5GB, sometimes 20GB, sometimes 50GB and sometimes OOMs at 200GB memory pulled beyond base load of the system. And there's no query planner, no execution log, no metrics to help you. And most documentation or threads about it can be summarized as "well sucks to be you, eh? Maybe less data would be an option I guess"

We don't do that anymore and just roll up a very small select number of metrics.

And yeah, we committed to Postgres as our main DB 2 years ago or so, and time is currently freeing up to start work with TimescaleDB. Zabbix is supposed to be great with it.


I have pretty much exactly the same experience.

However, I do feel that they are really trying to do the right thing with the new 3.0 architecture, addressing the deficiencies (most importantly performance and full-fledged SQL) while keeping the stuff that works (InfluxQL for simple and legacy queries). Also, leveraging open-source projects and contributing to their upstreams is a plus. Thus I'm hoping they succeed in delivering on that promise.


Agreed, embracing battle-hardened open source tech will be a win for them and for customers.

However, once your storage layer is parquet and your query layer is SQL, well... DuckDB is also basically parquet+SQL, and it won't be long before there's a nice Postgres wire protocol adapter in front of it. What's the advantage of continuing to use InfluxDB if you don't need clustering or HA?


It's unusual to read that InfluxDB is fast and efficient. Did you try VictoriaMetrics? It usually needs 10x less RAM than InfluxDB for the same workload, especially when the number of active time series is high. It also uses less CPU and disk space on the same production workload. [1]

[1] https://valyala.medium.com/insert-benchmarks-with-inch-influ...


I've heard of VictoriaMetrics before but haven't had time to play with it. InfluxDB is also now ingrained in a production system, so replacing it is not straightforward. The query language is also different, meaning everything that uses it will need to be updated too, and coming from a mainly SQL background, PromQL/MetricsQL looks oddly weird.


Agreed that PromQL and MetricsQL have limited querying capabilities compared to SQL or InfluxQL. But they cover most of the use cases for analyzing time series measurements, and allow writing much simpler queries than InfluxQL or Flux for those particular cases [1].

[1] https://valyala.medium.com/promql-tutorial-for-beginners-9ab...


I like Influx, and have been using it for all our TSDB needs for ~7 years. It sounds like there have been a lot of shake-ups internally, and all their hopes for Flux (their intended InfluxQL replacement language) have been abandoned. I'm wondering how they're doing.

This situation doesn't seem like it is specifically them doing anything wrong, if indeed they did send out multiple notifications over 6 months. It sounds like it caught many people by surprise though, which makes one wonder if there was a problem with their announcements of this change. It definitely seems like it could have been handled better than "deletion with no recovery", though; some sort of "shut down, wait a week or a month, then delete" would have been better.


3 emails over six months is a pretty lousy notification rate. At minimum, something sent in the month of the shutdown would be appropriate. Better still would be an aggressive notification push of increasing frequency as the shutdown approached. Was there any notification of the shutdown in the admin panels? The person with the ability to migrate data may not be the person receiving the messages.

EDIT: Also, does the company have a Sales team? Why is it not a top-line item for a representative to contact those who have active service in the region to emphasize the shutdown? Or was the Sales team similarly in the dark about the messaging?


I would be expecting hourly notifications for the last 7 days if traffic is still detected.

I literally get more notifications for DNS expirations (and those are scheduled and predictable).
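
That kind of cadence is trivial to encode; a sketch (thresholds arbitrary):

  # Sketch: reminder cadence ramps up toward the deadline, but only for tenants
  # whose traffic shows they are still actually using the region.
  from datetime import timedelta

  def reminder_interval(days_left):
      if days_left > 30:
          return timedelta(days=7)     # weekly
      if days_left > 7:
          return timedelta(days=1)     # daily
      return timedelta(hours=1)        # hourly in the final week

  def reminder_due(days_left, since_last_reminder, still_writing_data):
      return still_writing_data and since_last_reminder >= reminder_interval(days_left)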


It is pretty amazing. They are in the business of storing other people's data, and evidently notice of deletion was treated as a non-event, instead of as a Huge Big Deal that should have been the top-line communication to impacted customers until a migration plan had been enacted.


I received a response from their support and it’s hilarious.

TLDR: “We don’t plan to delete your data again in the foreseeable future.”

Full quote: “You can sign up for a new account here InfluxDB Cloud using a different region. We want to assure you that there are no more scheduled shutdowns planned. Therefore, once you have created the new account and begin writing to it, we do not foresee any data loss going forward”

No more information was provided in the mail.


This is just unbelievable. And plans do have a habit of changing unexpectedly. There are certain red lines that a data storage company should never cross. Total loss of credibility going forward, at least until fresh, unaware users flock in. Anyway, I’m migrating all workloads off the TICK stack, for I can’t fathom the consequences for InfluxDB OSS, particularly if they now lose some of their funding, given their expenditures on the rewrite for InfluxDB 3, for example. Lightbend Akka caused enough trouble already.


This could have been easily mitigated with a giant red ugly banner "YOUR DATA WILL BE LOST IN X DAYS. MIGRATE NOW".

Three emails clearly weren't enough, right? Now their name is in the dumps, customers are pissed, and my only exposure to InfluxDB is a negative one.

I hope other SaaS folks learn from this very expensive lesson.


According to other comments, apparently there was a banner, but it was tiny and green instead. Pure genius.


Amazing


Why would you even need to log in if your service is running swimmingly? I run some things off of Backblaze and I haven't seen their console in over 2 years.


What a terrible start for InfluxDB 3. And that incomprehensible justification on their side… what a disappointment… I’ve been anticipating InfluxDB 3 going GA later this year and was just about to subscribe to their cloud offering, since that’s currently the only place it’s available. And I was going to migrate more workflows to the TICK stack. But they’ve just nuked their credibility in my eyes. I hope they can still recoup the dev costs for InfluxDB 3, but I’m now going to be very cautious about that company going forward. I hope Influx OSS remains viable. Inconceivable…


A lesson for us and for the senior management of any company considering cloud hosting: if you are saving money by moving away from on-premise, or by using a managed service to reduce employee costs, then you need to factor in these disaster scenarios and have procedures in place. If the senior management folks at the time only thought about cost savings and not business continuity, then the blame for this fiasco should also land on their heads; they took credit for saving money, and now it's time to take credit for the data loss.


A total of 3 emails sent, lol.

Hope they didn’t have any big corp customers impacted. Some big corps would very easily use that to cancel contracts and void out payments. Then let the lawyers deal with it


For cases like this, email is not enough. You need to do trial shutdowns where you disable access to the DB until someone acknowledges via the web. Do it for maybe 24 hours so you know you catch everyone.


Well, that sucks big time. Thankfully we switched from InfluxDB to QuestDB, and this news just makes me more convinced that we made the right call.

We have a netflow analyzer with more than 350 B2B customers (regional ISPs) and we used to run InfluxDB as our TSDB. A few things were bugging us though:

1) InfluxDB is being rewritten for the third time in less than 5 years;

2) v2 in hindsight was actually a downgrade from v1 in terms of performance (e.g. the drop-shard mechanism);

3) Will they finally solve cardinality in v3? That was a major issue that was supposed to be solved by v2...

I was just not confident sticking with InfluxDB. Thankfully another OSS project, QuestDB, really surprised me in terms of performance and reliability.

We have now migrated more than 100 of our customers and hopefully will get all migrations done by the end of the year.

For my use case, there's one feature left to be added, which is an inet type, in order to store IP addresses more efficiently.


This seems really poor. They should have been able to see which customers were still using the service in that region, send them additional reminders that they needed to get off, and only remove access after several of those plus a grace period. Maybe even have a phase of read-only access before full removal.


How about: "How odd, many haven't migrated yet. Shouldn't we double-check that they are aware of this shutdown?"


Wow. That's amazingly bad.


What I really don't get is why people even put their data into these kinds of cloud services that have no proven track record.

What value proposition were they providing that was so great that you were willing to risk your critical data by putting it into their service, when they had done nothing to prove that they are worthy of your trust? What kind of thought process leads to this kind of decision? Someone, enlighten me. I really want to understand.


https://community.influxdata.com/t/getting-weird-results-fro...

>Unfortunately @RoyalBlock at this time it does look like the data is lost for Sydney. We are still looking into it but I am 90% sure this is sadly the case.

https://community.influxdata.com/t/getting-weird-results-fro...

>Data recovery is still in process. If you where part of the Belgium customer you will receive the last 90 days of your data. Support will email you directly once this process is ready.

Better than nothing for the Belgium tenants, I suppose.


I worked at a wireless telecom company 10 years ago. The local switches did not have a consistent decommissioning procedure. One switch removed two routers that were used for a critical service and had them on a truck ready to leave, all within a few hours. It would have taken a week or longer to fix if we had not been able to get them off the truck and back into the rack. We then forced all regions to adopt a policy of powering off a device 10 days before removing any cabling or the device itself.


The link to Slack in this post by "developer advocate" Jay Clifford is a rabbit hole worth diving into a bit: https://community.influxdata.com/t/getting-weird-results-fro...

Find your way into that Slack workspace via https://www.influxdata.com/blog/introducing-our-new-influxda... and you'll discover some interesting things:

* That the only reason being given in that thread for the shutdown is "The region did not get enough usage or growth to make it economically viable to operate, so it became necessary for InfluxData to discontinue service in those regions.". i.e. there's no regulatory issue here like other answers speculated - just pure cost-cutting.

* That on July 5th, a couple of hours before they started shutting everything down (based on the shutdown timeline at https://status.influxdata.com/), that same "Developer Advocate" announced that they were suspending their live "office hours" sessions for July.

* Multiple people are asking for help after finding that they can't connect and getting ABSOLUTELY NOTHING in the way of support from the company. It's literally falling to _other users_ to tell them that all their data is gone.

* One person chiming in, Matthew Allen, DID have a colleague who saw the notification email, but notes that...

- it was a pain for him to migrate his data to a different region due to InfluxDB Cloud's rate limiting, but he did it anyway

- ... but that the documented migration process doesn't seem to have worked properly anyway (given that some of his data points have ended up as nulls)

- plus on top of all of THAT, even after doing everything he was supposed to do, he still can't log into influx cloud after the migration because when he logs in he gets automatically redirected to the no-longer-existing cluster and hits an error screen

What a clusterfuck. Shame on everyone from Influx who had a hand in this: the CTO Paul Dix, who's turned up here on Hacker News to blame his customers for Influx's negligence; Jay Clifford the Developer Advocate, for a spectacular failure of developer advocacy; and anyone else on the team who was close to this and didn't push for brownouts, blog posts, a mention in the newsletter, retention of data for some window after the final shutdown date, or any of the other obvious measures that could've made this less of a catastrophe. The multiple people noting that they were able to receive the company's newsletter but did not receive the notification that it was about to delete all their data tell me everything I need to know about this company and its priorities. I will never willingly do business with them again (although we used them at a startup I worked at once) and will advocate against them any time I hear a colleague suggest using them.


Don't blame the developer advocate for the company's mistakes. Developer advocates rarely have any part in these kinds of decisions, but are left holding the bag whenever they go badly.


I can understand the outrage about deleted data, but has anyone figured out why they are shutting down in Belgium?


What I don't understand is why they did not auto-migrate all customer data to the closest region. Why ask the customers to do this migration instead of transparently handling it for them? Yes, the cost may be different, but this was their decision; they should absorb those costs until the customers decide.


Auto-migrating data to the closest region cannot be done without legal implications if that region is in another country.


In 99% of cases there are no legal implications to moving data to another country inside the EU. Amsterdam is the closest in the geographic and political sense. This is cloud data, already subject to EU policy and not country-specific. Those who really do have a single-country requirement can't even use a cloud provider; they need to rent/colocate servers (not services) in the country.


Last time I checked, Sydney is not in the EU.


Sydney is not, but this thread (at least the title) is about europe-west1 in Belgium, which is in the EU.


Given they shut down (sorry, discontinued) both the Belgium and Oceania DCs, I'm guessing "cost".

I know ap-southeast tends to be pricier; I'm assuming Belgium was the same.


There's a small increase in cost compared to the cheapest US regions, but the European regions are still significantly cheaper than the expensive Australia/Brazil regions.

https://www.concurrencylabs.com/blog/choose-your-aws-region-...


Yeah, for AWS, but the European region they closed down was on GCP; I'm not sure how the pricing compares.


I haven’t seen a reason yet but I suspect they’re reducing the regions they’re hosting in to save costs.


I suggest designing APIs so that they can return warnings and still return useful data, as opposed to returning either plain success with data or an error. (https://docs.influxdata.com/influxdb/cloud/api/#tag/Response...)

Out-of-band data, like emails, is bound to be ignored by some.

I'd incorporate warnings in the application-specific responses; you can also return a different response code to make sure many clients do not blindly ignore warnings. The HTTP standard includes a 207 Multi-Status response code, which is mostly used with WebDAV.
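
A sketch of what that could look like (Flask and the response shape here are purely illustrative, not the actual InfluxDB API):

  # Sketch: return data and warnings together; 207 instead of 200 makes it more
  # likely that clients which only check for "success" still surface the notice.
  from flask import Flask, jsonify

  app = Flask(__name__)
  REGION_WARNINGS = ["This region is scheduled for shutdown on 2023-06-30. "
                     "Migrate your data before that date."]

  @app.get("/query")
  def query():
      body = {"results": [{"series": []}],       # the usual query payload
              "warnings": REGION_WARNINGS}
      return jsonify(body), 207 if REGION_WARNINGS else 200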


The funniest thing about this to me is how they officially responded that if you want to find out the reason why this happened, you can look it up on their Slack chat.


Similar reports have also appeared on the InfluxDB Slack.


I wonder if they’re open in other EU regions? If you wanted to shut down a region, as a database provider, is it even possible to send snapshots to another region without user consent?

It feels like that could be a good practice, or not, depending on the laws in question.


Their migration guide does include help for customers to migrate data from one region to another EU region (Frankfurt).

They might not have been able to do it automatically, as the region name was hardcoded in the hostname.


For anyone interested, Self-Hosted InfluxDB3.0 IOx builds & containers are available here: https://github.com/metrico/iox-community


Reminds me of the time my bank got bought and sent me a single email saying they would shut down my account in x days…


Do the affected customers have legal recourse to get back their data AND a hefty compensation for the downtime?


Imagine if GitHub were fired upon from the Death Star. The disturbance in the Force would be epic, as millions of voices suddenly cried out in terror and were suddenly silenced.


Luckily git is actually distributed underneath. Unluckily the issues aren't in the git repo.


An utterly categorical communication failure. Wow.


It's almost like using bullshit hype-hype-hype databases, especially their cloud offerings, is a horrible idea.


Nobody is complaining about the database, they're complaining about poor communication. It could happen with anything that depends on a 3rd party.



Yeah, no: the people complaining about their deleted data clearly wanted to keep using it; so much so that they weren't expecting anything close to having to migrate to another location.



