To all Cloudera's clients: pack your stuff and try to get an alternative vendor. Private Equity will not bring anything valuable to this business except higher prices, more aggressive sales, and poor customer service.
As seen with ExtJS/Sencha and TravisCI, I'm sure the story will be the same: chill for about 6 months to 1 year, then lay off all the engineers, lay off support while shipping it overseas, and hope that long-tail subscriptions and "trapped" organizations continue to pay the fees.
If you're looking for an alternative vendor: We've started a new company (called Stackable) to build a distribution for all these open source "data" tools (e.g. Apache Kafka, Apache NiFi, Apache Spark, ...).
I'm a committer for Apache HBase and Apache Hive myself, and I've been in this space for 13 years now. Yes, the hype is over, but there are tons of companies using this stuff in production and tons of companies choosing it for new projects.
We're trying to tackle the biggest pain points our customers had: lack of flexibility (i.e. locked into specific versions for ages), CDH/HDP not being built on Infrastructure as Code principles, security being hard to do, ...
The three-sentence (buzzword-heavy) technical summary: Our distro uses the Kubernetes control plane, but we've developed a custom Kubelet that runs software using systemd as its backend, as well as a bunch of operators (all in Rust...). This allows us to leverage the best of both worlds and also allows hybrid scenarios (part in containers, part on "bare metal"). We've replaced Ranger/Sentry with Open Policy Agent.
You could make an argument that Berkshire Hathaway is the largest M&A firm ever. But it's not really private equity (though it's not really public, either).
Dell's approach was actually more like the classic "taking a company private again", where you use public equity markets to grow big but keep control, then take it private at terms that don't really reward shareholders for the massive growth. This looks like the modern variety of PE capturing predictable revenues from a large, mature client base that can pay their fund the expected returns for the next 5-7 years. It's boring as hell and never means (a) a better product, or (b) a bigger pay-off for employees.
Having worked for a Silver Lake funded company (I originally called it a startup, but that's not fair to say anymore for a private company that now makes billions), I can assure you that they don't take a back seat to how the company is run (that's not to say they take a direct hands-on approach, either).
In 2017, I joined a company that had been spun out of eBay and bought by a PE firm. The firm invested a large amount of "growth capital" in the biz to transform the product from a software license to a cloud-based service. This transition not only increased our revenue exponentially but also gave us the ability to analyze data on how customers were using our product (prior to this, we had zero visibility into how customers used our on-prem product). Using this data, we were able to better serve our customers & partners and improve the overall experience of using the product. A few years later, we sold the company to a large company for a pretty penny (>$1B).
This is all definitely anecdata but, IMO, being backed by a PE firm forced us to focus on revenue (really EBITDA) alongside product growth. That was a mechanism that forced us to weigh the impact of each product decision we made. I think this ultimately helped us keep a steady pulse on the market w/o chasing every shiny new trend that popped up.
The data industry continues to hype this idea of “multi-cloud,” but then the “modern data stack” is centralized around a single warehouse and nobody sees any irony in that.
The big bet we’re making at Splitgraph [0] is that the next wave of data engineering will take a more decentralized, “data mesh” type approach to enterprise architecture. “Data gravity” really does exist: data is expensive to move, in terms of both cost and operational complexity. And with increasing specialization of analytical databases, a single source of truth will become unrealistic. So instead of bringing the data to the query, why not bring the query to the data? All we need for that is a set of read-only credentials. And yes, it should also be easy to warehouse your data, but it doesn’t need to be the default.
Cloudera mentions they bought DataCoral to help with data integration and connectors. They’ve correctly identified the problem - data sprawl and fragmentation will inevitably grow - but I’m not sure they have the right solution.
Data integration is important, but it’s a moving target, which is why it calls for a collaborative open source solution. This is why so many new startups, like Airbyte most recently, are coalescing around the Singer taps that Stitch left behind after its acquisition by Talend.
We also support using Singer taps to ingest data into versioned Splitgraph images [1], so we’re excited to see more collaboration on maintenance of taps. For us it’s a useful feature, but it should be just that — a feature. Is there really a need to replicate all of your data before you can even query it? Or would you rather experiment by directly querying its source?
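To make "bring the query to the data" a bit more concrete, here's a rough sketch of the general idea. It is not how Splitgraph implements it. It assumes two in-memory SQLite databases standing in for separate remote sources, a made-up `orders` table, and `PRAGMA query_only` as a stand-in for read-only credentials. Each source runs the aggregate locally and only the small result rows travel to the caller.

```python
import sqlite3


def make_source(rows):
    """Build a stand-in 'remote' source with a hypothetical orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (region TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    conn.commit()
    # Assumption: query_only mimics handing out read-only credentials.
    conn.execute("PRAGMA query_only = ON")
    return conn


def query_in_place(sources, sql):
    """Run the same query at every source; only result rows are returned."""
    results = []
    for conn in sources:
        results.extend(conn.execute(sql).fetchall())
    return results


def total_by_region(sources):
    """Push the aggregation down to each source, then merge the partial sums."""
    partials = query_in_place(
        sources, "SELECT region, SUM(amount) FROM orders GROUP BY region"
    )
    totals = {}
    for region, subtotal in partials:
        totals[region] = totals.get(region, 0) + subtotal
    return totals


source_a = make_source([("eu", 10), ("us", 5), ("eu", 7)])
source_b = make_source([("us", 20), ("apac", 3)])
totals = total_by_region([source_a, source_b])
print(sorted(totals.items()))  # [('apac', 3), ('eu', 17), ('us', 25)]
```

Nothing gets replicated here: five rows live in the sources, and only four partial sums ever leave them. With real warehouses the same shape applies, just with much bigger numbers on the "stays put" side.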