Hacker News
Amazon’s consumer business has turned off its Oracle data warehouse (bloomberg.com)
180 points by petethomas on Nov 9, 2018 | 94 comments



When non-traditional databases became popular, I thought it was primarily driven by people short-sightedly prioritizing development time over all of the good relational database features.

Now I see things differently -- the non-traditional databases are just better at scaling horizontally, handling eventual consistency, and running in cloud environments than the traditional databases are. They are easier to set up and use. Some of them now have pretty good relational features and schemas.

During the past decade the Oracles of the world have continued to think dismissively of new non-traditional databases, as I did at first. The non-traditional databases got better and better while Oracle kept doing a lot of the same stuff. Oracle just didn't take the competition seriously, and that's fair enough because the competition didn't deserve to be taken seriously at first -- but now it does, and it's becoming a threat to Oracle's business.

It's obvious that Amazon would want to use their own database products -- but the impressive thing is that those products, which are not very old, are already good enough to replace Oracle for a lot of use cases at very high scale.


"[N]on-traditional databases" with far better performance actually pre-date relational databases.

Relational databases solved the problem those earlier systems had: they required a lot of developer effort to evolve for new application needs, and re-organization of the datastore for changing performance needs. Basically, everything is a table, and you can combine or separate tables, going from whatever form the data is currently in to whatever form you currently need. This is also helpful for reporting (as opposed to operational applications).

But relational databases, like dynamic languages, pay a performance price for their flexibility.

Of course, every other innovation has since been added, and there's Oracle's licensing too. So it's much more complex than the base technology of relational algebra.


Yes. Many people leave Oracle’s technology footprint out of frustration with their sales force.


There are places where relational DBs are inherently better than noSQL (namely when you need fully up-to-date information on every query, no matter what), but those situations are becoming rarer compared to situations where potentially stale data is an acceptable tradeoff for performance gains.

Honestly, IMO the biggest issue holding noSQL databases back is often a lack of good documentation and support. Technically, though, they're going to become the standard for most use cases soon.


Why do people continue to use the term NoSQL? It's ridiculous.

There are hundreds, maybe thousands, of NoSQL databases ranging from Redis to MongoDB to Cassandra to InfluxDB to Druid. They pretty much have nothing in common other than storing data. Many of them support SQL directly, others via Spark, and almost all have a SQL equivalent.


Exactly this. "Non-traditional" and noSQL are about as inexact as it gets. The label covers an incredibly wide range of architectures, from Spanner and CockroachDB, which deliver ACID guarantees equal to or even surpassing those of relational systems, to purely performance-tuned systems such as Aerospike, which have few guarantees and features but are blazingly fast. There are pretty much all shades of consistency in between, like Cassandra and Scylla with tunable consistency and LWT. Lumping them all together into a single category will yield poor results.


> when you need fully up-to-date information on every query, no matter what

This doesn't require something relational, just something centralized. And not even that; consider Spanner.


Agreed. I think non-relational databases are fine for reporting on temporal data (e.g. accounting).


To clarify further: of course, you will still need a relational data repository.


The problem with noSQL is that there are too many players out there making duplicated efforts. You have MySQL & MariaDB and PostgreSQL as the two main SQL platforms, but there are too many to list in the noSQL field, most of them providing much the same functionality.


This always happens and the noSQL market will eventually consolidate / settle down to a handful of players just as the SQL market did.


That depends on the functionality requirements of noSQL customers. I posit that writing a new, independent RDBMS that implements the latest SQL spec, with the same management and replication capabilities as the incumbents, is on par with trying to compete with WebKit and Gecko. The SQL Server engineering team at Microsoft is about 1,000 people according to my friends on campus.

Whereas implementing a Key-Value Store is orders of magnitude simpler. There is plenty of room for innovation, of course -- but you don't need 1,000 people to bolt on some new replication scheme or distributed backend.
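As a rough illustration of how small the core of a key-value store can be, here is a toy sketch (not modelled on any particular product; the class name, file name and record format are made up): an append-only log on disk plus an in-memory index already gives you durable puts and gets.

```python
import json
import os


class TinyKV:
    """Toy key-value store: an append-only log on disk plus an in-memory index.

    Only a sketch of the core idea -- real systems add compaction,
    replication, concurrency control, proper crash recovery, etc.
    """

    def __init__(self, path="tinykv.log"):
        self.path = path
        self.index = {}               # key -> latest value
        if os.path.exists(path):      # rebuild the index by replaying the log
            with open(path) as f:
                for line in f:
                    record = json.loads(line)
                    if record["value"] is None:          # tombstone
                        self.index.pop(record["key"], None)
                    else:
                        self.index[record["key"]] = record["value"]

    def put(self, key, value):
        with open(self.path, "a") as f:                  # append-only write
            f.write(json.dumps({"key": key, "value": value}) + "\n")
            f.flush()
            os.fsync(f.fileno())                         # make it durable
        self.index[key] = value

    def get(self, key, default=None):
        return self.index.get(key, default)

    def delete(self, key):
        self.put(key, None)                              # write a tombstone
        self.index.pop(key, None)


# usage
db = TinyKV()
db.put("user:42", {"name": "Alice"})
print(db.get("user:42"))
```

That's obviously nowhere near production quality, but it shows why a small team can get a credible K/V engine off the ground while a full RDBMS takes an army.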

If you take a simple NoSQL system and then tack on things like object schema support, JSON blob value indexes, node graph support, etc., you'll end up with an RDBMS analogue.

So the barrier to entry for building a NoSQL system is much lower than for an RDBMS -- and many companies might decide to build their own rather than extend an existing system, leading to more competition but not necessarily a better product.


It always starts that way. In time, the legacy baggage that gets added will turn the noSQL databases into the same bloated monsters that the SQL databases have become. There was a time when many of the SQL database options were the (relatively) simpler and more powerful choice versus the wide variety of home-grown databases they eventually replaced (i.e. there was a lot of baked-in business logic etc. that the move to SQL databases forced companies to disentangle from the database engine itself). SQL Server was originally just a fork of Sybase SQL Server (a much smaller, simpler version) with some FoxPro tech bolted on to it...


A famous P. Greenspun quote may be savagely hacked for the topic at hand: any sufficiently complicated No/New/WhateverSQL engine contains an ad hoc, informally-specified, bug-ridden, slow implementation of an RDBMS.


Right - and FoxPro was just another dBase clone. NoSQL today is where dBase was 30 years ago.


> That depends on the functionality requirements of noSQL customers. I posit that writing a new, independent RDBMS that implements the latest SQL spec, with the same management and replication capabilities as the incumbents, is on par with trying to compete with WebKit and Gecko. The SQL Server engineering team at Microsoft is about 1,000 people according to my friends on campus.

CockroachDB is the newest from-scratch SQL engine that I am aware of: https://www.cockroachlabs.com/

They re-use the RocksDB K/V store maintained by Facebook (in C++) as the storage layer, and their own code is written in Go, so I suspect that their development work is probably a lot less time-expensive than that of the teams working on older SQL databases.


Are you forgetting about SQL Server and Oracle?


And SQLite!


and Sybase and ComDB...


That’s what happens in a growing market. Once it’s matured, it boils down to a handful.


> You have MySQL & MariaDB and PostgreSQL as the two main SQL platforms

Based on license revenue it's actually Oracle and MS SQL Server by miles. They are currently #1 and #3 on DB-Engines.com. (https://db-engines.com/en/ranking) MySQL is #2. PostgreSQL is #4, though it's pretty far back from the first three.


Maybe I'm confused, but ranking databases on license revenue will inevitably rank the databases that charge for licenses higher.

I mean, PostgreSQL is open source and requires no license fee. I'm not sure that license revenue is a good comparison metric.


DB-Engines uses a set of metrics that does not include revenue. See https://db-engines.com/en/ranking_definition for more. Oracle and MS SQL Server have consistently ranked highly there for many years.


You know MySQL, MariaDB, and Postgres don't have licensing costs, right?


Of course. I've worked in and around the OSS DBMS market for over a decade. But the numbers were really big. It seems to me that out of a 2015 US $30B DBMS market, Oracle and MS were collecting something like US $29B. The rest of the market, which included products like MongoDB and Cassandra, was close to a rounding error. (I don't have the Gartner report handy, sorry.)

It's popular on HN to focus on OSS products, but there's a very large proprietary RDBMS market measured both in terms of revenue as well as users. That's how Oracle and MS got to be the behemoths they are today.


> namely when you need fully up-to-date information on every query, no matter what

So you basically want full consistency (vs eventual consistency) in a distributed environment. You can set up your noSQL that way. I don't see how relational DBs are "better" than noSQL at consistency.
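For example, here is a sketch using the Python Cassandra driver (assuming a local cluster and an existing keyspace/table with replication factor 3; the keyspace, table and column names are illustrative): reading and writing at QUORUM gives you read-your-writes behaviour, at the cost of extra latency.

```python
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("shop")   # hypothetical keyspace

# With replication factor 3, any read quorum overlaps any write quorum,
# so a QUORUM read is guaranteed to see the latest acknowledged QUORUM write.
write = SimpleStatement(
    "INSERT INTO orders (id, status) VALUES (%s, %s)",
    consistency_level=ConsistencyLevel.QUORUM,
)
session.execute(write, (42, "shipped"))

read = SimpleStatement(
    "SELECT status FROM orders WHERE id = %s",
    consistency_level=ConsistencyLevel.QUORUM,
)
row = session.execute(read, (42,)).one()
print(row.status)   # "shipped"
```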


And if you do that you go back to the performance of one instance or (probably) worse.


Absolutely, but isn't that the case for SQL too? My point being, noSQL is not worse than SQL if all you care about is storing and retrieving documents vs joins, etc.


Of the three DB products mentioned in the tweet, only one is a non-traditional database: DynamoDB. Both Redshift and Aurora are traditional RDBMSs, forked from very old codebases. Redshift is a fork of Postgres 8.0 that was adapted for OLAP use cases, while Aurora is a fork of MySQL (or PostgreSQL, in the PostgreSQL-compatible versions) that runs on a custom storage engine.

The main reason technically capable companies like Amazon or Netflix are moving away from Oracle has nothing to do with horizontal scaling. It's the licensing costs. Oracle is just too expensive for them, and even if you are not Amazon yourself, RDS (or any other cloud RDBMS) would be much cheaper for you.


This isn't really about relational vs non-relational. Redshift is still a relational database of the MPP genre.


> The non-traditional databases got better and better while Oracle kept doing a lot of the same stuff

I'm not sure this is true. You can look at the new features in 8i, 9i, 10g, 11g, and 12c and see that Oracle is adding a lot of features, and some of them are very impressive. The problem is most people don't need most of them, and if you want any of them you have to pay for all of them.

And the "non-traditional" databases had a pretty low bar. MongoDB just had to stay up for more than 5 minutes and not lose your data. Do they even do that yet?


> MongoDB just had to stay up for more than 5 minutes and not lose your data. Do they even do that yet?

What are you implying? Do you have any source for that? Some of the top tech companies out there use non-traditional databases, including MongoDB. They wouldn't if the nodes crashed every 5 minutes and/or there were data loss.


The parent was exaggerating, but there are countless articles about how bad Mongo is. These might be in the past, I do not know. Anecdotally, I had horrible experiences with data loss and instability, both of them incredibly hard to debug compared to, say, MySQL or Postgres. And when you see a hip company's data leak, it is often Mongo that was compromised (to be fair, usually because of its weak default settings and stupidity on the admin side, but still, it is a data point against Mongo). Personally, I do not find it easier to use either: our ORM is an optionally schema'd layer; it is as flexible as Mongo and, in naive setups, much, much faster and easier to use (the latter is an opinion), without the risk of data loss and with flexibility of DB engine (it even supports Mongo because I was bored on a Sunday).


MongoDB's storage engine is now WiredTiger. It is a bit young, but it's a really beautiful C codebase written by database veterans.


Good to hear; I have projects where others made the tech choices, and when that happens, these days it's always Node + Mongo unfortunately. So I inherit that. It is much better than it used to be, but still depressing compared to what else is there. But what works works anyway.


Despite what I said in my parent comment, I totally hate MongoDB. I used it for 5 years. The last time I was looking for a job I did not consider teams that used it.

There are other non-traditional databases that are much better than MongoDB.


Most of those Oracle databases are moving to Aurora, which is pretty much traditional.


Traditional yes, and not even necessarily better.

Amazon had a big outage in recent times that their own post-mortem linked to trying to move off Oracle to Aurora.

Interestingly it was posted here to HN and repeatedly and immediately flagged off the page - it appears that some HN readers are very keen on suppressing anything that paints Oracle in a good light. It makes you wonder what other stories might be going missing that would alter people's impressions of this company.


This article was essentially Oracle PR.

Andy Pavlo, database professor at CMU, has debunked it and given details on the journalist's dishonesty.

Also, anything is better than Oracle. If not for technical reasons, then price-wise it makes a lot of sense.


That sounds completely delusional I'm afraid. The article(s) were summaries of an internal post-mortem written by one of Oracle's competitors - the exact opposite of Oracle PR. And how exactly is some random professor going to debunk the conclusions of Amazon's own staff about their own infrastructure?

Your last sentence appears to be typical of the problem I'm describing.


First: Andy Pavlo was contacted by the author of the article. He's anything but random; he's one of the references in the field. The outage was quite minor, not user-visible, and cost something around 100k.

I've seen such outages after an Oracle version upgrade, so it's far from major.

For your information, I've been managing critical databases for more than a decade, so while it might look unusual to you, it's quite clear to anyone even barely knowledgeable about database management that the article really overblew the issue. It was also mocked by Amazon's CTO.

So please, manage databases for a few years. Then I might hear your opinion and maybe consider it.


My hot take is that K/V stores solved a problem they didn't know they had.

To wit, de-normalised tables for the sake of "efficiency".

It's pretty difficult to under normalise a two column table -- which is what K/V is.


That is a hallmark of a disruptive technology: not being as good at what the incumbent values, but better in some other dimension that the incumbent doesn't value.


The other interesting thing is that the latest database trends reflect the general trend in the industry away from monolithic solutions. Most of the new databases are not one-size-fits-all, but are optimized for narrower use cases.


> The non-traditional databases got better and better while Oracle kept doing a lot of the same stuff. Oracle just didn't take the competition seriously, and that's fair enough because the competition didn't deserve to be taken seriously at first -- but now it does, and it's becoming a threat to Oracle's business.

Yes, like traditional car companies not taking EVs seriously, or Intel not taking mobile/ARM seriously, etc. It seems the older and more established a company is, the thicker their forward-thinking blinders are. "We're king, no one can EVER touch us."


Part of this is because successful companies carry the survivorship bias of yesterday's success.


What do you mean, traditional car companies not taking EVs seriously? Volkswagen is switching all their cars to electric.


They could (should?) have been doing this a decade ago, but they didn’t. It took the relative success of Tesla to convince them.


Can you name some popular non-traditional databases that are competitive with a traditional relational database?


Shameless plug, but I think the new math that underlies global consistency is super interesting. I did a webcast with Professor Daniel Abadi about how Google Spanner differs from FaunaDB (my employer; FaunaDB doesn't require atomic clocks). You can get a link to the recording here: https://www2.fauna.com/wcspannercalvin


The only thing that this "article written about a tweet" made me think was why has it taken Amazon so long?

Oracle sued my company a few years back for license compliance issues - I vowed never to run their stuff again and rip it out wherever I find it.


> why has it taken Amazon so long?

I imagine that at the scale of Amazon, replacing some of the core data stores with all their existing data is quite a complex task. Just think about the amount of data you need to migrate and keep in sync during the migration, and all the third-party tools using such data. So while it takes long, I don't think Amazon is particularly slow doing it.


Amusingly, AWS started as an effort to restructure their own internal infrastructure, and when they saw how useful it was they decided it could be sold to other people as well. Replacing Oracle is almost the exact opposite direction.


What’s wrong with an “article written about a tweet”, when the tweet is from someone — the CEO of AWS — with the highest authority on the matter?


Maybe if the AWS CEO had denied Bloomberg's backdoor story in a tweet, instead of the AWS chief information security officer in a blog, Bloomberg would have retracted it by now?



I bet Amazon uses Oracle back-office software like PeopleSoft, some Oracle ERP and/or BI tools (Hyperion). Oracle has a stronghold on these types of systems, and they only play nice with other Oracle products.


Similar experience with IBM and RedHat. (And go figure - they merged!)


I have no clue why Larry Ellison thinks it's a good idea to take potshots at one of his own large customers, instead of talking about the benefits of Oracle's products. Very weird, and kind of childish.


Because Oracle doesn't give a crap. They know they have a captive market and are the definition of a rent-seeking company. They don't sell their products based on their merits; they do it because companies either have no choice or are already held hostage by them.


Because he is Larry Ellison? It is the same guy who thought the first version of Oracle DB should be v2.

From Wikipedia: there was no v1 of Oracle Database, as Larry Ellison "knew no one would want to buy version 1."

https://en.m.wikipedia.org/wiki/Oracle_Database#Releases_and...


Because people can have opinions and don't have to shill out to the highest bidder. Amazon is in a position where they don't have to rely on Oracle, and that's a good thing for them. It's also a good thing for everyone else.


This is not surprising at all. For the past couple of years, Oracle's behavior has been egregious, to say the least.

You can pick up any of their products and find that MySQL, MS SQL, etc. are supported, but only at a 2001/03 release, because of "incompatibility issues". If that is not enough, Oracle's support is clueless about what these "incompatibility issues" are. And given the precarious security environment we are in, everyone needs a DB that has been patched sufficiently and allows newer features like TLS 1.3, so you are left with no choice but to go for the only fully supported DB: Oracle DB. I have sat in many meetings where the CIO/CTO has seen this as an arm-twisting tactic by Oracle.

I am no market expert, but if Oracle keeps going down this path they will cease to exist in the next 10-15 years.


They just keep buying niche enterprise applications that have entrenched non-technical users, then raise prices, drop interoperability with non-Oracle products, and start milking.


I'm not heavily involved with databases from a development point of view. Can someone explain why Oracle is as successful as it is? What sets them apart from what seems like a plethora of other DB systems?


Putting aside business practices, Oracle Database has features that don't show up in open-source systems, often for years.

The one I miss most is resource constraints on queries. It's nice to be able to guarantee that some queries will get more resources than others.

On the other hand, Oracle's SQL dialect is (or was, I stopped using it about 5 years ago) full of frustrating backwardness. No boolean type, so you get a mix of CHAR(1)s or INTs, depending on the prevailing DBA's opinion. And there's no serial or autoincrementing type, so for every table you wind up copying and pasting the same code over and over (create index, create sequence, insert trigger, update trigger).
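For anyone who hasn't had the pleasure, the per-table boilerplate looked roughly like this. A sketch run through cx_Oracle from Python; the connection details, table, sequence and trigger names are made up, and Oracle 12c later added real IDENTITY columns, so this is the pre-12c pattern:

```python
import cx_Oracle

# placeholder credentials/DSN -- adjust for a real instance
conn = cx_Oracle.connect("scott", "tiger", "localhost/XEPDB1")
cur = conn.cursor()

# No BOOLEAN type, hence the CHAR(1) flag column.
cur.execute("""
    CREATE TABLE orders (
        id         NUMBER PRIMARY KEY,
        is_shipped CHAR(1) DEFAULT 'N'
    )""")

# The classic pre-12c "autoincrement" dance: a sequence plus a
# BEFORE INSERT trigger, repeated for every single table.
cur.execute("CREATE SEQUENCE orders_seq")

cur.execute("""
    CREATE OR REPLACE TRIGGER orders_bi
    BEFORE INSERT ON orders
    FOR EACH ROW
    WHEN (NEW.id IS NULL)
    BEGIN
        SELECT orders_seq.NEXTVAL INTO :NEW.id FROM dual;
    END;""")
```

Compare that to a single `id SERIAL PRIMARY KEY` (or `BOOLEAN` column) in Postgres and you can see where the frustration comes from.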

The most head-scratching part is that you can find apologists for these flaws.


They were also very good with geographical data (Oracle Spatial). Now PostGIS is as good if not better.

We have a lot of applications running on Oracle databases at my company, and therefore a lot of Oracle DBAs.

Now they prefer MariaDB or PostgreSQL for most new projects because Oracle has become too aggressive with their business practices, pricing and audits.

But now the DBAs must learn PostgreSQL, and that requires some non-trivial effort on their part to become as proficient as they were with Oracle.


My personal favorite: the inability to store an empty string. Oracle converts it to NULL, leading to null checks in code even though the value is known to be non-null when inserted.
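A quick way to see it (a cx_Oracle sketch; the credentials, table and column names are made up): the empty string you insert comes back as NULL, and an equality test against '' never matches anything.

```python
import cx_Oracle

conn = cx_Oracle.connect("scott", "tiger", "localhost/XEPDB1")  # placeholder
cur = conn.cursor()

cur.execute("CREATE TABLE notes (id NUMBER, body VARCHAR2(100))")
cur.execute("INSERT INTO notes VALUES (1, '')")    # '' is silently stored as NULL

cur.execute("SELECT body FROM notes WHERE id = 1")
print(cur.fetchone())                               # (None,)

cur.execute("SELECT COUNT(*) FROM notes WHERE body = ''")
print(cur.fetchone())                               # (0,)  -- '' compares as NULL

cur.execute("SELECT COUNT(*) FROM notes WHERE body IS NULL")
print(cur.fetchone())                               # (1,)  -- hence the IS NULL checks
```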



Ah, at last, after only 40 years. So there should be a boolean type by 2058 or so.


They are viewed as a 'safe' choice for enterprise customers in that they've been around forever, run on pretty much everything, and have product offerings that can tick off pretty much every vendor selection checkbox. They're essentially the IBM of databases (if you're not already tied to DB/2 as a 'big blue' shop or SQL Server as a Microsoft shop)


They have been around virtually forever, they had a very aggressive sales organization, and their database does generally perform better than the competition for most standard SQL workloads.


Best explanation I've heard is Oracle makes simple problems complex, but hard problems possible.


They were first to get big, and became a standard. Hard to kick out an entrenched standard.


That brings back an old memory of the data warehouse being extremely overburdened during peak. Capacity constraints and over utilization kept bringing down clusters. As a last ditch effort, the data warehouse team started randomly disabling queries en masse, under the assumption that if they were actually mission critical, the user would just re-enable them again.

I knew some engineers that worked on a centralized data engineering team that served 100+ software teams, and they managed several thousand scheduled queries. I felt so bad for the guy that had pager duty that first night. He said he got a sev-2 every 12 minutes on average for 24 hours straight.

I certainly hope Amazon has fixed their data warehouse issues since then.


I've only been here for a few months (BI engineer), but overall uptime for all data platforms has been good and things run smoothly pretty much 24/7. Extremely impressive to me, especially considering the breadth and depth of some of these datasets. I do have my gripes about how things are laid out here (layers upon layers of abstraction + codewords for everything + some big tooling changes not being documented or still tribal knowledge), but all in all it's a well-oiled machine.


This is as much or more about moving away from a specific product stack - Oracle E-Business Suite (EBS), which is an aging and expensive ERP titan. The news and people who aren't dealing with Oracle don't necessarily see that clearly.

I'd bet that the Oracle data warehouse being referred to here isn't just the Oracle RDBMS technology itself; they are talking about a specific COTS Oracle data warehouse model that you can buy prebuilt and that works as a destination for all the other Oracle ERP etc. subsystems.

I say this knowing it firsthand, coming from a company trapped in that particular Oracle stack. It's expensive, limiting and very locked-in, has a huge footprint, and requires specialists to run (as do many ERPs).

I actually see a mad scramble by the big tech vendors (Microsoft, SAP and others) trying to push their ERP solutions, as cloud enablement has opened up a window of opportunity to shift and escape some of the pains of the old ways (while creating an entirely new set of problems).

I'd be more interested if Amazon sees this same opening and is trying to enter the ERP game itself by building something in-house that they turn around and offer as a service via AWS. They certainly have the scale to pull off such a thing, and they already own all the enabling technologies to build an ERP system, data warehouse and the rest of the stack.

That's my pet theory at least!


Lest we think this is a nail in a coffin, Oracle’s stock is barely down after market close (7:50p ET), off a price close to a 5-year high.

https://www.nasdaq.com/symbol/orcl/after-hours-chart

The “lawnmower” keeps on mowing: https://news.ycombinator.com/item?id=18323166


Oracle's bread and butter is back-office systems like ERP/Accounting/HR. They own NetSuite, PeopleSoft, and other similar products.


Wanted to share some insights on the data warehouse industry here, as a co-founder of intermix.io.

According to db-engines.com, Oracle has declined 9% since July 2016. Amazon Redshift and Google BigQuery have grown over 50% in the same period, while Snowflake and PrestoDB (although each is 10x smaller than Redshift and BigQuery) have grown over 60%.

This is happening because enterprises are shifting to the cloud. When they go to the cloud they are shifting -away- from on-prem warehouses like Oracle, Teradata, and Vertica. Enterprises choose Amazon or Google depending on which cloud platform they adopt.

Companies launched after 2010 were born in the cloud (and thus never used Oracle since Oracle does not have a cloud) and are more likely to choose Snowflake due to ease-of-use (even as Snowflake is more expensive than Redshift).

What does this mean?

Consider that we are still in the early phases of cloud adoption. 32% of enterprises are in the cloud, rising to 52% by 2022. At the same time, over half of enterprises said they will use more than one cloud (hybrid cloud) within the next 10 years.

Amazon Redshift is the warehouse of choice for enterprises on AWS. Snowflake is eating up the mid-market and SMB segments. PrestoDB solves awesome problems for interactive and exploratory analysis for all. BigQuery is used by GCP customers.

These trends indicate that Oracle will struggle to grow revenues and margins, as they are relegated to serve the (still large but shrinking) portion of the market that chooses on-prem, and pursue aggressive rent-seeking of an aging install base (the Java mess is an unrelated but telling example of this strategy).

To reverse this trend, Oracle must find a way to serve cloud customers. That probably means acquisitions.


"Nobody ever got fired for buying Oracle."


And many 2018 CIOs regret that their 1998 predecessors said the same thing about IBM.


...until the CFO saw the bill.


The story goes that Amazon was one of Oracle’s biggest customers, and they have been for a long time.

When Oracle entered the cloud they started making fun of Amazon, “If their cloud and database offerings are so great, how come they use us to do the actual heavy lifting? That’s because our stuff is rock solid and theirs isn’t”

This pissed off Bezos, the sky broke and a voice foretold, NO MORE ORACLE!


That's been an expensive lesson to Larry Ellison not to be so arrogant.


If Larry had been nice, you think Amazon would have merrily kept paying Oracle license fees over moving to their own software?


Do you really think Larry has never had a similar thing happening, in the 40 years he’s been in business?


This isn’t about NoSQL vs SQL; this is about how Oracle is led by an out-of-touch megalomaniac selling a subpar product whose equivalent you can literally get for free. I hope this gives other companies the kick they need to dump Oracle and MSSQL too.


Just in time for Black Friday.... yay! What could go wrong?


So does this finally signal that Oracle is becoming way less powerful and relevant than they were before?

How many folks still actually use Oracle? And how many are trying to get rid of them?


Many Oracle installs used to rely on the apps that ran on them, ERPs and the like. Oracle bought many of those companies (e.g. PeopleSoft).

With cloud availability and many of the features of high end DBs commoditized, and the new strain of apps being cloud / SaaS, Oracle's available market has dried up.


Oracle has been threatened for a while, hence their desperate run to the cloud. Like IBM, though, they have enough fat to survive a very, very long winter; and, as cornered animals go, they are among the most vicious.

Personally, I don’t expect to see Oracle die in my lifetime. They will keep cannibalizing other companies to stay alive, probably forever.


I think Amazon is just in a unique position to dog food their own technologies.


I don't think Oracle is getting too many new customers. Their product is simply too expensive.



