Thoughts on Uber’s List of Postgres Limitations (2ndquadrant.com)
245 points by areski on Aug 3, 2016 | 103 comments



I think it's worth mentioning again that what Uber ended up using bears no resemblance to an RDBMS (single table, manual indexes).

So regardless of whether their complaints are justified, this should not be taken as an endorsement of MySQL over Postgres, but rather as an endorsement of NoSQL over RDBMSs.

Which is really just what every company at this scale does (except for Google and F5, if whitepapers are to be believed).


Uber runs a lot of databases in production. We stopped using Postgres a while ago for new applications that talk SQL, and this article explains some of the reasoning.

We still have many applications that talk directly to MySQL, and we still have our original API monolith that talks directly to Postgres.

All new applications are being built using distributed databases like our in-house Schemaless system, which happens to be backed by MySQL; we also have Riak and Cassandra in production.


I would love to see a similarly detailed write-up on your transition from Riak to Cassandra, especially considering how deep into the details and guts of operating Riak at the redline you must have been from your time at Voxer.


Will Schemaless ever be released open source? It sounds great...


Does Uber still use Python for backend services?


> Which is really just what every company at this scale does

No, it isn't. You need to understand that databases are being used in a far wider range of ways than you think.

Batch/streaming analytics are typically done with HDFS, Cassandra, HBase, MongoDB, etc. Event aggregation is often done with a time-series database like Druid or InfluxDB. Web API serving layers are still the domain of lightweight SQL databases like MySQL and PostgreSQL. Customer data warehouses are still dominated by Teradata, Oracle, SAP, etc.

What you misconstrue as a single platform is often a multi-headed beast with various architectural components, each comprising its own technology stack.


While I mostly agree with your comment, some NoSQL databases are not suitable for analytics. Indeed, they are terrible at them.

Unstructured information is, surprise, unstructured, which means it is harder to query and analyze, since the data needs to be fully scanned and parsed to perform the computation.

I have seen many cases where MongoDB is used for this and suffers from really bad performance.


I think the trap is simple POCs.

Do a simple POC where you ram 1 million rows into MongoDB and then build a webapp to do some basic analytics. Look, it works and gets responses within a second. Cool!

Then real data comes in at 12 billion rows and your analytics take 3 hours to run.

So you try the sharding thing, and realize that it works for a while, except that now every analytics query needs to hit every shard...


Absolutely true, and as the scale keeps growing beyond 12 billion rows, even harder challenges come up.

I worked on a 1.3 trillion row database (for one table, with the others varying below that) this year for predictive analytics, and the hoops we had to jump through to get that thing to run anything in a manageable time frame were mind-boggling.

Any POC-level demo of a database would be close to meaningless at that scale. You have to run a real test to see the challenges around it.


Which database did you use and what would you recommend?


We used SAP's HANA for a portion of the data, along with an option on it called Dynamic Tiering, which essentially connects Sybase IQ to HANA as a disk-based data store.

Keeping 1.3 trillion rows in memory is pretty expensive, so we were trying to cut costs by using that to funnel data in and out.

Not a bad solution overall.


But do you need to access all 1.3 trillion rows to do predictive analytics? Why not just use a very small sample of the data?


Yes, you need access to all 1.3 trillion rows (plus more) for the use case I worked on.

Ideally, you'd actually use even more than that; however, diminishing returns left us with that as what we went with.


For scalable SQL products, I'd consider Google's Spanner to be in that boat. There is actually a class of products trying to do this under the name of NewSQL [0]. CockroachDB seems to be mimicking Spanner's design principles to deliver a similar product [1].

[0] https://en.wikipedia.org/wiki/NewSQL

[1] https://github.com/cockroachdb/cockroach#design


Can you expand on what you mean by a single table with manual indexes? How does a frequently used table with indexes not resemble an RDBMS?


The secondary indexes are saved in a different table, and (I can only assume) the application layer (or a layer above MySQL) is responsible for keeping that table up to date; a rough sketch is below.

If you use only 1 table, then it's by no means "relational", so I don't see why you'd need a database system designed from start to finish to support a relational model.
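
As a rough illustration (purely hypothetical names and layout; the real Schemaless design differs in its details), a single-table store with a manually maintained secondary index might look like this:

    -- Hypothetical sketch: the application keeps both tables in sync.

    -- The main store: one row per entity, body is an opaque blob.
    CREATE TABLE entities (
        entity_id  BINARY(16)   NOT NULL,
        body       MEDIUMBLOB   NOT NULL,  -- serialized JSON/protobuf
        PRIMARY KEY (entity_id)
    ) ENGINE=InnoDB;

    -- The "manual" secondary index: written by the application on
    -- every entity write, not maintained by the storage engine.
    CREATE TABLE idx_entities_by_email (
        email      VARCHAR(255) NOT NULL,
        entity_id  BINARY(16)   NOT NULL,
        PRIMARY KEY (email, entity_id)
    ) ENGINE=InnoDB;

    -- A lookup is then two application-issued queries:
    --   SELECT entity_id FROM idx_entities_by_email WHERE email = ?;
    --   SELECT body FROM entities WHERE entity_id = ?;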


> If you use only 1 table, then it's by no means "relational" [...]

FYI: the term "relational" in "relational database" does not refer to relations between tables but to the mathematical concept of a relation, which is a set of tuples, i.e. a single table.


Well, the relational model does not really contain a notion of a table, so I guess you could model it any way you want, even within a single table.

But I think that's a mute point; I doubt they kept any part of the relational model inside their schemaless database.


Probably just an auto-correct mistake, but you mean "that's a moot point".


Autocorrect, don't take it for granite.


The "relationalness" is/was not really important here. It's all about MVCC and how storage engines handle it.

Postgres is lacking in these scenarios, whereas a particular fine-tuned version (or fork) of InnoDB (or MyRocks or whatever they end up choosing) handles this better.

See Facebook's "mysql-5.6" branch, which has hundreds of patches piled on to support precisely these taxing workloads.


Well, for that matter, as I understood it, they also have no transactions or atomicity and are basically eventually consistent, though I'm simply inferring this from their posts; it was not stated as far as I can remember.

So MVCC also has almost zero bearing on what they are doing.


Just because you may not always use the database with BEGIN/COMMIT semantics absolutely does not mean that "MVCC has almost no bearing on what they are doing".

MVCC and how it has been implemented in the two storage engines in discussion here absolutely has a lot to do with it.


If it were just about NoSQL vs RDBMS, they could have used Postgres as a key-value store instead of MySQL. So yes, they absolutely did endorse MySQL over Postgres.

Also, what is the scale of Uber that you think is unsuitable for an RDBMS? Uber does a million rides per day, but these rides are very predictably local. I don't know enough to make any firm claims here, but at first sight this doesn't look like the sort of scale that is unachievable on an RDBMS in principle.


They endorsed MySQL over PostgreSQL for a type of structure and access very few people use. I don't see how the query language plays into the recommendation at all.


Well, Uber claims they are using MySQL because it's a better NoSQL database than Postgres.

From the context, I don't think they ever evaluated Postgres's NoSQL features, but I don't think they would have reached a different conclusion if they had. MySQL trades some consistency guarantees for speed, and it looks like Uber doesn't need those extra guarantees anyway.


"Also, what is the scale of Uber that you think is unsuitable for RDBMS"

Never said it's unsuitable, I have no idea. Just said the move from relational to NoSQL is what most billion dollar companies with mass-market apps or websites do.

Many of them also do sharding on normal RDBMS.


Potentially the most useful part of this post, to me, was this:

> 2ndQuadrant is working on highly efficient upgrades from earlier major releases, starting with 9.1 → 9.5/9.6.

I hadn't heard of that before. Anybody know more about this? I'm currently babysitting a 9.1 deployment which we desperately want to get upgraded. The amount of downtime it can tolerate is very limited, and I've been tasked with coming up with a plan. It's going to get hairy. If such a tool is really on its way, I could make a case for holding off on the upgrade for a few more months and save quite a bit of work.


That paragraph could be clearer, but Simon is referring to supporting logical replication from 9.1 to recent releases, which essentially means backporting logical decoding (added in 9.4) to 9.1. There's no way this could get into the official 9.1 branch, as that would be a major risk / behavior change.

Making "zero downtime" upgrades possible is part of the whole logical replication effort - both within the community and in 2ndQuadrant in particular.

Petr Jelinek actually described how to do that using UDR (uni-directional logical replication) in 2014:

https://wiki.postgresql.org/images/a/a8/Udr-pgconf.pdf

There might be a newer talk somewhere; I'm sure he has spoken about it at several events.


You can use pg_upgrade with -k; it will complete within seconds. Afterwards, things will be slow until a complete ANALYZE updates the statistics, but the upgrade itself can be done in seconds.

I have upgraded ~2TB of databases from 9.0 all the way to 9.5 over the years.
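
Roughly, a link-mode upgrade looks like this (the paths and versions here are placeholders, not my exact setup):

    # Hypothetical paths; adjust to your installation.
    # -k/--link hard-links data files instead of copying them,
    # so the upgrade itself finishes in seconds.
    pg_upgrade \
        -b /usr/lib/postgresql/9.1/bin  -B /usr/lib/postgresql/9.5/bin \
        -d /var/lib/postgresql/9.1/main -D /var/lib/postgresql/9.5/main \
        -k

    # Statistics are not carried over, so query plans will be poor
    # until the whole cluster has been re-analyzed.
    vacuumdb --all --analyze-only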


The problem with this is that if anything fails, you can potentially corrupt your data and have no backup plan. To make that option safe, you would have to copy your data directory first, and you need to be offline for that. So you have to add the time it takes to make that copy.


This is why I make sure the slaves are up to date, then disconnect them, pg_upgrade the master, and resync the slaves (which is required anyway). If something goes wrong, I fail over to a slave.

Also: You don't need to be offline to copy the data directory. Check `pg_start_backup` or `pg_basebackup` (which calls the former)


That requires the master and slaves to run different versions for a while. And that is not possible with stock postgresql, is it?

Regarding your second point, I meant copying the data directory as in a 'cp' command. Or rsync if you will. The functions you mentioned are only useful when doing a dump, aren't they? And recovering from an upgrade problem using a dump is way slower than just starting the previous version on the backup data directory.


> That requires the master and slaves to run different versions for a while. And that is not possible with stock postgresql, is it?

Correct, that's not possible. But if I announce the downtime, bring master and slave down, migrate the slave and run our test suite, migrate the master, run the test suite again, and bring the site back up, then I know whether the migration worked.

If the migration on the slave fails, well, then I can figure out where the problem lies and just bring the master back up.

If the migration on the master fails but works on the slave, then I can bring the slave up as the new master.

No matter what, there's always one working copy, and the downtime is limited to two `pg_upgrade -k` runs (which is measured in minutes).

> Regarding your second point, I meant copying the data directory as in a 'cp' command. Or rsync if you will.

Yes. You execute `select pg_start_backup()` to tell the server that you're about to run cp or rsync, so that it keeps the data files in a consistent state. Once you have finished the cp/rsync, you execute `select pg_stop_backup()` to put the server back in its original mode.

This works while the server is running.

If you don't want the hassle of executing these commands, you can also invoke the command-line tool `pg_basebackup` which does all of this for you.
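
Put together, the two variants look roughly like this (the label, paths, and user are placeholders):

    # Variant 1: low-level base backup around a plain cp/rsync.
    psql -c "SELECT pg_start_backup('pre_upgrade_copy');"
    rsync -a /var/lib/postgresql/9.1/main/ /backup/pg91-main/
    psql -c "SELECT pg_stop_backup();"

    # Variant 2: let pg_basebackup drive the same protocol for you
    # (requires a user with replication privileges).
    pg_basebackup -D /backup/pg91-main -U replicator -P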


Any chance you have a blog or website where you could write up/post an example of this entire process? It sounds like the details that you've posted above would be of extreme assistance to many others.


Yes, that does allow you to copy the database directory with the "cp" command. The command tells postgres to stop deleting obsolete WAL files until further notice. As long as you start your copy after you issue the command, and copy across at least all of the files that are present (as in, you can ignore new files that are created), then the data is safe. Just don't forget to tell postgres that the backup has finished afterwards.


pg_start_backup/pg_basebackup are used when doing an rsync-style copy. You'll end up with a copy of the data directory, rather than a dump. You can then start up a server instance directly in the resulting backup directory.


How do I learn all this stuff, as a person that has no reason to touch Postgres other than personal interest? I never get to encounter these types of problems in my day to day.


You don't need to be offline to make a copy of the data directory. You can do that ahead of time, keeping all the WAL segments up until the point that you make the switch.

(See also https://www.postgresql.org/docs/9.5/static/continuous-archiv...)


Solution: Have a backup plan.

;)


The 9.1 to 9.5/9.6 upgrade should be available for customers by October.


I do think you need to learn how to make the best use of the technologies you have chosen, or that are present in your current setup. Whether it is MySQL, Postgres, or any other DB, learning to use it well is part of the job. The points in the article are good; however, it's true that Postgres had problems with scalability not so long ago. That's changing, but I think other data stores addressed the problems of availability and scalability earlier and have gained maturity over the last few years.

Also, there is something in the article that bothered me:

    This point is correct; PostgreSQL indexes currently use a
    direct pointer between the index entry and the heap tuple 
    version. InnoDB secondary indexes are “indirect indexes” 
    in that they do not refer to the heap tuple version 
    directly, they contain the value of the Primary Key (PK) 
    of the tuple.
That's true, but the article doesn't make explicit that the PK in InnoDB is a clustered index, and that there are other optimizations, like adaptive hashing, to make read queries faster.


Agreed 100%, and the author also failed to mention several other advantages of having a clustered PK and indirect secondary indexes. A few off the top of my head: reads in PK order are faster due to the lack of indirection; the clustered index takes up less space since it doesn't store pointers to tuples; and secondary indexes are covering (no need for a PK lookup at all) if the query only uses columns in the PK and the secondary index.
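
To illustrate that last point with a hypothetical schema (InnoDB implicitly appends the PK columns to every secondary index):

    -- Hypothetical InnoDB table: the PK (id) is the clustered index.
    CREATE TABLE users (
        id     BIGINT       NOT NULL,
        email  VARCHAR(255) NOT NULL,
        name   VARCHAR(255) NOT NULL,
        PRIMARY KEY (id),
        KEY idx_email (email)   -- InnoDB physically stores (email, id)
    ) ENGINE=InnoDB;

    -- This query touches only columns present in the secondary index
    -- (email, plus the implicitly stored PK), so InnoDB answers it
    -- from idx_email alone; no clustered-index lookup is needed.
    SELECT id FROM users WHERE email = 'a@example.com';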

It is interesting/ironic to see the article complain "those limitations were actually true in the distant past of 5-10 years ago, so that leaves us with the impression of comparing MySQL as it is now with PostgreSQL as it was a decade ago." In the MySQL world, we very very frequently see the opposite -- Postgres fans bashing MySQL for things that haven't been true in 10-15 years, as well as things that simply have never been true. It certainly is frustrating, just like what the author is experiencing!

Having a favorite/preferred database is fine, but I don't understand all the extreme views -- why do so few of these articles take the view that Postgres is a better fit for some workloads, and MySQL/InnoDB is a better fit for some other workloads?

Or even just an acknowledgement that the authors of these articles rarely, if ever, have a comparable amount of expertise in both databases -- which would be necessary to make a fair comparison. Yes, Uber's original article clearly shares this same problem, but at least they seem to acknowledge it more clearly than the author of this response article. Take the section on replication comparison, for example: the author is describing logical replication support in Postgres even though it's currently a third-party addon. Cool, but MySQL has all sorts of third-party replication systems too. Alibaba has implemented physical replication in MySQL. And meanwhile even in MySQL core, there are two different types of logical replication -- there's no restriction to only use statement-based logical replication as this article implies.


Might be worth someone writing "a panicky guide to installing varnish" for such database issues. Pretty embarrassing though!


Does anyone know the back story to Uber - why didn't it try to improve Postgres rather than move on to feed on another host?


In one of the other HN threads on the issue, there was mention that the switch occurred shortly after a change in leadership positions, with individuals gaining enough 'power' to impart technology changes by decree. So there is a sense that as much political opinion as technical analysis was involved in the change.

Given that the change is an "apples to oranges" one (PostgreSQL with a normalized SQL table structure to MySQL with a NoSQL-style single-table key-value store), there is some credence one can put in the rumor that some (or most) of the change may have come about from a PHB [1] saying "do this, this way".

[1] PHB (Pointy Haired Boss, Dilbert cartoon reference)


Most likely they don't have any engineers with the skills necessary to build/improve an RDBMS.

(Note I said build, not use; there are different skill sets between driving a car and reconfiguring the engine to run on seed oil.)


> why didn't it try to improve Postgres

Honestly, why would they? If there is a product that does what they need, why spend resources improving another?

Just because it's open source and they could spend money improving it, doesn't mean a company will spend that money improving it.


> If there is a product that does what they need, why spend resources improving another?

I would hardly say MySQL does what they need without any improvements. They built an entire second platform on top of it.

https://eng.uber.com/schemaless-part-one/


MySQL does need improvements. This [0] is an example.

[0] https://github.com/facebook/mysql-5.6


MySQL does need so many improvements:

https://dom.as/2016/08/03/mysql-does-not-need-sql/


Isn't MySQL owned by Oracle, and closed-source as a result these days?


Considering that some of the things they wanted to improve were presented as problems almost a decade ago (along with technically sound solutions to those problems), I actually think they did the best thing they could for the Postgres community.


Because they couldn't find what they were looking for in the Postgres forums.


And we all know Uber cannot afford a couple of hours of professional PostgreSQL consulting.




Actually, I watched the video, and the errors that happened here are more cultural errors. I guess the switch is more like a "we failed, so we start anew", and new means something different.

Also, their replication strategy looked like a joke, with not enough automation. Hopefully they never use Galera, or their bad engineering practices could actually result in huge data losses.

I once ran a Galera cluster, and it came to data loss pretty easily, especially after short network splits, which occurred randomly on the network.



I came here to write a snarky comment, but now I can write two ;)

First: if you think any particular DB platform is clearly a winner in the "DB wars", you are naive. There are so many factors involved in configuring the DB, the backend, the frontend, etc. that you can always find a case where the supposedly winning DB fails, or the supposedly worse DB performs perfectly fine. And from my experience, you should always use the platform/framework/language that is best for the current project, not the one you madly love. Clearly Postgres wasn't working for Uber; that does not mean it will not work for your project. I have a recent experience where a binary file of a serialized object works much, much faster than MySQL and solves several other problems. Would I say "use binary files instead of an RDBMS"? Of course not. But in this one case it does wonders. The "tech-vs-tech" wars need to end; they are pointless.

Second: if you cannot set up your blog to withstand an HN spike, then maybe you don't have as much real-world experience with scalability (albeit simple scalability) as you might think. Hint: a static page cache behind a CDN will make you almost bulletproof, and with Azure, for example, that's dead cheap.


That second point is a really unnecessarily belittling straw man, and I think such comments are counterproductive to the discussion.


I don't know, if you're going to critique the very talented engineers at Uber, seeing your blog fall over due to capacity doesn't lend you a lot of credibility.


First, it's unlikely that they wrote the blog software. Second, a critique should be based on the merits of the content, not the person delivering it.


On what information do you base your opinion that the engineers at Uber are very talented?

From what I have seen so far, the only thing Uber is talented at is violating local laws and then throwing sacks of money at the problem to pay fines or whatever (and inflating their own bubble valuation, but probably not many people agree with that).

Seeing their blogged MySQL -> Postgres migration followed by a Postgres -> MySQL migration doesn't give me the idea that they are very talented (they still might be, but so far no data has shown me this). Talented would be to foresee these issues and avoid encountering them at all. At least that would be my definition of very talented.


I'm not sure what anything you said has to do with their choice of database.


Then by your own reasoning, I'm also not sure that was worth a comment of yours.

I clearly indicated that their recently changing DB software twice to avoid issues (that experts could solve) is, in my eyes, an indication that one is not very talented.


Actually no, you didn't clearly say that at all. How do the experts solve the fact that Postgres writes to every index on every row update?


"Seeing their blogs mysql-> postgres followed by a postgres -> mysql migration" this is not clear to you? Perhaps a visit to an optician is in your best interest.

And why do you assume I'm a PostgreSQL expert who can answer that question?

Others did, though; if you had bothered to read the thousands of other comments on the matter, you would not have needed to ask this question:

PostgreSQL Heap-Only Tuples (HOT), from: http://use-the-index-luke.com/blog/2016-07-29/on-ubers-choic...


I disagree. When you put yourself on stage to criticize another, you open yourself up to criticism. And in this case, his little blog didn't scale, and it was rightfully pointed out.

It's not terribly nice, but the comment is true, and relevant.


Criticism is good. Straw men are not ("your opinions and arguments on dbs are irrelevant because your blog fell over"), and being holier-than-thou about it only makes people stop listening to your actual criticism. That's why I think it's unproductive.


If you're stupid enough to think that blog software reflects on the contents of the blog, then you're likely not going have the critical thinking capacity to evaluate any of these criticisms anyway.


An hour or two's worth of work with Varnish and you pretty much solve that problem. In 2016, that's the sort of thing I'd expect anyone competent rolling their own platform to be doing. If you're not using some off-the-shelf blogging platform (like Blogger, wordpress.com, Medium, Tumblr, or any of the infinity of others that are free and not too bad) then that means you are trying to prove a point in hosting your own thing. If you can't do that very well, that perhaps proves a different point than you intended.

I run Varnish even on silly joke websites I set up with no traffic and literal kilobits' worth of only static content; it's just so easy to do and it's a good habit to have.


>then that means you are trying to prove a point in hosting your own thing

No it doesn't. Stop projecting.


And these moralizing comments make HN dull and dreary. Compared to 7-10 years ago, sometimes I feel that we're suddenly in some sort of new Victorian era.


> a static page cache behind a CDN

I wouldn't want my page behind a CDN. CDNs make users much more trackable across sites.

My point isn't that CDNs are bad for everyone. My point is, once more, that most questions are not as simple as they may appear.


> CDNs make users much more trackable across sites.

How does a CDN do this in a way that a "regular" web deployment wouldn't?


The same CDN is serving thousands of sites, so they can track a ton of what users do. CDNs are perfectly placed to capture and sell user stats.


A third party (the CDN provider) can easily track visits entirely server-side across all of the sites it serves. Typically these are sites owned by lots of different companies, which users could otherwise visit without any of those companies knowing of their visits to any other company's sites; the CDN, however, knows.

This also comes up when the CDN handles SSL termination.


The fact that you can find exceptions does not change the fact that in the common case, tech X is better than tech Y. Unless your use case is clearly one of the exceptions, you still win most of the time by choosing the tech that produces the best results most of the time, instead of the one that produces the best results only in the exceptional cases.

So tech X versus tech Y is still very relevant.


I agree that you should use the best tools for the current project. That is why we need to discuss which tool is the best and in what cases, which is all we've been doing here: providing the information people need to make reasonable comparisons.

There can be clear winners in such discussions, though this changes over the years. Many people are now concluding that Postgres is a winner at the present time and usage is expanding significantly. The people I meet aren't madly in love with Postgres, they make rational choices with the best information they have. Uber posted their information in the hope others would benefit. I think they have and I thank them for it.

(Whether you forgive me or not, I don't manage our blog site, but I guess there'll be some discussions. ;-) )


Re: your second point, this doesn't necessarily have anything to do with database scalability or tuning.


exactly my point. sometimes the solution has nothing to do with the perceived problem.


I sincerely dislike it when the mods change the positions of comments in a thread.


It's not about being the best, it's just about comparing offerings.


I came here to thank the Postgres developers for their hard work. Also their grace under such trying times.


Doesn't Postgres use memory-mapped files internally?


No. It uses mmap(MAP_ANONYMOUS) for most of its shared memory; that's pretty much its only use of mmap.


Would simply coughing up the money for an expensive Oracle / SQL Server solution have worked in this case?


To me, the real news is that Uber ($50B company) didn't bother to engage the postgres community before migrating - they'd have jumped to support Uber.


I don't get the sense that Uber has had much stability in technical leadership over its lifetime. Big moves like this can be as much cultural as technical.


What would be better to have on a resume: "we were using PostgreSQL and after some tuning it worked just fine" or "we designed and implemented a scalable, modern Big Data realtime OLTP solution"?


"Error establishing a database connection" ^ Running on pg too?


'Error establishing database connection'

How very meta.


Fortunately they can call themselves!

    WHEN IT'S CRITICAL, YOU CAN COUNT ON US
    2ndQuadrant provides full 24/7 problem resolution technical support for production systems.
    If PostgreSQL breaks, we'll get you back up quickly.
[1] https://2ndquadrant.com/en/support/support-postgresql/


Which is exactly what happened and we fixed it fairly quickly, within the SLA we offer to customers. So I'm happy.

I accept all of the humour on that point with a grin myself, though I must say it's a nice problem to have. Thanks to everybody for reading and commenting.

2ndQuadrant is a large enough company that we have CTOs who write blogs and design stuff, we have other people who run blog websites and a variety of infrastructure, but mainly we have many dev and support staff helping customers.


The site is powered by WordPress, which, to the best of my knowledge, can work only with MySQL. Now that is very meta.


I believe the blog site does run MySQL. We eat our own dogfood wherever possible, especially on important services... we hadn't regarded the blog site in the same category until now.


It can work with MariaDB as well, though that's just a fork of MySQL, so it's hard to say whether that counts.


HNHOD: Hacker News Hug of Death.


"Error establishing a database connection"

Genius !


It seems that after lots of hyped NoSQL systems, companies are still using MySQL or PostgreSQL as a simple storage backend with lots of custom crutches on top, like it's 2007. So did all these Cassandras and Riaks fail expectations?


This is one of the more ridiculous posts I've seen on HN.

"Companies" are taking a whole range of approaches to storing data. With a combination of NoSQL, SQL and Filesystem e.g. HDFS and everything else in between. Cassandra in particular is killing it right now under the stewardship of Datastax which is why they've grown from 1 person up to 400+ employees.


Companies, outside the startup scene, are still using Oracle and MS SQL Server, and for the most part are pretty happy with it.



