PostgreSQL: the good, the bad, and the ugly (lwn.net)
177 points by pykello on May 21, 2015 | 76 comments



> The experience of Firefox and Chrome is that long release cycles actually decrease quality, precisely because of the dynamic on display here: long release cycles create huge pressure to land under-done features just before a deadline. With date-driven short release cycles, if you're not confident in your feature you'll be happy to just slip it to the next release. It greatly reduces stress on developers in general.

This comment really resonates with me, and wasn't something I'd thought of -- the pressure to deliver when your release cycle is so infrequent is something I hadn't noticed before, but once it's mentioned, I realize it's absolutely true.


Unfortunately, it seems like short release cycles can also lead to buggier releases; after all, who cares, it'll all be fixed in eight weeks or whatever, right? It's not like people will be using the broken thing for a year.

Of course, the next release will then have a _different_ minor bug. Then the pattern repeats.


I don't think that long release cycles are the solution for this, though—major x.0 releases are well-known for often being buggy themselves, after all. I think that buggy releases speak more to problems with general project management and priorities than to the release schedule used.


They can, but in my experience they don't.

If you release every two weeks and something isn't fully signed off at the end you don't release it - it's only two weeks until the next one. If you release every three months you cram in everything you can because it'll be at least three months until the next release.

Of course it's only true if you treat releases as an opportunity, not a constraint. If you don't promise dates but keep the opportunities frequent you get more done.


This is not how it works out in practice, at all. In practice, the slow release cycle works out worse.


Missing a train isn't a big deal if it's every 6 or 8 weeks, so you can delay or stay feature flagged off until the next train leaves the station.


In my experience, this is not the case if you have short release cycles combined with hard commitments. I work at a company that does enterprise software that releases every 8 weeks, and there are still problems when we commit to shipping a feature for a particular release.


The operative word being "commitment". If you commit to the feature, you can't take advantage of the short window to make "letting it slip" a less painful option.


I agree.


> the problem would be in supporting many more versions

Things like backporting fixes and maintaining upgrade paths are, in my experience, extremely expensive in the long run -- something I think short release cycles would exacerbate.

I think it comes down to what's mentioned in response to the comment about short release cycles: people are reluctant to upgrade an RDBMS because of the risk involved and expense to test compatibility with a new release.

I dunno -- maybe the answer is finding the shortest release cycle possible without putting people off or adding too much overhead?


I can't remember where I read it, but someone said something similar about train scheduling (I'm paraphrasing here): when the train departs every 15 minutes, catching the train becomes a non-issue; when the train departs once an hour, it becomes an ordeal.


Short release cycles don't solve this problem. If anything, they exacerbate it -- you can just fix any bugs in the next release, right?

On top of which, they incur heavy user cognitive costs when tools change every few weeks. Short release cycles seem to be more about making developers happy by alleviating them of a weighty responsibility, rather than making users happy by shipping something that stands on its own as a solid, well-made product that doesn't need a stream of constant updates.

I want software to behave more like reliable products that have survived the decades in the real world: they work as advertised, and they keep working. This is something PostgreSQL has achieved for many years.


Users want magical flying unicorn ponies, but short release cycles still work out better than long ones, because you can delay questionable features just a few weeks instead of a year.


That might make sense for the next omgchatpopapp, and it might make the marketing department happy to have a constant stream of feature noise, but that's not a universal truth, and I'd rather optimize my practices for practicing my craft with responsibility and care for the future.

Churn doesn't make sense for PostgreSQL; people who rely on software as a tool to accomplish their work generally do not benefit from incomplete, poorly considered "minimum viable" solutions and a constant stream of instability. Strategies like continuous delivery -- or focusing on an MVP -- optimize for quickly proving or disproving user traction at the minimum possible expenditure, so as to more quickly reach a stage where further growth can be funded via more investment.

In the process, these strategies incur a high percentage of false negatives -- including ideas that could work if given more consideration and care. These strategies are ultimately about exit events, not making well-crafted products that stand the test of time. Ironically, the latter is what often leads to much more substantial remuneration.


Your hypothesis looks compelling written down, but it's entirely wrong in practice. This is the process where Linux comes from too, and Postgres is on that order of system software produced by an open source process.

Your error is assuming that centralised control is even feasible. This assumption has been proven wrong repeatedly. You're assuming the bazaar development method is riskier, but the certainty offered by the cathedral method is that of project overruns and likely failure -- unless you fund it sufficiently for NASA-like certainty, which you're not doing.

(Note that Postgres has been done much more that way all along, and is famous for quality; the crappy alternative, MySQL, which is in practice a product showing the sort of qualities you describe, was a one-company project for most of its history.)

Go reread The Cathedral and the Bazaar http://www.catb.org/~esr/writings/cathedral-bazaar/ and note that it's been proven pretty much entirely correct in practice. (By the way, "agile" at its best is literally an attempt to port successful open-source development to commercial practice.)


I don't understand your classification of PostgreSQL as "the bazaar", and MySQL as "the cathedral"; if anything, MySQL exemplified the bazaar model of incorporating almost any half-baked idea in the rush to regularly ship features, while PostgreSQL has always taken the approach of shipping when it's done.

PostgreSQL's careful approach has always required "cathedral" centralization of technical management to ensure that things are done correctly, or not at all.

I also have to question the assertion that Linux provides an objective example of well-written software, when the code quality and issues present in Linux are most reminiscent of MySQL.

Relative to a so-called "cathedral" model such as FreeBSD's (or PostgreSQL's), Linux's software development model produces:

1) Code of considerably lower overall quality than that of FreeBSD in terms of bug count, maintainability, and simple consistency.

2) Poor (and often immediately replaced) architectural designs that must be supported indefinitely.

3) Additional cost levied against downstream consumers of the product; simply shipping a reliable kernel requires considerable effort on the part of downstream distributions.


When you're maintaining a database server, though, the pressures of keeping it up are much greater than for a simple web browser. It may take several weeks just to do all the regression testing on the development and staging servers, then several hours to dump the production server, run the manual migration steps, and bring the server back up. A six-to-eight-week train-type schedule for a database would be more than a bit painful for an admin, especially if point-release bugfixes are ignored in favor of just putting the patch in the next release.


It's one of the central platforms of open source software: "release early, release often."

Time-based releases work to get your software in order. See Linux and Libreoffice as well. https://en.wikipedia.org/wiki/Release_early,_release_often


I lurk (mostly) on the Postgres general mailing list (not the hacker list), and I recognize a lot of the names in that article. I've got to say that the helpfulness and civility of the folks there is amazing. I'm sorry to hear there are tensions around the release process. I'm glad the article expressed that in spite of those tensions, there is basically good will. That fits my impression of the community. I don't write much C, but I've got a few features I'd love to find time to contribute. I'd be honored to be a part of that project.


Nothing quite like seeing the sausage get made. But the open discussion, while perhaps unnerving, is certainly preferable (IMHO) to a closed-source development process happening behind closed doors. A bit of chaos is the price to pay for the open process.

Command-and-control does have some advantages. And maybe as Postgres grows, the advantages related to a more orderly process become more visible. (For the record, I think Postgres is a fantastic RDBMS + JSON store)


Placing JSON store on equal footing with RDBMS is a little bit loaded.


I think the PostgreSQL Development Group does a great job overall. If they indeed have a reputation of being hard to work with, all the better in my opinion if it means guarding the integrity of the Project.


> if it means guarding the integrity of the Project

That's a big "if." There are many projects that have been controlled by "hard to work with" developers, and it's not always a good thing. Even if the project doesn't suffer significantly from it, that doesn't mean it's helping the project either.

That may not be the case here, but just as a general statement, you can't chalk up "hard to work with" as equating to "focused on quality" and/or "good for the project."


It's open source with a very liberal license. If someone is too difficult to work with, the rest will just fork and move on with life. Doesn't the fact that people stick around and have respectful discussions mean that these guys are not too hard to work with, and that the value they provide outweighs the challenges of working with them?


My comment was more on the general "being hard to work with is a sign of skill" idea that pervades the industry.

> If someone is too difficult to work with, the rest will just fork and move on with life.

This is a gross simplification.


A prime example of handjamming inconsequential happenings into a good/bad/ugly blogpost format. If this is what counts as "the ugly", they're doing pretty dang good.


... for PostgreSQL developers, not users.


From the "bad" section:

> Robert Haas asked for suggestions toward the solution of some nasty data-corruption issues associated with the "multixact" feature [...] the relevant multixact changes were merged during the 9.3 development cycle. The 9.3 release happened in September 2013, but the fallout from that particular change is still being dealt with.

> there is concern within the PostgreSQL community that its well-earned reputation for low bug rates is at risk


But it is a look at the process that eventually impacts users. It explains why upsert has taken so long. A bit of chaos and a bit of politics.


I don't think upsert has taken so long because of the community and processes, but because the PostgreSQL team has high requirements for it: http://www.depesz.com/2012/06/10/why-is-upsert-so-complicate...

For example, upsert that is worth releasing should:

* Work with all unique indexes that may exist on a table, not only a primary key,

* Be correct even at highly concurrent writes, at any isolation level from Read Uncommitted to Serializable,

* Be fast even at highly concurrent writes, so no table locks, locks may not be held for more than an instant, and no re-running the entire transaction in a loop,

* Never throw an "another transaction modified data" exception at Read Committed isolation level, it should always correctly succeed as an insert or update even if another parallel transaction inserted or deleted the matching row.

I'm not very familiar with the other database engines, but my impression has been that their upsert or SQL MERGE does not provide all of these.
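For reference, the syntax that eventually shipped as PostgreSQL's upsert (`INSERT ... ON CONFLICT`, in 9.5) is designed around exactly these requirements. A minimal sketch, using a hypothetical table:

```sql
-- Hypothetical table with a unique constraint beyond the primary key.
CREATE TABLE accounts (
    id     serial PRIMARY KEY,
    email  text UNIQUE NOT NULL,
    visits integer NOT NULL DEFAULT 0
);

-- Insert-or-update in one statement. The conflict target names the
-- unique index to arbitrate on; EXCLUDED refers to the row proposed
-- for insertion. This succeeds as an insert or an update even under
-- concurrent writers, without table locks or retry loops.
INSERT INTO accounts (email, visits)
VALUES ('alice@example.com', 1)
ON CONFLICT (email)
DO UPDATE SET visits = accounts.visits + EXCLUDED.visits;
```

The conflict target can name any unique index on the table, not just the primary key, which is one of the requirements listed above.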


The article is obviously for PostgreSQL developers, BUT my gripe as a user (coming from MySQL): the hardest part about transitioning from MySQL to PostgreSQL isn't PostgreSQL itself but transitioning from phpMyAdmin to phpPgAdmin. phpPgAdmin seems like it's trying to make my life difficult.


Stop using web-based admin tools, and learn how to use the CLI interface.

I mean this in the best, nicest possible way; we've all been there. But learning how to use the CLI will pay dividends in the short and long run.


I'm skeptical that the CLI can do everything I'd want, but here goes. Is it as easy to use as phpMyAdmin when I want to:

- Change the sort order? (PMA: Click a column. CLI: Edit your query to add an ORDER BY clause?)

- Edit a value? (PMA: Double-click and edit. CLI: Check the primary key and write an UPDATE query? Hope not to make a typo when writing the key.)

- Look up the row corresponding to a foreign key? (PMA: click it. CLI: write a SELECT query?)

- View the extreme values of a result set? I.e. I run a query which returns 200 results, and I want to look at the first few and the last few. (PMA: home and end. CLI: scroll?)


All of those can be done in the CLI and quickly.

The CLI has a learning curve, but what many will say is that we are SO GLAD we learned not to use the mouse to work with our database, and especially our SQL code.

The reason I like using the CLI is that I can use Git (or any version control) to keep a clear history of my SQL commands and everything I did.
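As a sketch of what those PMA clicks look like in psql (the metacommands are standard; the table and column names here are made up for illustration):

```sql
-- Change the sort order: re-issue with ORDER BY (\e re-opens the
-- last query in $EDITOR, so "editing the query" is one keystroke).
SELECT * FROM users ORDER BY created_at DESC;

-- Edit a value: address the row by its key.
UPDATE users SET name = 'Alice' WHERE id = 42;

-- Look up the row behind a foreign key.
SELECT * FROM users WHERE id = 42;

-- Browse a large result set in a pager instead of scrolling.
\pset pager always
SELECT * FROM orders;
```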

TL;DR: A database vendor with a Windows Server product screwed with the logs and put the blame on me. I had all our history of CLI commands under version control. A vendor employee or manager had changed our log files externally. We won a big court case (before trial, of course). That wouldn't have happened if I'd used the GUI tools.

Long story short: the vendor at the library where I worked was 100% evil. I asked by phone about two things. I emailed the engineer after the phone call about the two questions. BOOM, I lost one month of work unless my backups worked. BOOM, I lost half a day of work for two librarians. The vendor told my boss and my school it would cost us $5,000, due to my mistake, for them to fix something (it took them 15 minutes of work on the phone, due to their error).

My boss and college have a good relationship with me, so we get on a conference call. The vendor tells a very convincing story, with logs of how I screwed up. They have no records of my phone call and 100% confidence that their employee would never have okayed both those things. I call them on it: I have the email with the answers that the engineer gave me. BOOM, their engineering VP jumps in and says, see, if you had done what he told you, nothing bad would have happened, and our logs show ... I then email my version-controlled CLI commands for the past six months, show the commands I ran that day, and point out that if the logs on their end are different, they have a serious problem internally.

Long story short, we won the court case and got hundreds of thousands of dollars back from that vendor.


Nice example of how learning the language has intrinsic advantages that quickly outweigh the inconvenience of having to learn it. It took me a while to learn SQL myself, and so I dread a bit having to teach it... but while I love spreadsheets... it is incredibly tiring having to create tutorials in which you explain which button to click, where, and in what specific sequence, and to make sure you've dragged the mouse just so far and held Ctrl (or Cmd, depending on your OS)... never mind the problem of troubleshooting (what did you click before you clicked that button, did you left-click or right-click it, and, by the way, what version of Windows are you on?).

SQL is difficult to learn for new programmers... but I find that that hurdle quickly pays off... no one ever keeps track of how much time they lose struggling with the GUI because, well, things done with the GUI are generally very hard to track. And it's a useful revelation that being explicit -- having the ability to express exactly what you want -- has a certain power to it. It's like teaching reading and writing to people who've so far learned only how to communicate in emoji.


> All of those can be done in the CLI and quickly.

I'd love to know how.


I use psql exclusively but I think those are good questions and I'd love to see someone give you an answer. Personally I don't feel much friction using psql even for "exploration", but I can see how a GUI would let you more easily follow data from one table to another.

EDIT: I've just barely tried it, but it looks like you could say `\g |vim -` (with a csv plugin) or `\g |csvlook` (not great) and then browse/sort/aggregate the data from there. You probably need to change the output formatting though, because it looks like \g doesn't automatically give you something easy to parse. I have a feeling an interactive command-line CSV reader (less(1) for csv files) would be really useful to a lot of people, but I don't know of anything like that. csvkit isn't interactive as far as I know. csvtool dies on big files. Any alternatives? It would be the perfect thing to combine with \g.

EDIT EDIT: Or just \pset pager and you don't need \g at all. . . .
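Concretely, the two approaches above look something like this (psql metacommands; `csvlook` is from csvkit, and `big_table` is a placeholder):

```sql
-- Make output parseable, then pipe the previous query's result
-- through an external program (here csvlook, from csvkit):
\pset format unaligned
\pset fieldsep ','
\g |csvlook

-- Or skip \g entirely and just page every result set:
\pset format aligned
\pset pager always
SELECT * FROM big_table;
```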


I would try: First One Hour: https://www.khanacademy.org/computing/hour-of-code/hour-of-s...

Also these two seem decent: http://sol.gfxile.net/galaxql.html

http://sql.learncodethehardway.org/

The Hard Way has semi-decent issues.

I would say SQL is easy to learn -- it would take only a few days to get the basics, and then you just learn as you come across stuff you're missing. SQL is a very tiny language.


So, when you read these parts of my post:

> Edit your query to add an ORDER BY clause?

> Check the primary key and write an UPDATE query? Hope not to make a typo when writing the key.

> write a SELECT query?

did you think that I was saying: "I think this is how I would do it in SQL, if only I knew SQL. Won't you please point me to somewhere I can learn SQL so that all my problems will be solved"?

Because what I was going for was: "here is how I would do it in the CLI, but I much prefer how I would do it in PMA. Please tell me if there's a better way to do it in the CLI". I know SQL, I thought it was obvious that I knew SQL, and pointing me towards SQL tutorials is super unhelpful.


Go to Stack Overflow. Yes, the reason no one gave you your SQL code is that we know the GUI is just generating SQL code for you. Sorry -- we probably have the mindset that we pointed you to the answer, and we are not going to give specific answers to your questions.

I Google-searched by copy-and-pasting your questions, about 5 seconds each:

> Edit your query to add an ORDER BY clause? http://www.techonthenet.com/sql/order_by.php

> Check the primary key and write an UPDATE query? http://stackoverflow.com/questions/3930338/sql-server-get-ta...

> write a SELECT query http://www.w3schools.com/sql/sql_select.asp

I think your issue is that you are biased against the CLI and think you get more benefit from a GUI. We can't convince you of what is best for you. We are just saying that everything you pointed out is easily done in the CLI with SQL commands. We believe everyone who works with databases should use the CLI, for a number of benefits, based on our own experience.


...what part of

> I know SQL, I thought it was obvious that I knew SQL, and pointing me towards SQL tutorials is super unhelpful.

makes you think that I just need really basic SQL tutorials?

I get the impression that you're glancing at my posts, picking out a few words, making up a question that someone else might have asked using those words, and answering that question instead of the one I'm actually asking.

This is incredibly frustrating for me. As such, I decline to continue this conversation.


No, I am being super ironic. You are listing EASY-to-do SQL commands as things that you don't know how to do. You then ask us to show you.

What is your question besides show me how to do 1, 2, 3?


If you think that a click is "easier" than the CLI because, presumably, a click is just a click...then the features of the CLI are not for you. The point of a CLI, and programming in general, is the ability to be explicit with the software and the machine, rather than learning to use the buttons laid out for you by a front-end designer. Sometimes, the latter is good enough, of course.


I think a click is easier than the CLI in some situations, such as when I'm looking at a table and I want to edit a specific value in it. (A curses interface would also be better than a classical CLI for that.)

Note that jperras said "Stop using web-based admin tools". I think the CLI isn't good enough to fully replace them, but I don't think that one should stop using the CLI. I use both.


I apologize for coming off as unnecessarily snooty...however, I interpreted your challenge as kind of broad and, at first glance, missing the point of a CLI...("I'm skeptical that the CLI can do everything I'd want")

The CLI is supposed to let you do more of everything you want, versus a GUI, at the cost of being more complicated... but that's programming in general... programming is always harder and more tedious than the pre-designed solution, but people go the programming route when they need the ability to go beyond anticipated use cases. But if by "everything I'd want" you're referring to not just functionality but overall usability... no argument, the CLI doesn't provide that. Although I'd argue that at a certain point, the CLI is easier when you need performance and peace of mind that something happened, even if it is not easily visible via the GUI.


> View the extreme values of a result set? I.e. I run a query which returns 200 results, and I want to look at the first few and the last few. (PMA: home and end. CLI: scroll?)

Well, you could do that; if you know you are interested in extremes to start with, use ORDER BY and LIMIT to just get the extremes in the first place (or, for more refined concepts of "extremes", consider the use of window functions.)
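A sketch of that approach (assuming a hypothetical `measurements` table with a `value` column):

```sql
-- Just the extremes: five smallest and five largest values.
(SELECT * FROM measurements ORDER BY value ASC  LIMIT 5)
UNION ALL
(SELECT * FROM measurements ORDER BY value DESC LIMIT 5);

-- Or, with window functions, rank each row once and filter:
SELECT * FROM (
    SELECT m.*,
           row_number() OVER (ORDER BY value ASC)  AS from_bottom,
           row_number() OVER (ORDER BY value DESC) AS from_top
    FROM measurements m
) ranked
WHERE from_bottom <= 5 OR from_top <= 5;
```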


> I'm skeptical that the CLI can do everything I'd want

You do understand that the phpXAdmins are built on top of the same API the CLI uses?

If anything, you can do more with the CLI than with a higher-level abstraction tool someone has to build logic paths/workflows into.


I do understand that, but I want a tool that works with me, not just with an API.


>- View the extreme values of a result set? I.e. I run a query which returns 200 results, and I want to look at the first few and the last few. (PMA: home and end. CLI: scroll?)

SELECT TOP ORDER BY ?


You mean, run the (possibly expensive) query twice with different sort orders, and not see the middle ones at all?

If that is what you mean, I think the advantage remains with PMA.

Edit to clarify: I'm thinking here in terms of data exploration. I want to look at the whole result set, but the most interesting ones are likely to be at the top and bottom. With PMA, I can get the whole result set, and nothing else, on a single page. So I just scroll to the top or bottom of the page to see the extreme values.

With a CLI, it can be difficult to find where the output of that specific query started. (This is a trade-off, because the CLI allows you to see results from your previous queries.)

If there was a CLI feature like 'view the previous result set in less, without re-running it', I think that would satisfy me on this point. If you have to re-run it, that's less good, but it would beat scrolling.


If a default pager isn't set for you, you could set it. http://stackoverflow.com/questions/14474138/customize-pager-...


Thank you! This is a pretty good solution, and it turns out it exists in mysql too. (`pager less`)


> You mean, run the (possibly expensive) query twice with different sort orders, and not see the middle ones at all?

Or run it once into a temporary table, and then query the temporary table however you want to get subsets of interest. Which, if you are doing data exploration on the potentially large results of a potentially expensive query, is probably what you should be doing, no matter what tool you are using. Because it lets you do a lot more than just check the top and bottom easily.



I'm not a DB/pg guy but I think you want SELECT INTO TEMP which will create a temp table from your query (gets dropped at the end of the session).
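A sketch of that workflow (the query and table names are placeholders):

```sql
-- Run the expensive query once, materializing the result:
SELECT * INTO TEMP results FROM expensive_report WHERE year = 2015;

-- Then explore the materialized copy cheaply:
SELECT * FROM results ORDER BY total DESC LIMIT 5;   -- top few
SELECT * FROM results ORDER BY total ASC  LIMIT 5;   -- bottom few
SELECT count(*) FROM results;
-- The temp table is dropped automatically when the session ends.
```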


Or, as a middle ground, JetBrains have a nice SQL IDE: https://www.jetbrains.com/dbe/


But not released yet? I clicked the link saw that I could only get it through an early access release program.


For some reason the email you get sent when you sign up for the early-access program also trips Thunderbird's scam warning feature (in addition to the form being submitted over HTTP, despite the site being served over HTTPS). I wonder if that's because the message contains the string "Hi Friend"?


Google "jetbrains dbe dmg" and it'll link you their knowledge base article with download links to the latest version. It's seriously a fantastic program!


Yes, it is very good and works with anything not just PG. Of course pgAdmin is quite good too.


If you're curious to get hands on with the CLI here's a couple of shameless plugs on psql the Postgres CLI (http://www.craigkerstiens.com/2013/02/13/How-I-Work-With-Pos... and http://www.craigkerstiens.com/2013/02/21/more-out-of-psql/).

If you're really stuck on a GUI editor give JackDB a try if you want something web based or Postico a try if you want a native GUI.


I've used psql for years but never knew about \e. Lately to work on big queries I've been doing a lot of \i foo.sql (while editing foo.sql in a separate window), but maybe I'll see if I like \e better. Thanks for sharing!
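For anyone else who hasn't met these metacommands, a quick sketch (the file name is hypothetical):

```sql
-- \e opens the most recent query in $EDITOR; saving and quitting
-- sends the edited query back to psql for execution:
SELECT count(*) FROM users;
\e

-- \i reads and executes a file, which pairs well with keeping
-- big queries under version control:
\i reports/monthly_rollup.sql
```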


CLI is obviously the best choice, but as a nice middle ground, PGAdmin is helpful in making the transition.


The community tooling around MySQL is impressive compared to what's available for Postgres. I've heard the same from people who have switched from SQL Server and miss the SQL Server Workbench suite.

My hunch is that this gap exists because MySQL is often the default DB choice for new developers, whereas Postgres tends to be used by people already familiar with database CLI tools. Once you learn how to use the command line effectively, you'll be able to work a lot faster and won't want to use a web interface again, which is to say that you will become yet another reason for phpPgAdmin to be neglected.


Despite the negative replies you've gotten, I would actually agree that PostgreSQL admin tools are, by and large, not as good as MySQL ones. But that is changing - and I was happy to drop $10 on Navicat Essentials for PostgreSQL a while back.


For a web interface, try TeamPostgreSQL[1]. Beats phpMyAdmin, imho.

[1]: http://www.teampostgresql.com


Your real problem is that MySQL won the network effect early, even though it's just terrible.

The only thing we let people use MySQL for now is third-party frameworks that pretty much require it (Magento, MediaWiki, WordPress, Drupal). In fact, we're moving our Oracle to Postgres, with great success. (Oracle is fantastic at storing your data and giving it back, but everything else about it is the opposite of "agile".)


Have you tried pgAdmin: http://www.pgadmin.org/


For more generic use, I really like DBeaver, as it works with most RDBMSes, including Oracle, SQLite, PostgreSQL and MySQL:

http://dbeaver.jkiss.org/


That's like passing up a great car because you don't like the color of the steering wheel.


mm... I'd say it's more like passing up a great car with manual transmission because you only know how to drive an automatic.


http://www.pgadmin.org/

this is a really useful GUI tool for interacting with a postgres database.


If you're serious you never use a tool like phpMyAdmin, SQLYog and the like are way better...


I agree that SQLYog is a far sight better than phpMyAdmin, but it doesn't work with Postgres. What's the SQLYog for Postgres people?


Navicat, DB Visualizer, pgAdmin


If it starts to suck like MySQL, will it become popular like MySQL?

I think that'd be a fair tradeoff.



