Why Use Postgres - Part 2 (craigkerstiens.com)
145 points by craigkerstiens on May 7, 2012 | 69 comments



Having recently been bitten by a 20-hour schema change on a very large table in MySQL, I have been looking at PostgreSQL more seriously.

My biggest issue, having used it for other projects, is getting the damned thing set up and working to begin with. I could never find a decent tutorial (or rather, one that fits my mindset) covering how to do the following:

1. Install
2. Set up users, including how to set it up on a local development machine with a 'root' user who can do anything (saves time in dev)
3. Import/export SQL/backup files

I managed to do all 3. Once.

I'm sure it's out there, and I would switch in a heartbeat were these steps easier to work out. As it is, I can just copy the table, run the ALTER in the background, and rename the tables the next day. Slower, but it doesn't cost me any time and it works.


The bootstrap is not really hard once you've done it; just jot it down somewhere.

In my experience most people stumble because the network security in postgres is pretty tight by default.

This is easily fixed and needs to be done only once.

First: find your pg_hba.conf. It lives in the database directory, which is often linked from /etc/postgresql.

  # Backup the original
  $ cp pg_hba.conf pg_hba.conf_orig

  # Now replace it with our desired security settings.
  # Rules are matched top-down, first match wins, so the localhost
  # exceptions must come before the catch-all md5 rule.
  $ cat >pg_hba.conf <<EOF
  # localhost and local socket are trusted
  local all all trust
  host  all all 127.0.0.1/32 trust
  # Require password auth from all remote hosts
  host  all all 0.0.0.0/0 md5
  EOF

  # restart
  $ /etc/init.d/postgresql restart
From now on you can connect as any user from localhost without a password. Thus, we can now just go about our business.

  # connect as user postgres (the super-user)
  $ psql -U postgres
  postgres=# create database dummy;
  postgres=# create user bob with password 'pony';
  postgres=# grant all on database dummy to bob;
  postgres=# ^D

  # Now you have a database that user bob can use.
  # From remote he will have to use the password 'pony'.
  # From localhost no password needed because of our pg_hba settings above.
  $ psql -U bob dummy
  bob=> create table ...


Note: You need not restart PostgreSQL after changing pg_hba.conf. This is enough:

  /etc/init.d/postgresql reload


I would be wary about trusting all connections from localhost. Vulnerabilities are many, getting a local unprivileged shell isn't exactly hard, and opening a socket connection isn't exactly a privileged operation, so you could expose yourself to a nasty bootstrap attack. Probably better to trust the local unix domain socket, and make it accessible by root only... if you really need it.


I would be wary about trusting all connections from localhost.

The only place where this is an issue would be hosts with multiple untrusted users, and these are becoming very rare.

getting a local unprivileged shell isn't exactly hard

Sorry, that's nonsense. The entire internet relies on the fact that this is relatively hard, unless you neglect basic security precautions.


You make a good point. That said, part of basic security policy is to present as little attack surface as possible, and ensuring that you're not trusting local connections with root-equivalent access to your database is a good way of doing that.

Modern GNU/Linux distributions have many thousands of packages installed, some of which have fairly open Internet access. Could you put up a default-deny firewall to prevent that? Yes. But in most systems there will always be holes punched in those firewalls which allow attacks through, as those machines are typically used to serve traffic and do actual work. If you are exploited by a vulnerability in some random package you have installed then you /will/ potentially have multiple /untrusted/ and /hostile/ users on your system. Now, assuming there isn't a local privilege escalation attack to which your kernel is vulnerable, they'll be looking for services which treat local sockets as trusted and attempt to bootstrap from there.

A vital concept of basic security precautions is that one doesn't just prevent attacks through secure defaults and front-line precautions (i.e. firewalls); one must also contain and detect successful penetrations when they do occur. You must have defence in depth, and not just a hard shell of security, because you can never really tell which one of those protective safeguards will fail due to bugs, human error, or technical incompetence.


Now, assuming there isn't a local privilege escalation attack to which your kernel is vulnerable

That's an invalid assumption (extremely unlikely).

You must have defence in depth, and not just a hard shell of security

When you can't trust localhost anymore then your defenses have long failed.


I believe the null hypothesis should be that you assume there are vulnerabilities in those systems. Your assertion that it's extremely unlikely that local privilege escalations exist is demonstrably false. In fact a quick query of the CVE database shows a large number of /known/ vulnerabilities ( http://web.nvd.nist.gov/view/vuln/search-results?cves=true&#... ).

Regarding the statement that if you can't trust localhost then your defences have failed... well, duh. The point is to limit the damage by having multiple layers of defence. If a script kiddie manages to use UltraPWN2000 to get an unprivileged shell on your box, you don't want them to be able to easily just drop all tables if they're not smart enough to use a local privilege escalation (assuming they can compile locally anyway).

You won't be able to prevent all damage from a skilled attacker but you should attempt to mitigate it and give yourself time to respond to it. You don't trust your locked front door to protect your valuables, you put them in a safe.


Your assertion that it's extremely unlikely that local privilege escalations exist

You misread me. I meant the opposite.

Your argument was based on the premise "assuming there isn't a local privilege escalation attack to which your kernel is vulnerable". I said this premise is invalid (as you just confirmed yourself).

You don't trust your locked front door to protect your valuables, you put them in a safe.

The front door is your firewall. The safe is your host. When someone breaks into your host then it's game over. When you have too much spare time you can attempt to layer further at the host-level but that's usually an exercise in futility.


I can see that we don't fundamentally disagree on many issues. Attempting to protect against all on-host attacks is pointless. Though I don't disagree with the majority of what we've discussed, I will always stand up for simple settings changes that make it just that much harder for unauthorised users to cause trouble. That said, those changes must not add too many onerous access requirements for authorised users, and I don't believe that requiring sudo access to your machine in order to get super-user access to your database is one of them.


Alternately, you can use sameuser privs at the local socket level, which will allow unix user foo access to Postgres user foo without a password, but only on the domain socket. That's probably the best combination of ease of use and moderate security for a dev system.
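For reference, a minimal pg_hba.conf sketch of that setup (the exact lines are mine, not the parent's; the method is called "peer" in 9.1 and later, while older releases use "ident" over the local socket):

  # local socket only: OS user "foo" may connect as database role "foo",
  # no password needed; TCP connections still require md5 passwords
  local all all             peer
  host  all all 0.0.0.0/0   md5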


That looks to be exactly what I have been missing. I have yet to see details like that not spread out over 5 pages. Thank you!


Yes, the docs could indeed use a few lighter tutorials for people starting out.

However, once you're over that initial learning hump and start looking for more specific things you'll quickly notice why the postgres manual is often cited as being one of the best documentations ever written (inside and outside the OSS world). It's really that good - once you're familiar with a few basics.


How can I take this seriously?

However, once you're over that initial learning hump...you'll quickly notice why the postgres manual is often cited as being one of the best documentations ever written

"It's really that good - once you're familiar with a few basics."


By realizing that sometimes it takes a little investment for a payoff.

Outside the official docs you can find just as many newbie tutorials for Postgres as you'll find for MySQL. The difference is: Postgres has stellar documentation once you leave the newbie stage. As for what MySQL calls documentation, let's just not talk about it...


EDIT after a downvote.

I'm sorry if I spoke out of frustration in this comment, but I do stand by my thoughts about what a gold-standard manual has to provide at a minimum. We're not talking about a "little investment". Read his original comment again.

If someone has enough experience with SQL and enough MySQL experience that they're really hitting limitations in MySQL, then a merely "good" or even "OK" manual should at a minimum enable that person's good-faith, dedicated attempt to start using this alternative to succeed. This is a purely horizontal transition. PostgreSQL is not only "also a database" but "also SQL" and "a direct horizontal competitor to MySQL" -- the technologies substitute for each other, often on a drop-in basis (in LAMP-type and modern stacks, especially with a framework whose ORM can use either). If someone is familiar with SQL and familiar with an alternative, the manual should at a minimum help that person transition to this alternative, if nothing else. I would say this is the absolute minimum a manual has to provide to meet its basic mission.

At an absolute minimum, a manual MUST let you move horizontally from a drop-in competitor you have some competency with.

(I feel like, even historically, this has been missed by a lot of people.)

This is why I took umbrage at your attitude of "why the postgres manual is often cited as being one of the best documentations ever written" after saying "Yes, the docs could indeed use a few lighter tutorials for people starting out" in response to the guy who simply could not get past the installation hurdle.

This is like a stereo that someone can't turn on. They've used other stereos, but they just can't figure it out. They read the manual, but still can't figure it out. I would argue that unless it's a real "duh, I was such an idiot" moment (happens to everyone) and an isolated case, and it sounds like that isn't what you're saying, then the manual has failed in its primary mission. If the guy could have turned it on, he would have figured everything else out by himself.

If this guy can't do what he's trying to do, knowing MySQL the way he describes and having attempted PostgreSQL the way he describes, I'd say that while the manual might be a fine technical reference, it is not yet complete and must not be held up as a great manual until this hole is filled.


It is the best documentation I think I have ever used, but it still has flaws, the greatest one being what moe stated.


That's fair enough. But let's be clear: just because I always skip the first chapter of my stereo's documentation ("turning on your stereo"), since it's so obvious how to connect things and turn it on, all the ports are very clearly labeled, etc., that does not mean that any stereo manual can be considered good if it does not include that chapter :)

I think I'm basically not winning any friends with this line of thinking, but I just see this as a long, recurring problem. It would be the same with a certain open source movement's long-standing alternative to a certain very well-entrenched operating system: they are horizontal substitutes for each other, and yet people who are experts on the entrenched system (power users, even) have given up on the open source alternative, because they "could not turn it on" (i.e. get to the same place they get to after a fresh install of their entrenched operating system).

I'm not going to say more because these are (whatever those-things-that-are-landmines-floating-in-water are called)-filled waters.


What is your platform? If it is Debian then it is as simple as this to install PostgreSQL and set up a PostgreSQL root user with the same username as your UNIX username (this avoids having to modify pg_hba, and also means not having to specify a user when connecting to PostgreSQL):

  sudo apt-get install postgresql
  sudo -u postgres createuser -s `whoami`
After this you can just create a database and connect to it with:

  createdb my_db
  psql my_db


Usually Ubuntu, so I will have a look at this. However, when running an application, say in PHP, it's going to be running under the Apache user. How do I go about configuring that?


I see. I usually do Ruby development, so the web server is then usually running as my user on my development machine (obviously that is not the case in production).

In your case, I would just change pg_hba to allow all local connections as shown by moe above. (I would not set any password though, since that is normally not useful or necessary for local development.)
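As a rough sketch, assuming an Ubuntu box where Apache/PHP runs as www-data (the role and database names below are placeholders):

  # create a plain role matching the web server's OS user,
  # plus a database owned by it
  sudo -u postgres createuser -D -R -S www-data
  sudo -u postgres createdb -O www-data myapp_dev

  # then, per moe's pg_hba settings ("local all all trust"), the PHP app
  # can connect over the Unix socket without a password, e.g.
  #   pg_connect('dbname=myapp_dev user=www-data')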


     sudo -u postgres createuser -s `whoami`
Won't this create another user named "postgres"?

EDIT: No it won't:

     Ophelia ~ $ sudo -u postgres whoami
     postgres
     Ophelia ~ $ sudo -u postgres echo `whoami`
     rich
That seems straight out of The Unix Hater's Handbook.


The idea that command-line expansions like $VAR, `backtick` and wild*cards are done by the shell, before the program is invoked, is not particularly esoteric, is it?


No. But, I had to think about it.


As for backup, pg_dump dbname > db.dump will do the dump, psql -f db.dump dbname will do the restore. There are, of course, far more complicated ways of doing it, including doing it in binary format, or data / schema only, or individual tables, but those are all a flag or two away.
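For the record, a few of those flag variations (standard pg_dump/pg_restore options; the database and table names are placeholders):

  # plain SQL dump and restore
  pg_dump dbname > db.dump
  psql -f db.dump dbname

  # custom (compressed, binary) format, restored with pg_restore
  pg_dump -Fc dbname > db.custom
  pg_restore -d dbname db.custom

  # schema only, data only, or a single table
  pg_dump -s dbname > schema.sql
  pg_dump -a dbname > data.sql
  pg_dump -t some_table dbname > table.sql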


OK, here is a guide I wrote for installing and setting up Postgres on Ubuntu. It's in Spanish but you will find your way around:

http://lobotuerto.com/blog/2009/07/20/como-instalar-postgres...

It covers installation, setting up the postgres user (the _root_ you want), changing the authentication scheme so you can use it for Rails dev (or something else), and finally how to install the Postgres gem for your Ruby.


Actually, a bigger question is how people are migrating from their MySQL databases to PostgreSQL. I'm quite surprised at the lack of a standard, production-grade, de facto mysql2pg tool (and for SQL Server as well).


pg_dump to export as SQL, just pipe into psql to import.
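A hedged sketch of the import side, assuming you already have a plain-SQL dump to load (these are standard psql options, not a tested migration recipe):

  # load a SQL dump, stopping at the first error and wrapping
  # the whole load in a single transaction
  psql --set ON_ERROR_STOP=on --single-transaction -f dump.sql target_db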


There are some caveats about piping SQL into psql:

http://petereisentraut.blogspot.com.au/2010/03/running-sql-s...


One of the other nice things about Postgres is the pretty much annual release of a new version. 9.2 should apparently hit beta in the next week or so, which will bring some cool new features: http://archives.postgresql.org/pgsql-hackers/2012-05/msg0034... .


Are there any reasons not to use postgres, as a replacement for mysql?


If you have a DBA who knows MySQL extremely well and doesn't need to change. If you have an existing system running well, and the future roadmap of your development doesn't make staying with MySQL difficult.

I am totally a Postgres fan, but there are many sensible reasons not to upset the apple cart.


Using multi-master replication, using MySQL Cluster, or depending on properties of one of the alternative storage engines would be my big ones.


Hopefully PostgreSQL-XC will fix this lack of good multi-master replication in a couple of years. PostgreSQL-XC 1.0 will be released soon.


I had no clue that Postgres supported LISTEN and NOTIFY. Granted, it's a new addition, but this is really slick:

http://www.postgresql.org/docs/9.1/static/sql-listen.html http://www.postgresql.org/docs/9.1/static/sql-notify.html


New addition?

LISTEN and NOTIFY go back to at least 6.4, which came out around 1998: http://www.postgresql.org/docs/6.4/static/sql-listen.html http://www.postgresql.org/docs/6.4/static/sql-notify.html

What might have changed is the implementation, though. I seem to remember that at some point in the past I was investigating LISTEN/NOTIFY and the driver (libpq) forced you to handle LISTEN by polling (calling PQnotifies periodically), which doesn't really help compared to polling on your own.

This has not changed so far:

http://www.postgresql.org/docs/9.1/static/sql-listen.html

states

> With the libpq library, the application issues LISTEN as an ordinary SQL command, and then must periodically call the function PQnotifies to find out whether any notification events have been received.

It might be possible for drivers that work without libpq, talking the Postgres protocol directly on the wire, to get real asynchronous behavior, though I don't know anything about how this is handled on the server right now (it might still be polling internally on the server end).


I'm not sure about long ago, but the recent version of libpq has NOTIFY working asynchronously (you can stick the connection file descriptor in a select(2) and you'll get woken up when a notify is available).


Hmm, pretty certain nothing has changed recently there either. At least as far back as 7.1 the documentation has remained the same about this.

http://www.postgresql.org/docs/7.1/static/libpq-notify.html

What has happened recently though is that more language bindings to libpq have started exposing the NOTIFY functionality in more convenient ways.


One of the bug fixes for Postgres95 Release 0.03 (Released 1995-07-21) is:

* the LISTEN/NOTIFY asynchronous notification mechanism now work

http://www.postgresql.org/docs/7.4/static/release-0-03.html


I'm not sure how much I like stuffing random functionality into the database... Looking at the docs: "and then must periodically call the function PQnotifies to find out whether any notification events have been received" - it's actually polling via an SQL connection - that means even more persistent connections to the DB, another timer in the application, more difficult connection sharing (two pools rather than one).

Without knowing much about how it works - why would I want to use this system rather than just ZMQ/RMQ/...?


You do not have to poll. You can wait on the socket until you get data and only then poll.


I quoted that part from the documentation - the way I understand it is that I can't detect if I have data (so I cannot "wait on the socket") in any other way than by polling via that function. Is that not correct?


LISTEN and NOTIFY are a great feature. We used them heavily in one system I worked on. As the project evolved though, we decided the best way to use them was to have clients to both our main message brokers and the postgres server, and translate between them. Largely this was to avoid overloading the pg server as a message broker also -- particularly because for our uses dbus was a much better alternative for local system message passing. Now that NOTIFYs officially have payload support, YMMV.


As the other posters have said, it has been around forever, but was rewritten to be much faster and support a payload in 9.0.


It's not exactly new; 7.4 was the oldest version I cared to check, but I think it goes back further :) http://www.postgresql.org/docs/7.4/static/sql-notify.html

A recent addition to NOTIFY is the ability for notifies to carry payloads (which makes them even more useful)
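A quick psql illustration of the payload form, assuming two open sessions (the channel name and payload are made up):

  -- session A
  LISTEN job_events;

  -- session B
  NOTIFY job_events, 'order 42 shipped';

  -- back in session A, on its next interaction with the server,
  -- psql prints something like:
  --   Asynchronous notification "job_events" with payload "order 42 shipped" received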


Can anyone point me at a HOW to use Postgres guide? I know my way around MySQL and SQL Server pretty well, but Postgres is fairly new to me. I did one project where I migrated several websites from one server to another but I kind of fumbled my way through it. Would love to see a guide for getting started with Postgres for those who are already familiar with MySQL.


The official docs are fantastic. The newer PacktPub books are also pretty good: http://www.packtpub.com/postgresql-90-high-performance/book and http://www.packtpub.com/postgresql-9-admin-cookbook


It should be said that the older books have never been very good. I have read probably all of them, and none come close to the documentation on the website.


You can check the guide mentioned in the article: http://www.postgresguide.com/


Has the slowness of SELECT COUNT(*) on large datasets been addressed in any way? Last time I checked, the response was "it's slow because of how we do things here", and if you want to count records fast, "count them yourself" (not actual quotes from anyone, just a tl;dr of some answers I've seen).


I kind of hate this "tl;dr" culture. I can explain the reason why this is never going to change (though it may get better with index scans) but I expect you'll just see a wall of text and decide not to read it. But if you never read long answers, you have no right to expect solutions to your problems or to understand anything. So I hope you'll read this, it may help.

PostgreSQL and most modern, ACID-compliant RDBMSes use MVCC instead of locks. MVCC has a lot of upsides (namely, you almost never need table-level locking for writes), but it means you need to look at the whole table to do a COUNT. That's because MVCC is basically multiple parallel universes for your data. What you see depends critically on which transaction you're in when you look.

For example, suppose you have a process that needs to know the number of users in some user table. You start your transaction, and then another process starts a transaction that goes through and deletes certain users. While this is happening, you need the count. In MVCC, deletion doesn't delete, it just marks rows with the transaction ID range in which they're visible. The rows that are being deleted by the other process are just being marked dead as of that process's transaction ID, but your transaction ID is below that one, so they're not dead to you. Later on, when the youngest transaction alive is older than the oldest transaction that can see these rows, PostgreSQL will actually expunge the data (VACUUM does this).

Because of MVCC, there is more than one legitimate answer to the question of COUNT; potentially, one for each transaction in progress. The answer is changing constantly. People tend not to think in these terms; we tend to think, well, the table has so many rows in it, right? Why don't you just increment the count when you write one? That doesn't work, because with ACID compliance, uncommitted transactions should not have visible effects until they commit, and transactions in progress should not see anything change during their operation. So there really are multiple right answers at any given moment. The expectation that SELECT COUNT(*) will be fast is built on the assumption that the database has some sort of master count of what rows are in the table that it can just glance at. But it doesn't, and to have one would require removing MVCC altogether and using locks instead.
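A small psql illustration of why there is no single right answer, assuming a users table with a boolean inactive column (both names made up for this sketch):

  -- session A: start deleting, but don't commit yet
  BEGIN;
  DELETE FROM users WHERE inactive;

  -- session B, meanwhile: the rows A is deleting are still visible here,
  -- so the count has to be computed by checking each row's visibility
  -- rather than read from a single stored total
  SELECT count(*) FROM users;

  -- session A:
  COMMIT;
  -- only now do new snapshots stop seeing those rows; the dead row
  -- versions are physically reclaimed later by VACUUM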


I'm not sure I buy your argument. It is certainly true that with MVCC "there is more than one legitimate answer to the question of COUNT", but that's true for any query, not just COUNT. Given that we can maintain transactional correctness for the data, it's not at all clear to me why the same mechanisms cannot be used to maintain transactional correctness for the metadata.


Yes, but we're not surprised when those other queries have to read the table. For some reason it's surprising with COUNT. I maintain this is not a technical problem so much as an intuition problem.

Explain how you're going to maintain this metadata in more depth. If it can be done in an MVCC manner (i.e. without introducing locks) then it should be explored. How's it going to work?


We can just maintain multiple versions of the row count.


It might be worth seeing what the cost of this would be in practice. I suspect if the answer were this easy the developers would have just done it by now, but you never know.


I agree. Given count(*) is a common need, you'd think there would be multiple count(*) results in the metadata for the table, each with a generation associated with it.


It's a short, easy query to write. I'm not convinced it's a big need.

So, your proposal is basically this: make a metadata table. On every insert or delete from the table, update the metadata table accordingly. MVCC will make sure anybody reading from it will get the right answer. Is that about right?


I do not see how this would be implemented without introducing extra locking which would harm concurrency, which would remove some of the performance advantage of having MVCC.

Now, I see it could be worth taking that penalty on specified tables, just like you do for indexes. And some databases do support this in a more general form: materialized views.


I think you could fake this proposal with triggers. It might be worth seeing what the additional cost is.
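Something like this minimal sketch, assuming a users table (the counter table, function and trigger names are all made up). Note that the single counter row becomes a serialization point for concurrent writers, which is exactly the locking cost mentioned above:

  -- one row holding the current count
  CREATE TABLE row_counts (table_name text PRIMARY KEY, n bigint NOT NULL);
  INSERT INTO row_counts SELECT 'users', count(*) FROM users;

  CREATE FUNCTION maintain_count() RETURNS trigger AS $$
  BEGIN
    IF TG_OP = 'INSERT' THEN
      UPDATE row_counts SET n = n + 1 WHERE table_name = TG_TABLE_NAME;
      RETURN NEW;
    ELSE  -- DELETE
      UPDATE row_counts SET n = n - 1 WHERE table_name = TG_TABLE_NAME;
      RETURN OLD;
    END IF;
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER users_count
    AFTER INSERT OR DELETE ON users
    FOR EACH ROW EXECUTE PROCEDURE maintain_count();

  -- now this is a single-row lookup instead of a full-table scan
  SELECT n FROM row_counts WHERE table_name = 'users';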


A proper solution (not faked with triggers) should be at most as costly as having a partial index on the table.


^ tl;dr no, it's still slow

sorry, I couldn't help myself :-)

I understand that it's a semi-hard problem given MVCC, but I could keep track of the counts myself for each set of conditions and each table that I need. It's as simple as keeping the count as a record in some table, incrementing it on insert, decrementing it on delete, and adjusting it on update if a row begins or stops satisfying the conditions I want to count by. The only grudge I have with PostgreSQL is that when confronted with this problem, PostgreSQL fans respond with "We've got MVCC so it is supposed to be slow. Stop thinking it's supposed to be fast." rather than "Hmm... maybe we should introduce a new facility, similar to indexes, where you could define what you want to count and the db will count by those conditions and provide the results for you fast."


> The only grudge I have with PostgreSQL is that when confronted with this problem, PostgreSQL fans respond with "We've got MVCC so it is supposed to be slow. Stop thinking it's supposed to be fast." rather than "Hmm... maybe we should introduce a new facility, similar to indexes, where you could define what you want to count and the db will count by those conditions and provide the results for you fast."

Plausible, but somewhat bothersome. It could be a special case of the materialized view problem, but it would take some convincing to show that a special count-optimizing physical structure is worth the additional knob and maintenance burden it brings with it. Not impossible, but it would require some tight argumentation with ample evidence.

In my quick assessment, the need for faster counts over lots of data is subsumed by materialized views, which unfortunately are a very tough feature to write, but have the advantage of generally being considered worth it. A cut-down version of materialized views that just supports counting on various dimensions, while adding more bulk to the planner as well as requiring execution maintenance and bug-fixing, seems less worth it. Somewhere in between is a variant that supports more aggregate functions besides count(), particularly ones where one can define an inverse transition function (which in the case of count is "add one for every record").

There are many interesting features one could add -- and this is one of them -- but at the end of the day there is going to be an assessment by the long-time community members who have the privilege of fixing obscure bugs for years as to:

* Who is going to write it?

* How long will it take vs. other features?

* Is the feature going to be maintained to a level of quality we find acceptable for the foreseeable future?

* Is the impact worthwhile?

* Could this in any way be done as an extension?

The level of the bar of quality has changed over time, too, pretty much inexorably moving up. Getting LISTEN/NOTIFY in now would be much harder than when it showed up in...Postgres95 (and, granted, it was pretty busted back then, apparently, along with...a lot of stuff). LISTEN/NOTIFY turned out to have a good impact/complexity ratio, so it gets good maintenance, but in a hypothetical world where it was not already committed with a good history of service and use I bet it would have required quite some convincing to add.

There is some old code in existence now that would not be accepted again today. A sadder example is the dusty implementation of hash indexes, which thankfully few people use or see reason to use. Hash indexes come with ample warnings in the manual not to use them. Yet it doesn't seem reasonable to deprecate them for the people that rely on them, and nobody seems to care enough to improve them, or even convincingly whine about improving them. The community is pretty sensitive to these code barnacles, so if one goes in wanting such a feature, one has to be well prepared to argue both feasibility of implementation and size and duration of impact.


People ask "why are others using MySQL instead of a proper database like PostgreSQL?"

Some of those people try it and get burned, because features that MySQL handled without anyone noticing now take so much time that one wonders whether one triggered some strange bug that makes the db do a full table scan where one is unnecessary. Then one goes to the internet to look for clues and sees the response "oh, that's how it's supposed to work; nobody is working on it because it does not seem like a problem" over and over.

I think that feature might have an impact.

Humble question from a person who has never tried to build a database engine:

Do you know why the db can't just keep track of the number of entries in full or partial primary key indexes and use them to give counts fast?


> Some of those people try it and get burned, because features that MySQL handled without anyone noticing now take so much time that one wonders whether one triggered some strange bug that makes the db do a full table scan where one is unnecessary. Then one goes to the internet to look for clues and sees the response "oh, that's how it's supposed to work; nobody is working on it because it does not seem like a problem" over and over.

Just about every non-trivial database implementation will have a pathology that others do not. In this case, InnoDB was faster because it supported index-only scans, a feature that was very useful but hard to implement in PG because of some details of its MVCC implementation -- now there is an implementation in 9.2 that gives it a shot at being in the same class of performance even on wide-row tables. However, both are still much slower than MyISAM: if you search for "why is count slow on innodb?" you'll get a lot of hits -- it suffers much the same defect vs. MyISAM (which has poor SMP concurrency, allowing it to count things as a luxury) that Postgres does.

Perhaps "not a problem" is a lazy answer for "it's hard to get exactly right, and nobody in the same class of implementation is bothering, and this has been discussed to death and is more work than you realize, so don't expect anyone to implement it soon unless you intend to argue it (unless you are also a long-term maintainer) and do it." As you can see...that was many more words, including giving background on two MySQL storage engines.

> Do you know why the db can't just keep track of the number of entries in full or partial primary key indexes and use them to give counts fast?

It could. But you'd have to spec out a brand-new top-level set of utility commands (like CREATE INDEX) to make this physical structure; it is now yet another knob that can drastically change performance characteristics; the planner gets to scan the projections and qualifiers for yet another little optimization; someone else gets to maintain it (unless you are the maintainer); and then there's going to be the person who asks "why is this snapshot-isolated version not even nearly as fast as caching the count in memcached?" (or, if one does it the other way, the reverse); and then finally, when you get around to implementing a more general set of functionality, you'll be stuck with this old form for basically eternity (depending on your release policy), much like hash indexes. Don't forget to add backup-and-restore support, manual pages, and EXPLAIN and/or other diagnostic output, if necessary.

On the plus side, it could end the endless howling about this problem.

Personally, I think we can get enough hooks in place that someone should be able to implement this secondary structure in a satisfying way without coupling it inextricably into the database, and that's a project I'd have enthusiasm for.

If someone else made this feature in another popular database that serves similar use cases and it got used a lot, then I think the understanding of the upsides would be such that such a feature is more likely to happen. Index-only scans fall into that category, and a special count structure not seen in other databases doesn't appear to, at this time.


Thank you for all the information.

> On the plus side, it could end the endless howling about this problem.

:-) I think that would be also immense relief for the howlers.


If you mean "have other major databases, such as MySQL, become slower in most of the same ways", then "yes, and for the same reasons ;P". The one place PostgreSQL could do better here is that it currently can't use a covering index (which would let it get the count of a filtered query with a scan of the index alone, without needing to read and validate the table rows), but that is being addressed in 9.2.


I basically find your sentiment accurate: databases with high concurrency support need to do a lot of work to produce an accurate count(*). However, many databases that are not Postgres supported index-only scans to do it. That doesn't change the O(n) cost of grovelling through the tuples, but it can make a large constant-factor difference.

Not anymore, though... 9.2 supports index-only scans in many common cases.

Still, as you say, repeatedly counting all the tuples in a table is going to be much, much more expensive than maintaining the count for the categories you care about up front, paying the lock contention and update costs as you go, in virtually any database that supports SMP.


If you want an approximate count, you can do something like this to get estimates for all your tables:

     SELECT S.nspname || '.' || relname, reltuples 
      FROM pg_class C,  pg_namespace S 
      WHERE C.relnamespace = S.oid 
      ORDER BY reltuples DESC
This is the estimate used by the query planner, so it's updated whenever a VACUUM occurs (not sure how often the autovacuum runs).


As often as it is needed, depending on your database settings.



