From the datatype wish list, these look like a bad idea actually:
- url - If all schemes are allowed, this is almost equivalent to "does it include :// after the first word". I'm not sure why it should be included.
- phone - Around the world this is equivalent to: does it contain digits. And maybe also + # and star. And maybe also letters for internal voip phones. And....
- zip - Every country has its own. There is no common format. It can have letters, numbers, symbols, ... anything.
- email - This actually could work if it allowed all possible formats.
PostgreSQL handles UUIDs, IP addresses and booleans quite intelligently. Where appropriate they include specialised functions that you can use in queries, and the formats for input are generous so long as your input is unambiguous.
For example, the boolean type in PostgreSQL will accept TRUE, 'true', 't', '1', 'on', 'y' and 'yes' (case insensitive) for truthful values and the equivalents for false values. It's hard to overstate how much better this is than Oracle, to take my work environment as an example.
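A quick sketch you can run in psql:

    SELECT 'yes'::boolean, 'on'::boolean, '1'::boolean, 't'::boolean, 'TRUE'::boolean;
    -- all five come back true; 'no', 'off', '0', 'f' and 'FALSE' work for false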
My point is that I imagine that when the PostgreSQL team get around to these types, they will address them with the same care and forethought as they have for everything else. I trust them because their feature plans are not based on maximising shininess.
> For example, the boolean type in PostgreSQL will accept TRUE, 'true', 't', '1', 'on', 'y' and 'yes' (case insensitive) for truthful values and the equivalents for false values.
I can see most of those being useful so that you can easily extract components. For instance, from a URL you might want to extract the host or path; or from a phone number you might want to extract the country code.
If done right, I can see those being useful. I agree that it can be overdone, though.
I don't like how timetz was highlighted. From the PostgreSQL docs[1]: "Time zones in the real world have little meaning unless associated with a date as well as a time, since the offset can vary through the year with daylight-saving time boundaries."
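To illustrate the difference (a quick sketch you can run in psql):

    -- a timestamptz is a full point in time, so DST rules can be applied:
    SELECT '2013-07-01 12:00+02'::timestamptz;
    -- a timetz is just a clock time with an offset and no date, so there is
    -- no way to know whether daylight-saving time was in effect:
    SELECT '12:00+02'::timetz;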
Also, I quote: "It's Postgres or Postgres-Que-Ell, not Postgre-Es-Que-Ell". And then a slide shows "PostgresQL" while http://postgresql.org shows "PostgreSQL". That pronunciation makes no sense to me if the slide is correct.
I'm waiting for the day when databases figure out their own damn indexes. They are sitting there with billions of CPU cycles per second, trillions of bytes of RAM and trillions of bytes of storage. Yet someone has to go in and babysit the thing to tell it which indexes to create.
(Yes I know the perfect dbadmin can make the perfect indexes. But I expect the database to figure it out itself and rarely be wrong.)
Sounds like you need strong AI. :) But really, isn't that like wishing the compiler would find out if you need a priority queue or a stack? Or rewriting your n² sort automatically to n log n?
No, it is more like the compiler working out, when you say x=0, whether it should be done by loading a literal, using a zero register, or XORing the register with itself.
Essentially it could be dumb about it: try multiple query plans concurrently, figure out which was most efficient, use that in the future, monitor its expected performance, and try all the candidates again when it deviates.
With modern systems having such a surplus of CPU, RAM and storage, a dumb system could try multiple indexing strategies and work out which worked best.
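You can approximate this loop by hand today; a rough sketch, using a hypothetical orders table and query:

    EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;  -- baseline plan
    CREATE INDEX CONCURRENTLY orders_customer_idx ON orders (customer_id);
    EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;  -- compare cost
    DROP INDEX CONCURRENTLY orders_customer_idx;                  -- if it didn't help

The wish is for the database to run this measure-and-decide loop itself.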
I also miss this feature; however, there are options:
a) for small and mid-sized tables, a SELECT followed by a conditional INSERT or UPDATE works, as does INSERT-on-exception-UPDATE (see the sketch below)
b) write-heavy tables are worth rearchitecting to be INSERT-only, as this helps avoid locks
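For (a), a minimal sketch of the classic retry loop from the PostgreSQL docs, assuming a counters(name, value) table with a unique constraint on name:

    CREATE FUNCTION upsert_counter(k text, v int) RETURNS void AS $$
    BEGIN
        LOOP
            -- first try to update an existing row
            UPDATE counters SET value = v WHERE name = k;
            IF found THEN
                RETURN;
            END IF;
            -- no row yet, so try to insert; another session may beat us to it
            BEGIN
                INSERT INTO counters (name, value) VALUES (k, v);
                RETURN;
            EXCEPTION WHEN unique_violation THEN
                -- someone inserted concurrently: loop and retry the UPDATE
            END;
        END LOOP;
    END;
    $$ LANGUAGE plpgsql;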
I like MERGE too, in the proper place. I'm not sure that the absence of MERGE constitutes "this terrible state".
It would be like a commercial RDBMS lacking boolean and serial types, necessitating reams and reams of repetitive, error-prone boilerplate constraints and insert/update triggers.
One feature I've missed while using Postgres is the built-in REPLACE INTO that MySQL provides. I know it's possible to create a function to accomplish this in Postgres for specific cases, but is there some general variant which could be applied to multiple tables?
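For what it's worth, the naive per-table emulation is just delete-then-insert in one transaction (a sketch, assuming a table t(id integer PRIMARY KEY, val text)):

    BEGIN;
    DELETE FROM t WHERE id = 1;
    INSERT INTO t (id, val) VALUES (1, 'new');
    COMMIT;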
That is not enough to solve the general case due to locking considerations.
If one transaction runs the DELETE and then the INSERT concurrently with another transaction doing a normal UPDATE, the UPDATE will find zero rows to update if it runs after the DELETE+INSERT. The inserted row will be invisible to the UPDATE while the old row will be locked. This means the UPDATE will wait until the other transaction has committed and then find neither the old row nor the new one.
A correct REPLACE implementation would make sure the UPDATE would UPDATE the replaced row.
The above is assuming READ COMMITTED isolation level.
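Concretely, with the delete+insert sketch above and a concurrent UPDATE, the interleaving looks something like this:

    -- Session A                        -- Session B
    BEGIN;
    DELETE FROM t WHERE id = 1;
    INSERT INTO t (id, val)
        VALUES (1, 'new');
                                        BEGIN;
                                        UPDATE t SET val = 'x' WHERE id = 1;
                                        -- blocks on A's row lock
    COMMIT;
                                        -- UPDATE 0: the old row is gone and the
                                        -- new row is not in B's snapshot
                                        COMMIT;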
Yep, and even if we do account for this, say by taking a table lock or escalating the isolation level, we can still run into problems in the case that there are triggers defined.
There is no direct substitute in PostgreSQL, but there are case-by-case alternatives, and for any use case I can dream up right now, they have better semantics (as in: they are easier to debug, easier to observe and easier to tune; MySQL necessarily hides the update operation from you).
However, MySQL doesn't have this problem. I've just tried it, and MySQL blocks the UPDATE until the first transaction (DELETE, INSERT) is finished (like PostgreSQL), but after the commit, the UPDATE statement updates the newly inserted row.
It seems MySQL first searches the table for the row and then attempts to acquire a lock on it (and thus waits for the DELETE, INSERT to finish, as that has locked the row being DELETEd); after the lock has been acquired, it searches the table again for rows matching the query in order to perform the update.
It's amazing how easily newcomers to tech ignore 'proven' databases like PostgreSQL that have actually stood the test of time and, when done right, have the real capacity to scale enormously on demand. What is more disappointing is how the newcomers trade them in for beta-quality NoSQL databases that promise better reliability and scalability out of the box, which is simply not true[1]. For instance, databases like MongoDB promise developers that they need not worry about scalability and can focus just on their code, while doing some things under the hood in a terrible way [2].
PostgreSQL is one of the few databases that support SQL and can actually be less painful to scale than MySQL. Also, the recent versions bring enormous performance boosts that can silence the SQL critics, IMO.
If you are interested in testing out PostgreSQL for your project, you might want to develop on the WAPP stack and give it a shot:
I was a MongoDB developer too and I loved it so much. I loved the fact that I could skip running DB migrations on my Rails app and focus on just coding, instead of worrying about scaling. But it was a terrible trade-off. First of all, you can't use NoSQL for everything; that was a very painful lesson that I learned the hard way. 99% of the time you want to use SQL, because most use cases can be executed perfectly well with SQL databases. Next, some databases have terrible architecture and design decisions under the hood, mainly to improve their benchmarks (seriously!). For example, turning off safe writes by default... :cough: :cough:
Last but not least, NoSQL for the wrong use case complicates things and ends up adding redundancy to the database (like storing user details inside multiple collections, which is unavoidable, especially when you are against embedding documents). The worst trade-off using a NoSQL database is the analytics part - "Hey DB, show me the list of users who are from the United States and who have subscribed to Plan X and who are the highest paying customers and who love chocolate pie"
SQL - "Here you go." NoSQL - "Sorry sir, not possible. Possible, but possibly a nightmare."
If you want to build a 'scalable' app for your next big thing and don't want the scaling hassles of MySQL, which actually scales extremely well already[3], you can use PostgreSQL.
Cheers
[1] Scaling is hard. Don't let anyone else tell you otherwise!
My rule of thumb is, use PostgreSQL unless you have a well-defined need for something else. There are few things you can't do with it, and many things most other free databases lack.
Also, if you're not aware of it, you'll probably be interested in Fabian Pascal's Database Debunkings site [1].
> SQL databases start off difficult but pay dividends on the back end. NoSQL databases start off easy but begin to extract their costs later.
Funny - I feel it is the other way around: NoSQL yields scalability at the high end, but I find SQL incredibly more convenient because the database does most of the work for me... We all have our biases, and mine is thinking in a relational way.
My view is that a single PostgreSQL box can reliably and efficiently handle more queries than a comparable MySQL box. This is not to say that one is better than the other (it depends on a lot of things), but I am addressing only the pain points. PostgreSQL's enterprise brother, powered by EnterpriseDB, has a cloud solution that takes care of scaling while you focus on your code.[1] I had this in mind when I was writing that sentence.
I've used Postgres for the last 5 years, and mostly MySQL before that. But while I love a lot of features in Postgres, my big caveat with it is that Postgres seems to have an attitude of "if we can't do it perfectly, we won't do it yet", and so we only got "proper" replication recently (or had to deal with third party hacks on top of triggers or the like) and still can't do multi-master replication without third party hacks.
Meanwhile, MySQL has had multi-master replication "forever" because they took the easy way out: "simply" logging statements, streaming those to slaves, and adding a mechanism for applying offsets and step sizes to sequences. It's a bit of a hack, and vulnerable (if you do clashing updates, replication will fail), but in real-life usage it gave MySQL a massive advantage for some types of scenarios for a long time (e.g. back in 2006 I ran a cluster of 16 MySQL servers spread over two sites; configuration was trivial, while doing the same with Postgres at the time was a massive PITA).
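The sequence trick boils down to two real MySQL variables (normally set in my.cnf; a sketch for a two-master setup):

    -- on master 1:
    SET GLOBAL auto_increment_increment = 2;  -- step size
    SET GLOBAL auto_increment_offset = 1;     -- this server hands out 1, 3, 5, ...
    -- on master 2:
    SET GLOBAL auto_increment_increment = 2;
    SET GLOBAL auto_increment_offset = 2;     -- this server hands out 2, 4, 6, ...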
It's a philosophical difference that means that while I'd be more comfortable about trusting our payroll data to Postgres, for a web app where major scalability is a concern, I'd be prepared to consider MySQL in situations where I'd consider Postgres a no-go because of the hassle.
Postgres feels like it has a more theoretically sound foundation, and is catching up feature-wise, though.
A few more iterations on the replication support, and extensions like https://code.google.com/p/plv8js/wiki/PLV8 (run JavaScript in Postgres via V8, which also instantly gives nice JSON support to rival many of the NoSQL solutions), and it's eating its way into both the NoSQL space and the space held by MySQL.
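To give a flavour of PLV8 (a sketch, assuming the extension is installed; json_get is a made-up name):

    CREATE EXTENSION plv8;
    -- pull one key out of a JSON document, in JavaScript:
    CREATE FUNCTION json_get(doc text, key text) RETURNS text AS $$
        return JSON.parse(doc)[key];
    $$ LANGUAGE plv8;

    SELECT json_get('{"name": "ada", "id": 1}', 'name');  -- => ada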
"a single PostgreSQL box can handle reliably and efficiently enough queries than a comparable MySQL box"
Sorry, but citation needed. Most of the (relevant) stack overflow results are pointing to articles citing 4.1 and earlier MySQL versions, or pure FUD opinion pieces.
MySQL and PostgreSQL are within the same order of magnitude (depending on workload and how much time you're willing to spend optimizing queries) for a single server, so to say that PostgreSQL is better requires some backing up with actual numbers or techniques.
The other commenter is right - When you know what data you want to store, I'm pretty sure you can work out the storage options. For example, most apps that require a typical user registration and login can be done with a basic MySQL box or a PgSQL box.
You can use (and should use) a NoSQL database when you have data that benefits from being non-relational and storing it in a relational db will actually cause you trouble.
Let's say your specific use case requires that for well-defined inputs of the function f(x), you want the outputs immediately, and processing this function for each request will cost you a LOT of resources (think CPU, RAM, etc.). So, in such cases, if and only if it makes business sense, you can store the inputs ranging from 0 to x and the corresponding computed outputs in a NoSQL database and access them when needed, instead of actually computing them every time.
Actually my example is bad because you can store the same thing on SQL db's too, but unless there is a very specific advantage[1] that these NoSQL db's have (multitudes faster read speed, lower memory consumption, etc.) you want to stay away from them.
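To make that concrete, the same precomputed-f(x) cache works in plain SQL (a sketch with a hypothetical table):

    CREATE TABLE f_cache (
        x  bigint PRIMARY KEY,
        fx double precision NOT NULL  -- precomputed f(x)
    );
    -- each lookup is then a single indexed read:
    SELECT fx FROM f_cache WHERE x = 42;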
If you are evaluating a good NoSQL db for your project, then I suggest you check out Voldemort DB[2] by LinkedIn, which is actually pretty good and has some positive feedback from people running it at scale[3] without much of the marketing layer that Mongo has.
I guess that, at the time you have the data in hand, you do know what kind of data you have and also have a rough idea about what variations will exist.
If you don't, apparently, the NoSQL databases will work fine, but so would SQL databases that you tell "here's a binary blob". AFAIK, neither would support querying those binary blobs, though.
It's true that MySQL is ahead on the tickbox features here.
But the PostgreSQL team are doing their usual slow, stepwise refinement approach to implementing these features from the primitives and moving up. I expect that in a few versions they'll be at sufficient feature parity with MySQL on this front that anyone who cares enough is the sort of person who decides between Oracle RAC and Teradata.
While that will be good and I look forward to it, I will miss seeing you turn up in these threads like a bad penny.
And I will definitely miss the weekly "PostgreSQL is all you need and you're stupid to use anything else" threads.
My position is that there are a lot of different products out there that cater for different needs, and there is no "one size fits all" solution. I can scale Cassandra out to a hundred nodes in minutes on EC2 with no configuration changes. I also have queries in MongoDB that are literally 50x faster than on a SQL database.
> And I will definitely miss the weekly "PostgreSQL is all you need and you're stupid to use anything else" threads.
I think that for any case where you might be building a new system based on a relational backend, it's basically true (modulo local constraints like "we're an Oracle shop"). The chance that you will need to run a website that needs 100 Cassandra servers any time soon is... well, it's unlikely.
PostgreSQL is a stable, proven workhorse. That's why I like it. My point of view is that you should start with high safety and features and relax those constraints as circumstances demand.
Jacques, Taligent is a well-known troll here - there are two things you never talk against when Taligent is around, Apple and MongoDB, or he will start poisoning the threads. Just ignore him.
"In the beginning of Spotify, when load was lower, PostgreSQL was definitely the right tool for the job."
"Later came Postgres 9 and with it the excellent streaming replication and hot standby functionality. One of the most important database clusters at Spotify, the cluster that stores user credentials (for login), is a Postgres 9 cluster."
Hum, your link kind of defeats your point. That, or I did not understand the point you were trying to make.
Does Heroku Postgres support PLV8 now? In the questions section of the presentation, he doesn't mention it as unsupported, but as recently as Sept 2012, it wasn't [1] - has that changed?
If this was meant to demystify something, then I completely missed it. Since I'm new to PostgreSQL, I was hoping to get a bird's-eye view from someone who knows his stuff, but what I got was keyword clouds. Could that be anything but mystifying?
That's more a presentation of some not so well-known features and extensions found in Postgres than something for newcomers. In any case, watch the video linked in a comment above instead of reading the slides.
I have spent quite a few years working with MS SQL Server, but have only dabbled with Postgres. Can somebody recommend a good Postgres book or tutorial for people like me?
One of the best resources for PostgreSQL is the online documentation - I've hardly seen a F/OSS project with better-written documentation. The only drawback is that it is quite verbose and it's easy to get lost. You might want to start with the official tutorial [1] and might also check the dated "MS-SQL to PostgreSQL Migration" guide (from 2001) [2], which still provides some nice transition tips.
Make sure to check the "Postgres Guide" [3] which is a recent addition and is excellent to get you started quickly.
The quick intro by Packt Publishing [4] looks useful and concise as well, but is not as good as the Postgres Guide.
Since you're familiar with the RDBMS concepts I would suggest you take a look at the nice "PostgreSQL 9 Admin Cookbook"[5] which has a solid collection of recipes that you can use to learn the specifics of the DB and a bunch of nice features.