I'm using the AWS stack for http://www.soundslice.com/ and I've been using MySQL instead of Postgres, purely because my hatred for MySQL is less than my hatred for being a sysadmin. It was a tradeoff, and I miss Postgres dearly every time I use MySQL.
Yeah, I was using MySQL on RDS just to escape the Heroku monolith. Funny that Heroku announced a new Postgres pricing model that is VERY expensive compared to the old setup (https://news.ycombinator.com/item?id=6712570) just in time to be wiped off the map by RDS.
Last time I badmouthed Heroku on here I got a reply from one of their employees asking me to fill out support tickets for errors I was getting in their apps......
No one's watching. Your initial comment just really didn't add any value. Calling Heroku a monolith compared to Amazon is an understatement and I don't see Heroku being "wiped off the map" from this.
I wasn't calling Heroku a monolith because they're a huge company -- it's because they suck you in & then you're forever stuck in their web of services. I bet they start bleeding a lot of money now that there is an easy-to-use alternative for Postgres hosting on a well-known platform. Heroku epitomizes the idea of overcharge + vendor lock-in & people only put up with their high prices because they don't have the time to figure out a better configuration. Database is the easiest thing to jump ship with, you just have to change 1 URL.
It's a shame that Amazon has some proprietary idea of a machine image, but my ideal hosting scenario is just App Server Machine Image + DB + any Machine Images required for extra services. Shouldn't have to choose between several paid services that are just different APIs into an app server component you should have direct access to.
I don't understand how the open source community can be so into Heroku given that it's basically just wrapping open source software & charging for it, getting away with it by saying they're charging for the admin UI or whatever.
Just like AWS, you pay Heroku to simplify your system administration. You don't have to like Heroku's product but there is a place for it. I do not use Heroku but it makes total sense if you have an app you need to deploy and you don't know system admin, don't want to do system admin yourself, or can't afford a system admin.
GitHub just "wraps up" git and charges for it and I am happy to pay them. Setting up a git server is a royal pain in the ass. Setting up a wiki, issue tracker, etc to go along with a git server is even more of a pain in the ass.
Well done cloud services make life simpler, and many people will pay for that simplicity regardless of how the internals are built.
This is not true at all. Apps on Heroku are built on open source software (Rails, Postgres, etc). You can always decide to host it all yourself without too many issues.
How is this not true? It absolutely IS vendor lock-in, just in a slightly less sinister form.
If you add 5 services, then switching hosts isn't just another git push. You have to re-configure each of the services. If you'd done this yourself all along & made machine images, it would have been less convenient at the time, but probably cheaper & a good learning experience. There are many other hosting platforms that offer Linux boxes, so you could move your whole App/Software layer to another of these companies without much trouble.
I'm not saying Heroku is evil or anything. Yes, they provide a good platform. But I think any web company with a significant customer base would benefit more from the cost savings & freedom of a purer platform than the conveniences of Heroku.
Plus, there are so many configuration issues with their services. I have auto-scaling set up with Adept and still I see these long request-queue buildups now & then. I get the feeling I would not have the same issues with an AWS stack where I have CPU usage monitors that are very transparent & all the networking is trivial.
I still use Heroku as an app server & don't hate it enough to up & move (though a large factor in this is I'm not the one footing the bill, the client is...) but anything I can easily get on to AWS is a no-brainer. Database is one of those things -- a couple clicks to scale up/down every year is all that's really required.
CONCLUSION (cuz I rambled too much): I think Heroku offers scaling/convenience but AWS is just so rock solid & cheap that you can probably just buy larger instances than you need (to compensate for scaling) and have much better performance at the same price. Then you just need to learn how to install your tools & take a machine image as backup. Plus there's a lot of value in learning how to work with machine images that goes way beyond hosting a web app.
Yeah, I feel you on that, but on Heroku I feel like "Every service I add is a per dyno credit card charge that is about to be multiplied by the number of hours in a month". :(
I prefer the Amazon model because you can stick these all on one server & as long as CPU isn't pegged at 100% you're good. I agree tho I'd rather not spend the time figuring it out. For now I only use it for RDS & for services not available on Heroku (some Adobe streaming server stuff).
This is almost the complete opposite of vendor lock-in.
Vendor lock-in is when you write your software on Oracle or MSSQL and moving away requires you to rewrite your whole thing. It's not losing the convenience of "git push" for deploys and having to spend time moving off their hosted versions of open source software and configuring and hosting it yourself instead.
Accusing Heroku of practising vendor lock-in is honestly absurd.
IMHO vendor lock-in is anything that makes part of your work process specific to a given vendor. The harder it is to move from one vendor to another, the deeper you are "locked in".
And as a point of interest, I think these days it's probably much easier to convert your database than it is to switch hosting platforms (well... in some cases).
We started off on Heroku because we wanted something dead simple. The amount of time Heroku saved us was an incredible value when we were starting out. I don't think there is any vendor lock-in. We just did not switch to AWS earlier because our needs were met really well by Heroku, and even Amazon Elastic Beanstalk (for Ruby) did not come near the ease of a Heroku deployment. Once OpsWorks came around, we invested in deployment scripts and switched, because OpsWorks gives us the same ease of use and greater control of our stack.
What do you think stops you from switching? We had no issues at all - definitely none from Heroku.
We still use PostgreSQL from Heroku because it is still a solid service and comes with niceties like dataclips. I should confess that I have not explored the Amazon PostgreSQL offering, but I am happy with Heroku for databases at the moment.
Heroku is great for clients though. When I was a freelancer, I built these apps for other companies that assumed that I would do the admin & hosting. Heroku is a great way to just "tack on a fee" for doing the hosting. If they want a cheaper option, they can always do it themselves. Most clients don't care about the savings on hosting.
I've heard about PostgreSQL and know that HN community raves about it, but am currently using RDS with MySQL.
Does it make sense to migrate to PostgreSQL, I don't have a lot of data as I'm in the early stage?
What are the primary advantages that PostgreSQL provides over MySQL?
Any advice/pointers appreciated.
Perhaps you mistakenly insert "2013-10-32" into a date column. MySQL will silently convert this to "0000-00-00" (!!). Postgres will raise an error.
Perhaps you make an error in a transaction. MySQL lets you keep doing subsequent things in the transaction. Postgres treats the transaction as invalid and forces you to start over.
Of course, there are things you can do to make MySQL less horrible, and this is a generalization. But Postgres is just more respectful and more solid.
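One way around the silent-coercion problem, if you're stuck on non-strict MySQL, is to validate dates application-side so they fail loudly the way Postgres would. A minimal sketch (plain Python, no database required; the function name is made up):

```python
from datetime import date

def parse_iso_date(value):
    """Reject impossible dates up front, the way Postgres would.

    Non-strict MySQL would instead silently store "0000-00-00".
    """
    year, month, day = (int(part) for part in value.split("-"))
    return date(year, month, day)  # raises ValueError for e.g. day 32

try:
    parse_iso_date("2013-10-32")
except ValueError:
    print("rejected")  # Postgres-style behaviour: fail loudly
```

Of course this only guards one code path; the real fix is a database that refuses bad data in the first place.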
Oh, and PostGIS (Postgres' geo add-on) is by far the best open-source geospatial database. If you're doing anything with geographic queries, you need to be using it. MySQL's stuff is laughable in comparison.
Context: I've dealt extensively with both databases, both from the perspective of a framework author (Django) and a developer making products. I've used both databases on and off since 2001.
Hey at least it throws a warning on the bogus date conversion (MySQL warnings should almost always be treated as fatal errors).
The thing that kills me with MySQL (technically it's with InnoDB-based storage engines in MySQL) are the subtle quirks. Like the thing where it insists on writing temporary tables to disk if you do a query that selects TEXT or BLOB fields. Even if they could have easily fit in memory, it's not smart enough to be able to determine that with variable length fields. A very non-obvious performance killer unless you're specifically looking for it.
I run a small Wordpress network. MySQL's insistence on going to disk for joins on tables with a TEXT field (even if the query doesn't touch those fields) is probably the major performance bottleneck.
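The usual workaround is to keep TEXT columns out of the sorting/joining query entirely and fetch them by primary key afterwards. A sketch of the pattern (demonstrated with Python's bundled sqlite3 so it runs anywhere; table and column names are made up -- on MySQL the point is that step 1's implicit temporary table contains no TEXT/BLOB columns, so it can stay in memory):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT, body TEXT);
    INSERT INTO posts VALUES (1, 'b-title', 'long body 1'),
                             (2, 'a-title', 'long body 2');
""")

# Step 1: sort/filter on narrow columns only, never touching the TEXT field.
ids = [row[0] for row in
       conn.execute("SELECT id FROM posts ORDER BY title LIMIT 10")]

# Step 2: fetch the wide TEXT column only for the rows you actually need.
placeholders = ",".join("?" * len(ids))
rows = conn.execute(
    f"SELECT id, body FROM posts WHERE id IN ({placeholders})", ids
).fetchall()
print(rows)
```

It's more round-trips, but it keeps the expensive part of the query off disk.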
Ouch. Thanks for the explicit heads-up. I generally stay with PostgreSQL, but I honestly thought most of these things were "fixed" in recent versions of MySQL (and assumed my subconscious dislike of MySQL was at least partly irrational/rooted in Ancient and Outdated Lore). I guess not.
It took me a long time to learn this one. I suppose if I'd read the MySQL docs from cover to cover I would've found it earlier.
One other problem that popped up was ignoring indexes on tables with TEXT fields during joins, which was a planner weakness. I understand it was fixed in 5.6; I'm waiting for the Percona version to stabilise before I upgrade.
I tried MariaDB about a year ago and it had the same problem. It's possible it's been fixed since. I personally prefer the Percona fork of MySQL, which has some performance tweaks yet is basically a 100% drop-in replacement.
I don't think it's fair to say MySQL doesn't care about your data. For example,
> Perhaps you mistakenly insert "2013-10-32" into a date column.
Only with ALLOW_INVALID_DATES sql mode set. As of 5.0.2, the server requires by default that month and day values be legal, and not merely in the range 1 to 12 and 1 to 31.[1]
> Perhaps you make an error in a transaction. MySQL lets you keep doing subsequent things in the transaction.
If you care about transactions you should have STRICT_TRANS_TABLES on.[2]
Thanks for the updated list. It's been a LOOOOONG time since I compared MySQL vs PostgreSQL in detail (around 2001), but what I found at the time made me never want to look at MySQL again.
Understand, MySQL-ers, I know that your DB has been patched A LOT over the last decade, but running with something that did not support ROLLBACK (nor isolation), nor foreign key constraints???
That the MySQL team even thought they could call such a thing a database terrified me, and made me quite scared to ever trust their judgement. (Viewing the comments for this article suggests to me that playing those odds was the right thing to do, as well, rather than simply "prejudice".)
That, and at the time, PostgreSQL was supporting stored functions that fit anywhere in SQL statement syntax that the return type matched the needed expression type (scalar, vector/row, matrix/table), and could be written in a PL/SQL work-alike OR alternate loadable languages, while MySQL had no stored procedures at all.
That, and at the time (already), PostgreSQL supported OOP-ish "extension" tables that extended other tables with extra, specialized, columns. Rows in the specialized, subclass, table would show up in the generalized, superclass, table (sans extra columns), but the subclass table would only show the relevant specialized type rows, with non-null columns where needed. Other DBs required you to join 2 tables and manage joins and a view to do this.
Putting SQL syntax on top of an ISAM engine just makes it dBase with awkward syntax, and that's not an environment I wish to revisit. (I know that InnoDB is constantly twiddled to suck less, but that back-end was extra back in the day, yes?)
I think the biggest advantages are trust and flexibility.
Trust, because PostgreSQL language semantics are cleaner, closer to the SQL standard, and less likely to surprise you. And PostgreSQL just has a good reputation for traditional engineering quality.
Flexibility, because it offers a lot of APIs and features that can be very useful to adapt your application as needs change. You don't have to go crazy with features, but even simple apps can benefit a lot from prudent use of them -- a trigger here, a foreign table there, or LISTEN/NOTIFY (for cache invalidation) can just save you a huge amount of work and make the system more robust overall. The extension mechanism is very powerful.
Before making any big decisions, do a trial migration and see what you think.
- More robust, fewer crashes, less corruption of data
- More features (JSON data type, partial indexes, function/expression indexes, window functions, CTEs, hstore, ranges/sequences/sets, too many to list)
- More disciplined (doesn't do things like auto-truncate input to get it to fit into a column)
- Not owned by Oracle, it's actively developed, regular major release schedule, etc.
- Better Python driver (don't know about other languages)
- Choice of languages for database functions/procedures (Python, JS, etc.)
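To give a flavour of two items from that list (CTEs and window functions): the syntax below is the same in Postgres; the demo uses Python's bundled sqlite3 (which also supports both, in 3.25+) purely so it's runnable without a server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (day TEXT, amount INTEGER);
    INSERT INTO sales VALUES ('mon', 10), ('tue', 20), ('wed', 5);
""")

# A CTE feeding a window function: running total of sales by day.
query = """
WITH ordered AS (
    SELECT day, amount FROM sales
)
SELECT day,
       SUM(amount) OVER (ORDER BY day) AS running_total
FROM ordered
"""
rows = conn.execute(query).fetchall()
print(rows)
```

Doing a running total without window functions means a correlated subquery or application-side loops, which is exactly the kind of work these features save you.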
DbVisualizer (http://www.dbvis.com/) works with lots of databases and is really pretty cheap. Disclaimer: no connection other than being a long-term customer; I use it with Oracle, MySQL and PostgreSQL.
The basic answer is (from what I can tell) that postgres ships with sane defaults. Lots of little gotchas exist in mysql that experienced mysql dbas know to deal with and avoid. On the more sophisticated side, pgSQL seems to have a focus on being "Really Awesome For Experts" where MySQL seems to be focusing on "Being Easy to Get Started".
I've been using pgSQL for my side project and for my "personal tooling" at work, and I can honestly say that it's just as easy for me as MySQL was for the same sort of things.
> Does it make sense to migrate to PostgreSQL, I don't have a lot of data as I'm in the early stage? What are the primary advantages that PostgreSQL provides over MySQL? Any advise/pointers is appreciated.
If MySQL works for you, there is no reason to change. Some people have quite a dependency on Postgres (hstore, JSON, transactional DDL, pubsub, PLPGSQL etc.)
Ordinarily I would agree, but in this case, the poster said that there's not a lot of data yet so it would be easier to migrate. Choosing a database is a pretty major decision with long-term impact that might not be obvious now, so it's worth exploring a few options if you have that luxury.
I think there are two times when it makes sense to consider a DB migration: (1) early on, when it's easy; (2) if you are in major trouble with your existing DB.
Our dependency on Postgres is more to do with it being a lot safer with our data. It doesn't silently fail, the transactions have a better failure mode, and as previously pointed out it handles ALTER in production a LOT better than MySQL.
The combination of safety, speed/efficiency in schema/index alteration, and its increasingly good performance are why we depend on it. The rest is just gravy.
Transactional DDL alone makes life so much easier, but there is much more: materialized views, advanced data types, efficient JOINs, pg_stat_statements, PLs, full-text search, GIS features (pretty useful for nearby queries), fuzzy search, ISO SQL standard compliance, piece-of-cake replication.
Do some research and give it a try one weekend. Trying it is loving it.
Hate to pile on, but doing so anyway: one of the overlooked things, I'd argue, is that PostgreSQL tries much harder to let you do neat things with your data without much hassle. Need to write a custom function to process data? Not only does PostgreSQL have support for PL/pgSQL, but it's built in a way that has let people write plugins for Python, Perl, V8, Tcl, etc., so you can do that data transformation in the database instead of shipping it back and forth to another server with the latency that brings. The recent version also brings read/write support for Redis, so you can have your updates trigger Redis to store new values, etc. Plus, since it's not controlled by Oracle, there's much less of a feeling of the community being held back; MySQL feels very much like Oracle's tying one hand behind its back, so that when you get to the size where you need a "real" database, Oracle will gladly step in, at their standard consulting fees, of course.
I have but one suggestion. Unless RDS for PostgreSQL is drastically different from RDS for MySQL, you still need a DB administrator.
RDS removes remarkably little of the pain of running a database instance (most of the pain that's removed is just the up front setup), and ends up adding a lot of inconveniences for your day-to-day operations.
Also don't count on their replication as your backup.
> Also don't count on their replication as your backup.
This statement seems like a complete non-sequitur: does anyone credible recommend replication as a backup strategy? It's like dinging a server vendor because you can't rely on RAID as a backup plan.
Any link to a page providing more info on those quirks? I'm deciding between a traditional EC2 instance running Postgres or RDS at this very moment!
The long and short of it is that you do not have an account with super access (you can't perform normal actions like killing threads, viewing processlists, changing variables, etc). You instead have to use special stored procedures to perform basic administrative functions.
Also, you can't take advantage of replication (aside from their own read replicas within other RDS instances) or binary backups. Anything that requires access to the machine itself is impossible except through Amazon's support channels.
It's gotten better than it was, but it's still a headache to monitor and manage as a DBA.
The guide does mention replicating from a read replica (an intermediate RDS instance between the master and your offsite slave), but I've had no trouble replicating directly from the master instance.
One thing they don't cover is replication over SSL. AWS had failed to mention this shortcoming in the docs as of the last time I checked. To have MySQL replicate over SSL, the master and slave both need an SSL certificate signed by the same CA, which would require you to obtain a cert+key signed by the AWS RDS CA.
Of course you have the option of tunneling the replication connection into a haproxy or stunnel running on an EC2 instance, but that has its own shortcomings. You can't use the ELBs, since you can't register the RDS instance with an ELB.
Am I missing something or not following correctly? You can view the processlist and kill queries like normal, and you can edit variables via their control panel. I'd assume you can't edit every variable.
Disagree with what? Keeping your own backups? Probably not - if you're not keeping offsite backups, you're begging to go out of business.
With needing a DBA, even if you're still on RDS? I don't see what that has to do with PCI compliance.
With RDS only removing the up-front setup pain, at the cost of ongoing maintenance... as someone who is also familiar with PCI (and HIPAA, and DOD) compliance, I respectfully disagree with your disagreement (well, if you're working with DOD, AWS isn't even an option to begin with).
Given the choice between running my own DB on an AWS instance (which carries the same certifications as RDS) and using RDS, I will run my own instance every time. The setup just isn't onerous enough to justify the daily productivity cost.
I disagree with the notion that RDS removes remarkably little of the pain of running a database instance.
Yes, there are projects where RDS is not a great solution, but it definitely simplifies a lot of stuff. The notion that it "only removes up-front setup pain" is silly. If you manage your databases correctly, up-front setup pain should be the vast majority of all your basic admin operations. The "at the cost of ongoing maintenance" part is a real head scratcher for me. RDS basically gives you everything you'd have with a DB on an AWS instance except a local login, which one tries to avoid using like the plague anyway.
> The "at the cost of ongoing maintenance" part is a real head scratcher for me
Let's look at a common problem that DBAs are typically given: "The database is slow!". Let's troubleshoot this fictitious problem on RDS:
Am I being affected by a noisy neighbor? Can't tell; contact Amazon support.
Can I look at top to see if the load is high on the box, and potentially why? No. I can look at historical trends, but not with enough granularity or information to be useful.
Can I look at the disk iops to see if there's any kind of problem there? No. Complete black box here; contact Amazon support.
Can I look at the slow log? Kind of. They'll push the slow log data into the database for you to query, but then you can't use tools to do aggregate tracking.
Pause for a moment for a quick MySQL RDS tip: pt-query-digest has a mode of operation that lets you do a processlist every 1/100th of a second and turn that into a pseudo slow log, which does work for RDS.
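The idea behind that trick -- aggregating rapid processlist samples into an estimate of where query time goes -- can be sketched in a few lines. Here `digest_samples` is a hypothetical stand-in for what pt-query-digest's processlist mode does properly (the real tool also normalizes queries, tracks users, and much more):

```python
from collections import Counter

def digest_samples(samples, interval_s=0.01):
    """Aggregate repeated processlist sightings into estimated query time.

    Each sample is the list of queries observed running at one instant;
    seeing a query in N samples ~= N * interval_s seconds of runtime.
    """
    seen = Counter()
    for sample in samples:
        seen.update(sample)
    return {query: count * interval_s for query, count in seen.items()}

# Simulated samples taken every 10ms: the slow query shows up repeatedly.
samples = [
    ["SELECT * FROM big_table WHERE unindexed = ?"],
    ["SELECT * FROM big_table WHERE unindexed = ?", "SELECT 1"],
    ["SELECT * FROM big_table WHERE unindexed = ?"],
]
estimates = digest_samples(samples)
```

It's statistical sampling rather than a true slow log, but on RDS it's often the best you can get without writing your own tooling.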
Back on track - so no real analysis of a historical slow log, without writing your own tools. Possible, but time consuming.
Can I kill queries? Yes, using a stored procedure. Can't use any of the existing toolset around this (like pt-kill, which can help keep poorly written ad-hoc queries from getting out of hand).
So, after many hours swapping emails with Amazon support, we've determined that we're actually spending a lot of time waiting on malloc mutexes. The internet says that using a non-default version of malloc will help with that - can I do that?
Nope. You're stuck.
Other things you can't do:
* Offsite backups that are in any form but MySQL dumps.
* Take advantage of new index types and compression support from TokuDB.
* Zero downtime failovers (We were able to help someone fake this; it was a PITA).
* Cross-region replication.
* Automated failovers using a reputable tool (MMM, MHA, etc).
* Access the error logs.
* Run multiple instances on one machine.
* Alter the disk elevator (I/O scheduler; hopefully they're using something sane, like noop, but we'll never know)
* Alter the kernel swappiness.
* Troubleshoot crashes.
* Monitor and alert on a machine's vitals.
Now perhaps I'm just being a power-hungry admin, but these small things matter. They are the difference between a snappy DB which scales beautifully to 10,000+ QPS, and a sluggish DB that causes you to move to bigger hardware, because it's the only option open to you.
Databases just aren't that hard to set up. Install packages, install config files, start the DB, restore from a backup file, restart the DB, and you're golden. If you're particularly paranoid, set up the selinux contexts (I'd bet dollars to doughnuts that this isn't done on RDS instances), create a security group that limits access on ports 22 and 3306 to your application hosts only, and set up individual users.
This is particularly simple when you use an orchestration tool; I recommend Ansible personally.
> Am I being affected by a noisy neighbor? Can't tell; contact Amazon support.
Sure, you can. Spin up multiple RDS's and benchmark them.
> Can I look at top to see if the load is high on the box, and potentially why?
If you are using top to monitor your box, you are already screwed. There is lots of support for remote monitoring.
> Can I look at the disk iops to see if there's any kind of problem there?
Disk iops are part of the built in monitoring and metrics provided with RDS.
> Can I look at the slow log? Kind of. They'll push the slow log data into the database for you to query, but then you can't use tools to do aggregate tracking.
If only there was a tool that could extract records from a database and compute aggregates...
> So, after many hours swapping emails with Amazon support, we've determined that we're actually spending a lot of time waiting on malloc mutexes. The internet says that using a non-default version of malloc will help with that - can I do that?
>Nope. You're stuck.
MySQL sucks. RDS provides no means to make it any better. Fortunately they do now provide PostgreSQL.
> Offsite backups that are in any form but MySQL dumps.
You can do that by replicating to an external MySQL server and doing whatever the heck you want with it.
> * Take advantage of new index types and compression support from TokuDB.
Yup. Until today it was also really hard to take advantage of different engines found in PostgreSQL. ;-) This is a totally different product.
In general, all of the stuff you are describing are features, not things that cause maintenance complexity. In fact, manipulating those things causes maintenance complexity.
> Databases just aren't that hard to set up. Install packages, install config files, start the DB, restore from a backup file, restart the DB, and you're golden.
I had no idea PCI compliance could be that simple. ;-)
> This is particularly simple when you use an orchestration tool; I recommend Ansible personally.
Yes, orchestration tools, if set up properly, are exactly how you'd want to do this kind of thing. If you already have all that set up to manage your database, RDS is likely not going to help.
I think this is great news... PostgreSQL is definitely my favorite open-source database. It's also nice to see Amazon get into the game, as hosted pg options have been fairly limited. I am slightly disappointed not to see the server-side JS support baked in, and that apparently you can't do reads from distributed replicas. Just the same, I think there will be a lot of progress in this area.
Administering databases is a full-time type of responsibility. Yes, you can get pretty sane defaults and be up and running without much difficulty with MS-SQL, and MySQL has been a de facto standard in the LAMP stack. That said, PostgreSQL has been a rock-solid RDBMS. The commercial extensions for replication have been cumbersome and expensive. Here's hoping that AWS will grow/expand the replication options/features, and that they'll grow to include JS procs as that feature stabilizes.
It looks like the Multi-AZ setup is using block level replication such as DRBD instead of the built-in replication:
> Database updates are made concurrently on the primary and standby resources to prevent replication lag.
Makes me feel better about setting up my own pg cluster on EC2 a week ago, which does allow reads from the replication slave. Plus, I can provision <1000 IOPS (provisioned IOPS is damn expensive with AWS), and get to use ZFS.
The time spent researching and building that cluster was time not spent building product. If you are low on capital then it was potentially a good tradeoff, but I'd much rather just scale up my read capacity by adding horsepower until they get around to adding read replicas. My goal is to spend as little time as possible administering my infrastructure, and running my own database cluster is the last thing on my mind.
From a very quick glance, at the small end the Amazon options are a little more than half the price (Ireland single-AZ instance pricing, and even less reserved), but multi-AZ options are probably very similar.
I don't actually know whether Heroku can failover across an AZ failure.
By default, our followers and HA are automatically cross-AZ. You also have the ability to create followers across regions, but we do not automatically fail over to those due to latency.
One area we're focused on is delivering more guidance and expertise around what you're doing with your database, in addition to ensuring your database is healthy and running. An example of this is notifications we deliver around unused indexes, places where you may benefit from other indexes, or other places where you can quickly optimize your DB. This starts to free up a DBA for higher-value tasks, or for smaller shops lets you get by longer without needing a DBA.
Another big area is features we deliver on top of Postgres. This ranges from followers, which allow you to easily scale read traffic or replicate your database across not just AZs but also regions, to dataclips at the other end of the spectrum. Dataclips make it easy to share data in a simple way, as well as build richer dashboards by integrating with Google Docs, or quickly prototype APIs.
If you're curious on various technical details we'll be documenting that soon but would be happy to correspond via email, craig at heroku.com
I understand that "forks" and "follows" are easy concepts, stats are cool, etc., but I personally wouldn't want to pay double or triple for that (and I'm not a DBA, so I feel the pain). Not that my word counts for much as I'm on the smallest Heroku production plan, so I guess my bill isn't exactly interesting for this kind of discussion. In my case, I would say that even saving $25 off $50 can be offset by these additional niceties. But I don't know what I would think if I were a customer paying $1000/mo.
I just switched off Heroku after using it for a couple of years (sorry! I wish I could have stayed). I will attest dataclips are really cool, but you guys' pricing is out of whack (and customer support is eh).
> From a very quick glance at the small end the Amazon options are a little more than half the price (Ireland single AZ instance pricing and even less reserved) but multi-AZ options are probably very similar.
Did you add disk? People seldom do, I can attest it's a big part of the bill...
> I don't actually know whether Heroku can failover across an AZ failure.
"Followers" have preferred another AZ for quite some time -- almost since the beginning -- and Heroku Postgres HA is based on similar technology. So, yes.
> No, I didn't take storage or IOPs into account but you do get more memory and the flexibility if the Heroku options don't fit you.
No doubt about that. To date, Heroku's product model has preferred fewer, common choices rather than more flexibility. Heroku's staff sees fit to give in sometimes, but coarsely speaking, that's pretty much how it goes.
One other issue to consider is that AWS Micro DB Instances have "low" I/O capacity and no provisioned IOPS.
Last I heard Heroku is hosted on top of AWS. Does anyone know if Heroku's cheapest Postgres plan is hosted on a dedicated AWS Micro instance, or do they buy large AWS instances and host multiple databases on each box thereby potentially providing more IO performance?
IIRC, you get a dedicated database, not a dedicated PG cluster, so presumably there will be multiple plans hosted on the same underlying server instance.
The Multi-AZ options aren't similar, they're half the price as well (comparing a Multi-AZ RDS setup with a Heroku Postgres instance that has an equivalent follower instance.)
Yeah I was disappointed to see this.. Can you configure pgpool externally to talk to the replicas or are those multi-az instances a black box? I've never used RDS before so I'm curious as to how much flexibility is afforded.
Hmm, interesting development. But am I the only one to see a huge gap in services between Amazon and Heroku, and one which most certainly bodes ill for Heroku? Specifically, with Heroku's recent PG 2.0 service, their 'production grade' Standard plan ranges from $50 - $3500, but with "Up to 1 hr downtime per mo."?! Really? With up to an hour of downtime per month, you couldn't be serious about the product that runs atop this tier - a non-starter for me and most other SaaS businesses. Heroku's cheapest high-availability option starts at $200/month, still with "Up to 15 minutes of downtime per month"... this is a concern for me.
Now, comparing to Amazon, their '1.7 GB memory Small DB, 1-year reserved, multi-region' is around $28/month (with storage & transfer for my app, no more than $35/month). The equivalent Heroku plan, Tengu (1.7 GB mem), STARTS at $350/month! Wow, now I'm really rethinking my hosting platform... Amazon looks more attractive, even if I have to do a bit of sysadmin for my web server/cloud server.
Heroku will still be my go-to for quick tests and experiments. But for larger projects it would be foolish not to consider Postgres on RDS + Docker on AWS (assuming they both leave beta).
Doesn't their RDS add-on merely set DATABASE_URL to the URL that you provide when adding it? Are you sure it doesn't work with MySQL? There's no reason you'd need to wait for Heroku to add support for this, just spin up an RDS instance in us-east-1 and start using the URL Amazon gives you as your database URL.
I'm on the Heroku Add-ons team. This is correct. The RDS "add-on" is there simply for people who search for it. In reality, it's just setting DATABASE_URL, and you can do the same yourself.
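Concretely, the whole trick is one config var. A sketch, where the RDS endpoint below is a made-up placeholder, not a real instance:

```shell
# Hypothetical RDS endpoint; substitute the one the AWS console gives you.
DATABASE_URL="postgres://myuser:mypass@mydb.abc123.us-east-1.rds.amazonaws.com:5432/mydb"

# The Heroku RDS "add-on" does nothing more than the equivalent of:
#   heroku config:set DATABASE_URL="$DATABASE_URL"

# Sanity-check which host the URL actually points at:
host="${DATABASE_URL#*@}"; host="${host%%:*}"
echo "$host"
```

Moving off later is the same one-line change in reverse, which is why the database is the easiest piece to migrate.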
this is awesome!
Still, I miss one essential feature compared to MySQL on AWS and Heroku Postgres: there is no replication for read replicas. Yes, you can deploy a "hot standby" replica in another availability zone for failover, but you can't read from it.
This isn't true anymore; RDS now exposes an external replication stream for MySQL. AWS actually doesn't seem that interested in vendor lock-in -- they usually build what people ask for.
Somehow it seems pretty costly... Wouldn't it be cheaper for me to just run it on the virtual instance that's hosting the app and back it up to S3?
If you place no value on your time, then yes, it would be cheaper to just run it on your own EC2 instance. There's a lot to be said for not having to worry about the day-to-day maintenance of a database server, though.
I've only ever used S3, but some of these AWS offerings do look interesting for my little projects. My question is, how do hours get calculated for billing? If I had my super low traffic blog using RDS, would I incur a few microseconds of time per DB hit, or is it rounded up to an hour, or is it the total time the DB is available period?
You would be paying for 24 hours each day. You are renting a managed virtual server, and you pay for each hour this server is running in addition to usage fees for storage and IO operations when it's actually being accessed. It'd be $20-30/month to host the database for a very small blog.
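To make that concrete, here's a back-of-the-envelope calculation. The rates are rough, assumed 2013-era figures (they vary by region and instance class), not quoted prices:

```shell
# Rough monthly RDS bill for an always-on instance.
# Rates below are illustrative assumptions; work in cents to avoid floats.
HOURS=$((24 * 30))                    # billed every hour it runs: 720
instance_cents=$((HOURS * 25 / 10))   # assume ~$0.025/hr = 2.5 cents/hr
storage_cents=$((5 * 10))             # assume 5 GB at ~$0.10/GB-month
total_cents=$((instance_cents + storage_cents))
echo "~\$$((total_cents / 100))/month before I/O charges"
```

The point is that the instance-hour term dominates for a low-traffic site: you pay for uptime, not for queries.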
Will anyone please explain the tactical reasons why PostgreSQL won? It's pretty obvious it has. I've basically ignored the database wars for a few years, so it's kind of interesting to see that everyone's using PostgreSQL now.
Some engineers I worked with went and interviewed people at various San Francisco startups about their experiences with their databases.
The MySQL startups tended to say "We love MySQL. We've gotten in the habit of taking an hour or two of downtime in the middle of the night every week to run all of our schema migrations, and we've had to build our process around that, but once we had it in place, everything's been fine."
The PostgreSQL startups said "We love PostgreSQL. We run schema migrations in real-time during the middle of our workday, and we don't have any problems."
pt-online-schema-change from Percona addresses this particular point with MySQL nicely for us, incidentally. We use their Percona XtraDB Cluster fork and are quite happy.
Entirely possible to use pt-online-schema-change on an RDS instance. It's just running a bunch of SQL commands and is very happy to do them against a remote database like an RDS instance.
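For reference, a hypothetical invocation against a remote host looks something like this (the host, user, database, table, and column are all placeholders; `--dry-run` validates the plan without touching data, and you'd switch it to `--execute` when ready):

```shell
pt-online-schema-change \
  --alter "ADD COLUMN last_seen DATETIME" \
  --host mydb.abc123.us-east-1.rds.amazonaws.com \
  --user admin --ask-pass \
  D=mydb,t=users \
  --dry-run
```

Under the hood it builds a shadow copy of the table, backfills it in chunks, and swaps it in, which is why it works fine over a remote connection to RDS.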
For starters, the requirement of a third-party library introduces at least two problems:
1. An additional potential point of failure
2. The core software (a DB, in this case) can (and probably will) evolve independently of the third-party tool--thus introducing an additional layer of maintenance problems.
I'd argue further--and this is of course just an opinion--that such a basic feature as this ought to be supported out-of-the-box by anything that claims to call itself a "database" in the sense that MySQL does.
Percona and pt-online-schema-change are not just "some third-party library".
pt-online-schema-change is an industry-standard tool, and Percona's is one of the most respected forks of MySQL. It's not a layer of maintenance problems, and they sell commercial support.
Looking through the documentation, though, it's even worse than saying every ALTER is an offline ALTER. There are a few, but significant, schema changes one might want to make -- like anything involving a table with full-text search -- that still have to be done offline; you have to tread really lightly with online ALTERs. I'd personally argue that online ALTER is not ready for prime time in 5.6; there are too many other things they need to do with the storage engines to make it really useful.
I don't know if I would really say Postgres won. Among people who care about databases, Postgres has always been the more popular of the big two free databases. Postgres was actually dedicated to being a good relational database, whereas MySQL in its earlier days was willing to cut corners to the point where it wasn't even ACID in most common configurations. But MySQL was faster (because, again, cutting corners) and its tooling was a little more friendly (especially for the PHP crowd).
Over the years, Postgres has made up those shortcomings, so it retains its respect among database wonks and now other people can easily use it too, so you see a lot of people adopting it over the past few years.
But in overall usage numbers, I would be surprised if it were anywhere near MySQL. It takes a long time to overcome that kind of inertia.
It is a bit surprising how long it took for that to land. On the plus side, replication is very easy and low latency with 9.3.
There's still a bit to do on the auto-failover front, but that may end up being more of a third party undertaking. Postgres has failover facilities included, they just have to be driven by something external right now.
We ran PostgreSQL in production before 2010, and there were several replication solutions back then, but all of them had their own set of problems. We went with shipping the write-ahead log to the standby server via rsync, but this was trickier to set up and had much worse latency than the current built-in solution (which basically streams the write-ahead log over a TCP connection).
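Roughly, the two approaches look like this in Postgres configuration (hostnames and paths here are placeholders):

```
# Old-style WAL shipping, on the primary (postgresql.conf):
archive_mode = on
archive_command = 'rsync -a %p standby:/var/lib/postgresql/wal_archive/%f'

# Built-in streaming replication, on the standby (recovery.conf, 9.x era):
standby_mode = 'on'
primary_conninfo = 'host=primary.example.com port=5432 user=replicator'
```

With streaming, the standby pulls WAL continuously over a normal TCP connection instead of waiting for whole 16 MB segments to be archived and copied, which is where the latency improvement comes from.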
That is absolutely ridiculous. Less than 1% of 1% of mysql users are using replication. Suggesting it is a "total non-starter" for most production uses is crazy.
>Using a single, unreplicated database instance in production for anything serious is bizarre
It is incredibly common. Go check out a thousand businesses running mysql, you'll be able to count the ones using replication on your fingers.
>Failed hardware is hardly unheard of.
You don't need to use mysql replication to deal with that. Even the crappiest low end SAN storage devices do it vastly better than mysql does, without any of the bugs and problems mysql replication has.
I have heard stories of entire SANs failing, so I would not advise trusting your SAN too much. For example, there was a huge outage at a Swedish host caused by a failed SAN; all data was lost, so people had to restore from daily backups.
Replication can be done off-site, so you lose at most a couple of seconds' worth of data. I don't know anything about MySQL's replication, but my trust in PostgreSQL's is very high.
I'd wager that a huge number of MySQL users are using replication. Running a single database in a production environment is totally unacceptable and a major business continuity problem.
>Running a single database in a production environment is totally unacceptable and a major business continuity problem.
You don't need to use mysql's broken replication to get HA. Hell, I've seen more people (wisely) using DRBD for that than using mysql's replication. But even entry level storage devices do replication.
Given the number of companies that will provide you with a fully managed Postgres installation, there's no reason not to start out on it for the MVP -- as a user it's no harder than MySQL. Even as a sysadmin it's not significantly harder, just different to work with.
Tactically, because PostgreSQL predictably releases robust software every year with major improvements and innovations.
Strategically, because PostgreSQL is not just trying to be a free database checking off features. It's trying to be something better -- lots of innovation that is having a bigger impact on what developers and DBAs can do.
One of many reasons is MySQL being bought by Oracle, with the name change, loss of community backing, etc., along with some missing advanced technical features.
I still use MySQL because trying out PostgreSQL recently scared me away. Maybe I was trying the wrong thing but I couldn't get pgAdmin to display my tables. I think I sank most of my afternoon trying unsuccessfully to do something MySQL can do in 5 minutes.
I recognize I'm used to MySQL but I was under the impression that because PostgreSQL has a similar syntax (SQL) it wouldn't take too long to pickup.
Unfortunately, things that take 5 minutes in a complex application typically require an order-of-magnitude larger up-front investment in reading manuals and gaining experience.
I deal with Postgres a good bit, but will be the first to tell you that pgAdmin has got some issues. It's a very clunky, buggy, moody piece of software.
I've had better luck learning the psql command/shell than messing with pgAdmin, at least in certain cases. To take your example of displaying tables: open psql and type \dt
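A few other psql meta-commands worth knowing (the table name here is hypothetical):

```
\l        list databases
\dt       list tables in the current schema
\d users  describe a table's columns and indexes
\?        help on all meta-commands
```

Once you know \? exists, psql is largely self-documenting.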
I've used Sequel Pro and I don't understand why others love it; it's as clunky as anything I've ever used. My preference is SQLyog, and my dream is for it to be ported to OS X...
I have always said Postgres would become much more widely adopted if it were released on RDS and... they finished Sequel Pro support. Halfway there ;)
Coming from a MySQL background, I sort-of assumed PostgreSQL worked the same way. I remember after setting up the server, pgAdmin was very difficult to navigate. I managed to get a database and table made and got some data in (using SQL) but I couldn't view any of the data. Compared to MySQL + Sequel Pro it was very difficult to use for a beginner.
Postgres is very powerful and can do a lot. I think I just need a good tutorial and some time to get used to it.
None of that has anything to do with PostgreSQL. pgAdmin is crappy; don't use it. Use one of the dozens of database-independent admin tools, or just stick with the best tool: psql.
You must be joking right? Either that or spending too much time on HN. PostgreSQL has won in the same sense that Golang or Clojure have won: in frequency of frontpage HN articles.
I think it all depends how you define winning here. Based on popularity alone - yeah, MySQL is still the king. Based on capabilities - I think it's fair to say that pgSQL is more capable and it's just that not everybody needs that one.
And even though I love PostgreSQL (and was working with all major databases), I still think that the real winner is sqlite :)
I'd venture a guess and say that PostgreSQL didn't win; SQLite and MongoDB won. I think a lot of the reason behind MySQL's initial success was blogs and various PHP apps that weren't architected properly (from a DB perspective). Some of those didn't need a ("proper") database, and most of them probably did but "worked well enough" without one.
I never really understood why you'd want to use a separate dbms if it couldn't do proper constraints, triggers, transactions and materialized views for you -- then it starts to feel like a lot of wasted effort.
So for many of the use cases where MySQL might have been appropriate, we now have SQLite, MongoDB, memcached/Redis and a few others.
Personally I don't really see any reasonable use cases for MongoDB, just as I didn't see many reasonable use cases for MySQL -- not that you couldn't build stuff on top of it, just that it wasn't a very good idea.
IMO it was when Oracle bought Sun (and MySQL with it) and effectively killed development. MySQL still lives on in MariaDB, I suppose, but in the years since, Postgres has leapfrogged it in terms of functionality.
I'd say it's because MySQL was sold to a company with no reason to keep it healthy. That's one of the biggest reasons for existing MySQL users to move on.
It was already in use, so it's still around. But people drop it once they finally learn why safety as a design philosophy matters, by paying for it in data corruption or service outages. (Though some people never learn…)
Safety is annoying. Safer systems usually generate more errors to prevent accidental mistakes. An extremely safe system will refuse even to boot if it hasn't been properly configured. It simply has a much more annoying safety net. If you care about safety, these annoying errors are a good sign; but if you're a newbie, they're just a big obstacle that makes for a steep learning curve.
Really. We still see many people who hate wearing a seatbelt because it's annoying. And everybody did, before the benefits of seatbelts were widely known and accepted.
Also, the number of users says nothing about how well a product works. Usually the cheapest product takes the biggest user base, and MySQL is the cheapest to start with, precisely because of its lack of safety. Search this thread for the name "natural219" and see why he chose MySQL over PostgreSQL. I believe that's why most people started with MySQL in the first place.
And surprisingly, there are so many people who really don't care about data safety. (Maybe that's not so surprising; we see those people on TV all the time…)
Inertia, largely. Postgres is more scalable and has quite a few nonstandard additions that are insanely useful from time to time, but it's not always a drop-in MySQL replacement. If you're just writing something simple like a blog or CMS then you don't need the extra functionality.
Plus Wordpress has a hard requirement for MySQL, and like it or not a huge number of projects still use it as a framework.
I wish I lived in your world. Where I am, 90%+ of web developers believe there are two options for making web sites: "LAMP" and .net/sql server. I am constantly dealing with web developers who genuinely don't even know postgresql exists.
I hate PostgreSQL. Yes, I am a dummy application developer who doesn't understand database software.
Every time I try to install PostgreSQL it fails. Every time I install MySQL it installs successfully with no problems. Actually, that's the extent of my experience with it, and I guess I'm fine dealing with a database that doesn't validate date formats strictly if I can use the damn database without hassle. I am totally fine using Postgres at a company or with another DB developer who knows how to set up databases properly, but if I am starting a new project, I am going to use MySQL, period.
If you're on a Mac and don't want a menubar icon (I decided this recently), `brew install postgres`, and then write yourself some functions to make starting and stopping easier:
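For example, a rough sketch of such functions (the data directory below is Homebrew's default at the time and may differ on your machine):

```shell
# Drop these in ~/.bashrc or ~/.zshrc; paths assume a Homebrew install.
pg_start() {
  pg_ctl -D /usr/local/var/postgres -l /usr/local/var/postgres/server.log start
}
pg_stop() {
  pg_ctl -D /usr/local/var/postgres stop -m fast
}
```

Then `pg_start` and `pg_stop` from any terminal, with no menubar icon or launch agent involved.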
EnterpriseDB has a nice set of PostgreSQL installers for all platforms. It also bundles PgAdmin 3 if you need a GUI. It's not as hip as PG Commander, but it's a real client that can do pretty much anything (vs PG Commander where you can't even create a new db instance).
Postgres.app is nice when it works. When you install the latest version the psql tool it bundles and exposes in the menu assumes you have a db that matches your user name, which, of course, doesn't exist, so you can't connect to the Postgres :-)
It also doesn't work that great if you already have another version setup running on the default port.
So postgresapp.com's installer is great, but due to some xyz thingie, it caps out at 27 or so connections in 9.2 (maybe still in 9.3? I haven't followed the GH issue). That startled me when I ran into it and motivated me to move to a regular install.
Um, the point of this announcement is that you no longer have to install PostgreSQL! You can let Amazon do it for you, and scale it, back it up, maintain it, etc.
Are you sure that you aren't confusing installation with initial configuration?
The most utterly infuriating part of a new Postgres install is getting authentication set up. I shouldn't have to expressly edit a freaking INI file for basic user access, damnit!
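For what it's worth, the file in question is pg_hba.conf (whitespace-separated fields, not actually INI). Typical local-development entries look like this -- an illustration, not a security recommendation:

```
# TYPE   DATABASE   USER   ADDRESS        METHOD
local    all        all                   peer
host     all        all    127.0.0.1/32   md5
```

`peer` trusts the local OS username over Unix sockets; `md5` requires a password over TCP. Most "can't connect after install" frustration traces back to these two lines.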
With the tentative upsert changes in 9.4's commitfest, initial user configuration is my last major problem with Postgres...
No kidding. If you can really avoid all of MySQL's caveats, installing PostgreSQL is nothing hard. It's just your belief that you can avoid all those caveats.
Also, why don't you use Heroku PostgreSQL, which is just a one-click, self-contained service?
If you're on a FreeBSD/Linux desktop, how can you fail to install it? Of course it doesn't work right out of the box; it needs manual initial configuration for security. But that shouldn't be hard for any Unix-family server developer.
If you're on a Windows desktop, why don't you run your own FreeBSD/Linux server VM for development?
If you're using a Windows server… with MySQL… hmm, then I'm sorry. I have no more ideas.
When I start a new company project, one of the first things I do is make sure other devs can easily work on it. Encourage your tech lead to always make a script that installs and sets up necessary requirements.
Makes it very easy if you are a Mac or Debian-based shop -- or if you just use Vagrant.
Currently this looks like:
1. Make sure we have the version of Ruby we want
(if not, use rvm to install it)
2. Make sure our DB is installed
(if not, use brew -- after making sure we have brew -- to install it)
3. Create our database and users on the new DB server
4. Run rake db:setup
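The steps above can be sketched as a script like the following. The Ruby version, database names, and the brew/rvm assumptions are all illustrative, not prescriptive:

```shell
#!/bin/sh
have() { command -v "$1" >/dev/null 2>&1; }

# 1. Make sure we have the Ruby we want (via rvm, if installed)
if have rvm; then rvm install ruby-2.0.0; fi

# 2. Make sure the database server is installed (via Homebrew, if present)
if ! have psql && have brew; then brew install postgresql; fi

# 3. Create the app's database and role (names are made up here)
if have createdb; then
  createuser -s myapp 2>/dev/null || true
  createdb -O myapp myapp_development 2>/dev/null || true
fi

# 4. Load the schema and seed data (ignore failure outside the project)
if have bundle; then bundle exec rake db:setup || true; fi

echo "bootstrap done"
```

Checking for each tool before using it is what makes the same script serve the Mac/Homebrew case and a Vagrant/Debian box alike.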
I'm using the AWS stack for http://www.soundslice.com/ and I've been using MySQL instead of Postgres, purely because my hatred for MySQL is less than my hatred for being a sysadmin. It was a tradeoff, and I miss Postgres dearly every time I use MySQL.
This new Amazon offering solves that.
I wrote a little more about my AWS setup here: http://www.holovaty.com/writing/aws-notes/