A table that should exist in all projects with a database (cherouvim.com)
146 points by fogus on Dec 8, 2010 | 63 comments



It's all true, but please take the lesson from rails and use timestamps rather than sequential IDs.


What's the advantage?


Consistent ordering even if two developers happen to add migrations starting from the same source version.


Could you elaborate on that?


Schema version is at 200.

Developer A creates a migration numbered 201. Developer B creates a migration numbered 201. They push their code, pull, and get conflicts.

So instead, they use a timestamp-based system. Developer A created his and got 20100604181252. Developer B got 20100604182530. Now, everyone is happy, and no one has to change their migrations.
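
Roughly what that amounts to, as a minimal Python sketch (the helper name and ".sql" suffix are my own, not any particular framework's):

    import datetime
    import re

    def migration_filename(description):
        # A UTC timestamp prefix sorts chronologically and is practically
        # collision-free across developers, unlike a shared counter.
        stamp = datetime.datetime.utcnow().strftime("%Y%m%d%H%M%S")
        slug = re.sub(r"\W+", "_", description.strip().lower())
        return "%s_%s.sql" % (stamp, slug)

    # migration_filename("add email to users")
    # -> e.g. "20100604181252_add_email_to_users.sql"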


I think I prefer the first version. At least the conflict is obvious.

Developer A creates migration 20100604181252. Developer B creates migration 20100604181253. Developer B doesn't notice Developer A's migration: there's no way to know from the timestamps that there might be a conflict. Everything breaks.

vs

Developer A creates migration 201. Developer B creates migration 201. Developer B gets a handy notification that something has happened in the meantime, and gets a chance to patch the migration if necessary.


Actual conflicts from working with the same fields in the same tables are so rare that I haven't experienced one even once.

However, the second scenario would happen every single day in a team.


I have occasionally modified a sproc, only to find that a conflicting check-in has also modified that same sproc. One of our modifications has been overwritten by the other.

I do concede it is rare, but it has happened, and we only discovered the bug when it was too late.


The real issue is when you have feature branches and merge them into master/trunk - so it's not a single file conflicting, but maybe a week or more's worth of work from both developers.

As you say, it would rely on notifications that there was the possibility of clashes, but unless you monitor every other developer's local feature branch, you would only know when the branches are merged into master.


Two developers pull code with migration version 4. Each writes a new feature involving a migration. Both get the number 5, which then collides when they both check in. But it doesn't collide at the source level, so you only find out when the db doesn't work right. This was a big pain in the ass in Rails until they fixed it.


Why don't they collide at the source level? I'm assuming the db updates get checked into your central source code repo. Wouldn't the collision happen at that time?

What happens when developers create branches and use time stamps? Developer A working on trunk might check in db updates at 4pm and 5pm. Developer B working on a branch checks db updates into his branch at 4:30pm and 5:30pm. When working on the branch his db updates don't affect the "trunk" db. As soon as he merges does his 4:30pm update get run on the "trunk" db, or is it skipped over because the 5pm db update already ran, and only the 5:30pm change gets run?

Where I come from, we use int values for our db update script names. When I merge from a branch, I renumber my db update script file names starting at the trunk's tip int value + 1.

fyi - I'm not coming from a Ruby/Rails perspective, if that matters.


There's a table listing the migrations that have already been run on the database. The list of migrations in source control is compared against the list in the database, and the outstanding ones are sorted into timestamp order and then run.

So if the two branches are separate there should be no issue - all my changes get run in order, irrespective of what the other developers have been doing.
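
In code, the comparison is roughly this (a minimal Python sketch, assuming a sqlite3-style connection; the schema_migrations table name follows Rails' convention, the rest is made up):

    import os

    def pending_migrations(conn, migrations_dir):
        # Versions already recorded in the database's bookkeeping table
        applied = set(row[0] for row in
                      conn.execute("SELECT version FROM schema_migrations"))
        # Everything on disk, oldest timestamp first (names sort lexically)
        on_disk = sorted(f for f in os.listdir(migrations_dir)
                         if f.endswith(".sql"))
        # Outstanding = on disk but not yet recorded in the database
        return [f for f in on_disk if f.split("_")[0] not in applied]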


They don't collide because the file names include more than just the number.

And yes, manually renumbering is exactly what you'd have to do. Timestamps obviate that.

Theoretically, you can still have collisions this way (e.g. two migrations alter the same column) or violated prereqs, so some communication is still required, but not to the degree before.


Posit repository A which is at schema state S(A) = N. Developer X branches it to make B and adds a new database migration, increasing S(B) to N+1. Developer Y branches it to make C and adds a new database migration, increasing S(C) to N+1. Now, merge B and C.

It's not necessary that schema versions be that rigid. All you really need is that they be monotonically increasing; a timestamp in UTC is sufficient for those purposes, although technically you could run into a situation where a UUID is necessary.


Ah, very interesting. I'm sold.


Timestamps, UUIDs, and such have no dependency on a central source (and thus, single point of failure) for generating them. Much better scalability.

Even with a single server, not relying on IDs being sequential is useful when you want to do a database migration without temporarily shutting down the site. It is much simpler to merge the two databases together.


A database/ORM lesson from... rails?

You may want to re-think that statement.

(and I'm not trolling, I use and like rails, but the database abstraction is amongst the weakest of all popular frameworks)


If that's not trolling, then I don't know what is.

But assuming that you are sincere, your reasoning is flawed. See, the fact that Rails' ORM is in some way badly designed (your words - not mine) doesn't diminish this particular feature. The system could be overall crap, but still have one good thing about it. Besides, the schema-migration has very little to do with the ORM, except for the fact that they both interact with a database.

What you're doing is called a straw man argument.


doesn't diminish this particular feature

Well, this particular feature is a bandaid for an underlying misfeature (manual schema management).

Proposing this as good practice is somewhat akin to proposing "If you design a new operating system then make sure the installation of virus scanners is as easy as on Microsoft Windows".

You may call it a strawman, I'd call it "putting something into perspective".

Besides, the schema-migration has very little to do with the ORM, except for the fact that they both interact with a database.

That's a nice theory, but in practice most rails users do use the builtin migration mechanism, for lack of a better alternative (at least to my knowledge, feel free to correct me).

Consequently I don't think it's unfair to compare that mechanism to the equivalent in similar frameworks - and in that comparison rails loses, big time.


"Well, this particular feature is a bandaid for an underlying misfeature (manual schema management)."

Could you expand on that? Maybe I'm missing something here.


Well, I was aiming at the "learn from rails" part.

The explicit mention of rails just stuck out to me (probably more than the author intended), as if rails had achieved something extraordinarily smart there - while to me it's just an implementation detail of an otherwise rather inferior approach.

Modern schema migration tools compute the difference between the current and the desired state on-the-fly, and generate the migration commands accordingly. Explicit version numbers/timestamps/hashes are mostly used for documentation and transparency purposes there, not for the sake of conflict resolution.

These algorithms are naturally not flawless, but they work well for the proverbial 90% of cases. In such an environment the user doesn't normally have to futz with migration scripts or migration version-numbers at all.
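
The core of such a tool is just a diff between the actual and the desired schema. A toy Python/SQLite sketch of the add-a-column case (real tools also handle types, renames, indexes, and so on):

    def missing_column_ddl(conn, table, desired_columns):
        # Introspect the columns the table actually has right now
        actual = set(row[1] for row in
                     conn.execute("PRAGMA table_info(%s)" % table))
        # Emit ALTER TABLE for anything declared but not present
        return ["ALTER TABLE %s ADD COLUMN %s %s" % (table, name, coltype)
                for name, coltype in desired_columns.items()
                if name not in actual]

    # missing_column_ddl(conn, "users", {"id": "INTEGER", "email": "TEXT"})
    # -> ['ALTER TABLE users ADD COLUMN email TEXT']   (if only id exists)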


Modern schema migration tools?

Please list those widely used modern schema migration tools as they occur in other popular languages like PHP, Python, Perl, and so on.

In rails, users don't futz with migration version numbers. Those are handled automatically by the system. The user simply specifies a migration, e.g. "I want to add this column to this table." The migration is then automatically run when doing the next deployment. That doesn't work in 90% of the cases - it works in 100% of the cases.

Woe be unto the developer whose deployment process only works 90% of the time.


Please list

I already did in another comment but I'll repeat: django south, hibernate schemaupgrade, datamapper automigrate, postfacto (standalone python/postgres), sqlalchemy-migrate

e.g. "I want to add this column to this table."

The difference is that in Rails the user has to update the model and create a matching migration. In most other frameworks the second part is strictly optional and by default implicit.

Woe be unto the developer whose deployment process only works 90% of the time.

That's not what happens. In practice there's always a review step.

However, I've already been accused of strawmanning, so rather than taking this further offtopic let me just point you to http://south.aeracode.org/docs/tutorial/part1.html


At least several, maybe all, of those were directly inspired by rails migrations. Whether something automatic, but imperfect (renames, anyone?) is actually better than something explicit but easy enough seems a matter of taste.


At least several, maybe all, of those were directly inspired by rails migrations.

Actually most of them (except datamapper and postfacto) existed before rails was created.

And I'll argue strongly against "matter of taste" when it comes to error-prone routine procedures.


Fowler himself was impressed by Rails' implementation of ActiveRecord, and that was long before Arel (google the 2006 RailsConf keynote by M. Fowler). I wonder what all these better alternatives are, according to you…


Well it's really a matter of taste.

Some small teams (including some I work with) find it clearer to have 001, 002, etc., and just have the convention to get in touch before any schema change, which is often a good idea anyway.


Timestamps, across distributed teams over different timezones. We ran into sequencing issues and switched back to numbers.


You didn't consider adding the timezone offset?


Removing, more likely. Everything in GMT.


Some frameworks take care of this automatically. Mango has an excellent library called South that does exactly this. It keeps track of migrations and can automatically detect nearly all migrations. It's much easier than doing it all by hand.


s/Mango/Django/

Danny you autocorrect on my Android


"Danny you"... heh, priceless!

Mango would be a good nickname for Django + MongoDB


Haha are you even reading what you are typing?


http://mangoblog.org/

I know the developer - she is awesome!


I think he was talking about Django, but I didn't know about that Mango thing (a blog engine, by the way), and it looks great and interesting.


Cassandra 0.7 keeps track of it for you as well.


Haha, "Mango"?


Damn you autocorrect!!!


Hobo too.


Andrew is a cool cat.


Another approach I've seen used is to map the hash of your schema to an upgrade script (which would be an empty script for the latest version):

    import hashlib
    import sys

    # Runnable Python sketch: upgrade_scripts (a dict), dump_dictionary_table,
    # complain_loudly and execute are placeholders for the pieces described above.
    while True:
        # Hash the current schema state, i.e. the dictionary table contents
        current_version = hashlib.sha1(dump_dictionary_table()).hexdigest()
        if current_version not in upgrade_scripts:
            complain_loudly()   # unknown schema state: refuse to touch it
            sys.exit(1)
        script = upgrade_scripts[current_version]
        if not script:          # empty script marks the latest version
            break
        execute(script)


Another option is to have the schema creation/upgrade as scripts (either hand-made or autogenerated), then check them into source control.

And resist the temptation of hacking the schema directly on your servers; it ends up being a larger effort :)


Checking the schema and upgrade scripts into source control is extremely important, and this is how I have managed it in the past. (names like table.alter.12.sql). I regret not setting this type of versioning table up from the beginning, though - when you have multiple environments (dev, QA, staging, etc) it can get very confusing in a hurry as to which schema changes have been applied to which databases. With a versioning table and maybe even some deployment time scripts to sanity check versions between the release and the DB, mismatches become immediately apparent.
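
The deployment-time sanity check can be as small as this (Python sketch; the table and column names are whatever your versioning table uses):

    def check_schema_version(conn, expected):
        # Compare the version this release expects against what the
        # database's versioning table actually reports.
        (actual,) = conn.execute(
            "SELECT MAX(version) FROM schema_version").fetchone()
        if actual != expected:
            raise RuntimeError("database is at schema version %r, "
                               "but this release expects %r" % (actual, expected))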


The team behind the iBatis OSS project (now known as MyBatis) has a good tool named "migrations" which does exactly this. The original OSS project is a SQL mapping library for .NET and Java, but the tool is independent of platform/language/OS.

Each change is stored as a script with a name like "<timestamp>_<brief description>.sql", and the timestamps also serve as IDs in the "changelog" table in the DB. Creating the changelog table is always the first change.

The "migrate" command of the tool then has options for "status", "up" "down" etc., and features to create scripts to go from v1 -> v2. Another nice concept they provide to the mix is adding an "Undo" section to each script so that you can rollback changes and downgrade schemas just as easily as upgrading them.


Yes, this is what we usually did in a previous job. The only benefit of storing it in the database is that the code can check the schema version if that's a necessary component. I could see some value there.

Nowadays I'm primarily working with non-relational datastores, where this seems to be much less of an issue. There we version the entities that we store instead of a schema, and that is done entirely in source control as well.


There's no particular reason to keep a whole table for this, since all you really want to know is what version you're at.

We use a UDF called VersionNumber that returns an integer. When the continuous build processes a new database change script, that function gets modified to increment the value it returns. So you can always call VersionNumber() on any of our databases and know which version you're looking at. The builds actually use it to decide which scripts they need to run to get from where they are to where they need to be.

You already have all the commentary you need in source control and in the change scripts themselves. I don't see a reason to duplicate it in the database as well.
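
A Python sketch of the build-side logic (the function name matches the UDF described above; everything else is invented, and the exact call syntax varies by DBMS):

    def scripts_to_run(conn, all_scripts):
        # all_scripts: {int_version: sql_text} for every change script
        (current,) = conn.execute("SELECT VersionNumber()").fetchone()
        # Run only the scripts newer than the version the database reports
        return [sql for version, sql in sorted(all_scripts.items())
                if version > current]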


I'd argue that "all projects with a [relational] database [and an ORM]" should rather move to modern tooling instead of writing migrations by hand.

Rails is surprisingly anachronistic here, which probably stems from that awful conceptual separation between model and schema.

Most other platforms have semi-automatic schema evolution that usually works very well (datamapper automigrate, django south, hibernate SchemaUpdate, etc.).


If you don't have enormous amounts of data, one thing I've found helpful on my own projects is to make a hot backup of the database every time the schema changes, as well as make hot backups daily, and then when you check out a version of code you pair it with the latest hot backup.

(This is far easier when using SQLite, which has its own tradeoffs.)
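
With SQLite and Python 3.7+, the hot backup itself is a few lines using the stdlib's online-backup API:

    import datetime
    import sqlite3

    def hot_backup(db_path):
        stamp = datetime.datetime.utcnow().strftime("%Y%m%d%H%M%S")
        src = sqlite3.connect(db_path)
        dst = sqlite3.connect("%s.%s.bak" % (db_path, stamp))
        with dst:
            src.backup(dst)   # consistent snapshot, safe while the db is in use
        src.close()
        dst.close()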


Oh, django-south, how I love you.


I find this a more informative comment than the one above:

http://news.ycombinator.com/item?id=1984609

as I don't have to google around to find out what south is (I haven't used Django yet)


Yes yes yes. In my experience in Enterprise Dev teams it seems to be way too common that people just don't version control their DB schema, or use tools to produce schema diffs between dev and prod to generate upgrade scripts, all driven by using GUIs to create their tables and then exporting the DDL using a tool (e.g. TOAD if you are doing Oracle work).

A few years back, I built an installer in Ruby to apply 'plsql modules' to an Oracle database. This was a massive project with > 100 developers at this point and probably heading toward 1M lines of code.

My migrations table worked on modules, so there were a set of migrations per application area, but it was really just an extension of this idea (and I borrowed the idea from Rails too)!


Migrator.NET uses a similar setup for .NET programmers.


Although the project is still pretty young, Liquibase (http://www.liquibase.org/) is a solid open-source project for not only managing schema revisions, but also inserting seed data and abstracting schema structure from the DBMS.

Nathan also does a good job of applying submitted patches quickly.


There are also other projects like:

DbDeploy: http://dbdeploy.com/

DbMaintain: http://www.dbmaintain.org/


I wrote a blog post on the same DB migration topic about 2 years ago. This can be applied to any RDBMS and any programming language. Check it out: http://shashivelur.com/blog/2008/07/hibernate-db-migration


I have small PHP/MySQL projects I develop across multiple home computers and the server. This is a problem that's sometimes a bit hampering. Files are source controlled through SVN. Any suggestions for a tool that would be lightweight enough to be worth my time?


You could use phing.

Here is an older article... http://www.davedevelopment.co.uk/2008/04/14/how-to-simple-da...


<ads> Just use http://bitbucket.org/stepancheg/mysql-diff/ to compare schemas. </ads>


It's generally true, but I've developed systems that go sideways and fork into multiple versions, so the data structures get more complicated.


Why manually specify a key as a string instead of using an int auto_increment?


In this case, the key corresponds to the filename containing the migration statements.

Further, consider if your company had three deployments of your product. One for bleeding edge testing, one for staging/QA, and live. You'd want the schema version to be consistent across all three schemas.


Why introduce a surrogate key unnecessarily?



