A table that should exist in all projects with a database (cherouvim.com)
146 points by fogus on Dec 8, 2010 | 63 comments



It's all true, but please take the lesson from rails and use timestamps rather than sequential IDs.


What's the advantage?


Consistent ordering even if two developers happen to add migrations starting from the same source version.


Could you elaborate on that?


Schema version is at 200.

Developer A creates a migration numbered 201. Developer B creates a migration numbered 201. They push their code, pull, and get conflicts.

So instead, they use a timestamp-based system. Developer A created his and got 20100604181252. Developer B got 20100604182530. Now, everyone is happy, and no one has to change their migrations.
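
Roughly what that amounts to, as a minimal Python sketch (the helper name and ".sql" suffix are my own, not any particular framework's):

    import datetime
    import re

    def migration_filename(description):
        # A UTC timestamp prefix sorts chronologically and is practically
        # collision-free across developers, unlike a shared counter.
        stamp = datetime.datetime.utcnow().strftime("%Y%m%d%H%M%S")
        slug = re.sub(r"\W+", "_", description.strip().lower())
        return "%s_%s.sql" % (stamp, slug)

    # migration_filename("add email to users")
    # -> e.g. "20100604181252_add_email_to_users.sql"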


I think I prefer the first version. At least the conflict is obvious.

Developer A creates migration 20100604181252. Developer B creates migration 20100604181253. Developer B doesn't notice Developer A's migration: there's no way to know from the timestamps that there might be a conflict. Everything breaks.

vs

Developer A creates migration 201. Developer B creates migration 201. Developer B gets a handy notification that something has happened in the meantime, and gets a chance to patch the migration if necessary.


Actual conflicts from working with the same fields in the same tables are so rare that I haven't experienced one even once.

However, the second scenario would happen every single day in a team.


I have occasionally modified a sproc, only to find that a conflicting check-in has also modified that same sproc. One of our modifications has been overwritten by the other.

I do concede it is rare, but it has happened, and we only discovered the bug when it was too late.


The real issue is when you have feature branches and merge them into master/trunk - so it's not a single file conflicting, but maybe a week or more's worth of work from both developers.

As you say, it would rely on notifications that there was the possibility of clashes, but unless you monitor every other developer's local feature branch, you would only know when the branches are merged into master.


Two developers pull code with migration version 4. Each writes a new feature involving a migration. Both get the number 5, which then collides when they both check in. But it doesn't collide at the source level, so you only find out when the db doesn't work right. This was a big pain in the ass in Rails until they fixed it.


Why don't they collide at the source level? I'm assuming the db updates get checked into your central source code repo. Wouldn't the collision happen at that time?

What happens when developers create branches and use time stamps? Developer A working on trunk might check in db updates at 4pm and 5pm. Developer B working on a branch checks db updates into his branch at 4:30pm and 5:30pm. When working on the branch his db updates don't affect the "trunk" db. As soon as he merges does his 4:30pm update get run on the "trunk" db, or is it skipped over because the 5pm db update already ran, and only the 5:30pm change gets run?

Where I come from, we use int values for our db update script names. When I merge from a branch, I renumber my db update script file names starting at the trunk's tip int value + 1.

fyi - I'm not coming from a Ruby/Rails perspective, if that matters.


There's a table listing the migrations that have already been run on the database. The list of migrations in source control is compared against the list in the database, and the outstanding ones are sorted into timestamp order and then run.

So if the two branches are separate there should be no issue - all my changes get run in order, irrespective of what the other developers have been doing.
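
In code, the comparison is roughly this (a minimal Python sketch, assuming a sqlite3-style connection; the schema_migrations table name follows Rails' convention, the rest is made up):

    import os

    def pending_migrations(conn, migrations_dir):
        # Versions already recorded in the database's bookkeeping table
        applied = set(row[0] for row in
                      conn.execute("SELECT version FROM schema_migrations"))
        # Everything on disk, oldest timestamp first (names sort lexically)
        on_disk = sorted(f for f in os.listdir(migrations_dir)
                         if f.endswith(".sql"))
        # Outstanding = on disk but not yet recorded in the database
        return [f for f in on_disk if f.split("_")[0] not in applied]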


They don't collide because the file names include more than just the number.

And yes, manually renumbering is exactly what you'd have to do. Timestamps obviate that.

Theoretically, you can still have collisions this way (e.g. two migrations alter the same column) or violated prereqs, so some communication is still required, but not to the degree before.


Posit repository A which is at schema state S(A) = N. Developer X branches it to make B and adds a new database migration, increasing S(B) to N+1. Developer Y branches it to make C and adds a new database migration, increasing S(C) to N+1. Now, merge B and C.

It's not necessary that schema versions be that rigid. All you really need is that they be monotonically increasing; a timestamp in UTC is sufficient for those purposes, although technically you could run into a situation where a UUID is necessary.


Ah, very interesting. I'm sold.


Timestamps, UUIDs, and such have no dependency on a central source (and thus, single point of failure) for generating them. Much better scalability.

Even with a single server, not relying on IDs being sequential is useful when you want to do a database migration without temporarily shutting down the site. It is much simpler to merge the two databases together.


A database/ORM lesson from... rails?

You may want to re-think that statement.

(and I'm not trolling, I use and like rails, but the database abstraction is amongst the weakest of all popular frameworks)


If that's not trolling, then I don't know what is.

But assuming that you are sincere, your reasoning is flawed. See, the fact that Rails' ORM is in some way badly designed (your words - not mine) doesn't diminish this particular feature. The system could be overall crap, but still have one good thing about it. Besides, the schema-migration has very little to do with the ORM, except for the fact that they both interact with a database.

What you're doing is called a straw man argument.


doesn't diminish this particular feature

Well, this particular feature is a bandaid for an underlying misfeature (manual schema management).

Proposing this as good practice is somewhat akin to proposing "If you design a new operating system then make sure the installation of virus scanners is as easy as on Microsoft Windows".

You may call it a strawman, I'd call it "putting something into perspective".

Besides, the schema-migration has very little to do with the ORM, except for the fact that they both interact with a database.

That's a nice theory, but in practice most rails users do use the builtin migration mechanism, for lack of a better alternative (at least to my knowledge, feel free to correct me).

Consequently I don't think it's unfair to compare that mechanism to the equivalent in similar frameworks - and in that comparison rails loses, big time.


"Well, this particular feature is a bandaid for an underlying misfeature (manual schema management)."

Could you expand on that? Maybe I'm missing something here.


Well, I was aiming at the "learn from rails" part.

The explicit mention of rails just stuck out to me (probably more than the author intended), as if rails had achieved something extraordinarily smart there - while to me it's just an implementation detail of an otherwise rather inferior approach.

Modern schema migration tools compute the difference between the current and the desired state on-the-fly, and generate the migration commands accordingly. Explicit version numbers/timestamps/hashes are mostly used for documentation and transparency purposes there, not for the sake of conflict resolution.

These algorithms are naturally not flawless, but they work well for the proverbial 90% of cases. In such an environment the user doesn't normally have to futz with migration scripts or migration version-numbers at all.
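
The core of such a tool is just a diff between the actual and the desired schema. A toy Python/SQLite sketch of the add-a-column case (real tools also handle types, renames, indexes, and so on):

    def missing_column_ddl(conn, table, desired_columns):
        # Introspect the columns the table actually has right now
        actual = set(row[1] for row in
                     conn.execute("PRAGMA table_info(%s)" % table))
        # Emit ALTER TABLE for anything declared but not present
        return ["ALTER TABLE %s ADD COLUMN %s %s" % (table, name, coltype)
                for name, coltype in desired_columns.items()
                if name not in actual]

    # missing_column_ddl(conn, "users", {"id": "INTEGER", "email": "TEXT"})
    # -> ['ALTER TABLE users ADD COLUMN email TEXT']   (if only id exists)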


Modern schema migration tools?

Please list those widely used modern schema migration tools as they occur in other popular languages like PHP, Python, Perl, and so on.

In rails, users don't futz with migration version numbers. Those are handled automatically by the system. The user simply specifies a migration, e.g. "I want to add this column to this table." The migration is then automatically run when doing the next deployment. That doesn't work in 90% of the cases - it works in 100% of the cases.

Woe be unto the developer whose deployment process only works 90% of the time.


Please list

I already did in another comment but I'll repeat: django south, hibernate schemaupgrade, datamapper automigrate, postfacto (standalone python/postgres), sqlalchemy-migrate

e.g. "I want to add this column to this table."

The difference is that in Rails the user has to update the model and create a matching migration. In most other frameworks the second part is strictly optional and by default implicit.

Woe be unto the developer whose deployment process only works 90% of the time.

That's not what happens. In practice there's always a review step.

However, I've already been accused of strawmanning, so rather than taking this further offtopic let me just point you to http://south.aeracode.org/docs/tutorial/part1.html


At least several, maybe all, of those were directly inspired by rails migrations. Whether something automatic, but imperfect (renames, anyone?) is actually better than something explicit but easy enough seems a matter of taste.


At least several, maybe all, of those were directly inspired by rails migrations.

Actually most of them (except datamapper and postfacto) existed before rails was created.

And I'll argue strongly against "matter of taste" when it comes to error-prone routine procedures.


Fowler himself was impressed by Rails' implementation of ActiveRecord, and that was long before Arel (google the 2006 RailsConf keynote by M. Fowler). I wonder what all these better alternatives are, according to you…


Well it's really a matter of taste.

Some small teams (including some I work with) find it clearer to have 001, 002, etc., and just have the convention to get in touch before any schema change, which is often a good idea anyway.


Timestamps, across distributed teams over different timezones. We ran into sequencing issues and switched back to numbers.


You didn't consider adding the timezone offset?


Removing, more likely. Everything in GMT.


Some frameworks take care of this automatically. Mango has an excellent library called South that does exactly this. It keeps track of migrations and can automatically detect nearly all migrations. It's much easier than doing it all by hand.


s/Mango/Django/

Danny you autocorrect on my Android


"Danny you"... heh, priceless!

Mango would be a good nickname for Django + MongoDB


Haha are you even reading what you are typing?


http://mangoblog.org/

I know the developer - she is awesome!


I think he was talking about Django, but I didn't know about that Mango thing (a blog engine, by the way), and it looks great and interesting.


Cassandra 0.7 keeps track of it for you as well.


Haha, "Mango"?


Damn you autocorrect!!!


Hobo too.


Andrew is a cool cat.


Another approach I've seen used is to map the hash of your schema to an upgrade script (which would be an empty script for the latest version):

    import hashlib
    import sys

    # Runnable Python sketch: upgrade_scripts (a dict), dump_dictionary_table,
    # complain_loudly and execute are placeholders for the pieces described above.
    while True:
        # Hash the current schema state, i.e. the dictionary table contents
        current_version = hashlib.sha1(dump_dictionary_table()).hexdigest()
        if current_version not in upgrade_scripts:
            complain_loudly()   # unknown schema state: refuse to touch it
            sys.exit(1)
        script = upgrade_scripts[current_version]
        if not script:          # empty script marks the latest version
            break
        execute(script)


Another option is to have the schema creation/upgrade as scripts (either hand-made or autogenerated), then check them into source control.

And resist the temptation of hacking the schema directly on your servers; it ends up being a larger effort :)


Checking the schema and upgrade scripts into source control is extremely important, and this is how I have managed it in the past. (names like table.alter.12.sql). I regret not setting this type of versioning table up from the beginning, though - when you have multiple environments (dev, QA, staging, etc) it can get very confusing in a hurry as to which schema changes have been applied to which databases. With a versioning table and maybe even some deployment time scripts to sanity check versions between the release and the DB, mismatches become immediately apparent.
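
The deployment-time sanity check can be as small as this (Python sketch; the table and column names are whatever your versioning table uses):

    def check_schema_version(conn, expected):
        # Compare the version this release expects against what the
        # database's versioning table actually reports.
        (actual,) = conn.execute(
            "SELECT MAX(version) FROM schema_version").fetchone()
        if actual != expected:
            raise RuntimeError("database is at schema version %r, "
                               "but this release expects %r" % (actual, expected))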


The team behind the iBatis OSS project (now known as MyBatis) has a good tool named "migrations" which does exactly this. The original OSS project is a SQL mapping library for .NET and Java, but the tool is independent of platform/language/OS.

Each change is stored as a script with a name like "<timestamp>_<brief description>.sql", and the timestamps also serve as IDs in the "changelog" table in the DB. Creating the changelog table is always the first change.

The "migrate" command of the tool then has options for "status", "up" "down" etc., and features to create scripts to go from v1 -> v2. Another nice concept they provide to the mix is adding an "Undo" section to each script so that you can rollback changes and downgrade schemas just as easily as upgrading them.


Yes, this is what we usually did in a previous job. The only benefit of storing it in the database is that the code can check the schema version if that's a necessary component. I could see some value there.

Nowadays I'm primarily working with non-relational datastores, where this seems to be much less of an issue. There we version the entities that we store instead of a schema, and that is done entirely in source control as well.


There's no particular reason to keep a whole table for this, since all you really want to know is what version you're at.

We use a UDF called VersionNumber that returns an integer. When the continuous build processes a new database change script, that function gets modified to increment the value it returns. So you can always call VersionNumber() on any of our databases and know which version you're looking at. The builds actually use it to decide which scripts they need to run to get from where they are to where they need to be.

You already have all the commentary you need in source control and in the change scripts themselves. I don't see a reason to duplicate it in the database as well.
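
A Python sketch of the build-side logic (the function name matches the UDF described above; everything else is invented, and the exact call syntax varies by DBMS):

    def scripts_to_run(conn, all_scripts):
        # all_scripts: {int_version: sql_text} for every change script
        (current,) = conn.execute("SELECT VersionNumber()").fetchone()
        # Run only the scripts newer than the version the database reports
        return [sql for version, sql in sorted(all_scripts.items())
                if version > current]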


I'd argue that "all projects with a [relational] database [and an ORM]" should rather move to modern tooling instead of writing migrations by hand.

Rails is surprisingly anachronistic here, which probably stems from that awful conceptual separation between model and schema.

Most other platforms have semi-automatic schema evolution that usually works very well (datamapper automigrate, django south, hibernate SchemaUpdate, etc.).


If you don't have enormous amounts of data, one thing I've found helpful on my own projects is to make a hot backup of the database every time the schema changes, as well as make hot backups daily, and then when you check out a version of code you pair it with the latest hot backup.

(This is far easier when using SQLite, which has its own tradeoffs.)
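
With SQLite and Python 3.7+, the hot backup itself is a few lines using the stdlib's online-backup API:

    import datetime
    import sqlite3

    def hot_backup(db_path):
        stamp = datetime.datetime.utcnow().strftime("%Y%m%d%H%M%S")
        src = sqlite3.connect(db_path)
        dst = sqlite3.connect("%s.%s.bak" % (db_path, stamp))
        with dst:
            src.backup(dst)   # consistent snapshot, safe while the db is in use
        src.close()
        dst.close()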


Oh, django-south, how I love you.


I find this a more informative comment than the one above:

http://news.ycombinator.com/item?id=1984609

as I don't have to google around to find out what south is (I haven't used Django yet)


Yes yes yes. In my experience in Enterprise Dev teams it seems to be way too common that people just don't version control their DB schema, or use tools to produce schema diffs between dev and prod to generate upgrade scripts, all driven by using GUIs to create their tables and then exporting the DDL using a tool (e.g. TOAD if you are doing Oracle work).

A few years back, I built an installer in Ruby to apply 'plsql modules' to an Oracle database. This was a massive project with > 100 developers at this point and probably heading toward 1M lines of code.

My migrations table worked on modules, so there were a set of migrations per application area, but it was really just an extension of this idea (and I borrowed the idea from Rails too)!


Migrator.NET uses a similar setup for .NET programmers.


Although the project is still pretty young, Liquibase (http://www.liquibase.org/) is a solid open-source project for not only managing schema revisions, but also inserting seed data and abstracting schema structure from the DBMS.

Nathan also does a good job of applying submitted patches quickly.


There are also other projects like:

DbDeploy: http://dbdeploy.com/

DbMaintain: http://www.dbmaintain.org/


I wrote a blog post on the same DB migration topic about 2 years ago. This can be applied to any RDBMS and any programming language. Check it out: http://shashivelur.com/blog/2008/07/hibernate-db-migration


I have small PHP/MySQL projects I develop across multiple home computers and the server. This is a problem that's sometimes a bit hampering. Files are source controlled through SVN. Any suggestions for a tool that would be lightweight enough to be worth my time?


You could use phing.

Here is an older article... http://www.davedevelopment.co.uk/2008/04/14/how-to-simple-da...


<ads> Just use http://bitbucket.org/stepancheg/mysql-diff/ to compare schemas. </ads>


It's generally true, but I've developed systems that go sideways and fork into multiple versions, so the data structures get more complicated.


Why manually specify a key as a string instead of using an int auto_increment?


In this case, the key corresponds to the filename containing the migration statements.

Further, consider if your company had three deployments of your product. One for bleeding edge testing, one for staging/QA, and live. You'd want the schema version to be consistent across all three schemas.


Why introduce a surrogate key unnecessarily?



