Hacker News new | past | comments | ask | show | jobs | submit login
Rails 4 UUID primary keys with PostgreSQL (labria.github.io)
71 points by labria on April 27, 2013 | hide | past | favorite | 37 comments



(Caveat: I know this about SQL server, not Postgres)

If you use random UUID's (as opposed to sequential UUID's) for your primary key your database will spend an extra hunk of time on reordering your PK index on disk on insert. This bit us at Stackoverflow. So remember: just because you can do something doesn't mean you should.


Yes, what you said also apply to MySQL/InnoDB and MongoDB. Someone did an experiment with MongoDB some times ago for various id schemes: http://i.imgur.com/clm9D.png (https://groups.google.com/d/topic/mongodb-user/1gPqVmFHExY/d...), I don't know if Postgres is being different but people should do their own testings instead of blindly listening to advices from the Internet.


What's the advantage of a sequential UUID over an auto-incrementing integer?

(Keeping in mind that both can be started at N on different shards, or prefixed.)


You can merge any two data buckets, without any risk of id collision.



No you didn't. Not fully at least. OP is saying that there is a penalty just for doing the inserts themselves.


Just use a k-ordered uuid.


I commented on the OP's article, but do this for the pk:

    create_table(:users, id: :uuid) do |t|
      t.string 'name'
    end
And to make sure the extension is enabled, (in your migrations) do:

    enable_extension 'uuid-ossp'
For now, make sure to use SQL schema dumping. I don't think Ruby schema files will work (as I haven't written a test for it yet).

EDIT: I forgot to mention, you can enable hstore by using "enable_extension('hstore')"


Sorry for the up coming stupid question but...

What is the value of this? Why can't unique identification be done using just regular increments on an ID column? Or even a composite key?


As pallinder said, it can be very handy: the IDs can be generated by the nodes, not the db server. Very useful in disconnected environments. Imagine being able to create data on a smartphone whilst sitting on the plane, and not having to do anything messy with ID replacement when you sync with the server in the office.

The cost? The keys are larger, and (unless using a sequential algorithm) are poor candidates for clustered keys (because they force page splits). The impact can be rather large (this lead to terrible performance in early versions of Sharepoint).


To be clear, MAC-based UUIDs are sequential. PRNG (v4) UUIDs are random and don't work well with indexes.


It's also useful if you don't want your URLs to expose how many of a certain thing you have, whether it be users, posts, payments etc. A lot of sites let you derive how much activity they have based on how fast their numeric IDs increment. You could use a separate token alongside the pkey to do the same but this feature just makes it simpler.



Only downside: this makes the URLs very long sometimes.


Imagine a distributed system where you want to preserve uniqueness across the board. Using a uuid more or less (by the sheer number of possibilities) guarantees that this will be the case.


UUID v1 uses the MAC address of the computer doing the generation as a part of the UUID, which ensures uniqueness so long as you aren't cloning MAC addresses in your infrastructure.

The downside of this is that it can leak information about the machine that generated the UUID, but if you require deterministic uniqueness, there you go.


Another not mentioned here: sharding becomes much easier. You don't need a central authority that controls the increment.


At the DB level, it facilitates master-master replication set-up, eliminating the auto_increment collision problems. Master-master replication, in turn, allows building distributed applications that can handle net splits reasonably well.


I think the primary_key line should be:

    t.primary_key :id, :type => :uuid


It's actually: t.primary_key :id, :uuid


Sorry, fixed that.


That makes a lot more sense, I was staring at the example, and then just figured I'd have to try it later.


I was really wondering where the difference to the normal syntax was. Thanks for the clarification


If you'd like to use UUIDs in Rails 3, I've been maintaining the activeuuid plugin for a while now:

https://github.com/jashmenn/activeuuid



Or just use Sequel already, works with all versions of Rails (and without Rails too).


Now if only I could get this working with Postgres.app


Running on rails 4 it should be:

   $ rails g migration add_uuid_to_postgresql

   class AddUuidToPostgresql < ActiveRecord::Migration
     execute("CREATE EXTENSION uuid-ossp;")
   end

   $ rake db:migrate && rake db:test:prepare
Then create a table as shown in the article, and it should work.


For this to work, the PG role that the Rails app is using has to be a superuser, as AFAIK only superuser roles can execute CREATE EXTENSION.

Does anyone consider this an issue? I have been using non-superuser roles within my Rails apps, and using an outside superuser role to add extensions with an external tool (like pgAdmin).


Yes, that's exactly I didn't include it in the migration in the example.


There's a shortcut in the Active Record PostgreSQLAdapter:

  enable_extension 'uuid-ossp'


Can you use find_in_batches or find_each with UUIDs?


Of course, both of those methods are members of ActiveRecord and will work regardless of how you filter your records.


But don't those methods rely on the sequential ordering of auto incrementing ids?


After reading the documentation more clearly. http://edgeapi.rubyonrails.org/classes/ActiveRecord/Batches..... It looks like there may need to be a patch for this feature, however the batching process forces the database to make a query in Ascending order for the primary keys so as long as you can generate new UUIDS in ascending order you should be fine. However problems look like they will occur if you generate a new record that has a UUID that would be inserted randomly between two numbers in the UUID range. I will see if I can bang something out and issue a pull request.


I was dealing with a similar issue on a project that all id's are UUID's because I wanted non-sequential id's.

I found it was just as simple of selecting ranges based of their date_created attribute.


No because ActiveRecord will try and do > uuid comparisons that won't evaluate.

Also, if it's not sequential, you may miss records while iterating through a collection.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: