Pgslice: Postgres partitioning as easy as pie (github.com/ankane)
137 points by craigkerstiens on Sept 20, 2016 | hide | past | favorite | 32 comments



> Battle-tested at Instacart

If you don't mind me asking, for how long?

I've found using EXECUTE in insert triggers to be very problematic - it consumes a txid, AIUI, and every 200m inserts, a bunch of anti-wraparound autovacuums have to scan allofthethings. My insert triggers use big blocks of IF/ELSIF branches instead (it's straightforward enough to construct those in PL/pgSQL; it just requires an extra level of indirection when preparing the partitioning).
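A static routing trigger of that shape might look like the following sketch (table, column, and partition names here are hypothetical):

```sql
-- Sketch of a static IF/ELSIF routing trigger for a monthly-partitioned
-- table; "events" and its children are hypothetical names.
CREATE OR REPLACE FUNCTION events_insert_trigger() RETURNS trigger AS $$
BEGIN
  IF NEW.created_at >= '2016-09-01' AND NEW.created_at < '2016-10-01' THEN
    INSERT INTO events_201609 VALUES (NEW.*);
  ELSIF NEW.created_at >= '2016-10-01' AND NEW.created_at < '2016-11-01' THEN
    INSERT INTO events_201610 VALUES (NEW.*);
  ELSE
    RAISE EXCEPTION 'No partition for created_at: %', NEW.created_at;
  END IF;
  RETURN NULL;  -- suppress the insert into the parent; the row went to a child
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER events_insert
  BEFORE INSERT ON events
  FOR EACH ROW EXECUTE PROCEDURE events_insert_trigger();
```

The "extra level of indirection" is that whatever prepares the partitioning has to regenerate this function text whenever a partition is added or dropped.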


We've been using the tool for around 5 months, but the same approach/triggers for over a year. I wasn't aware the IF/ELSIF trigger approach had potential advantages over EXECUTE, but will definitely look into it.


As a recent Postgres convert (from MySQL), I'd be interested to know more about the general pattern, or examples I might keep in mind, for:

> My insert triggers have big blocks of IF/ELSIF branches


EXECUTE itself doesn't cause xid consumption. The increased cost over "plain" queries comes from re-parsing and re-planning queries.
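A minimal sketch of the EXECUTE-based routing that incurs this re-parse/re-plan cost on every call (table and column names are hypothetical):

```sql
-- Dynamic routing via EXECUTE: the statement text is rebuilt, re-parsed,
-- and re-planned for each inserted row. "events" is a hypothetical table
-- partitioned by month on created_at.
CREATE OR REPLACE FUNCTION events_insert_trigger() RETURNS trigger AS $$
BEGIN
  EXECUTE format('INSERT INTO %I SELECT ($1).*',
                 'events_' || to_char(NEW.created_at, 'YYYYMM'))
  USING NEW;
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;
```

The upside is that no trigger regeneration is needed when partitions change; the partition name is computed from the row itself.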


Update: Hey ak4g, the latest version of pgslice uses this approach. We're seeing roughly 20% less load on inserts with it (in the short time it's been in production). Thanks for the suggestion!


I wrote this a while back: https://github.com/fiksu/partitioned. It supports partitioning in Rails.

It's best to start here for an overview:

https://github.com/fiksu/partitioned/blob/master/PARTITIONIN...

The examples should prove its use.


Why an external program and not something in PL/pgSQL, or at least a PG extension?

Also, how does it compare with pg_partman[1] which also has similar goals?

[1] https://github.com/keithf4/pg_partman


We currently use Amazon RDS, which doesn't support custom extensions, so pg_partman wasn't an option for us. I haven't used pg_partman myself, but it looks to have different features.

I think an external tool is a bit friendlier for people who aren't super comfortable with installing Postgres extensions. Also, it doesn't require a restart.

I'm optimistic that Postgres will make all of this trivial at some point.

Edit: changed "more features" to "different features". One of the main goals is the ability to partition existing production tables without downtime, which seems different than pg_partman.
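For context, the workflow described in the pgslice README looks roughly like this (reproduced from memory; verify the exact commands and flags against the repo before use):

```sh
# Hypothetical sketch of the pgslice workflow on a "visits" table.
pgslice prep visits created_at month         # create intermediate table + trigger
pgslice add_partitions visits --intermediate --future 3
pgslice fill visits                          # backfill rows in batches
pgslice swap visits                          # swap the tables into place
```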


> I'm optimistic that Postgres will make all of this trivial at some point.

There's considerable ongoing work towards that. If you're interested in helping out, consider testing and reviewing the patchset. That doesn't necessarily require a lot of Postgres internals knowledge; user-interface feedback after trying it is very welcome.



Looks right, yes.


>Why an external program and not something in PL/pgSQL, or at least a PG extension?

I have no association with the linked project, but here are some reasons why I might take the same approach:

1. Source control. Code at the level of PL/pgSQL isn't as easily put into source control, deployed, and managed as, say, Java code.

2. Not being strongly tied to a single piece of software. It's true that this is currently intended specifically for PG, but keeping it separate would make it easier to adapt for another database in the future.

3. Lack of expertise with PG internals. This one is just a reason, not a justification.

Note that I'm not defending any software design decisions here, just speculating in a way I find reasonable. I also have the same questions and would be curious to hear the author's responses.

(Edited for formatting issues with the numbered list - and then again for spelling errors)


>> 1. Source control. Code at the level of PL/pgSQL isn't as easily put into source control, deployed, and managed as, say, Java code.

Not entirely true: PG extensions can be versioned, packaged, and deployed just like any other versioned code. Just deploy the new version of the extension and ```drop extension foo; create extension foo``` to update to it.

It's not as slick and greasy as ```git push heroku master```, but still, not so bad.

(I'm not the author, but I have written a non-trivial Pl/PgSQL extension)


Code at the level of PL/pgSQL isn't as easily put into source control, deployed, and managed as, say, Java code.

Why not?


You're updating code that needs to be executed against an external layer instead of just the application layer.


But you could say the same for any code that say, touches the filesystem.


I meant that the code has to be deployed to a separate system.

Application code that touches the file system is still deployed along with the application code.

You might commit schema.sql into your application repository, but that's not the state of your database.


I think you are confused. A PG extension is code like any other. For a use case such as partitioning, you shouldn't need to create any schema; what you would be doing is writing code that acts on the schema.

A schema is usually needed by an application that will be using a database. I also don't understand the fear of using schemas. A schema is FAR FAR FAR simpler to deploy than a Java application.


Great to see more options for partition management in postgresql.

We took a similar approach, using a Ruby library to manage partitions at Gilt Groupe starting many years ago (2009?). It worked quite well and is still in use today.

Since then I have also used the referenced https://github.com/keithf4/pg_partman on RDS, and it works great! Very happy to have all the partition management in the core database vs. in other languages, and many of the features Keith provides in pg_partman just work. He's done a great job with the project, and I would encourage folks to take a serious look at his work.

If you are interested in the details of how we applied pg_partman to RDS without requiring extensions, take a look at:

https://github.com/flowcommerce/lib-postgresql/blob/master/R...

We created a simple process to load the scripts directly into the database, similar to any one of our other migrations. We then created higher-level wrappers for our teams to ensure consistent use.


Looks useful, are you using this in production?


Yes, we do at Instacart.


Could you speak a bit about the use cases where you've found this to be a better tool than competing alternatives, whatever they may be, if you're able to?


We didn't really find anything that fit our needs. We wanted a tool that was simple to use with the ability to do time-based partitioning of existing production tables without downtime.


> Battle-tested at Instacart

I assume so.


Ummm, https://github.com/keithf4/pg_partman has been around for a long time and is used all around the world... AND it doesn't require cron jobs. Much tighter solution.


I also wrote a Rails gem that makes the create and read processes completely transparent with Active Record when using date partitions. In case someone is interested: github.com/pacuna/partitionable


Can I ask why you decided to partition Postgres like this? It looks like you have time-series data based on the examples, and you wanted quick dropping, am I right?


That's correct. It allows us to manage the database size over time for tables where we can archive data after a specific period.


This is cool. Now if only it somehow magically allowed a unique index over the whole setup, it'd make my day!


What do you mean? For uniqueness, a simple scheme like "partition ID + SERIAL" will be unique enough (or just a GUID or similar). For performance, a per-table index is fine, since you already know which partition the request is for.


The problem is the unique IDs are GUIDs generated independently (and offline) by the client and then passed to the database; we need to make sure that no event is processed twice.

Currently there's one big log table and a uniqueness constraint on the GUID field. I'd love to partition the table out but until I can identify a clean route to satisfying that requirement I'm keeping one large log.

Though, now that I think about it, maybe partitioning on the first character of the GUID string is feasible, as we can then ensure uniqueness per table.
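That per-first-character idea can be sketched with inheritance-based child tables, each carrying a CHECK constraint and its own unique index (table and column names here are hypothetical):

```sql
-- Hypothetical sketch: keep a per-partition unique index on a
-- client-generated GUID by routing rows on its first hex character.
CREATE TABLE event_log (
  guid       uuid NOT NULL,
  payload    jsonb,
  created_at timestamptz NOT NULL DEFAULT now()
);

-- One child per leading hex digit; repeat for '1' through 'f'.
-- Inserts would be routed by a trigger on event_log.
CREATE TABLE event_log_0 (
  CHECK (left(guid::text, 1) = '0')
) INHERITS (event_log);
CREATE UNIQUE INDEX ON event_log_0 (guid);
```

Since a given GUID can only ever land in one partition, the per-partition unique indexes together give the global deduplication guarantee.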


I suspect the parent poster wants to (quickly) enforce a unique constraint on the table via an index over the involved columns.



