Pgslice: Postgres partitioning as easy as pie (github.com/ankane)
137 points by craigkerstiens on Sept 20, 2016 | hide | past | favorite | 32 comments



> Battle-tested at Instacart

If you don't mind me asking, for how long?

I've found using EXECUTE in insert triggers to be very problematic - it consumes a txid, AIUI, and every 200m inserts, a bunch of anti-wraparound autovacuums have to scan allofthethings. My insert triggers use big blocks of IF/ELSIF branches instead (it's straightforward enough to construct those in PL/pgSQL; it just requires an extra level of indirection when preparing the partitioning).
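A static routing trigger of that shape might look like the following sketch (table, column, and partition names here are hypothetical):

```sql
-- Sketch of a static IF/ELSIF routing trigger for a monthly-partitioned
-- table; "events" and its children are hypothetical names.
CREATE OR REPLACE FUNCTION events_insert_trigger() RETURNS trigger AS $$
BEGIN
  IF NEW.created_at >= '2016-09-01' AND NEW.created_at < '2016-10-01' THEN
    INSERT INTO events_201609 VALUES (NEW.*);
  ELSIF NEW.created_at >= '2016-10-01' AND NEW.created_at < '2016-11-01' THEN
    INSERT INTO events_201610 VALUES (NEW.*);
  ELSE
    RAISE EXCEPTION 'No partition for created_at: %', NEW.created_at;
  END IF;
  RETURN NULL;  -- suppress the insert into the parent; the row went to a child
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER events_insert
  BEFORE INSERT ON events
  FOR EACH ROW EXECUTE PROCEDURE events_insert_trigger();
```

The "extra level of indirection" is that whatever prepares the partitioning has to regenerate this function text whenever a partition is added or dropped.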


We've been using the tool for around 5 months, but the same approach/triggers for over a year. I wasn't aware the IF/ELSIF trigger approach had potential advantages over EXECUTE, but will definitely look into it.


As a recent Postgres convert (from MySQL), I'd be interested to know more about the general pattern, or examples I might keep in mind, for:

> My insert triggers have big blocks of IF/ELSIF branches


EXECUTE itself doesn't cause xid consumption. The increased cost over "plain" queries comes from re-parsing and re-planning queries.
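A minimal sketch of the EXECUTE-based routing that incurs this re-parse/re-plan cost on every call (table and column names are hypothetical):

```sql
-- Dynamic routing via EXECUTE: the statement text is rebuilt, re-parsed,
-- and re-planned for each inserted row. "events" is a hypothetical table
-- partitioned by month on created_at.
CREATE OR REPLACE FUNCTION events_insert_trigger() RETURNS trigger AS $$
BEGIN
  EXECUTE format('INSERT INTO %I SELECT ($1).*',
                 'events_' || to_char(NEW.created_at, 'YYYYMM'))
  USING NEW;
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;
```

The upside is that no trigger regeneration is needed when partitions change; the partition name is computed from the row itself.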


Update: Hey ak4g, the latest version of pgslice uses this approach. We're seeing roughly 20% less load on inserts with it (in the short time it's been in production). Thanks for the suggestion!


I wrote this a while back: https://github.com/fiksu/partitioned. It supports partitioning in Rails.

It's best to start here for an overview:

https://github.com/fiksu/partitioned/blob/master/PARTITIONIN...

The examples should prove its use.


Why an external program and not something in PL/pgSQL, or at least a PG extension?

Also, how does it compare with pg_partman[1] which also has similar goals?

[1] https://github.com/keithf4/pg_partman


We currently use Amazon RDS, which doesn't support custom extensions, so pg_partman wasn't an option for us. I haven't used pg_partman myself, but it looks to have different features.

I think an external tool is a bit friendlier for people who aren't super comfortable with installing Postgres extensions. Also, it doesn't require a restart.

I'm optimistic that Postgres will make all of this trivial at some point.

Edit: changed "more features" to "different features". One of the main goals is the ability to partition existing production tables without downtime, which seems different than pg_partman.
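For context, the workflow described in the pgslice README looks roughly like this (reproduced from memory; verify the exact commands and flags against the repo before use):

```sh
# Hypothetical sketch of the pgslice workflow on a "visits" table.
pgslice prep visits created_at month         # create intermediate table + trigger
pgslice add_partitions visits --intermediate --future 3
pgslice fill visits                          # backfill rows in batches
pgslice swap visits                          # swap the tables into place
```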


> I'm optimistic that Postgres will make all of this trivial at some point.

There's considerable ongoing work towards that. If you're interested in helping out, consider testing and reviewing the patchset. That doesn't necessarily require a lot of Postgres internals knowledge; user-interface feedback after trying it is very welcome.



Looks right, yes.


>Why an external program and not something in PL/pgSQL, or at least a PG extension?

I have no association with the linked project, but here are some reasons why I might take the same approach:

1. Source control. Code at the level of PL/pgSQL isn't as easily put into source control, deployed, and managed as, say, Java code.

2. Not being strongly tied to a single piece of software. It's true that this is currently intended specifically for PG, but keeping it separate would make it easier to adapt for another database in the future.

3. Lack of expertise with PG internals. This one is just a reason, not a justification.

Note that I'm not defending any software design decisions here, just speculating in a way I find reasonable. I also have the same questions and would be curious to hear the author's responses.

(Edited for formatting issues with the numbered list - and then again for spelling errors)


>> 1. Source control. Code at the level of PL/pgSQL isn't as easily put into source control, deployed, and managed as, say, Java code.

Not entirely true: PG extensions can be versioned, packaged, and deployed just like any other versioned code. Just deploy the new version of the extension and ```drop extension foo; create extension foo``` to update to it.

It's not as slick and greasy as ```git push heroku master```, but still, not so bad.

(I'm not the author, but I have written a non-trivial Pl/PgSQL extension)


Code at the level of PL/pgSQL isn't as easily put into source control, deployed, and managed as, say, Java code.

Why not?


You're updating code that needs to be executed against an external layer instead of just the application layer.


But you could say the same for any code that say, touches the filesystem.


I meant that the code has to be deployed to a separate system.

Application code that touches the file system is still deployed along with the application code.

You might commit schema.sql into your application repository, but that's not the state of your database.


I think you are confused. A PG extension is code like any other. For a use case such as partitioning, you shouldn't need to create any schema; what you would be doing is writing code that acts on the schema.

A schema is usually needed by an application that will be using a database. I also don't understand the fear of using schemas. A schema is FAR FAR FAR simpler to deploy than a Java application.


Great to see more options for partition management in postgresql.

We took a similar approach, using a Ruby library to manage partitions at Gilt Groupe starting many years ago (2009?). It worked quite well and is still in use today.

Since then I have also used the referenced https://github.com/keithf4/pg_partman on RDS, and it works great! Very happy to have all the partition management in the core database vs. in other languages, and many of the features Keith provides in pg_partman just work. He's done a great job with the project, and I would encourage folks to take a serious look at his work.

If you are interested in the details of how we applied pg_partman to RDS without requiring extensions, take a look at:

https://github.com/flowcommerce/lib-postgresql/blob/master/R...

We created a simple process to load the scripts directly into the database, similar to any one of our other migrations. We then created higher-level wrappers for our teams to ensure consistent use.


Looks useful, are you using this in production?


Yes, we do at Instacart.


Could you speak a bit about the use cases where you've found this to be a better tool than competing alternatives, whatever they may be, if you're able to?


We didn't really find anything that fit our needs. We wanted a tool that was simple to use with the ability to do time-based partitioning of existing production tables without downtime.


> Battle-tested at Instacart

I assume so.


Ummm, https://github.com/keithf4/pg_partman has been around for a long time and is used all around the world... AND it doesn't require cron jobs. Much tighter solution.


I also wrote a Rails gem that makes the create and read processes completely transparent with Active Record when using date partitions. In case someone is interested: github.com/pacuna/partitionable


Can I ask why you decided to partition Postgres like this? It looks like you have time-series data based on the examples, and you wanted quick dropping, am I right?


That's correct. It allows us to manage the database size over time for tables where we can archive data after a specific period.


This is cool. Now if only it somehow magically allowed a unique index over the whole setup, it'd make my day!


What do you mean? For uniqueness, a simple scheme like "partition ID + SERIAL" will be unique enough (or just a GUID or similar). For performance, a per-table index is fine, since you already know which partition the request is for.


The problem is the unique IDs are GUIDs generated independently (and offline) by the client and then passed to the database; we need to make sure that no event is processed twice.

Currently there's one big log table and a uniqueness constraint on the GUID field. I'd love to partition the table out but until I can identify a clean route to satisfying that requirement I'm keeping one large log.

Though, now that I think about it, maybe partitioning on the first character of the GUID string is feasible, as we can then ensure uniqueness per table.
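That per-first-character idea can be sketched with inheritance-based child tables, each carrying a CHECK constraint and its own unique index (table and column names here are hypothetical):

```sql
-- Hypothetical sketch: keep a per-partition unique index on a
-- client-generated GUID by routing rows on its first hex character.
CREATE TABLE event_log (
  guid       uuid NOT NULL,
  payload    jsonb,
  created_at timestamptz NOT NULL DEFAULT now()
);

-- One child per leading hex digit; repeat for '1' through 'f'.
-- Inserts would be routed by a trigger on event_log.
CREATE TABLE event_log_0 (
  CHECK (left(guid::text, 1) = '0')
) INHERITS (event_log);
CREATE UNIQUE INDEX ON event_log_0 (guid);
```

Since a given GUID can only ever land in one partition, the per-partition unique indexes together give the global deduplication guarantee.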


I suspect the parent poster wants to (quickly) enforce a unique constraint on the table via an index over the involved columns.



