SQL, Scaling, and What's Unique About Postgres

hcarvalhoalves · on Jan 12, 2015

I've read a recipe that used Postgres table inheritance to achieve sharding in a way it's mostly transparent to the client. The blog post read like a conversation between two people. Unfortunately I'm not finding the link anymore.

EDIT:

Found it at https://raw.githubusercontent.com/fiksu/partitioned/master/P... in case anyone find interesting.

ddorian43 · on Jan 12, 2015

How do you decide to put as sharding-key? Something like 'user_id'. What if a user becomes too big? Something like id=uuid. Then you lose the semi-free user_id index.

I believe the holy-grail of sharding is range-based (bigtable, hbase, hypertable etc).

needusername · on Jan 13, 2015

> I believe the holy-grail of sharding is range-based

You mean like Oracle has supported since version 8 or so?

ddorian43 · on Jan 13, 2015

If it is automatic (meaning you don't have to manually define the ranges)+ ranges can split/join, then probably yes.

Brian-Puccio · on Jan 13, 2015

That's super neat, thanks for posting.

slagfart · on Jan 13, 2015

Is this the same as the primary key hashmap thing that Teradata has been doing for many moons? Seems so. If not, why?

Trying to break down the mumbo jumbo in database product releases is a nightmare - it took me a couple of days with SAP HANA, which I think boiled down to in-memory with row and column selection per-table.

It feels like nobody is brave enough to offer an apples-to-apples comparison in their press releases.

samstave · on Jan 13, 2015

Can you give a quick brain dump on hana? good bad and ugly.

nl · on Jan 13, 2015

AFAIK Teradata doesn't allow you to intercept SQL after parsing like Postgres does.

I think you are trying to ask if the sharding example in this post is similar to "the primary key hashmap thing that Teradata has*. I think the answer is yes: teradata does hash-based sharding, and this is how pg_shard implements the same outcome.

EGreg · on Jan 12, 2015

I don't understand why the indexes aren't just as fast. Is it basically because smaller indexes are loaded into memory and the loader can reason about what it needs better? Because obviously the prefix used to load the indez can also be IN the actual searching of the index.

I think sharding at the app level is more efficient. I has helped facebook grow to where it is without worrying about whether MySQL can handle trillions and quadrillions of rows. The main constraint there is in the network topology in holdng MxN connections between M webservers and N database servers.

ddorian43 · on Jan 12, 2015

What would be good is an api for the storage engine so it can support clustered-table (the whole row is together with the primary-key). This will probably make it better for compression since things similar are sorted-together and probably make range-queries faster since sequential-disk-access (fractal-indexes, lsm-trees etc).

sitharus · on Jan 12, 2015

That doesn't sit well with Postgres' MVCC system, since between updates and VACUUMs you'd either need a secondary heap or the heap would need row-adjacent space reserved for the updates.

FWIW MS SQL Server has this problem with snapshot isolation. An inserted or updated row lives in TempDB until the DB finds time to push the rows back to the cluster (I can't recall exactly when), so the more insert/update heavy your workload is the more tempdb space you'll need. This also makes selects slower since it has to check the row map for ones that are current in tempdb.

Could work well for infrequently changing tables though.

doug1001 · on Jan 14, 2015

but if the snapshot is read-only? can the master rely on this config to avoid having to check the row map?

emcrazyone · on Jan 12, 2015

@ddorian43 I agree but I'm also wondering if perhaps I missed something?

From the article:

"SQL means different things to different people. Depending on the context, it could mean transactional workloads, short reads/writes"

Really?

I thought SQL was just the query language on top of a database. All the transactional safety has nothing to do with SQL but more of a database technology topic like is the database ACID compliant.

TLDR but from the first few lines of the article, it seems like the author is confusing SQL or expecting it to do something it was never intended.

zaphar · on Jan 12, 2015

The author is using SQL in the colloquial sense where it stands in for ACID compliant Relational Database. He's not confused.

emcrazyone · on Jan 12, 2015

@zapher: Could please explain this comment:

"SQL doesn't have the notion of "distribution" built into the language. This can be added into the database, but it isn't there in the language."

Are we expecting SQL to need to support "distribution." ??

I write database software on a very small scale with low transaction hits in your classic sense (select, update, etc..) and never needed sharding but one thing that has recently come up in my world is the need for data to be mirrored at geographically separate locations for both selects (reporting) and updates.

I used software and database technologies to over come these but never any crafty SQL calls. Like inserting into two or more databases to keep them in sync...

Are the DB engineers of the world trying to progress SQL to overcome these things at the SQL level?

One Example: Oracle and Postgres both have streaming replication that has nothing to do with SQL (yes I know you can use SQL commands in postgres to start & stop it but it's postgres specific) but as far as I know, these are replication technologies below SQL.

So I was genuinely curious or thought I was missing something; I still kind of do as I've not stayed very current with the state of the art.

zaphar · on Jan 12, 2015

I'm certainly no expert but I can take a stab at it.

I think the author was talking less about the syntax and more about the semantics of SQL there. Specifically the query plan. For instance in a SQL statement there is no indication which fields are indexed. You don't even have to use a primary key to find the data. So the database has to take both the data you provide in your AST as well as data about the tables and fields you specify to figure out where to find your data in a distributed database.

Contrast that with the redis example where the client must specify a key that redis then knows how to map to a shard. The logic is much simpler and easier to implement.

Redis has shards builtin to the method you use to query it which allows Redis to simplify the logic of distributed queries.

SQL does not have a notion of sharding builtin in to it so this complicates the query plan logic when it comes to sharding.

fubarred · on Jan 12, 2015

Streaming: it really depends on the scale of use-case: denormalized datawarehousing vs. financial tranasctions vs. MMORPG user state vs. etc. For some apps, a subscribe / notify pattern for responding to streams of messages (a-la MQTT) might be more workable.

An explanation of Postgres / Oracle similarities:

Postgres Plus AS (the commercial product) aims to be a drop-in replacement for Oracle, so it's no surprise that Postgres tends to overlap with regard to Oracle functionality. It also makes it safer to choose Postgres as a default starting point, because the migration options to Oracle / Postgres Plus / etc. are there.

http://www.enterprisedb.com/products-services-training/produ...

Disclaimer: unaffiliated pimping Postgres Plus because it likely subsidizes Postgres OSS dev. (Postgres OSS code is generally well-engineered.)

Someone · on Jan 12, 2015

SQL has three parts (at least):

- description of your data (create table basics. This is called Data Description Language or DDL in the standard)

- the queries that is the first thing people think of when they hear 'SQL'

- all kinds of extra info that controls efficiency, resilience, etc. ((partial) indices, choice of storage engine, whether to materialize views, how to grow database files, etc.).

That last part could easily include ways to specify automatic replication (but probably wouldn't be standardized soon)

nl · on Jan 13, 2015

Some non-SQL query languages have explicit constructs that deal with distributed cases. For example, SPARQL can do federated queries using the SERVICE keyword. [1] gives an example.

[1] http://stackoverflow.com/questions/14980386/connecting-linke...

twic · on Jan 12, 2015

I was under the impression that the primary key was stored as part of the row. Or do you mean the primary key index? If so, then there is already clustering:

http://www.postgresql.org/docs/9.4/static/sql-cluster.html

Although it's somewhat soft in that tables are only clustered on demand, and not kept in rigorously clustered order after that.

ddorian43 · on Jan 12, 2015

I mean the full row is stored in the primary-index.

Secondary-indexes contain the 'primary-key' and not the 'heap-position' of the row.

How mysql-innodb, mssql, tokudb have them.

rosser · on Jan 12, 2015

Index-organized tables (can) work very well if your access pattern is only ever by index (whether that's the pkey or "clustered" index, or a supplementary index that points to the leaf node of the clustered index on which the desired row(s) reside).

As soon as you find yourself doing a table (or "sequential") scan, however, you're incurring a tremendous amount of random IO, because you have to walk the index by breadth. Even if you're fully cached, or your db lives on SSD, that's a ridiculous amount of overhead.

The general rule of thumb, in my experience, puts the inflection point between whether it's better to use an index or do a table scan and then filter, is when you're >= 10% of the table. Less, the index will win; more, and the scan wins. It's surprising just how often you actually exceed that threshold in real world scenarios.

EDIT: See also the reply to your original comment, describing the need for additional row-adjacent (or supplementary heap) space to handle updates. Index-organized tables are far more sensitive to becoming unbalanced than indexes, and maintaining that additional space imposes a further IO penalty, because you still have to read those disk pages into memory, whether there's "live" data on them or not.

duaneb · on Jan 13, 2015

This is the approach by the distributed SQL engine F1. It uses a hierarchical table approach where root rows are guaranteed to be on the same shard.