Breaking Through Scaling Barriers with Bigtable (remesh.blog)
41 points by ntietz on March 8, 2022 | 20 comments



Not sure I would have opted for Bigtable here over sharded PostgreSQL, because it unduly limits flexibility, but within this constrained use case it works fine.

The big thing to remember, which is covered by this article, is that your only really performant option for returning multiple rows from BT is range scans, so your keys should be set up to support this. If you need more than one index you are essentially shit out of luck - hence my preference to stay with PG as long as possible, even if that means sharding to multiple servers.
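Something like the following (a rough sketch with the Go client; the project, table, and key layout are all made up for illustration):

    package main

    import (
        "context"
        "fmt"
        "log"

        "cloud.google.com/go/bigtable"
    )

    func main() {
        ctx := context.Background()
        client, err := bigtable.NewClient(ctx, "my-project", "my-instance")
        if err != nil {
            log.Fatal(err)
        }
        defer client.Close()
        tbl := client.Open("events")

        // The key is designed so that every row you want back together
        // shares a prefix, e.g. "<conversation_id>#<timestamp>". The only
        // cheap multi-row read is then a contiguous scan over that prefix.
        err = tbl.ReadRows(ctx, bigtable.PrefixRange("conv-42#"),
            func(row bigtable.Row) bool {
                fmt.Println(row.Key())
                return true // keep scanning
            })
        if err != nil {
            log.Fatal(err)
        }
    }

If you later need to look rows up by a second attribute, there is no secondary index to lean on; you either maintain your own index table or rewrite your keys.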


Good points - sharded Postgres is likely the better choice in most instances. I wouldn't be surprised if sharded Postgres would have worked well for us too, but as you mentioned, for this constrained use case, Bigtable works fine.


What is the recommended way to manage and interact with a sharded Postgres cluster? Is it a fully connected mesh where every node running business logic talks to every Postgres node?

Would you use chain replication or a hash ring?


We did consider sharding PostgreSQL and opted for something that would be fully managed for us, to minimize our management overhead. Managing sharded databases can be a complicated business, and that wasn't something we wanted to take on if we could avoid it.


PlanetScale does a good job of that, but for MySQL.


Small nit: Bigtable does support transactions. They just have to be contained within a single row.
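For example, a conditional (check-and-mutate) update like this is atomic, but only within the one row it touches (a sketch with the Go client, reusing the setup from the scan example upthread; the family/column names are made up):

    // markDone flips meta:state from "pending" to "done" atomically.
    // The check and the write happen together, but only within this
    // one row; there is no way to make two rows change together.
    func markDone(ctx context.Context, tbl *bigtable.Table, rowKey string) (bool, error) {
        filter := bigtable.ChainFilters(
            bigtable.FamilyFilter("meta"),
            bigtable.ColumnFilter("state"),
            bigtable.ValueFilter("pending"),
        )
        set := bigtable.NewMutation()
        set.Set("meta", "state", bigtable.Now(), []byte("done"))
        cond := bigtable.NewCondMutation(filter, set, nil) // no mutation on mismatch

        var matched bool
        err := tbl.Apply(ctx, rowKey, cond, bigtable.GetCondMutationResult(&matched))
        return matched, err
    }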


It is a reasonable shorthand to say the only interesting transactions are multi-row, and thus Bigtable doesn't support transactions. More correctly, the only transactional guarantees are single-row, sure. But saying it does support transactions is misleading.

Spoken as a guy whose database did not support multi-row transactions and was always told that.


It'd be reasonable to understand the underlying model, however. A "row" in Bigtable is a multi-level map that can grow large (hundreds of MB), so you can essentially encode an entire star schema in one. The full structure is row -> column-families -> qualifiers -> timestamps -> values.
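In the Go client that nesting is literal: a read hands back a map keyed by family, and each cell carries its qualified column, timestamp, and value (a sketch, reusing the ctx/tbl setup from upthread):

    // A bigtable.Row is map[family][]ReadItem; each ReadItem holds the
    // fully qualified column ("family:qualifier"), a timestamp, and the
    // cell value; one level per arrow above.
    row, err := tbl.ReadRow(ctx, "conv-42#msg-7",
        bigtable.RowFilter(bigtable.LatestNFilter(3))) // keep up to 3 versions per cell
    if err != nil {
        log.Fatal(err)
    }
    for family, items := range row {
        for _, cell := range items {
            fmt.Printf("%s %s @%v = %q\n", family, cell.Column, cell.Timestamp, cell.Value)
        }
    }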


You're right, it does, and we actually did some performance testing that relied on single-row transactions early in development, but we ultimately found slightly better performance with prefix/range scans (in addition to avoiding some limitations with retries and replication, IIRC).


Surprised they didn't mention Cloud Spanner at all.


I dug back through our design docs, and we largely chose not to use Cloud Spanner due to unknowns. We were more confident we could predict the read and write performance in Bigtable (especially given the constraints you accept when you drop relational features).


To be honest, it never even made it onto our radar, not for any particular reason though :)

IIRC, Spanner relies on precise timing to make certain guarantees, which is definitely relevant to our use case. I wonder how its write performance would stack up against Postgres and Bigtable.


Why would it matter how they do it under the covers? You see correct transactions. The interesting tech in this space is AWS Redshift; in my testing a few years ago it dominated in price-performance when you used the then-new node types.


>Why would it matter how they do it under the covers?

It doesn't really matter; I suppose I just got nerd-sniped recalling the details of Spanner's internals and their (somewhat superficial) relevance to the timekeeping issues mentioned in the blog post :)


I’d be curious to know if CockroachDB was evaluated as a potential candidate. The literature I’ve read seems to indicate its whole reason for existence is to solve these kinds of problems at scale while still providing some semblance of ACID compliance.


CockroachDB targets a different workload, namely, lots of reads and writes of individual records, versus returning large chunks of even larger tables.

So it might help with inserts but would struggle with larger queries.


Just out of curiosity, did you evaluate options like YugabyteDB?


I had never heard of YugabyteDB until just now, so no :)

One of our constraints, which Bigtable met, was native GCP support. I do see Yugabyte has a GCP provider, though.


I would never build anything critical on Google's services.


Yes, I wholeheartedly agree. Google invades our privacy, forces us to see ads, and then holds users hostage for money. Google should be avoided at any cost.



