Hacker News new | past | comments | ask | show | jobs | submit login
New PostgreSQL Interface for Cloud Spanner (cloud.google.com)
201 points by aleyan on Oct 12, 2021 | hide | past | favorite | 41 comments



The Cloud Spanner SQL frontend/analyzer has been open sourced by google as ZetaSQL. https://github.com/google/zetasql

It is amazingly good.

You give it textual SQL (+ schema + all your function definitions) and it returns a really clean logical query plan. It is also happy to do this via a protobufs so you can use it from languages other than C++. It is also tested and documented up the wazoo. It has been such a pleasure to work with.

Anyway, the big problem with ZetaSQL is that it is not a common SQL dialect.

It seems that the only reasonable way to do this PostgreSQL interface for Cloud Spanner is to add a second parser (and other extensions) to ZetaSQL. If I am correct, I really really hope they open source that part of ZetaSQL as well - it would be a massive step forward for open source SQL tooling.


Wow, thanks for that info, very useful. When you say “I really hope they open source that part”, do you mean the mechanism to plug other dialects in, or the PostgreSQL interface?


I mean the postgres parser (and semantic changes) for ZetaSQL. The zetasql parser is in a file called zetasql/parser/bison_parser.y, I strongly suspect they now have a file called something like zetasql/pgparser/bison_parser.y as well (and much more pervasive changes to support the deeper differences in the dialects).

This is the lexical structure and syntax docs for the new postgres inteface to cloud spanner:

https://cloud.google.com/spanner/docs/postgresql/lexical

And this is the zetasql lexical structure and syntax docs:

https://github.com/google/zetasql/blob/master/docs/lexical.m...

Notice that the new PG docs are an edit of the Zeta ones - evidence that my hypothesis is correct.


Looks awesome! A few data types not yet supported: https://cloud.google.com/spanner/docs/postgresql/data-types, like interval or array, which is a shame.

Is just like to mention how awful the Google cloud docs are on a slow connection. Each link does some weird SPA bullshit that seems to take way too long, and you seem to often end up in an inconsistent state where refreshing the page takes you to a completely different article. Then pressing back seems to re-load the current page you are on whilst it does more SPA bullshit.

Small, simple HTML doc pages that are fast to load please. Copy how AWS does it.


Google cloud docs are garbage on a fast connection, too.


Back in the dinosaur age, I built a WiFi hotspot billing system using PostgreSQL. MySQL had a larger following, but was less reliable and lacked transactions. PostgreSQL was a competitor to Oracle, which was seen as the undisputed leader for large-scale databases containing mission-critical business data.

Fast-forward to 2021 and I am not surprised at all that PostgreSQL now seems to be taking over everywhere. The ethic of the PostgreSQL team seems to revolve around correctness and reliability rather than features and speed. I congratulate the team for sticking to their mettle all these years and achieving such success.


> I am not surprised at all that PostgreSQL now seems to be taking over everywhere

To a large extent, Postgres didn't out-compete Mysql in mind-share by being a better database. Arguably it was a better DB in the Web 1.x LAMP days and everyone knew it - it just was SO hard to setup replication, which is where it still hurts (not for RDS though, which is another part of the story).

Mysql went out of its way to lose free users which was compounded when it was bought out by Oracle.

The death-blow of sorts for Mysql+Java was the GPL licensing of the jdbc binaries. This made Mysql a very sticky backend to use with a Spring or similar JVM based framework in general, because you had to convince someone that GPL is okay for a product that never leaves your building.

The postgres community on the other hand reaps the rewards of being very very friendly towards people who want to embrace its client protocols or even outright cut-pasting their code into a proprietary system which offers postgres compat but nothing else under the hood.


Until 2011~~ PG had a terrible replication / no official system so not so much "mission-critical business data".


> the PostgreSQL interface implements a core subset of PostgreSQL data types and DDL syntax.

> Familiarity and portability are the goals, not 100% compatibility

So, is there a compatibility matrix?

In general I see this as a good thing. It allows using Spanner without requiring a big rewrite should one want to migrate away from GC. Or using Spanner as a secondary data warehouse with the same queries.

But both of those do require a high level of conformance.


This page describes what is or isn't supported.

https://cloud.google.com/spanner/docs/postgresql/overview


Notably missing: JSON(B), triggers, sequences/serials, arrays, custom functions.


Definitely seems like a work in progress, which I suppose is why it's marked pre-GA.


and I'm out

Lack of JSON(B) is a complete no-go for me


jsonb has become the crutch on which I build everything. And I'm not going back.


Same. PG jsonb is so fantastic that it’s closer to exoskeleton than a crutch.


What do you use JSONB columns for? The only use case that comes immediately to mind is stuff like settings or config blobs for individual entities


I run a multi-tenant SaaS where every tenant can define custom properties on every entity. PG allows us to dump everything into a single jsonb column, and still have superb performance for filtering / analytics / searching etc. The backend codebase remains mostly agnostic to the data shape, and we rarely have to change the database during new releases.


Postgres jsonb + redis_fdw = you can JOIN your structured Postgres data against your unstructured app-layer Redis data. Not a good idea to write any code that does this—just move the Redis data into the DB, for crying out loud—but for BI use-cases where you’re writing one-off SQL queries and you need presentation-layer data that lives in Redis, it’s perfect.


PostgreSQL json functions allow me to dispense with the ORM. Just send and receive JSON (for regular table field types, note, not jsonb).


You know it’s funny because I’m looking into using the JSON type for spanner. Is that comparable to JSONB


Wow - I did NOT see this coming.

Crazy - and GREAT!

Cloud Spanner in my mind was always "cool" - but I never saw myself understanding how to use it.

I think for folks bridging from something like postgresql, this could also make spanner more accessible.

My own use cases can all be handled so simply with postgresql I don't see needing the spanner scale, but for others perhaps valuable.


Quick follow-up. They are now offering Spanner at < $700/month (which was old pricing I think for 1 node). Ie, you can do something like processing units in increments of 100, so maybe $70/month.

Kind of weird, google paying attention to what users want!

What's next, google home / assistant will support google work calendars? Alexa has supported google work calendars forever, but google's own devices did not - despite plenty of requests.

I always wrote google off as so disconnected from users / developers wasn't worth dealing with them, but this is interesting to see some changes that look user / developer directed (vs just wowing with google fu features with modest uptake).


Re your Assistant question, think this should be available by: https://support.google.com/assistant/answer/10265998

Do not know why it's not more marketed... a lot of people miss this!


This wasn't supported for years.

https://services.google.com/fb/forms/googleassistantforgsuit...

was where you could try to get into a beta.

Google Home, Home Mini Home Mox Home Hum Nest Mini etc did not support gsuite accounts during launch as far as I can tell.


Ah I see, yeah this is a newish feature and still doesn't support GSuite accounts as the main driver.

It just lets you link your consumer account to your GSuite (for calendars, etc.)

But as it seems so many people wanted it (including myself), just surprised it wasn't marketed


Is anyone able to comment on the extent to which Cloud Spanner and CockroachDB are in competition with each other? (CockroachDB is also wire-compatible with Postgres and originates as an implementation of something similar to Spanner)


Just being "Postgres" is not simple. Both Spanner+CRDB provides some custom domain specific language that helps the data distribute itself around the cluster (or the world). Or for example, change the primary key of a table, so that the database can change the distribution of a table without having any down time.

I feel like the usual "lift and shift" claims that Cloud providers claims never goes as planned, and it is not something that you can just easily find by just reading documentation. For example, only after I started using CRDB, I found out that case-insensitive indexes is not yet implemented.

But I have been using CRDB+k8s for >1 year now, and it has been a great database. Basically 0 downtime, with minimal effort, including database upgrades.


There’s also YugabyteDB which is built on actual Postgres 11.2, reuses most of Postgres itself.


They achieve the same goal, but cockroachDB usually relies on the NTP and Spanner makes use of specialized clocks on Google datacenter to be able to sync the multiple instances.


Both have managed versions but crdb has “run it yourself “ option as well as completely free version (missing chngefeeds and incremental backups i believe but other than that is pretty feature complete)


Spanner, CockroachDB and similar are really exciting, but the fact that none of them support stored procedures begs the question of whether it's inherently hard/impossible to do so in a distributed database, or whether that just hasn't been a priority...


Why would they prioritize stored procedures ? They’re pretty much seen as an anti-pattern by most people today, outside of very niche use cases


Procedures are very useful to enforce business logic in the database, rather than in the application. It’s like putting a higher level application-specific API on top of your database schema.

For Cockroach, it could help substantially with latency for multi-statement transactions, since the procedure could be executed on a database node closest to the leaseholder for the data. Executing a multi-statement transaction on the client means incurring per-statement latency between the client and gateway node, and between the gateway node and the leaseholder.


It's also a bad idea.


> They’re pretty much seen as an anti-pattern by most people today

That's a bold statement :)


I worked at CRL. There's nothing about the architecture of CRDB that makes it hard or impossible. The company has bigger fish to fry with limited product and engineering hours.



> To enable PostgreSQL tools that don’t know anything about Spanner, the PostgreSQL interface also includes an open-source adapter that runs as a sidecar proxy, close to a client.

I wonder why they decided to do this instead of building a postgres-compatible endpoint into the spanner service itself.


This is awesome, but what are the limitations? Unless I can use it with ORMs that I already tend to depend on (like Sequelize), it's probably a no-go. And I'm assuming no joins, or very limited left joins?


That’s very cool. Does anyone know if there is a good list of other databases that include a PostgreSQL interface?


I'm surprised people are applauding this. The net result of this support (and cockroach's) is that it will take mindshare away from Postgres which has worked hard for decades to build a loyal fanbase.

It's also step 1 of the e.e.e playbook, not that I'm suggesting Google want to extinguish Postgres, but it's clearly aimed at building their market share at the expense of Postgres's.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: