You give it textual SQL (+ schema + all your function definitions) and it returns a really clean logical query plan. It is also happy to do this via a protobufs so you can use it from languages other than C++. It is also tested and documented up the wazoo. It has been such a pleasure to work with.
Anyway, the big problem with ZetaSQL is that it is not a common SQL dialect.
It seems that the only reasonable way to do this PostgreSQL interface for Cloud Spanner is to add a second parser (and other extensions) to ZetaSQL. If I am correct, I really really hope they open source that part of ZetaSQL as well - it would be a massive step forward for open source SQL tooling.
Wow, thanks for that info, very useful. When you say “I really hope they open source that part”, do you mean the mechanism to plug other dialects in, or the PostgreSQL interface?
I mean the postgres parser (and semantic changes) for ZetaSQL. The zetasql parser is in a file called zetasql/parser/bison_parser.y, I strongly suspect they now have a file called something like zetasql/pgparser/bison_parser.y as well (and much more pervasive changes to support the deeper differences in the dialects).
This is the lexical structure and syntax docs for the new postgres inteface to cloud spanner:
Is just like to mention how awful the Google cloud docs are on a slow connection. Each link does some weird SPA bullshit that seems to take way too long, and you seem to often end up in an inconsistent state where refreshing the page takes you to a completely different article. Then pressing back seems to re-load the current page you are on whilst it does more SPA bullshit.
Small, simple HTML doc pages that are fast to load please. Copy how AWS does it.
Back in the dinosaur age, I built a WiFi hotspot billing system using PostgreSQL. MySQL had a larger following, but was less reliable and lacked transactions. PostgreSQL was a competitor to Oracle, which was seen as the undisputed leader for large-scale databases containing mission-critical business data.
Fast-forward to 2021 and I am not surprised at all that PostgreSQL now seems to be taking over everywhere. The ethic of the PostgreSQL team seems to revolve around correctness and reliability rather than features and speed. I congratulate the team for sticking to their mettle all these years and achieving such success.
> I am not surprised at all that PostgreSQL now seems to be taking over everywhere
To a large extent, Postgres didn't out-compete Mysql in mind-share by being a better database. Arguably it was a better DB in the Web 1.x LAMP days and everyone knew it - it just was SO hard to setup replication, which is where it still hurts (not for RDS though, which is another part of the story).
Mysql went out of its way to lose free users which was compounded when it was bought out by Oracle.
The death-blow of sorts for Mysql+Java was the GPL licensing of the jdbc binaries. This made Mysql a very sticky backend to use with a Spring or similar JVM based framework in general, because you had to convince someone that GPL is okay for a product that never leaves your building.
The postgres community on the other hand reaps the rewards of being very very friendly towards people who want to embrace its client protocols or even outright cut-pasting their code into a proprietary system which offers postgres compat but nothing else under the hood.
> the PostgreSQL interface implements a core subset of PostgreSQL data types and DDL syntax.
> Familiarity and portability are the goals, not 100% compatibility
So, is there a compatibility matrix?
In general I see this as a good thing. It allows using Spanner without requiring a big rewrite should one want to migrate away from GC. Or using Spanner as a secondary data warehouse with the same queries.
But both of those do require a high level of conformance.
I run a multi-tenant SaaS where every tenant can define custom properties on every entity. PG allows us to dump everything into a single jsonb column, and still have superb performance for filtering / analytics / searching etc. The backend codebase remains mostly agnostic to the data shape, and we rarely have to change the database during new releases.
Postgres jsonb + redis_fdw = you can JOIN your structured Postgres data against your unstructured app-layer Redis data. Not a good idea to write any code that does this—just move the Redis data into the DB, for crying out loud—but for BI use-cases where you’re writing one-off SQL queries and you need presentation-layer data that lives in Redis, it’s perfect.
Quick follow-up. They are now offering Spanner at < $700/month (which was old pricing I think for 1 node). Ie, you can do something like processing units in increments of 100, so maybe $70/month.
Kind of weird, google paying attention to what users want!
What's next, google home / assistant will support google work calendars? Alexa has supported google work calendars forever, but google's own devices did not - despite plenty of requests.
I always wrote google off as so disconnected from users / developers wasn't worth dealing with them, but this is interesting to see some changes that look user / developer directed (vs just wowing with google fu features with modest uptake).
Is anyone able to comment on the extent to which Cloud Spanner and CockroachDB are in competition with each other? (CockroachDB is also wire-compatible with Postgres and originates as an implementation of something similar to Spanner)
Just being "Postgres" is not simple. Both Spanner+CRDB provides some custom domain specific language that helps the data distribute itself around the cluster (or the world). Or for example, change the primary key of a table, so that the database can change the distribution of a table without having any down time.
I feel like the usual "lift and shift" claims that Cloud providers claims never goes as planned, and it is not something that you can just easily find by just reading documentation. For example, only after I started using CRDB, I found out that case-insensitive indexes is not yet implemented.
But I have been using CRDB+k8s for >1 year now, and it has been a great database. Basically 0 downtime, with minimal effort, including database upgrades.
They achieve the same goal, but cockroachDB usually relies on the NTP and Spanner makes use of specialized clocks on Google datacenter to be able to sync the multiple instances.
Both have managed versions but crdb has “run it yourself “ option as well as completely free version (missing chngefeeds and incremental backups i believe but other than that is pretty feature complete)
Spanner, CockroachDB and similar are really exciting, but the fact that none of them support stored procedures begs the question of whether it's inherently hard/impossible to do so in a distributed database, or whether that just hasn't been a priority...
Procedures are very useful to enforce business logic in the database, rather than in the application. It’s like putting a higher level application-specific API on top of your database schema.
For Cockroach, it could help substantially with latency for multi-statement transactions, since the procedure could be executed on a database node closest to the leaseholder for the data. Executing a multi-statement transaction on the client means incurring per-statement latency between the client and gateway node, and between the gateway node and the leaseholder.
I worked at CRL. There's nothing about the architecture of CRDB that makes it hard or impossible. The company has bigger fish to fry with limited product and engineering hours.
> To enable PostgreSQL tools that don’t know anything about Spanner, the PostgreSQL interface also includes an open-source adapter that runs as a sidecar proxy, close to a client.
I wonder why they decided to do this instead of building a postgres-compatible endpoint into the spanner service itself.
This is awesome, but what are the limitations? Unless I can use it with ORMs that I already tend to depend on (like Sequelize), it's probably a no-go. And I'm assuming no joins, or very limited left joins?
I'm surprised people are applauding this. The net result of this support (and cockroach's) is that it will take mindshare away from Postgres which has worked hard for decades to build a loyal fanbase.
It's also step 1 of the e.e.e playbook, not that I'm suggesting Google want to extinguish Postgres, but it's clearly aimed at building their market share at the expense of Postgres's.
It is amazingly good.
You give it textual SQL (+ schema + all your function definitions) and it returns a really clean logical query plan. It is also happy to do this via a protobufs so you can use it from languages other than C++. It is also tested and documented up the wazoo. It has been such a pleasure to work with.
Anyway, the big problem with ZetaSQL is that it is not a common SQL dialect.
It seems that the only reasonable way to do this PostgreSQL interface for Cloud Spanner is to add a second parser (and other extensions) to ZetaSQL. If I am correct, I really really hope they open source that part of ZetaSQL as well - it would be a massive step forward for open source SQL tooling.