These are just KPI projects, as their performance is judged by meaningless metrics. OceanBase WAS open sourced; after the news reports and publicity, Alibaba deleted all the code and replaced it with a Chinese announcement saying it would no longer be an open source project.
Also, Postgres 12 introduced pluggable storage, which might help to implement a shared-nothing architecture without huge changes to vanilla Postgres (I haven't looked at how large their delta is).
Citus Data enables scale out, while also being a pure extension. That means you can upgrade Postgres like normal to the latest point release using whatever normal upgrade process you want (e.g. OS packages).
It has worked for a long time without the need for Postgres 12. However, the new APIs introduced in v12 did enable us to offer columnar compression as an option, which complements a lot of scale-out use cases.
OTOH the PolarDB specific changes seem to be contained enough that if you decide to run it in production, you can probably just apply most of the changes from the v11 branch yourself.
But I agree it's not a very good look to code-drop something on a .2 release when there have been 2.5 years of fixes.
Even if the conflicts are minor, it's going to be annoying to try to work it out. If you are hitting a specific crash, there's a good chance you can backport the fix cleanly, but I doubt you can just pull in all of the fixes proactively without some knowledge of the details of the fork.
I haven't really looked at the details... perhaps PolarDB already has many (or all) of the fixes since 11.2. Also I haven't actually tried a merge, I'm just assuming the difficulty based on the number of diffs (and my experience doing minor version merges in the past).
(Disclaimer: I work for Citus Data. Citus takes the approach of a pure extension, which means it works on unmodified Postgres, and minor upgrades typically don't interfere at all.)
For fully managed Postgres (similar to Heroku Postgres), Crunchy Bridge [1] supports replicas across regions, including, say, EU to AU. That would actually pair really well with Fly.io on the app side.
The simple solution is to run Postgres replicas in those regions and do local reads for your app while directing the writes to the master region. This works well for read-heavy apps, and you can also put the master in a geographically central location to help with write latency.
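To make the routing concrete, here is a minimal Python sketch of the read/write split, assuming the app knows which region a request came from. The `Endpoint` and `RegionRouter` names and the `example` hostnames are made up for illustration; a real app would hand the chosen DSN to its Postgres driver or pooler.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    region: str
    dsn: str  # hypothetical connection string, not a real host


class RegionRouter:
    """Send writes to the primary and reads to the replica in the caller's region."""

    def __init__(self, primary: Endpoint, replicas: list[Endpoint]):
        self.primary = primary
        self._replicas_by_region = {r.region: r for r in replicas}

    def endpoint_for(self, region: str, readonly: bool) -> Endpoint:
        if not readonly:
            return self.primary  # every write goes to the primary region
        # Fall back to the primary when no replica exists in that region.
        return self._replicas_by_region.get(region, self.primary)


# Assumed topology: primary in the EU, replicas in the EU and AU.
router = RegionRouter(
    primary=Endpoint("eu", "postgresql://primary.eu.example/app"),
    replicas=[
        Endpoint("eu", "postgresql://replica.eu.example/app"),
        Endpoint("au", "postgresql://replica.au.example/app"),
    ],
)
```

Keep in mind that with asynchronous replication a replica can lag the primary, so a read issued right after a write may not see it yet; read-your-own-writes paths may need to go to the primary.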
If you need cross-region multi-master then I recommend something like CockroachDB or Yugabyte that have regional distribution as a core feature.
Heroku is only available in EU and US east. Latency from Australia is somewhat unavoidable. You can maybe reduce it by using Cloudflare or similar to terminate TLS as close to your users as possible.
Cloudflare gives different Anycast IPs for each plan, and they don't always hit the local ingestion point. Some people have shot themselves in the foot with the free/Pro plans in certain markets (India, Australia, etc.). As of May 2021, I don't believe local ingestion (TLS termination) happens on anything less than the Business plan in MEL or SYD.
It seems that Alibaba is going more open source than Amazon for their cloud services. That’s interesting and they may use this approach as a key differentiator.
They did it only after ES decided to abandon the Apache license. I doubt that Amazon would have done it if they were not forced.
Redshift is also Postgres based and Amazon never released anything. So by that metric Alibaba is more open.
There would be no open source Elastic had Amazon not taken it on. It was a smart move and benefits their balance sheet to do so. The point though is that move was greater than replicating existing open source work.
This is effectively required when open sourcing something that was previously internal. I've done it at a company. There could be company-internal things that leak in old revisions of code and commit messages. Even if the latest commit on master is clean, it is really hard to know that every revision throughout the whole history is clean.
Good point. In such a case, I think it would still be nice to at least try to split the total change into a few separate commits that build upon each other. That way the commit log could be a good starting point for someone who wants to understand the code base.
A quick note on Crunchy Bridge, it is pure Postgres, not "based" on Postgres. We haven't forked or modified code at all. And some of the above (Timescale) are extensions which do not change Postgres, but hook into the extension APIs.
Yes, this is an important distinction between an extension for Postgres and a fork of it.
The short-sightedness of forking boggles my mind every time. Oh, you don't want a team of hundreds of literally the best Postgres programmers in the world working for you full time all the time for free? Ok dude, have fun with your fork.
I'm still really upset that Confluent bought and ruined PipelineDB. It solved my problem perfectly, and I haven't yet seen anything that fits what I needed as well.
Do you also dream of loss of certain ACID semantics or crippling performance issues?
In the real world, it is extremely hard to provide all the same guarantees you get out of a single instance of <database vendor> if you turn around and spread it across the internet.
If you have super deep control over the physical & temporal environment around your system, you can cheat the rules a little bit (i.e. Google).
Yes, I know that the perfect database cannot exist. I think some trade-offs are possible though. Today I use PostgreSQL with a primary and replicas, together with CouchDB in the same application. They complement each other, but I think something good in between is possible.
Yes. The core database is fully Apache 2 licensed with no strings attached: https://docs.yugabyte.com/latest/legal/. Maybe it will change if the company behind feels they're ripped off. Who knows, but for now, yes, you could.
CockroachDB has an explicit clause in the licensing: Yes, employees and contractors can use your internal CockroachDB instance as a service, but no people outside of your organization will be able to use it without purchasing a license: https://www.cockroachlabs.com/docs/v21.1/licensing-faqs.html....
NGL, I've had to explain to enough people by now that the "funny name" is because it was made by ex-Googlers with a view towards resiliency, as per the idiom that only cockroaches will survive global nuclear war.
True, it is developed by Microsoft and available as a service on Azure. It is also open source, actively maintained and improved, and it's a PG13-compatible Postgres extension that adds both distributed database capabilities and columnar storage. :)
I'd love to try this out. What I'm missing though is at least some Docker based deployment not involving building stuff from sources and reinventing distributed architecture myself.
Yugabyte is a different DB technology from Postgres, but with a PG compatibility layer, which means you can use your existing Postgres queries and applications; the rest is a different beast.
This one is more comparable to Citus, a PG extension, meaning it runs on top of Postgres. However, from their GitHub page, they also offer a patched version of PG, which I suppose offers some tighter integration with the extension.
HA isn't really about 100% availability. Any single system that promises such is extremely likely to be misleading you somehow. Your in-flight query is going to get interrupted no matter how fancy your clustering is, and I struggle to even come up with hypothetical use cases where this is something you can't afford to have happen, ever.
All you need is that in the event of a failure the clustered system can still recover quickly enough (to a well-defined state!) that the application layer can deal with the transient failure without significant impact on users, maintaining the illusion of availability.
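On the application side, "dealing with the transient failure" often amounts to retrying with a backoff while the cluster fails over. A minimal Python sketch, assuming a hypothetical `TransientError` that stands in for whatever "connection lost" exception your driver raises:

```python
import time


class TransientError(Exception):
    """Hypothetical stand-in for a driver's 'connection lost' error."""


def with_retry(operation, attempts=5, base_delay=0.1, sleep=time.sleep):
    """Run operation(), retrying on TransientError with exponential backoff."""
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of retries, surface the failure
            sleep(base_delay * 2 ** attempt)


# Usage: an operation that fails twice (as if mid-failover), then succeeds.
calls = {"n": 0}
delays = []  # record the backoff delays instead of actually sleeping


def flaky_query():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientError("primary unavailable")
    return "ok"


result = with_retry(flaky_query, sleep=delays.append)
```

This is only safe when the retried unit is a whole transaction or an idempotent operation; blindly re-sending a statement whose commit status is unknown can apply it twice.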
Such a rocket would likely have two or more independent systems that would each have to agree on the adjustment, so one of them temporarily failing would not pose a problem. Though I doubt there are any rockets using database queries as part of their control system.
In those kinds of systems I suspect the approach is to enumerate every possible scenario and prove that the system behaves correctly in all of them, and if you can't do that, the system may be too complex and you need to redesign it to be simpler so that you can guarantee that it does not fail.
You have N >= 3 nodes, or N >= 2 and a non-compute arbitrator. One of them goes down, stops responding. You still have quorum, data processes continue just fine. That's high-availability in a CP system.
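The arithmetic behind that is just a strict majority of voting members. A tiny Python sketch (the function names are mine, and the arbitrator is counted as one more voter):

```python
def has_quorum(total_voters: int, reachable_voters: int) -> bool:
    """True when a strict majority of voting members is reachable."""
    return reachable_voters > total_voters // 2


def tolerated_failures(total_voters: int) -> int:
    """How many voters can fail while a strict majority remains."""
    return (total_voters - 1) // 2
```

With 3 voters (3 nodes, or 2 nodes plus an arbitrator), one can fail and the remaining 2 still form a majority. With only 2 voters and no arbitrator, losing either one loses quorum, which is why N = 2 needs the arbitrator.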
How do you think the next dominant database platform will start? Is your expectation that innovation requires polish before hitting the (free open source) “marketplace”?
My expectation is that the next dominant platform will offer something unique which creates enough value that I'm willing to overlook the unfinished parts that deduct value.
When Neo4j and other graph databases went big in the past decade, the value they created was: "finally! I don't have to faff around with RDBMS tables and Codd purists to store and query my non-relational object-graph!" - despite the fact that Neo4j then (and still?) didn't support running several databases concurrently, or schema enforcement, or even transactional atomicity (I think they fixed that recently?).
So in this context, what business-value does PolarDB add or create that makes it worthwhile to deal with its expected short-life?
> It extends PostgreSQL to become a share-nothing distributed database, which supports global data consistency and ACID across database nodes, distributed SQL processing, and data redundancy and high availability through Paxos based replication. PolarDB is designed to add values and new features to PostgreSQL in dimensions of high performance, scalability, high availability, and elasticity. At the same time, PolarDB remains SQL compatibility to single-node PostgreSQL with best effort.
CockroachDB seems to be a distributed database system written in Go which has implemented a Postgres query/wire protocol compatibility layer.
PolarDB is a Postgres fork actually using the Postgres codebase and extending it to a distributed database system. Maybe one day they can unfork because it's possible to implement PolarDB on top of Postgres as an extension and/or they contribute/get all their changes into Postgres core.
> OceanBase, the database of Alibaba's fintech company Ant Group, will be open-source soon, possibly as early as June 1, according to Sina Tech.
https://cntechpost.com/2021/05/27/ant-groups-in-house-databa...
Also the team published many papers about PolarDB.
https://scholar.google.com/scholar?hl=en&as_sdt=0%2C48&q=%22...