Local and distributed query processing in CockroachDB

sixdimensional · on June 8, 2017

Nice write up and discussion. I like the approach being taken by CockroachLabs, it feels a lot like how open RethinkDB was about their development - very pragmatic. What is being done here (distributed SQL engine) is a complex problem, and I for one welcome more open implementations and people working on the problem.

marknadal · on June 9, 2017

I had the pleasure of chatting with Spencer the other day, great guy. We're still very opposite in the database tradeoffs that we believe in, but it gives me joy that at least both ends of the spectrum are covered in the OSS community. As I've expressed before, I don't think Master-Slave globally strongly consistent databases are the direction the future is headed, but at least if I'm wrong ;) we'll have Spencer & Co(ckroach) to save the day for Open Source (hearing his vision and emphasis on OSS was very refreshing and affirming too, especially after the last year of database announcements/failures/crippleware).

So they have definitely won my heart over, although I'll still make critiques where appropriate. This particular article was very well done, thoughtful, and insightful. So thank you! Being Postgres wire compatible is a daunting task though, one that to me seems unnecessary (we're implementing SQL on top of our decentralized graph database, but not at the wire level). But it once again showcases our polar opposite views. Obviously, their extra effort will result in remarkably better SQL compatibility, performance, and experience. So they are the hands-down winner, but I'm curious to see the extent of full SQL use (versus approximations) in the industry over the next decade.

Congrats guys, great article.

lwansbrough · on June 9, 2017

pgwire compatibility is a huge benefit to us. Not all of us want to use this year's hottest language, so it's nice when we can use an existing library and immediately get to work. I would strongly recommend anyone who is making a database from scratch to either: write libraries for every popular language so people can use it (don't do this) or interface with one of the many existing protocols that has been thoroughly tested and has strong support across many platforms (do this.)

Maybe I'm out of my depth here, but I'm not sure your comparison of CockroachDB SQL being "wire level" whereas your SQL is "on top of a decentralized graph" makes much sense. CockroachDB is built on a key value store. More on that here: https://www.cockroachlabs.com/blog/sql-in-cockroachdb-mappin...

I suspect your technology, too, would be built on something similar? The difference being in how you implement the "front end."

marknadal · on June 9, 2017

That is a really good point. There are a lot of really incredible drivers for different languages out there, and reusing them is a major selling point.

Right, the difference being that no SQL would actually be sent over the wire. The SQL parsing happens on the client (so it is front end only), then it is converted to our wire graph spec, and then sent out. So it is more SQL emulation/approximation. Even though CockroachDB is key/value underneath, they are actually running SQL on top, which is why their system would always be better than ours.

You sound really smart! If you are interested in these things, you should jump in on your favorite DB projects, or start your own!

state_machine · on June 8, 2017

In case anyone is interested in even more of the technical specifics, the original design RFC might be interesting too: https://github.com/cockroachdb/cockroach/blob/master/docs/RF...

redwood · on June 8, 2017

Is anyone using this software yet?

irfansharif · on June 9, 2017

[cockroachdb engineer] Baidu and Heroic Labs are ones we publicly announced in our 1.0 release[1], stay tuned for more~

[1]: https://www.cockroachlabs.com/blog/cockroachdb-1-0-release/

EGreg · on June 9, 2017

I wonder, why aren't graph databases used more often? Why is neo4j relatively alone?

It seems obvious to me that graph databases are much more parallelizable AND more scalable, since you are essentially able to break up parts of the graph into their own computing nodes quite easily.

The lookups are usually O(1) instead of O(log N) and instead of indexes and table scans to do joins you literally just traverse a graph at runtime. Plus you have more flexibility because instead of relational algebra you can literally run any code at any point to walk a graph.

Why aren't they supplanting relational databases despite being faster and more parallelizable and more powerful?

felixgallo · on June 9, 2017

Using the word 'literally' doesn't magically imbue speed into a system. Traversing a graph -- how does that work in a transaction? Is it going to be quicker than striding a packed in-memory hash?

EGreg · on June 9, 2017

Simple. You store the exact pointer to related data, so you go and get it in O(1). In a join, you have to do an O(log N) search through an index. And all indexes usually have to be loaded into memory, to boot.
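A minimal sketch of the contrast being drawn, assuming everything lives in one process's memory: following a stored reference versus resolving a key through a sorted index with binary search. The `Node` class, the sample data, and `index_lookup` are made up for illustration.

```python
import bisect


class Node:
    def __init__(self, name):
        self.name = name
        self.friends = []  # direct references to other Node objects


# Graph-style: the related record is reached by following a stored reference.
ada, bob = Node("ada"), Node("bob")
ada.friends.append(bob)
first_friend = ada.friends[0]  # no search needed

# Relational-style: a foreign key is resolved through a sorted index.
index_keys = [1, 2, 3]             # sorted primary keys
index_rows = ["ada", "bob", "cy"]  # rows stored in the same order


def index_lookup(key):
    """Binary search over the sorted keys: O(log N) comparisons."""
    i = bisect.bisect_left(index_keys, key)
    if i < len(index_keys) and index_keys[i] == key:
        return index_rows[i]
    return None


print(first_friend.name, index_lookup(2))
```

This only illustrates the single-machine case; an in-memory reference like `ada.friends[0]` has no direct equivalent once the two records sit on different machines.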

elvinyung · on June 10, 2017

> Simple. You store the exact pointer

How would that work in a scale-out, distributed cluster? What is a pointer? How do I figure out what machine an object is really located on? What happens if that machine is down? What if I want to move the object/rebalance the cluster? How do I keep multiple copies of an object (e.g. for fault tolerance)? How do I figure out which copy is the right one?

How do I organize the pointers? Would I use a hash table? A tree? A graph? How would that data structure be distributed? Would every machine store a copy of the lookup data structure, or just some specific machines? What if those machines fail? How do I maintain copies? How do I keep the lookup data structure up to date?
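As one possible answer to "which machine holds this object, and where are its copies", here is a sketch of a consistent-hash ring. It is a generic textbook technique picked for illustration, not a claim about how CockroachDB or any graph database places data; the `Ring` class, the machine names, and the key `user:42` are all hypothetical.

```python
import bisect
import hashlib


def _h(s):
    """Stable integer hash of a string."""
    return int(hashlib.sha256(s.encode()).hexdigest(), 16)


class Ring:
    def __init__(self, machines):
        pairs = sorted((_h(m), m) for m in machines)
        self.hashes = [h for h, _ in pairs]
        self.machines = [m for _, m in pairs]

    def owners(self, key, replicas=2):
        """Walk clockwise from the key's hash and take the next machines."""
        start = bisect.bisect_right(self.hashes, _h(key))
        n = len(self.machines)
        return [self.machines[(start + i) % n] for i in range(min(replicas, n))]


ring = Ring(["m1", "m2", "m3"])
primary, backup = ring.owners("user:42")

# If the primary disappears, the old backup becomes the first owner.
survivors = Ring([m for m in ["m1", "m2", "m3"] if m != primary])
print(primary, backup, survivors.owners("user:42")[0])
```

Even this small sketch leaves most of the questions open: every client needs an up-to-date machine list, and nothing here decides which copy is the right one after a failure.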

mhuffman · on June 8, 2017

I swear this DB could solve all the technical problems in the world and it will still have an image problem ... unless that is the point.

dang · on June 8, 2017

"Please avoid introducing classic flamewar topics unless you have something genuinely new to say about them."

https://news.ycombinator.com/newsguidelines.html

atomical · on June 9, 2017

Do you have a list of classic flamewar topics?

dang · on June 9, 2017

No; such a list would encourage flamewars about the topics not on it and metaflamewars about the list itself.

atomical · on June 10, 2017

So basically the list is in your head? That's not very open.

dang · on June 10, 2017

HN is moderated! That means humans making interpretations and judgment calls. There's no way to make that 'open' in the sense I imagine you mean, but we try our best to be 'open' in the sense of being clear about what we're doing and answering questions about particular cases.

What we don't do is formalize everything, because a) that's impossible and b) what a nightmare it would be to try.

kainolophobia · on June 8, 2017

Assuming that the animal was chosen based on its survival traits, I think C. elegans might be a better choice.

ElegantDB?

delinka · on June 8, 2017

I think it's a perfect opportunity to combine a rooster and a doobie into a logo.

logicchains · on June 9, 2017

Someone should write a formal proof of correctness in Coq.

sebastian · on June 8, 2017

Same discussion every time there is a post about CockroachDB. It's time to get over it.

Splatter · on June 9, 2017

I actually clicked on the comments to this article specifically to see, and laugh at, the flamewar I knew was going to ensue.

m_sahaf · on June 9, 2017

Someone has a solution to your problem:

https://github.com/tschottdorf/bikesheddb

sellaname · on June 8, 2017

Sounds to me like an opportunity for a dual trademark business model. Similar to a dual-license model, where you receive the software under a restrictive license like AGPL, whose only purpose is to make the software undesirable to companies, alongside a paid permissive license like MIT, you could also use an undesirable trademark for the free version and a professional trademark that companies are willing to associate themselves with.

ryanmarsh · on June 8, 2017

[flagged]

anon12345690 · on June 8, 2017

"lol" is the only response to this. fine then, don't use it. why did you even click on this story?

it must be tough being so sensitive, let alone being so irrational about judging a piece of tech on actual merit. big diff between cockroachdb and genital warts

ryanmarsh · on June 9, 2017

Thanks for your concern. I'll be fine.

knome · on June 8, 2017

This sort of reply seems so ludicrous to me. It's a bug. How can you guys use the net, knowing what spider-related metaphors lie hidden in its tangled web?

swsieber · on June 8, 2017

Easy. It has deeper roots in the internet for me than in insects. You know, multiple definitions.

Cockroach on the other hand is very strongly associated with insects and disgust.

I do think it interferes with adoption. I don't think it'll kill it though.

JusticeJuice · on June 8, 2017

I think the name's actually pretty good at selling its key value, particularly to business people / management.

"It's called cockroach db - because it'll survive anything"

nilved · on June 8, 2017

That connotation seems obvious but I didn't pick up on it until months after I heard about CockroachDB. For one, a minority of people who know about cockroaches know about their resilience. For two, people who _do_ know that still know that cockroaches are disgusting. The reptile brain is going to react with revulsion even if the higher brain finds the metaphor.

nl · on June 9, 2017

> The reptile brain is going to react with revulsion

As someone who knows someone who is very scared of reptiles but had a pet cockroach the irony of this statement amuses me greatly.

Can_Not · on June 9, 2017

For me, I've overlooked it because by the name I assumed it was meant to be a "worse mongoDB". I've found out recently it's more like Postgres.

ryanmarsh · on June 8, 2017

I stepped on a cockroach last night. Seemed pretty fragile to me.

Can_Not · on June 9, 2017

The colony in your wall endured zero downtime.

ryanmarsh · on June 8, 2017

It's not called the World Wide Pus-Filled Spider Bite, it's called the World Wide Web. I honestly never thought of the spider connotation till you mentioned it.

_pmf_ · on June 9, 2017

To be fair, in Ye Olden Days, web crawlers[0] were called spiders.

[0] Of course, web crawlers are no longer called web crawlers, either