Hacker News new | past | comments | ask | show | jobs | submit login
The SQL layer in CockroachDB (github.com/cockroachdb)
85 points by ivank on April 21, 2017 | hide | past | favorite | 62 comments



A lot of that looks quite familiar from working on postgres. Partially that's because it's "just the right way" to do some things, but I also wonder if there's some more looking at pg than just reusing the wire protocol?

PS: Should any cockroach contributors have opinions on limitations & evolution of the protocol: I'd be interested.


I work at CockroachDB on the implementation of the Postgres protocol, and I'm a maintainer of lib/pq, a Go client implementation of the protocol.

Overall I think it is a very good protocol. I can't come up with any gripes I have with it, nor do I remember any "it would be nice if" moments when adding support for some of the various parts of it. There's certainly some difficulty in some of the more complicated parts, like managing all of the type hinting back and forth for prepared statements. But in general I found the documentation to be complete and well done.

The one thing that we did have trouble with was figuring out the exact formats for all types when using the binary encoding format. We wrote some code to sniff real Postgres servers so we could figure out what was going on. It would have been easier, obviously, if it was documented, but this wasn't a huge deal since it is likely a rarely needed thing.


> I work at CockroachDB on the implementation of the Postgres protocol, and I'm a maintainer of lib/pq, a Go client implementation of the protocol.

Thanks for replying!

> Overall I think it is a very good protocol. I can't come up with any gripes I have with it, nor do I remember any "it would be nice if" moments when adding support for some of the various parts of it. There's certainly some difficulty in some of the more complicated parts, like managing all of the type hinting back and forth for prepared statements. But in general I found the documentation to be complete and well done.

Cool. I've more complaints than you in that case :) Although drivers, especially libpq (no pipelining, no mixed binary/text results), are more frequently an issue.

One thing I was wondering whether you might also be interested is something like 'commit identifiers' (WAL LSNs in pg's case) for slightly relaxed consistency models. That allows things like committing on one node, and when later doing queries on replicas specify that data needs to have replicated at least up to that identifier.

> The one thing that we did have trouble with was figuring out the exact formats for all types when using the binary encoding format. We wrote some code to sniff real Postgres servers so we could figure out what was going on. It would have been easier, obviously, if it was documented, but this wasn't a huge deal since it is likely a rarely needed thing.

Yea, I think that's a fair complaint. I wasn't around yet when the current binary format was devised (2003ish IIRC?), I don't know how we ended up deciding that code was the documentation for that.


> Although drivers, especially libpq (no pipelining, no mixed binary/text results), are more frequently an issue.

The Go lib/pq driver is some pretty old Go code and has numerous similar problems. We try to keep it moving along but it's got some stuff that makes it hard to work on.

> One thing I was wondering whether you might also be interested is something like 'commit identifiers' (WAL LSNs in pg's case) for slightly relaxed consistency models. That allows things like committing on one node, and when later doing queries on replicas specify that data needs to have replicated at least up to that identifier.

CockroachDB is always serializable. It is linerizable on individual keys. We've discussed having a causality token before that could be passed to other transactions or connections that I think would allow for more linerizable guarantees. If I'm understanding your question correctly then yes, I think these would be very useful to us. (I've asked someone more familiar with our consistency model to address this more because I'm not confident in the accuracy of my reply.)


(Cockroach Labs CTO) Yes, the commit identifiers sound like something we could use, as long as they're not tied too closely to pg's WAL LSNs. For us, the token would be a 96-bit hybrid logical timestamp, and we'd want semantics roughly similar to HTTP cookies (the server sends it back with the response, and then that client will include it on any future requests. With our current implementation the client would only need to send the token on new connections, but including room for it on a per-request basis seems like a good idea).

Also on the subject of the protocol, we've run into a need for a (server-initiated) ping message (https://github.com/cockroachdb/cockroach/pull/10188).


> (Cockroach Labs CTO) Yes, the commit identifiers sound like something we could use, as long as they're not tied too closely to pg's WAL LSNs.

There was some discussion around this before, and the idea was basically to just include some text-payload after COMMIT. PG would produce something like "COMMIT 343/3f0037a0", but the spec for would just be text. Some drivers might do something with the knowledge that it looks like an LSN, but that's hard to avoid.

> Also on the subject of the protocol, we've run into a need for a (server-initiated) ping message (https://github.com/cockroachdb/cockroach/pull/10188).

Maybe I'm missing something here (I admittedly only skimmed ticket and things it references), but shouldn't server triggered tcp keepalives be sufficient for this case? In postgres that's configured using tcp_keepalives_idle/tcp_keepalives_interval/tcp_keepalives_count. Several drivers (including at least libpq and pgjdbc) can do the same to protect against vanishing servers.

Doing keepalives on the pg wire protocol level has some disadvantages, but also some issues. Namely you've to be ready to send ping/pongs forth/back in almost any protocol state. Not necessarily easy if there's partial writes and such.


Ah, right. We did turn on TCP keepalives so we probably don't need a protocol-level ping (although we often deploy behind haproxy, so tcp keepalives need to be configured at each hop)


Similarly, tcp level keepalive not necessarily sufficient, if there's intermediate "pgwire" protocol level connection poolers like pgbouncer. Nor do they protect against applications that are effectively hung somewhere.

So I do think "pgwrite" level pings make some sense, it's just not exactly straightforward to add them :/


> CockroachDB is always serializable. It is linerizable on individual keys.

My expectation would be that at some point you'd want to be able to optionally relax that (on a per sql transaction or table basis maybe?). E.g. on a "counter" table you instead might want do use something CRDT like, because you don't want to incur the latency of having to contact other servers. On an append only table without problematic constraints, it can be very interesting to do so too, particularly useful for logging & timeseries data. Not saying that has to have any sort of priority ;)


This is awesome.

I was looking at using/extracting their pgwire implementation to provide an SQL-like interface to the in-memory state of a particular application we use, this document will make it way way easier.


You can also use Calcite if you're on the JVM:

https://calcite.apache.org


That's much nicer than hacking stuff out, unfortunately/fortunately (not sure anymore) I am not a java programmer, this particular app is in go.


I always find that when building architecture documentation after the fact, you want to re-factor or at least re-engineer a lot of the code base to account for areas where lines are crossed or decisions were made that might be performant and functional but lead to poor maintainability or opacity to all but the most involved project participants.

It would be great to see this document evolve into a guideline rather than a "this is how most of the code works today" document.


The name isn't a problem, it's smart and memorable.


It is. Associations and images create emotions which can't be easily overuled by rational thinking. That's why names like HyperDB still work for most people despite being obviously silly. Look at the marketing industry. I'm posting this to urge the guys behind it to rename it. The name simply create unnecessary friction. (But don't name it UnicornDB either. :-))


I strongly disagree. It conjures up thoughts that are unpleasant. That isn't a good marketing strategy.


Most of the bad things I can think up of about cockroaches are good in the context of databases. Replication, being hard to kill, being around for ages, collective decision-making..

As someone downthread said, it's not a consumer product and doesn't need to be marketed like one.


Yes just keep shouting that from the rooftops. I'm still glad my parents didn't name me Quasimodo.


Might as well name it after the most vile insect known to man, what could go wrong? I named my son gonorrheapenis after my struggles with fertility and nobody has respect for this, I don't understand why. This bacteria is very hard to stop and can make people sterile. I believed it was a good example of the power of my son and our database.

If you don't think the name is hurting adoption you live in fairyland. A place where startups get extra karma for being edgy and mentally challenged.


To add further arguments. Name your product CockSomething, what could go wrong?

cockroach is a word rarely encountered my non native English speakers. Expect them to only know the "cock" part, not sure what "roach" could mean.

Thus this database is often interpreted as CockDB :D


I had no idea people have such a problem with cockroaches. I've lived with them (developing country), and they're annoying - they make way too much noise while I'm trying to sleep - but they're certainly not an animal I'd pick to hate. That'd be mosquitoes, or scorpions. But I know plenty of people who love scorpions, so...


I've got a little transactional KV store in the wings and have been thinking about writing a Go frontend for it. I'd love full SQL support, but there's no way I'm going to implement my own query optimizer. How hard would it be to repurpose this layer to back onto a generic sorted KV store?

It this deeply tied in to the rest of cockroachdb's architecture, or does it talk through a simple generic KV store interface?


while there is an explicit delineation between the internal `kv` package[1] and the overarching `sql` one[2], your specific concern regarding a query optimizer is not strictly an isolated component at this moment in time. query optimizations, last I checked, was not an explicit pre-processing stage but rather done so when constructing the AST representation of the query with each node being an `iterator` as described in the Volcano Evaluation System paper[3]. there was some work underway to move towards an intermediate representation[4] to go through the standard query optimizations, no context into how far along this effort is.

[1]: https://github.com/cockroachdb/cockroach/tree/master/pkg/kv

[2]: https://github.com/cockroachdb/cockroach/tree/master/pkg/sql

[3]: https://paperhub.s3.amazonaws.com/dace52a42c07f7f8348b08dc2b...

[4]: https://github.com/cockroachdb/cockroach/pull/10055


I wrote the rfc for this "effort": there will be an IR in CockroachDB, just not right now. It's a lot of work and we're not going to do this in one go.


Distributed DBs are hard; bringing new contributors up to speed is really hard. They'll be able to hire some, but finding random OSS contributors is HARD. Very hard.


Their number 1 problem is the name.


I rarely comment on HN....but I just came to say that I effing love the name. Don't ever change it. Screw the VC's and middle managers.


Yes, screw the people who allocate capital and make purchasing decisions. Who would ever need their favor to found a successful enterprise software company...


If they base their decisions on things like that you've got a bigger problem there...


I really don't get this meme. Who cares? Lots of projects have weird names. It's a database, not a consumer packaged good brand. Surely those who are evaluating the engineering tradeoffs in database technology aren't basing their decision making on a name.


There is an inner bias and emotion reactions to names. It always weights in in the decision.

If you think you are not affected. Would you try a product named "octopussy"? Would you deploy it across your organization and gives presentation about it?

P.S. Octopussy is a real tool that solves a hard problem I forgot about.


How about git? I.e. mild British insult.


well personally I thought fleshing out a distributed key-value store atop thousands of raft groups whilst maintaining strong consistency guarantees would be their number one problem but I think we might have different definitions here.


Tell that to the MongoDB-gang..

"Mongo" is a norwegian/swedish(?) slang for a retarted person. It's derived from the word "mongoloid" which was an person with Downs or a similar syndrome. The latter word is not commonly used anymore.


In German as well.


And in Spanish


Actually, mongoloid (and similar) is the word. "Mongo" alone is not that bad and it's a prefix to many words.


Don't know why downvoted.

They can keep thinking the name is not a problem, or they can decide to grow up and change the name.

Only naive engineers who know nothing about branding/marketing will think this is OK. It's fine if they're just doing their own thing, but it's a whole different story if they're operating a business


I'm usually on the side of rationality and not judging a book by its cover, but I have to agree here. That name is extremely unfortunate.

First impressions actually matter in human interactions, and when you're named after one of the most disgusting pests known to man, that's definitely not helping your case.

Sorry, not everyone (especially management and others you have to explain it to who aren't rational super-men) get a good impression on a technology you're peddling... it's associated literally with cockroaches. Good luck selling that to them. What's next, "Cancer-DB"?


Next is aidsDB.

Because "aid" means to give assistance and that database is here to help you ;)


Because it's unrelated to the post - which is about the architecture - and because it's brought up every single time?


Yeah the constant obsession with the name is so weird that it really makes me wonder if there's some astroturfing campaign going on. I'm not usually a conspiracy theorist but I mean seriously, the name is not that weird and the tech seems very interesting. The first time I saw comments about it I thought, "huh, yeah I guess that is sort of a weird name now that you mention it". The second time u thought, "ok I've heard this before". The third time I thought, "hmm, this is pretty tedious". And this is the 7th or so time now and I just think, "wtf is going on here?".


Really? You think it's astroturfing that makes people dislike a name with "Cockroach" in it? Do you think there are many people think "cockroach" has positive connotations?

The name is one thing that anyone can have an opinion about, the average person can't really speak to the internals of the database, but anyone can look at the name and say "I don't like it and wouldn't want to say I have 'cockroachdb' in my stack".

The very fact that it comes up every time there's a story about the database should be a pretty strong sign that the name is overshadowing the product. If it was named after an animal that doesn't have nearly universal disdain like "Koala-db" or "dragonfly-db", or even "ant-db", then I doubt people would be bringing up the name every time it's mentioned


git is also a really bad name, and I've had awkward conversations with hiring managers about it in the past ("You want someone with experience in... what?!"). ("git" is a mild oath in British English, for those who don't know. I think Linus did know.)

But the reason that people don't complain about the name every time git is brought up is because git comes up in conversation a lot. CockroachDB is also a really bad name - I do agree - but the reason almost every thread about it on here is partially derailed by chatter about its name is, I think, simply because it doesn't come up all that often.


> git is also a really bad name,

That's stretching it a bit too far. Just because it's a "mild oath" in British english, doesn't mean it's bad name for rest of the world. I for one have never heard about it and don't think git is a bad name at all.

Cockroach, on the other hand, I don't know of anyone who would get a good image in their mind hearing it.

So you think if people keep saying cockroach a lot, suddenly cockroach will magically become some nice clean creature in people's mind? That's like saying if you keep saying "ShitDB", someday people will think it's a good name.


> So you think if people keep saying cockroach a lot, suddenly cockroach will magically become some nice clean creature in people's mind?

Something like that. People will associate the name with the database, not the insect. This does happen, over time you almost forget the the original meaning.

Some examples: when somebody says Oracle or Delphi, I'm thinking databases and Pascal successor, not ancient Greek stuff. When somebody says Pascal, I'm thinking programming language not the French mathematician. Fedora is a Linux distribution (and also a type of hat), Twitter is a social network (and also a type of bird vocalization) and so on.


You really think those analogy is relevant to this case? All the examples you pointed out all have either neutral or positive connotations. What you should have presented was any example of (if any) a successful tech brand with a name that brings strong negative feelings.

Cockroach not only has negative connotation, but the negativity scale is pretty strong.

Bring me a single person who will say they get good emotion when they hear "cockcroach". Sometimes you need to be stubborn about your business decisions, but if you're being stubborn about naming your business after a word most of the population will get negative feelings the first time they hear, then you're doing it wrong. It's not even a subjective thing, with a bad name, people are less likely to talk about your brand than if you had a positive name.


OK, how about: I would not like ants crawling my room the same way as I would not like cockroaches crawling my room. But I don't mind using Apache Ant.

Or, for a not-exactly-elegant name: Ubuntu 4.10 Warty Warthog. You are launching a brand-new Linux distribution, why not call it after a warty wild pig. Worked OK for them.


Ant and warthog do not universally evoke negative emotion. In many cultures ants are in fact associated with creatures that work hard, some people when they hear ants they think of the emergent behavior their colonies manifest. Warthog is more of a neutral term. Hearing that word gives me neither good or bad emotion, so it really depends on the context you use it.

"Cockroach" on the other hand, like i said, I know of absolutely NO one who would get positive emotion upon hearing that word. Just like "rape", or "nazi" don't evoke positive emotion for anyone.


I'm honestly curious: what does it matter to you? You keep comparing it to very extreme things and generally seem to care a lot, but I just don't get what the big deal is. "Cockroach" is just not such an offensive word that it's worth getting so worked up about.


I'm saying that if CockroachDB came up in discussion frequently, people would lose interest in complaining about its name, because after a period of time, the number of people who'd been exposed to it and previous conversations about its name would have eclipsed the number of people thinking, "Gosh, what an awful name! I wonder if anyone else has had this revelation, I better post about it right now!"


You're missing my point, because you're talking about AFTER this reaches that status.

I'm saying these people are doing their own technology disservice by making the brand difficult to talk about and take seriously.

It's their choice, but there's a difference between "stupid names don't matter, look at Google!" and "stupid names don't matter, look at FuckDB! RapeDB! CockroachDB!" (I know i'm going extreme with this but just trying to get the point across, that "stupid" names and "offensive" names are completely different things)

If you still disagree with me, I don't really have much intention of trying to persuade you nor care much about this company succeeding. I'm just saying it's a bad business decision to make it hard for your customers to advocate for your product. If there was a company with the same technology and they had a normal name, they would probably get much better traction (assuming that their tech is unique and useful).


I think it's weird that anybody would have any strong feelings about the name of a database project at all.


I have to think about cockroaches every time I see a post about this database. It is fair game.


Seriously? I reflexively think about specific, often negative, things when I read about a lot of topics. That doesn't mean those thoughts have to be shared. I think a good rough guideline is trying to only comment if that comment has a chance of advancing the discussion, answer somebody's questions, or asks for more information. Restating repeatedly made comments doesn't do that.


man, they are committed to spreading infestation...

"Roaches on Open Water! CockroachDB on DigitalOcean" https://www.cockroachlabs.com/blog/roaches-on-open-water-coc...

Next up - Annual cockroachdb conference: roach motel.


they should have named it after a fruit ... like mangodb or something


Somebody already thought of that one! https://github.com/dcramer/mangodb

proprietary auto sharting algorithm makes me laugh every time.


Mango IS tasty. Much more tasty than cockroach IMHO. Still though, I doubt the name is much of an issue.


I just had a horrible vision of mango and cockroach together :(

You guys are ruining my lunch time.


it was a word play on mongodb and apple :)

but i guess no one got the joke , because you know apple is like a famous tech company and mongodb is a popular nosql db

so apple+mongodb = mangodb .. got it .. no .. :(


croachdb.com is available...




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: