> I work at CockroachDB on the implementation of the Postgres protocol, and I'm a maintainer of lib/pq, a Go client implementation of the protocol.

Thanks for replying!

> Overall I think it is a very good protocol. I can't come up with any gripes I have with it, nor do I remember any "it would be nice if" moments when adding support for some of the various parts of it. There's certainly some difficulty in some of the more complicated parts, like managing all of the type hinting back and forth for prepared statements. But in general I found the documentation to be complete and well done.

Cool. I've more complaints than you in that case :) Although drivers, especially libpq (no pipelining, no mixed binary/text results), are more frequently an issue.

One thing I was wondering is whether you might also be interested in something like 'commit identifiers' (WAL LSNs in pg's case) for slightly relaxed consistency models. That would allow things like committing on one node and then, when later querying a replica, specifying that the data needs to have replicated at least up to that identifier.

> The one thing that we did have trouble with was figuring out the exact formats for all types when using the binary encoding format. We wrote some code to sniff real Postgres servers so we could figure out what was going on. It would have been easier, obviously, if it was documented, but this wasn't a huge deal since it is likely a rarely needed thing.

Yea, I think that's a fair complaint. I wasn't around yet when the current binary format was devised (2003ish IIRC?), so I don't know how we ended up deciding that the code was the documentation for it.




> Although drivers, especially libpq (no pipelining, no mixed binary/text results), are more frequently an issue.

The lib/pq driver is some pretty old Go code and has a number of similar problems. We try to keep it moving along, but it's got some stuff that makes it hard to work on.

> One thing I was wondering is whether you might also be interested in something like 'commit identifiers' (WAL LSNs in pg's case) for slightly relaxed consistency models. That would allow things like committing on one node and then, when later querying a replica, specifying that the data needs to have replicated at least up to that identifier.

CockroachDB is always serializable. It is linearizable on individual keys. We've discussed a causality token before, which could be passed to other transactions or connections and which I think would allow for stronger linearizability guarantees. If I'm understanding your question correctly, then yes, I think these would be very useful to us. (I've asked someone more familiar with our consistency model to address this further, because I'm not confident in the accuracy of my reply.)


(Cockroach Labs CTO) Yes, the commit identifiers sound like something we could use, as long as they're not tied too closely to pg's WAL LSNs. For us, the token would be a 96-bit hybrid logical timestamp, and we'd want semantics roughly similar to HTTP cookies: the server sends it back with the response, and the client then includes it on any future requests. With our current implementation the client would only need to send the token on new connections, but leaving room for it on a per-request basis seems like a good idea.

Also on the subject of the protocol, we've run into a need for a (server-initiated) ping message (https://github.com/cockroachdb/cockroach/pull/10188).


> (Cockroach Labs CTO) Yes, the commit identifiers sound like something we could use, as long as they're not tied too closely to pg's WAL LSNs.

There was some discussion around this before, and the idea was basically to just include some text payload after COMMIT. PG would produce something like "COMMIT 343/3f0037a0", but the spec for it would just be text. Some drivers might do something with the knowledge that it looks like an LSN, but that's hard to avoid.
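
For illustration, here's roughly what that looks like today from a Go client using plain SQL instead of anything protocol-level, which is more or less what such a commit identifier would automate. A sketch only: PG 10+ function names, made-up table name and connection strings, error handling trimmed.

    package main

    import (
        "database/sql"
        "time"

        _ "github.com/lib/pq"
    )

    // commitAndGetLSN writes on the primary, then grabs the primary's current
    // WAL position to use as a "commit identifier" for later reads.
    func commitAndGetLSN(primary *sql.DB) (string, error) {
        // "events" is a made-up example table.
        if _, err := primary.Exec(`INSERT INTO events (payload) VALUES ('hello')`); err != nil {
            return "", err
        }
        var lsn string
        err := primary.QueryRow(`SELECT pg_current_wal_lsn()`).Scan(&lsn)
        return lsn, err
    }

    // waitForLSN polls a replica until it has replayed at least that LSN,
    // after which reads there are guaranteed to see the earlier write.
    func waitForLSN(replica *sql.DB, lsn string) error {
        for {
            var caughtUp bool
            err := replica.QueryRow(
                `SELECT pg_last_wal_replay_lsn() >= $1::pg_lsn`, lsn).Scan(&caughtUp)
            if err != nil {
                return err
            }
            if caughtUp {
                return nil
            }
            time.Sleep(10 * time.Millisecond)
        }
    }

    func main() {
        primary, _ := sql.Open("postgres", "host=primary dbname=app sslmode=disable")
        replica, _ := sql.Open("postgres", "host=replica dbname=app sslmode=disable")
        lsn, _ := commitAndGetLSN(primary)
        _ = waitForLSN(replica, lsn)
        // Subsequent queries on the replica now see the committed write.
    }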

> Also on the subject of the protocol, we've run into a need for a (server-initiated) ping message (https://github.com/cockroachdb/cockroach/pull/10188).

Maybe I'm missing something here (I admittedly only skimmed the ticket and the things it references), but shouldn't server-triggered TCP keepalives be sufficient for this case? In Postgres that's configured using tcp_keepalives_idle/tcp_keepalives_interval/tcp_keepalives_count. Several drivers (including at least libpq and pgjdbc) can do the same to protect against vanishing servers.
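
For example, on the server side and from libpq respectively (values made up; pgjdbc exposes a similar tcpKeepAlive option):

    # postgresql.conf: server-triggered keepalives to detect dead clients
    tcp_keepalives_idle = 60        # seconds of idle before the first probe
    tcp_keepalives_interval = 10    # seconds between probes
    tcp_keepalives_count = 5        # unanswered probes before the connection is dropped

    # libpq connection string: client-triggered keepalives to detect vanished servers
    host=db.example.com dbname=app keepalives=1 keepalives_idle=60 keepalives_interval=10 keepalives_count=5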

Doing keepalives at the pg wire protocol level has some advantages, but also some issues. Namely, you have to be ready to send pings/pongs back and forth in almost any protocol state, which isn't necessarily easy if there are partial writes and such.


Ah, right. We did turn on TCP keepalives, so we probably don't need a protocol-level ping (although we often deploy behind haproxy, so TCP keepalives need to be configured at each hop).


Similarly, TCP-level keepalives aren't necessarily sufficient if there are intermediate "pgwire"-level connection poolers like pgbouncer. Nor do they protect against applications that are effectively hung somewhere.

So I do think "pgwire"-level pings make some sense; it's just not exactly straightforward to add them :/


> CockroachDB is always serializable. It is linearizable on individual keys.

My expectation would be that at some point you'd want to be able to optionally relax that (on a per-SQL-transaction or per-table basis maybe?). E.g. on a "counter" table you might instead want to use something CRDT-like, because you don't want to incur the latency of having to contact other servers. On an append-only table without problematic constraints it can be very interesting to do so too; that's particularly useful for logging & timeseries data. Not saying that has to have any sort of priority ;)
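
To make the CRDT-like counter idea concrete, one well-known way to approximate it at the SQL level is a sharded counter: each node only ever increments its own shard row (increments commute, so no cross-node coordination is needed at write time), and readers sum the shards. A sketch only; the table, column, and node names are made up.

    package main

    import (
        "database/sql"
        "fmt"

        _ "github.com/lib/pq"
    )

    // Assumed (hypothetical) schema:
    //   CREATE TABLE page_views (node_id text PRIMARY KEY, hits bigint NOT NULL);

    // bumpCounter increments only this node's shard of the counter.
    func bumpCounter(db *sql.DB, nodeID string) error {
        _, err := db.Exec(`
            INSERT INTO page_views (node_id, hits) VALUES ($1, 1)
            ON CONFLICT (node_id) DO UPDATE SET hits = page_views.hits + 1`,
            nodeID)
        return err
    }

    // readCounter merges the shards by summing them.
    func readCounter(db *sql.DB) (int64, error) {
        var total int64
        err := db.QueryRow(`SELECT coalesce(sum(hits), 0)::bigint FROM page_views`).Scan(&total)
        return total, err
    }

    func main() {
        db, _ := sql.Open("postgres", "dbname=app sslmode=disable")
        _ = bumpCounter(db, "node-1")
        total, _ := readCounter(db)
        fmt.Println("total views:", total)
    }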



