>We would love to see someone in the etcd community integrate the etcd Jepsen tests directly into the existing etcd release pipeline.
I consider this to be an issue of higher priority than any of the bugs they just found, because it would ensure that preventable bugs don't crop up in the future. It's shocking to me that Jepsen goes through all that effort and then very few projects build a permanent pipeline for it. It's debatable whether these bugs would've existed if a Jepsen pipeline had been consistently in use from the 0.4.x days. I'm sure it's no simple task, but neither is a lot of etcd's existing testing infrastructure.
> It's debatable whether these bugs would've existed if a Jepsen pipeline had been consistently in use from the 0.4.x days.
I don't think it would have helped: the Jepsen tests I wrote in 2014 only checked single-key gets, puts, and CaS operations; the problems we found in this report were in watches and locks.
That's a good point that I hadn't remembered (first-class locks weren't fully baked back then), but I think the intent of my original comment holds: we should have valued the Jepsen test suite more and continuously used and improved it, rather than running one-off tests every now and then. As a result, the community has no idea whether there were regressions between then and now. I'll admit what I don't know: I have no idea whether this was actually feasible or desirable for us or for you at the time, but I'd feel more comfortable if every release of etcd, ZooKeeper, etc. had some kind of Jepsen stamp of approval in terms of API coverage. I'm pretty sure CockroachDB has tried this[0], but it's been a few years and I don't know how it has worked out for them long term.
CockroachDB runs the Jepsen test suite nightly. We've been following along with Aphyr's recent test additions (`multi-register`, for instance, which immediately caught [0]) and porting them over when appropriate. We definitely still have work to do incorporating the more DDL-focused tests that tripped up YugaByte.