1. Can this be used without ClickHouse, as just a ZooKeeper replacement?
2. Am I correct that it's using S3 as a disk? So can it be run as stateless pods in k8s?
3. If it uses S3, how are the latency and cost of PUTs affected? Does every write result in a PUT call to S3?
1. Yes, it can be used with other applications as a ZooKeeper replacement, unless the application uses some unusual ZooKeeper features (there is no Kerberos integration in Keeper, and it does not support TTLs on persistent nodes) or tests for a specific ZooKeeper version.
2. It can be configured to store snapshots and Raft logs (other than the latest log) in S3. It cannot run as a stateless Kubernetes pod - the latest log has to be located on the filesystem.
Although I see you can make a multi-region setup with multiple independent Kubernetes clusters and store logs in tmpfs (which is not 100% wrong from a theoretical standpoint), it is too risky to be practical.
3. Only the snapshots and the older logs can be on S3, so PUT requests happen only on log rotation.
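As a rough sketch, the split described above is expressed through Keeper's disk configuration: older logs and snapshots on an S3-backed disk, the latest log on a local one. The key names below are from my reading of recent ClickHouse Keeper docs and the disk names, bucket, and paths are illustrative - verify against the docs for your version:

```xml
<clickhouse>
    <storage_configuration>
        <disks>
            <!-- hypothetical disk names; endpoint and paths are placeholders -->
            <log_local>
                <type>local</type>
                <path>/var/lib/clickhouse/coordination/logs/</path>
            </log_local>
            <log_s3>
                <type>s3_plain</type>
                <endpoint>https://s3.us-east-1.amazonaws.com/mybucket/keeper/logs/</endpoint>
            </log_s3>
            <snapshot_s3>
                <type>s3_plain</type>
                <endpoint>https://s3.us-east-1.amazonaws.com/mybucket/keeper/snapshots/</endpoint>
            </snapshot_s3>
        </disks>
    </storage_configuration>
    <keeper_server>
        <!-- rotated logs and snapshots go to S3; the latest log must stay local -->
        <log_storage_disk>log_s3</log_storage_disk>
        <latest_log_storage_disk>log_local</latest_log_storage_disk>
        <snapshot_storage_disk>snapshot_s3</snapshot_storage_disk>
    </keeper_server>
</clickhouse>
```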
2. OK. So can I rebuild a cluster with just the state in S3? E.g.: I create a cluster with local disks and S3 backing, and the entire cluster gets deleted. If I recreate the cluster and point it at the same S3 bucket, will it restore its state?
It depends on how the entire cluster gets deleted.
If one out of three nodes disappears, but two out of three are shut down properly and have written the latest snapshot to S3, it will restore correctly.
If two out of three nodes disappear, but one is shut down properly and has written the latest snapshot to S3, and you restore from its snapshot - that is equivalent to a split-brain, and you could lose some of the transactions that were acknowledged on the other two nodes.
If all three nodes suddenly disappear, and you restore from some previous snapshot on S3, you will lose the transactions acknowledged after the time of this snapshot - this is equivalent to restoring from a backup.
TL;DR - Keeper writes the latest log to the filesystem. It does not continuously write data to S3 (it could be tempting, but doing so would give latencies of around 100-500 ms even in the same region, which is comparable to the latency between the most distant AWS regions), it still requires a quorum, and the S3 support gives no magic.
The primary motivation for this feature was to reduce the space needed on SSD/EBS disks.
Some time back, I tried using clickhouse-keeper as a ZooKeeper alternative with a few other systems like Kafka, Mesos, and Solr. I wrote some notes here: https://pradeepchhetri.xyz/clickhousekeeper/
1. Absolutely. clickhouse-keeper is distributed as a standalone static binary and as .deb and .rpm packages. You can use it without ClickHouse as a ZooKeeper replacement.
2. It's not recommended to use slow storage devices for logs in any coordination system (ZooKeeper, clickhouse-keeper, etcd, and so on). A good setup is a small, fast SSD/EBS disk for fresh logs, with old logs and snapshots offloaded to S3. In such a setup, the number of PUT requests will be tiny and latency will be as good as possible.
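To put a number on "tiny": if PUTs happen only on log rotation, the request cost is negligible even at aggressive rotation rates. A back-of-the-envelope sketch, assuming roughly $0.005 per 1,000 PUT requests (illustrative - check current S3 pricing):

```python
# Rough S3 PUT cost estimate when uploads happen only on log rotation.
# PUT_PRICE_PER_1000 is an assumed figure, not official pricing.
PUT_PRICE_PER_1000 = 0.005

def monthly_put_cost(rotations_per_hour: float) -> float:
    """USD cost of one PUT per log rotation over a 30-day month."""
    puts_per_month = rotations_per_hour * 24 * 30
    return puts_per_month / 1000 * PUT_PRICE_PER_1000

# Even rotating once a minute (60/hour, 43,200 PUTs/month) costs pennies.
print(round(monthly_put_cost(60), 4))
```

Compare that with continuously writing every transaction to S3, where both the request count and the per-request latency would dominate.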
1. Yeah, we mention at the end of the post the P99 produce latency is ~400ms.
2. MSK still charges you for networking to produce into the cluster and consume out of it if follower fetch is not properly configured. Also, you still have to more or less manage a Kafka cluster (hot-spotting, partition rebalancing, etc.). In practice, we think WarpStream will be much cheaper to use than MSK for almost all use cases, and significantly easier to manage.
One of the reasons Bazel needs BUILD files with explicit inputs/outputs defined per file is to do fine-grained incremental builds and test runs. So if I change, say, foo.c, I only need to recompile foo.obj and run 'foo-tests'. Moon seems to take globs as input, so modifying even a single file inside the 'src' dir will trigger a rebuild/retest of the entire 'project'.
Our configuration is using globs but under the hood we content hash all the files that match the glob, and only run/cache if the aggregated hash has changed.
For the languages we currently support, this is more than enough. Once we dive deeper into compiled languages (probably starting with Rust), we'll look into more granular reactivity and possibly use something like sccache.
Bazel's glob does the same thing. I'm not sure what the OP means; Bazel's incrementality is aided by fine-grained build-input specification, but at its core it is a combination of deterministic build rules, storage of build-rule execution history, and early stopping.
I don't see how that solves the problem mentioned by the OP. If the build rule mentions foo.c, we only need to recompile that one object and relink. When you are using globs, changing one file changes the aggregated hash, which would then necessitate recompiling every object file.
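The distinction can be sketched as follows (illustrative code, not from any of the tools discussed): by keeping a hash per input file rather than one aggregate, the build can recompile only the objects whose sources actually changed.

```python
# Per-file hash tracking: only sources whose hash changed need recompiling.
# With a single aggregate hash, any change would mark the whole set dirty.
import hashlib
from pathlib import Path

def file_hash(path: Path) -> str:
    return hashlib.sha256(path.read_bytes()).hexdigest()

def dirty_files(sources: list[Path], cache: dict[str, str]) -> list[Path]:
    """Return only the sources whose content changed since the last build."""
    return [p for p in sources if cache.get(str(p)) != file_hash(p)]
```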
Actually, that timestamp is deceptive, since the frame it opens at is the real Jensen. This timestamp shows the CGI Jensen, and it's very obvious from the distance and the animation that it is indeed CGI.
https://youtu.be/eAn_oiZwUXA?t=3761
How do you know which steps to skip or rerun on each run? In your React example, how do you know when to reinstall yarn to the latest version and when to skip it? https://layerci.com/docs/examples/react
One thing that doesn't get talked about with RDS is that the network cost for replicating data in RDS multi-AZ deployments is free. Depending on how much you write to RDS, this cost can dominate CPU/memory costs in non-RDS installations.
If you run Redis, you need at least 25% of extra RAM on top of the instance memory size if you want to avoid a lot of nasty OOM scenarios. Memorystore gives this memory by default; on AWS you need to tweak reserved-memory. https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/...
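The arithmetic behind that headroom is simple (this mirrors the reserve-a-fraction semantics of ElastiCache's reserved-memory setting; the 25% figure is the rule of thumb from the comment, not an official formula): Redis forks for persistence and replication, and copy-on-write plus buffers need slack, so the memory handed to Redis data should sit well below physical RAM.

```python
# Headroom arithmetic: reserve a fraction of instance RAM for fork/COW overhead.
def safe_maxmemory_gib(instance_ram_gib: float, reserved_fraction: float = 0.25) -> float:
    """RAM to allow Redis to use when reserving a fraction for overhead."""
    return instance_ram_gib * (1 - reserved_fraction)

# On an 8 GiB instance, reserving 25% leaves 6 GiB for Redis data.
print(safe_maxmemory_gib(8))  # 6.0
```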
In the comparison-with-Consolas section, it says Consolas is wider than JBMono and JBMono is taller. However, the lines of code in the example run longer in the JBMono version than in the Consolas one.
Why is that? Is the comparison flawed?