
1. Can this be used without ClickHouse, as just a ZooKeeper replacement? 2. Am I correct that it's using S3 as the disk? So can it be run as stateless pods in k8s? 3. If it uses S3, how are latency and costs of PUTs affected? Does every write result in a PUT call to S3?


1. Yes, it can be used with other applications as a ZooKeeper replacement, unless some unusual ZooKeeper features are used (there is no Kerberos integration in Keeper, and it does not support the TTL of persistent nodes) or the application tests for a specific ZooKeeper version.

2. It can be configured to store snapshots, and Raft logs other than the latest log, in S3. It cannot run as a stateless Kubernetes pod - the latest log has to be located on the filesystem.

Although I see you can make a multi-region setup with multiple independent Kubernetes clusters and store logs in tmpfs (which is not 100% wrong from a theoretical standpoint), it is too risky to be practical.

3. Only the snapshots and the previous logs could be on S3, so the PUT requests are done only on log rotation.
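
To make the "only on rotation" point concrete, here is a rough sketch of the offload pattern (not Keeper's actual code - the bucket name, paths, and boto3 usage are illustrative): the active log stays on local disk, and S3 only sees a PUT when a segment is closed or a snapshot is produced.

    import boto3
    from pathlib import Path

    s3 = boto3.client("s3")
    BUCKET = "keeper-cold-storage"  # hypothetical bucket name

    def on_log_rotation(closed_segment: Path, latest_snapshot: Path) -> None:
        """Called only when a log segment is closed - not on every write."""
        # The closed segment is immutable, so one PUT per rotation is enough.
        s3.upload_file(str(closed_segment), BUCKET, f"logs/{closed_segment.name}")
        s3.upload_file(str(latest_snapshot), BUCKET, f"snapshots/{latest_snapshot.name}")
        # The currently active log is NOT uploaded; it stays on the local
        # filesystem, which is why ordinary writes never become S3 PUTs.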


2. OK, so can I rebuild a cluster with just the state in S3? E.g., I create a cluster with local disks and S3 backing, and the entire cluster gets deleted. If I recreate the cluster and point it at the same S3 bucket, will it restore its state?


It depends on how the entire cluster gets deleted.

If one out of three nodes disappears, but two out of three nodes are shut down properly and have written the latest snapshot to S3, it will restore correctly.

If two out of three nodes disappeared, but one out of three nodes was shut down properly and has written the latest snapshot to S3, and you restore from its snapshot - it is equivalent to a split-brain, and you could lose some of the transactions that were acknowledged on the other two nodes.

If all three nodes suddenly disappear, and you restore from some previous snapshot on S3, you will lose the transactions acknowledged after the time of this snapshot - this is equivalent to restoring from a backup.

TLDR - Keeper writes the latest log on the filesystem. It does not continuously write data to S3 (it could be tempting, but if we did, it would give a latency of around 100..500 ms even in the same region, which is comparable to the latency between the most distant AWS regions), it still requires a quorum, and the support of S3 gives no magic.

The primary motivation for such a feature was to reduce the space needed on the SSD/EBS disk.


Some time back, I tried using clickhouse-keeper as a ZooKeeper alternative with a few other systems like Kafka, Mesos, and Solr. Wrote some notes here: https://pradeepchhetri.xyz/clickhousekeeper/


1. Absolutely. clickhouse-keeper is distributed as a standalone static binary, a .deb package, or an .rpm package. You can use it without ClickHouse as a ZooKeeper replacement. 2. It's not recommended to use slow storage devices for logs in any coordination system (ZooKeeper, clickhouse-keeper, etcd, and so on). A good setup is a small, fast SSD/EBS disk for fresh logs, with old logs + snapshots offloaded to S3. In such a setup the number of PUT requests will be tiny and latency will be as good as possible.
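
For point 1, a minimal sketch of what "drop-in ZooKeeper replacement" looks like from an application's side, using the Python kazoo client (the hostname, port, and znode path are placeholders - point it at whatever tcp_port your Keeper config uses, commonly 9181):

    from kazoo.client import KazooClient

    # An ordinary ZooKeeper client pointed at clickhouse-keeper instead.
    zk = KazooClient(hosts="keeper-1:9181")  # hypothetical host and port
    zk.start()

    zk.ensure_path("/my-app/config")
    zk.set("/my-app/config", b"feature_flag=on")
    value, stat = zk.get("/my-app/config")
    print(value, stat.version)

    zk.stop()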


1. Don't producers now have much higher latency, since they have to wait for writes to S3?

2. If the '5-10x cheaper' is mostly due to cross-AZ savings, isn't that offered by the AWS MSK offering too?


(WarpStream founder)

1. Yeah, we mention at the end of the post that the P99 produce latency is ~400ms. 2. MSK still charges you for networking to produce into the cluster and consume out of it if follower fetch is not properly configured. Also, you still have to more or less manage a Kafka cluster (hot spotting, partition rebalancing, etc.). In practice we think WarpStream will be much cheaper to use than MSK for almost all use cases, and significantly easier to manage.
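
For a sense of scale on the cross-AZ point, a back-of-the-envelope sketch - the $0.01/GB-per-direction inter-AZ rate, replication factor 3, and evenly spread clients are assumptions, not figures from the post:

    # Inter-AZ transfer cost for a self-managed, 3-AZ Kafka cluster.
    COST_PER_GB_CROSSED = 0.02  # $0.01 out + $0.01 in per GB crossing an AZ

    def monthly_cross_az_cost(write_mb_per_s: float) -> float:
        gb_per_month = write_mb_per_s * 60 * 60 * 24 * 30 / 1024
        produce = gb_per_month * (2 / 3)  # producer -> leader in another AZ
        replicate = gb_per_month * 2      # leader -> two followers in other AZs
        consume = gb_per_month * (2 / 3)  # drops to ~0 with follower fetch
        return (produce + replicate + consume) * COST_PER_GB_CROSSED

    for mb_s in (10, 50, 100):
        print(f"{mb_s} MB/s sustained -> ~${monthly_cross_az_cost(mb_s):,.0f}/month")

I believe MSK does not bill the broker-to-broker replication leg, which leaves the client produce/consume traffic that the parent comment is pointing at.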


How does the cost compare if follower fetch is properly configured?


1. What payload size and flush interval is that latency measured against?


1. By payload size do you mean record size? They're ~1KiB. 2. The flush interval was 100ms; the agent defaults to 250ms though, I believe.


One of the reasons Bazel needs BUILD files with explicit inputs/outputs defined per file is to do fine-grained incremental builds and test runs. So if I change, say, foo.c, I only need to recompile foo.obj and run ‘foo-tests’. Moon seems to take globs as input, so modifying even a single file inside the ‘src’ dir will trigger a rebuild/retest of the entire ‘project’.


Our configuration is using globs but under the hood we content hash all the files that match the glob, and only run/cache if the aggregated hash has changed.

For the languages we currently support, this is more than enough. Once we dive deeper into compiled languages (probably starting with Rust), we'll look into more granular reactivity and possibly use something like sccache.
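
A rough sketch of the aggregated-hash approach described above (the glob pattern and hashing scheme are illustrative, not moon's actual implementation):

    import glob
    import hashlib
    from pathlib import Path

    def aggregated_hash(pattern: str) -> str:
        """Hash every file matching the glob into one digest for the target."""
        digest = hashlib.sha256()
        for path in sorted(glob.glob(pattern, recursive=True)):
            p = Path(path)
            if p.is_file():
                digest.update(path.encode())   # file identity
                digest.update(p.read_bytes())  # file contents
        return digest.hexdigest()

    previous = "..."  # loaded from the cache written by the last run
    current = aggregated_hash("src/**/*")
    if current == previous:
        print("skip: nothing under src/ changed")
    else:
        print("rerun the whole target (coarse granularity, as the OP notes)")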


Bazel's glob does the same thing. I'm not sure what the OP means; Bazel's incrementality is aided by fine-grained build input specification, but its incrementality is at its core a combination of deterministic build rules, storage of build rule execution history, and early stopping.


I don't see how that solves the problem mentioned by the OP. If the build rule mentions foo.c, we only need to recompile that one object and relink. When you are using globs, changing one file changes the aggregated hash, which would then necessitate recompiling every object file.


Actually that timestamp is deceptive, since the frame it opens at is the real Jensen. This timestamp shows the CGI Jensen, and it's very obvious from the distance and animation that it is indeed CGI. https://youtu.be/eAn_oiZwUXA?t=3761


And the super weird body language. It's cool that they tried but really this is so far from what can be done using video editing tools today.


Is it just me or does it sound and look a bit out of sync?


It's not just you, it's laughably bad.


Wow. This is quite bad. It looks like animation from the late 90s.


Note that this is only for 14 seconds of the video, where it is very obvious that it is indeed a CGI figure.


I do think Facebook has the correct direction on this.


How do you know which steps to skip or rerun on each run? In your React example, how do you know when to reinstall yarn to the latest version and when to skip? https://layerci.com/docs/examples/react


We monitor which files are read by which steps and map that back to the changes you've made automatically!
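
A hedged sketch of that idea - the tracing mechanism itself is assumed; this only shows mapping changed files back to the steps that read them (which also answers the yarn question: the install step reruns only when its inputs change):

    import hashlib
    from pathlib import Path

    # Recorded on the previous run: the files each step was observed reading.
    step_reads = {
        "yarn install": ["package.json", "yarn.lock"],
        "yarn build":   ["package.json", "src/App.js", "src/index.js"],
        "yarn test":    ["package.json", "src/App.js", "src/App.test.js"],
    }
    old_hashes: dict[str, str] = {}  # file -> sha256 from the previous run's cache

    def file_hash(path: str) -> str:
        p = Path(path)
        return hashlib.sha256(p.read_bytes()).hexdigest() if p.is_file() else ""

    changed = {f for files in step_reads.values() for f in files
               if file_hash(f) != old_hashes.get(f)}

    for step, files in step_reads.items():
        action = "rerun" if changed.intersection(files) else "skip"
        print(f"{action}: {step}")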


One thing that doesn't get talked about with RDS is that the network cost for replicating data in RDS multi-AZ deployments is free. Depending on how much you write to RDS, this cost can dominate CPU/memory costs on non-RDS installations.
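
A rough illustration of the scale (the $0.01/GB-per-direction inter-AZ rate is the usual figure; the write volume is made up):

    # Self-managed primary + standby in another AZ: every byte of WAL/binlog
    # shipped to the standby crosses an AZ boundary and is billed both ways.
    COST_PER_GB = 0.02       # $0.01 out + $0.01 in, assumed rate
    write_mb_per_s = 20      # hypothetical sustained write volume

    gb_per_month = write_mb_per_s * 60 * 60 * 24 * 30 / 1024
    print(f"~{gb_per_month:,.0f} GB/month replicated")
    print(f"~${gb_per_month * COST_PER_GB:,.0f}/month in cross-AZ transfer")
    # On RDS multi-AZ this particular line item is $0.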


What do you mean by ‘extra memory overhead’?


If you run Redis, you need at least 25% of extra RAM on top of the instance memory size if you want to avoid a lot of nasty OOM scenarios. Memorystore gives this headroom by default, and in AWS you need to tweak reserved-memory. https://docs.aws.amazon.com/AmazonElastiCache/latest/red-ug/...
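
On the AWS side, the knob being referred to is the reserved-memory / reserved-memory-percent parameter on the cache parameter group. A hedged boto3 sketch (the parameter group name is hypothetical, and 25 matches the rule of thumb above):

    import boto3

    elasticache = boto3.client("elasticache")

    # Reserve 25% of the node's memory for overhead (forks, buffers, etc.)
    # instead of letting Redis's maxmemory consume nearly all of it.
    elasticache.modify_cache_parameter_group(
        CacheParameterGroupName="my-redis-params",  # hypothetical group name
        ParameterNameValues=[
            {"ParameterName": "reserved-memory-percent", "ParameterValue": "25"},
        ],
    )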


In the comparison with Consolas section, it says Consolas is wider than JBMono and JBMono is taller. However, the lines of code in the example run longer in the JBMono version than in Consolas. Why is that? Is the comparison flawed?

