
I am looking for an optimal (KV) storage engine that can store operational telemetry temporarily at the source node. Operational telemetry is generated frequently, and similar operations need to be merged frequently (light compaction). Once it reaches a good size (~100 MB), we can transfer it to a dedicated time-series database engine through various mechanisms. I am struggling to find a fast, write-heavy, memory-efficient store for this.

RocksDB seems to tick a few boxes, but there could be a much better solution since we don't need deletes, range scans, or that sort of operation.

Any suggestions?




If you don't need range scans, have you looked at WiscKey-style approaches? https://www.usenix.org/system/files/conference/fast16/fast16...

There are a number of implementations, including Badger (used in Dgraph) and a variant that's "RocksDB for large-value use cases" (https://rocksdb.org/blog/2021/05/26/integrated-blob-db.html).
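
For a rough feel of what the write path looks like, here's a minimal Badger sketch (assuming the dgraph-io/badger/v4 package; the key layout and JSON values are made up purely for illustration):

    package main

    import (
        "fmt"
        "log"

        badger "github.com/dgraph-io/badger/v4"
    )

    func main() {
        // Badger keeps keys in the LSM tree and values in a separate value
        // log (WiscKey-style), which keeps the LSM small for larger values.
        db, err := badger.Open(badger.DefaultOptions("/tmp/telemetry-kv"))
        if err != nil {
            log.Fatal(err)
        }
        defer db.Close()

        // Write a batch of telemetry samples in a single transaction.
        err = db.Update(func(txn *badger.Txn) error {
            for i := 0; i < 100; i++ {
                key := []byte(fmt.Sprintf("api.latency/%d", i)) // hypothetical key layout
                val := []byte(`{"count":1,"sum_ms":12}`)
                if err := txn.Set(key, val); err != nil {
                    return err
                }
            }
            return nil
        })
        if err != nil {
            log.Fatal(err)
        }
    }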


You could store it in a hash and flush it to disk using something like cdb (https://en.wikipedia.org/wiki/Cdb_(software)); there are a few variants and implementations that might do what you want.
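
Something like this accumulate-then-flush shape, sketched with just the Go standard library (the sorted text dump stands in for a real cdb writer; all names here are made up):

    package main

    import (
        "bufio"
        "fmt"
        "os"
        "sort"
    )

    // Aggregate counters in memory, then flush them to an immutable file.
    type agg struct {
        counts map[string]uint64
    }

    func (a *agg) add(key string, n uint64) { a.counts[key] += n }

    // writeSnapshot dumps the counters as sorted "key<TAB>count" lines.
    // A real cdb (or similar) writer would replace this.
    func (a *agg) writeSnapshot(path string) error {
        f, err := os.Create(path)
        if err != nil {
            return err
        }
        defer f.Close()
        w := bufio.NewWriter(f)

        keys := make([]string, 0, len(a.counts))
        for k := range a.counts {
            keys = append(keys, k)
        }
        sort.Strings(keys) // sorted output makes later merging cheap

        for _, k := range keys {
            fmt.Fprintf(w, "%s\t%d\n", k, a.counts[k])
        }
        return w.Flush()
    }

    func main() {
        a := &agg{counts: map[string]uint64{}}
        a.add("GET /users", 1)
        a.add("GET /users", 1)
        if err := a.writeSnapshot("/tmp/telemetry-0001.tsv"); err != nil {
            panic(err)
        }
    }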


VictoriaMetrics can store data at the local node and forward it to a central instance when the network is available.




Deletes (tombstones) shouldn't really get in your way if you don't need them. Similarly, range scans just come for free from SSTables being sorted. Archiving RocksDB SSTable files can be a decent strategy.

One thing to pay attention to: if your telemetry data is indexed by timestamp (i.e. you're writing to the keyspace in order), compaction of the immutable SSTable layers could be wasteful. Although, the author's nice example of non-overlapping SSTable key ranges suggests there may be minimal write amplification here too.


Pipe it to a write-ahead file on the source host, read it back, and store an offset for the reader when you commit. That will be a very efficient solution.
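
Roughly this, as a sketch with just the Go standard library (the record framing and file path are my own assumptions):

    package main

    import (
        "encoding/binary"
        "os"
    )

    // Append-only log: each record is a 4-byte length prefix plus payload.
    // The reader keeps a byte offset; committing means persisting that offset.
    func appendRecord(f *os.File, payload []byte) error {
        var hdr [4]byte
        binary.LittleEndian.PutUint32(hdr[:], uint32(len(payload)))
        if _, err := f.Write(hdr[:]); err != nil {
            return err
        }
        _, err := f.Write(payload)
        return err
    }

    func main() {
        // O_APPEND makes every write go to the end of the file.
        f, err := os.OpenFile("/tmp/telemetry.wal", os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o644)
        if err != nil {
            panic(err)
        }
        defer f.Close()

        _ = appendRecord(f, []byte(`{"op":"GET /users","ms":12}`))
        // Call f.Sync() at whatever durability interval you can afford.
    }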

But really, I would benchmark sqlite first...


Speedb is a great option. Please check us out and join the most active RocksDB community (https://discord.gg/5fVUUtM2cG). You can share your logs there or send them directly to us for analysis. Contact me if you have any questions: dan@speedb.io


I am building my own KV store (www.Didgets.com) that can store 10 million KV pairs in about 4 seconds (on my Ryzen 5950X machine). Those pairs take up about 300 MB of disk space and have about the same memory footprint when loaded. The values cover a variety of data types (strings, integers, doubles, datetime, etc.). The software is still in beta but is available for free download.


I think RocksDB would be a good choice; you really can't beat an LSM design for a write-heavy workload. Depending on how the memtable is implemented, they can write at practically RAM speed. And although you might think you don't need range scans now, they're very useful for any kind of time-series data.


> you really can't beat an LSM design for a write-heavy workload

Depending on the write pattern, you actually can, because standard LSM-trees write the same data repeatedly: into each layer, and again whenever a layer is recompacted. They make up for it because writing so much sequentially can outweigh the cost of writing the same data many times, and because sorting has a cost no matter how you do it. However, if data is written in batches in mostly sequential key order, then LSM-trees suffer a type of write amplification that efficient B-trees largely avoid.

That said, RocksDB deviates from the standard LSM approach when writing in bulk (or can be configured to), which reduces or avoids this form of write amplification.

The optimal balance depends on the write pattern, but neither standard LSM-trees nor B-trees are consistently better than the other for all write patterns.
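
As a back-of-envelope illustration of the "written again into each layer" point (the level count and merge factor below are made-up illustrative numbers, not measurements):

    package main

    import "fmt"

    func main() {
        // Rough leveled-compaction model: pushing a byte into a level
        // rewrites that byte once plus some already-resident bytes it
        // gets merged with. Purely illustrative.
        levels := 6
        mergeFactor := 2.0 // resident bytes rewritten per incoming byte, assumed
        wa := 1.0          // the initial WAL write / memtable flush
        for i := 0; i < levels; i++ {
            wa += 1 + mergeFactor
        }
        fmt.Printf("estimated write amplification: ~%.0fx\n", wa) // ~19x with these numbers
    }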


> Depending on how the memtable is implemented, they can write at practically RAM speed.

Proof needed, I think. When I last looked I could get it to just under 1 GiB/s. The disk itself could do 2-3 GiB/s, and RAM is around 20 GiB/s.

It's fast, but it's definitely a long way off from RAM speed. The reason is that the memtable is quite expensive to maintain: you're constantly keeping a non-trivial amount of data sorted, and that sorting is expensive.


What kind of telemetry exactly?

Maybe I don't understand the problem, but can you not just store it in memory (e.g., with a map from key to current value), update it (for instance, increment it) as you go, and whenever you want to take a time-series sample, push the set of current values onto a vector?
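
Roughly this shape, as a Go sketch (names and structure are made up, and it ignores the crash-safety point raised in the reply):

    package main

    import (
        "maps" // Go 1.21+
        "time"
    )

    // Current values, updated in place as telemetry arrives.
    var current = map[string]int64{}

    // One snapshot per sample interval; the slice is the in-memory time series.
    type sample struct {
        at     time.Time
        values map[string]int64
    }

    var series []sample

    func record(key string, delta int64) { current[key] += delta }

    func snapshot() {
        series = append(series, sample{at: time.Now(), values: maps.Clone(current)})
    }

    func main() {
        record("api.calls", 1)
        record("api.calls", 1)
        snapshot() // take a point-in-time copy of all counters
    }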


It likely needs to survive a crash (you can't lose 100 MB chunks).


Write Apache Arrow/Parquet files directly? A lot of other DBs can ingest those directly anyway.
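
A minimal sketch of what that could look like, assuming the parquet-go/parquet-go library's WriteFile helper and a made-up row schema:

    package main

    import (
        "log"
        "time"

        "github.com/parquet-go/parquet-go"
    )

    // One telemetry row; the schema is derived from the struct fields/tags.
    type Row struct {
        Timestamp int64  `parquet:"timestamp"`
        Operation string `parquet:"operation"`
        Count     int64  `parquet:"count"`
        SumMs     int64  `parquet:"sum_ms"`
    }

    func main() {
        rows := []Row{
            {Timestamp: time.Now().Unix(), Operation: "GET /users", Count: 2, SumMs: 23},
        }
        // Write a batch of rows to a local Parquet file for later ingestion.
        if err := parquet.WriteFile("/tmp/telemetry-0001.parquet", rows); err != nil {
            log.Fatal(err)
        }
    }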



Any reason you can't shove it into Kafka?


Too many network calls. Technically it's feasible; operationally it's expensive for the telemetry use case. For example, imagine we are capturing API telemetry: if there are 1000 API calls per minute per node, we end up making somewhere around 1000*10 calls per minute to Kafka. It's not efficient.


I can assure you from deep experience working on telemetry products that Kafka will handle this load like a champ.

Batching sends under the covers to reduce network round trips is all baked in.

This is also one of the things that most existing telemetry clients handle for you, i.e. batching telemetry in memory and shipping it out on an interval, so there's a great deal of existing work you can draw from, if not outright copy.
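
For example, with the Confluent Go client the batching is just producer configuration; a rough sketch (broker address, topic, and tuning values are placeholders):

    package main

    import (
        "log"

        "github.com/confluentinc/confluent-kafka-go/v2/kafka"
    )

    func main() {
        // librdkafka batches messages per partition under the hood;
        // these settings only tune how long/large those batches get.
        p, err := kafka.NewProducer(&kafka.ConfigMap{
            "bootstrap.servers": "broker:9092", // placeholder address
            "linger.ms":         50,            // wait up to 50ms to fill a batch
            "compression.type":  "lz4",
        })
        if err != nil {
            log.Fatal(err)
        }
        defer p.Close()

        topic := "telemetry"
        for i := 0; i < 1000; i++ {
            // Each Produce call is asynchronous; many calls collapse into few requests.
            err := p.Produce(&kafka.Message{
                TopicPartition: kafka.TopicPartition{Topic: &topic, Partition: kafka.PartitionAny},
                Value:          []byte(`{"op":"GET /users","ms":12}`),
            }, nil)
            if err != nil {
                log.Fatal(err)
            }
        }
        p.Flush(5000) // wait up to 5s for outstanding deliveries
    }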


I didn't catch the part where "Parent wants storage at the source node".

So if the goal is eventually to have the time-series data merged back into a time-series DB, and latency isn't too much of a concern, then wouldn't batch-writing to Kafka (Kinesis, etc.) be tolerable?


Parent wants storage at the source node.





