
I am looking for an optimal (KV) storage engine that can store operational telemetry temporarily at the source node. Operational telemetry is generated frequently, and similar operations need to be merged frequently (light compaction). Once it reaches a good size (~100 MB), we can transfer it to a dedicated time-series database engine through various mechanisms. I am struggling to find a fast, write-heavy, memory-efficient store for this.

RocksDB seems to tick a few boxes, but there could be a much better solution since we don't need deletes, range scans, or that sort of operation.

Any suggestions?




If you don't need range scans, have you looked at WiscKey-style approaches? https://www.usenix.org/system/files/conference/fast16/fast16...

There are a number of implementations, including Badger (used in Dgraph) and a variant that's "RocksDB for large-value use cases" (https://rocksdb.org/blog/2021/05/26/integrated-blob-db.html).
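
For a rough feel of what the write path looks like, here's a minimal Badger sketch (assuming the dgraph-io/badger/v4 package; the key layout and JSON values are made up purely for illustration):

    package main

    import (
        "fmt"
        "log"

        badger "github.com/dgraph-io/badger/v4"
    )

    func main() {
        // Badger keeps keys in the LSM tree and values in a separate value
        // log (WiscKey-style), which keeps the LSM small for larger values.
        db, err := badger.Open(badger.DefaultOptions("/tmp/telemetry-kv"))
        if err != nil {
            log.Fatal(err)
        }
        defer db.Close()

        // Write a batch of telemetry samples in a single transaction.
        err = db.Update(func(txn *badger.Txn) error {
            for i := 0; i < 100; i++ {
                key := []byte(fmt.Sprintf("api.latency/%d", i)) // hypothetical key layout
                val := []byte(`{"count":1,"sum_ms":12}`)
                if err := txn.Set(key, val); err != nil {
                    return err
                }
            }
            return nil
        })
        if err != nil {
            log.Fatal(err)
        }
    }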


You could store it in a hash and flush it to disk using something like cdb (https://en.wikipedia.org/wiki/Cdb_(software)); there are a few variants and implementations that might do what you want.
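
Something like this accumulate-then-flush shape, sketched with just the Go standard library (the sorted text dump stands in for a real cdb writer; all names here are made up):

    package main

    import (
        "bufio"
        "fmt"
        "os"
        "sort"
    )

    // Aggregate counters in memory, then flush them to an immutable file.
    type agg struct {
        counts map[string]uint64
    }

    func (a *agg) add(key string, n uint64) { a.counts[key] += n }

    // writeSnapshot dumps the counters as sorted "key<TAB>count" lines.
    // A real cdb (or similar) writer would replace this.
    func (a *agg) writeSnapshot(path string) error {
        f, err := os.Create(path)
        if err != nil {
            return err
        }
        defer f.Close()
        w := bufio.NewWriter(f)

        keys := make([]string, 0, len(a.counts))
        for k := range a.counts {
            keys = append(keys, k)
        }
        sort.Strings(keys) // sorted output makes later merging cheap

        for _, k := range keys {
            fmt.Fprintf(w, "%s\t%d\n", k, a.counts[k])
        }
        return w.Flush()
    }

    func main() {
        a := &agg{counts: map[string]uint64{}}
        a.add("GET /users", 1)
        a.add("GET /users", 1)
        if err := a.writeSnapshot("/tmp/telemetry-0001.tsv"); err != nil {
            panic(err)
        }
    }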


VictoriaMetrics can store data at the local node and forward it to a central instance when the network is available.




Deletes (tombstones) shouldn't really get in your way if you don't need them. Similarly, range scans just come for free from SSTables being sorted. Archiving RocksDB SSTable files can be a decent strategy.

One thing to pay attention to: if your telemetry data is indexed by timestamp (i.e. you're writing to the keyspace in order), compaction of the immutable SSTable layers could be wasteful. Although, the author's nice example of non-overlapping SSTable key ranges suggests there may be minimal write amplification here too.


Pipe it to a write-ahead file on the source host, read it back, and store an offset for the reader when you commit. That will be a very efficient solution.
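
Roughly this, as a sketch with just the Go standard library (the record framing and file path are my own assumptions):

    package main

    import (
        "encoding/binary"
        "os"
    )

    // Append-only log: each record is a 4-byte length prefix plus payload.
    // The reader keeps a byte offset; committing means persisting that offset.
    func appendRecord(f *os.File, payload []byte) error {
        var hdr [4]byte
        binary.LittleEndian.PutUint32(hdr[:], uint32(len(payload)))
        if _, err := f.Write(hdr[:]); err != nil {
            return err
        }
        _, err := f.Write(payload)
        return err
    }

    func main() {
        // O_APPEND makes every write go to the end of the file.
        f, err := os.OpenFile("/tmp/telemetry.wal", os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o644)
        if err != nil {
            panic(err)
        }
        defer f.Close()

        _ = appendRecord(f, []byte(`{"op":"GET /users","ms":12}`))
        // Call f.Sync() at whatever durability interval you can afford.
    }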

But really, I would benchmark sqlite first...


Speedb is a great option. Please check us out and join the most active RocksDB community (https://discord.gg/5fVUUtM2cG). You can share your logs there or send them directly to us for analysis. Contact me if you have any questions: dan@speedb.io


I am building my own KV store (www.Didgets.com) that can store 10 million KV pairs in about 4 seconds (on my Ryzen 5950X machine). Those pairs take up about 300 MB of disk space and have about the same memory footprint when loaded. The values cover a variety of data types (strings, integers, doubles, datetime, etc.). The software is still in beta but is available for free download.


I think RocksDB would be a good choice; you really can't beat an LSM design for a write-heavy workload. Depending on how the memtable is implemented, they can write at practically RAM speed. And although you might think you don't need range scans now, they're very useful for any kind of time-series data.


> you really can't beat an LSM design for a write-heavy workload

Depending on the write pattern, you actually can, because standard LSM-trees write the same data repeatedly: into each layer, and again whenever a layer is recompacted. They make up for it because writing so much sequentially can outweigh the cost of writing the same data many times, and because sorting has a cost no matter how you do it. However, if data is written in batches in mostly sequential key order, then LSM-trees suffer a type of write amplification that efficient B-trees largely avoid.

That said, RocksDB deviates from the standard LSM approach when writing in bulk (or can be configured to), which reduces or avoids this form of write amplification.

The optimal balance depends on the write pattern, but neither standard LSM-trees nor B-trees are consistently better than the other for all write patterns.
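
As a back-of-envelope illustration of the "written again into each layer" point (the level count and merge factor below are made-up illustrative numbers, not measurements):

    package main

    import "fmt"

    func main() {
        // Rough leveled-compaction model: pushing a byte into a level
        // rewrites that byte once plus some already-resident bytes it
        // gets merged with. Purely illustrative.
        levels := 6
        mergeFactor := 2.0 // resident bytes rewritten per incoming byte, assumed
        wa := 1.0          // the initial WAL write / memtable flush
        for i := 0; i < levels; i++ {
            wa += 1 + mergeFactor
        }
        fmt.Printf("estimated write amplification: ~%.0fx\n", wa) // ~19x with these numbers
    }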


> Depending on how the memtable is implemented, they can write at practically RAM speed.

Proof needed, I think. When I last looked I could get it to just under 1 GiB/s. The disk itself could do 2-3 GiB/s, and RAM is around 20 GiB/s.

It's fast, but it's definitely a long way off from RAM speed. The reason is that the memtable is quite expensive to maintain: you're constantly keeping a non-trivial amount of data sorted, and that sorting is expensive.


What kind of telemetry exactly?

Maybe I don't understand the problem, but can you not just store it in memory (e.g., with a map from key to current value), update it (for instance, increment it) as you go, and whenever you want to take a time-series sample, push the set of current values onto a vector?
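
Roughly this shape, as a Go sketch (names and structure are made up, and it ignores the crash-safety point raised in the reply):

    package main

    import (
        "maps" // Go 1.21+
        "time"
    )

    // Current values, updated in place as telemetry arrives.
    var current = map[string]int64{}

    // One snapshot per sample interval; the slice is the in-memory time series.
    type sample struct {
        at     time.Time
        values map[string]int64
    }

    var series []sample

    func record(key string, delta int64) { current[key] += delta }

    func snapshot() {
        series = append(series, sample{at: time.Now(), values: maps.Clone(current)})
    }

    func main() {
        record("api.calls", 1)
        record("api.calls", 1)
        snapshot() // take a point-in-time copy of all counters
    }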


It likely needs to survive a crash (you can't lose 100 MB chunks).


Write Apache Arrow/Parquet files directly? A lot of other DBs can ingest those directly anyway.
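
A minimal sketch of what that could look like, assuming the parquet-go/parquet-go library's WriteFile helper and a made-up row schema:

    package main

    import (
        "log"
        "time"

        "github.com/parquet-go/parquet-go"
    )

    // One telemetry row; the schema is derived from the struct fields/tags.
    type Row struct {
        Timestamp int64  `parquet:"timestamp"`
        Operation string `parquet:"operation"`
        Count     int64  `parquet:"count"`
        SumMs     int64  `parquet:"sum_ms"`
    }

    func main() {
        rows := []Row{
            {Timestamp: time.Now().Unix(), Operation: "GET /users", Count: 2, SumMs: 23},
        }
        // Write a batch of rows to a local Parquet file for later ingestion.
        if err := parquet.WriteFile("/tmp/telemetry-0001.parquet", rows); err != nil {
            log.Fatal(err)
        }
    }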



Any reason you can't shove it into Kafka?


Too many network calls. Technically it's feasible; operationally it's expensive for the telemetry use case. For example, imagine we are capturing API telemetry: if there are 1000 API calls per minute per node, we end up making somewhere around 1000*10 calls per minute to Kafka. It's not efficient.


I can assure you from deep experience working on telemetry products that Kafka will handle this load like a champ.

Batching sends under the covers to reduce network round trips is all baked in.

This is also one of the things that most existing telemetry clients handle for you, i.e. batching telemetry in memory and shipping it out on an interval, so there's a great deal of existing work you can draw from, if not outright copy.
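
For example, with the Confluent Go client the batching is just producer configuration; a rough sketch (broker address, topic, and tuning values are placeholders):

    package main

    import (
        "log"

        "github.com/confluentinc/confluent-kafka-go/v2/kafka"
    )

    func main() {
        // librdkafka batches messages per partition under the hood;
        // these settings only tune how long/large those batches get.
        p, err := kafka.NewProducer(&kafka.ConfigMap{
            "bootstrap.servers": "broker:9092", // placeholder address
            "linger.ms":         50,            // wait up to 50ms to fill a batch
            "compression.type":  "lz4",
        })
        if err != nil {
            log.Fatal(err)
        }
        defer p.Close()

        topic := "telemetry"
        for i := 0; i < 1000; i++ {
            // Each Produce call is asynchronous; many calls collapse into few requests.
            err := p.Produce(&kafka.Message{
                TopicPartition: kafka.TopicPartition{Topic: &topic, Partition: kafka.PartitionAny},
                Value:          []byte(`{"op":"GET /users","ms":12}`),
            }, nil)
            if err != nil {
                log.Fatal(err)
            }
        }
        p.Flush(5000) // wait up to 5s for outstanding deliveries
    }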


I didn't catch the part where "Parent wants storage at the source node".

So if the goal is eventually to have the time-series data merged back into a time-series DB, and latency isn't too much of a concern, then wouldn't batch-writing to Kafka (Kinesis, etc.) be tolerable?


Parent wants storage at the source node.





