I believe a quantile digest (q-digest), or something similar, is what's wanted here.

From what I understand, it's a fixed-size data structure that represents quantiles as a tree of histogram bands, pruning sparsely populated nodes by folding their counts into their parents to trade error against size. Digests can also be merged and re-compressed, which lets you roll per-second data up into per-minute data, or compress accurate (large) archival digests into smaller ones, say, to support stable streaming of a varying number of metrics over a link of varying bandwidth by sacrificing quality.
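
To make that concrete, here is a minimal Python sketch of a q-digest as I understand the published design: a binary tree over a fixed integer range, where a node and its sibling are folded into their parent whenever the three together hold at most n // k values. The class and method names (`QDigest`, `compress`, `merge`, `shrink`) are made up for illustration, and this is a toy, not a tuned implementation.

```python
class QDigest:
    """Toy q-digest over the integers 0 .. 2**depth - 1 (illustrative only)."""

    def __init__(self, depth, k):
        self.depth = depth   # tree height; the value range is fixed up front
        self.k = k           # compression factor: larger k keeps more nodes
        self.n = 0           # number of values summarised so far
        self.counts = {}     # heap-style node id -> count (root is 1)

    def _range(self, node):
        """Inclusive (lo, hi) value range covered by a node."""
        level = node.bit_length() - 1
        width = 1 << (self.depth - level)
        lo = (node - (1 << level)) * width
        return lo, lo + width - 1

    def add(self, value, weight=1):
        if not 0 <= value < (1 << self.depth):
            raise ValueError("value outside the fixed range")
        leaf = (1 << self.depth) + value
        self.counts[leaf] = self.counts.get(leaf, 0) + weight
        self.n += weight

    def compress(self):
        """Fold node + sibling into the parent when the three together
        hold at most n // k values, working from the leaves upward."""
        limit = self.n // self.k
        for level in range(self.depth, 0, -1):
            at_level = [v for v in self.counts if v.bit_length() - 1 == level]
            for node in at_level:
                if node not in self.counts:   # already folded as a sibling
                    continue
                parent, sibling = node >> 1, node ^ 1
                total = (self.counts[node]
                         + self.counts.get(sibling, 0)
                         + self.counts.get(parent, 0))
                if total <= limit:
                    self.counts[parent] = total
                    del self.counts[node]
                    self.counts.pop(sibling, None)

    def merge(self, other):
        """Add another digest's counts node by node, then re-compress."""
        if (self.depth, self.k) != (other.depth, other.k):
            raise ValueError("digests must share depth and k")
        for node, count in other.counts.items():
            self.counts[node] = self.counts.get(node, 0) + count
        self.n += other.n
        self.compress()

    def shrink(self, k):
        """Re-compress with a smaller k: fewer nodes, coarser answers."""
        self.k = k
        self.compress()

    def quantile(self, q):
        """Walk nodes by upper bound (narrower ranges first on ties) and
        return the upper bound of the node where the running count reaches q * n."""
        target = q * self.n
        seen = 0
        order = sorted(self.counts,
                       key=lambda v: (self._range(v)[1], -self._range(v)[0]))
        for node in order:
            seen += self.counts[node]
            if seen >= target:
                return self._range(node)[1]
        return None  # empty digest
```

Rolling second data up into minute data would then be sixty `merge` calls into one digest, and `shrink` is the "sacrifice quality for size" knob: after it, quantile answers snap to wider bands.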

They're pretty simple because they were designed for sensor networks, but I think you could design similar structures with a dynamic rather than fixed value range, and with variable size (pruning nodes based on an error threshold instead of, or in addition to, a target size).
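
One way such a variant might look, as a purely hypothetical Python sketch (nothing here comes from the q-digest paper itself): nodes are keyed by the dyadic interval they cover, so the range can double on demand without renumbering anything, and the fold limit is derived from a rank-error budget `eps` instead of a size parameter. The name `GrowableDigest` and the choice of `eps * n / depth` as the limit are my assumptions, modelled on the usual "each tree level can misplace a bounded number of values" argument.

```python
class GrowableDigest:
    """Hypothetical q-digest variant over non-negative integers: the range
    doubles on demand and pruning follows an error budget, not a size target."""

    def __init__(self, eps):
        self.eps = eps     # tolerated rank error as a fraction of n (assumption)
        self.depth = 1     # current range is 0 .. 2**depth - 1
        self.n = 0
        self.counts = {}   # (lo, width) dyadic interval -> count

    def add(self, value):
        if value < 0:
            raise ValueError("only non-negative integers in this sketch")
        # Grow the range by doubling; existing intervals stay valid as-is.
        while value >= (1 << self.depth):
            self.depth += 1
        key = (value, 1)
        self.counts[key] = self.counts.get(key, 0) + 1
        self.n += 1

    def compress(self):
        """Fold node + sibling into the parent while the combined count stays
        within the per-level share of the error budget."""
        limit = int(self.eps * self.n / self.depth)
        width = 1
        while width < (1 << self.depth):
            for lo, w in [key for key in self.counts if key[1] == width]:
                if (lo, w) not in self.counts:   # already folded as a sibling
                    continue
                sibling = (lo ^ w, w)
                parent = (lo - lo % (2 * w), 2 * w)
                total = (self.counts[(lo, w)]
                         + self.counts.get(sibling, 0)
                         + self.counts.get(parent, 0))
                if total <= limit:
                    self.counts[parent] = total
                    del self.counts[(lo, w)]
                    self.counts.pop(sibling, None)
            width *= 2

    def quantile(self, q):
        """Upper bound of the interval where the running count reaches q * n."""
        target = q * self.n
        seen = 0
        # Order by upper bound, narrower intervals first on ties.
        for lo, w in sorted(self.counts, key=lambda k: (k[0] + k[1] - 1, k[1])):
            seen += self.counts[(lo, w)]
            if seen >= target:
                return lo + w - 1
        return None  # empty digest
```

With this rule the node count is no longer fixed: widening the range (larger `depth`) or tightening `eps` lowers the fold limit, so the digest keeps more nodes rather than losing accuracy.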

If anyone knows of a time-series system using something like this, I'd love to learn about it.



