I honestly don't know what time-series databases do that's particularly unique. I've worked with databases for 20+ years, and a date or datetime has always been an integral part of the dataset, so to me everything is time-series. I always see them compared against key-value stores, document-oriented databases, or NoSQL platforms, which speaks more to people not knowing how to pick the correct datastore in the first place than to any particular feature of a TSDB.
Even looking at your benchmark queries, I'm confused about what value they provide over a standard OLTP or OLAP setup.
It might help to think about time-series databases from the requirements they're addressing. "Time-Series Database Requirements" [0] is a good summary of the problem space.
The time-series databases I have used are good (and fast) at answering queries like "mean value of sensor_1 for every 10-minute bucket". They can handle the fact that some buckets have 1000 points in them while others have 0 or 1. They can calculate a moving average, again correctly with possibly missing or unevenly spaced values, and they can calculate the rate of change (the derivative) quickly.
Often there is other time-related functionality in there as well, but I think the vast majority of use is fast calculation of "mean/max/first value of sensor(s) for X-second buckets".
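To make that concrete, here is a minimal sketch in Python/pandas of the kinds of computations meant above, run over synthetic, unevenly spaced readings (the sensor name and values are invented for illustration; a TSDB does the equivalent server-side over far more data):

```python
import numpy as np
import pandas as pd

# Synthetic, unevenly spaced readings for a single sensor (purely illustrative).
rng = np.random.default_rng(0)
timestamps = pd.to_datetime("2024-01-01") + pd.to_timedelta(
    np.sort(rng.uniform(0, 6 * 3600, size=5000)), unit="s"
)
readings = pd.Series(rng.normal(20.0, 2.0, size=5000), index=timestamps, name="sensor_1")

# Mean value per 10-minute bucket; buckets with no points come back as NaN.
bucket_mean = readings.resample("10min").mean()

# Moving average over a 30-minute time window, which tolerates uneven spacing.
moving_avg = readings.rolling("30min").mean()

# Rate of change (a discrete derivative): delta(value) / delta(seconds).
dt = readings.index.to_series().diff().dt.total_seconds()
rate = readings.diff() / dt

print(bucket_mean.head())
print(moving_avg.head())
print(rate.head())
```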
> I honestly don't know what time-series databases do that's particularly unique.
I understand the "spirit" of your comment and I agree in general.
However, something unique that comes to mind about some TSDBs is automated aggregation of data points (using min/max/avg/whatever functions) into something less granular once a metric's data points become older than X.
With a TSDB, for example, you might be able to look at data points at a maximum resolution of 10 seconds for metrics collected during the past 7 days, while anything older gets aggregated into e.g. 60-second intervals, and so on.
I think the theory behind it is that you're probably interested in the details for recent stuff, but the older the data gets, the more you just want to look at the general trend without caring about the details. I can agree with that. In the end, by doing these kinds of aggregations, everything should in theory be faster and use less storage (as there are fewer data points for older data).
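A rough sketch of that rollup idea in pandas (purely illustrative; the metric, time spans, and resolutions are made up, and a real TSDB applies retention/rollup policies like this server-side):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Illustrative raw metric: one data point per second for the last 10 days.
now = pd.Timestamp("2024-01-11")
index = pd.date_range(end=now, periods=10 * 24 * 3600, freq="s")
raw = pd.Series(rng.normal(100.0, 5.0, size=len(index)), index=index)

cutoff = now - pd.Timedelta(days=7)

# Recent data (last 7 days): keep a fine 10-second resolution.
recent = raw[raw.index >= cutoff].resample("10s").mean()

# Older data: roll it up into coarser 60-second buckets, keeping only
# the aggregates (mean/min/max) instead of every raw point.
older = raw[raw.index < cutoff].resample("60s").agg(["mean", "min", "max"])

print(len(raw), "raw points")
print(len(recent), "recent 10s buckets +", len(older), "older 60s rollups")
```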
I did use Graphite/Carbon for some years, but I didn't like its architecture much and had some performance problems, so I replaced it with ClickHouse (without doing any kind of data aggregation), and that uses less space and is quicker (it also uses a lot less CPU & I/O) :)
Time series databases are actually a subset of OLAP databases. The main difference between time series databases and OLTP databases is the amount of data stored and processed. While OLTP databases can process billions of rows per node, time series databases can deal with trillions of rows per node.
The main requirements for time series databases:
- Fast data ingestion (millions of rows per second).
- Good compression for the stored data, since the amount of time series data is usually huge (trillions of rows per node). Compression may also improve query speed, since it reduces the amount of data that needs to be read from disk during heavy queries (see the sketch after this list).
- Fast search for time series with the given labels. For instance, searching for temperature measurements across all the sensors in a given country, with millions of temperature sensors.
- Fast processing of the rows in the matched time series over the given time range. The number of rows to process often exceeds hundreds of millions per query.
Typical OLTP databases cannot meet these requirements.
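The compression point is easy to see with a toy example. Below is a sketch of delta-of-delta encoding of timestamps in Python, in the spirit of what Gorilla-style storage or ClickHouse's DoubleDelta codec does (an illustration of the idea, not any database's actual implementation): regularly spaced timestamps collapse into runs of zeros and small integers, which then compress far below 8 bytes per raw timestamp.

```python
def delta_of_delta(timestamps):
    """Toy delta-of-delta encoding: store the first timestamp, the first
    delta, and then only the change in delta for each subsequent point."""
    if len(timestamps) < 2:
        return list(timestamps)
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    dod = [deltas[0]] + [b - a for a, b in zip(deltas, deltas[1:])]
    return [timestamps[0]] + dod


# Metrics scraped every 10 seconds, with one late sample (a 12-second gap).
ts = [1700000000, 1700000010, 1700000020, 1700000030, 1700000042, 1700000052]
print(delta_of_delta(ts))
# [1700000000, 10, 0, 0, 2, -2]
```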
Perhaps I'm wrong, but "time-series" databases are typically some combination of LSM-style append-only logs and eventual consistency. In an ACID relational database, I'm not sure you can simultaneously write and read millions of rows per second? If you can, I'd love to learn something new :)
You can write several million rows per second through the indexing and storage layers while serving consistent reads, but it is not trivial. It requires a pretty sophisticated database kernel design even on modern hardware. LSM-style is not a good choice if you require these write rates. Time-series data models are relatively simple to scale writes for, as such things go.
I would not want to try this on a traditional relational database kernel; they are not designed for workloads that look like this. They optimize their tradeoffs for slower and more complicated transactions.
You can do some pretty good optimizations if you know what type of data is being written to a table, and with Timescale you are converting a specific table into a Timescale hypertable, which gives you different tradeoffs from a standard PostgreSQL table. The end result is really great performance for inserts and queries while maintaining ACID guarantees.
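For what it's worth, the conversion itself is a single call. A minimal sketch from Python via psycopg2 (the table, columns, and connection string below are invented for illustration; create_hypertable() is TimescaleDB's documented function):

```python
import psycopg2

# Hypothetical connection string; adjust for your own PostgreSQL + TimescaleDB setup.
conn = psycopg2.connect("dbname=metrics_db user=postgres")
cur = conn.cursor()

# A plain PostgreSQL table for sensor readings (illustrative schema).
cur.execute("""
    CREATE TABLE IF NOT EXISTS conditions (
        time        TIMESTAMPTZ NOT NULL,
        sensor_id   INTEGER     NOT NULL,
        temperature DOUBLE PRECISION
    );
""")

# Convert it into a TimescaleDB hypertable, partitioned on the time column.
# if_not_exists avoids an error if the table has already been converted.
cur.execute("SELECT create_hypertable('conditions', 'time', if_not_exists => TRUE);")

conn.commit()
cur.close()
conn.close()
```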