Just wanted to say I am super impressed with the work TimescaleDB has been doing.
Previously, at NGINX, I was part of a team that built out a sharded time series database on Postgres 9.4. When I left it was ingesting ~2 TB of monitoring data a day (so not super large, but not trivial either).
Currently I have built out a data warehouse using Postgres 11 and Citus. The only reason I didn't use TimescaleDB was its lack of multi-node support as of October of last year.
I sort of view TimescaleDB as the next evolution of this style of Postgres scaling. I think in a year or so I will be very seriously looking at migrating to TimescaleDB, but for now Citus is adequate (with some rough edges) for our needs.
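To make "this style of Postgres scaling" concrete: both systems extend vanilla Postgres with a single function call that turns an ordinary table into a sharded (Citus) or time-chunked (TimescaleDB) one. A minimal sketch, assuming psycopg2, placeholder connection strings, and a hypothetical `events` table:

```python
import psycopg2

# Hypothetical schema shared by both examples.
DDL = """
CREATE TABLE events (
    time      timestamptz NOT NULL,
    device_id bigint      NOT NULL,
    value     double precision
);
"""

# Citus: shard the table across worker nodes by a distribution column.
with psycopg2.connect("dbname=warehouse") as conn, conn.cursor() as cur:
    cur.execute(DDL)
    cur.execute("SELECT create_distributed_table('events', 'device_id');")

# TimescaleDB: turn the same table into a hypertable, chunked by time.
with psycopg2.connect("dbname=metrics") as conn, conn.cursor() as cur:
    cur.execute(DDL)
    cur.execute("SELECT create_hypertable('events', 'time');")
```

In both cases the table still looks like a plain Postgres table to queries; the extension handles routing and partitioning underneath, which is what makes migrating between the two styles plausible.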
If you're looking at scaling monitoring time series data and want a more Availability-leaning architecture (in the CAP theorem sense) with respect to replication (i.e. quorum write/read replication, strictly not a leader/follower, active/passive architecture), then you might also want to check out the Apache 2-licensed projects M3 and M3DB at m3db.io.
I am obviously biased as a contributor. Having said that, I think it's always worth understanding active/passive replication and its implications, and seeing how other solutions handle this scaling and reliability problem, to better understand the underlying challenges you will face with instance upgrades, failover, and failures in a cluster.
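For a sense of what quorum write/read replication buys you, here is a minimal Python sketch (my own toy illustration, not M3DB's actual implementation; `Replica` and `QuorumClient` are hypothetical names): a write or read succeeds once a majority of replicas respond, so no node is a passive standby and the cluster tolerates a minority of failures.

```python
# Toy majority-quorum replication. With N replicas and both the write
# quorum W and read quorum R set to a majority, R + W > N, so any read
# quorum overlaps any successful write quorum.
class Replica:
    def __init__(self):
        self.data = {}  # key -> (timestamp, value)

    def write(self, key, ts, value):
        self.data[key] = (ts, value)
        return True  # ack

    def read(self, key):
        return self.data.get(key)  # None if this replica missed the write


class QuorumClient:
    def __init__(self, replicas):
        self.replicas = replicas
        self.quorum = len(replicas) // 2 + 1  # majority

    def write(self, key, ts, value):
        acks = sum(1 for r in self.replicas if r.write(key, ts, value))
        if acks < self.quorum:
            raise RuntimeError("write failed: quorum not reached")

    def read(self, key):
        answers = [v for r in self.replicas if (v := r.read(key)) is not None]
        if len(answers) < self.quorum:
            raise RuntimeError("read failed: quorum not reached")
        # Tuples compare by timestamp first, so max() returns the newest
        # value seen by the read quorum.
        return max(answers)


client = QuorumClient([Replica(), Replica(), Replica()])
client.write("cpu.load", ts=1, value=0.42)
print(client.read("cpu.load"))  # (1, 0.42)
```

The point is that any replica can serve reads and writes: upgrading or losing one node out of three degrades nothing, whereas an active/passive pair has to fail over.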
Neat! I hadn't heard of M3DB before, but a cursory poke around the docs suggests it's a pretty solid solution/approach.
My current use case isn't monitoring, or even time series, anymore, but I will keep M3DB in mind the next time I have to seriously push for a time series/monitoring solution.
We were successfully ingesting hundreds of billions of ad serving events per day into ClickHouse. It is much faster at query time than any Postgres-based database (it can scan tens of billions of rows per second on a single node), and it scales to many nodes.
While it is possible to store monitoring data in ClickHouse, it can be non-trivial to set up. So we decided to create VictoriaMetrics [2]. It is built on design ideas from ClickHouse, so it offers high performance in addition to ease of setup and operation, as demonstrated by publicly available case studies [3].
ClickHouse's initial release was circa 2016, IIRC. The work I was doing at NGINX predates it by 1-2 years.
ClickHouse was certainly something we evaluated later on, when we were looking at moving to a true columnar storage approach, but like most columnar systems it comes with trade-offs:
* Partial SQL support.
* No transactions (not ACID).
* Certain workloads are less efficient, like updates, deletes, or single-key lookups.
None of these are unique to ClickHouse; they are fairly well-known trade-offs most columnar stores make to improve write throughput and prioritize highly scalable sequential read performance (the toy sketch below illustrates why). As I mentioned before, the amount of data we were ingesting never really reached the limits of even Postgres 9.4, so we didn't feel like we had to make those trade-offs... yet.
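To make that concrete, here is a toy Python sketch (my own illustration, not how ClickHouse actually stores data) of why a columnar layout wins at sequential aggregate scans but loses at single-record lookups and updates:

```python
# Row layout: all fields of a record live together, so fetching or
# updating one record touches a single entry.
row_store = {
    1001: {"ts": 1, "user": "a", "bytes": 512},
    1002: {"ts": 2, "user": "b", "bytes": 2048},
    1003: {"ts": 3, "user": "a", "bytes": 128},
}

# Column layout: each column lives in its own contiguous array, so an
# aggregate over one column reads only that array (and it compresses
# well), but a single record is scattered across every array.
col_store = {
    "id":    [1001, 1002, 1003],
    "ts":    [1, 2, 3],
    "user":  ["a", "b", "a"],
    "bytes": [512, 2048, 128],
}

# Sequential aggregate: one tight array scan in the column store...
total = sum(col_store["bytes"])
# ...versus walking every full record in the row store.
assert total == sum(rec["bytes"] for rec in row_store.values())

# Single-key lookup: one hash probe in the row store...
record = row_store[1002]
# ...versus a positional search plus a gather across all columns; an
# update would likewise have to rewrite a slot in each array.
pos = col_store["id"].index(1002)
assert record == {c: col_store[c][pos] for c in ("ts", "user", "bytes")}
```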
I would imagine that serving ad events operates at a scale several factors larger than what we were dealing with.