The article recognizes that several time series databases already exist. They al...

busterarm · on Jan 25, 2018

Every time I read about some new solution for storing time series data, I feel like I must be doing something wrong, but I've run into _zero_ problems so far.

Every time I have to store time series data, I never really need ACID transactions. I definitely don't ever need to do updates or upserts. It's always write-once-read-many. ElasticSearch has always been the obvious choice to me and it has worked extremely well. Information retrieval is incredibly robust, and when I'm worried about consistency, I use Postgres's JSON capabilities and write there first. You can have your application sanity-check between the two if you're worried about ElasticSearch not receiving data or losing it.

I find it really hard to beat.
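The sanity check between the two stores could look roughly like the sketch below. It is only an illustration: plain dictionaries stand in for Postgres (the write-first store) and ElasticSearch (the search index), and `find_missing` and `reconcile` are hypothetical helper names, not part of either product's API.

```python
from typing import Dict, Set


def find_missing(primary_ids: Set[str], index_ids: Set[str]) -> Set[str]:
    """Return IDs present in the write-first store but absent from the index."""
    return primary_ids - index_ids


def reconcile(primary: Dict[str, dict], index: Dict[str, dict]) -> int:
    """Copy documents the index never received; return how many were repaired."""
    missing = find_missing(set(primary), set(index))
    for doc_id in missing:
        index[doc_id] = primary[doc_id]
    return len(missing)


# Stand-ins for the two stores (assumption: every event has a unique string ID).
primary_store = {
    "evt-1": {"ts": "2018-01-25T10:00:00", "value": 1.0},
    "evt-2": {"ts": "2018-01-25T10:00:05", "value": 2.0},
    "evt-3": {"ts": "2018-01-25T10:00:10", "value": 3.0},
}
search_index = {"evt-1": primary_store["evt-1"]}  # evt-2 and evt-3 were lost

repaired = reconcile(primary_store, search_index)
```

Because the data is write-once, comparing ID sets is enough here; a store that allowed updates would also need to compare document contents or versions.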

pfranz · on Jan 25, 2018

I had set up ElasticSearch for logging events and was thinking about using it to store metrics as well. It seemed like it ticked all the boxes and would work pretty well when I was messing around with it.

I ended up leaving that job and set up something specifically for metrics from scratch. I didn't compare 1 to 1, but this was much faster to query and had much lower requirements (disk space, memory, etc).

Both use-cases are using it as a time-series database, and there's no reason ElasticSearch couldn't work for both of those use-cases for many people. Using two different backends (one for events and another for metrics) meant drawing a line when creating/logging data and having two ways to query... which sucks for training. You also have to maintain/archive two different data stores.

cevian · on Jan 25, 2018

We've had some clients try this, find consistency issues between Elastic nodes or between Postgres and Elastic, and start using TimescaleDB as a way to simplify the stack and application. But obviously YMMV and this is highly dependent on your query needs.

akulkarni · on Jan 25, 2018

A few reasons why you may want to use TimescaleDB vs other time series DBs:

1. For some developers, just having a SQL interface to time-series data (while maintaining insert/query performance at scale) is good enough reason to use TimescaleDB. For example, when trying to express complex queries, or when trying to connect to SQL-based visualization tools (e.g., Tableau), or anything else in the PostgreSQL ecosystem.

2. For others, the difference is being able to fully utilize JOINs, adding context to time-series data via a relational table. Other time-series DBs would require you to denormalize your data, which adds unnecessary data bloat (if your metadata doesn't change too often), and data rigidity (when you do need to change your metadata).

To quote a user in our Slack Support forums[1]: "Retroactively evolving the data model is a huge pain, and tagging “everything” is often not viable. Keeping series metadata in a separate table will allow you to do all kinds of slicing and dicing in your queries while keeping it sane to modify the model of those metadata as you gain more knowledge about the product needs. At least that is something we have been successful with"

3. For others, it's some of the other benefits of a relational database: e.g., arbitrary secondary indexes.

One thing to think about with NoSQL systems (including every other time-series DB) is that their indexing is much more limited (e.g., often no numerics), or you need to be very careful about what you collect, as costs grow with the cross-product of the cardinalities of your label sets (strings). We have heard from multiple people that they explicitly didn't collect labels they actually wanted when using [another time series db], because it would have blown up memory requirements and basically OOM'd the DB. (Timescale doesn't have this issue with index sizes in memory, and you can create or drop indexes at any time.) You often hear about this as the "cardinality problem", which TimescaleDB does not have.

4. There's also the benefit of being able to inherit the 20+ years of work of reliability on Postgres.
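To make points 2 and 3 concrete, here is a minimal sketch of keeping series metadata in its own table, joining it at query time, and adding a secondary index on a numeric column. It uses Python's built-in `sqlite3` purely as a stand-in so it runs anywhere; it is not TimescaleDB, and the `devices` and `readings` tables and their columns are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Metadata lives in one small relational table...
    CREATE TABLE devices (device_id INTEGER PRIMARY KEY, location TEXT);
    -- ...while the time-series rows only carry a reference to it.
    CREATE TABLE readings (
        ts TEXT,
        device_id INTEGER REFERENCES devices(device_id),
        temperature REAL
    );
    -- A secondary index on a numeric column, created after the fact.
    CREATE INDEX readings_temperature_idx ON readings (temperature);
""")
conn.executemany("INSERT INTO devices VALUES (?, ?)", [(1, "lab"), (2, "roof")])
conn.executemany(
    "INSERT INTO readings VALUES (?, ?, ?)",
    [
        ("2018-01-25T10:00:00", 1, 21.5),
        ("2018-01-25T10:00:00", 2, 3.0),
        ("2018-01-25T10:05:00", 1, 22.5),
    ],
)


def avg_by_location(connection: sqlite3.Connection) -> dict:
    """Average temperature per location, resolved through a JOIN."""
    rows = connection.execute("""
        SELECT d.location, AVG(r.temperature)
        FROM readings r
        JOIN devices d ON d.device_id = r.device_id
        GROUP BY d.location
    """).fetchall()
    return dict(rows)


before = avg_by_location(conn)

# Changing metadata touches one row; no reading has to be rewritten.
conn.execute("UPDATE devices SET location = 'rooftop' WHERE device_id = 2")
after = avg_by_location(conn)
```

In a denormalized layout, the same rename would mean rewriting (or re-tagging) every stored reading for that device.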

There was no database that did all of this when we decided to build Timescale. (If one did exist, we would have used it).

You can read more about that story here: https://blog.timescale.com/when-boring-is-awesome-building-a...

And for more on our technical approach: https://blog.timescale.com/time-series-data-why-and-how-to-u...

[1] http://slack-login.timescale.com/
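As a rough illustration of the "cardinality problem" from point 3: in a system that keeps one series per distinct label combination, the worst-case number of series is the product of the per-label cardinalities. The label names and counts below are invented, and the result is an upper bound, since real data rarely contains every combination.

```python
from math import prod
from typing import Dict


def max_series(label_cardinalities: Dict[str, int]) -> int:
    """Upper bound on distinct series: the cross-product of label cardinalities."""
    return prod(label_cardinalities.values())


# Hypothetical label sets for one metric.
labels = {"host": 1000, "region": 10, "status_code": 50}
base = max_series(labels)

# Adding a single high-cardinality label multiplies the bound.
with_user = max_series({**labels, "user_id": 10_000})
growth_factor = with_user // base
```

This is why a label such as a user ID is often the one people leave out, even when they want it.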

jnordwick · on Jan 25, 2018

I hate to be "that one" but:

> There was no database that did all of this when we decided to build Timescale. (If one did exist, we would have used it).

If you remove the "based on Postgres" part, that's true except for KDB, IQ, Vertica, and a few others. I can definitely see a price argument though (i.e., the same but cheaper), as those all tend to be expensive.