I just want to say that we wouldn't be here without the support, feedback -- and yes, even the honest critiques -- from the HN community. So thank you everyone.
As we like to say, we've come a long way in the past 5 years, but we're just getting started :-)
And we're hiring globally for our remote-first team! https://www.timescale.com/careers
As a user of Timescale, one of the things I find most lacking (with Postgres too, to be honest) is good educational content on par with MongoDB University: structured courses that teach database design concepts from first principles and then show what problems Postgres/Timescale solve on top of them, with hands-on experience working with datasets and an interactive way of learning about the types of things you can do.
I realize Timescale and Mongo are very different things, but when I got started professionally with software 10 years ago, the MongoDB courses (and the Stanford online Intro to DB course) were immensely helpful. Working with Timescale professionally now, I'm often unsure whether I'm doing things suboptimally, e.g. making tables hypertables when a regular table might be better, understanding the flexibility and capabilities of indexes on hypertables, and application-facing tooling.
Totally fair, and something I'm actually forming a team to work on! We're starting with some very foundational material [1], which may well be review and isn't as formal/professional as Mongo University or the like, but I'll be continuing this course and we'll keep iterating from there. I'd really love some feedback and also your questions, i.e. what you want covered or what you find confusing. You can leave comments on the video or in our community Slack channel [2] or forum [3]. Thanks for the feedback, and I hope we'll be able to do some of that for you over the coming months!
TimescaleDB has some of the best documentation in open source and super helpful developer advocates. They are also happy to reach out to dev teams using TimescaleDB (even the open source version). We recently did a guest blog post with them:
I agree completely with this. The documentation is great and the community on slack has been incredibly helpful when I've posted there. Even still, I personally think a lot of people would benefit from course content like MongoDB provides.
Thanks for making Timescale, my company ThoughtMetric is an early adopter and we currently ingest about 1M rows of data per day using it. One highly requested feature that we desperately need is continuous aggregates that can join multiple tables together. Right now we have to use regular materialized views to accomplish that and it sucks.
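Roughly, the workaround looks like this (table and column names are made up for illustration): a plain materialized view with the join, which we have to refresh ourselves, instead of a continuous aggregate that TimescaleDB would keep up to date incrementally.

    -- Hypothetical schema: "events" is a hypertable, "channels" a small lookup table.
    CREATE MATERIALIZED VIEW daily_channel_stats AS
    SELECT time_bucket('1 day', e.time) AS day,
           c.name                       AS channel,
           count(*)                     AS events
    FROM events e
    JOIN channels c ON c.id = e.channel_id
    GROUP BY 1, 2;

    -- Has to be refreshed manually or on a schedule, and it recomputes everything.
    REFRESH MATERIALIZED VIEW daily_channel_stats;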
Congrats! I was wondering if you could comment on the current team size? I'm looking at the number of contributors who have created pull requests within the last four months, and it is shockingly low (in a good way). Based on the following:
It looks like there have only really been 7-10 full-time contributors, and for you to have raised what you have with such a small team is quite impressive. Is development happening elsewhere, or is my hunch correct?
Edit: Thanks to feedback from mfreed, below is a more accurate picture of development activity:
Hi! So the team is over 100 at this point, but engineering effort is spread across multiple products.
The core timescaledb repo [0] currently has 10-15 primary engineers, with a few others working on our hyperfunctions and function pipelining [1] in a separate extension [2]. I think the set of outside folks who contribute to low-level database internals in C is generally just smaller than for other types of projects.
We also have our promscale product [3], which is our observability backend powered by SQL & TimescaleDB.
And then there is Timescale Cloud [4], which is obviously a large engineering effort, most of which does not happen in public repos.
Interested? We're growing the teams aggressively! Fully remote & global.
Hey, thanks for the insights! I've added all Timescale repos for indexing and should have the bigger picture in a few hours. Thanks again for indulging my curiosity.
Timescaler here. Linking a few posts below [0][1] that answer the majority of your very good questions. The posts detail why and how TimescaleDB started and why the founders chose to build a time-series database on PostgreSQL.
While I think that TimescaleDB is a great technology, the support experience of their Timescale Cloud offering was quite underwhelming.
On one occasion we wanted to create a VPC peering between a database hosted on Timescale Cloud and our Kubernetes cluster hosted on Azure. For this you need to put the Azure resource group name into a form on Timescale Cloud.
It turns out our resource group name contained an uppercase character, and the form on Timescale Cloud had a (broken) validation that required the name to be all lowercase. We couldn't easily change that name, since that would have required us to re-create our production AKS/K8s cluster.
After contacting Timescale support (as a paying customer) the answer was basically: "Well, we require the resource group name to be lowercase, we can't change that, sucks to be you, bye"
We can live without that VPC peering, so we didn't push it further, but there are zero technical reasons for that restriction, and I would bet it's just a broken validation regex in their backend that they are unwilling to fix.
Hi, thank you. I'm really grateful for your answer. The Azure documentation seems to hint at resource group names being case insensitive, so I guess that could actually work. To be honest, I don't know if we tried just using a lowercase version.
Again, I want to emphasize that I find it really great that you took the time to answer here.
Hey there. I'm responsible for support here at Timescale. akulkarni asked me to take a look at this.
To be fair, I don't have a good answer as to why the input box requires the resource group name to be lowercase, but I ran through the steps end-to-end and, as akulkarni said, the names are case agnostic on the Azure side, so you'd just need to enter the resource group name in lowercase to create the peering successfully.
Sincerely, thank you for the feedback, as this is how we learn and improve. If you give this a shot and run into any trouble, please do let us know, and we'd be happy to help troubleshoot.
> Looking ahead, our goal is to keep innovating on top of PostgreSQL and to continue adding breakthrough capabilities
Does Timescale contribute back to PostgreSQL or do they truly only build on top of it? https://www.postgresql.org/community/contributors/ only lists two contributors and they both worked on Postgres before joining Timescale.
I would love to run a time-series benchmark against a good column store like Snowflake to see if purpose-built time-series databases are actually faster. I have a sneaking suspicion that Timescale and the like are just reinventing the column store, and that an appropriate, non-sabotaged benchmark would show this.
> "timescale ... are just reinventing the column store"
Not reinventing but reimplementing it for Postgres, which didn't have serious OLAP capabilities before. Lots of "newsql" systems are combining OLTP and OLAP by starting at one side and adding the other.
So far Timescale has column-oriented compressed storage and scale-out partitioning, and they're working on matching the compute part.
The main selling point is developer experience, I think: rather than building out a bunch of stuff on top of a more general-purpose tool, you use a specialized DB and save time. They also have some benchmarks against ClickHouse, for example.
I have just looked up those charts on Timescale's website [1] and I am a bit surprised. I've never extensively used any of those DBs, but I have seen their sources and must say I expected bigger gaps [2]. Also worth a look: the Taxi Rides benchmark on Postgres vs ClickHouse [3].
There is a lot of value in having a columnar storage that is fully ANSI SQL and supports all of the goodies that you get in the Postgres ecosystem.
NoSQL databases, with their half-assed SQL grammar implementations, are a real pain to use in real applications, where they often have to be handled differently in code than the RDBMS because either their syntax is slightly different or their connection stack is incompatible.
That is, "I have this seldomly-updated list of ~10000 things, and I'm going to need to join it against my time-series data."
With other time-series databases I've dealt with, it's an afterthought at best and the answer is typically "Enrich the data via flink/benthos/etc. on import and avoid using any kind of join."
Does Timescale's use of PostgreSQL circumvent this issue, both in terms of storage of lookup tables, and performance on join?
Yes, we support the rich set of PostgreSQL's JOIN operations, including against hypertables. It's generally smart enough to only apply these JOINs against the right subset of time-series data if you also have any time predicates (due to the way we perform "constraint exclusion" against our hypertable chunks).
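For a concrete (if simplified) sketch, assuming a hypothetical "metrics" hypertable and a small "devices" lookup table like the one described above:

    -- The time predicate lets the planner exclude chunks outside the window,
    -- so the JOIN only touches the recent slice of time-series data.
    SELECT d.name, avg(m.value) AS avg_value
    FROM metrics m
    JOIN devices d ON d.id = m.device_id
    WHERE m.time > now() - INTERVAL '1 day'
    GROUP BY d.name;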
There are other common queries related to what you describe, like a "last point query": Tell me the last record for each distinct object. Here, for example, we've built special query optimizations to significantly accelerate such queries:
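For a rough illustration, with the same hypothetical schema as above, a last point query in plain SQL looks something like:

    -- Latest record per device; written naively this scans a lot of data,
    -- which is why it benefits from dedicated query optimizations.
    SELECT DISTINCT ON (m.device_id) *
    FROM metrics m
    ORDER BY m.device_id, m.time DESC;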
There are plenty of distributed relational columnstores that can do joins. Timescale is bringing that to Postgres but you already have options from Clickhouse to Redshift.
Have you written about this anywhere? I'm sure TimescaleDB would love to signal boost that post, and I separately would love to read about how you have it set up and the nitty gritty of the setup.
How are you dealing with backups/WAL and general DB administration? Are you using Timescale Cloud?
Not OP, but I run a Timescale instance with the same order of magnitude inserts/sec and have been running it for about 2 years now. The database is closing in on 1 TB on disk. We don't use Timescale Cloud; we just host it on a VM in Google Cloud with 8 CPUs and 32 GB of RAM, which seems adequate for now. We do WAL backups using the WAL-G tool, which backs the DB up to Google Cloud Storage.
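For anyone curious, the WAL-shipping side of a WAL-G setup typically looks something like the sketch below (simplified; the GCS bucket and credentials are configured separately through WAL-G's environment variables, and base backups are pushed on a schedule with wal-g backup-push):

    -- Ship every completed WAL segment to the archive via WAL-G.
    ALTER SYSTEM SET archive_mode = 'on';                     -- needs a server restart
    ALTER SYSTEM SET archive_command = 'wal-g wal-push %p';   -- picked up on reload
    SELECT pg_reload_conf();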
Thanks so much for this. 2 years at that insert speed only being about 1TB of data is fantastic.
I've also had many discussions on backups (Barman, pgBackRest, WAL-E/G, etc.) and always feel like I have to look it up afresh every time to get myself back up to speed on which one I should be using.
Ya the backup solutions are complicated. I don't even remember the differences between those different tools. All I remember is that I spent about a day researching the tools and determined that Wal-G was the best one to use for our use case.
Thanks for going into the nitty gritty, super helpful to know what your setup is like.
In the past I've had terrible experiences with on-demand IOPS, but now I feel like even 1000 provisioned (the least you can get) is too much for the workload I was running… the app was mostly idle but had bursts that would overwhelm it.
We enquired about Timescale Cloud to migrate from Cloud SQL, as we could use it to solve a couple of problems in a simpler way and it would give us a chance to create other solutions when needed. We don't have anything in time series yet, but we were hoping to have it there to experiment and start dipping our toes in; however, we found it a bit expensive.
The pricing seemed more of an all-or-nothing investment and didn't make sense if we weren't ready to start using the extra functionality just yet. We were hoping it would be similar to Cloud SQL and just cost extra depending on usage.
Hi there, not sure how you did your price comparison, but typically with native compression, performance improvements, continuous aggregates, etc, you can go much further with the same resources on Timescale Cloud than Cloud SQL (or any other generic PostgreSQL provider). Did your math take those performance improvements into account?
I have written a time-series logging DB with SQLite and believe that approach has the following advantages:
- System performance scales well with the latest SSD hardware, as compared to a cloud-based approach that is limited by network/cloud speed.
- One can store logs per day / week / year in separate DB files as needed (rough sketch below).
- Backups of the small DB files for the last few days/weeks are trivial with rsync.
Love to hear other pro/con arguments from folks who use a Timescale-type approach.
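For example, the per-day layout can be as simple as one SQLite file per day (the file name and schema here are made up for illustration):

    -- Attach (and implicitly create) today's file, e.g. from a small wrapper script.
    ATTACH DATABASE 'logs_2022_02_22.db' AS today;

    CREATE TABLE IF NOT EXISTS today.log (
        ts    INTEGER NOT NULL,   -- unix epoch seconds
        level TEXT,
        msg   TEXT
    );

    INSERT INTO today.log VALUES (strftime('%s', 'now'), 'INFO', 'service started');

    -- Older files are closed and immutable, so rsync of previous days is safe.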
Try time series with DuckDB (duckdb.org) or ClickHouse Local? We use the former for analytical queries (queries over columns instead of rows), at which it is better than SQLite.
Is using a time-series database like TimescaleDB a good fit for a chat platform like Slack/Discord/Mattermost? I've heard that Discord is using ScyllaDB, but I have quite a bit of experience with Postgres, so it would be nice to use just that.
In any case, does anybody know what the advantages of one over the other are? Big thanks!
Insane! $110M towards yet another Postgres extension. Theoretical CS and hardware have advanced so much, but people are still using the same old boring approaches. Truly sad.
Nice share. OrioleDB updated the readme 17 days ago to say they'd release in February 2022, so I'll definitely be watching this space. These sound like some really interesting striking-at-the-heart-of-it changes. I hope we have a long time to see which of these pan out & pay off big.
On a side note, nice interesting brief comments & shares elsewhere on this site. :thumbup:
Great report, but I know B-Trees alone will not be enough. The simplest common solution is to switch to LSM trees for higher write throughput. That's exactly what Yugabyte does, by putting Postgres over RocksDB. Same way Facebook uses MyRocks = MySQL + RocksDB.
Literally anything. There is so much to do better. Faster I/O, kernel bypass and async filesystem, new persistent data-structures, alternative lock-free concurrency resolution schemes…
Disclaimer: I am highly biased, as I am funding/researching/developing a DBMS myself. Out of necessity though, as we constantly hit bottlenecks in the persistent I/O layer. We are not selling or offering anything, but will soon share some fresh internal results on aforementioned topics.
I don't understand what is difficult or non-trivial about these types of databases, and when people try to explain it, it usually just gets more confusing. Filtering values over time is just the same operations that you would find in an audio editor, or a one-dimensional version of what you find in an image editor (weighted averages of values). I wonder how many customers could just use SQLite but don't know anything about computers and end up buying some sort of subscription to a 'new kind of database'.
The web page just drops as many buzzwords as possible - web3, crypto, nfts, monitor soil to fight global warming - it looks like a disaster to anyone who understands the basics of programming.
It does a good job laying out why TSDBs are used and some of the tricks they leverage to store this type of data. See the requirements for the service laid out in the paper:
• 2 billion unique time series identified by a string key.
• 700 million data points (time stamp and value) added per minute.
• Store data for 26 hours.
• More than 40,000 queries per second at peak.
• Reads succeed in under one millisecond.
• Support time series with 15 second granularity (4 points per minute per time series).
• Two in-memory, not co-located replicas (for disaster recovery capacity).
• Always serve reads even when a single server crashes.
• Ability to quickly scan over all in-memory data.
• Support at least 2x growth per year.
Lots of organizations want to adopt an SRE/DevOps model and want a similar system. Also, one thing you should know is that trying to accomplish this with a traditional DBMS is usually possible, but since it is not making specifically optimized trade-offs, it usually ends up more expensive and requires a lot of tuning/expertise.
Lots of organizations (even legacy companies) have a massive need for this kind of service. Also, there are very cheap options out there that can handle the million-metric use case for basically <$100 a month in infra costs. The use case is definitely there, and even if it's possible with traditional DBMS systems, it's usually cheaper and more performant to use a dedicated TSDB.
If your data is small enough, then sure, any number of well tested data platforms will work for you.
The problem something like Timescale tries to solve is dealing with "high cardinality." When you have many unique values, the indexing approaches needed to ensure performance start becoming different. You'll run into write performance issues if you need indexes on every single column, and every single combination of columns, and each column has a large number of unique values. While the common factor many of these datasets share is that they are being constantly generated by some kind of sensor/probe/live system, they tend to have a variety of other dimensions that are also high-cardinality.
There are two different things here - the first is people not needing an elaborate solution because computers are fast and the second is that if someone does need a solution with less overhead, why is that difficult?
Values over time, like audio, are a one-dimensional signal. Seeking is basic data structure stuff; filtering is basic signal stuff. There aren't going to be other dimensions like time, which makes the other values just other channels. If you need to combine dense values, they can be not only filtered, but filtered into individual distributions.
People give abstract descriptions like you have here, but I'm just not seeing a difficult problem in all of this.
Relational databases are historically either OLTP like Postgres, MySQL, SQLite, etc; or OLAP like Vertica, Clickhouse, Greenplum, Redshift, etc.
The latter group is designed to analyze lots of data (calculating aggregations across billions of rows) and has developed features like storing data as columns, using compression, batch/vectorized processing, scaling out across multiple servers, and other techniques to get that performance. Timescale is an extension that brings these capabilities to Postgres, and it is one of the very few relational databases that offer OLTP+OLAP in a single product.
The time-series niche is what they targeted first, and the product offers lots of useful features around time-related data, but it's also a generic analytical database offering at this point.
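For a sense of what that looks like in practice, here is a rough sketch against a hypothetical "metrics" hypertable (the exact options vary by TimescaleDB version) of turning on the native columnar compression:

    -- Store older chunks column-oriented and compressed, segmented by device.
    ALTER TABLE metrics SET (
        timescaledb.compress,
        timescaledb.compress_segmentby = 'device_id'
    );

    -- Automatically compress chunks once they are older than 7 days.
    SELECT add_compression_policy('metrics', INTERVAL '7 days');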
> There are people who actually replied with something worthwhile.
And you dismissed it as still not being difficult.
The Timescale folks did all the hard work of finding product / market fit for you. If this problem is so trivial, you'd make a faster, cheaper, more reliable product and steal their lunch. Not only their lunch but Clickhouse, AWS Timestream, etc..
No, I said I don't understand why it isn't trivial, or whether most of the people buying it don't realize they don't need a specialized database. Only one person was able to come up with anything; other people, including you, just got extremely upset at the question without being able to answer it.
The number that stood out the most was 700 million values per minute, which is about 11.6 million per second. This is still less than the data in a single 4K image. My guess is that none of this is a feat of engineering, but that it's just enough work that many people who need it won't create their own solution.