>You may already be asking: “why not just power the stack using TimescaleDB?” The Timescale License would restrict our use of features such as compression, incremental materialized views, and bottomless storage. With these missing, we felt that what remained would not provide an adequate basis for our customers’ time-series needs. Therefore, we decided to build our own PostgreSQL-licensed extension.
I've been using the free version of TimescaleDB to shard a 500-million-observation time-series database. It worked as a drop-in without much hassle. I would have expected some benchmarks and comparisons in the post, though. I will for sure watch this...
Indeed. With the financial time series I was working with, we ingested over 100 million new points _per day_. For anything serious, TimescaleDB is essentially not open source. Well done, tembo.io crew -- will definitely give this a whirl.
What do you mean by "for anything serious it isn't open source"? I didn't see any red flags in the Apache variant of Timescale, just constant pleading to try their hosted option.
And as I understand that license, you are allowed to use Timescale for anything that doesn’t involve offering Timescale itself as a service. If you were using Timescale to process lots of time series transactions in your backend, it doesn’t seem to me like that would break the license.
(Which is to say that if, like Tembo, you’re offering Postgres as a service you do indeed have a problem. But for other use, should be fine)
The tricky thing with these licenses (BSL, SSPL, etc.) is that you can use them freely for internal stuff, but suddenly, if you make your product public (assuming it uses, e.g., TimescaleDB), things can get muddy. Everyone wants the flexibility to either open-source or commercialize a successful internal product in the future.
The problem is that, even if your app is not a mere frontend for TimescaleDB/Mongo/Redis, you can get sued, and you'll have to spend unnecessary time and money proving things in court. No one wants this, especially a startup owner whose money and time are tight. Also, even if your startup/company uses some of these techs, potential company buyers will be very wary of the purchase if they know they'll have to deal with this later.
I would assume Timescale only sues if you have money. In that case, you can also afford a commercial license. If you hit it big, just contact them, tell them there was a problem with having the correct license earlier, and say you want to fix the situation.
There is a 0% chance Timescale would sue a mom’n’pop operation for breaking their license.
Your assumptions are based on good faith, and businesses are not run on that :) Imagine Oracle or IBM bought Timescale; that 0% chance suddenly increases dramatically.
I imagine it would legally take a lawsuit to set a precedent, and if a license owner sets an over-reaching precedent for what counts as a wrapper, they risk losing customer trust and having companies avoid them like the plague.
E.g., Timescale going after a TSDB-as-a-service company offering TimescaleDB behind a GraphQL wrapper vs. Timescale going after a financial company offering time-series data collection and viewing.
I think a good border test would be: would Timescale allow you to offer a metrics and logging service? Technically you're offering time-series database functionality, but it's in a constrained domain and very clearly a different product, while still effectively CRUDing time-series data.
That’s the internal use restriction. There is also the restriction, more relevant to the use cases I’m talking about, on Value Added Products, which is that “the customer is prohibited, either contractually or technically, from defining, redefining, or modifying the database schema or other structural aspects of database objects”.
Which is, basically, saying that you can do anything that doesn’t give your customers the ability to redefine or modify the database schema, as long as you are creating a product that adds value on top of Timescale. Is any of this 100% clear? Not any more than legalese generally is, and it's probably wise to talk to a lawyer if you’re concerned about it. Timescale has made their intent with the license clear in the past with blog posts and such, though.
Why would the number of data points correlate to budget? Perhaps there’s a chance if the business scales with paying users, but that’s unlikely to be true in finance.
I'm no market data specialist, but that seems, to an order of magnitude, like just top-of-book for US equities? The equity options market, at least, is orders of magnitude larger.
> 500 million is very little however. A regular table with a covering index would probably be fine for many uses.
Totally agree. The problem here was that it was an awkwardly designed geospatial measurement database (OGC SensorThings), and we could not do much about the queries, which came from ORM logic and a lot of PostGIS stuff. It was great to have a drop-in replacement speeding up all the time-series queries without much thinking. What kept nagging us was all the locking going on due to the transactional semantics. Time-series databases are probably much better at handling queries under constant ingress. We are total amateurs w.r.t. database optimization, but my guess is that many of those offerings target the average use case.
I suppose it depends on what is meant by an "observation". Is that an entire time series for a single property, or a single point? Nevertheless, the number of points absolutely matters.
A regular Postgres database can give you 50-100K inserts per second and can scale to at least 1B rows with 100K+ individual series without much difficulty. If you know you will need less (or much less) than this, my suggestion is to use a regular table with a covering index. If you need more, use ClickHouse.
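For illustration, a minimal sketch of the "regular table with a covering index" shape (all names are hypothetical); the INCLUDE clause (PostgreSQL 11+) lets the per-series range scan below run as an index-only scan:

    -- Hypothetical schema: one row per observation.
    CREATE TABLE metrics (
        series_id integer          NOT NULL,
        ts        timestamptz      NOT NULL,
        value     double precision NOT NULL
    );

    -- Covering index: (series_id, ts) drives the range scan,
    -- INCLUDE (value) lets the query below avoid heap fetches.
    CREATE INDEX metrics_series_ts_cov
        ON metrics (series_id, ts) INCLUDE (value);

    -- Typical per-series range query.
    SELECT ts, value
    FROM metrics
    WHERE series_id = 42
      AND ts >= now() - interval '1 day'
    ORDER BY ts;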
500 million observations, with 4-byte floats, is 2 GB. This is the kind of size that you can store uncompressed, in RAM, on a phone. It is hardly at the point where you require specialized time-series software at all.
It's about time that Timescale started getting what it deserves.
Sometime in early 2022, just as they raised their Series C, leadership decided that they had gotten what they wanted from the open-source community and TimescaleDB, and that it was time to focus 100% on Timescale Cloud. Features began to become exclusive to Timescale Cloud, and the self-hosted TimescaleDB was literally treated as competition. At the same time, they managed to spoil their long-time PaaS partnership with Aiven, which was (and still is) a major source of revenue for the company. The reason? Everyone needed to use Timescale Cloud and give their money to Timescale, thus making Aiven a competitor. In short, with the Series C raise, Timescale stopped being an OSS startup and began transitioning into a money-hungry corporation.
In 2023, they conducted two rounds of layoffs, even though the company was highly profitable. Recently, Planetscale also carried out layoffs in a similarly harsh manner as Timescale, but at least Planetscale had the "courtesy" to address this with two sentences in their PR statement about company restructuring. Timescale did not even do that; they kept it all quiet. Out of 160 employees, around 65 were laid off. The first round of layoffs occurred in January, and the second in September. No warnings. No PIPs. Just an email informing you that you no longer work for them. Many of the affected employees were in the middle of ongoing projects. The CEO even mentioned in the in-house memo how they diligently worked on the September layoff throughout the summer. Interestingly, many of these employees were hired by competitors like Supabase and Neon. It’s worth emphasizing that this was not a financial issue—Timescale is far from having such problems. Instead, it was a restructuring effort to present nice financial numbers during ongoing turbulence in the tech market. (And yes, you guessed it! Timescale also hired their first CFO a couple of months before the first layoffs.)
You might say that it's just business, but as an OSS startup, I expect them to live by the values they have advertised over the years and treat their users and employees much better than they currently do. With this in mind, I welcome Tembo as a new player in the time-series market.
Footnotes: Timescale = the company. TimescaleDB = OSS time-series database developed by Timescale. Timescale Cloud = TimescaleDB managed by Timescale on AWS.
Knowing little about the company, it's almost certainly untrue to state that "the company was highly profitable"; you don't hire 160 people in an OSS-centric business and also turn a profit. You likely have a distorted understanding of how challenging the burn rate is in a changing macro environment.
It saddens me to see that we have generated so much ill will from you. It sounds like you were affected by our layoffs last year. You have every right to be upset. If you ever want to chat about this 1:1, you know how to reach me. I’d be happy to make the time.
To anyone else reading this: Some of what this person has shared is true, but some of it is not true.
I debated whether or not to reply. But one of my personal leadership values is “transparency”, so I thought I’d take the time to respond.
Yes, we conducted two rounds of layoffs in 2023. Like many tech companies, we hired a lot in 2021 and early 2022. Then, as the tech market began to correct mid 2022, we were forced to make tough decisions, including layoffs.
I take responsibility for the over-hiring and the layoffs. It brought me no joy to do them. But I feel a moral obligation to our customers to stay on the path of financial sustainability. I also feel a fiduciary obligation to our investors, some of whom are individuals, some of whom are large funds, who have all trusted us with their money. I feel a similar responsibility to current and former Timescalers who own equity in Timescale.
Sometimes, that means making tough decisions like this. But again, it was my call (not anyone else’s), and I accept full responsibility.
Yes, we did not publicize this news. Frankly, we thought we were too small for others to care. Maybe we got that wrong. But that decision came from a place of humility.
This is not true: “Just an email informing you that you no longer work for them.” Every affected person – except for a handful who were not working that day – was told the news individually, on a live Zoom call, that included at least one of our executives or a member of our People team. For the few teammates who were not working that day, we made many attempts to connect with them personally. I know the team tried their best to approach these hard conversations with care and empathy.
I was glad to see that a number of the affected individuals quickly found new roles at other companies in the PostgreSQL ecosystem, including at Supabase, Neon, and Tembo. These are good, smart people. The PostgreSQL ecosystem is better off with these people continuing to work to improve PostgreSQL.
The comments questioning our belief in open source are also not true. We still believe in open source. The core of TimescaleDB is still open source. Some of the advanced features are under a free, source-available license. Our latest release – TimescaleDB 2.15 – was just two weeks ago. Unlike most (all?) of our competitors, we have never re-licensed our open source software. This is something that is true for us but not for many others, like MongoDB, Elastic, Redis, Hashicorp, Confluent, etc.
Yes, we are building a self-sustaining open source business. Yes, it is hard, and sometimes we get things wrong. But we have never stopped investing in our community. Today the TimescaleDB community (open source and free) is 20x larger than our customer base. And this community has more than doubled in the past year or so. We are also planning significant open source contributions for the next few months.
To the author of this post: I hope this response provides some clarification. And again, I’m available to chat one-on-one if you’d like.
To our open source and free community users, and to our customers: thank you for trusting us with your workloads. We are committed to serving you.
Finally, to the Timescale team, both current and former: thank you for all your hard work making developers successful. We are here to serve developers so that they can build the future. The road won’t always be easy or smooth. But we are committed, and we will get there.
Taking responsibility for laying off a bunch of people is thin gruel. What does that mean? Nothing. They showed you loyalty; you didn't return it. Over-hiring is a symptom of bad management, so taking responsibility would mean demoting yourself and taking a pay cut. All the executive staff should have taken one and kept more people on. It is irresponsible and cruel to over-hire and then dump people. I could say more, but I am sure this is falling on deaf ears.
Over-hiring is a calculus that shifts with macro market conditions. To pretend otherwise just isn't an honest assessment of what it means to be a business leader. In the zero-interest-rate era, it was irresponsible for business executives not to invest in a competitive foundation, or else be left behind by those doing so... The rise of interest rates changed the market's appetite for investing in non-profitable growth companies, which in turn had a series of follow-on effects that required software executives to reverse course and optimize for long-term viability in the new climate.
Dumb question: why can't I just insert a bunch of rows with a timestamp column and indices? Where does that fall short? At a certain # of rows or something?
What does this let me do that can't be achieved with "regular PostgreSQL without the extension"?
I'm with you; I need to read up more on where time series could benefit us. At work we have a PostgreSQL instance with around 27 billion rows in a single partitioned table, partitioned by week. It goes back to January 2017 and just contains tons of data coming in from sensors. It's not "fast", but also not ridiculously slow to say, e.g., "Give me everything for sensor 29380 in March of 2019".
I guess it depends on your needs, but I do think I need to investigate time series more to see if it'd help us.
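For reference, the kind of lookup I mean is roughly this (names are hypothetical); with weekly range partitions on the timestamp, the planner prunes down to the handful of partitions covering March 2019 before touching any index:

    SELECT *
    FROM sensor_readings
    WHERE sensor_id = 29380
      AND ts >= '2019-03-01'
      AND ts <  '2019-04-01';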
Now give me a plot with the average of all sensors of model=A in region=B, grouped by customer, for the past 3 months, downsampled to 500 points. Assuming 1 sensor reading per minute.
I have no doubt SQL can do it without too much trouble, but for a time-series database this is really an instant operation, even on a small server.
A time-series database will first find the relevant series and then simply for-loop through all the data. It takes just a handful of milliseconds.
SQL will need to join with other tables, traverse indexes, and load wider columns. And you'd better have set up the correct index first; in your case you also spent extra effort on partitioning tables. Likely you are also running a beefy server.
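To be fair, the SQL itself isn't long. A rough sketch, assuming PostgreSQL 14+ and hypothetical readings and sensors tables (3 months at 500 output points works out to roughly one bucket per 260 minutes):

    SELECT s.customer_id,
           date_bin('260 minutes', r.ts, TIMESTAMPTZ '2001-01-01') AS bucket,
           avg(r.value) AS avg_value
    FROM readings r
    JOIN sensors s ON s.id = r.sensor_id
    WHERE s.model = 'A'
      AND s.region = 'B'
      AND r.ts >= now() - interval '3 months'
    GROUP BY s.customer_id, bucket
    ORDER BY s.customer_id, bucket;

The point stands, though: the time-series engine wins on execution, not expressiveness. It can jump straight to the relevant series and stream through contiguous data.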
Adding to your comment, from my perspective (exploration geophysics):
> and just contains tons of data coming in from sensors.
it's also desirable to deal, on the fly, with missing sensor data and clearly bad sensor data, to identify and smooth spikes in the data (weird glitch, or actual transient spike of interest?), and to apply a variety of running filters; a centred average is basic, while parameterised Savitzky–Golay filters provide beefed-up, better-than-running-average handling... and there are more.
It's not just better access to sequential data that makes a dedicated time-series engine desirable; it's the suite of time-series (and geospatial) operations that closes the deal.
Same boat here, but with DSP/SSP bidding statistics. Generating around 1 billion rows a day and still going strong. Single table, partitioned by week, a BRIN index on the timestamp, and a normal index on one column.
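For anyone curious, the index setup is roughly this (names are hypothetical); a BRIN index stays tiny on append-mostly data because it stores one summary per block range rather than one entry per row:

    CREATE INDEX bids_ts_brin     ON bids USING brin (ts);
    CREATE INDEX bids_campaign_ix ON bids (campaign_id);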
There are several good articles explaining this, especially on the Timescale blog, but in short: with just an index and no time partitioning, at some point read and write performance degrades sharply.
I think what's meant here is windowing (partitioning the query), not partitioning the table per se. Though even with this strategy, you must manually create new partitions all the time.
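A minimal sketch of native weekly range partitioning (names are hypothetical); note that each child partition has to be created ahead of time, typically from a cron job, which is exactly the chore the time-series extensions automate:

    CREATE TABLE sensor_data (
        sensor_id integer          NOT NULL,
        ts        timestamptz      NOT NULL,
        value     double precision
    ) PARTITION BY RANGE (ts);

    -- One child table per week, created in advance.
    CREATE TABLE sensor_data_2024_w01 PARTITION OF sensor_data
        FOR VALUES FROM ('2024-01-01') TO ('2024-01-08');
    CREATE TABLE sensor_data_2024_w02 PARTITION OF sensor_data
        FOR VALUES FROM ('2024-01-08') TO ('2024-01-15');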
This also isn't typical time-series data, which generally stores numbers. Supposing you had a column "value INTEGER" as well, how do you do something like the following (pseudo-SQL)?
    SELECT AVG(value) AS avg_value FROM logs GROUP BY INTERVAL '5m'
Which should output one row per 5-minute bucket, even if the data were reported much more frequently than every 5 minutes.
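The closest vanilla equivalent I know of is date_bin (PostgreSQL 14+), which handles the bucketing but still leaves gap-filling and friends to an extension. A rough sketch against the same hypothetical logs table, assuming the timestamp column is ts:

    SELECT date_bin('5 minutes', ts, TIMESTAMPTZ '2001-01-01') AS bucket,
           avg(value) AS avg_value
    FROM logs
    GROUP BY bucket
    ORDER BY bucket;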
The gold standard for this is Druid at very large scale, or ClickHouse. ClickHouse has a lot of problems as far as modifying/scaling shards after the fact, while Druid handles this with ease (with the penalty of not being able to update after the fact).
Looking at the ClickBench comparison, the performance is almost pathetic. They can't even handle sub-second aggregation queries over 10M records. Compare that to even DuckDB reading from Parquet files.
Postgres is missing a proper columnstore implementation. It's a big gap and it's not easy to build.
One solution could be integrating DuckDB in a similar way to pgvector. You would need to map DuckDB storage to Postgres storage and reuse the DuckDB query processor. I believe that's the fastest way to get Postgres a competitive columnstore.
Great to see this kind of innovation. PostgreSQL is interesting: while the "core" has always been open source under a very permissive license, there have been many proprietary and source-available extensions, ranging from replication to time-series support.
Now we see those proprietary extensions being disrupted by proper open source!
Would this be a good extension for when you want to store load balancer log entries (status, response body, headers, etc.)?
I think a columnar store would be more efficient for that than a normal row-based database; load balancer log entries can be considered similar to analytics events.
QuestDB, TimescaleDB, and PostgreSQL are more universal databases and don't work as well as Prometheus or VictoriaMetrics for this. And yes, ClickHouse will beat them both, but only when running queries over a huge amount of data. See also https://docs.victoriametrics.com/faq/#how-does-victoriametri...
> You may already be asking: “why not just power the stack using TimescaleDB?” The Timescale License would restrict our use of features such as compression, incremental materialized views, and bottomless storage. With these missing, we felt that what remained would not provide an adequate basis for our customers’ time-series needs. Therefore, we decided to build our own PostgreSQL-licensed extension.
It's much newer, for one, so it's behind on features. But we're working on adding new ones based on customer demand.
We want to build it under the PostgreSQL license, using existing community extensions as much as possible, and build custom stuff (still permissively licensed) only when necessary.
It's about time that Postgres (and other databases) added native append-only tables. That alone doesn't make it a time-series database, but it would probably help with standardization and all the logic/access around it.
It is, in the sense that it's a purpose-built time-series database. Once upon a time it was definitely among the best too, but IMO their sales, tech, and overall strategies have been massive clusterfucks. They also have the best metrics agent, by far.
They're on their third serious rewrite with breaking changes for users. They pulled features from the open-source version (clustering) to build a higher moat for the Enterprise version. They permanently deleted customer data in their hosted version after sending just two emails to billing addresses. Their marketing and product naming also suck.
Tembo CEO here - we are targeting feature parity for Tembo Cloud with RDS as soon as possible. We would love to have you give Tembo a try sometime and give us feedback :)