Hey Chad, PipelineDB co-founder here. PipelineDB certainly isn't intended to be the only tool in your data infrastructure. But whenever the same queries are repeatedly run on granular data, it often makes a lot of sense to just compute the condensed result incrementally with a continuous view, because that's the only lens the data is ever viewed through anyway (dashboards are a great example of this). Continuous views can be further aggregated and queried like regular tables too.
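To make that concrete, here's a rough sketch (stream and view names are made up, and the exact syntax varies a bit by PipelineDB version) of a continuous view that incrementally maintains per-URL stats from a stream of events:

    CREATE STREAM page_views (url text, latency int);

    -- maintained incrementally as events arrive; only the condensed result is stored
    CREATE CONTINUOUS VIEW url_stats AS
      SELECT url, count(*) AS hits, avg(latency) AS avg_latency
      FROM page_views
      GROUP BY url;

    -- the result reads like a regular table
    SELECT * FROM url_stats ORDER BY hits DESC LIMIT 10;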
In terms of not requiring that raw data be stored, a typical setup is to keep raw data somewhere cheap (like S3) so that it's there when you need it. But granular data is often overwhelmingly cold and never looked at again so it may not always be necessary to store it all in an interactively queryable datastore.
As I mentioned, PipelineDB certainly doesn't aim to be a monolithic replacement for all adjacent data processing technologies, but there are areas where it can definitely introduce significant efficiency.
Great. Thank you for the clarification. What you just described definitely sounds like something PipelineDB would be great for. I can see it being especially useful for quickly standing up dashboards and maybe even datamarts when considering new data sources. I just wanted to make sure that I wasn't missing something.
So what's the best practice for when you want a real-time dashboard but also want the ability to compare data over time? E.g., avg. bounce rate this month vs. last? Is PipelineDB still ideal in this case?
Jeff (PipelineDB co-founder) here - Yes, PipelineDB is great for this use case. One powerful aspect of PipelineDB is that it's a fully functional relational database (a superset of PostgreSQL 9.4) in addition to being a streaming-SQL engine; we have integrated the notion of 'state' into stream processing for use cases exactly like this.
You can do anything with PipelineDB that you can do with PostgreSQL 9.4, but with the addition of continuous SQL queries, sliding windows, probabilistic data structures, counting of uniques, and stream-table JOINs (which I believe is what you're looking for here).
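As a rough sketch of your example (names are hypothetical, and details depend on your PipelineDB version; arrival_timestamp is the implicit event-time column on streams), you could keep a continuous view grouped by month and then compare months with plain SQL over the view:

    CREATE STREAM sessions (site_id int, bounced boolean);

    CREATE CONTINUOUS VIEW monthly_bounce AS
      SELECT site_id,
             date_trunc('month', arrival_timestamp) AS month,
             avg(bounced::int) AS bounce_rate
      FROM sessions
      GROUP BY site_id, month;

    -- plain PostgreSQL over the view: this month vs. last month
    SELECT site_id,
           max(bounce_rate) FILTER (WHERE month = date_trunc('month', now())) AS this_month,
           max(bounce_rate) FILTER (WHERE month = date_trunc('month', now() - interval '1 month')) AS last_month
    FROM monthly_bounce
    GROUP BY site_id;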