The lack of a semantic layer and the join limitations are what made me pass on Superset, but that was a couple of years ago, so maybe those features have been added since.
I built my own semantic layer instead. I use this in production in my company but obviously use at your own risk as it's a one-man show.
This looks interesting to me, but I'd really like more detail about the architecture and deployment in the docs.
There is this:
> A final SQL query against the combined data from the DataSource Layer
> The Combined Layer is just another SQL database (in-memory SQLite by default) that is used to tie the datasource data together and apply a few additional features such as rollups, row filters, row limits, sorting, pivots, and technical computations.
But it leaves me with questions - how/when does this get populated? What other options are there besides in-memory SQLite? (I presume that's just a convenience for development and would use something else in production?)
Or is it just what Superset calls a 'metastore' i.e. data about the data, and the queries are run against the data source layer?
It first runs one or more queries against your DataSources, in a drill-across fashion. You can think of DataSources as one or more completely separate databases. You could have one MySQL, one PostgreSQL, one DuckDB, etc., all in the same Warehouse (not saying this is common in production, just an example). Each DataSource query also joins all the needed tables for you, i.e. it joins multiple tables within that database to reach your required grain.
It then takes the results of all those queries and combines that data in another layer, which is currently an in-memory SQLite database. The purpose of that layer is to join the data for presentation and to apply some additional features like rollups, technicals, formula fields, etc.
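To make the two-step flow concrete, here's a rough sketch of the general drill-across pattern using only Python's built-in `sqlite3`. This is not Zillion's actual code or API. The table names (`sales`, `leads`, `ds_sales`, `ds_leads`), the sample rows, and the `make_datasource` helper are all made up for illustration, and the two "datasources" are just separate SQLite connections standing in for separate databases.

```python
import sqlite3


def make_datasource(ddl, insert_sql, rows):
    """Hypothetical helper: build a throwaway database standing in for one DataSource."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    return conn


# Two completely separate databases (illustrative data only).
sales_db = make_datasource(
    "CREATE TABLE sales (region TEXT, revenue REAL)",
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 100.0), ("east", 50.0), ("west", 90.0)],
)
leads_db = make_datasource(
    "CREATE TABLE leads (region TEXT, lead_id INTEGER)",
    "INSERT INTO leads VALUES (?, ?)",
    [("east", 1), ("west", 2), ("west", 3), ("west", 4)],
)

# Step 1: query each datasource separately, aggregated to the same grain (region).
sales_rows = sales_db.execute(
    "SELECT region, SUM(revenue) FROM sales GROUP BY region"
).fetchall()
leads_rows = leads_db.execute(
    "SELECT region, COUNT(*) FROM leads GROUP BY region"
).fetchall()

# Step 2: load the per-datasource results into a separate in-memory combined layer.
combined = sqlite3.connect(":memory:")
combined.execute("CREATE TABLE ds_sales (region TEXT PRIMARY KEY, revenue REAL)")
combined.execute("CREATE TABLE ds_leads (region TEXT PRIMARY KEY, leads INTEGER)")
combined.executemany("INSERT INTO ds_sales VALUES (?, ?)", sales_rows)
combined.executemany("INSERT INTO ds_leads VALUES (?, ?)", leads_rows)

# Step 3: one final SQL query joins the results and computes a formula field.
result = combined.execute(
    """
    SELECT s.region, s.revenue, l.leads, s.revenue / l.leads AS revenue_per_lead
    FROM ds_sales AS s
    JOIN ds_leads AS l ON s.region = l.region
    ORDER BY s.region
    """
).fetchall()

# A rollup row computed in the combined layer rather than in either datasource.
total = combined.execute(
    """
    SELECT 'TOTAL', SUM(s.revenue), SUM(l.leads)
    FROM ds_sales AS s
    JOIN ds_leads AS l ON s.region = l.region
    """
).fetchone()

for row in result + [total]:
    print(row)
```

The point is that the combined layer gets populated at query time from the datasource results, so it only ever holds already-aggregated rows for that one report. A real implementation would also have to handle dimension values that are missing from one datasource, which the plain inner join here ignores.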
I'm not familiar with what Superset does under the hood or exposes as an API, so I don't know how to compare the two, or whether there is a similar backend piece. But based on what its front end can do, I suspect no part of Superset is quite the same as this.
Or from a comment elsewhere in this thread about Superset:
> Superset lets you join tables within the same database. If you want to do cross-DB joins, we have a new (beta) in-memory meta-DB that lets you do this
Link to the semantic layer mentioned above: https://github.com/totalhack/zillion