Sort of off topic, but every cool BI tool I can find (prompted by recently seeing metabase and AWS quicksight) seems to work with one database at a time.
When working with microservices, your data is spread across some N databases, rendering most BI tools, from what I can see, completely useless for reporting on more than one service at a time.
Are there any solutions for this? Or is the only one right now just dumping all your service DBs into a single DB for analytics?
edit: thanks friends. ETL it is. don't know why I thought it would make sense to have a tool that reports across multiple databases since the performance would be horrific...although maybe there's space for a hosted BI tool that does ETL automagically. just a thought
More precisely, it is a common pattern to split the analytics into three parts: 1) collecting the data (using lots of adapters) into a "data lake", 2) filtering and preprocessing that data lake into a uniform data structure whose shape is determined by the analysis goal rather than the operational system, 3) analyzing that uniform data with statistical and other tools.
It usually makes no sense to combine those phases into a single overall tool. First, these are very different tasks, for which different specialized tools will evolve anyway. Second, you want to keep the intermediate results - for caching as well as for an audit trail and reproducibility of the results.
For example, you don't want the performance of the operational system to be affected by how many analysis tools are hitting it at any given moment. Also, you don't want to work on a constantly changing dataset while fine-tuning your analysis methods.
Thanks for your input. The point of a BI tool is to allow flexible analysis of your data. What types of transformations are generally required for making this work without actually _doing_ the analysis during the warehousing step?
The data lake contains datasets in their original structure, raw data, with all warts and stuff, defined by the operational systems.
The data warehouse contains datasets in a unified (and usually simplified and reduced) form, defined by the needs of analysis tools.
If your analysis tool accesses the data lake directly, it will almost certainly contain "parsers" for various operational data formats. Also, it will perform those transformations over and over again, every time it runs. And multiple analysis scripts may contain multiple versions of those parsers. The idea is to separate these "parsers" out of the analysis step and to "cache" the cleaned-up intermediate result. That "cache of clean data" is usually called a "data warehouse", and since you can create good indexes on that data, multiple runs of your analysis tools get very fast access.
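To make that concrete, here's a minimal sketch of the idea (in Python, with made-up file, table and field names): the raw export from the lake is parsed once, written to an indexed warehouse table, and every analysis script just queries the clean table.

```python
import json
import sqlite3

# Hypothetical raw export sitting in the "data lake": one JSON object per line,
# in whatever shape the operational service happened to write it.
RAW_EVENTS = "lake/orders_service/2015-10-20.jsonl"

def parse_raw_order(line):
    """The "parser": all knowledge of the operational format lives here, once."""
    rec = json.loads(line)
    return (
        rec["order"]["id"],
        rec["order"]["customer"]["id"],
        float(rec["order"]["total_cents"]) / 100.0,  # normalize units
        rec["order"]["created_at"][:10],             # keep just the date
    )

def load_warehouse(db_path="warehouse.db"):
    """One-time (or nightly) load into the cache of clean data."""
    conn = sqlite3.connect(db_path)
    conn.execute("""CREATE TABLE IF NOT EXISTS orders
                    (order_id TEXT, customer_id TEXT, total REAL, order_date TEXT)""")
    with open(RAW_EVENTS) as f:
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)",
                         (parse_raw_order(line) for line in f))
    # The indexes are what make repeated analysis runs fast.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_orders_date ON orders(order_date)")
    conn.commit()
    return conn

if __name__ == "__main__":
    conn = load_warehouse()
    # Analysis never touches the raw format again; it just queries the clean table.
    for row in conn.execute("SELECT order_date, SUM(total) FROM orders GROUP BY order_date"):
        print(row)
```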
got it. so the idea is generally people want to do sort-of 2nd derivative queries on the data, so it's best to get those first stats out of the way in the warehousing step
Yes, although I wouldn't describe this as "2nd derivative queries", but more like "put the code that you need anyway into two separate layers (tools) with clean boundaries and a persistent intermediate result".
This is why data warehousing exists. That way you can use many different BI tools against a consistent set of data and pre-calculate a lot of commonly-used summary data.
ETL also isn't something you can just do automagically. It requires an understanding of the data and your goals, because you essentially have to build both your data model and your reporting requirements into your ETL process. You could probably do it automagically for some simple cases, but for most real-world scenarios it's just going to be easier to write a Python/Perl script to run your ETL for you.
BI / reporting requires a lot of plumbing to work correctly. You have to set up read-only clones of your DBs for reporting (because you don't want to be running large queries against production servers) and generally an ETL process that dumps everything into a data warehouse. From there, you can push subsets of that data out to various BI tools that provide the interface.
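A rough sketch of what that kind of ETL script might look like (all host names, credentials and table/column lists here are made up, and a naive full reload like this only makes sense for small data): read from the read-only replicas of each service DB and load into a single warehouse database.

```python
import psycopg2

# Hypothetical read-only replicas of two service databases, plus one warehouse DB.
# Credentials would come from .pgpass or environment variables in practice.
SERVICES = {
    "users_service":  dict(host="users-replica.internal",  dbname="users"),
    "orders_service": dict(host="orders-replica.internal", dbname="orders"),
}
WAREHOUSE = dict(host="warehouse.internal", dbname="analytics")

def copy_table(src_conn, dst_conn, table, columns):
    """Dump one table from a service replica into the warehouse (full reload)."""
    placeholders = ", ".join(["%s"] * len(columns))
    col_list = ", ".join(columns)
    with src_conn.cursor() as src, dst_conn.cursor() as dst:
        src.execute(f"SELECT {col_list} FROM {table}")
        rows = src.fetchall()
        dst.execute(f"TRUNCATE {table}")
        dst.executemany(
            f"INSERT INTO {table} ({col_list}) VALUES ({placeholders})", rows)

def run_etl():
    dst = psycopg2.connect(user="etl", **WAREHOUSE)
    for service, conn_info in SERVICES.items():
        src = psycopg2.connect(user="readonly", **conn_info)
        # In practice the table/column lists come out of your data model,
        # not hard-coding like this.
        if service == "users_service":
            copy_table(src, dst, "users", ["id", "email", "created_at"])
        else:
            copy_table(src, dst, "orders", ["id", "user_id", "total", "created_at"])
        src.close()
    dst.commit()
    dst.close()

if __name__ == "__main__":
    run_etl()   # typically run from cron on whatever schedule the reports need
```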
If anyone's looking for a straightforward ETL framework, check out DataDuck http://dataducketl.com/ The `dataduck quickstart` command is as close to automagic as you can get, and then you can customize your ETL after that.
I recently joined a competitor (SnapLogic). It's a lot nicer IMHO - HTML5 drag-n-drop interface, unified platform for big data, API integrations, IoT etc., and supports executing integrations either in the cloud or on-premise.
Microsoft's SQL Server Integration Services has a lot of adapters and transformations out of the box. The tooling is in Visual Studio, and you can use C# or F#. Warning about F# - you will never want to touch other languages after you try it.
Yeah. It's really friggin good. I highly recommend it. And I'm an entrenched libre open source kinda character.
I recommend reading Packt Publishing's books about Business Intelligence on the Microsoft SQL Server stack. They bring it all together in a fantastic, example-driven way.
Metabase is nowhere close to being a business-ready BI tool. I wish them luck, but their current product is basically a tech demo compared to the real, commercial products out there -- none of which require you to put your data anywhere in particular either (you can run Tableau against pretty much any DB platform out there).
NiFi looks very promising for numerous chunks of the ETL capability space, but I'm not sure it can stand alone on the transformation piece. I've been looking into coupling it with python for a full ETL stack.
JPKab - As you explore NiFi more and pair it with your own scripts, I'd be curious to hear if you think there are things we can and should do better to be more complete. Let us know at dev@nifi.apache.org. Good luck!
If you are looking for an open source technology to meet this need, Apache Drill is a distributed SQL engine that can talk to multiple datasources and make them available in a single namespace. Joins across sources are supported and can actually perform well when you use enough nodes for your data volume.
Because Drill exposes ODBC and JDBC interfaces, it can very easily be used with BI tools to expose all kinds of data to analysts.
The architecture is fully pluggable so anyone can write a connector to a new datastore and just include it on the classpath when starting the cluster. In the current 1.2 release Drill can connect to any database that exposes JDBC, like MySQL, Postgres, Oracle, as well as a number of non-relational datastores like HDFS, MongoDB and HBase.
To work with all of the non-relational stores, Drill supports complex types like maps and lists. The syntax for working with this complex data is the same as JavaScript's, e.g. a.b.c[3], so this is a compliant extension of SQL, but a non-standard one.
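As a rough illustration (the storage-plugin names, tables and DSN below are all hypothetical and depend entirely on how your Drill cluster is configured), a cross-source join over Drill's ODBC interface might look something like this from Python:

```python
import pyodbc

# Assumes a Drill ODBC DSN has already been configured on this machine.
conn = pyodbc.connect("DSN=DrillDSN", autocommit=True)
cur = conn.cursor()

# Join a table from a MySQL storage plugin against a MongoDB collection,
# reaching into a nested field with the JavaScript-style path syntax.
cur.execute("""
    SELECT o.order_id,
           o.total,
           c.profile.country AS country   -- nested field on the Mongo document
    FROM   mysql.shop.orders o
    JOIN   mongo.crm.customers c
      ON   o.customer_id = c.customer_id
    WHERE  o.total > 100
""")

for row in cur.fetchall():
    print(row.order_id, row.total, row.country)
```

Because it's plain ODBC/JDBC underneath, the same query can be issued from most BI tools directly.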
The solution is to use an ETL (Extract, Transform, and Load) tool to copy data to a warehouse.
You then point the BI tool at the warehouse.
There are lots of tools for this already, and even more custom solutions and other things. CSV files tend to rule here... scheduled exports that are picked up by the ETL tools.
The transformations that take place are usually normalisation of data from systems with differing schemas or different types of data (perhaps changing strings into corresponding numbers, e.g. Risk Level = Orange could be normalised as 50% in the data warehouse).
Businesses have taken this approach for a long long time as the databases that drive applications, services, and now micro-services have always been tuned for the performance of the production use-case and not for ad-hoc reporting.
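A toy example of that kind of transformation step (the mapping itself is obviously something you'd define for your own domain; these values are purely illustrative):

```python
# Map categorical risk levels from the source system onto the numeric scale
# the warehouse (and the BI tool's charts) expect. Orange -> 0.5, i.e. 50%.
RISK_LEVELS = {"Green": 0.25, "Orange": 0.50, "Red": 0.75, "Black": 1.00}

def normalise_risk(record):
    """Transform step: replace the source string with a comparable number."""
    record = dict(record)
    record["risk_score"] = RISK_LEVELS[record.pop("risk_level")]
    return record

print(normalise_risk({"id": 42, "risk_level": "Orange"}))
# {'id': 42, 'risk_score': 0.5}
```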
(Full disclosure, co-founder of Periscope Data here.)
Periscope Data supports cross-DB joins. We cache the data in our own clusters so queries run really fast, and also so you can do things like join across databases and upload CSVs.
Some BI tools can connect to multiple databases, however the right solution is to put all your data into a single data warehouse and run the analysis off of that. Query performance is one reason, and here's a blog post I've written with some more:
I agree, the "one database at a time" approach is the way vendors ignore the harder problem of connecting data across platforms.
I used to work for a company that built the "automagic ETL" kind of solution. You could write a single straight SQL statement across virtual "tables" (if you had non-tabular data, say NoSQL, you could query it through a table-like projection). It could literally join data across heterogeneous systems by shuffling data between them or to a dedicated "analytical processing platform" for join processing. Or you could do things like, create a table on one system, as the result of a select from a join of tables on three other systems. At the time, this was way ahead of what anyone else was doing.
However, it is a hard problem to solve, the company is/was small, and funding was a problem because inventing the tech took a long time. Also, it was an enterprise solution and closed source - when it really probably needed to be open source to be able to support the diversity of data sources.
These days, between Apache Drill, Spark, Ignite, etc., and any number of other commercial solutions, we're starting to see solutions to the problem you're talking about.
I bet this Metabase UI on top of Apache Spark and your databases would be a killer combination. That's a common pattern (BI tool on top of Spark) - see Apache Zeppelin for how it uses Spark, for example.
That said, as long as your data isn't truly ridiculously huge - if it can be centralized, centralization still works just fine.
This is something we saw at Chartio very early on. It's why we built the concept of layers directly into our product to support cross-DB networked joins:
I've worked in this space a lot and in my experience, the only true solution is some sort of layer below your reporting tool to aggregate the data. That being said, it's not always a database that physically stores the data. It simply needs to be a semantic layer that models the way in which the data will be queried. Additionally, your OLTP systems are not optimized for reporting and you'll quickly choke yourself out trying to perform complex analysis on data structures attempting to solve a problem for which they weren't designed.
So in practical terms, you'll eventually want to bring everything back in and create an analytics DB or data warehouse. Depending on what your schemas look like in each microservice, you might just be able to add each DB into Metabase and hit them up.
We don't do cross DB joins in the query builder. You can still use something like PG foreign data wrappers to create an uber-db with all the tables from each microservice.
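If you go the foreign data wrapper route, the setup is just a handful of statements per service database. A rough sketch (all hosts, names and credentials are placeholders; IMPORT FOREIGN SCHEMA needs PostgreSQL 9.5+, otherwise you define each foreign table by hand), run here from Python against the "uber-db":

```python
import psycopg2

# Connect to the "uber-db" that will hold foreign tables for each microservice DB.
conn = psycopg2.connect(host="uber-db.internal", dbname="analytics", user="admin")
cur = conn.cursor()

# One-time setup per remote service database (placeholder names throughout).
cur.execute("CREATE EXTENSION IF NOT EXISTS postgres_fdw")
cur.execute("""
    CREATE SERVER orders_service
        FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'orders-db.internal', dbname 'orders')
""")
cur.execute("""
    CREATE USER MAPPING FOR CURRENT_USER
        SERVER orders_service
        OPTIONS (user 'readonly', password 'secret')
""")
# Pull the remote tables in under a local schema; Metabase (or any BI tool)
# then sees them as ordinary tables in this one database.
cur.execute("CREATE SCHEMA IF NOT EXISTS orders")
cur.execute("IMPORT FOREIGN SCHEMA public FROM SERVER orders_service INTO orders")
conn.commit()
```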
Regardless of whether you use MB or another tool, there will be some legwork. Part of the tradeoffs involved in the great monolith vs microservices decision.
The non-cool BI tools work alright with multiple sources, integrating them into a common OLAP model that takes care of automating the relevant backend queries. This is a valid approach if you do not want to move all your data into a DW before reporting on consolidated views, but there is no magic bullet in terms of which model is optimal. I am talking about Oracle BI, SAP BO, Cognos BI.
Tool-level data integration is supported by tools like Spotfire, but for analyzing high volumes of data you need to maintain a data warehouse, using ETL code to integrate the data from all the different sources.
For better performance you may also need to implement an OLAP tool.
A lot of BI tools provide connectivity via ODBC and JDBC, so they work with multiple databases at the same time. Check http://www.infocaptor.com for example - it serves both ODBC and JDBC, on-prem and in the cloud.
I like the idea and the use of AGPL. This license lets companies release useful software and still keep the door open for selling alternative licenses, if that is what Metabase does. I have been thinking of writing something similar for natural language access (similar to what I used in my first Java AI book fifteen years ago) to relational and RDF/OWL data stores, and Metabase provides a nice example of how to distribute a general purpose standalone query tool.
Also +1 for being a Clojure app. I am going on vacation tomorrow and am loading the code on my tiny travel laptop for reading.
Those companies are missing out on some really great tools then, like this one. Perhaps as more and more awesome software is released as AGPL, they will change their position.
That seems like a very silly policy. Do these companies do any kind of analysis on what they'd need from their software licenses and what they'd be giving up by using AGPL?
You'd think from their attitudes that the FSF is a patent troll.
Let's look at the advantages of using software under "permissive" licenses for those companies:
- embedding somebody else's code in a closed source product and potentially competing against it
- patenting somebody else's code and potentially suing the author
For the upstream author, losing this type of hostile user by using AGPL might be a good idea.
It's not a silly policy. Using AGPL software (without getting a commercial license) in any modified form and/or depending on it would require all applications that use it across the network to be open sourced.
Of course, this requires it to be modified, which could happen accidentally and completely put a closed source application in jeopardy.
One very selfish request: Google Bigquery connector.
I think a lot of the companies you'd like to serve are going to be folks who might start getting into large quantities of data, but are too small to have people to build out and maintain OLAP schemas. This, to me, is where BigQuery comes in.
FYI, Metabase is a Thomson Reuters platform. I bet you anything they hold the trademark for the term.
You have a pretty good market opening right now - Crystal Reports just went to a SAP Business Objects-type model. Pentaho doesn't even have a community edition anymore.
It looks like you don't support MDX/snowflake schemas/other OLAP reporting standards, so I'm guessing you're going for the lower end of the market. So two things you should do off the bat - import data from Google Sheets (maybe offer to "unlock" the service for tweets and/or links), and offer an Excel plug-in (free up to X rows).
Also, offer to charge, if only to allow Bob to go to his manager and say you offer support. In corporate environments, no support is often a no-go.
Everyone in the low end of the business world just uses Tableau or QlikView - neither company is very militant about enforcing their trial licensing; and full versions aren't really all that expensive for a business anyway (~$500 per license per year, but a small company can get away with a single license). Both tools are also much more mature and powerful than this product.
Not saying there's no room for an open source BI platform, but I don't know there's a lot of money in it given the competitive pressure Tableau and Qlik are putting on the SAPs/Oracles of the world. Anything over $100 is going to put you squarely in the realm of real BI tools with MUCH more capability, a much simpler interface, and standardized training programs.
That's for the personal edition, which can only connect to CSV, Excel, MS Access, OData and Tableau's proprietary Data Extract format (TDE). Tableau Online requires your data to be stored in their cloud (which might not be acceptable to some IT departments).
Tableau Online doesn't require you to save your data on our platform if it's stored in AWS, Google, or Azure (example: Amazon Redshift or Aurora). You can maintain live data connections with datasources hosted on those ecosystems, and the source row-level data doesn't ever need to land in the Tableau Online platform. More info here: http://www.tableau.com/learn/whitepapers/tableau-online-keep...
And also you need the desktop product to publish to it. Tableau Online is a cloud alternative to the on-premise dashboarding server, not to the desktop client.
Yup, TR has a bioinformatics DB they sell which I think is based on metabase.org. We're in a sufficiently different space that we're hoping it's not a conflict.
We don't explicitly support snowflake schemas, but folks are definitely using MB with them. We have a limited set of joins accessible through the query builder, and snowflake schemas work really well.
I always find database & "data helper" tools fun to experiment with. After playing with this with a MySQL database for 10 minutes on my Mac I have 2 initial reactions:
1. The GUI is quite nice, and very simple. It is a glorified query tool that knows your tables and helps you make queries with visualizations and gives the results. It still allows for raw queries if you're into that and just want their GUI for queries.
2. The tool (or the Mac app, at least) still has plenty of bugs to iron out:
- I tried creating a "Dashboard" and it wouldn't actually create or close the modal window. Then I refreshed the application and it had created 10 of the same Dashboard.
- I tried deleting the database and the button just doesn't work.
- Many of the queries I ran on my own tiny sample db seemed to just not run. Closing & reopening the app didn't help.
I feel like the bugs could largely be specific to the OS X binary rather than the actual platform. Quite interested to see how this develops, and I'm going to put a bigger database in to play with later when I have more time.
re:dash is very cool, and we actually used an early version way back.
The primary difference is audience. Most of the impetus to build it came from the difficulties we had getting non-analysts to do their own data pulls and ad hoc queries. While some number of people were able to edit SQL others had written for them, it rapidly became unsustainable.
As a consequence of the audience, we have lots of ancillary features (like the ability to see detail records and walk across foreign key links, the data reference, etc) that fell out as we learned how people used the tool.
Looks like a nice tool. I wrote https://github.com/epantry/django-sql-explorer, which is sort-of similar in terms of the problem it solves, except embraces SQL front-and-center. Excited to check this out in more depth.
Personally, what I think is really exciting is that this is all written in Clojure! We need more Clojure web apps out there to learn from! An initial glance at the source makes it seem very readable as well - a great learning resource for us amateur Clojurists. It also makes it clear that the ecosystem is pretty immature; there is a lot of frameworky boilerplate to support a straightforward REST API.
Yes, writing the app in Clojure has been very interesting. Some things were downright magical, while other aspects felt very rough coming from other languages and frameworks. Ironically the prototype was in django =)
Yeah -- it seems crazy that you have to reinvent things like your defendpoint macro.
I know Clojure is all about composable libraries, not frameworks, etc, etc but just as compojure has become the de facto standard for routing, I imagine more libs will emerge higher up the web stack. It'll mean opinionated libraries, but that is the only way to get more people writing apps that deliver value instead of spending time creating macros that ensure (for instance...) the presence of int IDs in API requests.
How would we compare Splunk and Palantir with this company? I know their focus may be a bit different: Splunk is geared towards "logs" and Palantir towards govt datasets, etc... but at the essence of each company is the task of bringing diverse datasets together and making sense of it. Which makes me think that the companies that are successful at this focus on a particular use-case - as opposed to trying to build for a generic one. thoughts?
Splunk is mainly about log analysis, though they have a new BI product. It's primarily about making sense of your logs. We primarily work with a database you already have (and eventually the data warehouse you'll spin up).
Palantir is a wide set of tools and really about much more powerful analysis than we're targeting.
Our whole bag is letting non-technical people in your company get answers to their common questions by themselves.
This looks awesome, and is something I've been looking for. I'm aiming to start offering a specific service to businesses creating a back-end for BI platforms, but would like to offer the service as cheaply as possible by having a license free BI layer.
Would be interested in what you can offer when you're ready. I'm helping businesses implement BI more from a strategic perspective. I don't get involved technically; it's just part of my consulting.
The googlability of the name is terrible, since it's so heavily overloaded for DB-related things.
Also, while it seems to work tolerably well with the included sample database, connecting it to a local Postgres instance seems to work to the level of connecting and getting all the schema information, but every attempt to run a "question" -- even the simplest pick-a-table-and-go, or the questions it automatically suggests -- results in the "We couldn't understand your question. Your question might contain an invalid parameter or some other error." message, so it seems like there are some pretty significant rough edges.
Really appreciate you taking the time to try it out!
If you're still around, are you using the default schema ('public') for your tables? We've found a few bugs around non-default schema we're working on.
Yes, we're mainly focused on making it easy for non-technical people to pull data out. We have had embedded applications that hit a REST api to expose actions on the data that was embedded though.
Redshift, Druid and Presto are in planning, but in general, we're going to build drivers in the order of community demand. If you have a specific db, open an issue or chime in on one of the open driver issues letting us know it's important.
Support for some of the more common JDBC drivers would be nice, like SQLite, MS SQL and (shudder) Oracle. I know Korma supports all of these (http://www.sqlkorma.com/docs#db), and it seems like you use that.
Oh, that's an old version tag. Keep an eye out for a new version in a few hours, the app should offer you a chance to update.
Regarding FKs, we let you filter or group by any attribute on the table you're working with, or any attribute on an FK in that table. Are you using the sample DB or your own?
Maybe I'm an idiot, but here's me selecting from a customer table. Customers have a FK to 'address' via ship_address_id. I want to filter on address.state, but can't figure out how to do it. 'state' is not an available field in the filters and I can't seem to drill into any of the FK entities.
It looks like the field isn't marked as an FK in our semantic model. Could you go to the admin, click on "Metadata", and check what "special type" that field is marked as? If it's not "Foreign Key", you can set it to that and see if you can filter by linked fields.
What type of database are you pointing this at? Do you have FKs/constraints set?
Thanks! Absolutely awesome tool. It could do almost all of the charts & metrics I need. I couldn't figure out how to do a chart that shows me the SUMs of something per day and per category. Also, number "questions" in the dashboard could benefit from better styling (maybe center them horizontally/vertically).
Very cool. How does this compare to something like Microsoft Power BI? Will I have any issues running it on Linux? (Note: I'm a fairly non-tech user.)
I have some clients that need BI reporting. We've been using Cyfe so far, but it's only good for summaries of data. It's not very good for custom reporting.
These are very different product offerings. Power BI offers a mature in-memory columnstore database with a custom functional query language, DAX, with full dimensional modelling facilities. Additionally, if taking advantage of the (freemium) Power BI web portal, you get natural language query and cross-platform mobile apps.
This is essentially a SQL query builder on top of a RDBMS with some nice visualizations.
I am not trying to disparage the project - in fact I'm often of the opinion that a good query builder is all most organizations need for BI, but the two products in question are vastly different. Power BI has tooling to perform ETL, data modelling, and reporting. Metabase only covers the last piece.
In terms of performance, Metabase, being on top of a SQL RDBMS will have the potential to be incredibly speedy, but with large data volumes, you will need a DBA to make sure reporting scales appropriately. With Power BI, that conversation comes much later in the game. 10Ms of rows are trivial with a star schema and sufficient RAM in terms of performance tuning. With a SQL solution, there is a lot more back-end work to tune for an ad-hoc BI workload even at the 1Ms of rows level.
We run it on linux in production and it's the primary platform we recommend for company wide deployments.
Power BI is pretty awesome, but pretty complicated when we tried it out. With enough attention it can do very cool things. Our primary audience is folks who don't need the level of customization, and have a lot of non-technical coworkers that want to ask their own questions.
I couldn't run the Docker image; it seems that the Java process running in the Docker image doesn't bind port 3000. I also tried to deploy to Heroku, but when I try to access the application via its URL, Heroku throws a 503 error.
Curious. I run the docker image all the time and haven't had any issues with the port binding. What kind of system are you using? and what was your launch command?
Heroku sometimes times out if the application takes too long to start. Could you try again and see if it works? We're working on making Heroku deploys more robust.
It seems that the Docker issue is a memory overflow in the container. I tried to run the Docker image in a 512MB DigitalOcean droplet that already runs several other processes that also consume memory, so I think it's expected, considering Metabase runs on the JVM.
Are you on virtualbox? Sounds like a port binding issue. If you open an issue at github.com/metabase/metabase or http://discourse.metabase.com/ we'll try to debug exactly what's happening.
Just tried this locally and on AWS at Change Heroes and it's great! We've been looking to become more metrics-driven as a company, and this might be the tool that makes it happen!
Thanks for building it and making it open source :)
I'm unable to deploy to Heroku, though it seems to be a problem on Heroku's end. Their Rollbar snippet is throwing an unhandled error once I arrive from your deploy button.
I'm able to get past that and get the app deployed, but then I just get an application error trying to launch it. Then I went and looked at the build log. I'm not sure it's even doing anything:
We've been using it at Color Genomics for a few months and we absolutely love it. Easy to set up, no data sharing with external parties, and a simple/straightforward UI.
Thanks - nice simple tool. I started it in AWS (t1.micro) as a .jar. I had to do "hostname localhost" to get the database (h2) to run, but then it came right up.