
This is why data warehousing exists. That way you can use many different BI tools against a consistent set of data and pre-calculate a lot of commonly-used summary data.

ETL also isn't something you can just do automagically. It requires an understanding of the data and your goals, because you essentially have to build both your data model and your reporting requirements into your ETL process. You could probably do it automagically for some simple cases, but for most real-world scenarios it's just going to be easier to write a Python/Perl script to run your ETL for you.

BI / reporting requires a lot of plumbing to work correctly. You have to set up read-only clones of your DBs for reporting (because you don't want to be running large queries against production servers) and generally an ETL process that dumps everything into a data warehouse. From there, you can push subsets of that data out to various BI tools that provide the interface.
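For simple cases, that ETL script doesn't need to be anything fancy. Here's a minimal sketch, assuming Postgres on both ends via psycopg2; the connection strings, table names, and summary query are all made up for illustration:

    # Rough sketch of a nightly ETL job: read from a reporting replica,
    # pre-aggregate, and load a summary table into the warehouse.
    # DSNs and table/column names are hypothetical.
    import psycopg2

    REPLICA_DSN = "host=reporting-replica dbname=app user=etl"   # read-only clone
    WAREHOUSE_DSN = "host=warehouse dbname=dw user=etl"

    with psycopg2.connect(REPLICA_DSN) as src, psycopg2.connect(WAREHOUSE_DSN) as dst:
        with src.cursor() as read_cur, dst.cursor() as write_cur:
            # Extract + transform: pre-calculate daily revenue so BI tools
            # never have to scan the raw orders table.
            read_cur.execute("""
                SELECT date_trunc('day', created_at) AS day, SUM(total) AS revenue
                FROM orders
                GROUP BY 1
            """)
            # Load the summary rows into the warehouse.
            write_cur.executemany(
                "INSERT INTO daily_revenue (day, revenue) VALUES (%s, %s)",
                read_cur.fetchall(),
            )

The point is that the data model and reporting requirements live in that query, not in the tool; that's the part no framework can figure out automagically.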




If anyone's looking for a straightforward ETL framework, check out DataDuck (http://dataducketl.com/). The `dataduck quickstart` command is as close to automagic as you can get, and then you can customize your ETL after that.


Thank you! I'm currently looking for alternatives to "enterprise" ETL tools.

Most of the leading tools are HORRIBLE as programming environments and are also incredibly expensive.


Yes. For example, I am convinced that Informatica Powercenter was made by the Devil to bring suffering into the world.


PowerCenter is a piece of shit.


Then that makes it a lot better than some "enterprise" ETL/integration tools I've seen that could only aspire to be pieces of shit.


I recently joined a competitor (SnapLogic). It's a lot nicer IMHO - HTML5 drag-n-drop interface, unified platform for big data, API integrations, IoT etc., and supports executing integrations either in the cloud or on-premise.


Microsoft's SQL Server Integration Services has a lot of adapters and transformations out of the box. The tooling is in Visual Studio, and you can use C# or F#. A warning about F#: you will never want to touch another language after you try it.


Yeah. It's really friggin good. I highly recommend it. And I'm an entrenched libre open source kinda character.

I recommend reading Packt's books about Business Intelligence on the Microsoft SQL Server stack. They bring it all together in a fantastic way, and they're example-driven.


This looks bad ass, but unfortunately I work with retail data, and retailers don't want any of their data on Amazon. Bummer.


They don't mention the GitHub version anywhere. Is it usable on its own?


You get full control of how and where you run Metabase, which means:

    No storage limits, pricing tiers, or caps.
    Your data stays private and on your own servers.


Metabase is nowhere close to being a business-ready BI tool. I wish them luck, but their current product is basically a tech demo compared to the real, commercial products out there -- none of which require you to put your data anywhere in particular either (you can run Tableau against pretty much any DB platform out there).


Oh, I meant the duck thing.


Just to provide more options: Pentaho comes with a full suite including BI and ETL offerings (Community Edition or Enterprise).

Alternatively, you might be interested in Apache NiFi for ETL (apparently used by the NSA for big data stuff...), then combine it with Metabase.


NiFi looks very promising for numerous chunks of the ETL capability space, but I'm not sure it can stand alone on the transformation piece. I've been looking into coupling it with Python for a full ETL stack.
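For the transform step specifically, NiFi's ExecuteScript processor will run Jython, so some of that Python can live inside the flow itself. A minimal sketch, assuming the standard ExecuteScript bindings (session, REL_SUCCESS); the attribute name is made up for illustration:

    # Body of a NiFi ExecuteScript processor (Script Engine: python/Jython).
    # `session` and `REL_SUCCESS` are bound by NiFi itself.
    flowFile = session.get()
    if flowFile is not None:
        # Illustrative "transform": stamp an attribute rather than rewrite content;
        # real content transformation would use a StreamCallback.
        flowFile = session.putAttribute(flowFile, 'etl.stage', 'transformed')
        session.transfer(flowFile, REL_SUCCESS)

Anything heavier than that (joins, aggregations) is where pairing NiFi with an external Python job still seems necessary.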


JPKab - As you explore NiFi more and pair it with your own scripts, I'd be curious to hear if you think there are things we can and should do better to be more complete. Let us know at dev@nifi.apache.org. Good luck!



