
It would be interesting to know the identity of the vendor for "DBMS-X". I work in the "enterprise" data warehouse space and I'm trying to advocate moving away from "database appliances" towards distributed computing, and having a quotable source from Google would be very compelling.



I'm curious - what sorts of work do you do in the data warehousing space? Do you work as a consultant, or as an implementor at a customer of data warehouse products?

It seems to me that that whole industry (DW & ETL) is a dinosaur whose lunch is about to get eaten by some upstarts.


I've read a few books on data warehousing, and maybe you can confirm my suspicion:

Isn't ETL just an acronym that means "I wrote this Perl script to populate the database"?

How on earth is that even an industry?


Simple ETL jobs are mostly just E & L: extract the data from one system, load it into another.
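To make the "just E & L" case concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `customers` table, its columns, and the `extract_and_load` helper are made up for illustration; a real job would point at two actual systems instead of two in-memory databases.

```python
import sqlite3


def extract_and_load(src, dst, table):
    """Copy every row of `table` from src to dst with no transformation."""
    # Extract: read all rows from the source system.
    rows = src.execute(f"SELECT id, name FROM {table}").fetchall()
    # Load: write them unchanged into the target system.
    dst.executemany(f"INSERT INTO {table} (id, name) VALUES (?, ?)", rows)
    dst.commit()
    return len(rows)


# Two in-memory databases stand in for the source and target systems.
src = sqlite3.connect(":memory:")
dst = sqlite3.connect(":memory:")
for db in (src, dst):
    db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")

src.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])
src.commit()

copied = extract_and_load(src, dst, "customers")
```

This only works so cleanly because both sides share the same schema, which is exactly what makes the job "simple".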

Where things get complex is in the Transform aspect of some jobs. Mapping disparate schemas is complex, often messy work. This is especially true when one (or both) sides of the ETL job have poor or missing primary and foreign keys, or are just "mostly standard" CSV files [shudder].
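A rough sketch of what that Transform step can look like, assuming a hypothetical CSV source with inconsistent headers, stray whitespace, US-style dates and no primary key. The column names, the `COLUMN_MAP` mapping and the hashed surrogate key are all illustrative choices, not a standard recipe.

```python
import csv
import hashlib
import io

# Hypothetical mapping from the source's column names to the target schema.
COLUMN_MAP = {
    "Cust Name": "customer_name",
    "e-mail": "email",
    "Signup": "signup_date",
}


def transform(row):
    """Map one source row onto the target schema and clean it up."""
    out = {}
    for src_col, dst_col in COLUMN_MAP.items():
        # Missing columns become empty strings; whitespace is trimmed.
        out[dst_col] = (row.get(src_col) or "").strip()
    out["email"] = out["email"].lower()
    # Assumed source format is M/D/YYYY; the target wants ISO 8601.
    month, day, year = out["signup_date"].split("/")
    out["signup_date"] = f"{year}-{int(month):02d}-{int(day):02d}"
    # The source has no primary key, so derive a surrogate key from the email.
    out["customer_key"] = hashlib.sha1(out["email"].encode()).hexdigest()[:12]
    return out


RAW = (
    "Cust Name,e-mail,Signup\n"
    '" Ada Lovelace ",ADA@Example.com,12/10/2015\n'
    "Grace Hopper,grace@example.com ,1/2/2016\n"
)

rows = [transform(r) for r in csv.DictReader(io.StringIO(RAW))]
```

Even this toy version has to make judgment calls (what identifies a customer? which date format is the source really using?), and those calls multiply quickly with hundreds of tables.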

Also: some ETL jobs can get quite large. I know one guy who had to create an ETL system that continuously moved data from one 1200-table system into some other system. Crazy.


The term "ETL" itself is often used in place of "Data Integration", which is a much larger field, particularly when it comes to data warehouse design. The wiki article is a good jumping-off point: http://en.wikipedia.org/wiki/Data_integration

It may be difficult to understand how this is an industry when you come from a web development/startup angle (big supposition there), but there are literally thousands of companies with lots of databases varying in age, size and complexity that need integrating, and plenty of companies competing for that work as either implementors or software providers. A Perl script might do the job, but most products focus on performance, reuse, ease of maintenance and compatibility across many different database/file types.


Eh. Even if growth slows a lot because more and more new systems are Hadoop/etc, big companies are so tied to their legacy systems that they basically never get rid of what they have, so those companies will have significant recurring revenue from their current customers for the foreseeable future.

I also get the impression that Exadata is a pretty impressive feat of engineering and, if you need to do what it's optimized for and are prepared to pay a few million per rack, it's a very good option.


Both, really. I work as a consultant for a company that provides consultancy for clients that use ETL products (software/'appliances' etc).

Your second comment is true. However, the DW industry has figured this out in the last year and started to embrace the "Big Data" movement. Informatica (the largest player in the DW space according to Gartner) added HDFS connectors to its latest release, for instance.



