We're on iteration 2 for this course, and it's still in somewhat rough shape. If you plan to devote significant time to the lectures, I'd recommend waiting until next spring, when we'll be teaching iteration 3 online at http://ds-class.org.
Maybe it's just me, but "Data Science" seems to be basically enterprise information systems (ETL, dataflow diagrams, data mining, data warehousing, business intelligence, and so on) but with a cool mobile-social-esque feel attached to the term.
I had classes in information systems analysis & design with the exact same content.
There have definitely been some ruffled feathers among the BI professionals and data analysts over the "data scientist" title. We liked this band before they were popular, man!
And there's some truth to the criticism that it's mainly a rebranding. Someone (can't recall the source, sorry) recently defined "data scientist" as "a data analyst who lives in California."
That said, even though many of the generalized tasks are the same, I think there's some value to the title. There is a broad range of BI pro and analyst roles that don't fit. Lots of BI pros just make SSRS reports, or just build star schemas, or look at data for insights, but don't apply any hypothesis, test, repeat method.
The key differentiators for a data scientist IMO are
- can do everything required to go from piles of unorganized data to usable insights. From data munging to visualization design to programming to applying statistics correctly to analyst activities like knowing what business questions to ask
- when doing analyst work they operate using scientific(ish) methods to test and verify data hypotheses.
That describes many data analysts and BI pros who don't have cool titles now, but may soon. Recognizing the difference between people and businesses that do all of that vs. report writers and ad hoc OLAP browsing users is valuable and positive IMO.
So, basically, you are saying that the main difference is that data scientists also make decisions based on the data, while the BI/DA works as a "data guy" for executives. Is that a correct way to put it?
In a way there seems to be a parallel between the enterprise programmer vs. hacker, and the business intelligence/data analyst vs. data scientist.
Yeah, or at least the execs are saying "getting more users is important. How can we improve signups?" instead of "get me a time-on-signup-page metric on report x."
A data scientist is like an analyst that doesn't have to go beg the tech guys to collect a new data set or build a new mining model, etc.
When Jeff & DJ Patil started using the term "data scientist," they were at Facebook and LinkedIn making products ("People You May Know," etc.) via machine learning on massive datasets.
It may be my ignorance, but when I hear "enterprise," "BI," "ETL," etc., I'm picturing some poor analyst doing database JOINs in order to dump the latest widget numbers into a PowerPoint table for the next board meeting.
Insofar as there is such a thing as "data science," I think it means making transformative use of data (i.e., by creating tools or models), not just summarizing it.
Lots of back and forth over the nomenclature as usual. I hope not to obfuscate further by adding my definition.
I'm currently a BI consultant aspiring to the title of data scientist and here's my motivation...
Traditional business intelligence skills basically refer to people who are 'IT guys that have finance knowledge', so generally you'll find yourself doing pretty general reporting along with some financial performance management (FPM), albeit at the data modeling/metadata modeling level (you're building metadata models and cubes/reports dashboards with drill-down, not just flat reports).
All of this is done at the whim of some exec/BA/line manager, all of whom (in my experience) seldom understand the subject well enough to actually pose sensible strategic questions.
Data science implies several levels of creativity expressed through solid technical skills along with a dash of journalism. Maybe it is just a rebranding, but what it represents to those in the field is a total paradigm shift in terms of where and how the skills are applied. This is key because all too often my work as a BI consultant boils down to churning out x number of meaningless reports by a certain date so that some head of department can get his bonus and justify the Oracle purchase that incidentally resulted in a 3-day trip to Paris funded by a stunningly sophisticated sales team.
If I come off cynical it's because I am passionate about data. I believe that data science and the paradigm shift it represents has the power to really change human lives and I believe that it has a key role to play in the future of the evolution of our species.
This year's offering has some changes from the Spring 2011 version of the course (the assignments are all different), but you can view the Spring 2011 version at http://datascienc.es/spring-2011-course/
I must say I am quite disappointed... I watched 2 hours of this, then started skimming. Sorry, but everything I have encountered is quite trivial (and I see myself as a rookie in the field). Did other HNers find new bits in those lectures? If so, please point them out.
We're on iteration 2 for this course, and it's still in somewhat rough shape. If you plan to devote significant time to the lectures, I'd recommend waiting until next spring, when we'll be teaching iteration 3 online at http://ds-class.org.
Later, Jeff