Hacker News
For Today’s Graduate, Just One Word: Statistics (nytimes.com)
89 points by tokenadult on Aug 6, 2009 | 42 comments



This is a question I've been meaning to ask on here, but didn't think it warranted its own "Ask HN": What's the best way for someone with an otherwise decent math background, but no statistics, to get started on the topic? Any particular books, websites, etc that people recommend?


The key is to come up with a well-formulated question, such as "I wonder if I can predict stock trends." Then, in pursuing it, you'll search the web and come across predictive modeling, which will lead to machine learning techniques, which will lead to good resources. Within those machine learning sources, look for chapters on prediction and classification, and you'll come across regression techniques, support vector machines, relevance vector machines, etc. Then you'll wonder, "OK, how do I actually solve this problem?", so you may search for "SVM implementations" and find, for example, Steve Gunn's implementation for Matlab. Then, after much codebanging, you'll realize that in order to solve this problem you need a good dataset, so you go to Yahoo Finance and see if you can download some data for IBM.

This is usually the process one needs to follow, albeit with some intermediate steps switched out for others here and there.
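For anyone new to the "regression techniques" step in that chain, here is a minimal sketch of ordinary least squares, the simplest regression technique (not an SVM). The prices are made up, not downloaded from Yahoo Finance, and `fit_line` and `predict` are hypothetical helper names.

```python
# Minimal sketch: fit y = intercept + slope * x by ordinary least squares.
# The "prices" below are hypothetical numbers, not real stock data.

def fit_line(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def predict(intercept, slope, x):
    """Evaluate the fitted line at x."""
    return intercept + slope * x

if __name__ == "__main__":
    days = [1, 2, 3, 4, 5]
    prices = [10.0, 10.5, 11.1, 11.4, 12.0]  # hypothetical closing prices
    a, b = fit_line(days, prices)
    print(f"intercept={a:.3f} slope={b:.3f} day-6 guess={predict(a, b, 6):.3f}")
```

A line that fits past prices well is no evidence that future prices will follow it; the fit only summarizes the data you already have.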


It seems like if he did that, he'd be biting off WAY more than he can chew at the present time.

If you want to learn statistics, it's probably better to start at the beginning (_Cartoon Guide to Statistics_, O'Reilly's new _Head First Statistics_, or Huff's _How to Lie With Statistics_) than to leap headlong into a huge, mostly intractable problem and pagefault in knowledge at each point you come across something you don't know how to do.


I agree, but at the same time, you only learn by doing. In my experience, even when you dive in way over your head, you tend to pick up the information rather quickly. I took a machine learning class with no statistics background, and after a few weeks of floundering and learning terminology I was fine. The benefit of forming a problem first is that it gives you an ultimate goal to work towards.

Don't tell Google that its slogan ("Organize the world's information") is biting off more than it could chew.


I agree, but at the same time, you only learn by doing.

This is true, but I think in the original poster's case it could be accomplished much more easily and reliably by continually giving him problems that are within (or, ideally, just outside) his circle of competence, rather than a problem like "predict stock prices," which isn't in any human being's circle of competence. Moreover, with the latter problem, he'll get virtually no feedback as to whether his answer was correct, because a correct answer to "use statistics to predict stock prices" doesn't exist. Odds are that if he follows that path he'll quit before making any progress, or at the least he won't be able to close the feedback loop that is so vital to gaining expertise.

I think that if he's starting from ground zero and wants to learn statistics, he'd be much better served by first sitting down with The Cartoon Guide to Statistics, a deck of cards, and a set of dice. He can work his way up to conquering the stock market :)
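Cards and dice translate directly into code, too. A minimal sketch, assuming two fair six-sided dice: it computes the exact probability of rolling a total of 7 by enumerating all outcomes and compares it with a Monte Carlo estimate. The function names are hypothetical.

```python
import random
from fractions import Fraction

def exact_prob_sum(target):
    """Exact probability that two fair six-sided dice sum to `target`,
    found by enumerating all 36 equally likely outcomes."""
    hits = sum(1 for a in range(1, 7) for b in range(1, 7) if a + b == target)
    return Fraction(hits, 36)

def simulated_prob_sum(target, trials, seed=0):
    """Monte Carlo estimate of the same probability from `trials` simulated rolls."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if rng.randint(1, 6) + rng.randint(1, 6) == target)
    return hits / trials

if __name__ == "__main__":
    exact = exact_prob_sum(7)
    print("exact:", exact, "~", float(exact))
    for n in (100, 10_000, 100_000):
        print(f"{n} rolls: {simulated_prob_sum(7, n):.4f}")
```

The simulated estimates should tend to settle closer to 1/6 as the number of rolls grows, which is the law of large numbers at work.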

Don't tell Google that its slogan ("Organize the world's information") is biting off more than it could chew.

I think that's apples and oranges - he's trying to learn statistics, not trying to convince potential clients or investors that he's already an expert. I don't think it would be harmful at all for him to have a lofty, far-out goal like "predict stock prices" to aim toward, but I do think that if he starts out trying to learn statistics by typing "statistical stock prediction methods" into Google, he will burn out rather quickly. Pagefaulting in knowledge when you need it is probably optimal for something where you just want to make sure your knowledge is passable, but if he wants to truly know his domain, he's gotta get out the marbles and urns. :)
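The classic "marbles and urns" exercise can be checked both ways: exactly with counting, and by simulation. A minimal sketch, assuming an urn of red and blue marbles drawn without replacement (the hypergeometric distribution); it needs Python 3.8+ for `math.comb`, and the function names are hypothetical.

```python
import random
from fractions import Fraction
from math import comb

def prob_k_red(red, blue, draws, k):
    """Exact probability of exactly k red marbles when drawing `draws` marbles
    without replacement from an urn holding `red` red and `blue` blue marbles."""
    if draws > red + blue:
        raise ValueError("cannot draw more marbles than the urn holds")
    if k < 0 or k > draws:
        return Fraction(0)
    # comb(n, r) is 0 when r > n, so impossible combinations contribute nothing.
    return Fraction(comb(red, k) * comb(blue, draws - k), comb(red + blue, draws))

def simulate_k_red(red, blue, draws, k, trials, seed=0):
    """Estimate the same probability by repeatedly drawing from a simulated urn."""
    rng = random.Random(seed)
    urn = ["red"] * red + ["blue"] * blue
    hits = sum(1 for _ in range(trials) if rng.sample(urn, draws).count("red") == k)
    return hits / trials

if __name__ == "__main__":
    # Urn with 3 red and 2 blue marbles; draw 2.
    for k in range(3):
        print(k, "red:", prob_k_red(3, 2, 2, k),
              "simulated:", simulate_k_red(3, 2, 2, k, 50_000))
```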


"Don't tell Google that its slogan ("Organize the world's information") is biting off more than it could chew."

They came up with that slogan well after they were already experts in search, and probably after Google had been written and they'd formed a company. To hear Larry tell it, Google started with the question of "What if we could download all of the web and just keep all the links?" And then Terry Winograd encouraged them, and they realized that the link structure of the web was a lot like the citation graph of academic publications. And then they realized they could build a kick-ass search engine by using that to rank page relevance. And then when nobody would license it, they formed a company.

The way to build massive world-changing systems isn't to start with massive world-changing ideas. It's to start with interesting, challenging, but tractable problems, and then see where they lead you.
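The "tractable problem" described above, ranking pages by the link structure of the web, became known as PageRank. Here is a minimal power-iteration sketch of the textbook formulation, assuming the commonly cited damping factor of 0.85 and a toy three-page graph; it is not Google's production ranking.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank.

    links: dict mapping each page to the list of pages it links to.
    Returns a dict of scores that sum to 1.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives the "random jump" share (1 - damping) / n.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its vote evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page with no outgoing links: spread its vote over all pages.
                for t in pages:
                    new_rank[t] += damping * rank[page] / n
        rank = new_rank
    return rank

if __name__ == "__main__":
    toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
    for page, score in sorted(pagerank(toy_web).items(), key=lambda kv: -kv[1]):
        print(page, round(score, 4))
```

In the toy graph, page "c" is linked to by both other pages, so it ends up with the highest score, much as a heavily cited paper would.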


I agree with you: starting with a problem can provide motivation, structure, and focus, and all of those help keep you learning quickly.

Still, there is a place for an introductory book if you are truly coming from minimal knowledge of the subject.


No, that's how to figure out whether he can predict stock trends. He wants to learn statistics, not necessarily an application of statistics. I see what you're getting at, but before one can ask such a question, one must know whether the answer will involve statistical theory, and that's a big jump. Learn the basics first, then look for applications, lest you fail to see the forest for the trees.

The other problem with this approach is that while you may get depth, there is very little breadth. I find that when learning something for the first time, I want to go wide and shallow. Then once I have a basic understanding I can figure out where I want to learn deeper.

OTOH, maybe you just learn things differently from me!

I just gave my friend (a PhD in Operations Research) his basic Statistical Theory book back. I also wanted to get a better understanding of statistics, but that stuff was putting me to sleep too easily :-(


Right. Everyone has a different learning style. The problem with learning just basic theory is that you get caught up in minute details. That's why I always recommend that people start with a focused problem; in the process of trying to answer it, you become knowledgeable about all the pieces of the puzzle. The key is to make the discipline of statistics interesting to YOU!


I can't recommend Cartoon Guide to Statistics by Larry Gonick and Woollcott Smith enough. It's a great resource for developing an intuitive understanding, and it can often answer questions by explaining the concepts in a different way than traditional texts do. http://www.amazon.com/Cartoon-Guide-Statistics-Larry-Gonick/...


I got an excellent introduction to statistics in my Computer Engineering degree[1][2]. It focused on the practical, day-to-day use of stats.

Applied Statistics for Engineers and Scientists, 2nd Edition, J. Devore and N. Farnum, Duxbury Press, Thomson Publishers[3]

Hope this helps.

-----------

[1] http://www.maths.unsw.edu.au/students/current/homepages/math...

[2] Set text and course overview :: http://www.maths.unsw.edu.au/students/current/homepages/outl...

[3] http://www.amazon.com/Applied-Statistics-Engineers-Scientist...


Devore is a very good author on statistics.


There was an Ask Metafilter question about learning stats.

http://ask.metafilter.com/105045/Best-way-to-relearn-statist...


Have you checked out Head First Statistics by O'Reilly?


I recently got this book: http://www.amazon.com/Statistics-Gentle-Introduction-Frederi... I found it perfect for what I wanted. Basically, my knowledge of statistics was VERY limited, but one of the things I'm working on now requires a good understanding, so... The book is gentle and not too math-heavy. Everything it covers, it covers in detail, with examples and real-world stories. I found it a good way to get a basic foundation. You may want to follow up with a more advanced book afterwards, though.


I was in the same boat as you. I have an undergrad math degree but took no statistics. I have found that most statistics books are inscrutable at best and complete wastes of paper and effort at worst. Based on the book recommendations I've been given by people who supposedly use statistics all the time, I don't believe that many trained statisticians have any idea what they are doing.

That said, here's what has sort of worked for me:

- Cartoon Guide to Statistics by Larry Gonick

- Fundamentals of Applied Probability Theory by Al Drake: out of print, and mostly about probability. However, the stats intro at the end is the clearest one I've ever read. Originally recommended to me by Philip Greenspun.

- Introductory Statistics with R by Peter Dalgaard. I'm assuming if you are posting to Hacker News you probably are coming from a programming background. This book does exactly what the title says, shows you how to apply introductory statistics by programming. It's heavier on R than stats.


Based on the book recommendations I've been given by people who supposedly use statistics all the time, I don't believe that many trained statisticians have any idea what they are doing.

Many people who work with statistics on the job have never grappled with the issues of adequate descriptive statistics or reasonable inference. Besides the recommendations I just posted above (http://statland.org/MyPapers/MAAFIXED.PDF and http://repositories.cdlib.org/cgi/viewcontent.cgi?article=10...), see Statistics: A Guide to the Unknown, 4th edition (http://www.amazon.com/Statistics-Guide-Roxy-Peck/dp/05343728...), for articles with examples of what statistics is really all about.


I second the idea that a lot of statisticians don't know what they're doing. The first mistake is to make things easier by glossing over, or mangling, the mathematical details. Many statistics textbooks do this, and the only real protection is to have a good math background. The second mistake is to treat statistics as a bunch of techniques to be learned and applied with little regard for the philosophical problems inherent in every attempt to model the real world.

The Cartoon Guide to Statistics is an excellent way to go from zero to a good overview of the basics with a minimum of hard math. After that, if you're mostly interested in applying basic techniques to your own work, you want a good undergrad textbook; unfortunately, I don't have any good recommendations there. If you have a good math background (or are motivated to get one) and you want to keep going, Statistical Models: Theory and Practice by David A. Freedman (http://www.amazon.com/review/R2XUNM92KYU7BB) has the math, the philosophy, the hands-on analysis of studies, and the exercises to put you in a better position to evaluate statistical research than some of the people who produce it.


Physcab suggests a great step #2: diving into some real statistics problems. The information available online today is more than enough for that step.

Step #1 is going to be getting the foundations so that you can quickly digest the meaning and purpose of things like machine learning. Fortunately, statistics can be largely summarized as "the science/math/art of explaining variance".

Study what variance is and what it means, and you'll dive through probability, distributions, modeling, inference, prediction, parametrization, simulation, and all of those fun topics while keeping an understanding of why they exist.

Finally, I'd highly suggest taking a look at Tufte's work because once you understand variance, you've still got to explain it.
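To make "explaining variance" concrete, here is a minimal sketch (hypothetical helper names) that computes the variance of a dataset and the fraction of it a least-squares line accounts for, the quantity usually reported as R².

```python
def variance(values, sample=True):
    """Mean squared deviation from the mean; divides by n - 1 for a sample estimate."""
    n = len(values)
    mean = sum(values) / n
    ss = sum((v - mean) ** 2 for v in values)
    return ss / (n - 1) if sample else ss / n

def r_squared(xs, ys):
    """Fraction of the variance in ys explained by a least-squares line on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    intercept = my - slope * mx
    # Total variation in y around its mean.
    ss_total = sum((y - my) ** 2 for y in ys)
    # Variation the fitted line fails to account for.
    ss_resid = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return 1.0 - ss_resid / ss_total

if __name__ == "__main__":
    print(variance([2, 4, 4, 4, 5, 5, 7, 9], sample=False))
    print(r_squared([1, 2, 3, 4], [1, 3, 2, 4]))
```

An R² near 1 means the line explains almost all of the variation in y; an R² near 0 means it explains almost none.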


This was a good read: http://www.zedshaw.com/essays/programmer_stats.html

He even mentioned just about every topic we covered in my stats/probability class. There's a book list at the end.


My two favorite orientations to what statistics is all about are two free articles on the Web. Both recommend books for further reading, and they are good books.

"Advice to Mathematics Teachers on Evaluating Introductory Statistics Textbooks"

http://statland.org/MyPapers/MAAFIXED.PDF

"The Introductory Statistics Course: A Ptolemaic Curriculum?"

http://repositories.cdlib.org/cgi/viewcontent.cgi?article=10...

Here's a list of the best beginning textbooks:

http://www.mrderksen.com/textbooks.htm


I found Sheldon Ross's Introductory Statistics very approachable.

http://www.amazon.com/Introductory-Statistics-Second-Sheldon...


It is only a brief introduction and should be nothing more than a starting point, but "The Manga Guide To Statistics" http://www.amazon.com/Manga-Guide-Statistics-Shin-Takahashi/... is a good first book.


I highly recommend Hayashi's book on econometrics. It's what we used in the first year of my PhD program in Economics. But it's probably worth starting with a basic probability and statistics book, preferably a fairly mathematical one. I've used Hogg, McKean, and Craig's Introduction to Mathematical Statistics, and it's really good.


O'Reilly's Statistics In a Nutshell is quite good as both a textbook and a reference. I'm (slowly) working through it on my own.


Get the machine learning book by Bishop.


I completely agree. Statistics is what I most wish I had studied more intensely in school. And not just so I wouldn't have those poor marks on my transcript.


Well, school may be over, but life is not :-)


I agree. I did horribly in my stats class. It was only after I graduated that I discovered how damned useful statistics is, and now I wish I'd done better in class. I'm slowly trying to remedy that, though, so there's still hope.


Stephen Baker's book, The Numerati, covers what data geeks are doing with stats, data mining, semantic analysis, machine learning...

Here's a link to some of his talks on the subject: http://thenumerati.net/index.cfm?catID=4


I have some background in machine learning and I'm reading this book now. It doesn't seem like a good way to learn much about applied statistics, but it does a good job of illustrating how data mining affects people's daily lives.

There's an "aren't you shocked that they're gathering all of this data?!" tone that gets old as the book goes on, but it's generally a good read. He even describes some of the major algorithms (support vector machines and clustering) in layman's terms.
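Clustering, one of the algorithms the book explains in layman's terms, fits in a few lines. A minimal k-means (Lloyd's algorithm) sketch, assuming 2-D points and Euclidean distance; seeding with the first k points is a simplifying assumption (real implementations usually use random or k-means++ seeding), and it needs Python 3.8+ for `math.dist`.

```python
import math

def assign(points, centers):
    """Group each point with its nearest center (Euclidean distance)."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        clusters[nearest].append(p)
    return clusters

def kmeans(points, k, iterations=20):
    """Lloyd's algorithm: alternate assigning points and moving centers to cluster means."""
    # Simplifying assumption: seed with the first k points.
    centers = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        clusters = assign(points, centers)
        new_centers = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else centers[i]
            for i, members in enumerate(clusters)
        ]
        if new_centers == centers:  # converged: no center moved
            break
        centers = new_centers
    return centers, assign(points, centers)

if __name__ == "__main__":
    pts = [(0, 0), (10, 10), (0, 1), (1, 0), (10, 11), (11, 10)]
    centers, clusters = kmeans(pts, 2)
    for c, members in zip(centers, clusters):
        print(c, members)
```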


SVMs and clustering are AWESOME. My housemate uses both to analyse EEG data, and his classifiers are absolutely amazing. It looks like his work may end up being used by the European Space Agency too.


Or, instead of statistics, you could study its sister field: Machine Learning.

Robert Tibshirani provides the following comparative glossary for machine learning and statistics: http://anyall.org/blog/2008/12/statistics-vs-machine-learnin...

                        Glossary

  Machine learning              Statistics
  network, graphs               model
  weights                       parameters
  learning                      fitting
  generalization                test set performance
  supervised learning           regression/classification
  unsupervised learning         density estimation, clustering
  large grant = $1,000,000      large grant = $50,000
  nice place to have a meeting: nice place to have a meeting:
  Snowbird, Utah, French Alps   Las Vegas in August
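To see that two columns of the glossary can name the same thing ("weights" vs. "parameters", "learning" vs. "fitting"), here is a sketch that fits one linear model two ways: the closed-form least-squares solution and gradient descent on mean squared error. The learning rate and epoch count are arbitrary assumptions chosen for this toy data; a much larger step size can make gradient descent diverge.

```python
def fit_closed_form(xs, ys):
    """'Fitting' in statistics terms: least-squares parameters (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - slope * mx, slope

def learn_gradient_descent(xs, ys, learning_rate=0.05, epochs=5000):
    """'Learning' in machine-learning terms: adjust weights (w0, w1) by gradient
    descent on the mean squared error."""
    n = len(xs)
    w0, w1 = 0.0, 0.0
    for _ in range(epochs):
        errors = [(w0 + w1 * x) - y for x, y in zip(xs, ys)]
        grad_w0 = 2 * sum(errors) / n
        grad_w1 = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        w0 -= learning_rate * grad_w0
        w1 -= learning_rate * grad_w1
    return w0, w1

if __name__ == "__main__":
    xs = [0, 1, 2, 3, 4]
    ys = [1.2, 2.8, 5.1, 7.3, 8.9]
    print("fitted parameters:", fit_closed_form(xs, ys))
    print("learned weights:  ", learn_gradient_descent(xs, ys))
```

Both routes converge on essentially the same numbers; the difference is mostly vocabulary and, for large models, which method is computationally practical.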


So, for those of us who have mostly forgotten math that doesn't get used regularly, what's a good way to get an overview of at least what's possible, and where it's applicable? Enough to get an idea of what to go study in further detail in order to accomplish something, or at least ask for help/hire someone.


http://www.mrderksen.com/textbooks.htm

These books have good indexes leading to statistics issues in particular fields of research or applications.


This article makes statistics sound really cool; I want to learn it now. Does anyone know how hard it could be?


Just like: Plastics

Of course, I should add the obligatory remark: Lies, Damn Lies, and Statistics.


I was worried that nobody else here would get the reference. Whew!


Dude, that quote is totally worn-out and cliché. Next time, use this one by Aaron Levenstein:

"Statistics are like a bikini. What they reveal is suggestive, but what they conceal is vital."


Is anyone else who isn't registered getting pushed off the site? F-that.


Just google the name of the article and go to the site from there. The site never blocks traffic from search engines :)


Gotta love the T-shirt...



