Hastie and Tibshirani teach a free course based on this book on Stanford's OpenEdX (https://online.stanford.edu/courses/sohs-ystatslearning-stat...). I highly recommend taking this course or reading the book before delving into ESL. IMO, ESL is excellent as a reference, but trying to learn by reading it linearly is not an optimal time investment.
Now if only a similar course existed for Wasserman's "All of Statistics..."
There's a YouTube playlist[1] of recorded lecture videos by Wasserman from his CMU course, which uses All of Statistics as a textbook.
I haven't watched more than a couple of minutes of them (yet), so I have no idea how good they are, though the blackboard is quite hard to see in the recordings. And of course they don't include all the extra material you would get in a proper MOOC.
If I’m not mistaken, Wasserman’s lectures are on YouTube under “Intermediate Statistics Larry Wasserman (CMU-36-705)”. You can find course notes and assignments for 36-700 and 36-705 on the web, which seem to use All of Statistics as the course textbook.
CMU has posted a lot of great statistics material beyond those two courses.
I'm a big fan of ISL - one of the best intro machine-learning-oriented textbooks out there IMO. If you're looking for a book that still offers a broad survey while going a bit deeper into the math, I recommend Elements of Statistical Learning as well (the two books share two authors).
Recommendation for ESL seconded. One of the best ML books in terms of writing, development of intuition, and breadth of topics (of course it doesn't cover everything, especially deep learning).
What I love about the book is how the topics are "connected" so to speak. The narrative within a theme is typically "let's look at problem P, here's technique Q to solve P, but if you thought about P slightly differently you would see something like technique R would also work, so let's talk about that now".
The level of math might be tough for a beginner though.
I'm currently working through this book. Highly recommend, even if you have no intention of learning R. The R part is very limited and you will not learn R programming from it, but if you already know R, it is very useful to end each chapter with a practical demonstration of the theory.
I am trying to decide between going through a statistical learning textbook/course and a deep learning one. Any thoughts on which would be more rewarding for someone with no immediate plans to work in ML or do graduate-level research? Thank you.
This is one of my all-time favorite technical books. I wrote a review of sorts a few years back[0]. It doesn't cover any deep learning topics, which perhaps dates it at this point, but it gives solid fundamentals on a breadth of techniques common in industry. This is always on my recommendation list for folks making the transition from systems or product engineering to ML.
That said, if your choice is more general, statistical learning vs deep learning, I’m sure at this point you can find more approachable deep learning primers. This book just isn’t it IMHO.
I've been working my way through this book, and it's fantastic. I love the way this book grounds all the discussion of statistical learning with a practical data analysis problem.
I know R, but don't enjoy programming in it (although I love the documentation R has for its various libraries). Luckily there are various attempts on GitHub to translate all the ISLR exercises into Python[1][2], which I have found immensely useful for understanding implementation.
Interesting. I like Python but don't enjoy programming in it. I find programming with R much more to my liking. I do data analysis and modeling mostly.
A useful practice, in my experience, is to implement the R code samples in some other language, like C or D. Implementing the lower-level math functions yourself instead of relying on a library can also be fun, depending on your ultimate goals/interests.
I think that this is a great exercise to really learn the implementation details of the methods, but the point of ISLR is more to get scientists (and the like) up and running with the usage of these methods.
I'm a data scientist without a formal background in programming. Can someone please explain why implementing math functions in C/D is different from doing it in R?
For example, I would assume that creating a mean function using numbers and operators would be language-agnostic.
It's the difference between using a calculator and knowing how to multiply two 5 digit numbers with pen and paper.
You might say, "Oh, but we can do that in R." Well, if you start doing that one function at a time, very soon you'll find R grinding to a halt.
I think Julia would be a better choice, especially since you can call R and Python directly from it. That way you can implement the methods yourself in a scientific language and still use the original R methods.