Free Data Science Books

shubhamjain · on Sept 9, 2015

Although there is no denying that this is a valuable resource, I have started to get turned off by lists of n books to learn something. They can be valuable, but they can also be overwhelming and leave someone perplexed about how to get started. I believe technical books should be used to complement your knowledge of a field, not to get started in it. For example, "Secrets of the JavaScript Ninja" will be very valuable to me because I already have experience in JS, and it will help me understand some of the caveats that I might have overlooked. The best way has always been to start implementing something related to the subject and dive into everything you uncover.

A blog post submitted here expressed the same sentiment [1]:

> I can’t fully explain how immensely unmotivating it is to be given a huge list of resources without any context. It’s akin to a teacher handing you a stack of textbooks and saying “read all of these”. I struggled with this approach when I was in school. If I had started learning data science this way, I never would have kept going.

[1]: https://www.dataquest.io/blog/how-to-actually-learn-data-sci...

doctorcroc · on Sept 9, 2015

Second the dataquest post. Information without structure can be overwhelming, and it's important to know what the optimal ways to learn something are. Arguably this is why formal schooling was created - to provide a framework for learning...

jpmonette · on Sept 9, 2015

Thank you - this is a wonderful resource that I had lost in my list of bookmarks about data science. That's another good example of information overload.

SZJX · on Sept 12, 2015

Sure, a bunch of books is no use. But for self-learning there's nothing more systematic than following one or two well-written books through. Just trying to gain everything via "practical" knowledge without any systematic guidance is definitely dangerous.

piraze · on Sept 9, 2015

At least "Python for Data Analysis" is a pirate copy. Wonder how many others are too. But as long as you make money from affiliate links you don't care, right?

gdulli · on Sept 9, 2015

Lists of "curated" free books/resources etc. are a very active spam format these days. It's a simple and effective way of publishing without having any original content of your own. People love clicking on these things because they love the idea of learning.

LearnDataSci · on Sept 9, 2015

What makes it seem like Python for Data Analysis is a pirated copy? I figured since it was hosted from Canisius College it would be legally distributed.

I don't want to host pirated content, so if it is I will remove it.

piraze · on Sept 9, 2015

The book is not listed at http://www.oreilly.com/openbook/

Also the PDF has a link to a notorious ebook pirate platform on every page. If you really believe content on college pages is legal, you must be very naive. I've never seen a naive webmaster that uses domain privacy though.

coroxout · on Sept 9, 2015

Personally I wasn't surprised to see (possibly) pirated content on an .edu site with a ~username URL, as the ~ suggested a student's page, where unauthorised content might pop up to share with classmates and stay up undetected by the college.

What surprised me is that the owner of the Canisius page appears to be teaching staff rather than a student. The other books hosted there seem to be legitimately freely available, however, so I'm guessing that was also a naive mistake.

LearnDataSci · on Sept 9, 2015

Thanks for that link, I actually didn't know O'Reilly had such a page.

I'm not very familiar with ebook pirating platforms. So the link didn't seem suspicious to me.

Anyway, the book was removed. Thanks again for pointing it out.

ching_wow_ka · on Sept 9, 2015

If you're a beginner, you're probably going to be too overwhelmed by the options. I often find emailing/asking a few different professors/researchers/students in the field you want to learn for suggestions more productive.

That's not to say this isn't helpful. This is from my own personal experience.

larrydag · on Sept 9, 2015

Also get plugged into a local meetup/user group. They are popping up everywhere. Here are some examples of R user groups. http://blog.revolutionanalytics.com/local-r-groups.html

dbhattar · on Sept 9, 2015

I would also add http://mmds.org/ to the list. The link to the book is http://infolab.stanford.edu/~ullman/mmds/book.pdf.

ching_wow_ka · on Sept 9, 2015

It's there. "Mining of Massive Datasets"

yoklov · on Sept 9, 2015

Is anybody aware of good books/resources on machine learning/data science in Matlab?

My SO has been trying to learn ML to further her work for a couple months now, and has had a hard time with it. She's quite intelligent, but isn't a terribly experienced programmer (she's been writing Matlab for a couple years now, but mostly in a scientific setting)... Either way, I suspect part of the problem is that most of the explanations are in a language unfamiliar to her, and expect her to learn or translate it in addition to the concepts.

maurits · on Sept 9, 2015

Andrew Ng, the man behind the excellent ML course on Coursera, has an introduction to Deep Learning using Matlab.

[1]: Wiki with code, exercises and explanation

[2]: Video lecture one with a recap on back-propagation

[3]: Video lecture two on Sparse Auto Encoders

[4]: Handouts

In terms of books, Bayesian Reasoning and Machine Learning [5] is Matlab based. So is the Handbook of Monte Carlo methods [6].

[1]: http://ufldl.stanford.edu/wiki/index.php/UFLDL_Tutorial

[2]: http://www.stanford.edu/class/cs294a/video1.html

[3]: http://www.stanford.edu/class/cs294a/video2.html

[4]: http://www.stanford.edu/class/cs294a/handouts.html

[5]: http://web4.cs.ucl.ac.uk/staff/D.Barber/pmwiki/pmwiki.php?n=...

[6]: http://www.maths.uq.edu.au/~kroese/montecarlohandbook/

pigs · on Sept 9, 2015

This course is great: https://www.coursera.org/learn/machine-learning It's all done in GNU Octave, which is mostly compatible with MATLAB.

anacleto · on Sept 9, 2015

I would recommend this fantastic in-depth intro to the principles and practice of Amazon Machine Learning:

https://cloudacademy.com/amazon-web-services/courses/amazon-...

(Hand-crafted by data and code guru James Counts)

fitzwatermellow · on Sept 9, 2015

I noticed something last night while watching the Djokovic US Open quarter-final. It featured an "IBM Insights" segment which claimed to have mined 8 years' worth of Majors competitions to generate stats. One interesting result went something like this: in past matches where Djokovic was able to return only 25% of his opponent's serves, he still won 85% of the time. The implication is that his defensive game is that strong.

While this is no doubt really interesting, I find I am getting diminishing returns from outputting stats like this from big dumps of past historical data. What I would like to be able to show is a live heat graph style stats tracker, where each point in the match updates my belief net about who is winning, or playing better. Of course, the final outcome may be upended by some fluke occurrence such as a Hail Mary pass in the final seconds which is what makes sports interesting, but nonetheless I think a live tracker would say a lot more than the actual score of the match.
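To make the per-point update concrete, here is a minimal sketch in Python. It is only an illustration under simplifying assumptions that are not part of the idea above: every point is treated as an independent coin flip, the belief is a single Beta distribution rather than a full belief net, and `PointTracker` plus the point outcomes are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class PointTracker:
    """Beta-Bernoulli belief about the chance that player A wins a point.

    Hypothetical helper for illustration; assumes independent points.
    """
    alpha: float = 1.0  # prior pseudo-count of points won by A (uniform prior)
    beta: float = 1.0   # prior pseudo-count of points won by B

    def update(self, a_won_point: bool) -> None:
        # Conjugate update: add one to the count for whoever won the point.
        if a_won_point:
            self.alpha += 1
        else:
            self.beta += 1

    @property
    def mean(self) -> float:
        # Posterior mean estimate of P(A wins the next point).
        return self.alpha / (self.alpha + self.beta)


tracker = PointTracker()
for a_won in [True, True, False, True]:  # made-up point outcomes
    tracker.update(a_won)
    print(f"P(A wins next point) ~ {tracker.mean:.2f}")
```

Each update is constant-time, so the same loop could sit behind a stream of live points; a real tracker would presumably also condition on who is serving and on the current score.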

So, I am wondering if anyone has specific resources for real time online data mining? At web scale for high throughput data streams. And I agree with shubhamjain above, libraries and repos are preferable to books and academic journals ;)

kilbuz · on Sept 9, 2015

The insight was related to winning first-service points when returning serve. This tweet has a screenshot of the association: https://twitter.com/lapsu/status/620223838895407104

This isn't too far from the logic: "How can we win this game? Score more points than the other team". I suppose the more interesting thing would be to compare the same correlation across players.

Expeditus419 · on Sept 9, 2015

I agree that the stats don't provide insight regarding game play and strategy. IBM has been providing the same weak stats for years now. I would like to see tennis incorporate the hawk-eye system tracking player movement and shot placement as well. Perhaps that could produce a heat map. On that note they can also eliminate the line judges while we're at it. The whole challenge system is idiotic. They have the tech, they should incorporate it throughout the sport.

msellout · on Sept 9, 2015

I don't understand that IBM Insights note about Djokovic. Can you explain more?

hguant · on Sept 9, 2015

Without doing the math - Djokovic is such a strong player that even if he's only returning a quarter of your serves, meaning you're 3/4 of the way to winning your set (I don't tennis, sorry if I'm getting the terms wrong), he's still probably going to beat you.

elechi · on Sept 10, 2015

Well, that's a close explanation, except I think you're confusing set and match. For men's tennis, it takes 3 sets to win the match, with the potential of playing 5 sets.

I'm actually not sure that the math is true, though. (Or I really don't understand what the stat is saying.) Let's say that for every 4 serves, you actually win 3 and Djokovic wins 1. That ratio gives you every game (winning the game game-point-15), which gives you every set. I don't see how Djokovic ever wins a game, let alone the set or match.
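A rough way to check that intuition is to compute the chance of holding serve from a per-point win probability. The sketch below assumes (this is an assumption, not something the stat states) that points are independent and that the server wins each one with the same probability p; `game_win_probability` is a name made up for the example.

```python
def game_win_probability(p: float) -> float:
    """Probability that the server wins a standard tennis game,
    assuming each point is won independently with probability p."""
    q = 1.0 - p
    # Server wins 4 points while the returner wins 0, 1, or 2
    # (the last point must be the server's): 1, 4, and 10 orderings.
    before_deuce = p**4 * (1 + 4 * q + 10 * q**2)
    # Reach 3-3 (deuce) in C(6, 3) = 20 orderings, then win two points
    # in a row before the returner does: p^2 / (p^2 + q^2) = p^2 / (1 - 2pq).
    from_deuce = 20 * p**3 * q**3 * (p**2 / (1 - 2 * p * q))
    return before_deuce + from_deuce


print(f"{game_win_probability(0.75):.3f}")  # server wins 3 of every 4 points
```

Under these assumptions, a server winning 75% of points holds serve in about 95% of games, so the returner would rarely break. That is consistent with the suspicion that the stat must be measuring something narrower than "Djokovic wins only 1 in 4 return points".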

msellout · on Sept 10, 2015

It's hard to take any action based on that fact without further information. Even a gambler couldn't use that tidbit without conditioning on things like the current score. Or am I missing something?

anacleto · on Sept 9, 2015

Great resources.

I would add these great ebooks on Cloud Computing and AWS Certifications:

The Cloud Computing Job Market

With this eBook you will learn how Cloud Computing is changing the IT industry and creating a complete set of new roles for companies and businesses worldwide. Information and data to start your cloud computing career.

Link [0] https://cloudacademy.com/ebooks/cloud-computing-job-market-3...

A Guide to AWS Certification Exams

Introduction to the full range of Amazon Web Services certification exams: learn what, why, and how to pass just the right exam for you.

Link [1] https://cloudacademy.com/ebooks/guide-aws-certification-exam...

AWS Solutions Architect Certification

Study guide to Amazon Web Service's Solutions Architect certification exam: tips and suggestions on how, what, and where to learn.

Link [2] https://cloudacademy.com/ebooks/aws-solutions-architect-cert...

noobermin · on Sept 9, 2015

Honest question: is ML/DS something you can just pick up and be hired[0]? Maybe I'm ignorant, but I'd think employers would look for a degree in some related field to actually consider you for a position doing it.

[0] As in how you can pick up web hacking, do a few websites and create a reputation and get hired that way without a formal degree.

jb1991 · on Sept 9, 2015

There was a thread on here a month or two ago about this. In general, it was noted that it's best (for both employment as well as just getting stuff done) to have a deep understanding of a particular area of ML rather than a general understanding of many areas. Usually those with a deep understanding have focused on it in school. But the latter group of generalists is a much larger group in the software industry, since most of us did not go to school for this specifically.

vikp · on Sept 9, 2015

I went from being a US diplomat with no coding background to getting a job at edX as a machine learning engineer, so it's very possible. The keys are to find projects and build a portfolio so that you can prove your capabilities, and to start a blog/go to meetups so that you can build an audience and find opportunities.

DrNuke · on Sept 9, 2015

Market seems to want a lot of them, different profiles and CVs for different domains and responsibilities: data wranglers, data analysts, statisticians, machine learning, business analysts, communicators, infrastructure operators, big data architects. The best shot is coupling your academic / self-matured strength with a domain you really like and start building your own portfolio from real-world case studies in the field you choose.

brational · on Sept 9, 2015

I think you kind of posed a question and a partial answer. If you have a degree in a related field (math, statistics), then yes, you can pick these things up. With a CS degree or no degree, it will be much harder to pass resume filters.

geoff-codes · on Sept 9, 2015

Suggest putting this in a repo somewhere, in the vein of:

https://github.com/vhf/free-programming-books/blob/master/fr...

https://github.com/ligurio/free-software-testing-books/blob/...

etc.

viewer5 · on Sept 9, 2015

Any specific recommendations from anyone?

weavie · on Sept 9, 2015

The Elements of Statistical Learning together with the online course (http://www.r-bloggers.com/in-depth-introduction-to-machine-l...) makes for a great introduction.

EDIT: Oops I should have said "An Introduction to Statistical Learning with Applications in R" rather than The Elements of Statistical Learning. The Elements book goes into way too much depth to be a good introduction to the subject.

snoman · on Sept 9, 2015

Similarly, An Introduction to Statistical Learning With Applications in R is like a practical version of (or companion to) Elements. I very much enjoyed it.

craigching · on Sept 9, 2015

And the Stanford version of the same class linked above for ISLR is, in my opinion, better:

https://lagunita.stanford.edu/courses/HumanitiesandScience/S...

weavie · on Sept 9, 2015

Yes good point, my bad - I meant to link to "An Introduction" rather than "Elements". Elements is not a good starting point - your head will explode.

ching_wow_ka · on Sept 9, 2015

Depends on what you want to learn.

alceufc · on Sept 9, 2015

"Mining of Massive Datasets" by Leskovec, Rajaraman and Ullman is very good.

Although the post gives a link to the Amazon page of the book, PDFs of the chapters are free to download at the official book web site[1].

[1] http://www.mmds.org/

LordKano · on Sept 9, 2015

I really like this kind of stuff.

It's my opinion that our educational process is a bit too heavy on algorithms and languages while being a bit too light on data structures.

I like to brush up on this subject matter from time to time just to keep myself sharp.

DarkTree · on Sept 9, 2015

Anyone recommend any of the R books listed or know of any great R books for purchase?

larrydag · on Sept 9, 2015

My favorite intro to R book is The Art of R Programming by Norman Matloff http://www.amazon.com/The-Art-Programming-Statistical-Softwa...

craigching · on Sept 9, 2015

This is a new book by Danny Kaplan that I was able to provide some feedback on prior to publishing:

http://data-computing.org/

I really enjoyed the book, it took a modern approach to R using many of the newer packages (dplyr for instance) and ggplot and combined them into a very nice introduction to R with labs, etc. Well worth checking out.

geoff-codes · on Sept 9, 2015

https://github.com/vhf/free-programming-books/blob/master/fr...

blumkvist · on Sept 9, 2015

Discovering Statistics Using R by Andy Field. AINEC.

alador · on Sept 9, 2015

Nice books collection. Thanks :)

LearnDataSci · on Sept 9, 2015

crazypyro · on Sept 9, 2015

Why are you hijacking my scroll speed...

Your "smooth-scroll" library is completely breaking my touchpad scroll with an Acer c720 Chromebook. One slight movement (which should be a few pixels scroll) is moving me over half-way down the screen. Makes your site unusable with this touchpad as accidental scrolling sometimes happens and moves the screen a whole page away, especially when trying to right click open links because the gestures are similar.

LearnDataSci · on Sept 9, 2015

Sorry to all affected by the smooth scroll. It's been removed.

LearnDataSci · on Sept 9, 2015

Hmm. Interesting. I just implemented the smooth scroll yesterday so I will definitely check that out. Thanks for the input.

DeusExMachina · on Sept 9, 2015

Smooth scrolling is already implemented correctly in the browser. Your implementation is a hack that hijacks the normal behaviour a user is accustomed to and gives back a version that feels wrong to interact with, even without performance issues.

iDemonix · on Sept 9, 2015

Do the entire internet a favour and un-implement it.

yoklov · on Sept 9, 2015

It isn't broken on touchpad for me.

That said, if I'm being honest, it's fairly unpleasant to use on a desktop with a mouse. It scrolls you to the top after it loads (which is after the rest of the page), and behaves differently than the computer normally does...

I would recommend doing away with it.

LearnDataSci · on Sept 9, 2015

I made a change to the code, but since I don't have a touchpad, I won't be able to tell if it's fixed. Let me know what happens if you happen to go back to the page.

manigandham · on Sept 9, 2015

Please don't use this. There's nothing wrong with browsers and how they scroll. We all know how to use it and it works well everywhere.

You're loading more code just to mess with something that already works without any new benefit (and actually degrading the experience).

crazypyro · on Sept 9, 2015

It's still not working well on my touchpad. It stutters badly. I honestly would recommend removing it. I checked it on my desktop. It works there, but the different scroll speed is unhelpful and actually a little bothersome.

nanny · on Sept 9, 2015

Just remove it entirely already, it's nearly unusable on my desktop now, and it was bad before.