Stanford CS Book: Mining of Massive Datasets [pdf] (stanford.edu)
227 points by yarapavan on Dec 8, 2010 | 17 comments



Pretty cool. I work in search quality at Google, and this is a pretty decent overview of the more universal tricks I've picked up from people on the job, as well as a lot of things I didn't know. MinHashing in particular is one of my favorites.

Also, if you like this, I'm trying to collect resources of this quality in this subreddit: http://www.reddit.com/r/learnit
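
For anyone who hasn't met MinHashing yet: it estimates the Jaccard similarity of two sets (for example, the shingles of two documents) without comparing the sets directly. For a random permutation of the universe, the probability that both sets have the same minimum element equals their Jaccard similarity, so the fraction of matching minimums across many hash functions is an estimate of it. Below is a minimal Python sketch under my own assumptions, not code from the book: character 5-shingles, random affine hashes (a*x + b) mod p with p = 2^61 - 1 standing in for random permutations, and zlib.crc32 to turn shingles into integers. All function names are hypothetical.

```python
import random
import zlib

# Assumption: a large Mersenne prime as the modulus for the affine hash family.
P = (1 << 61) - 1


def shingles(text: str, k: int = 5) -> set[str]:
    """Return the set of character k-shingles (assumes len(text) >= k)."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Exact Jaccard similarity |A & B| / |A | B|, for comparison."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def make_hash_funcs(n: int, seed: int = 0) -> list[tuple[int, int]]:
    """Draw n (a, b) pairs for hashes h(x) = (a*x + b) mod P."""
    rng = random.Random(seed)
    return [(rng.randrange(1, P), rng.randrange(0, P)) for _ in range(n)]


def minhash_signature(shingle_set: set[str], hash_funcs) -> list[int]:
    """For each hash function, keep the minimum hash over the set's elements."""
    # crc32 gives a stable integer id per shingle (Python's hash() is salted per run).
    ids = [zlib.crc32(s.encode("utf-8")) for s in shingle_set]
    return [min((a * x + b) % P for x in ids) for a, b in hash_funcs]


def estimate_jaccard(sig1: list[int], sig2: list[int]) -> float:
    """Fraction of positions where the two signatures agree."""
    return sum(x == y for x, y in zip(sig1, sig2)) / len(sig1)


if __name__ == "__main__":
    s1 = shingles("the quick brown fox jumps over the lazy dog")
    s2 = shingles("the quick brown fox jumped over the lazy dog")
    funcs = make_hash_funcs(500, seed=1)
    est = estimate_jaccard(minhash_signature(s1, funcs), minhash_signature(s2, funcs))
    print(f"exact={jaccard(s1, s2):.3f} estimate={est:.3f}")
```

The payoff is that each document shrinks to a fixed-length signature (500 integers here), and the estimate's error shrinks roughly as 1/sqrt(number of hash functions).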


Thanks for sharing! This is a great set of resources.


Great resource! Thanks for curating the learnit subreddit!


Great shit on that page.


If you guys mostly have a CS background, you should definitely check this out: http://www-stat.stanford.edu/~tibs/ElemStatLearn/

It's data mining from a statistician's point of view. You can download the entire book for free from the website, and the graphs and computations are done in R.

This book is also one of Hal Varian's favourites: http://www.dataspora.com/blog/sexy-data-geeks/


I've been thinking lately of (finally) pursuing graduate studies, and data mining is an area I find myself drawn to. Stanford is obviously doing significant research in this area, but I've been out of academia for four years and I doubt I'd be a competitive applicant. Does anybody have personal experience with other universities or programs doing extensive research in this area that they'd like to share? It'd be greatly appreciated!


I was out of school for 3 years before going back to pursue a PhD. It can be done, but you're going to have to put in a lot of work the first year getting back into the swing of academia, and even more if you're going back to specialize in something you don't already do.


I have no advice for you, but I'd be very curious to hear what you end up doing. I'm 30 and have no idea how to get into a PhD program. Drop me a line sometime.


Does anyone have an ePub or Mobi version? I just got a Kindle, and its converter ruins the code samples when converting from PDF.


I had the same issue with my Kindle, found it frustrating enough that I bought an iPad, and am quite happy with the Kindle app on it.

Very smart of Amazon to hedge themselves on the hardware so cleanly.

I'm sharing my story not to diss the device, but in case anyone else is in the same predicament and considering jumping ship.


Whenever I asked around in academic circles about this, the iPad was the consensus. According to the people I talked to, the Kindle DX does an OK job, but the slow page turning drove them crazy when they were trying to follow a complicated paper that refers back to earlier math or diagrams.

I really wish the academic world weren't standardized on formats that only work for print. There are some LaTeX-to-HTML converters out there that could presumably be used to make ePub, but I have no idea how well they work.


Your peers are generally correct. I used a DX for technical papers and PDFs after going through the same conversion hell you're stuck in (albeit with a Sony Reader), and then jumped to an iPad when it came out. The best advice I can offer is to hop over to mobileread.com and check the forums for the current state of the art. Back when I was doing two or three conversions a day, the tools to use were Rastafarian and PDF2LRF, but I'm guessing there are better options available now. You may also want to check out Calibre, an ebook library management app with a lot of built-in conversion routines.


Holy cow, could a project have picked a worse name than "Rastafarian"? If one searches for "Rastafarian", you can guess what turns up. But even "Rastafarian pdf conversion" seems to drop the "pdf" and returns lots of results for converting to the Rastafarian religion. Same for "convert".

I guess I'll have to go trawling through the mobileread.com forums, but just damn.


I've had a similar experience. The Kindle gets the job done, but I find myself using my iPad (or even iPhone) to read Kindle content most of the time now.


Chapter 2 of this book is awesome. It covers specific implementation and design strategies for MapReduce matrix multiplication and table joins.
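
To make the matrix-multiplication part concrete, here is one common way to do P = MN in a single MapReduce step: the mapper sends each m_ij to every key (i, k) and each n_jk to every key (i, k), and the reducer for (i, k) joins the two value lists on j and sums the products (so the reducer is itself a small join). This is a single-machine Python simulation under my own assumptions, not Hadoop code and not taken from the book: sparse matrices as {(row, col): value} dicts, dimensions I and K passed explicitly, and an in-memory dict standing in for the shuffle. The function names are hypothetical.

```python
from collections import defaultdict


def map_phase(M, N, I, K):
    """Emit ((i, k), tagged value) pairs. M is I x J, N is J x K (sparse dicts)."""
    for (i, j), m in M.items():
        for k in range(K):          # m_ij is needed by every output cell in row i
            yield (i, k), ("M", j, m)
    for (j, k), n in N.items():
        for i in range(I):          # n_jk is needed by every output cell in column k
            yield (i, k), ("N", j, n)


def shuffle(pairs):
    """Group mapper output by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Join M and N values on j and sum the products for output cell (i, k)."""
    m_vals = {j: v for tag, j, v in values if tag == "M"}
    n_vals = {j: v for tag, j, v in values if tag == "N"}
    return key, sum(m_vals[j] * n_vals[j] for j in m_vals.keys() & n_vals.keys())


def matmul_mapreduce(M, N, I, K):
    """Run map, shuffle, and reduce; keep only nonzero output cells (sparse result)."""
    result = {}
    for key, values in shuffle(map_phase(M, N, I, K)).items():
        cell, total = reduce_phase(key, values)
        if total != 0:
            result[cell] = total
    return result


if __name__ == "__main__":
    M = {(0, 0): 1, (0, 1): 2, (1, 0): 3, (1, 1): 4}
    N = {(0, 0): 5, (0, 1): 6, (1, 0): 7, (1, 1): 8}
    print(matmul_mapreduce(M, N, 2, 2))
```

The design cost is visible in map_phase: every element of M is replicated K times and every element of N is replicated I times, which is the communication overhead a two-step approach (one job to join on j and form products, a second to sum them per (i, k)) is meant to reduce.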


Thanks, great idea for the reddit group.


An amazing resource. Brilliant... Stanford lead the world in open education (MIT are great too)



