I did something pretty similar over Christmas, though I used named entity recognition to extract book titles rather than looking for Amazon links, and (so far) also limited it to specific "Ask HN" threads about books. You can find it here: http://www.hnreads.com/. It is interesting to see how little overlap there is between the two, though that may be due to my using far fewer (and also newer) threads!
Surprised to see Permutation City in that list. Given that the book was written in 1994, Greg Egan displays admirable prescience about how computing would develop. Honestly, you would think it was written in the last 5 years or so. His vision of cloud computing is absolutely outstanding. It blew me away when I checked when the book was written after reading the first few chapters.
I'd read Schild's Ladder prior to reading Permutation City, which is also a good read. It does seem to get bogged down in the technical and descriptive side of things at times; however, it's a fantastic idea for a story. The main premise of the book would make a great movie.
Whilst I'm on the subject of good "Hard sci-fi" novels, Tau Zero is also worth reading.
"Since the Introdus in the twenty-first century, humanity has reconfigured itself drastically. Most chose immortality, joining the polises to become conscious software. Others opted for gleisners: disposable, renewable robotic bodies that remain in contact with the physical world of force and friction. Many of these have left the solar system forever in fusion-drive starships.
And there are the holdouts: the fleshers left behind in the muck and jungle of Earth—some devolved into dream apes, others cavorting in the seas or the air—while the statics and bridgers try to shape out a roughly human destiny."
Egan's books have been some of the most thought-provoking I've ever read as far as science fiction technology goes. A lot of the works were out of print until recently; I'm glad to see there's been a resurgence of interest in his writing, and in the availability of his works.
I opened 7 new tabs for the books that interested me before I realized this didn't work either. It seems the URLs are not quite "pages" -- opening one and refreshing it redirects home as well.
If it's not too much bother, any chance you'd be willing to put this in a git repository, on GitHub or GitLab for example? It would be cool to be able to contribute to: for instance, being able to filter the list into fiction and non-fiction would be pretty great :-)
I might be a bit embarrassed with the code quality - this was one of my first attempts at front end development. I was planning to expand the features of the site a bit, including genre filters, so perhaps I could clean it up a bit and release it when I've done that.
Oh, I wouldn't worry about that :-) If you are concerned, just say it's rough and ready in the README. I doubt anyone will be critical, and if they are, then tell them to piss off!
The beauty of open code is that others can help improve it.
One thing that struck me about your site (apart from being a good list, well done!) is how blazingly fast (close to HN, which I find funny) the page loads. Could you fill us mere mortals in on how the fuck you got it so fast?
That's not really much to do with me; the credit there lies with Fastly! This was the first time I've used a CDN, and I had a great experience using them. Highly recommended.
This is really cool. I started reading HPMoR after finding it.
I'm curious: after performing NER on the corpus, how did you filter to find books only? Did you just search Amazon's catalogue for exact matches, or was more tweaking required?
Thanks. How did you deal with, e.g. 'Steve Jobs'? I guess a generic NER tool would extract that as a single entity. Then you'd search Amazon and find an exact title match (the biography by Isaacson). But you probably don't want this in your list, as the references were to Steve Jobs the person, not Steve Jobs the book.
So did you use some manual review step to remove these?
Well, the nature of NER is that it can extract 'Steve Jobs' as either a PERSON or a BOOK entity (in my case). You can in principle extract whatever kind of entity you want, provided you have appropriate training data. The idea (loosely) is that it learns the structure of the text around, say, a book. You can then label a piece of text as a book, despite not previously knowing that title. It doesn't solve the problem of two books having identical titles, however :).
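To make "the structure of the text around a book" a little more concrete: sequence labellers typically describe each token by its own shape plus its neighbours. A minimal sketch, assuming whitespace-tokenized text and a hypothetical feature-naming scheme (this is not the commenter's code):

```python
def token_features(tokens: list[str], i: int, window: int = 2) -> dict[str, object]:
    """Describe token i by its own shape and its neighbours' words.

    A trained sequence labeller (e.g. a CRF) assigns weights to features
    like these; the feature names here are hypothetical.
    """
    word = tokens[i]
    feats: dict[str, object] = {
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),  # capitalized words are likelier title parts
        "word.isupper": word.isupper(),
    }
    # Context words within +/- window positions; "<PAD>" past either edge.
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        key = f"word[{offset:+d}].lower"
        feats[key] = tokens[j].lower() if 0 <= j < len(tokens) else "<PAD>"
    return feats


tokens = "I just finished reading Permutation City by Egan".split()
print(token_features(tokens, 4))
```

Context words such as "reading" before, or "by" after, a run of capitalized tokens are the kind of signal a model can learn to associate with titles it has never seen.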
(Note: I'm not an expert in NER/NLP, so please correct me if I have got something wrong!)
Aha, I see. Thanks. I thought you were using some generic pre-trained NER, but I understand now that you used either one which was pre-trained to find books, or you trained your own.
On the system browser on my phone (a Kyocera Rise running Android 4.0.4), all I see on that page is the header and footer, no content. :-( I get the same result if I hit it with Firefox with JavaScript disabled, but JavaScript is very much enabled on my phone browser.
I actually wrote it myself, based on conditional random fields! I would not recommend doing this; it was mostly an exercise to learn how they work. I should probably have mentioned that I manually cleaned up the results a bit. I understand NLTK (a Python library) has NER capabilities, though.
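For the curious, here is roughly what "based on conditional random fields" involves at prediction time. A linear-chain CRF scores a whole label sequence as the sum of per-token emission scores plus label-to-label transition scores, and the best sequence is found with the Viterbi algorithm. This is a minimal sketch with hypothetical hand-set scores (training, which learns those weights from labelled data, is omitted); it is not the commenter's implementation.

```python
LABELS = ["O", "BOOK"]


def viterbi(emissions: list[dict[str, float]],
            transitions: dict[tuple[str, str], float]) -> list[str]:
    """Return the highest-scoring label sequence for a linear-chain CRF.

    emissions[t][y] is the (already weighted) feature score for label y at
    position t; transitions[(prev, cur)] scores adjacent label pairs.
    """
    if not emissions:
        return []
    # best[t][y]: best total score of any path ending in label y at position t
    best = [{y: emissions[0][y] for y in LABELS}]
    back: list[dict[str, str]] = [{}]
    for t in range(1, len(emissions)):
        best.append({})
        back.append({})
        for y in LABELS:
            prev, score = max(
                ((p, best[t - 1][p] + transitions[(p, y)]) for p in LABELS),
                key=lambda pair: pair[1],
            )
            best[t][y] = score + emissions[t][y]
            back[t][y] = prev
    # Trace back from the best final label.
    label = max(LABELS, key=lambda y: best[-1][y])
    path = [label]
    for t in range(len(emissions) - 1, 0, -1):
        label = back[t][label]
        path.append(label)
    return path[::-1]


# Hypothetical scores for "reading Permutation City by".
emissions = [
    {"O": 1.0, "BOOK": -1.0},  # reading
    {"O": 0.0, "BOOK": 0.5},   # Permutation
    {"O": 0.2, "BOOK": 0.0},   # City: on its own, slightly prefers O
    {"O": 1.0, "BOOK": -1.0},  # by
]
transitions = {("O", "O"): 0.0, ("O", "BOOK"): 0.0,
               ("BOOK", "BOOK"): 1.0, ("BOOK", "O"): 0.0}
print(viterbi(emissions, transitions))  # ['O', 'BOOK', 'BOOK', 'O']
```

"City" alone scores slightly higher as O, but the BOOK→BOOK transition bonus pulls it into the title. That is the kind of whole-sequence decision a per-token classifier cannot make.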
Thanks for sharing! This is the list I actually expected the OP's list to be. Probably because I'm also more likely to view the Ask HN threads about books.
At 4 GB, I'd just as soon query this locally, but this looks like a fun exercise.
I notice that there were 10,729 distinct ASINs out of 15,583 Amazon links in 8,399,417 comments. Since I don't generally (ever?) post Amazon links, I'd be interested in expanding on this in two ways.
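The "distinct ASINs out of total Amazon links" numbers come from pulling the 10-character product ID (ASIN) out of each Amazon URL. The article's actual extraction code isn't shown here, so this is a minimal sketch, assuming links of the common `/dp/<ASIN>`, `/gp/product/<ASIN>`, or `amzn.com/<ASIN>` shapes; other URL forms would be missed.

```python
import re
from collections import Counter

# Assumption: an ASIN is 10 alphanumeric characters after /dp/, /gp/product/,
# or amzn.com/. An optional slug (e.g. /Code-Language/) may precede /dp/.
ASIN_RE = re.compile(
    r"(?:amazon\.[a-z.]+/(?:[^/\s]+/)?(?:dp|gp/product)/|amzn\.com/)"
    r"([A-Z0-9]{10})(?![A-Z0-9])",
    re.IGNORECASE,
)


def extract_asins(comment_text: str) -> list[str]:
    """Return the ASINs of all Amazon links in one comment, in order."""
    return [m.upper() for m in ASIN_RE.findall(comment_text)]


def link_stats(comments: list[str]) -> tuple[int, int]:
    """Return (total Amazon links, distinct ASINs) across comments."""
    counts = Counter(asin for c in comments for asin in extract_asins(c))
    return sum(counts.values()), len(counts)


print(extract_asins("See http://amzn.com/B00JDMPOK2 and "
                    "https://www.amazon.com/Code-Language/dp/0735611319/ref=x"))
```

Affiliate tags and `ref=` suffixes after the ID don't affect the match, since only the 10 characters after the path prefix are captured.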
First, I'd reduce/eliminate the weight of repeated links to the same book by the same commenter.
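The first suggestion amounts to counting distinct (commenter, book) pairs instead of raw links. A minimal sketch, assuming each extracted link is available as a hypothetical `(username, asin)` pair:

```python
from collections import Counter


def count_links(links: list[tuple[str, str]], dedupe_per_user: bool = True) -> Counter:
    """Count book links from (username, asin) pairs.

    With dedupe_per_user=True, a commenter who links the same ASIN many
    times contributes exactly one vote for it.
    """
    if dedupe_per_user:
        links = list(set(links))  # collapse repeated (user, asin) pairs
    return Counter(asin for _user, asin in links)


links = [("alice", "A1"), ("alice", "A1"), ("alice", "A1"),
         ("bob", "A1"), ("bob", "B2")]
print(count_links(links, dedupe_per_user=False))  # A1 counted 4 times
print(count_links(links))                         # A1 counted twice
```

This eliminates the extra weight entirely; a softer variant would down-weight repeats rather than drop them.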
Second, I'd search for references to the linked books that aren't Amazon links. Someone links to Code Complete? Add it to the list. In a second pass, increment its count every time you see "Code Complete," whether it's in a link or not.
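The second suggestion is a two-pass scheme: seed a set of titles from books that were linked at least once, then count every comment that mentions one of those titles by name. A minimal sketch, assuming the first pass already produced a hypothetical `linked_titles` set; matching is a plain case-insensitive whole-phrase search, so it inherits all the ambiguity of string matching.

```python
import re
from collections import Counter


def count_mentions(comments: list[str], linked_titles: set[str]) -> Counter:
    """Count comments that mention each title seeded from linked books."""
    # \b word boundaries keep "Code Complete" from matching "Code Completeness".
    patterns = {
        title: re.compile(r"\b" + re.escape(title) + r"\b", re.IGNORECASE)
        for title in linked_titles
    }
    counts = Counter()
    for text in comments:
        for title, pattern in patterns.items():
            if pattern.search(text):
                counts[title] += 1  # at most one count per comment per title
    return counts


comments = ["Read Code Complete!", "code complete changed my life",
            "Code Completeness is a thing", "Code Complete, then Code Complete again"]
print(count_mentions(comments, {"Code Complete"}))  # 3 comments mention it
```

The `\b` boundaries assume titles begin and end with word characters; a title ending in a symbol (say, a trailing "+") would need a different boundary rule.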
Discounting multiple links by the same user is a good idea. Your second suggestion brings some rather complex problems: for example, if a comment goes like "Code Complete is the worst book I ever read", it is certainly not an endorsement, while linking to a book in most cases is. Also, a sentence like "programming perl is fun" does not necessarily refer to the book.
So this would require some form of sentiment analysis and also require book titles to be uniquely identifiable.
I know we traditionally process tokens case-insensitively, but... it seems reasonable to assume that in HN comments book titles would be capitalized properly (so we could ignore non-capitalized titles). I'm not sure whether that information is preserved in the BigQuery version, though.
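The capitalization heuristic is a one-flag change to the matcher: drop case-insensitivity so only properly capitalized occurrences count. A minimal sketch (the function name is hypothetical):

```python
import re


def mentions_title(text: str, title: str, case_sensitive: bool = True) -> bool:
    """Return True if `title` appears as a whole phrase in `text`.

    With case_sensitive=True, only properly capitalized occurrences count,
    so "programming perl is fun" no longer matches "Programming Perl".
    """
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.search(r"\b" + re.escape(title) + r"\b", text, flags) is not None


print(mentions_title("I loved Programming Perl.", "Programming Perl"))  # True
print(mentions_title("programming perl is fun", "Programming Perl"))    # False
```

The heuristic is not airtight: "Programming Perl is fun" at the start of a sentence, written with "Perl" capitalized as the language name usually is, would still match.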
> At 4 GB, I'd just as soon query this locally, but this looks like a fun exercise.
This requires scraping all the Hacker News data manually. I have a tool for that (https://github.com/minimaxir/get-all-hacker-news-submissions...), which I mentioned in the post you linked, but it still takes a significant amount of time to get and process the data, which is why the BigQuery dataset has a significant advantage.
The absence of SICP, I imagine, is because when people refer to SICP, they usually just link to the free online version of the book: https://mitpress.mit.edu/sicp/
Yes, that is probably the case. Quoting from the post:
"Amazon is often the go-to website for referring books, but many books have dedicated homepages as well as pages on their publisher's website. Moreover, many freely available books are referred to frequently in comments, but are not considered in this ranking."
The approach used here has limitations; I hoped to make that clear by pointing them out and choosing titles and headlines accordingly.
Having owned and read through "Introduction to Algorithms" for years, I agree that it is a good book. However, recently I have been feeling like it is recommended way too often without thought.
It is not the best when it comes to explaining things in an intuitive manner. It is a great reference book with lots of algorithms and proofs.
In recent years I have been drawn more towards Levitin's "Introduction to the Design and Analysis of Algorithms".
Anyone else have similar feelings about "Introduction to Algorithms"?
I second you. I have found Steven Skiena's "The Algorithm Design Manual"[1] to be a great book in this regard. Of course, like you say, CLRS remains an excellent reference.
I think Skiena and CLRS are complementary, each compensating well for the limitations of the other. If you were going to have exactly two algorithms books, I'm not sure I can think of a better pair.
It seems like a lot of academics and hardcore computer science enthusiasts love CLRS. But I remember taking an algorithms course just a few months ago and that book was almost unreadable. A lot of my classmates shared the same sentiment. The math was dense and the explanations were unclear. Juggling several other CS classes that quarter, nobody had the time to work through it. Maybe I wasn't smart or patient enough to understand CLRS, but it just never made much sense to me, especially on more advanced topics of algorithm design.
I loved The Algorithm Design Manual. Used it for my class as well as preparing for technical interviews. Although looking back at it now, TADM is great for learning algorithms (quickly) for the very first time, while CLRS is a good reference manual once you already understand how they work.
Yes, I also use CLR(S) as a reference, when I need to quickly look up an algorithm. But it doesn't really explain how to come up with algorithms - for that, my favorite is Udi Manber's "Introduction to Algorithms: A Creative Approach". It's ridiculously pricey though.
I came here to recommend Manber's book also. It emphasizes developing algorithms and their correctness proofs together -- a great technique. The design and the proof naturally inform each other. This approach ought to be better known.
I use CLRS more as a reference than anything else. If I need to quickly review some data structures or algorithms for an interview, I'll look up the relevant chapters in it.
If I really want to understand something like max flow-min cut, I'll turn to "Algorithm Design" by Kleinberg and Tardos, which often has a much more intuitive explanation of concepts (IMO).
I feel that way about "Don't Make Me Think". It was a good summary of design principles, but even as a beginner I didn't feel as if I got much new information out of it. It was pretty light. It's over-recommended, I think.
How come "Darwin's Theorem" appears so often? It's quite unknown, with one review on Goodreads and 4 reviews on Amazon.
Is this a result of the author spamming his own work?
Edit: Looks like it. A quick skim of "darwin's theorem site:news.ycombinator.com" shows that all links are from user tjradcliffe, who is the author. A case for manual curation of data.
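Manual curation could be prioritized with a simple automatic check of the kind this edit describes: flag books whose links come overwhelmingly from a single account. A minimal sketch, assuming hypothetical `(username, asin)` pairs as input; both thresholds are arbitrary assumptions to tune by hand, and a flag is only a prompt for human review, not proof of self-promotion.

```python
from collections import Counter, defaultdict


def dominant_linker(links: list[tuple[str, str]], threshold: float = 0.8,
                    min_links: int = 5) -> dict[str, tuple[str, float]]:
    """Flag ASINs where one user posted most of the links.

    Returns {asin: (top_user, share)} for ASINs with at least `min_links`
    links whose most frequent linker accounts for >= `threshold` of them.
    """
    by_asin: dict[str, Counter] = defaultdict(Counter)
    for user, asin in links:
        by_asin[asin][user] += 1
    flagged = {}
    for asin, users in by_asin.items():
        total = sum(users.values())
        top_user, top_count = users.most_common(1)[0]
        share = top_count / total
        if total >= min_links and share >= threshold:
            flagged[asin] = (top_user, share)
    return flagged


links = [("some_author", "X1")] * 6 + [(u, "Y2") for u in ["a", "b", "c", "d", "e"]]
print(dominant_linker(links))  # only X1 is flagged
```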
Out of 8 million data points the top book got around 50 references. I wonder how much significance should be attached to that, it looks to me to be down in the noise level.
I find it shocking that out of eight million comments, the top book is only mentioned ~50 times, but the parent comment illustrates why -- many people mention title/author pairs without linking to the book itself.
Code: The Hidden Language of Computer Hardware and Software by Charles Petzold
(http://amzn.com/B00JDMPOK2)
The replies to this comment are all endorsements as well, and as such would not get picked up by this web scraping. Great book, though. Amazon link intentionally omitted.
Related: There are a ton of sites set up like this. Hopefully somebody will post a list. Lotta work by HN folks on various ways of slicing and dicing the data.
I wrote this curated site from HN several years ago. Got tired of people continuously asking for book recommendations. http://www.hn-books.com/
Couple points of note. This is 1) an example of a static site, 2) terrible UI, 3) contains live searches to comments on each book from all the major hacking sites, and 4) able to record a list of books that you can then share as a link, like so (which was my reason for making the site)
Also Related: Thanks for setting up that curated list. I'll definitely be going through it to get recommendations.
I got tired of seeing such recommendations being asked repeatedly and then disappear on HN so I created this:
He made a snarky comment about Andrew Breitbart's death, so conservatives gave him a slew of one-star reviews. If you filter by "verified purchases", it's not as polarized.
I've considered building the same myself. It would be lovely if you tracked the various HN reader client apps. A few that come to mind are: Hacker News Enhancement Suite for Chrome [1], Hacker Menu for OS X [2], and Premii's HN web app [3].
I was thinking the same thing: there was no way that SICP wasn't in that list, because it's probably one of the best programming books ever written. And no matter who I recommend the book to, they say the same thing.
The "I don't want to pay for content and all ads are evil" mindset is not really sustainable. For me affiliate links are the least obtrusive form of advertising on the web I'm aware of. Thanks for your comment and appreciation of the post.
Hard to read tone in text-only messages, isn't it?
I'd actually like an answer. I have no problem with the affiliate links at all (though the listicle-like gallery presentation is awkward on mobile), and would legitimately like to know if there are any real-world examples of affiliate links making money when targeting a specific community. I had affiliate links all over (organically, not in ad form) when I ran a popular SharePoint community, and I think I made ~$300 total in the year I had the links active. HN "feels" like a larger community than my old SharePoint community was, so the answer to my GP's query is interesting in an academic sense.
Brain Pickings [1] is estimated to make a significant amount of money from affiliate links. The exact number is disputed, but it's not chump change [2].
How do you infer that from their post? You could've asked why they want to know this, instead of just assuming it's because they don't like it. Negativity is not good for anything.
1) sure, there was effort put into this and it likely wouldn't have been made w/out affiliate links, so clearly there's value provided.
2) it's pretty clear that a list like this is going to underrepresent books where an Amazon link is less likely --- superior free alternatives to the books listed in some cases. Using affiliate links (and, as far as I can tell, only undisclosed affiliate links) suggests that getting an accurate "top 30" wasn't the highest priority.
So the affiliate links are a signal of intent, and intent affects accuracy in this case. But the list wouldn't have been written otherwise.
"You've figured out a way to post native content advertising to Hacker News" would be the equally uncharitable counterpoint to your mock quote.
Yes, definitely. But that was a choice. There are tons of smart reasons to choose to generate the list from Amazon links, and one of them appears to have been, "because we know that they can be tagged with affiliate links."
As mentioned in the post, Amazon is often the go-to website for linking books; I've never seen a link to Barnes and Noble on HN, Reddit or the like. I took care to make clear that this is a limited sample and didn't claim generalizability anywhere in the post.
Tagging the links was a conscious decision, but I do think that the approach of looking for Amazon links provides more insight than any other domain/online store.
Probably. Labeling the links could make things less ambiguous, though. Either, "This website is supported by affiliate links" under the links to amazon. Or "I did this for fun but won't be able to host it w/out support. Please consider using this affiliate link to help out" and then providing both.
It's not super clear from the website whether this is something you did primarily to learn about data analysis, learn about HN's book recommendations, or to make money. Any of those is fine! But the motivation is likely to influence the outcome (even unintentionally) and it's something that could be made more clear.
I see the case for having a disclosure section that informs about affiliate links and don't mind adding that. But quite frankly, everyone who knows about affiliate programs can see that the links to Amazon contain that tag. No redirects or whatever fishy techniques I've seen to hide that.
Oh, I missed that the posted article itself included the affiliate tags.
In this case, yes, the content is good and the intent was not to drive visits to Amazon. (in contrast with a hypothetical post like "Top 10 Books You Should Buy To Be Smart According To The Nerds at Hacker News.") Sites/Medium posts have been trending toward the latter model of revenue generation in the face of declining ad revenue.
I remember Jeff Atwood's 4k monitor review post [1]. Someone had calculated that he made thousands of dollars pimping that thing.
I have no issue with people doing this, as long as their posts are not solely motivated by wanting an excuse to post their affiliate link. I guess the more popular you get, the more likely that is to happen.
It's funny: I never read Atwood's post, but I bought the same monitor mainly based on the features, Amazon Prime/reviews, and price... and I have to agree with Jeff: it's the best work/computer investment I have made in a long time. My other recommendation would be the Cambridge Audio DAC Magic Plus, but it's rather pricey.
I even contemplated blogging about the above products with no interest in making money, because I like the products that much. I have a feeling Jeff was thinking along those lines as well, and not about making money.
Hey, people misinterpreted my post (can't blame them, it was very sparse). I am genuinely interested in the kind of money this can generate and would highly appreciate regular follow-ups if you don't mind sharing.
I believe people would just write the names of the really popular books like TAOCP, Hackers, Founders at Work, etc., rather than linking to them.
The list:
"The Rent Is Too Damn High: What To Do About It, And Why It Matters More Than You Think" by Matthew Yglesias
Publisher: Simon & Schuster
"The Four Steps to the Epiphany: Successful Strategies for Products that Win" by Steven Gary Blank
Publisher: Cafepress.com
"Introduction to Algorithms, 3rd Edition" by Thomas H. Cormen
Publisher: The MIT Press
"Influence: The Psychology of Persuasion, Revised Edition" by Robert B. Cialdini
Publisher: Harper Business
"Peopleware: Productive Projects and Teams (Second Edition)" by Tom DeMarco
Publisher: Dorset House Publishing Company, Incorporated
"Code: The Hidden Language of Computer Hardware and Software" by Charles Petzold
Publisher: Microsoft Press
"Working Effectively with Legacy Code" by Michael Feathers
Publisher: Prentice Hall
"Three Felonies A Day: How the Feds Target the Innocent" by Harvey Silverglate
Publisher: Encounter Books
"JavaScript: The Good Parts" by Douglas Crockford
Publisher: O'Reilly Media
"The Little Schemer - 4th Edition" by Daniel P. Friedman
Publisher: The MIT Press
"The E-Myth Revisited: Why Most Small Businesses Don't Work and What to Do About It" by Michael E. Gerber
Publisher: HarperCollins
"Feeling Good: The New Mood Therapy" by David D. Burns
Publisher: Harper
"Programming Collective Intelligence: Building Smart Web 2.0 Applications" by Toby Segaran
Publisher: O'Reilly Media
"The Non-Designer's Design Book (3rd Edition)" by Robin Williams
Publisher: Peachpit Press
"The C Programming Language" by Brian W. Kernighan
Publisher: Prentice Hall
"The Design of Everyday Things" by Donald A. Norman
Publisher: Basic Books
"Cracking the Coding Interview: 150 Programming Questions and Solutions" by Gayle Laakmann McDowell
Publisher: CareerCup
"What Intelligence Tests Miss: The Psychology of Rational Thought" by Keith E. Stanovich
Publisher: Yale University Press
"On Writing Well, 30th Anniversary Edition: The Classic Guide to Writing Nonfiction" by William Zinsser
Publisher: Harper Perennial
"Darwin's Theorem" by TJ Radcliffe
Publisher: Siduri Press
"Knowing and Teaching Elementary Mathematics: Teachers' Understanding of Fundamental Mathematics in China and the United States (Studies in Mathematical Thinking and Learning Series)" by Liping Ma
Publisher: Routledge
"Don't Make Me Think: A Common Sense Approach to Web Usability, 2nd Edition" by Steve Krug
Publisher: New Riders
"Expert C Programming: Deep C Secrets" by Peter van der Linden
Publisher: Prentice Hall
"Clean Code: A Handbook of Agile Software Craftsmanship" by Robert C. Martin
Publisher: Prentice Hall
"The Elements of Computing Systems: Building a Modern Computer from First Principles" by Noam Nisan
Publisher: The MIT Press
"Code Complete: A Practical Handbook of Software Construction, Second Edition" by Steve McConnell
Publisher: Microsoft Press
"The Box: How the Shipping Container Made the World Smaller and the World Economy Bigger" by Marc Levinson
Publisher: Princeton University Press
"Software Estimation: Demystifying the Black Art (Developer Best Practices)" by Steve McConnell
Publisher: Microsoft Press
"Refactoring: Improving the Design of Existing Code" by Martin Fowler
Publisher: Addison-Wesley Professional
"Design for Hackers: Reverse Engineering Beauty" by David Kadavy
Publisher: Wiley
Thanks for posting the list. The chart in the article makes it impossible to tell what the books are without hovering over each one to see the captions.
Hard to read on mobile. Couldn't get past the first few. It is annoying to have to click a tiny thumbnail to read a bad, extracted synopsis from Amazon.
Interesting to see Influence so high, but Predictably Irrational not listed at all. I've heard Influence is a really great book, but from a quick skim it seems like Predictably Irrational covers the subject matter at least as well, if not better. I'd be happy to hear the opinion of someone who has actually read both.
I've read both and Influence is far more useful if you're trying to, well, influence someone. The art of influencing is complex and involves more than just a few behavioral economics insights. Influence is a total framework for understanding the psychology and emotions of selling.
I was surprised not to see Dale Carnegie's book (How to Win Friends and...) either, but I suppose it's rather dated and not as scientific. Carnegie's book had one of the greatest impacts on my personal and professional life.
Agreed, that book is a real classic. I got my copy from Amazon, actually.
As a side note, the failure of both of us to actually mention the book's full title (or include its amazon link) presumably means that neither of the services being discussed would have registered this as a vote for the book. We're part of the problem! ;)
Influence was a great book, but it is a bit outdated (in my opinion). Predictably Irrational and his other books were much more relevant. Thinking, Fast and Slow was the best one of them all.
They made you memorize the red-black tree algorithm?! Why? The whole purpose of a reference book like that one is to not have to memorize it. If you need to implement a red-black tree, you just look it up (I have done exactly that, with that book and that algorithm).
To be fair, most companies don't let you bring your "Introduction to Algorithms" textbook in with you for a whiteboard interview where they expect you to regurgitate exactly how to do Red Black trees or any of a hundred other concepts from memory, so having him memorize isn't necessarily awful.
I wonder how many books would be on the list if it were somehow easy to extract mentions by name instead of by link. The Mythical Man-Month is mentioned regularly here, and I don't think it's linked very often because of how well known it is.
Perhaps taking a look at the negative reviews reveals a little more. This is one of them:
It is sad that someone published this crap and killed thousands of trees. Do you know how long it will take to regrow those trees? 20-30 years. Your selfish lust for money lead you to get up all your principals.
SHAME.
In most such lists there's a distinct lack of math books even though there are tons of great math books specifically written for programmers and compsci people.
Has anyone read the #1 book (The Rent Is Too Damn High) and would like to add a few comments about it? What does it describe, suggest, etc.? I'd never heard of it before.
In 2015, at Crunch Practical Bigdata Conference, Budapest, I showcased what books some subreddit community talk about: startups, entrepreneur, productivity reads. Slides are available here: http://www.slideshare.net/martonkodok/complex-realtime-event...
"The Rent Is Too Damn High: What To Do About It, And Why It Matters More Than You Think"
Not where I live. What to do about it? Move. Find an employer willing to let you work remotely, and find your own quiet cost-conscious piece of paradise.
Yes, but it’s complicated. Cities have become more valuable in the age of the Internet, against my predictions, intuitions and preferences!
Too few companies offer remote, because remote is hard. My experience at Stack is that remote work really only happens when a company supports it “natively” from the start. It’ll take a generation to make a dent, sadly. I’d like to be wrong.
Agree, fully. I add that remote feels incredibly easy when it's ingrained in the company, as it is for the one I work at right now. After experiencing it, I feel like it really doesn't have to be as hard as my previous employers made it out to be.
Absolutely. But I don't think city living needs to be expensive. Looking toward the more population-dense central Europe, average families are able to afford to live in urban areas because there is far more urban housing available throughout the countries, and it's not just concentrated in a couple of metropolitan cities.
The demand for urban living outpaces the supply of urban housing in the United States. We need a great urban housing expansion throughout the country to make city living affordable again.
Actually, living in a city is (1) good for the environment: no SUV trips, energy-efficient buildings, and (2) good for the soul: activities, theatres. I think the love for suburbs (and then driving 30 miles to work and burning natural resources to catch up with the city lifestyle) is mostly a US thing.
I've long been meaning to read "The Human Zoo":
"How does city life change the way we act? What accounts for the increasing prevalence of violence and anxiety in our world? In this new edition of his controversial 1969 bestseller, The Human Zoo, renowned zoologist Desmond Morris argues that many of the social instabilities we face are largely a product of the artificial, impersonal confines of our urban surroundings. Indeed, our behavior often startlingly resembles that of captive animals, and our developed and urbane environment seems not so much a concrete jungle as it does a human zoo. Animals do not normally exhibit stress, random violence, and erratic behavior until they are confined. Similarly, the human propensity toward antisocial and sociopathic behavior is intensified in today's cities. Morris argues that we are biologically still tribal and ill-equipped to thrive in the impersonal urban sprawl. As important and meaningful today as it was a quarter-century ago, The Human Zoo sounds an urgent warning and provides startling insight into our increasingly complex lives."
It's disheartening to see that Matthew Yglesias has a book, with an economic argument no less, that is incredibly popular here on HN. He's been discredited by many economists from the left/right/libertarian. That is, when his arguments aren't simply ad hominems.
If you've read this book, do yourself a favor and read some refutations. Or do one better, and learn economics so you can think about these issues clearly.
Easy to say on paper, but people quite understandably have friends, family and other attachments that tie them to a place... and that's even if they can get remote work. Most people still need to be physically present. Especially down the lower end of the pay scale.
Possible application of Law of Unintended Consequences: every time you write a program to extract data _out_ of HN, you increase motivation for someone else to insert data _into_ HN.
At some point someone with a megaphone encouraged their listeners / viewers to brigade the book with bad reviews. They did so and the low star count is the result.
I'm trying to Google for details over the brouhaha but can't find the correct keywords.