The Imposter's Handbook

eksemplar · on Aug 24, 2016

I have a degree in CS and I've never found myself in a situation where anyone would discuss bouble sort vs merge sort. Neither have I been in a situation where big-o was relevant beyond the basic concept of not doing obviously stupid shit.

What you've really missed is things like best practices, design patterns and concepts like SOLID, but a lot of people with CS degrees missed some of those as well.

If the book covers this, excellent, but why wouldn't it sell itself on valid points?

jacobsenscott · on Aug 24, 2016

I have a CS degree, and while nobody sits around talking about data structures and complexity that's not the point. It gives you a foundation of knowledge that you automatically and subconsciously apply to every job you do.

A CS degree prevent you from making a lot of obvious (if you have a CS degree) and costly mistakes. It sort of gives you a crystal ball. You can see that some code isn't going to work when a db table gets to 100,000 records, or that some code is making the wrong space/time tradeoff, or that some code is using the wrong data structure from the standard library.

When your performance monitoring tool is telling you something is slow or leaking memory you have the foundational knowledge to understand why and fix it, rather than spending money on 10 more dynos or whatever.

Computers are so fast and cheap today a lot of this doesn't matter most of the time. The naive solution does just fine. But when the core dump hits the fan you better have one or two CS grads on staff.

BinaryIdiot · on Aug 24, 2016

> A CS degree prevent you from making a lot of obvious (if you have a CS degree) and costly mistakes.

I couldn't disagree more. I do not have a CS degree and have lead many teams of folks with a combination of having and not having them. It's a huge mixed bag and I'm not confident you can make a general statement in either direction.

Yes CS can prepare you by knowing some of the basics but I've run into countless people with CS degrees who don't understand how much of anything works. I've also run into many without CS degrees who understand how the damn storage implementation in Postgres works.

Anecdotally, to me with my small bag of data points from the teams I've lead and the people I've interviewed, a CS degree is only what you make of it. If you were a good student, studied and understood the content then you have an edge. If you were an okay student who just memorized things for tests and never actually applied the knowledge then you're in no better state than someone without a CS degree (perhaps even in a worse state as most of the people I know, including myself, were told by multiple leads that without a CS degree everything is going to be a struggle and to advance you career you must get one so of course I had to work even harder to prove them wrong).

nikmobi · on Aug 24, 2016

I agree with you for the most part, but I think if his sentence read:

> A CS degree can prevent you from making a lot of obvious (if you have a CS degree) and costly mistakes.

I think most people can agree that it doesn't provide any guarantee, but it definitely gives you a boost in the right direction.

(For the record, I don't have a CS degree)

jasonjei · on Aug 24, 2016

I think in many ways, though, the self-taught guys are going to be very adaptive and resourceful to learn new things. The difference in knowledge can often be compared to a self-taught home cook and a formally-trained cook.

I once had an intern that was a CS master degree student and while he was tackling neural networks in school, I showed him how to link to a DLL 3-times in C++ and he still couldn't figure it out on his own. It also shows you having a CS degree doesn't mean anything.

I think CS though will tell you how it works.

stevenwiles · on Aug 26, 2016

> A CS degree can prevent you from making a lot of obvious (if you have a CS degree) and costly mistakes.

That is just wrong. A degree is just a piece of paper. How is a piece of paper going to prevent you from making mistakes? It won't.

What you mean to say was this:

> Any intelligent person with reasonable CS knowledge will be able to work without making a lot of obvious and costly mistakes

I hope I've taught you something. :)

BinaryIdiot · on Aug 24, 2016

Yeah a simple word and I would have taken the post entirely differently :)

jacobsenscott · on Aug 24, 2016

I must have thought the word 'can' and didn't type it because it doesn't read right as it is. I do that sometimes.

BinaryIdiot · on Aug 24, 2016

Fair enough. I to do that quite frequently :)

manigandham · on Aug 24, 2016

Interesting comments but both are saying the same thing: it's all knowledge in the end.

Either you know or you dont. CS degrees just means you learned it in a more standardized/formalized setting and have some proof to show you did it, but ultimately it's the knowledge itself that makes the difference, not how you gained it.

themartorana · on Aug 24, 2016

Agreed. CS degrees do tend to insinuate some degree of focused study, but it's a unique field where you can put in the same focus outside of a collegiate setting and exit with the same result (for undergrad, at least).

I have a CS degree and have worked with brilliant engineers that were HS drop-outs. It has everything to do with a passion for learning. It still requires the time focused on the study of CS, but the setting is secondary.

testudovictoria · on Aug 24, 2016

Different universities will provide different sets of knowledge and depth of that knowledge. However, I think we can generally say that a CS degree will provide the "crystal ball" information that GP states. I believe that your last point is the crux of it all.

Anecdote: I have peers all over the CS knowledge spectrum while taking the exact same class from the same instructors. Some peers have taken that knowledge and written kernels. Others are struggling to write an array sorting method. The former have been served well by their undergrad studies. The latter would have likely fared better in a 12 week bootcamp where it's more training and less theory.

unethical_ban · on Aug 24, 2016

Along with the other remarks, I believe it would be fair to say "all things being equal". That is, a given person would benefit from a CS degree in the following ways.

emodendroket · on Aug 24, 2016

I agree with the sentiment, but I would say people who are familiar with the CS principles and not necessarily people who have CS degrees.

BurningFrog · on Aug 24, 2016

Spending 4 years to get a CS degree gives you a lot of skills and knowledge.

But so does writing software outside of college for 4 years.

Which one is better? That's an empirical question.

epalmer · on Aug 24, 2016

I think the CS degree validity question depends on the person and the institution. I have interviewed programmers from CS programs that did not know even the basics of programing, not to mention the more advanced topics that we should know. I also know of programs that are producing pretty well rounded and knowledgeable students with a 4 year degree. So the normal YMMV must be applied.

I also think the same holds true for self taught programmers. I am self taught. Early in my career (decades ago) I was using perl to process some large text files. I was building a string of relevant information like $x = $x + "some value".

So this was wrong on so many fronts. After 25 hours of running I figured something was wrong. Okay, so I'm a slow learner...

I preallocated the string and the program ran in less than 20 minutes. Now of course a string was an inappropriate data type as well. I learned a lot at that point and starting thinking about internal representations of data structures and other concepts.

JoeAltmaier · on Aug 24, 2016

Usually here is where we observe Dell, Gates and Jobs didn't finish college. But that's more about business than engineering.

cholantesh · on Aug 24, 2016

And even if it wasn't, one could retort that Brin, Page, Musk, Nadella, etc. did, and in fact, did graduate level studies. It's a complex question.

rayiner · on Aug 24, 2016

> A CS degree prevent you from making a lot of obvious (if you have a CS degree) and costly mistakes.

This. Many years ago, our code was shitting the bed, a month before a major milestone deadline. Turns out that someone wrote an N^2 algorithm and only tested with N=5.

I don't have a CS degree--just a few semesters of combinatorics and graph theory. When I was programming, I always felt that was a huge liability. I'd confront a problem, and I knew just enough to know it could probably be reduced to some graph problem and solved using a known algorithm, but I didn't know what that was.

pc86 · on Aug 24, 2016

And a CS degree teaches you nothing about how to appropriately test a given system and whether N=5 is sufficient, but industry experience will.

unethical_ban · on Aug 24, 2016

some CS degrees. From what I've observed, some CS programs are teaching "software engineering" at the senior/masters levels in order to focus on development and design over theory and math.

simtel20 · on Aug 24, 2016

Yes but I've seen CS degreed developers with industry experience do that when deadlines approach and the test procedures don't keep up with the product specs.

crdoconnor · on Aug 24, 2016

>Computers are so fast and cheap today a lot of this doesn't matter most of the time.

Indeed, yet since it's emphasized so much in college people prioritize this over cleanliness and architectural elegance which matters much more.

Silfen · on Aug 24, 2016

Depends (like most of these generalizations) on your college and teachers. I had two professors who hammered the point that the priorities are: clarity, correctness, and performance. In that order. Perhaps a bit dogmatic, but a useful idea.

We would often spend class time writing cleaner, simpler code after arriving at a correct answer.

ap22213 · on Aug 24, 2016

It's extremely helpful if a team member has a CS degree, but it's not always essential.

The CS grad usually understands the whole stack from UI through CPU, I/O, and memory. They don't get the distant stare when they see code with a Red black tree or a graph algorithm. They may not know about skip lists and bloom filters, but they can figure it out quickly. They understand reference vs. object equality. They understand multi-threading and concurrency strategies. They understand how to implement a hash so that there are few map collisions.

That said, a lot of IT work, web development work, database work, API work, etc. doesn't require all of that. A lot of my work does, but if it doesn't, I hire based on passion, productivity, resourcefulness, and craftsmanship.

As an analogy, many small businesses are successfully run by self-taught entrepreneurs. But, running a $100M company requires different knowledge.

eksemplar · on Aug 24, 2016

The thing is though that you don't really deal with this in your day to day life, and when you do, even with a few CS candidates on staff, you're going to hire a consultant with a Ph.D. in the field.

Of course I may be environmentally damaged from never having worked with someone who was self taught.

schrodinger · on Aug 24, 2016

I don't think that's entirely true. It comes up pretty often. You may not be doing formal proofs but it's helpful to have a thorough understanding as it helps guide intuition around scaling your product (e.g. how a database query will scale with more users). It should influence your design early on rather than be something requiring a PHD consultant later on

jacobsenscott · on Aug 24, 2016

I deal with this stuff regularly, but not daily. I'm sort of the go to guy on the team when things start to feel CS'y. It isn't PhD level stuff.

burnstek · on Aug 24, 2016

It also helps communicate with colleagues who prefer to leverage their CS background when discussing algorithms, issues, etc.

muhmi · on Aug 24, 2016

That sounds like very basic CS stuff, so I guess "CS degeee" here means B.Sc. level knowledge? ;)

pc86 · on Aug 24, 2016

As opposed to what? When discussing the existence of degrees in industry "CS degree" means Bachelor's 999 times out of 1,000.

vonmoltke · on Aug 24, 2016

Why would it not?

mamon · on Aug 24, 2016

In some countries (think "Europe") Bachelor degree is considered "incomplete", or just an intermediate step. Only Master of Science degree is regarded as true "higher education". There are even university programmes that take you straight to MSc degree, without stopping at BSc.

frobozz · on Aug 24, 2016

I disagree. The things that come up every day in practical, real-world professional software development, such as design patterns and SOLID are the things that professional autodidacts normally have plenty of experience in and knowledge of.

The things that they have missed by not taking a Computer Science degree are precisely those things that don't tend to come up, such as big-O and the behaviour of various sorting algorithms.

Anyway, here are two paragraphs on the linked page that you might like to see:

"More than just theory, this book covers many practical areas of the industry as well, such as: Database design, SOLID, How a compiler works, sorting and searching algorithms, Big-O notation, Lambda Calculus, TDD and BDD."

"One of the more subjective parts of the book, but I was asked by many people to write about these things. Specifically: SOLID, structural design, TDD, BDD, and design patterns."

marcosdumay · on Aug 24, 2016

> Neither have I been in a situation where big-o was relevant beyond the basic concept of not doing obviously stupid shit.

How do you know you are doing stupid shit if you don't know about complexity and don't know your algorithms?

Really, I do work on CRUD applications from time to time, and I often have to select algorithms based on complexity. Yeah, didn't have to implement one of them for ages¹, but I do have to tell coworkers things like "here you use a set", "here you use a list", "this sorting algorithm isn't stable", or "nah, just use brute force and get done" once in a while.

1 - Or, better, did implement a B-tree for a side project just a couple of months ago. Ended up just throwing it away, but I didn't know I wouldn't use it at the beginning.

collyw · on Aug 24, 2016

I learned big-o around 15 years ago, and have never used it professionally. (I forgot what it meant when I had a phone interview with Google).

I do know that some algorithms are more efficient than others, and some sorts depend on whether you expect the list to be random, sorted or partially sorted. If I find myself in a situation needing to choose between libraries. I would guess that half an hours reading would refresh what I need to know.

Periodic · on Aug 24, 2016

Just a year ago I was on a project that involved building a tree of data and we wanted the ability to try adding a few things and backtrack to multiple different points if it didn't work out, so there was a lot of shared data in the tree. Performance was critical and we had various operations that had to be performed regularly, so I did a lot of implementations and tested them directly.

My CS degree (which I got late in my career) was invaluable. I got it precisely because I had no idea what I was missing and if I hadn't I wouldn't have investigated half the options I did and would not have had a good framework for thinking about them. I would have gotten the job done without it, but it wouldn't have been as good a job and I wouldn't have known what I was missing.

vonmoltke · on Aug 24, 2016

> How do you know you are doing stupid shit if you don't know about complexity and don't know your algorithms?

So, one cannot understand complexity without Big O calculus? I only learned the notation after years of calculating memory and runtime complexities for real-time code, and that was only for interviews. I don't find comparisons at the Big O level to be useful in day-to-day work.

> Really, I do work on CRUD applications from time to time, and I often have to select algorithms based on complexity.

Really, you often have to select algorithms at that level of granularity? I never have, in almost ten years of full-time development. I frequently have to select or tune algorithms based on the factors that Big O throws out, though. Then again, I do very little with searching and sorting.

kchoudhu · on Aug 24, 2016

I have libraries that make sure I don't do stupid shit.

ThePawnBreak · on Aug 24, 2016

That's the point, there is no magic library that can do this for you. For example, let's say you're writing Python and have a list of receipt numbers. When you get a new receipt, you want to check if it's already in the list, so you do the pythonic thing: "if receipt_nr in receipts", assuming that the standard implementation of this operation in the library is efficient. The problem is, a set would be a lot more efficient here, but the standard library doesn't know you'll be checking if a number is in the list many times, so it cannot use the appropriate data structure.

So, in a way, the time and space complexity of each operation is part of the interface: maybe you don't really need to know how the internals are implemented, but at the very least you need to know the time and space complexity of the data structures you're using, and how to analyse the complexity of your code.

kchoudhu · on Aug 24, 2016

Next question: how often would this kind of malprogramming actually lead to deal-breaking performance issues in a project? After a decade in the industry, countless projects delivered and much shit code witnessed, I can empirically answer this question, and the the answer is -- even at places that are known for their technical excellence -- not often.

Look, I'm not advocating illiterate programming, nor am I a proponent of unprofessional work. I took my share of CS classes, and I can talk about algorithms and how to analyze them. The number of times I have had to do so on the job, however, can be counted on my bodily digits, without taking off my shoes.

This is probably not an original observation, but a big part of why these unproductive conversations keep on happening here has to do with how young HN demographic skews. When the biggest achievement in your life is getting that new degree you spent $BIGNUMBER $$/years on, you're going to want to belittle workmanlike programmers who don't measure up to your own standards. It takes a while for people to move on and grow up.

pjc50 · on Aug 24, 2016

> how often would this kind of malprogramming actually lead to deal-breaking performance issues in a project?

Depends very much on your projects. Web developers generally have it easy, as it's cheap to add more servers and free to burn CPU in the browser. Even so you can get into trouble with anything that's O(N) in the number of users. I do wonder how much use of nosql is from people who think SQL is slow because they've not got their indexes set up properly.

Game developers, of course, live and breathe performance. As do embedded and similar low-level environments. Or people working with this trendy "big" data.

sbov · on Aug 24, 2016

Performance will rarely be deal breaking. Instead people will toil away with slow systems and just accept it. It doesn't mean that they wouldn't be much happier with a faster system.

We recently replaced a system whose online store took about 900ms to respond, and their back end admin section on average around 1.7 seconds. It obviously wasn't deal breaking since they lived with it for 3 years but it still was still ridiculous, and definitely had an impact on both their productivity and bottom line.

emodendroket · on Aug 24, 2016

How big is the list we're talking about?

riskable · on Aug 24, 2016

What's funny about your example is that using a set in that situation would be an example of premature optimization. If your entire list of receipts fits into memory users won't notice a difference between a list and a set.

Also, if you used a list in Python (or other similar dynamic languages) swapping a list out for a set is a trivial operation...

    receipts = []
    # ...becomes:
    receipts = set()

...and it would require no changes to the conditional logic of the application. So later, if it turns out you're having performance issues you (usually) make a few minor changes here and there and get your 10x or 100x or even 10,000x speedup.

scott_s · on Aug 24, 2016

> If your entire list of receipts fits into memory users won't notice a difference between a list and a set.

If you had said "small enough", I would agree, but linear walks of data that can fit in memory can easily be noticed by users, for large enough lists. (Very large lists can fit in memory.)

But, at a higher level, yes, this is always true when people talk about performance: make sure what you're optimizing actually matters. I think a charitable reading of what ThePawnBreak said should assume one has already determined this particular operation matters for performance.

pjc50 · on Aug 24, 2016

Earlier this month I was looking at the question of "why does it take 160ms to scan a list of just 60,000 items?

The answer turned out to be "because they're not adjacent and on this embedded system every single cache miss costs you hundreds of nanoseconds".

I strongly agree that, like woodworkers, performance improvers should measure before they take out the power tools.

emodendroket · on Aug 24, 2016

> What's funny about your example is that using a set in that situation would be an example of premature optimization. If your entire list of receipts fits into memory users won't notice a difference between a list and a set.

That's completely untrue if you're running this check often enough.

kchoudhu · on Aug 24, 2016

Not to mention that most of the kids today would have stowed their entire population of receipts in whatever the datastore flavor-of-the-week is. At that point, the lookup would effectively be a SELECT -- which the DBMS would optimize away for you.

pjc50 · on Aug 24, 2016

.. unless they've forgotten the index, at which point the DBMS grinds to a halt.

Or they've written the query on the wrong side of the ORM so it has to pull in all the objects, check the value, and discard them.

emodendroket · on Aug 24, 2016

A very simple example that I think is "real-world" enough is recursively descending the fields of some object which has cycles. If you use a list instead of a set this can be quite slow.

kevhito · on Aug 24, 2016

And my students have calculators so there is no need for them to know basic arithmetic.

collyw · on Aug 24, 2016

My friend is a Maths teacher and can barely do mental arithmetic.

teekert · on Aug 24, 2016

I have no degree in CS and I see these terms (Big O, np vs p, etc) regularly, mostly here on HN. No idea what they mean, this books sounds great to me.

haylem · on Aug 24, 2016

> mostly here on HN

Which may or may not be an accurate depiction (as we read personal accounts and thoughts of the commenters) of a quite marginal subset of real-life IT-professionals.

I wouldn't worry too much about what's being said or not said on HN. There are great ideas and topics to be covered here for sure, but they're sprinkled on top of a giant cake made with 1-part self-loathing, 2-parts day-dreaming, and 1-part regular huff-and-puffing.

It makes for good entertainment and procrastination.

> No idea what they mean, this books sounds great to me

That being said, not knowing Big-O while doing CS or IT work seems worrying. Sure, it's not absolutely necessary for most of the grunt work. But you should definitely have the same understanding of performance issues without knowing the fancy notation and terminology. Big-O is just a notation and a formalization of these concepts, and it helps with communication. I'd say it's still better to know it.

So, indeed the book probably doesn't hurt.

teekert · on Aug 24, 2016

I'm just a biologist that switched to Python because Excel and Origin weren't dealing very well with my ever increasing pile of data (Typical data: Every row is cell in a Tissue sample, every column is a quantified parameter (size, marker intensity, ...) of that cell, typically I deal with 10s to 100s of tissues samples) Pandas is great, I spend my time turning DataFrames into histograms, scatter plots and ROC curves in Jupyter Notebooks. I have the feeling knowing Big-O is not very relevant. Still, learning new languages, new words and new abstractions is almost guaranteed to influence ones way of working and thinking at some level.

So, indeed indeed the book probably doesn't hurt ;)

Edit: Just glanced over the link in Practicality's comment about big-O and sure enough I think it may actually be useful as my ever increasing pile of data increases even further! I have to admit; as the parameters increase I find myself doing over night calculations more and more.

emodendroket · on Aug 24, 2016

Asymptotic complexity comes up pretty frequently in bioinformatics contexts because the volumes of data can be huge.

vonmoltke · on Aug 24, 2016

How does it change what you do, though? I did signal processing with massive data streams rather than bioinformatics, but I assume the situation is similar. The algorithms are what they are. They are complex mathematical equations or transformations that need to be run on data and are often optimized without being able to change their asymptotic complexity.

emodendroket · on Aug 24, 2016

Skiena's Algorithm Design Manual mentions him being brought in as an algorithmic consultant to modify some genetics analysis software so that it'd actually finish but I don't really remember the details or know enough about the field to give you plausible examples.

vonmoltke · on Aug 24, 2016

I can see that; I did a lot of similar work with signal processing algorithms. None of what I did affected asymptotic complexity at all, though. The asymptotic complexity was tied to the algorithms chosen, and changing those was an issue of trading computational performance for system performance.

emodendroket · on Aug 24, 2016

I pulled it up and found it; the problem involved modeling genome sequences as strings and finding possible substrings.

Jtsummers · on Aug 24, 2016

EDIT: Misstated the big-O, in this particular case (should've found my coworkers actual code). Both are O(m x n), one just has a large constant.

Here's a pattern I've noticed with code written for processing a data file by a lot of people (python-esque, using a function (match) that's "left as an exercise for the reader" to implement):

  def search(filename, value):
    with open(filename, "r") as f:
      for line in f:
        if match(value,line):
          print(line)
        # we don't care about not matching

  def main():
    for v in [search1, search2, search3, ...]:
      search("data.dat", v)

What happened is that one time they needed that search function, and so they made search and it worked well. They realized they could run that same search function repeatedly, and for small data files and few searches it was quick enough. But the performance is O(m x n) [EDIT: originally wrote O(m x n)], where m is the number of lines, n is the number of search values. [EDIT: wrong: a second m because it takes a time proportional to the size of the file to read the file.]

The data file is read every time something is searched. If you've got an SSD, it's not really noticeable. If you've got a spinning disk, it becomes a problem. If you're hitting network storage, you're downloading that file n times. The main issue being that each read (each iteration of the inner for) hits the hard drive, network, or similar. A simple performance hack is to move the read into main, put the whole thing into one list of lines and pass that list to search instead of the filename (modifying search appropriately):

  def search(data, value):
    for line in data:
      if match(value,line):
        print(line)
      # we don't care about not matching

  def main():
    with open("data.dat", "r") as f:
      data = f.read().splitlines()
      for v in [search1, search2, search3, ...]:
        search(data, v)

It's still O(m x n) [EDIT: It's now O(m x n). We've removed one of the m factors because we do the read once, and never again.]

For very large files and very large search parameter lists, this will still take a long time, but it's much faster than the previous version when you're dealing with large files.

EDIT:

Shortest code I can think of to get the actual worst case that I've had a few coworkers pull off:

  def search(filename, value):
    with open(filename, "r") as f:
      data = f.read().splitlines()
      for line in data:
        if match(value,line):
          print(line)
        # we don't care about not matching

  def main():
    for v in [search1, search2, search3, ...]:
      search("data.dat", v)

With, of course, other code in between because as vonmoltke points out, the above has clear problems. My point was about the structure of the bad pattern, not the specific implementation of it.

vonmoltke · on Aug 24, 2016

I would have never written the first example in the first place, and I don't need Big O calculus to tell me it's a bad idea. Even with a single file it is obvious that the initial implementation is doing unnecessary work and that re-reading a file from disk every time is ridiculous (unless the file is too large for memory, in which case I would pass the list of search terms to the search function and check each line for all terms as the lines are read).

Jtsummers · on Aug 24, 2016

It only looks like a bad idea because of the close proximity in my example. What I've seen normally is that it's grown into something mimicking this structure, but actually far more complex. The point where the file read happens isn't so near the top so that refactor is less obvious, and it's so deep that the person who puts it into that outer loop in main may not realize what's happening internally (fully, at least).

I'm trying to recall the structure of another case where this happened that with a more complex internal algorithm. The solution was far less obvious, but required similar refactorings. In that case it was both reading the file multiple times, and a several deep loop where one (which was by far the longest running) could be refactored to only happen once. Instead of 100 or so times, we flipped some of the loops around (moved it to be the outer loop, similar to the idea of moving the loop over all lines to be the outer loop in my other example). Big-O wasn't essential (for me), because I'd internalized that sort of thinking. But that explanation was essential for my colleagues (EEs, couple years out of school) who hadn't been exposed to that construct before (at least not enough to stick).

vonmoltke · on Aug 24, 2016

> It only looks like a bad idea because of the close proximity in my example. What I've seen normally is that it's grown into something mimicking this structure, but actually far more complex. The point where the file read happens isn't so near the top so that refactor is less obvious, and it's so deep that the person who puts it into that outer loop in main may not realize what's happening internally (fully, at least).

I don't see how Big O calculus helps here. If you have enough understanding to run that analysis you have enough understanding to see it is trivially a dumb idea.

> I'm trying to recall the structure of another case where this happened that with a more complex internal algorithm. The solution was far less obvious, but required similar refactorings. In that case it was both reading the file multiple times, and a several deep loop where one (which was by far the longest running) could be refactored to only happen once. Instead of 100 or so times, we flipped some of the loops around (moved it to be the outer loop, similar to the idea of moving the loop over all lines to be the outer loop in my other example). Big-O wasn't essential (for me), because I'd internalized that sort of thinking. But that explanation was essential for my colleagues (EEs, couple years out of school) who hadn't been exposed to that construct before (at least not enough to stick).

What's funny is that I am an EE, and the way I internalized complexity analysis and optimization was to look at the number of operations being performed, along with the cost of those operations, and design the code such that it used the fewest resources. Only later did I learn this "Big O" thing and it seemed stupid because it seemed overly complex and was telling me to throw out significant factors that I spent my career worrying about. I still don't really see the value of it over more detailed methods that seem trivially easy to me, like simply deriving an approximation of the complete polynomial describing the runtime, memory usage, or what have you. I am a systems engineer and have a bias towards modeling things, though.

dineshp2 · on Aug 24, 2016

> I wouldn't worry too much about what's being said or not said on HN. There are great ideas and topics to be covered here for sure, but they're sprinkled on top of a giant cake made with 1-part self-loathing, 2-parts day-dreaming, and 1-part regular huff-and-puffing.

I don't understand your argument regarding why you would not pay much attention to what is being said on HN. Could you explain?

blowski · on Aug 24, 2016

Probably because there is a noisy minority who voice dogmatic opinions without understanding the constraints of the problem at hand. They are lilliputians, spouting wonderful ideas that collapse in the face of deadlines and budgets. For those of us that have to live in reality, they can be very annoying and disheartening, so it's essential to take their opinions with a fistful of salt.

haylem · on Aug 24, 2016

That, yes.

But also because a significant amount of even the day's top popular topics have in the end very little relevance and impact to most businesses.

Which doesn't mean it's not interesting, though.

lafay · on Aug 24, 2016

This is the best description of HN I've ever read:

> There are great ideas and topics to be covered here for sure, but they're sprinkled on top of a giant cake made with 1-part self-loathing, 2-parts day-dreaming, and 1-part regular huff-and-puffing.

Practicality · on Aug 24, 2016

Ironically, big-O describes things that are very easy to understand, in a notation that looks complicated.

If you have a lot of experience, you probably already understand the concepts. The notation just gives a clear way to describe them: https://rob-bell.net/2009/06/a-beginners-guide-to-big-o-nota...,

chris11 · on Aug 24, 2016

True, but studying the theory gives a better intuition for big-O. And theory is helpful for some of the edge cases that can be more complicated. For instance, I wouldn't want to try to determine the run-time of a recursive algorithm with just the info in that article.

Practicality · on Aug 24, 2016

Indeed, and if you did learn these things through experience, as I supposed, then you probably did so by writing something O(2^n) and suffering when you had to process >100,000 records.

(Why does it take 100 ms for 10,000 records and 50 minutes for 100,000? ... hmm)

Learning these concepts through the theory is definitely more efficient.

gthtjtkt · on Aug 24, 2016

Just throwing my anecdote in the ring: I work with a small team (4 devs) in at a non-tech company. The other devs all have 10+ years experience and none even know what SOLID is.

I've never asked, but I'm guessing they don't know about any of the other things you mentioned either.

There's an ocean of small/medium businesses who just need to get shit done and don't need it optimized so it can scale to serve 7 billion people every day while running on a 1997 microwave.

HN seems to overlook that market entirely. Not hip/trendy enough, I suppose. Consultants are billing $250/hr just to write CRUD apps or reports, and they're completely booked.

DanHulton · on Aug 24, 2016

I mean... which consultants? In which industry? And who else works in that industry? And do you have phone numbers?

Just... asking. You know, for reasons.

gthtjtkt · on Aug 24, 2016

Legal industry, in a branch of the law that's very high volume. We were paying a consultant $175/hr just to create basic SSRS reports until I learned to do it. I'm sure plenty of other firms are still hiring outside devs at exorbitant rates.

And when I explain what I do (mostly automating clerical or data entry/retrieval processes) to friends in other fields, a lot of them have said "Wow, we could use a lot of that at our company. We do X and Y and Z over and over and it's a huge waste of time." Small companies often force highly-skilled workers to complete their own repetitive clerical tasks, and medium companies seem to hire teams of $10/hr drones. They never think "At what point is this worth automating?" or they don't know where to look for a dev who can do it for them.

I think the small/medium business CRUD app market is extremely neglected because it's not as glamorous as machine learning or whatever else all the MIT grads are doing these days.

jredwards · on Aug 24, 2016

I have no degree in CS and I see these terms regularly.

So I google them and read about them.

mattmanser · on Aug 24, 2016

You don't know what you don't know.

The point of a book like this is that some who knows what you don't know from being asked questions about things he didn't know by people who did know, he now knows what you don't know and can tell you what you need to know.

The only knowledge you have now is of known unknowns without knowing anything at all about the unknowns unknowns.

Know what I mean?

retro64 · on Aug 24, 2016

I realize you are joking, but this is why I was in frantic mode in my early years. I do not have a degree in CS, so I had no idea what I was missing. Consequently I read up on everything. I still have a bookshelf filled with books on algorithms, "gotchas", Gang of Four wisdom, database design, you name it. I was forced to learn coding on the side and still have the habit today. I always have a side project going (which has fueled my career, but has also unfortunately cost me my marriage).

I now have a son going through the proper steps of receiving a degree, and I have book to supplement him for every class (except for the higher math).

steve_g · on Aug 24, 2016

You should have ended with "You know?"

manifestsilence · on Aug 24, 2016

Exactly. Music major here. I spent a lot of time on Wikipedia, the programming sub-Reddits, and now on here. Making the code run is one thing, but knowing how to communicate with other programmers is important. Speaking the language around programming rather than just the programming languages themselves.

AMHOL · on Aug 24, 2016

https://www.youtube.com/watch?v=QnZ0Y4rvz6E&feature=youtu.be...

Periodic · on Aug 24, 2016

What I find most illuminating about the deep software theory is it helps me understand why things are best practices. There are a lot of things that are good to do, but understanding deeply how the compiler things and how the languages are designed gives me insight into why we choose to do the things we do.

The best example is taking a few options someone has for their function interfaces and reframing it in terms of type algebras which helps expose where it's overly complicated. The two best examples I have are making IO and mutations of input explicit and localized in single a location so the user doesn't have to guess which functions are changing things and which ones aren't. Once you have a good framework for describing these abstract boundaries a lot of other principles of design fall out from keeping it simple and readable.

ryandrake · on Aug 24, 2016

There are 1. "I get it to compile" programmers, 2. "I get it to work" programmers, and 3. "I make it great" programmers. A solid computer science foundation, knowledge of the literature, and experience can (but not necessarily will) get you from 2 to 3. Sadly, I've encountered quite a few professional programmers, complete with CS degrees, who year after year barely scrape by at 1, so it's not a given.

chadclan · on Aug 24, 2016

Preach it. This is EXACTLY my experience. CS degree or not a person's proficiency depends on that person. One of the biggest hurdles I personally face is that so many programmers just don't care about #3. It's often not a competency issue, it's an apathy issue.

sailfast · on Aug 24, 2016

I don't have a CS degree, and while I know not to write ten nested loops it's nice to know the actual reasons behind why it's a problem. Understanding the core principles allows for a deeper understanding of what you're doing, and as a result might lead to some novel use of something.

Not saying novel use or innovative stuff requires a CS degree, merely that improved understanding of the basics of computing will likely improve the way you fully consider engineering problems. This one's definitely on my reading list for that reason (also MIT / Stanford open courses on the topic of algorithms, etc are super helpful for this kind of background)

kamaal · on Aug 24, 2016

Any body who spends any amount of decent time with a problem, eventually know hows to optimize it. This isn't just with software, its with every task on earth.

Humans have an in built ability to think about saving the effort required to achieve any task X.

iandanforth · on Aug 24, 2016

Here's another perspective. Let's say the company you joined has a culture that is big on sports, specifically American football. A lot of the people on the team care about the sport, make jokes about it, and make passing references to the current events in the game. If you're the kind of person who cares about fitting in, you might just read up on the game and perhaps browse the sports page headlines so the comments and jokes don't go totally over your head. You know when to laugh.

This type of book can help with the cultural roadblocks non CS degree wielding programmers may have. There are many cultural code words that CS degree holders use and being up on those can make an "impostor's" life easier.

vonmoltke · on Aug 24, 2016

> Here's another perspective. Let's say the company you joined has a culture that is big on sports, specifically American football. A lot of the people on the team care about the sport, make jokes about it, and make passing references to the current events in the game. If you're the kind of person who cares about fitting in, you might just read up on the game and perhaps browse the sports page headlines so the comments and jokes don't go totally over your head. You know when to laugh.

Personally, I wouldn't want to work at a place where I felt like I needed to "fit in".

magic_beans · on Aug 24, 2016

I work in fashion because it was the best job I could get before I ran out of money. I have zero interest in fashion. I don't fit in with my colleagues because I have zero interest in fashion. It is not very fun to work in an environment where I have so little in common with my colleagues, but I didn't have much of a choice in the matter. Alls I'm saying is... sometimes the circumstances of life dictate where we find ourselves employed.

munificent · on Aug 24, 2016

> the basic concept of not doing obviously stupid shit

I find that the more CS fundamentals I learn, the higher the bar on my definition of "obviously stupid shit" becomes. It turns out lots of things become obviously stupid as you learn more about algorithm analysis, more about how compilers work, more about how CPU caching effects performance, etc.

barrkel · on Aug 24, 2016

I work on a product that does reconciliation for finance: that basically means comparing two lists. Each list could have a hundred items, or ten million. When there's that much variance in n, it matters what the O(n) is for any code that has to touch this data.

Worst-case list comparison for two lists of length m and n respectively, is O(mn). For 10 million in each list, that's a bigger number than you can reasonably expect to iterate through using brute force. So brute force won't do: it is imperative that you understand how everything that touches all the data scales.

"Not doing obviously stupid shit" means understanding the implementation of data structures and how they scale, understanding how nested loops multiply runtime, how recursion levels multiply runtime. It turns out there's actually a 1:1 correspondence between knowing big-Oh and not doing obviously stupid shit.

That's why it's one of our in-person coding questions: after the candidate has written up a solution to the problem, they need to be able to analyze what they wrote and understand how it scales to different n. If they don't have an intuitive understanding of the performance of code, the chances are they'll do obviously stupid shit.

(Personally, I learned about big-Oh long before I went to college, and I believe it's probably the single most useful thing to learn in CS, right up there with compiler theory (my previous job was a compiler engineer, so I may be biased). With best practices / design patterns etc., it's extremely hard to substitute for experience; I see juniors misapplying software design concepts on a weekly basis.)

chris11 · on Aug 24, 2016

Do you think that your intuition for not doing something stupid came from studying the theoretical side of cs? Sure, you should be able to get a good idea of the runtime of code without spending much time looking at it, but I'd say personally that spending a couple semesters in classes that specifically covered big-o really helped my intuition.

It just seems to me that not doing stupid shit can really depend on your intuitive grasp of theoretical cs. For instance, trying to find the optimal solution to an np problem might be a really bad idea, depending on the instance. But recognizing np problems doesn't really explicitly come up that often outside of theoretical cs.

codingdave · on Aug 24, 2016

"beyond the basic concept of not doing obviously stupid shit."

Isn't that exactly the point? Based on one's knowledge, something that is obviously stupid to you could be a completely new concept to someone without that same knowledge or experience. Describing anything as "obviously stupid shit" is exactly the kind of thing that makes new self-taught programmers afraid to engage with others. It is better to give people a break, help them learn, and give them a positive environment in which to do so. This book sure seems like a good step in that direction, whether or not its topic comes up in everyday conversation.

qwertyuiop924 · on Aug 24, 2016

The problem with design patterns and SOLID is that they're sometimes followed in a cargo-cult like manner. You don't necessarily get good code just following a set of instructions. You have to apply common sense, and know the tradeoffs of various approaches. Experience is the most important think to have, and it's something I lack, admittedly.

CS is no substitute for that, but it's still an important part of getting it. It gives you a set of tools and a lexicon for understanding what's going on, and understanding some kinds of tradeoffs. As somebody who is just trying to figure all this stuff out, this is invaluable.

RealityVoid · on Aug 24, 2016

Experience is good, but I find some people have a "knack"... they intuitively understand when something is good and when not. How do you instill that? Priming from good logical and mathematical education help. Nudges to think for yourself and thins of the reason things are done a certain way sort of help. What more?

ryandrake · on Aug 24, 2016

Lots of practice and experience.

Calling it a "knack" kind of downplays the years (decades) of study, work, and effort people put into refining their craft. You could also say Michael Phelps has a "knack" for swimming, but he also did nothing but workout, train, practice and compete for years.

qwertyuiop924 · on Aug 24, 2016

I keep trying to explain this to my non-programmer friends: No, I'm not some genius, I read lots of information, and worked hard to get to where I was, and from a programming perspective, I'm barely competant and know just enough to be dangerous. Yes, you could probably do it too, although the difficulty varies depending on how well you can handle the kind of abstractions and logic you'll be dealing with.

qwertyuiop924 · on Aug 24, 2016

That's actually kind of what I was trying to say...

CS is no substitute for experience, but it helps.

EliRivers · on Aug 24, 2016

"I've never found myself in a situation where anyone would discuss bouble sort vs merge sort. Neither have I been in a situation where big-o was relevant..."

How about in job interviews? Obviously never on the actual job, but how about interviews? They ask all sorts of crazy crap. I suspect because they've got no idea what to ask.

softawre · on Aug 24, 2016

There are many software engineering jobs where Big-O is relevant.

collyw · on Aug 24, 2016

And many many more where it isn't.

My current work, the problem isn't the algorithms that make the thing run slowly, its because the people who wrote it don't know how to use Django efficiently. Lots of loops in the application making database calls each time (our main page is making over 1000 database calls). A bit more thought in the database design, knowing a bit more about how Django's ORM works and you could do all that work in the database in less than 10 calls and make it a lot faster. A couple of extra database indexes in the right places, and a bit of caching and it should run even faster.

Thats the sort of knowledge that is relevant to (most of) our jobs, not big-o.

ryandrake · on Aug 24, 2016

This is exactly true. Even at places where the primary money-making IP is a truly novel application of some algorithm, 99.5% of the code are "supporting actors": an API onto that algorithm, plumbing up through layers of middleware, bindings to other languages/frameworks, integration code to shim it through some legacy system, a CRUD interface, etc. My guess is that Google's search code is actually just tiny island of real computer science surrounded by a vast ocean of plumbing, frameworks and middleware, and that when you're hired there, chances are you're not going to be working on that island.

IMTDb · on Aug 24, 2016

"Lots of loops making database calls each time" improved by "a bit more thought in the design" and "a bit of caching" sounds an awful lot like Big O analysis, and algorithmic optimisation even if it is on top on Django ORM.

lucasnemeth · on Aug 24, 2016

Sometimes only. big O cuts all the constants and they are really important in actual software development. A lot of actual work involve improving the speed from a 10N to a 2N, it is a big difference for the costumer, but they are all O(N).

vonmoltke · on Aug 24, 2016

That's not Big O. Big O is asymptotic complexity. Caching, loop optimization, and the like are irrelevant noise to that calculus.

collyw · on Aug 25, 2016

Its not really. Its the same algorithmic complexity (the exception being adding appropriate database indexes), but shifting the where the computation is done.

eksemplar · on Aug 24, 2016

I live in Denmark, so the short answer is that I don't. Big-o came up in an interview once, but it was mainly along the lines of being asked if I knew what it was and me answering with a yes.

I've spent a semester doing all sorts of silly algorithms, including * sort, so it's not like I couldn't handle them if it was ever required. The thing is, time to market and code consistency will almost always leave you using the standard libraries which came with your environment.

There are obviously a few jobs where it's relevant, but I'm into things like architecture, digitizations and business development, so I'll never work one of those jobs, and neither will 95% of you.

wpietri · on Aug 24, 2016

Your basic complaint here seems to be, "I am not in the audience for this book. I don't like what it covers." Maybe the problem isn't the book, but your expectation that a book that isn't for you should cover what you want, not what it's readers want.

As a person who has been programming for a long time without a CS degree, my guess to why it doesn't have "best practices, design patterns and concepts like SOLID" is because working programmers without CS degrees know that stuff already. I sure do. Things like "best practices" and "design patterns" are distillations of experience. (And I'll note that he explicitly mentions covering SOLID, so maybe double-check your complaints before posting them.)

The parts easiest for to miss, though, are the most theoretical ones. I had an intuitive grasp of algorithmic complexity from my teens, but going back and learning big-O notation was helpful in talking with theory-oriented approaches. I still don't really get lambda calculus because I've never had a practical problem where learning it would help me ship things. I wouldn't mind learning it, but it's just never gotten to the top of the to-read pile. I love the idea of a book like this because it strips out the 90% of a CS curriculum that I know and gives me only the pieces that I don't.

lanna · on Aug 24, 2016

http://www.joelonsoftware.com/articles/fog0000000319.html

philippeback · on Aug 24, 2016

Ah, these articles are worth their weight in gold.

Not related directly but from the good old days: The Ars Digita Systems Journal. http://www.eveandersson.com/arsdigita/asj/

kabdib · on Aug 24, 2016

These things come up regularly. Well, bubble sort is usually the punch-line to a geeky joke. But I've spent an unreasonable amount of time on data structures that fell apart at scale because of O(N) issues.

Had to look up SOLID. Makes sense, follows the principles I've soaked up from books and a couple O-O courses. That acronym was apparently invented 20 years after I dropped out of college :-)

Just keep learning. No drama. Just improve.

qwertyuiop924 · on Aug 24, 2016

Yeah, and most of SOLID are just good ideas in general in software design, regardless of whether you're using OO. Write objects/programs/functions/libraries/interfaces that do one thing. Depend on an interface, not an implementation. That's the S and the D. L becomes irrelevent outside OO, and the sorts of things you would do with L you can do by instead loading a different library for D, or passing in the correct implementation for the situation, in a manner akin to DI. I is really just a more specific restatement of S, and O falls out of D easily, so long as you design your functions and libraries well.

So in short, the idea of SOLID is to write units that have a single, small responsibility, split large units into smaller ones, make the interface abstracted from the implementation, and ensure that it is easy to write alternate implementations of a given interface as necessary.

Those aren't OO ideas, those are just good design ideas.

noelwelsh · on Aug 24, 2016

Just got back from a meeting where we discussed on-disk data structures, in-memory data structures, and design of a query language for a time-series database with the goal of achieving <1s response time for queries on TBs of data. For comparison I worked on a project some time ago (arriving well after the original developers) that tried to store an event log in a relational database and ran into severe performance problems. Knowing a bit of theory sure helps avoid a lot of mistakes, and it's super interesting as well.

There's lots of good stuff in the front-end and web dev space as well. I know enough about UX to understand there is a lot of depth there. If you understand functional programming you'll go a long way with React and friends (and perhaps be able to innovate in this area as well.) Not all of this is in the typical CS curriculum but that doesn't invalidate the usefulness of what is.

philippeback · on Aug 24, 2016

When you deal with large data sets (e.g. Hadoop/Spark...), BigO starts to hit badly.

Dynamic Programming also requires you to have a clue about it.

See https://www.youtube.com/watch?v=OQ5jsbhAv_M for some cool explanations.

kamaal · on Aug 24, 2016

>>I have a degree in CS and I've never found myself in a situation where anyone would discuss bouble sort vs merge sort.

Yeah everyone knows stuff like that isn't helpful in your everyday work.

But that is what they ask you in the interviews.

The tragedy is you have to do this ritual every now and then just to get a job, and that knowledge is largely useless everywhere else.

mbostleman · on Aug 24, 2016

Then you've apparently never been in a technical interview with ALOT of companies.

jna_sh · on Aug 24, 2016

It has chapters on those things, including a chapter specifically on SOLID.

GFK_of_xmaspast · on Aug 24, 2016

Perhaps you just aren't taking/finding the more challenging jobs?

drak3 · on Aug 24, 2016

The excerpt on the Boolean Satisfiability Problem reads

> The basic concept that people have figured out, so far, is that a number of NP-complete problems can likely be solved if we crack the Boolean Satisfiability problem.

And

> If NP-Complete problems get resolved, it is likely (though nobody knows for sure) that we'll crack every NP-Problem

Isn't the definition of a NP-Complete problem exactly that it is in NP _and_ every other problem in NP can be reduced to it in polynomial time. So we know _for sure_ ([Cook71]), that as soon we have a polynomial algorithm for SAT _every_ problem in NP can be solved in polynomial time, and not just some of them as the excerpt claims.

Am I missing something? Because this seems like a very confusing, if not downright wrong, way to explain NP-completeness and its link to SAT.

[Cook71] Cook, S.A. (1971). "The complexity of theorem proving procedures". Proceedings, Third Annual ACM Symposium on the Theory of Computing, ACM, New York. pp. 151–158.

acs5 · on Aug 24, 2016

Also, the description of the boolean satisfiability problem isn't the boolean satisfiability problem at all, but just what we might call the boolean evaluation problem, or a version of the circuit value problem, which is certainly in P.

I have no idea what's going on in the lambda calculus excerpt further down the page, in particular substituting (λx.x x) with (x x)? There seems to be a fairly big misunderstanding here. And lambda calculus isn't reduced in any particular order -- there are many ways to reduce the same term.

tom_mellior · on Aug 24, 2016

I was going to post the same thing about satisfiability. The excerpt about the Y combinator is also misleading. There is no practical way in which the Y combinator "finds the fixed point of any function"; (Y cos) certainly does not magically evaluate to 0.793085.

I think books like this are a good idea, and having self-taught people write them is also a good idea. BUT it looks like this particular book is in serious need of quality control.

(Also it should be split up into several parts. Any book that teaches both the Y combinator and how to configure zsh is... weird.)

chrisseaton · on Aug 24, 2016

In my opinion there are also serious errors with the description of currying.

https://github.com/imposters-handbook/feedback/issues/50

lorenzhs · on Aug 24, 2016

The lambda calculus excerpt on the site has some issues as well. It says:

> Lambda Calculus is reduced from left to right, which is very important

No, you can do feasible reductions in any order. The problem is that you need to define β-conversion and normal forms (β normal from is obtained by repeatedly applying β-conversion to the leftmost redex).

Also, the example reduction is just wrong. You cannot transform (λx. x x) (λx. x x) into x x (λx. x x) - that's just not how β-reduction works. Applying a β-reduction to that term yields the same term, because you substitute the second expression for x in the first, thus obtaining the input all over.

lorenzhs · on Aug 24, 2016

Yes, you're absolutely right. The other excerpt paragraph right next to it is correct but terribly misleading:

> That "undecidable" part is what makes this problem [the Halting Problem] NP-hard.

I mean, that's true - but it's very misleading. The most common context in which NP-hardness is discussed is when talking about NP-complete problems - those that are NP-hard and in NP. The way you go about showing that a problem X is NP-complete is showing that it's in NP (most of the time, this is the easy bit) and then showing that it's at least as hard as some NP-complete problem Y. This is usually done by transforming an arbitrary instance of Y into an instance of X in (deterministic) polynomial time, and showing that the X-instance is satisfiable iff (<=>) the Y-instance is.

Then there are problems for which NP-hardness has been shown but it's unclear whether they're in NP. Those are often of a continuous nature. I think deciding whether a level of Super Mario World is doable falls into this category.

marcosdumay · on Aug 24, 2016

You aren't missing anything. It's wrong.

apeace · on Aug 24, 2016

I have a funny anecdote about CS.

I am a CS dropout who has been working in startups for a few years. About once a year, I see another programmer making a common mistake, and I draw on my CS knowledge to help them out.

The mistake is parsing HTML with regular expressions. It is so tempting to write a good ole' regex to grab that attribute value off of that element. And it works on the 5-6 samples you write your unit tests with. If you have run into this before, you may know that zalgo[0] tends to appear in this situation. Parsing HTML with a regex will always fail eventually.

Of course the reason is that HTML is not a regular language. It is a context-free grammar, and thus requires a parser to parse it.

The funny part is, I failed CS Theory once, and was in the middle of taking it again when I decided to drop out. But this tidbit of knowledge has always stuck with me, and I've used it again and again to fix or prevent bugs in real software.

Takeaway: CS knowledge does have real-world value to everyday programming. You just have to know what you're looking for. And of course, always use an HTML parser to parse HTML.

(I'll add one more note to anticipate a common response. If you look at any HTML parser, you will see regular expressions in the code. These regex are used for chunking the HTML, and from that point the chunks are parsed.)

[0] https://stackoverflow.com/questions/1732348/regex-match-open...

EDIT: Formatting

brassic · on Aug 24, 2016

I had to scrape the comments out of a blog. As you might expect, there was a div for comments, with comments nested within it. This made it easy to grab the comments using an HTML parser with xpath support.

Unfortunately, the blog software had a bug and it was possible for markup to leak out of the comments. Sometimes a spurious </div> would close the comments div and the scraper would miss comments that came after it. However, the HTML did contain helpful HTML comments, something like "comments start here" and "comments end here". The reliable solution was to use a literal string search on these HTML comments to pull out the entire comments section, and then to use regexes to pull out the comments' content.

The only unbreakable rule is that there are no unbreakable rules.

legulere · on Aug 24, 2016

I think the HTML/XML/JSON is not a regular language story is a bad one, because subsets of them are indeed regular. Most of the time the data you're trying to parse doesn't contain arbitrarily deep nesting and could actually be parsed with a regex. Further regexes of languages like perl can parse a superset of regular languages.

The real problem with regexes is that they are hard to maintain and are extremely hard to get right to neither have false positives nor false negatives.

ultramancool · on Aug 24, 2016

This is one I've never quite understood. Having written many scrapers I will say - Scraping is always hard to maintain. I've written regex and parser based scrapers many times, but it's still a need for constant updates as the page changes.

HTML will change about as often (only slightly less in my experiences) than regexes will need to be changed and can take more time to test and develop on each update, especially if you don't have access to a very durable parser which can stand broken HTML, unlike most XML parsers. So if a regex does the job, use it IMO.

apeace · on Aug 24, 2016

That is a fair point. If you're sure you have a regular subset of HTML, a regex is a great way to extract data from it.

In my experience (again about once a year), I have always seen this done with HTML that is coming in from the wild. After all, if your HTML is coming from a source you control, you most often have a means to provide the data in a format other than HTML.

GFK_of_xmaspast · on Aug 24, 2016

This is a case where a little knowledge can be a dangerous thing, because "regexes" are often strictly stronger than just "regular expressions".

(That said, the moral of this story is still correct, i.e. use a parser, but mostly because html-in-the-wild is uniformly awful and it's better to let someone else worry about that).

anexprogrammer · on Aug 24, 2016

The main thing people miss out on not having a degree is not getting past silly HR "must have degree" filtration.

Never once found a CS degree a worthwhile indicator of ability.

It may be a superb book, but not even giving a sample chapter out to judge writing style, quality of explanations, depth and so on?

Clubber · on Aug 24, 2016

I think there is more to a degree than that, but you make a good point. Now a days, instead of trying to figure out a problem, coders try to figure out which framework has already figured out the problem. It allows for less experienced people to get things done, but limits you to being able to mortar together bricks rather than make bricks.

The problem arises when you need to make bricks. You don't necessarily need a CS degree to make bricks, but you need to learn most things taught with a CS degree.

oillio · on Aug 24, 2016

Spot on. There is an old Joel on Software post that goes into detail on this issue: http://www.joelonsoftware.com/articles/LeakyAbstractions.htm...

The last paragraph is the money quote:

Ten years ago, we might have imagined that new programming paradigms would have made programming easier by now. Indeed, the abstractions we've created over the years do allow us to deal with new orders of complexity in software development that we didn't have to deal with ten or fifteen years ago, like GUI programming and network programming. And while these great tools, like modern OO forms-based languages, let us get a lot of work done incredibly quickly, suddenly one day we need to figure out a problem where the abstraction leaked, and it takes 2 weeks. And when you need to hire a programmer to do mostly VB programming, it's not good enough to hire a VB programmer, because they will get completely stuck in tar every time the VB abstraction leaks.

6DM · on Aug 24, 2016

Even funnier, coming across companies that don't accept anyone who doesn't have a degree from xyz university.

r0fl · on Aug 24, 2016

It's a way to easily filter when you get 1000 applicants for 1 job posting. Not the fairest but it's quick.

nulagrithom · on Aug 24, 2016

And an even easier way to filter out the company.

harperlee · on Aug 24, 2016

There is bias in the risk of their decision: hiring someone given that (s)he is good is not important - the important thing is that (s)he is good given (s)he is hired. That's why they are conservative in their selection process.

peruvian · on Aug 24, 2016

That's the case for some elite finance institutions, but does it happen with programming?

hibikir · on Aug 24, 2016

Oh yes. While this might not be a hard, publicly facing requirement, I have seen resume review processes that took into account things like school or whether you have worked for a hip, big name tech employer.

If your sensible resume with 10 years of experience doing work on an in demand field doesn't even get a recruiter call back, it might be because you did it for an enterprise company the recruiting team doesn't really know anything about. Chances are that no engineering manager ever got to read it.

I for one have worked with people coming from all kinds of backgrounds, and I don't think that the filtering makes any sense. My favorite software engineer hire had a Physics degree from Missouri-Rolla, and whose career highlight was work at a company that makes billing software for telcos. She would not have been given the time of the day in a lot of big tech companies.

If I was leading a company's recruiting strategy today, I'd aim at those kinds of candidates: If I aim at pedigree, I am competing, both economically and prestige-wise, with all the big names in tech. If instead I aim for the kinds of people that the market undervalues, I'll close more candidates and keep them longer.

softawre · on Aug 24, 2016

Sounds like a good movie in the making (Coderball?)

chris11 · on Aug 24, 2016

I once got a rejection from an internship application late one night a few hours after I applied. I wouldn't be really surprised if I had been automatically rejected. And my rejection might have been because the state university I went to was not specifically mentioned in the drop down list of schools, it was in the "other" category. I think I might have not gotten a rejection so quickly if I had been going to a target school like MIT or Stanford.

peruvian · on Aug 24, 2016

Sure, that's fair. Do you think you would've gotten at least a phone interview had you had a lot of experience and a good resume (aside from the non-target school)? I have friends who have such experience in finance and firms won't even look at them because of the university name in their resume.

chris11 · on Aug 24, 2016

I don't know what their criteria was. I guess they could have skimmed it for a few seconds. I actually just had a classmate tell me that they just had an interview for an internship at that company.

But I can say I do have a good resume. I have a little bit of full time experience (before I decided to go to school full time for cs) and an internship at a company they recruit from.

So it's technically not an immediate disqualifier. But I did get a rejection quickly enough that I think it might have been automatic, and my school is still listed as "other - please specify" on their internship application.

rianjs · on Aug 24, 2016

More likely their internship program was full already.

chris11 · on Aug 24, 2016

It's definitely possible. It was a startup, I think they only had a few thousand employees. But I think applied in the fall. So their intern program probably filled up really fast if it was full.

6DM · on Aug 24, 2016

I can't remember specifically, but my area is really huge on government contracting so I doubt it was a finance institution.

continuational · on Aug 24, 2016

Comments like these can only really have two causes:

- You've bought into the anti-intellectualism wave that's going on in the US.

- The universities near you are really poor.

Is it really the case that you learn so little on these universities that you literally have no advantage over those who didn't attend?

anexprogrammer · on Aug 24, 2016

I'm quite firmly against anti-intellectualism. Otherwise I think I'd then have to be anti myself!

In interviews, and across my 25 year career, I've met some excellent degree holders who brought some great skills to proceedings and a roughly similar number of excellent developers who didn't have the paper.

I've also come across occasional degree holders who I'd barely trust to make coffee let alone put near code.

In short, people are people.

Similar to other commenters I've found no correlation between degree on CV and later ability in employment, or it to be a useful indicator for prospective employees. In consequence I do find requiring a degree for applications silly.

_bbks · on Aug 24, 2016

Having a degree doesn't guarantee that you're automatically better than anyone without one, there will be good and bad people on both sides. People that don't take the traditional education route will generally be self studying through private courses or just reading lots and lots of books. It takes a lot of drive, determination and study skills to do it on your own too. Additionally many of them may have been able to break into the job market early, so they'll have years of real world experience and learning from peers by the time they would have usually finished their degree.

noobiemcfoob · on Aug 24, 2016

I think it's more many people vastly underestimate just how much they and others got out of their education.

Sure,everything is a bell curve, but spending the majority of your time thinking of little else beyond software concepts for 4ish years will fundamentally change how you think and reason.

collyw · on Aug 24, 2016

I learned a lot in my time at university, but to be fair it was just a starting point and I have learned a lot more by myself afterwards.

enraged_camel · on Aug 24, 2016

There are genuine questions surrounding how valuable CS degrees are (or should try to be) for professional software development careers. It has nothing to do with anti-intellectualism.

JeremyMorgan · on Aug 24, 2016

In some cases it's garbage in, garbage out. They may have been terrible going into university and came out a little bit better. They scrape by barely passing and come out with a degree that is just as valuable as the person standing next to them with natural ability, drive, and passion who excelled through school. After all half the graduates are below average.

cmrdporcupine · on Aug 24, 2016

Having a CS degree gets you through a Google-style interview -- where they will hammer you for hours on your ability to recall the skills you needed to pass your algorithms classes in university, and grade you almost solely on that.

And then you will start working there and almost never use those skills again, because you'll be using a vast library of datastructures and algorithms in the language of your choice, and the skill you will need the most is the ability to analyze why you might need one vs the other.

Unless you're on the team which is writing said libraries, which is rare.

nostrebored · on Aug 24, 2016

Implementation is only a portion of what's important in an interview. The thing you will not get anywhere without is a knowledge of what algorithms exist and when to use them.

cmrdporcupine · on Aug 24, 2016

You will not get through a Google style interview knowing what algorithms exist. You will be expected to implement them. On a whiteboard. Almost perfectly. Without access to a computer.

nostrebored · on Aug 24, 2016

"Only a portion". And almost-perfectly is a big deal. You're not expected to write compiler-ready code, just to show that you understand what's happening under the hood. Which meshes up with practical expectations -- if you can't understand what potential bugs/space/time complexity you're introducing when leveraging a library, then you're a fundamentally worse coder than someone who can.

estefan · on Aug 24, 2016

Except this is in fact bullshit since if you're ever working on some speed/memory critical code, that's the time to bother filling your brain with the different options available and picking the appropriate one. Carrying around arbitrary knowledge to show off is pointless.

Most code most programmers write won't ever run at sufficient scale or be sufficiently critical that CPU or memory bounds will matter, and using e.g. one sorting algorithm vs another will be largely irrelevant. It's recognising when you need to bother optimising that is the sign of a competent developer.

And asking someone to write actual (not pseudo) code on paper/whiteboards - as I've seen in interviews - is like assessing someone's driving ability by asking them to mime driving a car.

cmrdporcupine · on Aug 25, 2016

I agree.

But it's worth pointing out that a Google style interview also includes some portion which is "system design" oriented which would touch more on that kind of thing.

But they're more interested in avoiding false positives, than false negatives. So the process is designed to weed out most people.

datguacdoh · on Aug 24, 2016

I've interviewed somewhere around 200 people at this point and from my perspective, the degree, university or GPA have had no correlation with how well the candidate does. Now, I interview for Ops roles and not software engineering, so this might not apply across the board for other roles, but I have my doubts.

jdubs · on Aug 24, 2016

Ops is a pure experience job. A degree will never reflect how good an ops person is.

riskable · on Aug 24, 2016

This is a fair point but isn't software development (in general) trending towards this same reality? Consider a job where the plan is to write a straightforward web application that runs inside a custom-built Docker container. What sorts of skills are more important?

Someone with a CS degree but little experience is going to have a hell of a time writing a Dockerfile and the scripts necessary to safely and securely deploy their application. They may be great at writing web applications but "that's not the hard part." The hard part is getting the architecture right so that the app will stay up despite being deployed, rebuilt, and re-deployed constantly.

Someone who's self-taught probably picked up Linux experience along the way (because how else are you going to teach yourself web development these days?) and probably has experience with things like AWS and the realities of hosting a web application (as opposed to just writing a web application). They will have had to setup their own development environments, do their own Linux installs, and probably got used to figuring things out on their own.

The way I see it computer science is science. The point of a CS degree is to give someone a career path as a scientist. These are the sorts of people that should be figuring out the next great algorithm (for whatever), finding uses for quantum computing, and finding solutions to similar fundamental problems.

Software development isn't computer science. It's like the difference between the structural engineers that invented that hurricane-proof nail and the architects that decided to use it in their build.

smnscu · on Aug 24, 2016

It'd be interesting to correlate not just credentials to interview performance, but credentials, interview performance, and actual job performance. There is however the challenge that for the third aspect you can only assess the people that passed the interview and accepted the offer.

galdosdi · on Aug 24, 2016

It may even be useful to lack a degree, since the job market is otherwise good and you automatically filter out the lamest companies that way ;-)

(No, not a perfect heuristic, but I think it may not hurt)

isp · on Aug 24, 2016

Via patio11's tweet: "Pretty brilliant idea: 'Imposter Handbook', for teaching self-taught devs what they missed by not doing a CS degree." - https://twitter.com/patio11/status/767204505578409984 (with replies)

squeaky-clean · on Aug 24, 2016

I'm always pretty skeptical of anything that offers "Learn X Quick!". Is there any reason I should trust the author of this as a good source? They even state they only started learning this stuff a year ago. I wouldn't take a course or buy a book from anyone claiming only 1 year's study in any subject, why should CS be any different?

edit: It's also hard to find the author's name to find out who they are. It's not in text anywhere on the page, only in the image of the book cover.

ruraljuror · on Aug 24, 2016

Agreed, it is hard to find the author's name. I noticed that it was by Rob Conery in one of the blurbs and that turned my skepticism to interest as I am familiar with Rob as a frequent guest and collaborator with Scott Hanselman.

As a self-taught programmer (English/Classics major), I am (or was) the target audience for this book, but with deliberation have returned to school (ultimately pursuing a master's in Computer Engineering).

PUSH_AX · on Aug 24, 2016

http://rob.conery.io/about/

randomnumber314 · on Aug 24, 2016

>I've learned more in this last year since I started programming over 25 years ago

From the OP link. The author spent a year "filling holes" in his knowledge.

haddr · on Aug 24, 2016

Yep, people are really forgetting that most of the time it's just being clever rather than having (or using) in-depth knowledge is what allows to bring some good solution to the table.

An example from today: we have a very slow pattern matching code, that starts to be a bottleneck for the application. What can you consider? Well you can dive into sexy bloom filters, experiment with some Trie-based structures. But then when you analyse the problem it results that simple word lookup with a simple hashtable is the fastest solution for given constraints. No big deal, no rocket science.

Probably the same goes for rocket scientists, but one level higher ;)

pc86 · on Aug 24, 2016

Seven or eight levels, I would imagine.

scaleout1 · on Aug 24, 2016

when I was college I thought all the data structure and algorithm courses were complete waste of time. Primarily because I was working as an intern on the side, making crud applications, slinging xml, writing DAOs etc.

For first five years of my career I never had to touch any of the stuff I learned in school and I was particuarly happy that I mostly mailed it in in those classes. Eventually my career evovled into dealing with data at massive scale and working on some of biggest services on the planet and the way I have taught myself to program completely changed. No longer it was possible to just sling code and hope that it will just scale to million of users. All the stuff that I slept around in class was relevant again and I had to go back to coursera and take those classes all over again. So moral of story, if you will be slinging webapps rest of your life you probably dont need to know Big O, different search algo, linear algebra and statistics etc but if you think you will be working on stuff thats coming around like automanous cars, IoT, augmented reality etc, you should definitely read up on it

jasonjei · on Aug 24, 2016

I think one of things that "millennial" self-taught programmers have trouble understanding is OS. While the need to implement OS functionality is now a niche (like process scheduling), I have noticed that "millennial" self-taught programmers have gaps in knowledge with respect to threading, mutex/semaphore, consumer-producer pattern, synchronization vs. lock-free (blocking vs. non-blocking), concurrency and parallelism. I still feel these are essential topics to understand since these issues often come up in most programming languages (JS, Ruby, Go, etc). When deciding between Nginx or Apache, for example, knowing the difference in philosophy is useful (asynchronous vs. thread-based). One of things that CS trained me very hard was to think about cost; when I programmed before my CS education, the old adage of everything looking like a hammer was particularly true for me. Also, many of the self-taught guys need to use fewer libraries/gems and more stdlib and primitives because every additional include is not always necessary and may unnecessarily increase complexity :)

Pipelining is another concept that I feel self-taught programmers have weaker foundations--many of whom I have worked with write code that waits for all results to become available, while an operation that's blocked by I/O doesn't necessarily mean we can't do stuff with the CPU while we're waiting for the next batch of I/O to come through.

I have also seen self-taught programmers accidentally write O(n!) or O(2* * n) functions and not realize it. I think data structures is definitely a good chapter to have. Especially when writing queries to a data store.

I think explaining how a hash table works would be excellent since it is such a useful and fast data structure. A lot of set-taught programmers sort of treat them like magical black boxes when it's not a very complex data structure yet it's practically O(1) for most insertion/reads/deletions.

Memory management is fortunately something we don't really need to worry as much about. With languages and interpreters that do a very good job of cleaning up after our code and now that memory is relatively cheap, we can afford to ignore it until we need to scale.

Of course, if you have a good product, you can get away with inefficiency and hire CS guys when you have built a unicorn. ;)

nspriego · on Aug 24, 2016

Just curious, why the "millennial" descriptor? Is having trouble understanding the OS something unique to only "millennial" self-taught programmers?

Ensorceled · on Aug 24, 2016

In my experience, Gen X and Boomer self taught programmers spent a lot of time hacking at the application or system level and have less holes in that area. They have other holes in their knowledge base, just not those ones.

jasonjei · on Aug 24, 2016

That's exactly the reason I chose that descriptor. In fact, the field "Computer Science" may not have existed before.

collyw · on Aug 24, 2016

Everyone has gaps in their knowledge. There are two devs where I work, and we are both better than average in my opinion. I know Django inside out and deal with server deployment, while the other guy knows more front end and HTTP / web stuff. We both know how to use a database properly and not to over engineer stuff. We compliment each other pretty well.

jasonjei · on Aug 24, 2016

Right, and I agree everyone has gaps in their knowledge. People obviously specialize and I wouldn't ever work on compilers, for example.

I'm just describing what I often see the self-taught "millennial" programmer is missing with a minimal number of data points. Doesn't mean all of them do, but hopefully by me listing it here a book such as the above can tackle these topics. I hope somebody reading this doesn't take offense of my laundry list; it's just intended to show what I think would be useful to cover in such a book given their importance to app development.

I'm no way knocking self-taught programmers. They learn a lot faster than those who went the formally trained route. One of my interns who was trying to get a master degree in CS I showed how to link to a library 3-times, and he still couldn't get it.

xenihn · on Aug 24, 2016

I appreciate this post. I definitely lack knowledge on the stuff you described, sadly...

fitzwatermellow · on Aug 24, 2016

As an alternative, I would recommend something like Michael Kerrisk's The Linux Programming Interface in lieu of a survey of foundational CS. Gaining a deep understanding of memory, files, processes, threads, signals, sockets, etc. As well as strong Emacs-Fu and bash scripting ability. These are the first steps on the path to mastery ;)

riskable · on Aug 24, 2016

What's interesting is that what you're talking about is computing fundamentals. Stuff that kids should be learning about in high school before they go to college. At the very least they should teach kids what happens when they type a character in a text editor and then save that as a file. Kids should know that the key switch state change gets detected by the keyboard hardware, sent as a signal over the wire, detected/handled by the OS/kernel/driver, sent to the program as an actual keystroke which decides to "display" it to the user by updating the interface, etc etc.

Discussions surrounding what happens when two programs try to write to the same file. How to detect when a file changes. Stuff like that. These things just don't seem to come up in high school education and I can't help but wonder why. It'd go a long way to giving people explanations as to what's wrong with their computer when it's running slow or a basic means of interpreting error messages/conditions.

anoother · on Aug 24, 2016

Proofreading error near the 'buy' button.

> I've learned more in this last year since I started programming over 25 years ago.

Should be '... this last year than since ...'

Personally, I think '... the past year than since ...' reads better also.

Do I get a free copy of the book for pointing that out? ;)

mkchandler · on Aug 24, 2016

There is a repo for posting feedback (such as proofreading errors) here: https://github.com/imposters-handbook/feedback/issues

bogomipz · on Aug 24, 2016

I am curious, do people actually pay $30 for an ebook without even being able to see a(propsed) table of contents of a sample of the writing?

This is unusua. With both Amazon and LeanPub you can at least gauge the writer's style or get a feel for writing quality by looking at a sample chapter and a table of contents.

I'm skeptical that all of those people praising the book bought the book site unseen.

ljk · on Aug 24, 2016

apparently this is the ToC for "pre-release #2" http://i.imgur.com/ssYr5ki.png

seems like pretty simple stuff; couldn't all the information be found with one internet search away?

bogomipz · on Aug 24, 2016

Couldn't the table of contents be available on the official site for the book?

I should instead resort to doing a separate Google search for the TOC?

cookiecaper · on Aug 24, 2016

This thread is long and this will probably get buried, but I'll leave it here anyway. I'm really excited about this kind of thing and have sometimes thought of writing one myself, primarily as a means for me to hammer out all the theoretical areas that are still foggy for me as an autodidact. If this can make it easy to pass impractical textbook-style interview questions and give a good, reliable foundation of CS knowledge that won't go away ten seconds after closing the Wikipedia page, I'd love to buy it (and I still may write my own some day just for good measure :P). I think autodidact programmers is a rapidly-growing and under-served market (though, unfortunately, I don't think it'll be allowed to go on much longer; I expect professional licensing organizations similar to the ABA to show up on the horizon soon).

pjc50 · on Aug 24, 2016

I like the approach of hammering out knowledge through writing it down. I taught myself to fill in a lot of knowledge gaps by answering questions on electronics.stackexchange.

brians · on Aug 24, 2016

The example text about the y combinator looks mistaken to me. It says Y can "find" a fixpoint, and sketches an example of a fixpoint in a numerical function. It implies Y is doing something like convergence.

But that's not what Y is at all. It's called the fix point combinator, yes, but with the assumption you're going to use it in some curried lazy evaluation scheme with higher order functions.

All this on ycombinator.com, too!

LeonM · on Aug 24, 2016

Anyone knows if there will be a hardcopy available? I live a paper free live, except for books, I just hate reading from a screen...

gagagababa · on Aug 24, 2016

From the "Questions" section:

Will there be a print version?

That's my goal, yes. I want to be sure all the edits are made and, technically-speaking, the book is completed. If I do end up with a paper edition, I'll send out a note to the mailing list in late September, early October.

edit: Formatting.

LeonM · on Aug 24, 2016

Thanks, I missed that because apparently the site only loaded half on my browser, it's probably being hugged to death right now...

RawData · on Aug 24, 2016

What programming language is generally used throughout the book?

internals · on Aug 24, 2016

C# & Javascript/Node. You can find the sample code from projects in the book here: https://github.com/imposters-handbook/sample-code/

ManlyBread · on Aug 24, 2016

For C# I recently found this course, looks pretty good: http://www.brpreiss.com/books/opus6/html/page10.html

Jayakumark · on Aug 24, 2016

He also wrote another book http://www.redfour.io/ take off with elixir. Anyone have read that ?

cheeseprocedure · on Aug 24, 2016

I went through the video version. I found it worthwhile overall, but grew frustrated later in the tutorial as the code displayed in the video (and in the associated GitHub repo) drifted substantially from what it had actually been guiding me to build.

Jayakumark · on Aug 24, 2016

Thanks, will wait for final release, by then i guess the code should sync up with video.

outworlder · on Aug 24, 2016

Is there a book like this for Calculus? My CS is fine, but I had horrible Calculus teachers and was unable to cross the chasm myself.