Rob Pike’s Rules of Programming (1989) (utexas.edu)
473 points by udev4096 6 months ago | 259 comments



I love this

>Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming.

I completely agree with this. Which is why all the LeetCode interviews always struck me as odd. They focus on algorithms, not data structures, which is exactly what you don't want to do out of the gate, most of the time.

I suppose, if you don't know algorithms at all you wouldn't realize when it is either

A) an exception to the rule

or

B) a time when you need to lean into a specific algorithm for X, Y, or Z reason

However, you can teach algorithms in relatively short order. I have honestly found people grok which data structures to use less easily, though it's all anecdotal.


I can also anecdotally agree with your analysis. I've seen for myself how much more some folks will respect you during interviews if you "get past" the part about fizzbuzz (algorithms) and move on to immediately talking about data structures, architecture, and how that maps onto the domain in question.

All of a sudden, the interviewer relaxes and you can just see on their face the, "oh, we've got an actual senior engineer on the call here." They open up about their technical problems and stop caring about "proving whether you can even code".

Similarly, the teams where I've had the most problems creating positive change, meeting milestones, or collaborating properly as a team are the ones where no one seems to have a good grasp of data structures and code architecture. It seems to have become more common too. Many folks are quite used to having a framework that can just do everything, and if it can't, someone "smarter than them" has made a plugin or middleware to save them from having to think too hard.

I personally find engineers who avoid data structures to be shooting themselves in the foot: they're giving up on one of the most useful tools in their arsenal, and I witness how it limits them day-to-day. It's tough to watch.


Asking "FizzBuzz" is not done to test algorithmic knowledge.


> Which is why all the LeetCode interviews always struck me as odd. They focus on algorithms, not data structures

I helped my nephew prepare for a competitive coding event, and transforming the data into the right data structure was a huge part of solving most problems. For example, most of the code you might end up writing is for finding the longest path through a weighted DAG, but the trick was realising that you could represent the problem as a weighted DAG, and if you did then the length of the longest path through that graph was your answer. If you didn't spot that then you might very well still be able to solve the problem, but your solution would end up being much slower and more complicated.
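
A minimal sketch of that trick (my own illustration, not the nephew's code; the node indices and weights are made up, and the adjacency list is assumed to already be in topological order):

    package main

    import "fmt"

    type edge struct{ to, w int }

    // longestPath relaxes each node's outgoing edges in topological order,
    // so best[v] holds the heaviest path ending at v by the time we reach v.
    func longestPath(adj [][]edge) int {
        best := make([]int, len(adj))
        ans := 0
        for u := range adj { // nodes 0..n-1 assumed topologically sorted
            if best[u] > ans {
                ans = best[u]
            }
            for _, e := range adj[u] {
                if best[u]+e.w > best[e.to] {
                    best[e.to] = best[u] + e.w
                }
            }
        }
        return ans
    }

    func main() {
        // 0 -2-> 1 -3-> 3, 0 -1-> 2 -5-> 3: longest path is 0->2->3 = 6
        adj := [][]edge{{{1, 2}, {2, 1}}, {{3, 3}}, {{3, 5}}, {}}
        fmt.Println(longestPath(adj)) // 6
    }

Spotting that the problem is "longest path in a DAG" is the hard part; once the data is in that shape, the code is a dozen lines.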


This is the low-hanging fruit algorithm:

- build the DAG with heavy nodes (e.g. lead split-shot)

- use string or fishing line for edges, with length proportional to 'weights'

- hold the root node(s) high in the air, if more than one, hold them together, at the same height

- the lowest node, or the length of taut line (drop distance below your hand), is the answer


That doesn't actually work, because a DAG only lacks directed cycles. Consider:

  /-3-> D
  A -1-> B -1-> C
  \------4------^
In this case, the longest path is A-C, but C will be held at level 2 by B, leaving D lowest at level 3. This works for a directed graph that is acyclic even ignoring edge directions (e.g. a tree), but not for a general DAG, which only lacks directed cycles. It can also find the node whose shortest path is the longest (so longest shortest path, minimax style). (It also works for an acyclic undirected graph, as illustrated at [0], but that's a bit tangential.)

0: https://www.youtube.com/watch?v=wGrOPSBPpyk


Yes, you are completely correct.

It only works for fruit trees.


Actually it works for all trees, fruit or otherwise (and also for some non-tree graphs, though I'd have trouble giving a useful category more general than tree).


What’s great is that it’s constant time, once you’ve prepared the data structure. Never underestimate the parallelism of physical systems.


How much of this is finding the right data structure (graph) vs translating the problem into a new domain?

I think maybe the former follows from the latter?


>Which is why all the LeetCode interviews always struck me as odd. They focus on algorithms, not data structures,

The typical leetcode questions also focus on "data structures" because the candidate needs to have the mental catalog of data structures in his brain to pull from when he "pattern matches" on the problem/solution.

The interviewer isn't going to volunteer whether the candidate needs a "priority queue", or "adjacency matrix", or "trie", etc to solve various algorithm questions. If the candidate is stuck, then the interviewer might give a hint to the data structure to use. But of course, too much hand-holding will not make the candidate come across as a strong hire signal.


It's been a while since I interviewed in such a manner; perhaps things have shifted. When I was interviewing a few years back, it was algorithm heavy: often a basic data structure was provided and there was a lot of focus on traversal (hence the memes around reversing binary trees and such).


To add another point of anecdata, my experience very recently at tech companies aligns with the parent comment.


"Show me your flowcharts, but keep your tables hidden, and I shall continue to be mystified. Show me your tables, and I won't need to see your flowcharts, they'll be obvious."


You forgot the credit: Fred Brooks, The Mythical Man-Month.


Great quote and so true in the general case. Unfortunately, I have seen some database designs that have left me more mystified than I was before.


I've seen programs that got more mystical the more the employees showed me how it worked internally.


I don't really see how you can choose one without at least a rough idea of the other. How do you know what data structure to use if you have no idea how you'll be accessing the data?


You're not wrong: data structures and an algorithm for data retrieval are often connected and developed together. You often see someone come up with a novel way of traversing data, then they model an elegant data structure to match it and enable said traversal.

What isn't covered in this flow is how composition of multiple data-structure/algorithm pairings work together amidst many others. It's often not an algorithm on its own that defines an overall solution. It's the connecting of many of them together, to build a whole software stack, domain language, etc.

When you understand the algorithms, the corresponding data structures, when to use them, similarities in various frameworks in your chosen domain, how that maps onto your current problem set, etc., you really start getting somewhere.


*data structures and an algorithm for data retrieval are often connected and developed together*

Often? Always! An algorithm always only works on a certain data structure.


Or as Fred Brooks put it:

   " Show me your tables, and I won't usually need your flowcharts;
   they'll be obvious."
But to be fair, you need to be a pretty damn good comp sci person (and actively thinking in algs and DS) to quickly look at DS and see the "obvious" processing implied.


A problem I see a lot is when the flowcharts should be obvious, but aren't because whoever wrote the code didn't write the obvious solution.

Instead they wrote a horrible buggy mess that tries (and usually fails at least a little) to do the same thing as the obvious solution but with much more, and more convoluted, code.


I appreciate your particularness in this regard, and you'll have to forgive me as I've spent a lot of time with folks who are very wary of the word "data structures".

I tend to use softer words like "often" as it makes folks feel less defensive towards the discussion. If someone came up to you and told you that your outlook on something is 100% definitively wrong, you might balk at the statement merely because I said "100%". Just as you have stated "Always!" and corrected me so emphatically.

Since I've found this to be a difficult topic for some, and given this is a public forum, I chose to be cautious in my wording.


Also they're just straightforwardly wrong. For example, binary search works on an array, or on a predicate like \x->(x*x <= 2.0), or on a hash table with contiguous integer indexes, or even on a linked list. Of course it works very badly on a linked list (worse than linear scanning unless the comparison is ruinously expensive), but they didn't say "An algorithm always only works properly on a certain data structure.".
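
To make the predicate case concrete, a tiny Go sketch (purely illustrative; the 1<<20 bound and the 2,000,000 threshold are arbitrary): sort.Search binary-searches over any monotonic predicate, with no container in sight.

    package main

    import (
        "fmt"
        "sort"
    )

    func main() {
        // Find the smallest x in [0, 1<<20) with x*x > 2_000_000.
        // The "data structure" here is just a monotonic predicate.
        x := sort.Search(1<<20, func(i int) bool { return i*i > 2_000_000 })
        fmt.Println(x) // 1415, since 1414*1414 = 1999396 is still <= 2000000
    }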


A binary search on a linked list is a different algorithm than on a sorted array (which is different from a generic array). In this case linked lists don't have random access, for example. So binary search on a linked list is actually not possible.


> linked lists don't have random access

`list.get_nth(n)` has O(N) runtime, as does `list.length()`, so binary search is actually completely possible, with runtime O(N log N) (aka "works very badly").

(Fair point that all four data structures need to be sorted, although ideally that would go without saying, since it's kind of inherent in what binary search is.)


Having random access means that it's O(1).


No, having efficient random access means that it's O(1). That just usually goes without saying, because anything that has access at all can be made to have inefficient random access. But in this case, the fact that it's (necessarily) inefficient is the point: binary search works very badly on a linked list, and you shouldn't actually use that algorithm with that data structure even though you technically can.

Conversely, using the same algorithm on a (sorted) array, (monotonic) predicate, or (sorted,contiguously-keyed) hash table works fine, even though those are three different data structures.


At that point the definition would be useless. It’s not the definition I’ve seen during my CS education, in papers and even what’s written about it on Wikipedia.


I have found you can, even if it's the slowest of the slow, brute force your way through traversal of a data structure if you need to, since most languages (all?) give you iteration out of the box.

Being able to choose the appropriate data structure and explain trade-offs etc is much more valuable than simply knowing how to reverse a binary tree sorta stuff.

As I noted elsewhere in the thread, I haven't interviewed in a while where I needed to grind out DS&A questions, but at the time when I did, I often found myself given a particular structure and asked to traverse it in some way, which is pretty heavy on the algorithmic thinking but not really testing my knowledge of data structures themselves.

Sounds to me like things have become a little more even-handed out there.


> Being able to choose the appropriate data structure and explain trade-offs etc is much more valuable than simply knowing how to reverse a binary tree sorta stuff.

Indeed. The simplest way to invert a binary tree is to include an "invert" flag in the data structure. If it's set, walk the tree right to left instead of left to right.
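
A hedged sketch of that lazy-inversion idea (illustrative only, all names invented): flip a flag in O(1) and let the traversal swap children when it's set.

    package main

    import "fmt"

    type node struct {
        val         int
        left, right *node
    }

    type tree struct {
        root     *node
        inverted bool // flipping this "inverts" the whole tree in O(1)
    }

    // walk does an in-order traversal, swapping child order when inverted.
    func (t *tree) walk(n *node, visit func(int)) {
        if n == nil {
            return
        }
        first, second := n.left, n.right
        if t.inverted {
            first, second = second, first
        }
        t.walk(first, visit)
        visit(n.val)
        t.walk(second, visit)
    }

    func main() {
        t := &tree{root: &node{2, &node{1, nil, nil}, &node{3, nil, nil}}}
        t.walk(t.root, func(v int) { fmt.Print(v, " ") }) // 1 2 3
        t.inverted = true
        t.walk(t.root, func(v int) { fmt.Print(v, " ") }) // 3 2 1
        fmt.Println()
    }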


All software is effectively a set of ETL jobs with a user-friendly interface wrapped around them. Those fancy algorithms, languages, frameworks, etc are simply a very ceremonious way to get data from A to B.

The schema/data are completely agnostic to the computation. You don't even need a computer. 100% of schemas are possible to print out on sheets of physical paper and iterate offline.


Computer used to be the job title for people who DID do this by hand. It was co-opted by the technology which replaced the job.

Of course modern computation is often impractical to do by hand. It might even be so complex that humans would make too many errors and take too long to ever complete correctly.


Common use cases were abstracted into libraries and services because nobody should have to reinvent the wheel. This let people operate at a higher level of abstraction and concentrate on more complicated tasks. These too became abstracted, and the cycle repeated.


While true, I think this is reductive.

When you're dealing with most modern software, there are so many of these ETL-like pipelines, abstractions, black boxes and just sprawling complexity that the job becomes much more about complexity management than anything.

No component exists in a vacuum, I would argue software is as much about managing the complexity of the data and algorithms towards a purpose as it is about the purpose, data or algorithms.

Entropy comes for us all, after all.


See also Peter Naur's letter to the editor in https://dl.acm.org/doi/pdf/10.1145/365719.366510 . In Denmark, CS is not called CS but Datalogy.


I prefer the term Informatics over Computer Science for this reason.


I'm not a fan of leetcode, but when I did some competitive programming it was quite common that you first had to transform the input data into the proper data structure, and then you can apply some algorithm to produce the answer.


If you want a better data-structure you must ask: Better for whom? Better for the algorithm that manipulates the data!

So I don't think that "data dominates". You may need to adapt your data to the algorithm or vice versa. What dominates is what we want to do. Without an algorithm we are doing nothing.

In a sense data is part of the algorithm, it is implicitly coded into the assumptions the algorithm makes about its arguments and results.


On the other hand, algorithms are just data. In the end it's just a set of bytes that determines what happens to other sets of bytes. Data structures and algorithms are two sides of the same thing. In the abstract meaning space in our minds, data structures are an extension of the algorithm's logic. In the memory space, algorithms are just data structures that change other data structures.

Seeing the connection from both sides is insightful.


Yes. But there is a difference between data and program. Program is data as you say but it must be data that can be interpreted as a set of possibly conditional steps. It is often pointed out that Lisp programs are "just data" but not every list can be executed as a function.

Further, programs are what interpret data, by making branching decisions based on the data. A program must be interpreted by the language interpreter, and then on the second level the interpreted program interprets the data.


Although LeetCode does have a strong algo slant, choosing the optimal data structure is almost always a key part in solving the problems.

If you think leetcode-like problems are always about the algo, "BFS or DFS" etc., then at best you are not realizing the data structure choices you are making; at worst you may not be so good at solving them or haven't progressed that much through leetcode-like challenges.


Indeed! I've long been a fan of "table oriented programming" where the common CRUD objects are specified mostly as data. You could also create them in code by calling a RAM-table constructor, so it's not either/or. Most of the data fields, navigation structure, and event handlers (or handler stubs) could be readily table-ized.

Code is a lousy place to store boat-loads of attributes. You can do data-oriented transformations and filters on them if they're in a data structure. Hierarchical file systems are limiting and messy, because grouping by one factor de-groups another. I want to be able to "query" dev code units via SQL or similar. CRUD patterns are pretty consistent across industries such that most CRUD idioms shouldn't need custom re-invention, and thus should be attribute-tized.

You will still need occasional code-based tweaking, and this can be accomplished by having the attributes generate "runtime draft" UI markup and/or SQL clauses, which can then be tweaked via code as needed.
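
A rough sketch of what that might look like (entirely my own invention, not the parent's system; the field type and draftSelect function are hypothetical): the CRUD attributes live in plain data, and a "runtime draft" SQL clause is generated from them, with values kept parameterized.

    package main

    import (
        "fmt"
        "strings"
    )

    // field is illustrative per-column metadata for a CRUD screen.
    type field struct {
        Name     string
        Label    string
        Sortable bool
    }

    // draftSelect generates a draft query from the attributes; code can still
    // tweak the result afterwards. Values stay parameterized (no injection).
    func draftSelect(table string, fields []field, orderBy string) string {
        cols := make([]string, len(fields))
        for i, f := range fields {
            cols[i] = f.Name
        }
        q := "SELECT " + strings.Join(cols, ", ") + " FROM " + table + " WHERE tenant_id = $1"
        for _, f := range fields {
            if f.Sortable && f.Name == orderBy { // only trusted metadata names reach the query
                q += " ORDER BY " + f.Name
            }
        }
        return q
    }

    func main() {
        fields := []field{{"id", "ID", true}, {"name", "Name", true}, {"created_at", "Created", false}}
        fmt.Println(draftSelect("customers", fields, "name"))
        // SELECT id, name, created_at FROM customers WHERE tenant_id = $1 ORDER BY name
    }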

I'm building a proof-of-concept that uses what I call "fractal rendering" to be able to intercept the "draft" construction of screens and SQL clauses at a fine level or coarse level, depending on need. This avoids the all-or-nothing problem of prior attempts per attribute-vs-code debate. Ya git both! (Dynamic SQL generation should probably be avoided for public-facing sites, per injection risk, but that limits the wonderful power of query-by-example.)

I don't claim it will be as performant as the code-centric approaches, but if it catches on, performance tweakers will find a way to scale it. (For CRUD apps, the performance bottleneck should usually be the database, not app code anyhow, unless you're doing something wrong or special.)

The CASE tools of the 1980's and 90's started going in this direction, but were too expensive & clunky, and then the OOP push ended the idea by favoring attributes-in-code, back to square one, sigh. (CASE tools don't have to be proprietary.)

It's a ripe area for groundbreaking R&D. CRUD may not be sexy, but it runs the world.


> Which is why all the LeetCode interviews always struck me as odd. They focus on algorithms, not data structures, which is exactly what you don't want to do out of the gate, most of the time.

Which is one of the reasons why I will never use that style of problems as an interview question.

And the other reason is: that style of problem also doesn't teach anything about architecture.


> Which is why all the LeetCode interviews always struck me as odd

90% of leetcode is choosing the right data structure. If your algorithm doesn’t wanna come together, you probably missed a better way to model the data.


I had a checklist one time for interviews. One of the items was: "consider data structures x, y, z" - only maybe 4 or 5 data structures. It worked 95% of the time.


Which data structures were those?


Oh, I don't have the list I used to use, but it was very basic stuff.

hash table, stack, queue, linked list, sorted array, binary tree, graph.

Many problems seemed difficult until I just considered shoving the data into one of those.


> >Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming.

I've been thinking about it a lot. I want to still believe it, but then my experience tells me that the "right data structure" is usually "the data structure most convenient for the operations/algorithms you want to run". Which makes this principle either backward, or circular.


The right data structure is the one that's produced and consumed in a way that matches the domain you're trying to model. That domain can be inputs to a particular algorithm, but it can also be things like "states of this system" (constructors correspond to state transitions), "witnesses to this property" (constructors correspond to valid inferences), etc.


Leaving aside the unusual, to me, meaning of "domain" in this context, your description matches what I wrote above: the right data structure is determined by what you're planning to do with the data.

It only convinces me further that Pike's take is wrong. Specifically:

- "Data dominates" and "Data structures, not algorithms, are central to programming" are just plain wrong;

- "If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident" is correct only in the descriptive sense, not in the prescriptive sense. That is, it only works when you read it as restatement of Brooks's "tables vs. flowcharts" line (quoted elsewhere in this thread): it's easier to guess what your code is doing from how you organized your data, than it is to guess how you organized your data by looking at the code. I.e. it's an advice on where to start understanding someone's existing solution; it has nothing to say about solving new problemms / writing new code.


But the distinction is meaningless. A data structure is just a collection of algorithms for working on underlying memory. More complex algorithms are built with simpler algorithms as building blocks. With that in mind, it essentially boils down to picking algorithms which when put together allows solving the problem in an as easy way as possible.

Which of course is self evident. Though I suppose it doesn't hurt to remind people to step back and look at the full picture.


You need to do data modeling and (de) normalization even if your data is in "data structures", not an SQL DBMS.


My algorithms/data structures professor told me one trick: always use a "map".

works for most cases.


This point I really resonated with as well. Really cool. A lot to think about here.


This should be Rule #1.


> Fancy algorithms are slow when n is small, and n is usually small. Fancy algorithms have big constants. Until you know that n is frequently going to be big, don't get fancy. (Even if n does get big, use Rule 2 first.)

Another point on this, which I really saw in action on a recent project - a "big" n is probably much bigger than you think. Sometimes it can be easy to think "oh, this is going to have to do 100,000 operations, I definitely need to optimize it" - but computers are fast, and 100,000 multiplications (for example) happens so fast you probably don't need to think too hard about it.

That's not to say you shouldn't think at all, just that it's often surprising how insanely fast modern computing hardware is.
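
As a throwaway sense-of-scale sketch (numbers will vary by machine, but the order of magnitude holds):

    package main

    import (
        "fmt"
        "time"
    )

    func main() {
        start := time.Now()
        sum := 0
        for i := 1; i <= 100_000; i++ {
            sum += i * i // 100,000 multiply-adds
        }
        // Printing sum keeps the loop from being optimized away entirely.
        fmt.Println(sum, "in", time.Since(start)) // typically well under a millisecond
    }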


I don't know how strongly I agree with this one. Quadratic algorithms are the kind of thing that bite you when you least expect it. I have seen production outages due to accidentally quadratic code, and it is also the type of code where some users are suffering with a really slow application because they are frequently experiencing a big N even though 99% of users have a small N all the time. In most cases I would prefer to pick a less-than-quadratic algorithm even if it is a bit slower for the common case and a bit more complex to implement. Slow common cases get optimized; slowness in rare cases often slips by the developers (who don't hit those cases) or breaks in production.

Of course this is a tradeoff with Rule 4. If the fancy algorithm is much more complicated I will be more likely to pick the simple one even if it is quadratic and the data has the chance of occasionally being large. But by default I try to stick sub-quadratic if possible.

I wrote an article about this relatively recently: https://kevincox.ca/2023/05/09/less-than-quadratic/
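
The classic accidentally-quadratic shape (a hedged sketch of my own, not taken from the article): a linear scan hiding inside a loop, versus a map.

    package main

    import "fmt"

    // dedupeQuadratic is O(n^2): every element rescans the output slice.
    func dedupeQuadratic(items []string) []string {
        var out []string
        for _, it := range items {
            found := false
            for _, o := range out { // linear scan inside a loop
                if o == it {
                    found = true
                    break
                }
            }
            if !found {
                out = append(out, it)
            }
        }
        return out
    }

    // dedupeLinear is O(n) on average: a map replaces the inner scan.
    func dedupeLinear(items []string) []string {
        seen := make(map[string]bool, len(items))
        var out []string
        for _, it := range items {
            if !seen[it] {
                seen[it] = true
                out = append(out, it)
            }
        }
        return out
    }

    func main() {
        in := []string{"a", "b", "a", "c", "b"}
        fmt.Println(dedupeQuadratic(in), dedupeLinear(in)) // [a b c] [a b c]
    }

Both are fine at n = 100; only one of them is still fine at n = 1,000,000.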


We recently had an HN article about it: https://news.ycombinator.com/item?id=26296339

Some amazing person found that the load time of GTA V Online was awfully slow. It was slow because the developers put in a quadratic function at a time when the iteration object was small. But over time that object grew and so grew the load times quadratically until it was unbearable. And it was unbearable for millions of GTA V Online users.

I fully agree that the accidentally quadratic functions will eventually come back to bite you/the user.


The memory hierarchy plays into this, too. A lot of fancy algorithms have poor locality of reference and additional branching, so they tended to work better 40 years ago when CPUs weren't that much faster than memory and branch misprediction wasn't a thing that people had to worry about on consumer-grade hardware.


I also keep seeing leetcode interview problems about iterating over a list of 100k items a few times. I can see that it is not optimal but iterating over 100k items is NOTHING compared to the network call you made right after that iteration - in terms of actual production time.

And every time I interview, the hiring managers want me but my hiring decision gets vetoed by a leetcode newbie who hasn't experienced production scars yet.


So true. Managers really want to hire architects/senior devs who are experienced, practical, smart and get things done. But if they let the other devs into the interview process they will get vetoes all the way down, because the reports will be:

- envious of your amazing pragmatic and effective skills

- jealously guarding the architect promotion that they covet for themselves


Good managers for dev teams should have enough technical knowledge themselves, and should demand explanations from participating devs for why a candidate is or isn't good enough, to see through that. Further, personally, as a tech lead I've always been keen to take on new devs that are clearly a cut above, as they usually mean an opportunity to work more effectively as a team. And I really don't want to spend even more of my day doing reviews of mediocre code.


> And I really don't want to spend even more of my day doing reviews of mediocre code.

Or writing essentially pseudocode in the Jira description for the dev who can't figure things out. Ask me how I spent my day.


> Good managers for dev teams should have enough technical knowledge themselves, and should demand explanations from participating devs for why a candidate is or isn't good enough, to see through that.

I have a hunch that this is quite rare in most companies. Most managers are unskilled enough to override their own intuitions in favor of the mediocre leetcode dev that just vetoed a strong engineer.


In an async framework of execution, this doesn't apply. A lot of programming happens in that space, and in it, the network call is "free", but you're clogging the thread(s) with actual CPU work. If execution is single-threaded, the problem becomes very relevant, but it applies to multi-threaded async just the same (you might exhaust the pool of workers).

Keeping this in mind isn't esoteric either, as it applies to JavaScript, Python, Rust, C#, and probably others.


> In an async framework of execution, this doesn't apply.

That's right. Async execution prevents the IO from being the bottleneck by offloading it to a different thread.

There are 3 situations where this statement falls apart:

1. If the execution is single threaded, as you rightly pointed out

2. If the response of the async execution matters to the final response of your service. In this case, the primary thread may finish its work but it's still waiting for IO to complete. Basically making it synchronous but using async primitives.

3. The CPU utilization of iterating over 100k items in a list is negligible compared to the hundreds of daemons and services running on the host. Even a docker container will utilize more CPU than iteration over 100k items.

The point is: over-indexing on iteration and time-complexity in interviews is pointless, as real systems are going to face challenges far beyond that.


The canonical reference for this:

Scalability! But at what COST?

https://www.frankmcsherry.org/assets/COST.pdf


When I got my first real programming job at a games company in the early 2000s, the Technical Director of the project once gave me some advice: if the number of things you are working with is on the order of 10,000 then don't bother optimizing it. Given the increase in computer power in the last 20 years I believe bumping that to 100,000 is pretty appropriate.


And yet some of the worst performance issues I've had to deal with were in code typically dealing with merely 100s of items, but using algorithms and slow network-based operations that caused everything to run sluggishly most of the time and not infrequently made the resulting user experience intolerable.

I do agree though that a lot of time is wasted on premature or even completely counterproductive optimisations for situations where the data being processed is too small to cause noticeable slowness in processing.


I was testing something, and wanted to add some useless for-loop addition code to simulate "doing work". I had to make huge nested loops before I noticed any significant CPU usage on my laptop.


Optimizing compilers are tricky this way: if you want to do it with optimizations turned on, you usually have to make the work "real" in some sense. Sometimes making enough nesting depth that the compiler can't fully reason it out works, but usually it's easier to modify some memory it can't be sure isn't touched otherwise (and hence can't elide or register-allocate it or whatever).
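
A small sketch of "make the work real" (Go rather than C, and Go's compiler is less aggressive here, but the principle carries over): publish the result somewhere the compiler can't prove is unused.

    package main

    import (
        "fmt"
        "time"
    )

    var sink uint64 // package-level, so the result can't be proven dead

    func main() {
        start := time.Now()
        var acc uint64
        for i := uint64(0); i < 50_000_000; i++ {
            acc += i * i
        }
        sink = acc
        fmt.Println(sink, time.Since(start)) // printing it also keeps the work "real"
    }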


Were you using a language where the compiler/interpreter was smart enough to optimize out certain kinds of busy loops? It can sometimes take a little extra work to convince the compiler or runtime. In C the keyword 'volatile' can help.


Not only that, but often our intuitions about what is "fast" are wrong if we are basing them on theoretical big-O concerns rather than the specifics of modern hardware. One example that's ubiquitous is using a hash table with chaining when a linear array lookup would be faster due to cache locality.
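
A hedged micro-benchmark sketch of that point (keys and sizes invented; run `go test -bench .` and believe your own numbers, per rule 2): for a handful of keys, the linear scan sits in one or two cache lines.

    package lookup

    import "testing"

    var keys = []int{3, 141, 59, 26, 535, 89, 79, 323}

    var table = map[int]bool{3: true, 141: true, 59: true, 26: true, 535: true, 89: true, 79: true, 323: true}

    var sink bool // keeps results live so the loops aren't optimized away

    // linearContains scans contiguous memory; no hashing, no pointer chasing.
    func linearContains(k int) bool {
        for _, v := range keys {
            if v == k {
                return true
            }
        }
        return false
    }

    func BenchmarkLinear(b *testing.B) {
        for i := 0; i < b.N; i++ {
            sink = linearContains(323)
        }
    }

    func BenchmarkMap(b *testing.B) {
        for i := 0; i < b.N; i++ {
            sink = table[323]
        }
    }

Whether the slice actually wins depends on n and the key type, which is the point: measure before reaching for the asymptotically better structure.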


Sigh, here we go again.

> Tony Hoare's famous maxim "Premature optimization is the root of all evil."

It's actually from Donald Knuth and this quote is frequently taken out of context to argue against optimization in general.

Here is the entire quote

"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%."

The point is to spend time optimizing where it will have impact.


Knuth attributes it to Hoare, and Hoare attributes it to Knuth. So it comes down to who you want to believe. Probably best to attribute it to both. My guess would be that Tony said it first, and Knuth refined and printed it.

It's always good to have the longer quote which gives needed context.


Hoare attributed it to Dijkstra. See https://hans.gerwitz.com/2004/08/12/premature-optimization-i... .

"I’m sorry I have no recollection how this quotation came about. I might have attributed it to Edsger Dijkstra."


Someone needs to make a Spider-man meme with the quote.


Maybe it is in the end a secret deal between them, to have a joke about circular references. ; )


Also people forget that quote is from the 70s. Almost 50 years ago.

Programming used to be very different from what it is now. "Premature optimization" wasn't "hey just use this popular lib cause it scales", it was "let's use some impossible to understand bit fiddling algorithm that only works on this piece of hardware".


> Programming used to be very different from what it is now

Programming has definitely evolved. This maxim seems to be exactly as applicable then as it is now though, and as misunderstood.


In any compiled language your optimizer will do all those weird things for you, and will even handle all the different generations of CPUs for you. Compilers never give you a better algorithm if you write the wrong one.

Almost all languages have a standard library that has all the normal algorithms you would want, and where something weird is better they have that done for you.


Compilers can and do replace some (simple) algorithms with a better one.

At https://stackoverflow.com/questions/74417624/how-does-clang-... is someone asking why the compiler replaced:

    int a = 0;
    while (n--)
        a += (n * n);
an O(n) algorithm, with the O(1) equivalent of:

    a = n * (n-1) / 2 + n


I think your `n * (n-1) / 2 + n` should be `n(n+1)(2n+1)/6` according to the SO article.

`n * (n-1) / 2 + n` would be the sum of numbers, not sum of squares.


Thank you for the correction!


I don't see how the larger quote adds any additional meaningful context. Once you have identified (measured) the critical 3%, the optimization is no longer premature. That is already implied in "Premature optimization is the root of all evil". The maxim is not "Optimization is the root of all evil".


> The maxim is not "Optimization is the root of all evil".

In my experience, that's exactly how most people understand it.


Too many people take this as dogma and just don't learn the efficient way to do things. I've lost count of the number of FE devs that interview in my company's DS&A section and tell me bubble sort is the best we can do. I don't need a derivation off the top of your head, just know a couple and tell me a good choice for the problem and I'm good... same thing here. If people live by "don't prematurely optimize" to the point that they don't even know the efficient ways to do things, how will they know where it's important?


> this quote is frequently taken out of context to argue against optimization in general

Maybe it is, but that's not how it's being used in this context.


I don't see anyone taking this out of context here. The entire quote is less pithy but not different in meaning. "Premature" is literally the first word.


How is this not covered by points 1 (don't put in hacks because of guessing) and 2 (measure)?


Premature "premature optimization is the root of all evil" is the root of all evil.


> The point is to spend time optimizing where it will have impact.

Your whinging would be more appropriate if Measurement was not emphasized, right near the top.


> Rule 5. Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming.

This goes double for databases. Folks who use a DB as a dumb bit bucket or a simple 1:1 reflection of their object definitions are often surprised when the DB takes it personal and dooms performance.

If I ever see another ORM-generated DB schema it'll be too soon.


> If I ever see another ORM-generated DB schema it'll be too soon.

I'd argue that most ORMs generate the schema you ask them to. Using an ORM isn't going to create a worse database layout than you could do by hand. The issue is that some/many developers don't know SQL, nor have the required knowledge of their databases to use an ORM.

The ORM requires you to know what lies underneath; it is a fairly leaky abstraction. Understanding that, you can get most ORMs to create nice schemas.


ORMs are typically lowest common denominator for compatibility with many database engines. The best thing you can do is learn about the underlying engine (Postgres, MySQL, MS SQL Server, SQLite, etc.) and its distinct feature set. Once you know that, you will often find yourself quite a ways away from the lowest common denominator.

That may be built-in temporal table support in MS SQL Server or MariaDB, so you don't need explicit audit tables in your schema. Or perhaps timestamp ranges with exclusion constraints in Postgres for enforcing valid schedules without hacks in the app layer.

Eventually you notice that you're having to define everything twice: once for the DB and again for the ORM definition.

This is why I prefer solutions like PostgREST, Postgraphile, and Hasura (for Postgres while other DBs have other similar solutions). Define it once in the DB and just let it bubble up. You may want to expose your storage API through views, but it still ensures optimal data structure at the foundation.

DB -> data-generated API service -> client API codegen -> client app such as web or mobile.

With a typical ORM, you're building from the middle out. It almost always results in a faulty foundation. Pour and set the foundation first. Always.

https://postgrest.org/ https://postgraphile.org/ https://hasura.io/


> This is why I prefer solutions like PostgREST, Postgraphile, and Hasura (for Postgres while other DBs have other similar solutions). Define it once in the DB and just let it bubble up. You may want to expose your storage API through views, but it still ensures optimal data structure at the foundation.

> DB -> data-generated API service -> client API codegen -> client app such as web or mobile.

This seems like a nice approach, though mine is even more basic. I pick a DB, create SQL migrations for the schema (sometimes even generate those from an ERD planning tool, with manual edits where needed), apply them to a DB with something like dbmate: https://github.com/amacneil/dbmate

After that, if I want to use an ORM, I generate entity mappings from the tables/views in a schema-first approach, for example, like https://blog.jetbrains.com/dotnet/2022/01/31/entity-framewor... or https://www.jetbrains.com/help/idea/persistence-tool-window....

I don't think that sort of codegen is as popular as it should be, but for the most popular frameworks in each language, the schema-first approach is usually doable with no additional software that would need to be deployed to prod.


> ORMs are typically lowest common denominator for compatibility with many database engines.

That really depends on what you mean and on the ORM. Typically the larger and more popular ORMs can and do take advantage of database-specific features, if you let them.

Using the database-specific features will prevent you from using the ORM as an abstraction for replacing your DBMS, but outside open source projects that support SQLite, MariaDB and Postgresql, I've never seen that used in professional settings. One system I've worked with had on-paper support for DB2, Sybase and MS SQL. It ran solely on MS SQL; the others had not been tested in years, and the developers had pulled in so many MS SQL-specific features that it was never going to run on either DB2 or Sybase ever again.


That assumes that you should just expose your Postgres tables as your data.

As someone who consumes more APIs than they write nowadays, I appreciate when data is scoped to task[0]. Employing window functions and stored procedures that can turn queries into data fit for purpose is the ideal - databases are faster at manipulating data than any intermediate language, most of the time. Unfortunately, I don't see them employed enough. Backend developers seem content with throwing the schema over the wall way too often.

[0]: As an aside, this is why I personally like GraphQL so much. The middleware lets me clean up data in ways that backend engineers simply refuse to, most of the time.


You may have missed this part of my comment:

> You may want to expose your storage API through views

Views are a great way to decouple on-disk storage structure from access patterns.


If you are sophisticated enough to design a good relational model and use an ORM to generate it, what does an ORM give you? Serious question. The answers I've seen typically stress the advantages for developers who are uncomfortable with SQL (who I think are either going to get comfortable or fail, and the ORM will only delay them getting comfortable or enable them to create a more expensive failure) or more easily generate lots of different complex queries (which sounds plausible, but I've never seen it.)


The ORM, in my mind, is just there to help you translate data into objects, that's it really. You could do that manually and I have worked on projects where we did just that, but it's a lot of repetitive work that brings little value.

I have seen and done complex queries using a DSL for an ORM, in my case Django, but now you're just learning a different query language, so you're right that the ORM doesn't bring much to the table. Realistically, those who are uncomfortable with SQL are going to create poor queries with the ORM as well.

For quick prototyping and systems with limited amounts of data ORMs can speed up development quite a bit. Technically there's a cost, but computers are fast enough that it doesn't matter on the small scale.


I guess the way you describe it is the way I like to work, keeping my object model in code close to the relational model to minimize the mental and performance cost of mapping back and forth, and using a SQL library to minimize boilerplate. I don't think of it as using ORM because my favorite tools for working that way don't bill themselves as ORMs, but I'm not sure what I'd use in Python other than SQLAlchemy. Even projects that seem to be stripped down non-ORMs like SQLModel turn out to be built on top of SQLAlchemy.


Before it was about "What if we need to change the DBMS?" But now it's more about readable code and easy conversion to the language's data structures. But it's the first thing that is looked at when improving performance.


Let's add Conway's law to that:

"Any organization that designs a system will produce a design whose structure is a copy of the organization's communication structure." (Melvin E. Conway(

"The structure of any system designed by an organization is isomorphic to the structure of the organization" (Yourdon and Constantine)

Coming back to your point. How do you ensure that data structures are organized well and stay that way as design changes?

You separate data and code at the organizational level. You keep the design of the database schema, use cases, and the mapping between them separate from the implementation of the rest. This group also writes all integrity checks and so on. Data and code organizations are separate.

IF you don't do it this way, it's hard to separate code and data because the structure of the organization does not separate them.


> Coming back to your point. How do you ensure that data structures are organized well and stay that way as design changes?

It's a hard problem if not THE hard problem. Using an ORM or not has no bearing on this. Conway's Law extends to the lowest levels. If an organization changes substantially, no UI, data structure, or database schema will be left unscathed.

Solve the problems you know about. Tomorrow always delivers problems you couldn't (and shouldn't) anticipate. You'll drive yourself and your coworkers crazy by trying to cover all possibilities. It's a disease called Flexibility Syndrome.


Your viewpoint is from someone who writes code for others.

My viewpoint is hiring others to write code. My business is tomorrow and keeping contractors in check.

Planning and maintaining data and their schemas in-house and contracting out writing code has been incredibly successful so far.


For what it's worth, most of my time is spent inside the DB, not app code. I learned long ago that once bad data gets in, it's far harder to get out and trust the system again. Best defense against bad data is a well-designed, strict schema. In other words, proper data structures.


Stored procedures for the win.


My additional rule: tiny bits of wasted perf will accumulate and eventually make the program slow even though each one doesn't cost much. So don't leave perf on the table so long as complexity/readability/maintainability/implementation cost isn't affected. That is: all else being mostly equal, it's not OK to choose the slower of two options.

Also: if you assume your N's are small then you can get away with almost anything. But if you write a piece of code that is going to work well for N below 100, but suck for N over 10,000, say, anything with O(N^2), then just cap it. Make a big fat error if the small-N assumption breaks. It's still better than the surprise AWS bill or hung program or whatever it could be otherwise.
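
A hedged sketch of that "big fat error" guard (the function and the cap are invented for illustration):

    package main

    import "fmt"

    const maxItems = 10_000 // the small-N assumption this code was written under

    // countDuplicatePairs is O(N^2) on purpose, so it refuses to run once N
    // breaks the assumption; better a loud error than a hung program.
    func countDuplicatePairs(items []int) (int, error) {
        if len(items) > maxItems {
            return 0, fmt.Errorf("countDuplicatePairs: n=%d exceeds assumed max %d", len(items), maxItems)
        }
        count := 0
        for i := range items {
            for j := i + 1; j < len(items); j++ {
                if items[i] == items[j] {
                    count++
                }
            }
        }
        return count, nil
    }

    func main() {
        if _, err := countDuplicatePairs(make([]int, 50_000)); err != nil {
            fmt.Println(err) // countDuplicatePairs: n=50000 exceeds assumed max 10000
        }
    }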


> tiny bits of wasted perf will accumulate and eventually make the program slow even though each one doesn't cost much

Rules 1 and 2 apply here, though.


If you spend 100ns you spend 100ns. It doesn't matter if it's in a bottleneck or not. Note that this of course assumes that spending 100ns anywhere is actually making the program and user _wait_ 100ns, i.e. it assumes we are cpu bound all the time.

For any program that does I/O this wouldn't be the case. A "bottleneck" is going to be a network request, and for much of the program's execution time you can just make the CPU do anything without the user even noticing.

So this argument is based on CPU bound interactive programs, which isn't all programs. In programs with e.g. IO (which would be most) then rules 1+2 would come along.

But I guess the thing is that in an interactive CPU-bound program like a game, there _is_ one single bottleneck and most of the code is on that hot path. It's pretty easy to establish that you are in the bottleneck: it's when you are writing per-frame code.


Many of these guidelines essentially boil down to strategies for preventing over-engineering.

I concur; in my experience, premature optimization is one of the most expensive pitfalls. This primarily stems from the fact that circumventing potential issues too early leaves them unchallenged, resulting in the next team having to devise expensive solutions to address the unnecessary complexity.

The approach that was instilled in me is this: optimizations rely on presumptions, and these presumptions are often incorrect in the beginning.

Additionally, I've discovered that managing ego and understanding psychology play crucial roles in dissuading individuals from creating overly complex code.


I like to say, “solve problems you have, not problems you think you have.”



Sort of, but people get defensive and start to argue that they _will_ need whatever it is at some point. But my argument is that, fine, you may be right, but if it's not needed _right at this very moment_ there's no reason to rush it in. Often the best way to prepare for the future is to do as little as possible - keeping things simple now makes adaptations much easier down the road if and when the need actually arises.


Our future selves will always have more information than our present selves, so we should let them make the decision whenever possible.


This concept is also in line with Lean/Six Sigma and identifying "wastes". Overproduction (analogous to over-engineering) is usually considered the worst type of waste because not only are you making something you don't need, you're spending effort that could have been used on something that you _do_ need.


And digging a step deeper: over-engineering often happens because you think you might need the complexity later, but it will be more difficult or risky to extend the system at a later time.

E.g. starting out with a microservice architecture even though you only have 100 users, because you think it will be too difficult to re-architect a monolith the day you hit a million users.

So you should address why it feels like the code becomes less malleable over time.


Someone I know said: don't get fancy with errors. Fail early and simply.


Generally these are good, but in practice #1 doesn't hold.

When you start out you need to have a _theory_ about what will be your bottleneck. A lot of times you can't just implement XYZ and then measure what was slow and fix that. X, Y, and Z are connected, and sometimes you need to build X and Z in a specific way just so that Y can be made fast, and you know that Y is going to be the bottleneck. Later, when you do measure and know for a fact what is slow, you still have to make a bet on an approach that will make it faster. The more educated your bet is, the better.

Good programmers measure, but good programmers have to iterate less because they can predict what will be slow, buggy, or use too much memory, and make educated bets in advance that avoid issues. If you start saying that it's a rule that you can't predict performance behaviour, then you are dismissing a lot of the experience and skills that good programmers accumulate.


In practice, #1 is an iron rule for people who don't believe it, and a loose guideline for people who do.

Because observing rule #1 happens to be the best way to get the experience and empirical background that are necessary to develop a good intuition for where to expect bottlenecks.


You can spike the algorithm that you think will be slow. Usually the slow algorithm is simple to implement and test.

If you’re wrong about predicting the speed of the algorithm then you’ve got needlessly complicated code for the rest of the life of the project.

People are wrong about the speed of algorithms all the time. Often O(n) and O(n^2) take the same amount of real time because the computer spent 99% of the time getting n from the database server.

Your algorithm written in C is often slower than the equivalent Python because the bytecode compiler did something clever.

I’ve spent a lot of time speeding up legacy code. It’s usually much easier than you think, and it’s slow for reasons that wouldn’t have been obvious to the original authors.

In fact it’s usually slow because the codebase got so complicated that the original author of the slow code couldn’t reason about it any more. But it’s easy for me to fix because I have a concrete example that is “too slow” so I can debug it by running the code and observing the slow points.


I love it when I find an actual performance problem with code that doesn’t hit the disk or network. Optimizing is super fun, but it’s so rarely needed.

My last time was last year. I noticed that our metrics service, which had been running for a couple years, was now often using 100% cpu. That can cause issues for other services on that node, so I looked into it. Turns out the Go code I wrote to parse Prometheus metrics was a little clunkier than you’d like for the sheer amount of metrics coming from some monitor (I think I’d just upgraded cadvisor or something). I tried getting it to return fewer metrics, but couldn’t figure it out.

So I spent an afternoon and rewrote my parser. My hypothesis was that I could make it faster by allocating fewer strings, which turned out to be correct. End result was nearly 10x faster. 10/10, super fun and satisfying.

I’ve got about a half dozen other things I’d love to optimize with our current system, but for now I can’t justify the cost.


That’s a good opportunity to run a profiler and it will tell you pretty quick which things are taking the most time/cpu/memory.


I think you are (understandably) not responding to the whole quote.

It says not to put a "speed hack" in until you know where the bottleneck is. That doesn't sound like what you described at all.

If you're making a video game with tons of physics objects and you know for sure from experience that collision detection is gonna be a major problem, and this is what you're pushing the limits on, I don't think it's a "speed hack" to design the game and its systems around this.

Additionally, if you're working on something that you know is gonna be a big performance concern then you definitely want to measure it! Not to find out if it's a concern but to find out how successful you are at dealing with the concern.


Could you give a few concrete examples? I'm doubtful it makes a difference in most cases?

If you're building a new system, from new requirements - often just getting started is fine? Build, test/measure, throw-away/refactor - repeat?

Take rust as an example - start with a draft language, a compiler in ocaml. Iterate.

(I concede that Hoare might have known the project might move from ocaml to self-hosted at some point - but I'm not sure that makes much of a difference?)


Sure. Right now I'm working on a Constructive Solid Geometry algorithm; it requires me to cast a lot of rays, so I know this will be slow. I can say up front that the raycaster needs to have a fast data structure and that it's worth giving up some performance to set that data up correctly. I also know it needs to be multithreaded, so thinking up front about how to manage that is a given too.

A lot of times, I try to count the zeros. If I'm doing anything over the network, it's going to be milliseconds, but if I do something with memory it's going to be nanoseconds, so optimize to remove network requests. If I work on a problem that is multithreaded, I need to consider what can be multithreaded and what cannot. Can I minimize the stuff that can't be multithreaded, and is there other work to be done while waiting for things that can't be? A lot of times I consider latency vs bandwidth. I know that latency will always be a harder problem, and a problem hardware has a harder time solving, so I try to design for low latency first, and then worry about bandwidth.

These are all architectural decisions that are made early and have a big impact. They are VERY expensive to change, so you want to be as right as you can be the first time. If one part turns out to be slow you can profile and work on it, but changing the structure of how the program operates and how data is stored and flows is much harder.

The author is right that data structures are the most important tool for optimization, especially on modern hardware where cache misses are so expensive. The problem is that _everything_ depends on your data structures, so changing them is a lot of work. (I was just at the Blender conference, where there was a talk about changing the mesh structure from arrays of structs to structs of arrays, and it took them two years to make this simple change.)
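
Not Blender's code, just the shape of that change (field names invented): array-of-structs keeps one element's fields together, struct-of-arrays keeps each field contiguous, which is what tight loops and the cache want.

    package main

    import "fmt"

    // Array of structs: X, Y, Z of one vertex are adjacent in memory.
    type VertexAoS struct{ X, Y, Z float32 }

    // Struct of arrays: all Xs are adjacent, all Ys are adjacent, and so on.
    type MeshSoA struct {
        X, Y, Z []float32
    }

    // sumXAoS drags 12 bytes per vertex through the cache but uses only 4.
    func sumXAoS(vs []VertexAoS) float32 {
        var s float32
        for _, v := range vs {
            s += v.X
        }
        return s
    }

    // sumXSoA streams one dense array; every byte fetched is a byte used.
    func sumXSoA(m MeshSoA) float32 {
        var s float32
        for _, x := range m.X {
            s += x
        }
        return s
    }

    func main() {
        vs := []VertexAoS{{1, 2, 3}, {4, 5, 6}}
        m := MeshSoA{X: []float32{1, 4}, Y: []float32{2, 5}, Z: []float32{3, 6}}
        fmt.Println(sumXAoS(vs), sumXSoA(m)) // 5 5
    }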


> I can say up front that the raycaster needs to have a fast data structure and that it's worth giving up some performance to set that data up correctly.

I don't think this is a great example, really. You're going to want the brute force raycast working anyway for testing, and you're not going to know the details of what the right space partitioning approach will be until you a) understand the CSG application better, and b) have done some measurements.

So it follows the 5 rules pretty well - the only thing I'd add is that you architect the system knowing that the data structures will change for representing the collections of objects and object/ray intersection paths etc. But your first pass should almost certainly be pretty simple and slow, and you should have your measurement code implemented and testing before you try anything sophisticated on the data side. Same goes for chunking up work across threads.


> structs of arrays

It’s such a bummer that the optimal memory layout is also a pain to code with in most languages (thinking of numpy in particular, oof). Are there any languages out there that abstract this away? You’d have to give up pointers, but it’s worth it for some use cases.


Zig!


And apparently Fortran too. TIL.


> they can predict what will be slow

If that were the case, would developer-led startups not have a 100% success rate? After all, a function that takes hours to complete, when an optimized function could take milliseconds, is still more than fast enough if you have no users. I'm not sure anyone has actually proven that they can make such predictions accurately.


No. Pro poker players lose all the time, but they are far better than your average players because they make better bets, because they understand the game and the odds better. Good programmers are wrong all the time (I know I am), but they make fewer mistakes, and they can fix the mistakes faster, because they can predict when there may be an issue.

Also being a good programmer is not the same as being good at running a startup.


> Also being a good programmer is not the same as being good at running a startup.

But being able to predict the future of the business is necessary to determine where to optimize. Again, if you have no users, you don't need to optimize at all –period. The success rate should be 100%, as those who predict what to optimize will know when not to go into business.

If you get it wrong, you haven't predicted anything. You were guessing.


1. GP didn't say that they could predict what will be "fast enough" they can predict what will be "slow".

2. A kind reading of GP would also understand that one is predicting what will be slow under some set of conditions.

3. Requiring 100% accuracy to call it a prediction is absurd. A weather forecast is definitely a prediction even though it's not 100% accurate. The NOAA isn't "guessing" weather in the general usage of the term "guess"

4. By your definition extensive benchmarking, measuring a program, and optimizing based on those results is wrong, because we don't even know if the end user will ever run the program!


Counter-point to 5:

Complex algorithms over simple data can have big performance payoffs, remove obstacles, and even simplify things.

For instance, binary search over a sorted array (as opposed to a BinaryTree object):

• simplifies merging (it's just concat & sort)

• no pointers, which makes serialisation easier, but also...

• bypasses the need for serialisation

• an array can be on disk, or in memory, or both (via mmap)

• the data can be bigger than your ram

• allows for 'cold starts': just point your program at the file/mapping and tell it to run, no 'loading into ram' step.

• it's also cache-oblivious

Another example is Huffman coding. If you studied it at uni, you probably saw it as a tree-based algorithm with the associated O(n log n) time complexity. I was unaware that there was an in-place, array-based method of constructing a Huffman tree in linear time.

Of course 99% of the time, I'm doing back-end microservices and just using standard collection data structures. But if I ever had a 'big data' task at work, I would strongly prefer to be able to do it on a single machine with a big disk locally rather than buy into whatever the current flavour of MapReduce is.


IMO binary search is not a fancy algorithm. Modern sort functions are fancy, and can have subtle bugs, which is why ordinary devs should not write their own. Even quicksort has foot guns.

Rob Pike's response would probably be to profile the code first, then see if your fancy code or alternate data structure makes it faster.


I don't see this as being a counterpoint. From the perspective of Pike's advice both "binary search over a sorted array" and a "BinaryTree object" are identical. They are just different implementations of the same data structure.


Don't forget: you, the dev, are 99% of the time the most expensive resource. Maintainability and first to market are usually way more important.


>dev is the most expensive resource

This is not true. Ask Facebook, who have rewritten things multiple times explicitly because this is not true, but someone assumed it was

>Maintainability and first to market are usually more important

Maintainability and first to market are not trade offs for performance in most cases, no matter how much you want to propagate this ridiculous propaganda.


But the question is, would Facebook still be around if they didn't "just ship this turd lol"? I don't really have any insight into Facebook engineering over the years, and it's a "what if" type of question that's essentially unanswerable, but the answer to that being "no" is very plausible.

And Facebook really does have unique(-ish) scalability problems, and I bet rewrites would have been inevitable even with the best possible engineering, because who can write an application to deal with a billion users (2012, currently 3 billion) right from the start? When Facebook launched in 2004 this was pretty much unheard of, and even today it remains pretty rare (much less in 2012).


This type of thinking is what has turned everyone's "desktop" app into an Electron piece of shit. It turns software into a race to the bottom where as long as it's just good enough for users to not drop it, companies say "ok let's do it". It's not good advice to give, imo.


I would not count an Electron app built on top of NPM or similar as a good example of what the GP was stating.


> Don't forget, you, the dev is 99% of the time the most expensive resource.

That boils down to not valuing time of the people who use your software. Despicable attitude imho.

Developer time is wasted once. User time is wasted for every user, each time the software is used, for as long as it remains in use.

All these can be large: billions of users, many uses each day, software in use for decades.

If you're the only user, no point to spend time on optimizing anything, unless you perceive it as too slow.

If software has billions of users, then almost any effort in optimizing the tiniest bottleneck in it, is worth the effort. A good example of "the needs of the many (users) outweigh the needs of the few (developers)".

A lot of software sits somewhere in between though (not that many users & performs at acceptable speed).


I first read this on cat-v 10+ years ago and it has had an indelible effect on the way I approach and think through both design and complexity.

http://doc.cat-v.org/bell_labs/pikestyle


How does one go from

> Rule 5. Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming.

to

> Rule 5 is often shortened to "write stupid code that uses smart objects".

"smart objects" left such a bad taste in my mouth, the original rule is so much better albeit lengthier.


I think Rob Pike would agree that "smart objects" is the wrong way to think about it: https://commandcenter.blogspot.com/2012/06/less-is-exponenti...


Seems pretty clear to me, what issue are you having?


Agreed. I find that "smart objects" are much more difficult to make cohesive with one another. Punting your "smart" logic to a higher level is easier to understand, test, and change.


Write code that naturally follows from well-structured objects.


Great format. Love easy-to-view, fast pages. Great advice too for the true hacker. Today the advice still holds, but only for a small, tiny niche group. What is considered mainstream programming today is so abstracted that the speed of an algorithm is not a concern on anyone's plate when they fire up an Electron app.


One might say Electron's algorithmic constant is quite large.


Are these quotes the reason we end up with bloated software systems nowadays, rarely optimized or respectful of memory, CPU, or other resources?

https://www.techradar.com/news/move-over-chrome-macos-monter...

https://www.tomsguide.com/news/google-chrome-reportedly-dest...

A great majority using these quotes act like no engineering design work is allowed whatsoever.

I have seen that mentality permeate everything, even hitting concurrency decisions for obvious things. (E.g. we need to respond to HTTP requests while doing stuff in the background, accessing IO, and running multiple server-state syncs, but let's have purely synchronous, single-threaded code and we will figure it all out later. That went well.)

And for some reason there is a subgroup that thinks security architecture is "optimizations."

It depends on the product but when you have https://news.ycombinator.com/item?id=38029479 or https://news.ycombinator.com/item?id=38076636 it is not over-engineering: it is management and business forcing poor product.


On my first Amazon interview, I'm pretty sure I bombed because I brute forced the algorithm question.

In the real world, when we have an issue on production, I patch it first, then find a proper solution. Meaning in that first hour when customers are complaining, I'm not gonna try to create a fancy algorithm that can handle all cases. I'll make it work with nested loops if need be. Then I can sit down and figure out a permanent and elegant solution.


FWIW, that's typically the structure that Amazon is looking for in an interview (though, some interviewers are better/worse than others)

- start with the super inefficient solution, solve it pretty quickly

- point out performance problems or edge cases that would need to be solved for a larger scale solution

- go implement those optimizations


Yup, when I was interviewing for Google I was expecting basically the same. I would like you to work through the problem and come up with something simple, bonus points if you noted places that could be buggy, slow, or anything else. Then we would look at it and discuss what could be improved, and if we had time maybe even make those improvements.


I am missing a very important one:

Don't communicate by sharing memory, share memory by communicating. https://www.youtube.com/watch?v=PAAkCSZUG1c&t=168s
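
A minimal sketch of that proverb in Go (hypothetical example): instead of goroutines mutating a shared counter behind a mutex, the values travel over a channel and only the receiver touches the sum.

  package main

  import "fmt"

  func main() {
    results := make(chan int)

    for i := 1; i <= 3; i++ {
      go func(n int) {
        results <- n * n // send the value; no shared state, no locks
      }(i)
    }

    sum := 0
    for i := 0; i < 3; i++ {
      sum += <-results // the receiving goroutine is the only one touching sum
    }
    fmt.Println(sum)
  }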

And lots of others here: https://go-proverbs.github.io/


I couldn't care less what Rob Pike thinks about programming rules, coming from the person who thinks capitalising is a great way to export things, ignoring that capitalised characters are mostly just a Latin thing, and so the whole concept of his toy language is just that... forever a silly toy language.


It's a shame that you would toss away an entire language because of a small gripe. I agree that the case thing is annoying, but Go has many strengths and is broadly useful in a variety of situations.

When I started, C++ was the "de facto" language, even taught in schools. For many years I avoided programming and I didn't know why. Once Go came along it all clicked: I avoided programming because languages like C++ are awful. I could give reasons, but people mostly know what they are. Go is great because it's mostly simple, and the dev team is not afraid to "say no", even aggressively, if a feature is not strongly useful. Plus the tooling is better than in most languages one might come across.


Ah yea docker and kubernetes. The silly little playthings of the tech world


Related:

Rob Pike's Rules of Programming (1989) - https://news.ycombinator.com/item?id=24135189 - Aug 2020 (323 comments)

Rob Pike's 5 Rules of Programming - https://news.ycombinator.com/item?id=15776124 - Nov 2017 (18 comments)

Rob Pike's Rules of Programming (1989) - https://news.ycombinator.com/item?id=15265356 - Sept 2017 (112 comments)

Rob Pike's Rules of Programming - https://news.ycombinator.com/item?id=7994102 - July 2014 (96 comments)


Unfortunately, Rob didn't stipulate what "fancy" means. For instance: A* is definitely fancy, but it's neither buggy nor particularly hard to implement. And because of what it solves, there isn't even a simple alternative to it.

I'd probably replace "fancy algorithms" with "self-invented algorithms" instead: if you're not solving a new problem, resist the urge to "tackle it yourself" and just look up the almost-certainly-decades-old, documented-by-thousands-of-people algorithm that's going to get the job done.

(Of course, that'll be less fun for you, but if the only person who appreciates a piece of code you wrote is you, that's one of the signs that you wrote bad code.)


These are all correct and good advice, but I suspect most people misinterpret "avoid premature optimisation" style advice as "don't bother to write efficient code".

Caring about basic efficiency, and being thoughtful about compute is _not_ the same as premature optimisation. This kind of wastefulness in modern software is what causes death by 1000 cuts style inefficiency, it's not because we aren't optimising, it's because we are wasting.


I call this premature pessimization. Often we know of the best algorithm, so not using it is wasteful. Often avoiding unnecessary copies is easy in the language, but we make them anyway.


I think the wastefulness comes from people imposing a top-down structure on the code before they have a working system. If you first write stupid simple code that solves the problem, then identify areas that need to be faster, you won't have layers of abstraction getting in the way of optimizing it.


>Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming.

Isn't this a plug for some sort of object oriented programming? Or at least highly structured programming? Interesting in today's trendy climate of "objects bad/passé, function(al) good."

Having done quite a bit of OOP, where "binding behavior to data" is the thing, in an ideal laboratory for it (Smalltalk and others), I gained some experience with watching people do this. I observed that there were plenty of good ideas across the industry on how to model data (inheritance, prototypes, classification, mixins, mutability, etc.), but also that deciding which data groups together well is very much an art, and that was where people struggled to succeed with the data-centric emphasis. The easy objects/structs are of course easy: Point, Rect, Person. But as the data gets more complex, getting people to come up with good boundaries gets more and more complex quickly, and you end up with really poor object constitutions. Reification is hard. I don't think it is impossible; I've seen people who have a knack for reification come in and refactor an object soup into a cluster of data structures that was beautiful. But yeah, good reification I fear is as much an art as science.


> Isn’t this a plug for some sort of object oriented programming?

No. You see the same thing happen in SQL database design. How you lay out the tables has a profound effect on the final application. Or indeed in database engines. Some use write-behind, some write-ahead, some are log-structured. In fact you can often predict how well they will do locking, or perform on writes and mass inserts, by knowing what data structure they chose on day 1. Possibly the best examples are in source code control. Git won, yet git has a very ordinary API (command line interface). Git won because Linus used content-addressable memory as his underlying data structure, and it turned out to be an inspired choice.

The other side of the coin is that once you've decided on a data structure, it's often damned hard to change. Thus once Linus chose CAM, git was going to live or die on that choice. Once there were a few git repos out there, changing it got very hard. Changing the API is not so hard: add a new one, deprecate the old one.


I find the statement is true for either object oriented or functional programming.

As someone who has done both extensively, I find that the patterns are still applicable to both. The presentation is very different, but still viable. For instance, you might use recursion to traverse data over a for-loop, but that doesn't inherently change the concept of a data "structure".

No matter what, we're still speaking in design patterns and if we share that common language, we can apply it to a domain and solve the larger problems of our day-to-day.

If you want more examples of this, look up Data Driven Development, but also append Haskell or C++ to your search queries, you'll find they repeat the same concepts in two very different "grammars"... ahem... I mean "languages".


Yes, given the downvotes, I fear my comment may be misconstrued as a function(al) bash. I write lots of Elixir these days and see the same issues. All I was trying to say is that my experience in OO land, which was hyper-focused on this idea, gave me lots of opportunity to observe that while this rule is obvious and good and everyone wants to follow it, decomposing complex data structures is a very non-intuitive task for many programmers, and it is difficult to actually realize this rule.


The downvotes are most probably because data structures have nothing to do with the concepts of OOP. The data structures stand on their own and were present long before the concept of OO came about. Yes, you can model them as classes/objects to encapsulate the set of operations that you can do on them, but it's not mandatory and certainly does not require any concept of inheritance, mixins, etc.


Rule 0: you often can predict where the program is going to spend a lot of its time; don't use rule 1 as an excuse to be completely CS-ignorant. Likewise, don't use rule 3 as an excuse to use completely lame algorithms; "not fancy" doesn't mean Bubblesort. Unlike Pike in 1989, you have libraries.


> Rule 5. Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming.

As a corollary, types are important. If everything is a string or a hash table, then things won’t be so self-evident.


One more that I've heard just recently: "If your code is just juggling with pointers/references, it is likely that it isn't doing much useful. It is only when it starts dealing with actual values that it is starting to do something useful."


No need to argue details, don’t over engineer your software, you all know who you are


Enlightenment comes from understanding that they are all the same rule.


Great rules, need to read them every 6 months to embed them in my neural circuitry!


Reminds me of the two rules for optimisation:

1. Don't.

2. (for experts only) Don't, yet.


Correct, beautiful, fast - in that order

“Make it work, then make it beautiful, then if you really, really have to, make it fast. 90% of the time, if you make it beautiful, it will already be fast. So really, just make it beautiful!”

- Joe Armstrong


While I appreciate the humor in it, it may make you feel excused for writing inefficient code.


Then when he designed Go he made the act of data modelling akin to having one hand tied behind your back, because it supports only product types and not sum types too.


"Eschew flamebait. Avoid generic tangents." - https://news.ycombinator.com/newsguidelines.html

We detached this subthread from https://news.ycombinator.com/item?id=38098729.


Fair enough, but I'd say this thread can only be considered such where it veered into "Go error handling". My data comment, if cheeky, was relevant to the quote.


Your comment spawned a generic flamewar-style tangent that lowered the discussion quality enough that we got email complaints about it.

Btw, the site guidelines also include "Don't be snarky." - I realize that takes some of the fun out of posting a particular kind of comment, but people tend to overrate the quality of what they post when they post that way, and tend to underrate the degrading effect it has on discussions, and that's a double whammy of badness. Having a don't-be-snarky rule, and sticking to it, is a way of globally optimizing for fun—specifically the fun of interesting, curious discussion. That's more important than the sugar rush of being "cheeky".


They also made sure you can't easily tell what the code is doing due to the noise of edge case handling in 75% of the lines


It's not 75% of the code-base. During the error handling proposals, the Go team analyzed a large corpus of Go, and it turns out people drastically overstate how much error handling contributes to the line count.


It does make me wonder if the fact that people feel this way says something about how much cognitive effort is consumed by error handling and that it might be disproportionate.


Error handling is something you always need to think about for serious code, but often feels like unnecessary work and boilerplate for exploratory programming or simple hacks.


Not 75% of all code, but 75% of all code that is doing anything practically meaningful. The classic example is a simple CopyFile func.

  func CopyFile(src, dst string) error {
    r, err := os.Open(src)
    if err != nil {
      return err
    }
    defer r.Close()

    w, err := os.Create(dst)
    if err != nil {
      return err
    }
    defer w.Close()

    if _, err := io.Copy(w, r); err != nil {
      return err
    }
    // The explicit Close check reports a failed flush to disk; the deferred
    // Close then becomes a harmless no-op.
    if err := w.Close(); err != nil {
      return err
    }
    return nil
  }


You could almost shorten it to:

  if r, err := os.Open(src); err != nil {
    return err
  }
  defer r.Close()
But then r is not in scope for the deferred r.Close() call.


Could you in theory break this into other functions, or no?

You could have smaller discrete functions that abstract the handling of each action a little bit and be more reusable. Or is that not possible in Go?


like Open, Copy...?

These shorter functions need to signal their failure somehow, so calling them looks exactly like the example.


Buggy "edge case handling" is the source of many critical failures[1]. Go makes explicit where a called function can return and also provide information for any anomalous conditions encountered. The alternative of just pretending like a return means success is wrong, and other ways to determine if the result of called function is acceptable (e.g. checking errno in C) are just as verbose and introduce other failure modes.

Here's a thought experiment for you: pretend the return type is something other than 'error': result, statuscode, responseContext, anything that doesn't imply failure. Would you then suggest handling that is "noise"?

ETA: "there are countless other [than if err != nil] things one can do with an error value, and application of some of those other things can make your program better, eliminating much of the boilerplate that arises if every error is checked with a rote if statement."[2]

1 https://www.eecg.utoronto.ca/~yuan/papers/failure_analysis_o...

2 https://go.dev/blog/errors-are-values


> Buggy "edge case handling" is the source of many critical failures[1]

And to fix this, we introduce 10 places per function to improperly unwind the stack, have a chance at missing an error result, and completely ignore the fact that anything can fail anyway, even a simple addition. Instead of just writing exception-safe code in the first place.


> Go makes explicit

Is kind of an understatement. If the handling code for that is duplicated as 75% of your code base, there's something wrong with the language. There's got to be some other way than all that noise.


Explicit error handling is a design choice, not a language defect. If you don't like it, you don't have to use the language. Many people choose to use explicit error handling even in languages that support exceptions. Knowing that every function call returns a value that is handled locally makes it a lot easier to reason about your program and debug it.

Also, this 75% number sounds made up out of thin air. If your program is doing something non-trivial, it should have far more code doing work than checking errors.


I asked a Go programmer how much of the code base was Go error handling boilerplate. He measured it and said 75%. I suppose it varies from code base to code base. There's no denying it's high though.

> handled locally

Except in practice, you don’t. You just keep returning the error up the call stack until something handles it at the top, probably by trying again or more likely just logging it.


I recall reading back in the 1970s or 80s that error checking and handling took 80% of the lines of code of production-ready software. That would be pure procedural code, not exceptions, not FP style. (And that's all I've got - one hearsay-level report from decades ago.)

I have not, ever, seen any numbers for exception style or FP style. My perception is that their numbers might be lower, but I have no evidence, and I am not dogmatic about my guess.


I can't really imagine 80% error handling for the whole codebase unless literally all of your functions look like this:

  int foo(Data *data) {
    int error = do_some_io_request(data);
    if (error) log_error(error, "Request failed");
    return error;
  }
For propagating errors up the stack, the ratio is only 50%:

  int error = foo(data);
  if (error) return error;
For the rest of your code, I guess it's domain specific. But most projects should have a significant amount of the codebase doing things with data in memory that can't fail.


A lot of code looks like this:

  int handle = open(file, O_RDONLY);
  if (handle == -1)
    return false;

  int size;
  if (read(handle, &size, sizeof(size)) != sizeof(size))
  {
    close(handle);
    return false;
  }

  char *buffer = malloc(size);
  if (buffer == NULL)
  {
    close(handle);
    return false;
  }

  if (read(handle, buffer, size) != size)
  {
    close(handle);
    free(buffer);
    return false;
  }
And so on, with the number of things you have to clean up growing as you go further down the function.


The alternative is using a Result or Either monad and having first-class support for monadic operations in the language, so you don't have to waste three lines on every function call just to propagate the error up.
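
For illustration, a rough sketch of what that could look like in Go with generics; Result, Ok, Err, and Then here are hypothetical helpers, not anything in the standard library:

  package main

  import (
    "errors"
    "fmt"
    "strconv"
  )

  // Result holds either a value or an error.
  type Result[T any] struct {
    val T
    err error
  }

  func Ok[T any](v T) Result[T]      { return Result[T]{val: v} }
  func Err[T any](e error) Result[T] { return Result[T]{err: e} }

  // Then chains a computation, propagating any earlier error automatically.
  func Then[T, U any](r Result[T], f func(T) Result[U]) Result[U] {
    if r.err != nil {
      return Err[U](r.err)
    }
    return f(r.val)
  }

  func parse(s string) Result[int] {
    n, err := strconv.Atoi(s)
    if err != nil {
      return Err[int](err)
    }
    return Ok(n)
  }

  func reciprocal(n int) Result[float64] {
    if n == 0 {
      return Err[float64](errors.New("division by zero"))
    }
    return Ok(1 / float64(n))
  }

  func main() {
    r := Then(parse("4"), reciprocal) // errors flow through without explicit checks
    fmt.Println(r.val, r.err)
  }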


That's just a hard-core technique for sweeping noise under the rug. It helps with this and similar cross-cutting concerns, but at a huge cost elsewhere.

We are unlikely to improve on this until we finally abandon the idea of working directly on a single, plaintext codebase.


Can you please explain, how exactly is this sweeping noise under the rug? Type system still forces you to explicitly handle the error case, one way or another.


Monadic techniques let you hide most of the noise coming from passing around the Result type, especially in code that would only pass the error state through. You still need to handle the error case explicitly somewhere, but you avoid writing error checks and early returns everywhere else. I say it's sweeping under the rug, because you still can't exactly ignore the presence of error handling when not interested in it, and the extra complexity cost of monadic mechanisms themselves still pops up elsewhere to ruin your day.


Or just, like, exceptions, still good enough. It's not rocket science, almost anything is better than Go's approach, and only C's is worse.


That's exactly what sweeping under a rug is. When you use exceptions, you throw type safety out of the window and have an implicit spooky dependency at a distance between one place in the code that throws an exception and another that catches it.


There's nothing magic-like with exceptions, and no spooky distance. It's what it looks like when you _really_ assume everything can fail. Go admits that with panics.


Of course there is. If you throw a FileNotFoundError exception from function readFile(), you have to actually read the documentation or its source code to know that you have to catch this exception when you use it (in most languages except, like, early versions of Java). The type system doesn't check it for you. And if at some point readFile() also begins throwing an InsufficientPermissionError exception, the type system, once again, doesn't tell you to fix the code that uses it.

If that's not the spooky action at a distance between the place where exception is thrown and where it should be handled, I don't know what is.


>except like early versions of Java

And also more recent versions of Java, such as the current one: https://docs.oracle.com/javase/8/docs/api/java/lang/Exceptio...

>Checked exceptions need to be declared in a method or constructor's throws clause if they can be thrown by the execution of the method or constructor and propagate outside the method or constructor boundary.

This is like... the one thing that Java did absolutely right, but for some reason it's also the thing people hate most about it? I've never understood why.


> I've never understood why.

No parametric polymorphism in exception specifications. Like if you have a record parser that invokes a user-defined callback for each record, then you literally can’t type it in such a way as to allow the callback to throw exceptions (that the caller is presumably prepared to handle, having provided the callback in the first place).

To be clear, this is not simple. It seems to me like it’ll quickly go into full-blown effect typing territory, which is still not completely solved today and was in an absolutely embryonic state in the early 2000s.


Aside from issues with the type system and composability, it's just an impractical idea that stops working smoothly in anything bigger than toy projects.

From a caller's perspective, "res, err :=" and "throws Exception" are the same thing: you need to handle the error on site, which (in contrast to the Go gospel) is not a good thing. In the absolute majority of cases it can't be done sensibly there; you need to pop it up the stack to a place which has more context. Certainly, it's none of a method implementor's business to decide "if a client can reasonably be expected to recover from the error". You know nothing, Jon Snow.

Java has figured out it's a bad concept decades ago, but Go is frustratingly re-enacting Java's early mistake. At least, redeclaring a checked exception was a simpler way of dealing with the error than Go's err ceremony.


Typically, that happens at the top layer. The API, the UI, etc. All layers in between don't care, and should not care, about this, other than correctly cleaning up. But it's not "distance". Also, it makes the correct thing easy ("catch Throwable").


Specific functions where returning a tuple of error and something else makes sense are always free to do so. Why does their existence mean that the other 95% of functions that can error need be given the wrong return type and pretend to return a tuple when they never do? (i.e. some element of the tuple will be garbage, euphemistically called a Zero Value)
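
As a tiny (hypothetical) illustration of that tuple shape in Go: on the failure path the string slot is just the zero value standing in for "no value", not a real result.

  package token

  import (
    "os"
    "strings"
  )

  // readToken returns (value, error); only one of the two is ever meaningful.
  func readToken(path string) (string, error) {
    b, err := os.ReadFile(path)
    if err != nil {
      return "", err // "" is garbage here, the zero value of string
    }
    return strings.TrimSpace(string(b)), nil
  }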


> Why does their existence mean that the other 95% of functions that can error need be given the wrong return type and pretend to return a tuple when they never do?

They don't, do they? One really nice thing about Go is writing and calling functions that _don't_ return any error. You can be confident that they will not throw any exceptions at all.


I said functions that error. If it has 'error' in its return type then it's such a function, e.g. (string, error)


You can only be confident that they won't tell you what the exception was. More likely they will panic. If you're lucky, the documentation says when.


> writing and calling functions that _don't_ return any error

You are under an illusion if you think they can't fail.


Systems programming is edge case handling.


And that's exactly why it should be expressive, self explanatory, and come at a low cognitive cost.


It's not great in some cases, but "one hand tied behind your back" really is overstating things. In most cases you probably should limit things to simple types and collections (primitives, simple collections such as structs and arrays), using more complex modellings like sum types only when there's no other good solution.


If it's considered advanced, that's only because it's been left out of many languages and so people are unfamiliar with it. It's the dual of product types, the other side of the same coin.


I said "more advanced", not "advanced".

If I see that a value can have two or more types then obviously this is "more advanced" (or perhaps better, "more complex") than if it's just one type.

Sometimes this makes things better. Sometimes it doesn't.


Programmers are in the business of understanding well-defined concepts like this, so we will cope.

> Sometimes this makes things better. Sometimes it doesn't.

Exactly, and that's why you want to have both techniques available, and the data modelling is the judicious interplay of both.

If you'll excuse me I'm going to go walk AND chew gum. Or should that be OR :)


Of course it's possible and people can "cope". A lot of things are possible and people can "cope" with a lot of stuff, but that doesn't mean it's good, isn't overly complex in some cases, or is the best solution.

This is just a dismissal instead of an argument, and one that can be applied to almost anything.


Like generics, it took a long time but Go does have them, unfortunately using the interface keyword in yet another way.


sum types are awful for data modeling once you put them in an array. So much wasted padding around the tag bit, and wasted space to allow storing the largest variant


So therefore they shouldn't exist at all? I don't understand this logic


How else would you do it?


A common technique is to use a "struct of arrays" approach, rather than an "array of structs"

This can save a lot on padding, and greatly increase the cache efficiency
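
A minimal sketch of the difference in Go (the Particle/World types here are hypothetical):

  package layout

  // Array of structs: each element carries its own padding, and a pass over
  // just X drags Y, Alive, and padding bytes through the cache as well.
  type ParticleAoS struct {
    X, Y  float64
    Alive bool // typically followed by 7 bytes of padding on 64-bit targets
  }

  type WorldAoS struct {
    Particles []ParticleAoS
  }

  // Struct of arrays: each field is contiguous, so a pass over X streams
  // through memory with no padding and better cache behaviour.
  type WorldSoA struct {
    X, Y  []float64
    Alive []bool
  }

  // SumX only ever touches the X column.
  func SumX(w WorldSoA) float64 {
    var s float64
    for _, x := range w.X {
      s += x
    }
    return s
  }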


Unless you have a tag column that's not the same thing


Bad advice IMO. Not because it's wrong - it mostly isn't - but because people will hear it and think "ah I don't need to consider performance at all until... later".

His first people is not technically wrong but it is highly misleading. Sure you can't know which bits of a program are going to be slow without measuring, but that doesn't mean you have no clue at all. You can often make a very good guess!


Yeah I find the first few rules here ... fit for a very different era.

Back then, some devs would do something much more complicated to try to be faster, and often the overall program was worse off.

Nowadays, some devs do something much more complicated, make the program 100x or 1000x slower than the simple obvious way, and think they're doing solid engineering because they're definitely not "prematurely optimizing", they made it so slow and convoluted that you could never accuse them of that!


In my experience, those 1000x slower situations are because people cobble together things without understanding their underlying performance characteristics, and there's usually a bunch of defensive copying and other redundant work going on that just keeps getting compounded. Also, dynamic languages that don't have good VMs can end up boxing even the basic numbers in the language, so everything is crazy slow because the very bottom is allocating boxes all the time and hunting for properties in polymorphic objects. In JS, some frameworks are so poorly designed that they abuse objects in ways that make it difficult for the VM to make them fast. So even reasonable-looking code is crazy slow because of hitting hidden slow paths in the JS implementation.

Another thorn is hiding RPCs, DB queries, and other remote operations behind abstraction boundaries. When people think that .getFooBar() is as cheap as a field access, they have a tendency to not cache+pass forward, but just recompute it, which ends up generating a large amount of redundant queries. That will rack up your 1000x's pretty quick.
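
A small sketch of that last point in Go (Profile, ProfileClient, and the render functions are hypothetical):

  package profile

  import "fmt"

  type Profile struct{ Name, Email, Plan string }

  // GetProfile looks like a cheap getter but is really an RPC per call.
  type ProfileClient interface {
    GetProfile(id string) Profile
  }

  // renderBad issues three hidden RPCs for the same user.
  func renderBad(c ProfileClient, id string) string {
    return fmt.Sprintf("%s %s %s",
      c.GetProfile(id).Name, c.GetProfile(id).Email, c.GetProfile(id).Plan)
  }

  // renderGood fetches once and passes the value forward.
  func renderGood(c ProfileClient, id string) string {
    p := c.GetProfile(id)
    return fmt.Sprintf("%s %s %s", p.Name, p.Email, p.Plan)
  }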


wow, I didn't get that impression at all. And he clearly does say "Measure", not "shut off your brain".

I would assume that if you measure enough things and code enough you may start to get a feel for what is going to be expensive and what is not. And then - as before - you can continue to measure and iterate.

(Also, I think you said "first people" when you meant "first point")


> And he clearly does say "Measure", not "shut off your brain".

Yes but he doesn't say that if you don't measure (which most people won't) then you should still engage your brain.

That's the missing point. Measuring is typically more effort than applying a little brain power and experience. So the choices shouldn't be "measure or nothing", but that's how people always interpret it.


The key thing that drives rule 1 and 2 is the assumption in rule 4: all other things being equal, terser code is easier to work with. The cheapest code to maintain is the code that doesn't exist.

Therefore, given a choice between writing a little code to make the program work or writing a lot of code to maybe make the program work faster... Write a little code and go back and write the longer code iff it will actually help performance as per measurements. You're better off expanding a simple skeleton than moving bones around in an already-complex assembly.


Yes, but for the 100th time, you don't get to just ignore performance because you haven't measured it.

The fact that so many replies are not getting this goes to show how misleading this advice is.


I don't see Pike recommending one ignore performance and I assume given the target audience for the advice, the audience is assumed to know that you can't ignore performance in the same way that a chef knows you can't ignore presentation.

The focus of the advice to me is that software engineers frequently overestimate their ability to predict performance bottlenecks and efficiently allocate time to preemptively address them in the middle of writing the core of the system (for multiple reasons, including both the overall complexity of a complete system and the fact that performance is also a function of the way users interact with the system).

I've built multiple systems where a component we never expected to become a bottleneck becomes the bottleneck because users choose to use the machine in an extremely novel way relative to our intent when we built it. Getting the software in front of users to get signal on the difficult-to-predict user factor was consistently a better use of our time than swapping out a functional n squared algorithm with an also functional n log n algorithm on a piece of the system that no users were operating with yet.


> I assume given the target audience for the advice, the audience is assumed to know that you can't ignore performance

That's why it's bad advice! Perhaps I should rephrase: it's not incorrect advice (to the right target audience), but it most definitely is inadvisable advice.

Who is the audience for the famous "premature optimisation" quote? Maybe originally it was "people who know you shouldn't ignore performance", but it definitely isn't now.

Easily misinterpreted advice is bad advice in my book.


> Sure you can't know which bits of a program are going to be slow without measuring, but that doesn't mean you have no clue at all. You can often make a very good guess!

You can make a good guess! Just, measure it to be sure before you start optimizing.


No, that's my point! Measuring is good, but some things are obviously slower.

For example preallocating arrays. You don't need a benchmark to tell you it will be faster because it literally can't not be.

Another example from my recent real life: I changed a data structure from a hashmap to an array with integer keys. I didn't profile it, because it can't not be faster. And I knew it needed to be fast because it is called for every branch in my program.


> For example preallocating arrays. You don't need a benchmark to tell you it will be faster because it literally can't not be.

This assertion is false for JavaScript arrays in the v8 engine in some contexts. Preallocation of a JS array causes the engine to make a heterogeneous array, which is more expensive because it's an object under-the-hood. Allocating the array using repeated `push` calls of homogeneous data lets the engine know it's a homogeneous-type array, which is an "exotic object" that can use more optimal representations under-the-hood. (Faster than both is preallocating a homogeneous array, although that of course requires us to know what kind to preallocate).

This is another example of where premature optimization can lead a developer astray. Abstractions are far too complicated these days, and it's too easy to guess wrong without hard data.

https://dev.to/henryjw/populating-a-pre-allocated-array-slow...


I was talking about C, C++, Rust, Zig, etc.


you actually can't though.

Forget everything you have learned about the famous big-O analysis for 99% of the cases, because it assumes a computing model that is nowhere near what we have today. It was close in the '80s, but now it's totally wrong.

The most glaring example I can offer is that nowadays a data structure based on a linked list will almost always be slower than one based on arrays even though the big-O analysis says otherwise.

The CPU cache plays a much bigger role, and it pays more to chase consistent cache access rather than jumping all over through pointers and thrashing the cache.

Likewise, most algorithms would be faster looping through all array items rather than, for example, using a set or hash map when the number of items is small (and by small we are still talking about hundreds of elements; the exact number where one data structure becomes better than the other will depend on many factors).

That's why "don't assume, measure" is the best advice there is.
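
In that spirit, a benchmark sketch in Go (hypothetical sizes and data): drop it into a _test.go file and run `go test -bench=.` rather than trusting either of us.

  package smallscan

  import "testing"

  const n = 64 // "small": adjust and re-measure, the crossover point varies

  var (
    keys  []int
    table map[int]bool
  )

  func init() {
    keys = make([]int, n)
    table = make(map[int]bool, n)
    for i := range keys {
      keys[i] = i * 7
      table[keys[i]] = true
    }
  }

  func BenchmarkSliceScan(b *testing.B) {
    for i := 0; i < b.N; i++ {
      target := keys[i%n]
      found := false
      for _, k := range keys {
        if k == target {
          found = true
          break
        }
      }
      _ = found
    }
  }

  func BenchmarkMapLookup(b *testing.B) {
    for i := 0; i < b.N; i++ {
      _ = table[keys[i%n]]
    }
  }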


> you actually can't though.

Well, yes I can because I know everything you just said already...

> a data structure based on a linked list will almost always be slower than one based on arrays even though the big-O analysis says otherwise.

Cache locality is one reason that linked lists are usually slower, but I think you've got a bit mixed up, because the big-O analysis also says they'll be slower (in most common cases).

> that's why, don't assume but measure, it's the best advice there is.

You missed my point (thus proving why this is misleading advice!)


May not be a very popular opinion, but rules #1 and #2 have been used as a crutch to create bloatware and slow-as-dogshit software, always dismissed under the guise that someone is going to find the time to retroactively implement some tracing and introspection system and then rewrite portions of the application in magically isolated interfaces that will make the slow POC fast. It very rarely happens, and is usually in the form of a complete rewrite that is forced onto a short timeline, so... you can't possibly be so new as to prematurely optimize! (Just rewrite it in a new stack, it will solve it somehow.)

*Edit, to offer something less whiny: I think optimizations that rely on specific non-core business rules are a root of evil, but you should still be properly indexing your queries, not doing them in loops, trying to make parallel work run in parallel, not writing a ton of junk over the wire whenever you can avoid it, being conscious of nesting any loops, and generally the best optimization is even listed: choosing the correct data types for a problem.


> you should still be properly indexing your queries, not doing them in loops, trying to make parallel work run in parallel, not writing a ton of junk over the wire whenever you can avoid it, being conscious of nesting any loops, and generally the best optimization is even listed: choosing the correct data types for a problem.

I don't think anyone disagrees with that, but the indexing is a good example of "measure, don't assume" because SQL engines can do surprising things, and the best way to know what to index is just to measure (and sometimes no index is actually best!)

And "completely pretend performance is not a thing" is of course the other extreme from "optimise everything from the get go".

Unfortunately there are always people who use these sorts of "rules", "laws", and "best practices" as cudgels to beat other arguments with, rather than as the loose commentary they usually are. Previous comment: https://news.ycombinator.com/item?id=36417264


Most of those are bad. I don't know why people keep rewriting and perpetuating those rules.

For #1 and #2, well you should know beforehand. Don't you know what your software does? The details of how the time is spent surely will be surprising, and there may be a hidden bomb here or there, but if you can't see from the requirements where most of the time will be spent, you have a problem that you should work on fixing.

Of course, that doesn't mean you should go and optimize it. You should have an idea of your implementation performance before writing it, and you should be able to tell if it's acceptable. But heavy optimization does need profiling.

People are repeating that bastardization of the "avoid premature optimization" for decades. Go follow the original, with its nuance; those generalizations are trash.

On rules #3 and #4, the fancy algorithm is usually already written and well debugged. Go use it. The performance when n is small usually doesn't matter. If it's not written, then go read the thing about optimization. You should know how big n will be. If you don't know, ask around.

Rule #5 is the only one that actually stands without much nuance. There are exceptions, but not many.


As for #1 and #2, you really DO need to measure. Over the years I've spent a lot of time optimizing other people's code and many performance problems are quasi-bugs. Doing things like inserting data into sorted data structures (as opposed to doing all your inserts and THEN sorting) aren't so much "This is an obvious hot spot from the problem definition" as "A poor decision made this a hot spot."

Profiling finds this sort of low hanging fruit quite easily.


So, you had a performance bomb somewhere. That doesn't change the fact that you know beforehand what kind of problem takes a lot of CPU time and what doesn't.

Profiling doesn't find things of the kind of "this takes a lot of computer time, we should program everything around it"; "this takes a lot of computer time, are you sure we can run it profitably?"; or "this takes a lot of memory, are we willing to get new servers?". Besides, the worst moment to optimize your code is after everything is written and people depend on its interface.

Profiling is some duct tape you can use to cover small problems, and mostly only that.


> you know beforehand what kind of problem takes a lot of CPU time and what doesn't

Do you? Do you really know the ins and outs of the optimizations of the CPU you're going to be running on? If you do, you're in a specific domain that may not match to many other developers today.

Modern CPUs are incredibly complex machines running emulation of a PDP-11 single-threaded architecture. If you're starting from the beginning with a belief that you know what the source code you write is going to do on the CPU to a level of precision that means you can skip rule 1 and 2... Most people who believe that are flatly wrong. And if you aren't, you're probably doing embedded computing, which follows its own set of rules (Rob's rules are a lot more tuned to business, cloud, and consumer-desktop-environment computing).

> the worst moment to optimize your code is after everything is written and people depend on its interface.

Honestly, that's exactly opposite of my experience. The interface is designed apart from the machine, and as a software engineer it's my job to fit the machine to that interface. Because the user matters most.


> Do you?

If you didn't, you wouldn't know the broken code needed fixing.

How do you differentiate code that is using the wrong data structure from the one that is solving a hard problem?


> How do you differentiate code that is using the wrong data structure from the one that is solving a hard problem?

You don't until you have to. At the limit, every piece of code is using the wrong data structure; the hardest problems get addressed by building custom hardware to maximize speed by stripping away every other possible concern in the transformation of the symbols and building as many of those as you possibly can force to work in parallel.

Almost nothing is that important, so when your program is too slow, you search for what is taking the most time and do the cost/benefit analysis of re-representing the problem in a way that matches the existing hardware better (which encompasses everything from requiring fewer steps in the computation to caching the right data to caching the right data in the right location, i.e. L0-L1 cache performance, all the way to, sometimes, pushing the computation onto a GPU or out into multiple parallel execution nodes in a cluster if the problem demands it).

None of this is black-and-white and there are degrees. You can certainly, during the design phase, look at a problem and go "That's probably going to want a cluster." But the point of Pike's 1 and 2 is "If you don't have a firm reason to believe a more complicated approach should be taken, err on the side of the simpler approach because it's easier to grow a simpler approach than move a complicated approach."


> At the limit, every piece of code is using the wrong data structure

So, in the example you commented on earlier, you inspected every single function, sorted by execution time, until the program was fast enough?

> do the cost/benefit analysis of re-representing the problem

How do you do that without knowing the speed you can get with another representation? You rewrite it in every possible way and test?

> But the point of Pike's 1 and 2 is "If you don't have a firm reason to believe a more complicated approach should be taken, err on the side of the simpler approach

Yet we are, what, 5 messages down a thread where people vehemently deny you can have reason to believe anything.


> Yet we are what, 5 messages down a thread where people vehemently deny you can't have reason to believe anything.

No, nobody's denied that at all. https://news.ycombinator.com/newsguidelines.html "Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize. Assume good faith."

The argument made by eschneider and reinforced by myself is "People usually guess wrong when they try to add performance optimization at the design phase," not "epistemology is dead." It's a soft rule from experience, not a black-and-white hard rule; most everyone who's been programming for years has a story about the time they built a complicated gadget that ended up being a waste of time, because it either wasn't on the critical path where the bottleneck showed up or users didn't choose to use the software in a way that aggravated that bottleneck anyway.

I'll share my own. Way back in the day I hammered for weeks on getting the physics model right for a nascent game engine. Started from an open-source option and then beat the hell out of it because I knew physics was likely to scale badly, was easy for developers to "hold wrong," and needed to be plug-simple to use and understand.

In the end, 95% of our users only used the game engine as a rendering architecture and never turned the physics engine on. All of that optimization time was zero-business-value.


> you should know beforehand. Don't you know what your software does?

If you think you know beforehand without measuring it I suggest you do measure it now and then. Because if there's anything I've learned about performance it's how terrible one's intuition can be about performance.

> The performance when n is small usually doesn't matter.

Sure, when the problem is simple enough that there's only N and not also M and K.


Oh, sure, I'm not saying you shouldn't measure. What I'm saying is that you should know beforehand (with much less precision), and on the many decisions you have to make you should apply that knowledge, not go with YAGNI every time without questioning it.

YAGNI is not at all a good principle. It only works as a slight bias. If you use it as any stronger kind of rule, it's disastrous.



I never voiced my opinion on programming much because I am self-taught and I do C++, a language with a tendency to make you feel that you don't know much in general. It's just a feeling, but I think you are right.



