Fundamentals of Optimal Code Style (optimal-codestyle.github.io)
82 points by todsacerdoti on March 25, 2021 | 50 comments



The "Building the Visual Structure" starts with an example that's doing way too much, to start. This analysis talks a lot about "style" without taking into consideration practical problems...ie

Option 2: inner alignment - what happens when your param names are longer than your call statement prefix? It's fucked visually.

    try foo (at: tempStoreUrl,
    destinationOptions: dstOptions,
    withPersistentStoreForm: storeUrl,
    sourceOptions: srcOptions,
         ofType: store.type)
So Option 1 is better (by his own criteria), but more to the point: why group the first argument into the prefix of the call (or the last argument into the closing line)?

    try coordinator.replacePersistentStore (
                                           at: tempStoreUrl,
                                           destinationOptions: dstOptions,
                                           withPersistentStoreForm: storeUrl,
                                           sourceOptions: srcOptions,
                                           ofType: store.type
    )
This whole thing feels rather inconsistent and poorly thought out at times.


Maybe I'm just a sub-optimal code monkey, but a lot of what TFA advocates for seems bananas. Most egregious was separating function names from the opening bracket for their arguments.

I don't want to start a fight about that, but I do want to draw attention to the paradox that the "optimal" readability is so jarring to read.


Yeah, it's interesting that they present two choices, but not my preferred choice, thus leaving me to wonder why their options are supposedly superior. E.g. out of

    result = someFunction(firstArgument, secondArgument);
    result = someFunction (firstArgument, secondArgument);
They view the second as easier to distinguish the function from the arguments. I agree. But then again, I generally might write that like so:

    result = someFunction( firstArgument, secondArgument );
Which I think accomplishes the same purpose, and as a style has the added benefit of making KV pairs more distinct if you group them:

    result = someFunction( argX="foo", argY="bar" );


By the second paragraph (not counting the introductory quotes), I had the uneasy feeling that the author was basing this essay on a restricted, and not very relevant, concept of readability.

When I read code, or any other highly technical prose[1], I am trying to get into the mindset of the author: what problem are they attempting to solve here, and how did they envision this code advancing that purpose? What must one assume about the problem and the author's solution in order to be persuaded that it will be successful?

If this sounds too hand-wavy, there are more pragmatic issues, such as: what, beyond the specific predicate, is this conditional testing for? And/or why here? After this method call, what has been changed? How could this code have produced that result? Often, the answers to these questions are obvious, but it is the cases where they are not that determine readability.

Indentation is remarkably effective in helping one read code, perhaps unreasonably raising expectations that other changes to the layout of code will be similarly effective, but I personally have not seen anything else that comes close, and the nature of the problems to be solved makes me doubt that anything will.

[1] One could argue that code is not prose, but one goal of higher-level languages is to make it more like prose, and I think my points here are relevant regardless of whether one thinks this is the case.


It is sometimes confusing that people use "readability" to mean textual readability - i.e., a visual and positional property of the code.

What I would consider readability is probably closer to what you might call 'ease of understandability'. For example: how much of the code do I have to keep in my head in order to make sense of what is on the screen? How much of the underlying algorithm must I understand before the code can make sense? Is there any special language quirk or knowledge needed to understand a line of code (operator precedence rules, for example, often cause problems here)?
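A quick sketch of the precedence kind of trap (TypeScript purely for illustration; the names are made up):

    // `+` binds tighter than `?:`, so the condition below is the
    // always-truthy string "total: " + isBulk, not isBulk itself.
    const isBulk = false;
    const wrong = "total: " + isBulk ? "bulk order" : "single item";
    console.log(wrong);  // "bulk order" -- probably not what was meant
    // Parentheses restore the intended reading:
    const right = "total: " + (isBulk ? "bulk order" : "single item");
    console.log(right);  // "total: single item"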

I don't care that lines are indented "wrong", or that they use ternary expressions, etc., unless it makes a material difference to how hard the code is to understand.


Linus Torvalds disagrees about line lengths for what it's worth: https://news.ycombinator.com/item?id=23356607


I find it difficult to have meaningful variable/function names and stay under 80 characters.

Plus, with code completion (especially in a strongly typed language) it’s no longer necessary to optimize for fewer keystrokes.


I do think keeping lines around 80 characters makes code more fluent to read. It is not about saving keystrokes or being able to split the screen even on smaller displays.

But what has worked better for me is to treat it as a soft rule, not a hard one. Better to have some lines go over a bit than to force awkward breaks that make the code harder to read.


The main problem is that code isn't like other text.

Strict rules are BS because they count characters that don't contribute to the actual content that you're parsing.

This includes punctuation and especially whitespace. I tend to keep lines short in general too, but strict limits just for the sake of it are indeed more hindrance than help.


At one point he used to be a fanatic about it. The VIM configuration line specific to Linux kernel development originated from that insistence on 80 chars per line in the early days.


It made a lot more sense back then as well.


Count me in. I'm pretty tired of 80 (79) column limits.


Interestingly the text (not code) of Torvalds' posting discussed there is formatted at 70 characters wide.


I'd be curious to see experiments with variable numbers of line breaks to create additional structure. I find my own code significantly more readable by separating certain "sub-blocks" of code with two line breaks, with their internal structures having single line breaks.

  thing_1_a
  thing_1_b

  thing_1_c

  
  thing_2_a
  thing_2_b
It fits into the framework proposed here but is not mentioned specifically as having been investigated.


In some long functions (really only very long app-setup functions), I break up related bits into blocks:

  {
    thing_1_a
    thing_1_b
    
    thing_1_c
  }
  
  {
    thing_2_a
    thing_2_b
  }
Blocks are a language feature in a few languages, but they are barely ever used, so it is rather weird to write and read at first.


Note to onlookers: blocks can sometimes act weird in JavaScript. I loved using them, but alas.
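For the curious, one classic example of the weirdness (a TypeScript sketch; not necessarily the exact issue I hit):

    // A bare block does scope let/const, but `var` (and, in sloppy mode,
    // function declarations) ignores it -- a common source of surprise.
    {
      const setupTempDir = "/tmp/app";  // hypothetical setup value
      var leakedFlag = true;            // function/module-scoped: escapes the block
    }
    // console.log(setupTempDir);       // compile error: not visible out here
    console.log(leakedFlag);            // true -- the block didn't contain it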


Not just in JavaScript. Since blocks define a scope, visibility and unexpected lifetime issues can occur in other curly-brace languages like C and C++ as well.

Using blocks like that is something you should be very careful about - it might have some surprising effects, especially for novice programmers.


The blocks here act exactly the same as any other blocks, like the ones you write for loops and ifs etc. So I wouldn't exactly call the consequences "unexpected".


I can't recall the details right now, but I remember hitting errors that would not happen inside a normal for-loop block. And I was using let/const, not var.


The problem is, to most people it just looks like you made a mistake. If you want to create breaks in your code, consider adding comments or breaking blocks out into their own functions.


Does it look like I made a mistake to most people? I've never received a comment in a PR about it and I observe other developers using the same technique, so I'm surprised to hear this, do you feel very confident in this assessment? Just trying to gather feedback here.


Maybe not a mistake, but often a code smell, yes. If you add blocks, it's usually because you want to group things together and avoid scope leakage (and if that's an issue, then you're probably doing similar things multiple times). Often a function will be a better fit here.


Tangential at best, but I derive great benefit personally from extensive use of `paredit-mode` (even in curly brace languages) along with brace matching and block highlighting when trying to figure out the structure of a piece of code.

I imagine there are similar tools available for other environments.


The conclusions are similar to what I've seen in this talk by Kevlin Henney: https://www.youtube.com/watch?v=ZsHMHukIlJY

And I think they are pretty reasonable.

I've used camelCase a lot but I am about to switch to the good old underscore_style as it really might be easier to read.

And there is no doubt in my mind that alignment, spacing and structure are almost as important as good naming.

And of course, Allman style braces.
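Roughly the contrast I have in mind (an illustrative sketch, not taken from the talk):

    // Same tiny function, Allman braces, two naming styles.
    function parseHeaderLine(rawInputLine: string): string
    {
        return rawInputLine.trim().toLowerCase();
    }
    function parse_header_line(raw_input_line: string): string
    {
        return raw_input_line.trim().toLowerCase();
    }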


While Allman style can be very pleasing, it requires the right language, as certain language features will make it off-putting to see it used in some places and not others. For things like C, it makes sense. For languages that use braces for blocks inline (e.g. Perl), you're going to constantly have a mix of Allman style braces and inline/infix braces if you use the language features in a useful manner.

Put another way, style is something that should be chosen per language, perhaps with very similar languages having a style that it makes sense for you to share across them. A blind adherence to any specific style across languages is not necessarily useful. This is obvious when you consider languages as disparate as C, Lisp and APL, but it applies similarly, if more subtly, to languages that aren't nearly as different as those are from each other at first glance.


I have yet to see a good case for alignment over indent-wrapping.

For example, if there is a map of object properties that is ~20 or so items long, aligning each value cuts down on the jaggedness of the lines. I see that jaggedness as an advantage: each line is slightly more distinct, and thus easier to scan in context. Additionally, the key at the front of the line is generally what you are searching for, so alignment does nothing for searchability anyway.
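Roughly what I mean, with made-up keys (TypeScript object literals as a stand-in for any map):

    // Value-aligned: a tidy right-hand column, but every line has the same shape.
    const aligned = {
      host:        "localhost",
      port:        8080,
      retries:     3,
      maxPoolSize: 16,
    };
    // Unaligned: a "jagged" right edge, so each line is easier to pick out while scanning.
    const jagged = {
      host: "localhost",
      port: 8080,
      retries: 3,
      maxPoolSize: 16,
    };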


In a grand tradition of HN comments opting for a facile critique of a meta aspect of TFA, I have to say that it's difficult to take seriously a post about "optimal code style" that is presented in a bold and italicized, blocky, nigh-unreadable font (and a title-font, at that, not one meant for main text)...

[1] Edit: I opened it on my phone and it looks ok-ish, to the point that many won't understand what I meant above. Well, here's how it looks on Chrome/macOS desktop: https://imgur.com/a/YqycyNc - Damn, it's also justified, to add insult to injury (since the web, as fellow commenter CharlesW below says, just does crude unreadable justification).


Also, today I was reminded that the web still does not handle justified text well; justified text can already be a challenge to read even when properly and thoughtfully typeset.


One thing a lot of these analyses supposedly grounded in cognitive science ignore is that the human brain is crazily adaptable. For example, just look at reading. Who would have thought that the human brain can process “fruit” about as quickly as seeing an image of a fruit, even though the glyphs of “fruit” are nothing like a real fruit?

In the same way, I believe consistency is probably a more important thing than actual indentation and spacing. If it sees it often enough, the brain can pretty quickly adapt to about anything.


> In the same way, I believe consistency is probably a more important thing than actual indentation and spacing. If it sees it often enough, the brain can pretty quickly adapt to about anything.

Sure thing. The author just argues that we can help the average brain do it quicker by adhering to some simple rules. But yeah, consistency is the main factor - it takes off a whole bunch of cognitive load.


I have been coding for a long time and I believe more and more that you can put as much science in as you want. But in the end these things are highly subjective. Some people like it one way, some the other way. And in the end you can get used to almost anything once you have been doing it for a while.

I wouldn't mind if languages left no freedom for formatting and enforced a style. People would get used to it and a lot of ultimately fruitless discussions wouldn't happen.


> And in the end you can get used to almost anything once you have been doing it for a while.

No. I never got used to line lengths above 80 chars. It hurts.


> I wouldn't mind if languages left no freedom for formatting and enforced a style. People would get used to it and a lot of ultimately fruitless discussions wouldn't happen.

That's exactly what enforced style guidelines are for. Most big software companies do exactly that. They define a set of rules, and those are enforced using tooling (i.e. you cannot check in and/or successfully build code that doesn't adhere to the guidelines).

It's a pain sometimes (especially if line-length restrictions are enforced), but it frees your mind and that of the entire team from the fruitless discussions you mentioned.
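For example, in the JS/TS world one common (illustrative, not universal) setup is a checked-in formatter config plus a CI check:

    // prettier.config.js -- committed to the repo so everyone's formatting matches.
    // The option names are real Prettier options; the values are just an example.
    module.exports = {
      printWidth: 100,
      tabWidth: 2,
      semi: true,
      singleQuote: true,
      trailingComma: 'all',
    };
CI then runs something like `prettier --check .` and fails the build on violations, which is what turns the guidelines from aspirational into enforced.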


It's less about "one true way" and more about "pick a way and stick with it".

Almost none of this matters until it's not consistent, then it makes everything harder.


5080 words and numerous illustrations before you get to see any code - it seems to me this shows only one of Larry Wall's virtues of a programmer.


Heads up; this article is only a thoughtful proposal for good style, not a proven "optimal" style as I initially believed.


> When I speak of objective readability, I mean that the full readability of a text consists of a subjective component, caused by developed habits and skills, which we talked about above, and an objective one.

> Thus, the subjective component is associated with some private habits that can be changed, and the objective – with the general psychophysical features of a human's vision, which we do not assume is possible to change. Therefore, when talking about optimizing the text of a program, it makes sense to talk only about the objective component of readability, and therefore further in this article, the term readability will always refer to this component of it.

Although this separation may be necessary to set the scope for the rest of the text, a lot of which is about cognitive processes that are "wired" into the human visual system and cannot be changed by learning, the above somehow does not ring true for me as a generality, and that detracts from the text.

Why it isn't ringing true has less to do with the subjective/objective separation per se and more with the author's relegation of learned skills into the realm of what is "subjective".

When people acquire skills, and that makes some tasks easier compared to untrained people, that difference categorically is not subjectivity. It is not subjectivity because we can reproduce the training effect in person after person, and even measure it with numbers that we can plot on nice graphs, seeing, for instance, that similar "learning curves" are consistently reproduced in different people.

When we optimize we cannot discount this "subjectivity". We must assume it. We must assume it because the purpose of optimizing the activity, such as reading code, should be geared toward someone who will be doing it a lot. Someone doing the activity a lot will learn something, whether we like it or not, and that learning will reduce their difficulty.

To use a code optimization analogy, modeling the human as static is like generating code based on the machine not having branch prediction or caching.

I think the subjectivity we should discount is in how we measure readability. If we, say, ask subjects to rate the readability from 1 (very poor) to 5 (very good), that is subjectivity we might not want.

Lastly, if you're going to tell the reader you won't be discussing subjective aspects, then it behooves you to subsequently refrain from wording like "the program text often looks like one poorly structured massive chunk" and "placing the last open brace at the end of the line looks quite natural, and the resulting space only adds a small accent, compensating for the small visual mass of the last block".


I think the author's POV still stands; it just perhaps needs some massaging of the framing.

> When people acquire skills, and that makes some tasks easier compared to untrained people, that difference categorically is not subjectivity. It is not subjectivity because we can reproduce the training effect in person after person, and even measure it with numbers that we can plot on nice graphs, seeing, for instance, that similar "learning curves" are consistently reproduced in different people.

It sounds like you're referring to convention. Adhering to conventions a reader is familiar with definitely improves readability, but _any_ style can be improved by adherence to convention, so we want to ask, what is the most readable style before we start layering on conventions?


Not having "reader mode" available for this webpage irritated me greatly because I hate serif'ed fonts and prefer how I have my reader mode configured.

My "rules" for readability. YMMV

Opening braces are syntactic sugar to allow for single line statements without braces. Using what the author calls ITBS and requiring braces avoids this.

Putting the "else if" and "else" on a separate line avoids the "} else if () {" structure and puts the "else" at the same syntactic level as the "if".

snake_case variables areMuchMoreReadable than camelCase.

Spaces (after_function_names) make it difficult to distinguish function calls from language keywords like while () and for () (in C/C++).
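For example (hypothetical names, and TypeScript rather than C, but the effect is the same):

    // Hypothetical helpers so the snippet compiles.
    let pending = 3;
    function items_remain(): boolean { return pending > 0; }
    function process_next_item(): void { pending -= 1; }
    // With a space after the function name, calls read like keyword constructs:
    while (items_remain ())
    {
        process_next_item ();
    }
    // Without the space, calls and keywords stay visually distinct:
    while (items_remain())
    {
        process_next_item();
    }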

Excessive attempts at alignment on ":" or "=" boundaries, or using a convenient arrangement for a particular group of declarations, lead to inconsistencies that cannot be dealt with by an automatic formatter.

Function parameter declarations should be aligned one indent further than the name of the function with the brackets used in the same way as braces, so something like:

    function my_function(
                param_1,
                param_2
    ) {
        // body
    }


An awfully pedantic tome for a problem of one's own making. Compose and you won't have any readability problems.

People write a method/function and then don't bother to pull out blocks of specific functionality that can be contained within other "helper" methods.

Your 100+ line method can always be refactored into a 20-line method that calls the smaller methods needed to compose its total functionality.
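Roughly the shape I mean (a sketch with made-up names and a toy domain):

    type Order = { id: string; total: number };
    function parseOrders(raw: string): Order[] {
      return raw
        .split("\n")
        .filter((line) => line.length > 0)
        .map((line) => {
          const parts = line.split(",");
          return { id: parts[0] ?? "", total: Number(parts[1]) };
        });
    }
    function dropInvalid(orders: Order[]): Order[] {
      return orders.filter((o) => o.id !== "" && !Number.isNaN(o.total));
    }
    function totalRevenue(orders: Order[]): number {
      return orders.reduce((sum, o) => sum + o.total, 0);
    }
    // The 100+ line version would inline all of the above into one body;
    // this top-level method instead reads like a table of contents.
    function summarizeOrders(raw: string): number {
      return totalRevenue(dropInvalid(parseOrders(raw)));
    }
    console.log(summarizeOrders("a,10\nb,20\n,oops"));  // 30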

Not only can I now read it, I can understand it.

Giant blocks of code manufacture bugs.


It can be the exact opposite too (which is why a shorter rather than longer function is a tradeoff that should be questioned each time, as opposed to being a clear winner).

A 250-line function might better encapsulate the required logic to understand a piece of code.

One function composed of ten 20-line functions might require hunting around to get the point and understand the big picture, with everything always happening "somewhere else".


Nope. All the methods should clearly name the functionality. No guesswork. It reads like prose.


I respectfully disagree. If those functions are only used in this one place, I'd much prefer to have the code in one large block. John Carmack makes a good argument for why: https://news.ycombinator.com/item?id=25263488


I have worked on code bases where they followed the 20-lines-per-method rule religiously. It often resulted in artificially broken-up functions that were way harder to read than one big function. I would call it "small function spaghetti code".


So don't artificially break them up?


Refactoring a larger function into a group of small ones doesn't come for free. Even though it might often be a good idea, when it's not done with care it can confuse, not clarify. It comes down to a matter of judgment of the whole and deliberate expression. That's contextual, not absolute.

With that said, I somewhat agree with the spirit of what you are saying. I tend to lean in the direction of smaller methods when possible because they often make things more readable and more explicit.


If it isn't clear, it's not easy to change, which makes it "bad code."

Do what needs to be done to make the intent of a block of code super clear, in plain English. Comments don't count, as illustrated clearly in The Pragmatic Programmer, for instance.


As much as I like this idea in theory, in practice a lot of the hairiness and opacity of code I've seen comes from the business domain itself, not the code. There is only so pretty you can make code that has to do painful and unintuitive tax calculation, or credit underwriting logic. But oftentimes this code is the most critical to the business as a whole. It is code which is counterintuitive because it reflects a reality which is itself painful and unintuitive.

When it comes to writing clear code in reality, I'll repeat what I said before -- there aren't absolute rules, and things are contextual. Where it's possible, it's absolutely ideal to keep things compact and expressive. But that only will take you so far. When you run into domain logic that is to an extent incompressible, trying to compress things further will only impede clarity.


The rule of thumb of 80 chars, which was (and sometimes still is) applied to width, also applies to height: a function should not be more than 24 lines :) If it is longer than that, it needs to be refactored. This helps readability, as one does not have to scroll to read the entire function. Of course, these rules came into being when terminals were 80w x 24h. This was a guideline for Linux kernel developers for many years.


I think strict adherence to such rules is silly. Some 20-line functions are massively more complex than some 120-line functions.

If the computation is simple and linear, then long functions are fine. If there are nested loops and if statements and other forms of flow control and deep nesting, then long functions are painful.



