I've written code numerous times to deal with files, and if there's anything more than feeding the path as given into `open()`, then the code inevitably gets filled with variables of related names, and if you stare at it long enough, they all blur together: 'filename', 'name', 'file', 'files', 'file_handle', 'path', 'dir', 'dirname', 'parent_dir', 'file_ext', 'basename'...
Do I want 'filename', which is whatever was given?
Do I want 'file', which is the file object?
Do I want 'name', which is just the name of the file, with directories, if any, stripped off?
Do I want 'files', which is the list of files, as given?
If copying/moving is going on, then there's a source and a destination and it gets even worse.
The obvious way out is to refactor by renaming those variables, but 'unsafe_filename_as_given_by_user' or 'list_of_files_as_given' doesn't quite roll off the keyboard.
>The obvious way out is to refactor by renaming those variables, but 'unsafe_filename_as_given_by_user' or 'list_of_files_as_given' doesn't quite roll off the keyboard.
That's absolutely the way to do it.
"unsafe_filename_as_given_by_user" is clear, precise and a much better name than just "filename" when used in a large scope. It's a stepping stone to better code.
The length and unwieldy nature of the specifically named variable is a code smell, but it's a code smell indicating a different problem this time - the problem is that the variable's scope and/or context is too large.
When I set out to refactor a large code base, I almost always start by increasing variable name length to prevent name conflation, ending up with the 'unsafe_filename_as_given_by_user' kind of name, and then bring it down again later, when short variable names in a short scope are precise enough.
You only need to type unsafe_file_name_from_user once. After that, even the simplest, language-agnostic autocompletion of identifiers from the current buffer goes a long way. A proper IDE makes it even simpler.
I have found that when I start reaching for long variable names it's because there is a design problem. Ideally the use of the variable should be obvious from the context. As you find yourself adding more and more context to the name, you should realise that it's because you lack context in your design.
Deriving meaning from the context requires understanding the context. That's fine for the original author or the refactoring dev, but the troubleshooter swooping in to fix that problem that's obvious in the UI needs to be able to understand the variable names with as little context as possible. "Well, if you'd just spent a few hours to truly grok the intent of the code, ..." is not an acceptable answer when it's possible to make the solution transparent as easily as making more descriptive names.
Descriptive names are great when a symbol is used in many scattered places. Context is okay when the definition is right in front of you. When a function is short enough, context for a parameter should mean reading the function's documentation written immediately above it. Or if the scope of a local variable is three lines, its initializer often explains what it is, and it's right there.
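A minimal sketch of that last point, using a hypothetical `count_blank_lines` function: the parameter is described in the docstring directly above its only use, and the counter's whole life fits in a few lines, so neither needs a longer name.

```python
def count_blank_lines(text: str) -> int:
    """Return how many lines in `text` are empty or whitespace-only.

    text: the full contents of a file, already decoded to str.
    """
    # `n` only lives inside this short loop; its initializer and the
    # docstring above already say what it counts.
    n = 0
    for line in text.splitlines():
        if not line.strip():
            n += 1
    return n
```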
Very well put. I wish everybody could get this concept. With this kind of attitude, you are going to write code that is easier to read, is easier to update and debug, and has fewer bugs in the first place.
Code with long identifiers is harder to read, not easier to read. It's harder to see the flow of data and the flow of control, because there is invariably a soup of common prefixes throughout, performing the contextualization.
If you're in the middle of the spreadsheet reader, it should be self-evident that 'row' means a row in the spreadsheet; if you're in the middle of the database insertion logic, it should be self-evident that 'row' means a row in the database.
I use 'row' as a variable name quite a lot, especially in loopy situations where it's obvious. The more problematic cases are when you have things like a source and a destination row dealing with, say, cows, and you call one of them 'row' and the other one 'cow_row', and in different places pick which kind of row is which, seemingly at random.
Or the situation with the file names, as someone else mentioned at more length.
Long variable names are a smell: they mean multiple concerns have been mixed in a single body of code / scope, and the long names are required to disambiguate them. They're a sign that something has been designed incorrectly.
If your functions are short, and logic straight, you usually don't end up with overly long lines and deep indentation. It's kind of similar to enforcing 8-space tabs, only with space consumed by a more expressive approach.
I often notice that I can remove a comment line after renaming a few identifiers in the code below it, because the code starts to read self-explanatory enough.
Identifiers have the same problem that comments have: the thing identified can grow different responsibilities than its name. You don't get magically updating identifiers any more than you get magically updating comments.
I've found that people who believe too much in the ability of identifiers to tell the story of the code factor too much stuff into separate, non-reused functions. The result is that you can't take a name at face value; you need to push the current context onto your mental stack, drill into the identifier being called, and if necessary keep drilling, until the unstated side effects and extra bits and bobs that always accrete in continuously maintained code become clear. Ironically, this code is harder to read than inlined code.
> I often notice that I can remove a comment line after renaming a few identifiers in the code below it, because the code starts to read self-explanatory enough.
Careful with that. I'm of the mind that the code author is incapable of describing how readable any codebase is. It's too easy to mistake your innate understanding of your own creation for readability.
I've come to not see it as a big problem: It's like in the real world, the same words mean different things in different contexts.
Opening a file, i.e. interacting with the filesystem, is a totally different thing than writing out stuff, so I try to separate these tasks into different procedures. It's not a big problem if "file" might actually mean "filepath of the file in the filesystem" in one place, and "file handle" in another.
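One way to sketch that separation in Python, with hypothetical `write_report` and `save_report` functions: `file` means something different in each (an open handle in one, a path in the other), but the type hint and the few lines of body make the meaning clear locally.

```python
from pathlib import Path
from typing import TextIO


def write_report(file: TextIO, totals: dict[str, int]) -> None:
    # Here `file` is an already-open handle; this function never touches the filesystem.
    for key, value in sorted(totals.items()):
        file.write(f"{key}: {value}\n")


def save_report(file: Path, totals: dict[str, int]) -> None:
    # Here `file` is a path; this function only deals with opening and closing it.
    with file.open("w", encoding="utf-8") as handle:
        write_report(handle, totals)
```

Because `write_report` only sees a handle, it can also be exercised with an in-memory `io.StringIO` without creating any files.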
Don't put everything in the same object. A file object should not be used to open files; make something else open the file, like editor.open(file), editor.clone(file), editor.move(file)... And the file doesn't need to have all those properties; make functions that derive from file.path, like dirName(file.path), fileExtension(file.path), fileName(file.path)... But object-oriented programming (OOP) is not the only paradigm. Don't make classes if you don't have to. Once you have discovered OOP, you tend to make everything into classes and spend a lot of time building castles in the air.
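Python's standard `pathlib` module works roughly along these lines, though it exposes the derived parts as properties of the path object rather than as free functions. A short sketch with a made-up path; `PurePosixPath` is used only so the output doesn't depend on the operating system:

```python
from pathlib import PurePosixPath

# One value, the path as given; everything else is derived from it on demand
# instead of living in separate `dirname`, `basename` and `file_ext` variables.
path = PurePosixPath("reports/2021/summary.csv")

print(path.parent)  # reports/2021
print(path.name)    # summary.csv
print(path.stem)    # summary
print(path.suffix)  # .csv
```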
I used to have difficulties with this, but if you apply the system metaphor those problems become less of a problem.
In your example, you're dealing with a file system, so the file is the entity and everything else is just a property of that.
With regards to source and destination, I find that people tend to struggle with this, so instead I tend to think about the entity in question (in this case a file) and the target entity (eg a remote file system).
You don't need to add more words to make it obvious. Just be concise.
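A rough sketch of the entity/target framing for a copy, assuming a hypothetical `copy_to` helper and a target directory that already exists. Only the file and its target are named; the destination path is derived from them:

```python
import shutil
from pathlib import Path


def copy_to(file: Path, target: Path) -> Path:
    """Copy `file` into the existing directory `target`, keeping its name.

    Returns the path of the new copy.
    """
    # The destination is derived from the entity and the target,
    # so there is no separate `dest_filename` to keep in sync.
    copy = target / file.name
    shutil.copy2(file, copy)
    return copy
```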
In my experience there are a couple of reasons that lead to bad names in programs:
- it takes effort to think of a good name, and it's the linguistically creative kind of effort rather than the logical reasoning kind that makes up most of the rest of programming.
- it takes domain expertise to pick a good name
- if you stare too long at the same piece of code it becomes harder to see it through the eyes of someone who has never seen that code before
- almost everybody uses English in their programs, but many people are not native speakers
But I think making an actual effort to pick a good name already goes a long way. It forces you to think about the problem at hand and how you could communicate that to somebody else.
<pre> tags are not meant for block quotes. HTML has a <blockquote> tag instead. <pre> tags force mobile users to read in snippets as they scroll horizontally along the entire line. If you want your block quotes to use a monospace font, you can use CSS, and then you don't have to give up on semantic HTML and line wrapping.
Good naming is hard because you are using the wrong tool. Names in code, and even in the business world, are usually abbreviations for much more complex things. You should not be looking for too much meaning in the name itself. Most words have many meanings, so trying to string together some verbs and nouns to describe complex behavior will almost always fail. A BDD test suite is far better suited to telling you what the abbreviated name really stands for than any attempt to deduce it from the name. One exception to this could be programming idioms, for example using 'i' as a counter in a loop which anyone with experience recognizes without even thinking. To many programmers, 'i' has become a word with quite a complex meaning itself: a) I'm a number; b) I will be incremented; c) I will be used in a loop; d) I will be tested against to end the loop. Try naming that accurately without ending up with a really long variable name.
Programming mostly in Python and some C these days, for me names are the most important place to put meaning in.
The trick is being systematic, and consistent in the small. Short names and (at first) non-telling names are not a problem. It makes no sense to use (predominantly) very long descriptive names, not because they are harder to type, but because a long name is an indication that it's not an established concept, not a recurring theme across the code base.
So my habit has become to just choose a one-word name for this vague but probably important concept I have, even if it's not the best name. I put a comment on the datatype declaration, explaining in detail the meaning of that thing. Maybe later I will come up with a better name, so I will just switch to that name. Or the comment gets improved. It's about growing clear concepts. After some time that single-word name will be a natural thing to use for that concept.
If you can have a clear understanding of the important concept (instead of a vague understanding) that can be reflected in a good name.
Good name meaning one that is descriptive, unambiguous, and reveals a crisp, not vague, concept.
Then with that good name, when you write the code the first time, the clarity of the concept will help you avoid logic pitfalls and messy constructs.
Not to say comments are always bad, but probably a comment won't be needed, because the name itself will reveal the intent.
Then later, when someone does come and edit the code, instead of having to come up with a better name, they can focus on whatever they came there for, and they can do so with a better understanding of each variable and what is going on.
The difference between the two approaches is really about where you spend your extra time related to naming. In my approach, you spend a little extra time up front. But it begins paying dividends immediately, as you continue writing the very first version of the code; it is easier to write clearly and correctly.
In your approach, you spend less time up front, and pour the code out faster, which might seem like a great thing. You also see an immediate payoff, because your process is fast. But then you (or someone who comes behind you) has to pay a heavy bill of technical debt.
When I read your comment I read it as a description of what your current practice is, not something that has been thought through a lot. Growing clear concepts is a great way to think about it, so that's a good line to continue with imho. I just think that part can and should be done up front. Sometimes it's worth a short conversation with another developer to see if you can agree on a good definition reflected in a clear name.
But you're also right that sometimes single word names work fine. And sometimes concepts are just clear and don't need much thought. Obviously those ones we can agree on. I'm more focused on the more difficult, or more vague ones, as a place where I think up-front quality naming is better than having the "fix-it-later" approach.
Yeah I didn't mean to imply that I pick bad names intentionally. Of course I pick the best name that I can come up with. But when I'm doing something I haven't done before, it's hard to judge whether a name is any good, maybe because I don't actually know precisely what I'm doing in the first place.
In these cases I err on the side of short names (preferably non-composite) because I know that making a longer name will not help. I just use that new name a little in a more exploratory type of programming. When it sticks it was a good choice. Examples: tree, spec, coverage, identity, isomorphism, object, struct, key, value, tag... These names need context, but they can turn out to be good names because they are short and distinct.
When the name doesn't stick and I can't come up with a better one, maybe I should do something entirely different.
One way you can end up using the wrong tool is through trying to cram every relevant fact about a variable or function into its name because you have been told comments are bad.
"for example using 'i' as a counter in a loop which anyone with experience recognizes without even thinking."
This actually isn't too bad provided the loop's scope is small and it is obvious from context what it is doing.
It's variable names which could have a variety of different meanings used in larger contexts and scopes - that's when maintainer confusion really sets in.
Of course, the actual solution to using i as a counter in a loop is almost always to elide the counter entirely, using an iterator. Expanding on that principle - if you have something that seems to evade a good name, perhaps you're at the wrong level of abstraction.
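A small illustration of that progression, with hypothetical function names: the same loop written with an index, with the counter elided, and with `enumerate()` for the case where the position genuinely matters.

```python
def by_index(names: list[str]) -> list[str]:
    # Counter-driven loop: `i` is idiomatic, but it exists only to reach each item.
    out = []
    for i in range(len(names)):
        out.append(names[i].upper())
    return out


def by_iterator(names: list[str]) -> list[str]:
    # Iterating directly elides the counter, and with it the need to name one.
    return [name.upper() for name in names]


def numbered(names: list[str]) -> list[str]:
    # When the position really matters, enumerate() supplies it alongside the item.
    return [f"{position}. {name}" for position, name in enumerate(names, start=1)]
```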
I think this article ignores or glosses over the hardest part about naming, which is when you have to start naming things with abstract behavior. Naming things with analogs in the domain is easy, but when you find yourself trying to follow the Single Responsibility principle and breaking pieces off you run into trouble.
My AccountingRow is something the user sees, and there's also a SummaryAccountingRow with similarities; both of these need SQL generated, so there's an AccountingRowSQLGenerator and an AccountingRowSummarySQLGenerator. These generally share some of the same business logic around how to join different tables, so there's an AccountingRowCommonQueries class. We also need to spit this out in JSON, so there's an AccountingRowSerializer, an AccountingRowsSerializer, an AccountingRowSummarySerializer and an AccountingRowSummaryRowsSerializer. An AccountingRow also has an AccountingRowBehavior, and an AccountRowAdder, and onward into infinity.
Now you have name soup, and if you are looking at this system for the first time your eyes glaze over.
Well, even non-hard, very simple functional/x-ray naming sloppiness causes plenty of grief, but I think this is partly because we're used to naming being hard enough that -- at a certain level of energy, time-crunch, and lack of investment in a massively brownfield project we really would rather refactor entirely -- we sort of give up. There's a recentish (2015) little empirical study that clustered 'linguistic antipatterns' (based on surveys of software engineers): https://www.researchgate.net/publication/276314133_Linguisti...
Wow, I wish every academic paper were that useful to the layman. Free download, well-explained list of antipatterns, in-document hyperlinks to code samples so you can recognize the antipatterns, suggestions for renaming. Thanks very much for the link.
Yes. Right naming hints at the right abstraction, and vice versa. If you cannot find a good name, large chance is that you probably have some confusion as of how to structure your program.
>If you cannot find a good name, large chance is that you probably have some confusion as of how to structure your program.
I'm not sure about that. If you don't know about patterns like factory or observer (event listener), and you invent these concepts independently, you'll have a hard time naming them, and your names will be nonstandard. My point being that it's possible to have an elegant solution that is still hard to name succinctly.
That sounds like confusion to me. Those patterns are called patterns because people have practiced them over a long time, and in the course of that they decoupled them nicely. Assigning the right name implies a clear understanding of scope and responsibility. A self-invented solution may echo the original idea in some respects, but most of the time it is not nearly as clean.
Naming things is the most fun part. At least in my opinion. Closely followed by creating the logo. Then going back to the naming again because for the original name you could not create a good logo.
Maybe I'm weird, but I think actually building things is the fun part. Naming things is the worst - every time that we have to name a product I feel like I'm in the Dilbert animated cartoon [1].
Plus it lets the talkers and the bullshitters look like they're doing something while they bikeshed over naming things, but for some reason they have to bring in the doers to spectate instead of letting them get things done.
The name turns a tool into a product, something you can relate to. When I was in charge of rolling out a regional scoring engine in Asia, no one wanted the project. I renamed it Blaze, after the underlying FICO Blaze Advisor software, and suddenly things changed. Blaze is fast, efficient, fiery, daring. People are weird.
I always thought this referred to abstract computer science concepts like: constant vs literal vs variable, scope, composition vs aggregation, encapsulation, function, procedure, polymorphism, interface, loops, accessor, mutator, class vs object and instantiation, abstraction, linked lists, arrays, hash (not so much hash), recursion, etc.
I consider these names to be beautiful and succinct and a lot of thought was put into them.
One thing I do when I can't immediately think of a domain-appropriate name is to give an obviously wrong one, like "new_function_01". During a complex refactoring, I may have dozens of such methods or variables. Eventually the variations are all smoothed out and I can replace 20 different method calls with one method call with perhaps an additional argument. That method is way easier to name.
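A toy sketch of that workflow; the row dictionaries with an `amount` key and all the function names are made up for illustration. The throwaway names mark code that is still in flux, and once the variants sit side by side, the extra argument (and the name) become obvious:

```python
# During the refactoring: near-duplicate helpers with deliberately throwaway names.
def new_function_01(rows):
    return [r for r in rows if r["amount"] > 0]


def new_function_02(rows):
    return [r for r in rows if r["amount"] < 0]


# Afterwards: the variations collapse into one function with an extra argument,
# and the shared behaviour is now concrete enough to name.
def filter_by_sign(rows, sign):
    """Keep rows whose amount has the given sign (+1 or -1); zero amounts are dropped."""
    return [r for r in rows if r["amount"] * sign > 0]
```

Each call site of `new_function_01` becomes `filter_by_sign(rows, 1)`, and each call of `new_function_02` becomes `filter_by_sign(rows, -1)`.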
Naming seems to be the last step of refactoring. Refactoring is the process of manipulating and identifying the semantics of an application; names are human-readable tags that ease the mental task of understanding those semantics.
Software code needs to catch up with the usability testing applied to software's presentation. The dynamics are equivalent. In usability theory there is no one practice for all domains. In fact, usability theory fully embraces marketing, where branding is important, i.e. distinctness is highly valuable. I suggest, then, that code get usability studies to the same extent as the software's presentation does. Naming is just one aspect of usability. Information architecture, the grouping of names, is one of the most important concerns in usability. In his report this year, Jakob Nielsen did a comparison study across ten years: usability of software has increased in every category save one, information architecture. Grouping names into menus for tasks is really, really hard, and in my experience the current trend of mega-menus is no help. Doing usability studies and task analysis on the code would be a huge boon to the usability of the application. It would raise overall usability awareness and its value.
Naming is hard for all the reasons OP mentions. But why I believe it is especially hard in computer science is that, like cache invalidation, it does not have a deterministic or "correct" solution. At some point, humans have to make a decision based on circumstances and tradeoffs.
Naming is very important. I think the problem is that it can take a little bit of mental effort to come up with a good name and it can distract a developer from their current train of thought. It just takes a bit of practice to get used to it.
Has anyone looked into the problem of automatic (computer-generated) naming? As in, generating a unique name that somewhat describes the thing that was just generated (like a piece of code)?
I think the bigger problem with naming is simply being consistent. Ask 5 developers what the variable name for an important filehandle should be and you'll probably get at least 3 answers. As long as everyone is consistent, even if the names get a little sketchy here and there, that consistency goes a long way for code with a long lifetime.
The problem comes when someone else already used "product" for another concept in the system, and you now have to come up with an appropriate synonym.
(Or the same for any other common concept: object, connection, lock, etc.)