You know, sometimes duplicating code is just better. I've been around long enough to see many attempts at refactoring to share code crash and burn. The hoops and complexity that some people go through to DRY (don't repeat yourself) are sometimes worse than the cost of maintaining two copies.
Sandi Metz said in one of her talks that wrong abstractions are worse than duplication and we just tell new developers about duplication because it's the only thing they understand. Strongly agreed.
Most programmers wouldn't know an abstraction if it hit them in the face. In fact, I highly doubt that most existing technology stacks allow you to build any abstractions at all, in the original sense of functionality so self-contained that a user can't distinguish it from a programming language primitive, not just syntactically, but also semantically. In particular:
(0) If you need to know or care about its implementation details when you use it, it's not an abstraction.
(1) If unrelated ad hoc cases are hardcoded together into a single procedure, it's not an abstraction.
(2) If it needs fifteen tunable parameters, some combinations of which don't even make sense, it's not an abstraction.
At some point in time, programmers discovered that selection (if, switch, pattern matching, whatever) could be replaced with dynamically dispatched first-class procedure calls. Since then, they haven't stopped making their code more “abstract” by making their control flow impossible to follow. Alas, this isn't real abstraction, because it doesn't achieve the intended purpose of reducing the number of things you have to keep in your head simultaneously.
So, rather than blaming “wrong abstractions”, I'd blame “not real abstractions”.
> In fact, I highly doubt that most existing technology stacks allow you to build any abstractions at all, in the original sense of functionality so self-contained that a user can't distinguish it from a programming language primitive
Two of the most popular languages out there, C++ and JavaScript, are both great at this.
Are you serious? How do I define complex numbers (just to give one example) in JavaScript that actually behave like, you know, complex numbers? Recall that:
(0) The usual symbol for complex number addition is +.
(1) The operation isn't associated to any individual complex number (i.e., it's not an object method), but is rather an intrinsic part of the algebraic structure of the complex numbers.
(2) Complex numbers don't have a physical object identity in memory.
C++ fares a little better than JavaScript, but not a lot better.
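To make that concrete, here's roughly what you're stuck with in TypeScript/JavaScript. This is a minimal sketch with a hypothetical Complex class (not any real library), just to illustrate the three points above:

    // A hypothetical Complex class, just to illustrate the limits being described.
    class Complex {
      constructor(readonly re: number, readonly im: number) {}

      // (0) No operator overloading: you can never write `a + b`, only `a.add(b)`.
      add(other: Complex): Complex {
        return new Complex(this.re + other.re, this.im + other.im);
      }
    }

    const a = new Complex(1, 2);
    const b = new Complex(3, -1);

    // (1) Addition hangs off one operand as a method, instead of belonging to the
    //     algebraic structure of the complex numbers themselves.
    const sum = a.add(b);
    console.log(sum); // Complex { re: 4, im: 1 }

    // (2) Values still carry object identity: structurally equal numbers compare unequal.
    console.log(new Complex(1, 2) === new Complex(1, 2)); // false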
The major JS frameworks look like languages unto themselves. JavaScript lacks operator overloading, so there are some limits, but it is quite possible to turn JS into something very unrecognizable.
C++ templates allow for almost anything. I have seen incredibly strange things done to the language with templates. Since templates can be used to parse and execute entirely different programming languages, it is hard to imagine a level of abuse to which C++ templates are not amenable.
> The major JS frameworks look like languages unto themselves (...) it is quite possible to turn JS into something very unrecognizable.
That doesn't mean anything. With enough determination, anything can be turned into “something very unrecognizable”. Heck, sometimes the results are relatively pleasant even: http://libcello.org/ . But, if a language doesn't allow you to define basic abstractions like complex numbers (not to mention slightly more elaborate ones, like queues that are indistinguishable from one another if they contain the same elements in the same order), then that obviously reflects very poorly on its abstraction facilities.
> C++ templates allow for almost anything. I have seen incredibly strange things done to the language with templates.
What you can't do in C++, however, is make your abstractions not leak. Every template library is one template specialization away from being broken. This is unlike languages that enforce their abstractions by more robust means, like parametricity and macro hygiene.
It's a great principle and I reference it frequently, but as a catchy buzzphrase it has the potential to do as much harm as good: it's easy to abuse as an excuse for copy paste programming.
Duplication is still bad, and correct abstractions are still good.
See also "premature optimization is the root of all evil", which I occasionally see stretched to include any kind of thinking about the performance characteristics of your code, even down to basic design decisions like data structures or algorithms.
Great point. I've found that most codebases have some abstractions that work very well and others that work less well, and simply having awareness of this can help make the code far more readable.
For instance, one might say: "The way x is implemented works, but it's not quite the right abstraction, here's why". That is way more informative than saying it's simply the wrong abstraction or considering it good enough and ignoring its deficiencies.
I feel this in my soul. I work on a JavaScript project now where sometimes two duplicate pieces of code are “refactored” into one function unnecessarily. Just because there are two similar lines doesn’t mean they need to be their own function/class/whatever, especially when it makes it harder to generalize later when you need similar code that DOESN’T make sense to lump in with that abstraction.
I keep having to simplify the code and it’s exhausting.
Agreed that fanatical adherence to DRY is a mistake. I did this when I was first starting out: any code that looked similar at all would be factored out into a new hyper-specific function, regardless of how unrelated the original pieces of code were. But this introduces extreme coupling between components that have no right being coupled, and makes evolving the code a nightmare (suddenly all your DRY-borne functions grow immense lists of configuration parameters to account for unforeseeable differences).
At the same time, let's not take this dogma too far in the other direction. Repeating oneself is still undesirable, but DRY-related refactoring should focus on behavior and intent, not on how the code looks. And abstractions are still good (they're how we get anything done at all!), but leaky abstractions should be avoided, indirection-via-abstraction should be minimized, and we should be vigilant against extreme and overeager overapplication of abstraction (insert your blub joke here).
> But this introduces extreme coupling between components that have no right being coupled, and makes evolving the code a nightmare (suddenly all your DRY-borne functions grow immense lists of configuration parameters to account for unforeseeable differences).
* A and B both depend on the same logic X.
* You make a shared helper function X.
* Requirements for A change, necessitating a change to X.
* You check if B requires the same change. If yes, change X. If no, fork X into X_A and X_B.
Now you're staying DRY, and DRY has saved the day by forcing you to think about whether A's change should impact B.
If you had prematurely forked X, then X_A and X_B would drift without any sanity check on whether they should.
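A minimal sketch of that workflow, with hypothetical names (TypeScript just for illustration):

    // Step 1: callers A and B depend on the same logic, so there is one shared helper X.
    function normalizeName(raw: string): string {
      return raw.trim().toLowerCase();
    }

    // Caller A (display) and caller B (index key) both go through X today.
    console.log(`User: ${normalizeName("  Ada Lovelace ")}`);
    console.log(`key:${normalizeName("  Ada Lovelace ")}`);

    // Step 2: A's requirements change (it must now also collapse inner whitespace),
    // B does not want that change, so X gets forked instead of growing a flag.
    function normalizeNameForDisplay(raw: string): string { // X_A
      return raw.trim().replace(/\s+/g, " ").toLowerCase();
    }

    function normalizeNameForIndex(raw: string): string { // X_B
      return raw.trim().toLowerCase();
    }

    console.log(`User: ${normalizeNameForDisplay("  Ada   Lovelace ")}`);
    console.log(`key:${normalizeNameForIndex("  Ada   Lovelace ")}`);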
> The hoops and complexity that some people go through to DRY (don't repeat yourself) are sometimes worse than the cost of maintaining two copies.
The number of bugs is proportional to the lines of code; this is undeniable from empirical data. Ergo, fewer lines of code will tend to yield fewer bugs. So if your code is literally the same, there's no reason not to extract it into a function.
That said, the moment you have to start adding parameters in order to successfully factor out common code, ie. to select the correct code path taken depending on caller context, that's when you should seriously question whether the code should actually be shared between these two callers. More than likely in this case, only the common code path between two callers should be shared.
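For instance, the smell looks something like this hypothetical TypeScript sketch (made-up names):

    type Item = { id: string };

    // Stand-ins for the genuinely shared steps and the caller-specific one.
    const validate = (item: Item): void => { if (!item.id) throw new Error("missing id"); };
    const write = (item: Item): void => { console.log("writing", item.id); };
    const appendAuditLog = (id: string): void => { console.log("audit", id); };

    // The smell: a parameter whose only job is to select the caller-specific code path.
    function saveItem(item: Item, isAuditCaller: boolean): void {
      validate(item);
      write(item);
      if (isAuditCaller) appendAuditLog(item.id);
    }
    saveItem({ id: "a1" }, true); // callers now have to know about each other's flags

    // Usually better: share only the truly common path; each caller keeps its own extras.
    function saveItemCommon(item: Item): void {
      validate(item);
      write(item);
    }
    saveItemCommon({ id: "a1" });
    appendAuditLog("a1"); // the audit caller does its extra step itself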
> So if your code is literally the same, there's no reason not to extract it into a function.
If the two pieces of code are likely to change in different ways, for different reasons, that is a strong reason not to extract it into a function even if they happen to be character-for-character identical for the moment.
Less code means fewer bugs, but that doesn't mean I should be working on the gzipped representation.
> If the two pieces of code are likely to change in different ways, for different reasons, that is a strong reason not to extract it into a function even if they happen to be character-for-character identical for the moment.
The future is much more malleable than your immediate needs.
But even if your prediction of the future turns out to be true, the code you need to refactor is then already extracted to a function, so you can easily duplicate that function, make your localized changes, and change the callers to the new function. So this is still the best route.
> Less code means fewer bugs, but that doesn't mean I should be working on the gzipped representation.
Don't be absurd. Gzipping doesn't preserve your program in human-readable form. Extracting code into a reusable function makes your program more human-readable, not less.
In the parent comment, you said there was no reason. I maintain that it's a strong reason. It may or may not be a sufficient reason, weighed against other considerations. In particular, if it harms readability that's also a strong consideration.
But sometimes it can help readability, too. "DRY" as a principle was originally formulated in terms of repetition of pieces of knowledge rather than code, and I think in those terms it's far more useful. If this code represents "how we frob the widget" and that code represents "how we tweak the sprocket" and there's no reason for those to agree, they should probably be separate functions. Pulling them out into a "tweaking_sprockets_or_frobbing_widgets" function is making things less readable, because it's conflating things that shouldn't be conflated. If there is not some underlying piece of knowledge - some statement about the domain or some coherent abstraction that simplifies reasoning or some necessary feature of the implementation - combining superficially similar things is just "Huffman coding".
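In code, the distinction might look roughly like this (hypothetical TypeScript, made-up widget/sprocket logic):

    // Today, "how we frob the widget" and "how we tweak the sprocket" happen to be identical...
    function frobWidget(widget: { level: number }): void {
      widget.level = Math.min(widget.level + 1, 10);
    }

    function tweakSprocket(sprocket: { level: number }): void {
      sprocket.level = Math.min(sprocket.level + 1, 10);
    }

    // ...but merging them conflates two unrelated pieces of knowledge: nothing in the
    // domain says widget-frobbing and sprocket-tweaking must change together.
    function tweakSprocketOrFrobWidget(thing: { level: number }): void {
      thing.level = Math.min(thing.level + 1, 10);
    }

    const widget = { level: 3 };
    frobWidget(widget);
    console.log(widget.level); // 4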
> Extracting code into a reusable function makes your program more human readable
When done properly, yes. When done to the point where a five line function is created with ten inputs (yes, this is real), no. But DRY tells us that the five lines of duplication is unconditionally worse.
Hell, I've even seen things like logging/write tuples (i.e. log the error, write it to a socket) encapsulated, even though the only non-parameter code ends up being the two function calls.
Anything, taken to extremes is bad. The problem with DRY is it encourages that extremism.
> But DRY tells us that the five lines of duplication is unconditionally worse.
I agree that's often how DRY is understood, and that it can be a problem.
It is not how DRY was originally formulated, which was "Every piece of knowledge must have a single, unambiguous, authoritative representation within a system." This differs from blind squashing of syntactic repetition in two important ways. First, as under discussion here, if things happen to be the same but mean different things, combining them is not "DRY-er". Second, there can be repetition of knowledge without repetition of code. For instance, if we are telling our HTML "there is a button here", and our JS "there is a button here", and our CSS "there is a button here", we're repeating the same piece of knowledge three times even though the syntax looks nothing alike.
I make no claim as to whether the flawed, more common understanding or the original intent is what "DRY really means", but I think the latter is more useful.
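As a rough illustration (a hypothetical TypeScript sketch, not anyone's actual framework), collapsing that repeated piece of knowledge into one representation might look like:

    // One authoritative statement of the knowledge "there is a submit button here"...
    const submitButton = { id: "submit-btn", label: "Submit" };

    // ...from which the markup, the styling hook, and the behaviour are all derived,
    // instead of restating "there is a button here" separately in HTML, CSS, and JS.
    document.body.insertAdjacentHTML(
      "beforeend",
      `<button id="${submitButton.id}">${submitButton.label}</button>`
    );

    const style = document.createElement("style");
    style.textContent = `#${submitButton.id} { font-weight: bold; }`;
    document.head.appendChild(style);

    document.getElementById(submitButton.id)?.addEventListener("click", () => {
      console.log(`${submitButton.label} clicked`);
    });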
This is correct, and the below should not be taken in argument with it.
DRY as a guiding principle sometimes has a secondary beneficial effect that was not discussed. Two pieces of code that happen to be the same but "mean different things" should not automatically be deduplicated by dumb extraction. However, the fact that those two things share code may, when viewed through the lens of "prioritize-DRY-ness", hint that the two share a common underlying goal, which can be abstracted out into functionality that can be used by both.
Put another way: if the code to control a nuclear reactor circuit and the code to turn on a landing light on a plane happen to be the exact same, they shouldn't be blindly deduplicated into some library function, but the fact that they're the same may indicate a need for a more accessible, easily-usable-without-mistakes way of turning that kind of circuit on and off.
> When done properly, yes. When done to the point where a five line function is created with ten inputs (yes, this is real), no. But DRY tells us that the five lines of duplication is unconditionally worse.
I'm not convinced by your example. There are plenty of mathematical calculations taking numerous parameters that I think should be in a distinct function.
Even for non-mathematical calculations, 5 lines of code that are used repeatedly as some sort of standard pattern in your program should also get factored out. Take your logging example: if you consistently log everything in your program the same way, then sure, refactor that into a shared function. Then if you suddenly find you need to log more or less information, you can update it in one place.
Of course, I understand your meaning that sometimes factoring out doesn't make sense, but if you find repetition more than twice as per DRY, refactoring seems appropriate.
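Something like this hypothetical sketch (TypeScript, made-up fields) is what I have in mind:

    // A hypothetical shared logging helper: the format is decided in exactly one place,
    // so logging more (or less) information later is a single-site change.
    function logEvent(
      level: "info" | "error",
      message: string,
      context: Record<string, unknown> = {}
    ): void {
      console.log(JSON.stringify({ ts: new Date().toISOString(), level, message, ...context }));
    }

    // Callers no longer repeat the timestamp/JSON boilerplate:
    logEvent("info", "user logged in", { userId: 42 });
    logEvent("error", "payment failed", { orderId: "A-17" });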
In web frameworks, there is usually a little bit of boilerplate for each view.
You could refactor this completely away, but not without an almost total loss of flexibility and a good amount of readability too. Often the views will look very similar, then start diverging as the project grows.
With you on refactoring common patterns out, and yes, some people don't do this enough. But really, the important thing there is that those patterns are truly common to a large degree and should stay in sync - so it's worth it to introduce and maintain a new concept to keep them that way.
You DRY up code by writing indirection. That's the expense of all abstractions. You can't believe that all indirection is worth it at all costs, so I'm not sure what point you're belaboring.
> You can't believe that all indirection is worth it at all costs
I think I've been pretty clear about the costs and when this is worth it, particularly in my first post in this subthread, which I'll quote here:
> That said, the moment you have to start adding parameters in order to successfully factor out common code, ie. to select the correct code path taken depending on caller context, that's when you should seriously question whether the code should actually be shared between these two callers. More than likely in this case, only the common code path between two callers should be shared.
Or if you want a more concise soundbite: refactor if your indirection is actually a clear and coherent abstraction.
> When done properly, yes. When done to the point where a five line function is created with ten inputs (yes, this is real), no. But DRY tells us that the five lines of duplication is unconditionally worse.
I work with people I would describe as... junior at best (lots of boot campers) and I see this all the time. Functions that just return an anonymous function for no reason, half JSON blobs returned from functions that are called with one string instead of just repeating the blob in the code, etc.
That’s a popular claim. I wonder how many failed projects could have it as their epitaph.
Have you ever worked on a project where the requirements changed so fundamentally from one day to the next that you truly, honestly had no idea where you were going next?
I haven’t. I’m not aware that I’ve ever met anyone else who has, either.
The claim that requirements always, or even usually, change so dramatically within such short timescales that it isn’t worth laying any groundwork a little way ahead simply doesn’t stand up to scrutiny, in my experience. Any project that was so unclear about its direction from one day to the next would have far bigger problems than how the code was designed.
Otherwise, there is always a risk that by being too literal, by ignoring all of your expectations about future development regardless of your confidence in them, you climb the mountain by climbing to one small peak, then down again and up the next slightly higher peak, and so on. This could be incredibly wasteful.
Of course requirements often change on real world projects. Of course I’m not advocating coding against some vaguely defined and mostly hypothetical future requirement five years in advance. But often you will have some sense of which requirements are going to be stable enough over the next day or week or month to base assumptions on them, and insisting on ignoring that information for dogmatic reasons just seems like a drain on your whole development process.
>That’s a popular claim. I wonder how many failed projects could have it as their epitaph.
Way fewer than the over-ambitious projects that died because of things they didn't need, immortalized in lots of classic Comp-Sci literature, from Fred Brooks' books to Dreaming in Code.
There's a reason it's a popular claim. Sure, popular just means it's repeated by many -- but this claim (or an analogous one, e.g. the KISS principle, "Do the simplest thing that could possibly work", etc.) is repeated by the most experienced and revered programmers, from the Bell Labs guys to the most celebrated programmers today.
>Have you ever worked on a project where the requirements changed so fundamentally from one day to the next that you truly, honestly had no idea where you were going next? I haven’t. I’m not aware that I’ve ever met anyone else who has, either.
Welcome to my life :-)
Not being snarky -- rapidly changing requirements is the number one complaint in my kind of work.
Well, I didn’t say there was only one way a software project could fail! My point is simply that I believe anticipating and allowing for future requirements is a matter of costs and benefits. It’s about comparing the cost of making a wrong step and then having to backtrack with the cost of following a circuitous route to the final destination instead of a more direct one. Both are bad if we make the wrong choice, and we can’t see the future to make an informed decision about the right choice, but we can at least look at the expected cost either way and make an intelligent decision in any given case.
> If it's not actually re-used then it's just making me jump around to see what's actually happening rather than reading straight through the code.
Firstly, you only actually create a function either when it is being reused, or because its functionality is a logically separable responsibility and so you factor it out for understandability.
Either way, the function should also have a meaningful name describing its purpose so you don't have to jump around to understand what's actually happening.
Meaning, your future needs are ever changing and often unclear. Your present needs are immediate and usually obvious. Meet your present needs first and foremost without sacrificing flexibility to meet future needs. Factoring code into functions accomplishes this.
> I've come across plenty of small functions that I couldn't understand without checking the calling functions for context.
Sure, happens to me too when I don't assign meaningful names, or the functions don't actually encompass a small, meaningful set of responsibilities, or the functions use deep side-effects that require reasoning within larger contexts.
The problem with such programs isn't factoring into functions though. If anything, this step reveals latent structural problems.
>The number of bugs is proportional to the lines of code; this is undeniable from empirical data.
Do you have a link to this undeniable data? I haven't seen many empirical code quality studies that are not littered with possible confounders. It's very difficult to do these studies.
I do believe that bug density is roughly proportional to the size of the codebase if you average over large corpora of code. But the bug density of different types of code within those corpora varies a great deal in my experience.
So I think the important question is what sort of code is duplicated and why it is duplicated. Removing duplication means creating dependencies. If we create the right dependencies, i.e. the ones that enforce important invariants, that's a good thing. But that is a big if.
For a summary, sure [1]. There have been loads of studies on various metrics, but none have actually been any better than simple lines of code, despite the fact that it has such a large variance as a metric.
> Removing duplication means creating dependencies. If we create the right dependencies, i.e. the ones that enforce important invariants, that's a good thing. But that is a big if.
While enforcing invariants would certainly be good, I'm not convinced that's the only reason DRY reduces bugs. Common functions get manual reviews every time the code that calls them also gets reviewed and/or refactored, whether due to new features or bugfixes.
DRY increases exposure of more commonly used paths through your program.
>There have been loads of studies on various metrics, but none have actually been any better than simple lines of code
True, but that doesn't mean SLOC is a very useful metric. Say you were to rewrite a large Java codebase in a language that eliminates all getters and setters. You will have greatly reduced the number of SLOC, but it is very unlikely that you will have reduced the number of bugs very much.
In other words, going for the low hanging fruit of programming language design wouldn't necessarily help much. Bugs are not evenly spread out over the entire codebase.
> You will have greatly reduced the number of SLOC, but it is very unlikely that you will have reduced the number of bugs very much.
I agree strongly with your point about potential confounders (and was about to make it myself) but now you are making your own assertion that I'm uncertain about. Why are you confident that switching to a language that greatly reduces the number of lines of code would not reduce the number of bugs?
While I don't know of any hard evidence, it passes my internal plausibility test that if some of the "2 screen" functions become "1 screen" function, bugs might be less likely. There might be some counter-force that would confound this, but I wouldn't eliminate it out of hand. So what makes you say "very unlikely" rather than "not necessarily true"?
I think the idea was "A language that is otherwise identical to Java but eliminates the need to write trivial getters and setters." This obviously reduces line count, but probably does not equivalently reduce bug count, as the removed lines are very unlikely to contain bugs.
> This obviously reduces line count, but probably does not equivalently reduce bug count, as the removed lines are very unlikely to contain bugs.
While I agree that the bugs are not likely to be in the trivial code, I don't think it's a given that presence of the trivial code has no impact on the number of bugs elsewhere. Consider a "fatigue" based model, where the human brain is distracted by the monotony of the bug-free getters and setters and thus unable to pay sufficient attention to the logic bugs elsewhere in the program. And again, I'm not making that claim that eliminating boilerplate reduces bugs, only objecting to the assumption that it does not.
My intent was to clarify (what I perceived to be) the parent's argument, more than make one of my own.
I think if our process is "1) write the software in Java, 2) remove those lines", it's clear that we've probably changed the average bug density of the project. I agree that there is much reason for concern in generalizing that result to what would have happened if we'd written in that other language to begin with.
I simply haven't found many bugs in (mostly auto-generated) getters/setters during the past 25 years, but it's purely anecdata of course.
>While I don't know of any hard evidence, it passes my internal plausibility test that if some of the "2 screen" functions become "1 screen" function, bugs might be less likely.
Yes, mine too, but only for randomly chosen pieces of code. What I don't believe is that the linear correlation between SLOC and bugs that studies have found in large codebases allows us to pick and choose the lines of code that are easy to eliminate and expect the number of bugs to drop proportionately.
> The number of bugs is proportional to the lines of code; this is undeniable from empirical data. Ergo, fewer lines of code will tend to yield fewer bugs.
I do not think that this analysis can be applied to decisions about whether to duplicate code or not.
First, if your code is the same, you're not going to have two different bugs in the two copies of it.
Second, trying to change the number of lines of code in your project without making deeper changes is the sort of thing that very directly confounds the analysis. It's like going from "Smaller companies have happier employees" to "So we should fire half our employees because empirical data shows the rest will instantly become happier."
> First, if your code is the same, you're not going to have two different bugs in the two copies of it.
No, but you might see two different buggy behaviours due to contextual differences in how the code is used.
> Second, trying to change the number of lines of code in your project without making deeper changes is the sort of thing that very directly confounds the analysis.
Except refactoring into smaller reusable functions is precisely a deep structural change.
The comment I was replying to was about deduplicating code that is "literally the same". In that case, you wouldn't make the code more or less buggy if you collapsed the multiple copies of the code down into one, and no deep structural changes would be involved in refactoring.
If there are actual changes involved in refactoring, I have no a priori expectation of whether that reduces or increases bug count, and I can see good arguments that it's likely to increase them, since you're making the code more complex in order to satisfy the demands of multiple consumers and therefore exposing each consumer's unique complexity to the other as bug surface. (Case in point: Heartbleed resulted entirely from a little-used extension to DTLS, a variant of TLS over UDP, which 99+% of OpenSSL users never cared about.)
> The comment I was replying to was about deduplicating code that is "literally the same". In that case, you wouldn't make the code more or less buggy if you collapsed the multiple copies of the code down into one, and no deep structural changes would be involved in refactoring.
Firstly, any refactoring involves a code review of what you're factoring out. This has a non-zero probability of revealing bugs, so I already disagree with your claim that it wouldn't change the bug count.
Secondly, if you're having difficulty refactoring, that's a strong hint at deeper structural problems, so it yields information on what kinds of structural changes are needed.
> If there are actual changes involved in refactoring, I have no a priori expectation of whether that reduces or increases bug count, and I can see good arguments that it's likely to increase them, since you're making the code more complex in order to satisfy the demands of multiple consumers.
Are you making it more complex? Because that doesn't seem like a sound refactoring in my mind. Special cases require special code, you don't place special cases in a general function, unless the function itself only handles the special case. I already covered this in my original post where I discuss when DRY isn't appropriate.
> (Case in point: Heartbleed resulted entirely from a little-used extension to DTLS, a variant of TLS over UDP, which 99+% of OpenSSL users never cared about.)
I don't see how this is a point in your favour. It's a point that little used and little inspected paths are more likely to be vulnerable. But a reused function gets more use and more review than inline code. In other words, Heartbleed would probably still not have been found if it weren't part of a common function, and instead were littered in various places throughout a code base.
>The number of bugs is proportional to the lines of code; this is undeniable from empirical data. Ergo, fewer lines of code will tend to yield fewer bugs.
The first part is obvious (more code of course brings the potential for more bugs).
The deduction (hence more code = more bugs) is only valid if the studies controlled for the similarity of the code, cyclomatic complexity and other such factors.
Just because more code has an impact on the number of bugs doesn't mean more code by itself means more bugs. Correlation != causation.
Program A having more code than program B could mean e.g.:
(1) A is inherently more complex (and really needs more lines, the same way an IDE needs more lines than a simple text editor), and thus will naturally be more prone to bugs.
(2) A and B have similar functionality as programs, but A is written by people who meander and write bloated code and needless abstractions (turning a 100 line program into a 1000 line hell of "design patterns" and factorySingletonProxy "flexibility"). Which will again bring in more bugs.
But that's not necessarily the case if A is bigger than B due to simple repetition that doesn't introduce complexity. Which is exactly what we're discussing in this thread.
For a trivial example, 10,000 lines of "print 'hello world'" repeated won't have more bugs than a 1,000 line complex C program.
> Just because more code has impact in the number bugs, doesn't mean more code by itself means more bugs. Correlation != causation.
So the only possible causations for the correlation we're discussing are:
1. more program code causes more bugs
2. more bugs causes more program code
3. some unknown third factor(s) simultaneously causes both more bugs and more program code
I think 1 and 3 are most often the case, where 3 could be something like developer inexperience, although some studies have shown that even experienced developers still introduce bugs at comparable rates to novices (just lower constant factors). I think 2 sometimes happens to address immediate needs, ie. hotfix for specific bug X may introduce more bugs, but I doubt it's the rule.
Regardless, my original claim still seems pretty undeniable, ie. more program code tends to yield more bugs.
> For a trivial example, 10,000 lines of "print 'hello world'" repeated won't have more bugs than a 1,000 line complex C program.
But 10,000 lines of "print 'hello wrld'" would have more bugs than a 1,000 line complex C program. Probably on the order of 9,000 more bugs in fact.
The numbers we're talking about here are averages across all programs of comparable length, not to be applied literally to any specific program, because it turns out that those specific program qualities don't really matter, ie. LOC is still a more accurate predictor of bug count than cyclomatic complexity and other metrics.
Thus I can say that a 1,000 line program probably has about X bugs, and I probably won't be off by an order of magnitude unless the program was verified by a theorem prover or something along those lines. Something like verification is really the only confounder that I've come across.
> The number of bugs is proportional to the lines of code; this is undeniable from empirical data. Ergo, fewer lines of code will tend to yield fewer bugs. So if your code is literally the same, there's no reason not to extract it into a function.
Code sharing often increases coupling, and the coupling can be at odds with naive attempts at achieving DRY. The real world is not as simple as this makes it out to be.
You are right more code means more bugs, but more code might also be the difference between a viable product and one that hasn't been built yet because everybody is locked into DRY bureaucratic hell. As in everything, there's a balancing act that needs to be performed and that requires judgement and experience.
> The number of bugs is proportional to the lines of code; this is undeniable from empirical data.
Including unit tests? A code base with unit tests is likely to contain fewer bugs and more lines of code. And what about code golf? This just does not seem right, and I believe the original saying is related to languages with/without rich stdlibs and/or the NIH syndrome.
>The number of bugs is proportional to the lines of code; this is undeniable from empirical data. Ergo, fewer lines of code will tend to yield fewer bugs. So if your code is literally the same, there's no reason not to extract it into a function.
Except you've added a dependency that connects two pieces of code that were independent before.
It depends. If the duplicated code is supposed to do the same thing in both places, then the two pieces of code are already dependent. You change one of the duplicates, you need to change the other one too.
The number of bugs is proportional to the lines of code, but shared code can massively multiply the cost of making a change, because of the risks involved.
Sometimes duplication is worth it simply because it reduces the impact of changes.
But if I fix all X bugs in code block A, then another code block that contains a duplicated copy of A (call it B+A) still contains those X bugs.
That's one of the big problems with code duplication. "You fixed a bug? Great. Did you fix it everywhere?" It's better if there's only one place to fix it, because when you've fixed it, you've fixed all of it.
But as others have said, eliminating this is not the only good thing in programming. There are limits to how far you should go to prevent or eliminate duplication.
> It's better if there's only one place to fix it, because when you've fixed it, you've fixed all of it.
You've got one aspect, and the other aspect is that code factored into a reused function F is now reviewed more too, ie. whenever you're reviewing the callers, you work through the functions they call as well to trace the behaviour. So you're also much more likely to find any bugs in F.
It's a double-whammy for bug squashing, which is why DRY is such an important principle. Certainly there are some misapplications, but its benefits hold up pretty well in a wide variety of scenarios.
If the bug being fixed was expected behavior by all the other dependencies, you just screwed over everyone else who should have duplicated what they needed in the first place.
Nothing wrong with importing a left-pad module if you're using a sane language with a sane development toolbox. I'll leave it to you to decide whether JS fits that bill.
These days I prefer SPOT (single point of truth) over DRY. I find DRY tends to encourage people to use hideously complicated abstractions to avoid code that even looks similar. It's also way easier to refactor some duplicated code than to try and untangle complicated abstractions.
SPOT more accurately identifies the pain point of duplicated code - if there is an algorithm or configuration value that needs to be specified, then that should just be in a single place, everything consistent and easy to change.
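A minimal sketch of what I mean, assuming a made-up timeout setting (TypeScript):

    // Single point of truth: the timeout is stated exactly once...
    const SESSION_TIMEOUT_MS = 15 * 60 * 1000;

    // ...and every consumer refers to it rather than restating "15 minutes".
    function isExpired(lastActivityMs: number, nowMs: number): boolean {
      return nowMs - lastActivityMs > SESSION_TIMEOUT_MS;
    }

    function scheduleLogout(logout: () => void): ReturnType<typeof setTimeout> {
      return setTimeout(logout, SESSION_TIMEOUT_MS);
    }

    console.log(isExpired(0, SESSION_TIMEOUT_MS + 1)); // true
    scheduleLogout(() => console.log("logged out"));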
I'm unclear as to whether the parent is preferring the name SPOT for the same concept, or preferring what they are presenting as a different concept (which, as you note, seems to match the original definition of DRY).
Good name, it helps with distinguishing things that are meant to be the same (e.g a single implementation of an encryption algorithm, used in various places) and things that are distinct but accidentally identical (e.g. permutations in different stages of that algorithm).
The question I ask myself is: Are these two things that look the same, actually the exact same problem, and do I expect them to continue being the exact same problem?
If the answer is an easy yes, then build your abstraction. If it's no, don't. If it's maybe, leave the duplication in place until you have a clearer understanding.
One of the problems is that people hear "DRY" and they don't hear "Every piece of knowledge must have a single, unambiguous, authoritative representation within a system", which is the actual idea. They don't even share any words. The actual idea is really cool and demands correct abstractions, DRY is about making things not look the same.
In an ideal world, what you want is more like copy-on-write. Share the code until one of the clients of that code needs a change that the others do not, then copy it. In reality, what happens more often is an accumulation of different functionality used by different clients. Copying "defensively" as you suggest can help avoid that problem, but it has the opposite problem of requiring changes to all copies when all clients want the same thing. In my experience, this is the problem more likely to lead to painful mistakes. There is no silver bullet...
Even with "copy on write" there would still be the problem of wether to copy before write or not: requirements for usage A change because someone recently thought hard about that part. Usage B is most likely not on anybody's mind right now andchanging that is quite some effort. It's a problem no matter wether you copy, DRY or something in between.
Maybe some of our processes, language and experience from the version control space (branch, merge, rebase, upstream...) could be ported over to the reuse vs duplication problem. But when you layer that with the conventional use of those concepts in version control, the result would be quite the multidimensional mindbender. With that as a hypothetical baseline, simply guessing a good compromise between abstraction and duplication and then dealing with the consequences suddenly does not seem that bad a fate.
I don't think prototype-based inheritance (or monkeypatching) is any better at avoiding duplication than "regular" Java-style inheritance, and both will only help avoid certain kinds of duplication some of the time.
If you need to modify a small part of a class's function, there's typically no way, in a prototype-based or traditional inheritance language, to say "like the function from the parent class, but with these 5 lines different".
Inheritance of any kind is a handy tool to help with the "whole class files full of pages of identical code with one different method" problem, but a) that's not the cause of a lot of duplication out there, b) you still have to know/care to implement the inheritance, and c) it can come with its own struggles, a la tight coupling to implementations/classes you may not fully control. This isn't a pro-OO or anti-OO screed; just pointing out that inheritance as a means to DRY up code is a limited solution at best.
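For example, in a hypothetical TypeScript sketch (made-up exporter classes), overriding means restating the whole method even when only one line differs:

    class CsvExporter {
      export(rows: string[]): string {
        const header = "id,name";     // step you want to keep
        const body = rows.join("\n"); // step you want to keep
        return `${header}\n${body}`;  // the one step you want to change
      }
    }

    // There is no way to say "like the parent's export, but with that last line different";
    // the override has to restate the steps it actually wanted to inherit.
    class XmlExporter extends CsvExporter {
      export(rows: string[]): string {
        const header = "id,name";
        const body = rows.join("\n");
        return `<data header="${header}">${body}</data>`;
      }
    }

    console.log(new XmlExporter().export(["1,Ada", "2,Alan"]));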
I wonder if the man-hours wasted hunting through someone else's code to figure out what the bajillion abstracted functions are doing when you are trying to make a change outweigh the minor inconvenience of copying and pasting changes. Yes, the latter introduces more opportunities for errors, but are they more than knowing all your dependencies and ensuring your changes don't break their expectations? Unit tests are only as useful as the man who knows the future.
That's basically what I'm saying: it's unknowable whether copying and pasting and needing to keep all the copies up to date in the future will result in more or less work and bug fixing than re-using code. I think the best solution is for experienced developers to make an educated guess on a case by case basis. But you can't write that into a generic principle.
> In an ideal world, what you want is more like copy-on-write. Share the code until one of the clients of that code needs a change that the others do not, then copy it.
Good point. The problem with that is, of course, that you may not be aware of others existing.
I like to say that programming is not an exercise in Huffman compression. Not all structurally similar chunks of code should be unified: sometimes these chunks are _semantically_ different, and unifying them would create a false and confusing commonality.
The issue comes from trying to DRY the code before the requirements and functionality are well defined. If you're doing major refactoring before you have something in production it's just premature optimization.
I mentioned a few months ago in a similar thread that I now actively try to recognize when I'm creating an incomprehensible 300-line "abstraction" in order to avoid copying four lines of code.
The flip side of this occurs when trying to evolve systems with too much copying — particularly when there is not a good way to find where “these things should behave the same” has spread ... I get really nervous when coding up a pattern based on copying when the main reason for the copying is that there _isn't_ a good way to share the logic without a lot of hoops ... copying when I see how I could easily have shared the logic feels like such a safer decision, much easier to justify because I can always see how to refactor later if it proves that the copied pattern really does represent a use that evolves with shared requirements. When I’m put into a situation where I know that I’m only copying because I can’t see a good way for the logic to be shared, I feel like I’m setting up an infinite game of bug whack-a-mole, as errors due to untraceable synchronization requirements are doomed to reoccur forever as the system changes ... I find these scenarios frequently popping up when using tools with poor composition capacity (lots of build and deployment systems) — “I don’t know how to compose this stuff so I’m just going to copy and will hate this later” is the worst feeling.
Don't abstract it out until it's used in multiple places. Don't abstract it out using a complicated system, like inheritance, when you can use a function in a module. Don't abstract it out if it's superficially similar as opposed to actually the same thing.
Regarding the last point: iterators, good abstraction; averaging function, good abstraction. These things never change. But bad abstractions copy things that are only similar in appearance.
Programmers seem allergic to the idea that they would have to go through and update several functions that do the same thing. This is mentally a very trivial task though, much preferable to trying to alter some 'clever' dependency. In a super agile environment especially I find DRY practices will quickly build a huge web of garbage.
I've seen this also. The amount of hoops and jumps people go through in order to abstract something can be ridiculous, ending in code only the author can understand. Bad for business!
...that only the author can understand, and often not even the author once they're done writing it. But still, good abstractions are much better than duplicated code, yet bad abstractions can easily be worse.
"Minimize both duplication and bad abstractions" should be something everybody can get behind, even if only because it leaves all the details open.
r0ml gave a good argument for duplication over abstraction in his Debconf talk this year: https://debconf17.debconf.org/talks/194/ (video, I'm not sure if there's a transcript, but if you're bored on a Thanksgiving, I would recommend watching the whole thing)
I am pretty sure the cargo-culted, testless, bloated code dismayed the incompetent refactorer. This is not proof of the benefit of duplication; it's proof of the low quality of the legacy code.
As "michael" points out in the comments on the article, this has already been invented, it's called a macro and has been available for decades in various LISPs. Wikipedia lists a number of other languages [1] that support macros.
The problem with having powerful macros is always the same: as a project grows and grows you end up inventing your own little dialect of the language which is opaque to any 3rd party reading your code unless they take the time to unravel your macros.
I think Rust did two things right in that regard: firstly the macro invocations are always suffixed with '!' so you know it's not a regular function call right away. Secondly Rust macros are so quirky, ugly and painful to implement that you only ever use them as a last resort, so people tend not to abuse them too much.
>..inventing your own little dialect of the language which is opaque to any 3rd party reading your code unless they take the time to unravel your...
And that, ladies and gentlemen, is called a framework.
Macros or no macros, if the problem domain is large and you stay on it a long time, you end up with a framework. Macros can make that framework easier, not harder.
In most languages you end up with large configuration files and a huge number of strings as parameters. The program semantics might be defined by interpreting JSON structures, template languages, and an ungodly amount of string parameters that are in effect reserved words. If you persist long enough, you have code generators and state machines, and you define execution semantics with Petri nets.
What you want to do is use the programming language that is best suited to solving the problems you have. Growing the language towards the problem you have is a great opportunity to make the programming interface simpler to understand.
With Lisp, especially with Common Lisp it's possible to grow the language so that it stays familiar as much as possible.
I don't understand your post at all. Can you give some examples maybe?
>In most languages you end up with large configuration files and huge number of strings as parameter.
Not in the languages I'm familiar with at least. I don't understand what configuration files and macros have to do with each other.
>Growing the language towards the problem you have is great opportunity to make the programming interface simpler to understand.
Simpler to use, not to understand. You know what it does, you don't know how it does it or how it can be extended because it doesn't play by the language's usual rules. It's nice if you're copy/pasting stack overflow snippets or your use case is very standard, it sucks if you want to modify the code to do something a bit different and out of the box because now you have to learn the rules of this custom DSL.
>With Lisp, especially with Common Lisp it's possible to grow the language so that it stays familiar as much as possible.
So adding custom constructs to the language makes it stay familiar? That doesn't make a lot of sense to me
> So adding custom constructs to the language makes it stay familiar?
There are rules and conventions for writing macros. If you follow them, their usage will be clear for anyone who knows the language - because the language itself includes a lot of macros and they all follow the same set of rules. Common Lisp, Clojure and Elixir are very good examples of this (with Racket taking it up to 11).
Of course, you can write macros which work and behave in unexpected, weird ways. In practice, though, you don't - why the heck would you?
I don't have enough experience with macros to agree/disagree, but parent's comment stood out to me because of the particular framework I've been using lately.
> It's nice if you're copy/pasting stack overflow snippets or your use case is very standard, it sucks if you want to modify the code to do something a bit different and out of the box because now you have to learn the rules of this custom DSL.
This basically describes Spring, and I'm not just being snarky. To someone from the outside, Spring's annotations seem to have all the problems you list. I have trouble imagining that heavy use of macros could be much worse, and I can certainly see the overlap between macros and frameworks.
Every program is a DSL. What do you think you're doing when you define types and functions except make a little DSL of your own? Granted, your "DSL" is somewhat semantically restricted in that it has to conform to the limitations of its host environment, but this limitation doesn't free readers from having to learn how your "DSL" works and imagine ways to manipulate it.
Seen in this light, the kind of DSL you can create with a full-blown macro system isn't all that different.
Programming, in large part, is an exercise in language creation.
> And that, ladies and gentlemen, is called a framework.
It's different. Macros can change the language grammar (it's like inventing a new way to structure a sentence in English, instead of the usual subject-verb-object order). Libraries just provide new vocabulary (new names, new verbs, etc.).
Too often frameworks end up also changing the grammar, but doing it in a clunky unnatural way, shoehorning it into the language with tons of boilerplate code that could have been eliminated with a few simple macros.
A framework cannot change a programming language's grammar without using macros. Most frameworks provide new types, functions, classes, constants, etc., which expand the vocabulary, not the grammar.
You can use the English language to express ideas on very different topics (food, law, engineering, romance, etc.) without changing its grammar.
> It's different. Macros can change the language grammar
But that's the point. When you can't change the grammar, you instead end up writing in a different language that doesn't even have a grammar, and encoding it in the base language plus a bunch of horrible configuration.
>as a project grows and grows you end up inventing your own little dialect of the language which is opaque to any 3rd party reading your code unless they take the time to unravel your macros.
This is bad use of macros, or an ugly macro system.
Macros, at least in Lisp, made code even clearer to understand; because they let you create the constructs that make the problem domain map more directly, straightforwardly, easily, to the programming language.
So they reduce line count and reduce the need for kludges or workarounds. They allow more straightforward code.
But this is within the Land of Lisp, where writing a macro isn't something "advanced" nor "complex" nor "esoteric". In the Lisp world, writing a macro is 95% similar to writing a run-of-the-mill function.
No true Scotsman would ever write macros in such a confusing manner!
But really, this is a recognized problem of Lisp, and has been called the Lisp Curse. [0] One is never programming in "just Lisp", but rather in Lisp plus some half-baked DSL haphazardly created by whoever wrote the program in the first place.
Also, don't confuse readability with understanding. Yes, DSLs are typically easier to read, but only after you come to understand the primitives of the language. When every program has its own DSL with its own primitives, even programs that do similar things... That becomes quite a burden.
> Also, don't confuse readability with understanding. Yes, DSLs are typically easier to read, but only after you come to understand the primitives of the language.
This is also true of functions. Without reading the body, you don't know if it's just going to return the sum of the two integers you passed to it, or if it's going to change some global variable, launch the missiles, and then return a random int.
Yes, macros are more powerful, and therefore you need to be more careful with them. But they are still much better than what ends up being used instead. With languages that don't have macros, you end up with complex frameworks that use runtime reflection, or code generators that run as part of the build system (which end up being an ad-hoc, messy macro system).
Or, some horrible solution where you embed a DSL by interpreting trees of objects, which effectively represent an AST. In this case, the embedded language doesn't follow the language rules, but it seems like it does, because you're looking at the implementation of the interpreter, instead of at the syntax of the embedded language.
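That pattern looks roughly like this hypothetical TypeScript sketch (made-up expression types): the "language" is a tree of plain objects and the real semantics live in the interpreter:

    // The "language" is encoded as a tree of plain objects (effectively an AST)...
    type Expr =
      | { kind: "num"; value: number }
      | { kind: "add"; left: Expr; right: Expr }
      | { kind: "mul"; left: Expr; right: Expr };

    // ...and the real semantics live in the interpreter that walks the tree,
    // not in the host language's own rules.
    function evaluate(e: Expr): number {
      switch (e.kind) {
        case "num": return e.value;
        case "add": return evaluate(e.left) + evaluate(e.right);
        case "mul": return evaluate(e.left) * evaluate(e.right);
      }
    }

    // (2 + 3) * 4, written as an object tree rather than as host-language syntax.
    const program: Expr = {
      kind: "mul",
      left: { kind: "add", left: { kind: "num", value: 2 }, right: { kind: "num", value: 3 } },
      right: { kind: "num", value: 4 },
    };

    console.log(evaluate(program)); // 20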
If I understand "the Lisp curse" correctly, the claim is that Lisp often winds up with a "half-baked DSL" because making DSLs in Lisp is so easy. You can do it without putting very much thought into it, so it's easy for the original author to just slap something together.
Note well: This is my understanding of the claim. I take no position on whether it is true.
Why does it need to be "half-baked"? Why do you assume that writing a good DSL is impossible for most Lisp users? Are you sure it's actually the case?
Writing and maintaining a good DSL is like writing and maintaining bug free code. You always start with the best of intentions, but human fallibility and entropy are always pulling you in the wrong direction.
This doesn't mean that the attempt is not worthwhile. But it does mean that you should expect eventual failure.
As other commenters have replied: It's half-baked the same way Java tends towards half-baked enterprise design lasagna and/or design pattern bingo. It's easy to do, and most won't question it.
Also, yes, I would propose that a good DSL would be difficult to write for most programmers of any type. Not because of any inherent deficiency in the programmer, but rather because we tend not to spend enough time in a single domain to understand it well enough to write a good DSL.
>Yes, DSLs are typically easier to read, but only after you come to understand the primitives of the language.
Quoting user quotemstr here:
"Every program is a DSL. What do you think you're doing when you define types and functions except make a little DSL of your own? (...) Programming, in large part, is an exercise in language creation."
That's just wrong. A DSL defines a new language syntax. Most programs don't do that. You might have to learn what functions do but you don't need to learn an entirely new language when you read a Go program for example.
As IshKabab hints at, there's a huge difference between defining a domain within an existing language and defining a domain-specific language. Yes, types and functions do define a domain, by detailing the data compositions and operations available. But those data and operations work within the confines of the existent language.
Isn't loop essentially a macro (probably a special op) that a lot of people hate, specifically because it is a sprawling DSL? Because Lisp has an expressive macro system, it can lead people to that. The existence of macros requires an at least above-average dev community, because it's one thing when a decently-designed but controversial DSL like loop ships by default with the language; it's another when every project can roll out its own poorly considered and implemented DSLs using macros when simpler abstractions would have sufficed.
Most lisp docs will tell you to use macros only when necessary, because as great as they are, they have inherent issues that aren't fixed just by having a good macro system.
This is bad use of macros, or an ugly macro system.
>Most lisp docs will tell you to use macros only when necessary
One also declares variables when necessary, and one also creates arrays when necessary, etc. But imagine a programming language that doesn't support arrays. It would be a nightmare if you need to do certain scientific computations.
So in the same way, yes, not having a (proper, Lisp-like) macro system surely hurts a lot, once you realize how it makes certain problems become really easy.
And, by the way, one should write macros when necessary. In Lisp, we're using macros most of the time!
>Isn't loop essentially a macro (probably special op) that a lot of people hate?
And other Lisp programmers like the LOOP macro, since it can allow for very readable and concise code to do something simple that should stay simple to read.
Examples:
(loop for x from 1
      for y = (* x 10)
      while (< y 100)
      do (print (* x 5))
      collect y)
There are some legitimate reasons to dislike loop. It's a high level construct, yet it has unspecified behaviors. A program can be nonportable on account of some manner of using loop. It's been the case in the past that Lisp applications ended up carrying their own private loop implementation which would behave the same way everywhere.
It certainly does not look like it is one way or the other; it is something you have to know. If you don't know anything about loop, but have a belief that it is doing a Cartesian product, or a belief that it is not doing one, your belief has no rational basis either way. You can infer which one is right from the output.
loop doesn't do cross-producting; if you know that, there is no mistaking it. All the clauses specify iteration elements for one single loop.
"for x from 1" means starting at 1, in increments of 1.
"for y = expr" means that expr is evaluated on each iteration, and y takes on that value.
y could be incremented on its own, but then that example wouldn't show the "for var = expr" syntax, how one variable can depend on a combination of others.
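To make the "no cross-producting" point concrete, here is a tiny sketch using nothing beyond standard LOOP clauses: two FOR clauses advance in lockstep rather than nesting.
(loop for x in '(1 2 3)
      for y in '(a b c)
      collect (list x y))
;; => ((1 A) (2 B) (3 C)), i.e. three pairs, not the nine combinations of a Cartesian product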
The LOOP macro has actually seen a lot of work in designing and implementing it, including a syntax spec. Some people hate it not because it is a DSL, but because it is a bit different from regular Lisp macros, in that it requires more parsing and its clauses are not grouped by parentheses/s-expressions. Plus: understanding the relationship of the clauses is not really that simple, since there are some implicit groupings and dependencies.
This is what a use of the LOOP macro looks like in code:
(loop for i from 0 below n
      for v across vec
      when (evenp i)
      collect (* i v))
This is what a typical Lisp programmer would prefer:
(loop ((for i :from 0 :below n)
       (for v :across vec))
  (when (evenp i)
    (collect (* i v))))
The clauses would be grouped in a list and each clause would be a list. The body would then use the usual Lisp syntax and the WHEN and COLLECT features would look similar to normal Lisp macros.
The LOOP macro historically comes from Interlisp (1970s), where it was part of a language-design trend called 'conversational programming'. The idea was to have more natural-language-like programming constructs, combined with tools like spell checkers and automatic syntax repair (Do What I Mean, DWIM). From there, this idea and the FOR macro influenced the LOOP macro for Maclisp. The LOOP macro grew over time in capabilities and was then carried over to later Lisp dialects, like Common Lisp.
There are actually Lisp macros which are even more complicated to implement and even more powerful, but which create less resistance, since they are a bit better integrated in the usual Lisp language syntax. An example is the ITERATE macro: https://common-lisp.net/project/iterate/
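For comparison, here is a rough sketch of the earlier example written in ITERATE's style, going by the clause names in its documentation (illustrative, not taken from the thread):
(iter (for i from 0 below n)
      (for v in-vector vec)
      ;; body clauses use ordinary Lisp syntax; COLLECT works inside WHEN
      (when (evenp i)
        (collect (* i v))))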
Thus it is not the complexity or the functionality of the macro itself, but a particular style of macro and its implementation. I personally also prefer something like ITERATE, but LOOP works fine for me, too.
The advantage of something like ITERATE or even LOOP is that they live mostly at the developer level, not only at the implementor level. A developer or group of developers can write such a complex macro and integrate it seamlessly into the language, making the language more powerful and letting much of the knowledge of, and infrastructure around, the language be reused.
Implementing and designing something like ITERATE or LOOP requires above-average developer capabilities. Macros generally require some of that, since the developer has to be able to program at the syntactic meta-level. That is where language constructs are reused, implemented and integrated.
Lisp docs will tell you to use macros only when necessary? They tell you to WRITE macros only when necessary. Since many Lisp dialects have a lot of macros, you have to use them anyway. Most of the top-level definition constructs are already macros. If we use DEFUN to define a function, we are already using macros.
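One quick way to see this is to ask the implementation to expand a DEFUN form at the REPL (the exact expansion is implementation-specific, so treat the output as illustrative):
;; DEFUN is itself a macro; MACROEXPAND-1 shows the lower-level code it generates.
(macroexpand-1 '(defun square (x) (* x x)))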
In my experience actual Common Lisp uses a lot of macros. I also tend to write a fair number of macros.
But generally good macro programming style is slightly underdocumented, especially when we think of various qualities: robustness, good syntax, usefulness, readability, avoiding the obvious macro bugs, ...
Macros are very useful and I use them a lot, but at the same time one needs to put a bit more care/discipline into them and some help of the development environment is useful...
I wonder if in a parallel world we would have a library system similar to the one used in Node.js with npm, and many installable libraries would consist of a single macro.
Macros, at least in Lisp, make code even clearer to understand, because they let you create the constructs that map the problem domain more directly, straightforwardly and easily onto the programming language.
Macros are a tool for creating abstractions. In the mind of the author, abstractions are always clearer. Others who have to work with those abstractions legitimately may or may not agree.
In the case of authors who think that abstractions are an unmitigated good, I seldom agree with their abstractions. And I don't care whether they are implemented as deep object hierarchies, macros, or functions with a lot of black magic. If you are unaware of the cognitive load for others that is inherent in your abstractions, then you are unlikely to find a good tradeoff between conciseness and how much of your mental state the maintainer has to understand to follow along.
The best macros are the macros you don't even know exist. Take a look around [0], where things like if, def, pipeline (|>) and common "base" entities are defined simply as a construction of the Kernel special forms (the parts that won't be expanded by macros). Using the language, you'd never know they were macros, because they're in the language of the problem and do not leak their abstractions. A macro is just a tool and, like other tools, should be used when it's the right one to use, not because you contorted the problem to fit the tool.
>So they reduce line count, they reduce need for kludges or workarounds. They allow more straight code.
>But this is within the Land of Lisp, where writing a macro isn't something "advanced" nor "complex" nor "esoteric". In the Lisp world, writing a macro is 95% similar to writing a run-of-the-mill function.
I agree with you but I think there are two aspects to code readability: A/ is it clear what the code does (the intent) and B/ is it clear how it does it (the implementation).
I think macros can help massively with A (hiding redundant code and clunky constructs) at the cost of obfuscating B. The thing is that if you want to hack into the code at some point you'll need to understand B too.
To take a very simplistic example imagine that you're reading some CL code and see something like:
(let ((var 42))
(magicbox var (format t "side effect~%"))
(format t "~a~%" var))
Now for some reason you need to figure out what this does. Maybe instead of the 2nd "format" call there's a function you're currently working on and you want to know how it's called.
So, without running the code or looking at what "magicbox" does: if you assume that it's a function call, then you expect this code to print "side effect" when magicbox is called (since the parameter will be evaluated before the call), and then, whenever it returns, the 2nd format will display "42", since var is not modified between the let and there.
You run your code and you see that it only displays "magic" instead. Huh, what happened?
Well there you're lucky because there's only a single statement to consider, the magicbox invocation. You ask emacs to find the definition and you see:
(defmacro magicbox (var unused)
(list 'setq var "magic"))
Now you might say that it's a contrived example and a bad use of macros, and you'd be right, but I'm sure you could find real-world examples of macros with similar side effects that are well justified. Maybe it makes the code a whole lot nicer and more maintainable. Still, that doesn't change the fact that it makes it harder to figure out exactly what's going on. If this magicbox call was in the middle of dozens of statements, it might take you a while to figure out why some variable seems to magically change its value.
> If this magicbox call was in the middle of dozens of statements, it might take you a while to figure out why some variable seems to magically change its value.
Not really. Since the call explicitly names var you could easily narrow it down to that by going through all the places in the code where var is mentioned. If all the other expressions concerning var are non-destructive, you have your smoking gun: it must be magicbox. If var is mutated all over the place in that code then you have to work out whether it is magicbox or something else.
The thing that's nice about macros is that they happen at compile time. If you want to know what a macro is doing, just ask your editor or language implementation to expand the macro invocation. Then you will see exactly what code is being generated.
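For instance, with the MAGICBOX macro from the example above, a single expansion at the REPL (or via your editor's macroexpand command) removes the mystery:
(macroexpand-1 '(magicbox var (format t "side effect~%")))
;; => (SETQ VAR "magic")   ; the second argument is simply discarded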
But when you define a function (or a class), you are also inventing your own little dialect/DSL. Operator overloading is another. They can all be used well or confusingly.
Where do you draw the line? At least with functions, there's only one syntactic structure that is user-definable, making it easy to distinguish foreground from background.
> But when you define a function (or a class), you are also inventing your own little dialect/DSL.
I don't understand what you mean by that. When I call a function in Common Lisp I have some guarantees: I know it won't change the value of the parameters that I pass by value, I know it won't add or remove variables from the current scope, and I know that the parameters to the function will be evaluated before the invocation. For macros anything goes: for all you know it could expand to a RETURN-FROM and exit your function prematurely, it could implicitly modify a local variable, etc.
> Operator overloading is another. They can all be used well or confusingly.
I completely agree and I consider that operator overloading for anything that's not "semantically" equivalent to the builtin operator behaviour is inherently evil. It makes sense to overload "*" and "+" on a matrix or vector type, it doesn't make a lot of sense to overload "<<" or ">>" to deal with input/output. But obviously nobody would be insane enough to do that.
When I read "a = b + c" in some C++ code I fully expect that "a" will be set to the value of a sum of "b" and "c", whatever makes sense for the types of "b" and "c". If adding "b" and "c" doesn't make any obvious sense semantically (adding database handles with strings) then it shouldn't be done.
I think that it's easier to draw the line and design sane guidelines for when you're allowed to implement overloaded operators. Macros have much broader use cases.
> inventing your own little dialect of the language which is opaque to any 3rd party reading your code unless they take the time to unravel your macros.
I meant that a whole bunch of specialized functions can also be a dialect, one that is opaque unless you take the trouble to look into them. You can write very confusing code unless it matches up closely to a domain (and even then, you need to be on top of the domain). Of course, the "unraveling" is easier (and less literal) than for macros.
I was just trying to say it's a difference in degree not in kind - if you measure in terms of power and usability.
I take your point that it's not just syntax, but differences in (basic) guarantees, so there's less ability to treat it as a black box (information hiding within a module). But any module (e.g. a fn) can behave differently from how you expect.
tl;dr macros vs fn is a difference in degree, in terms of outcome (i.e. confusion).
The reason you know that a function won't modify locals is not because of limitations on what a function can do, but because of what has not been done locally: your lexical scope has not been captured in a lexical closure; or else, that closure has not escaped; or else, if it has escaped, its body doesn't contain any code which mutates variables.
The space of things that a function won't do is tiny compared to the space of what it may. You've taken a vast universe of possible behavior and eliminated from it a few possibilities like mutation of a local variable in the caller; that still leaves a vast space. Generally, to maintain code, at some point you have to grapple with understanding what it does more than with what it doesn't do.
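A minimal sketch of that point (NUDGE and the two callers are hypothetical, not from the thread): a function can only touch the caller's local if the caller captures it in a closure and hands that closure out.
(defun nudge (setter)
  ;; All this function can do to the caller's X is call the closure it was given.
  (funcall setter 99))

(defun caller ()
  (let ((x 1))
    (nudge (lambda (new) (setq x new)))   ; X captured, closure escapes: X can change
    x))                                   ; => 99

(defun quiet-caller ()
  (let ((x 1))
    (nudge (lambda (new) (declare (ignore new))))  ; closure never mutates X
    x))                                            ; => 1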
> But when you define a function (or a class), you are also inventing your own little dialect/DSL.
No. When you define a function or a class, you define a new vocabulary (a new verb, a new name), but you don't change the grammar. Macros are a different beast because they can change the grammar.
I don't think that macros are really all that different from the alternatives in this respect. It's also hard to pick apart a pile of generated code, or an overly cute DSL that was implemented without macros.
If there's a sin, it's maybe that macros make it a bit too easy to (prematurely) optimize the conciseness of your code, and that becomes an enticement to play code golf.
> your own little dialect of the language which is opaque to any 3rd party reading your code unless they take the time to unravel your macros.
This exact same thing happens with functions, too. At some point, a third party will have to read the code to understand it. You don't know what a function does until you read its implementation either.
Often, having a domain specific language (even if the domain is very specific to the application) will make the application easier to understand.
If you write code in say, Go, then sure, a third party will be able to look at a loop and be able to figure out the mechanics of what it does. But that person will still not understand how the application works. They will still need to learn both the domain and the structure of the application. And with the right macros, the application will likely be smaller and better organized.
Avoiding abstraction doesn't magically make it easier to understand the solution to a complex problem.
Macros have their problems, but automated code generation is almost always worse.
In any case, I would like to point out that there are other answers to the problem than macros. For example, Haskell has lazy evaluation, currying and a strong type system, and many use cases for macros can be implemented just as ordinary functions.
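In Common Lisp terms (with an explicit LAMBDA standing in for Haskell's laziness), many WITH-style macro use cases can likewise be plain higher-order functions; a minimal sketch:
;; An ordinary function: the body a WITH-LOGGING macro would wrap becomes a thunk.
(defun call-with-logging (label thunk)
  (format t "~&begin ~a~%" label)
  (unwind-protect (funcall thunk)
    (format t "~&end ~a~%" label)))

;; (call-with-logging "job" (lambda () (+ 1 2)))
;; prints the begin/end lines around the work and returns 3.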
I believe all lispers know this, and don't abuse it. They'll craft a simple functional metalayer and then drop a few macros on top for syntactic purposes, the whole thing being used only if it has more pros than cons. It's an experience-based thing.
The crucial difference between template metaprogramming (TMP) in C++ and Rust macros is that TMP is unavoidable because it provides so many features of the language, whereas Rust macros are only good for DSLs. For example, parametric polymorphism in C++ must be done via templates, but Rust just has generics, full stop, and debugging those is no problem. I'm in agreement with the grandparent comment that praises Rust macros for having gross syntax; whether or not it was intentional, it discourages people from using them except in extreme circumstances, which in my mind is a feature. :)
The one thing in programming language design you can be sure of is that it's never time for Lisp.
(This is a joke, but it's also a serious point; people keep thinking of Lisp as the "king over the water" that will one day return and save them, which keeps people from looking at the reasons why Lisp never achieved mass adoption. Or arguing that Lisp cannot fail, it can only be failed.)
I do find myself wondering why, given that I was paid to develop in Lisp for ~6 years (and PostScript at the same time, but that's another story), really enjoyed using it, and frequently find myself extolling the virtues of macros, LOOP or CLOS, I've never actually used it on a project since then (and I do quite a few side projects just to play with new languages).
Edit: I think your downvotes are a bit harsh, and as a Scot I appreciate the "king over the water" reference - people get all romantic about the Jacobites without thinking too much about what their policies might have been.
Lisp failing mass adoption? Common Lisp had a large amount of adoption. Emacs Lisp has a really big install base. Clojure is pretty popular also. Do you mean wide adoption?
The author of the blog, Martin Sustrik, is the original author of ZeroMQ. His late colleague on the project, Pieter Hintjens, extolled the virtues of what is effectively macro-based code generation, especially using the "macro language" GSL.
In some alternative universe Lisp, Smalltalk and Mesa/Cedar would have won the love of mainstream computing, instead of us slowly catching up with the past. Oh well.
If the language's AST and syntax aren't isomorphic, then you add a learning curve and need to insulate users from compiler-internal changes; quasi-quoting doesn't work everywhere.
In a compiled language, first-class macros also add significant compile time.
A couple of languages have them, though, like Nim's macros, Haskell's TemplateHaskell extension and Rust's compiler plugins.
The author is absolutely right that there's a ton of data from languages/open-source projects that, if analyzed, could prove massively useful.
But I'm not sure that the metrics in the referenced article have much to do with language design. Most of what they are measuring (their conclusion says as much) is people copy-pasting files and entire projects. JS, by their stats, is the worst about this, yet JS as a language reduces duplication more than Java does (and Java was measured as having less copying).
Seems to be a social or dev skill issue rather than language design. Honestly most languages have an excellent tool for reuse - the function/method, which isn't used enough to remove all duplication as is.
We presented an exhaustive investigation of code cloning in GitHub for four of the most popular object-oriented languages: Java, C++, Python and JavaScript. The amount of file-level duplication is staggering in the four language ecosystems, with the extreme case of JavaScript, where only 6% of the files are original, and the rest are copies of those. The Java ecosystem has the least amount of duplication. These results stand even when ignoring very small files. When delving deeper into the data we observed the presence of files from popular libraries that were copy-included in a large number of projects. We also detected cases of reappropriation of entire projects, where developers take over a project without changes. There seemed to be several reasons for this, from abandoned projects to slightly abusive uses of GitHub in educational contexts. Finally, we studied the JavaScript ecosystem, which turns out to be dominated by Node libraries that are committed to the applications’ repositories.
Reading the conclusion again, they're capturing the existence of node_modules/, which has nothing to do with language design.
Because the JavaScript sample was so heavily (78 out of 80) dominated by Node packages, we have performed the same analysis again, this time excluding the Node files. This uncovered jQuery in its various versions and parts accounting for more than half of the sample (43), followed from a distance by other popular frameworks such as Twitter Bootstrap (12), Angular (7), reveal (4). Language tools such as modernizr, prettify, HTML5Shiv and others were present. We attribute this greater diversity to the fact that to keep connections small, many libraries are distributed as a single file. It is also a testament to the popularity of jQuery which still managed to occupy half of the list.
In Visual Studio in the late 2000s, code snippets were all the rage, a new feature to let you write boilerplate quickly.
Since .NET 3.5 I barely use them; I really only use the built-in ones for foreach and property accessors. The addition of lambdas and shorthand property accessors vastly reduced the boilerplate code in C#. I imagine the addition of async/await has also helped a lot with certain other types of code.
My round-about point is that existing languages can improve to reduce code duplication.
I find, more and more (given enough time for it, that is), that if code is duplicated or bad or both, I can rewrite it into some (non-existent) DSL that makes it elegant. Then, when trying to rewrite it back into the language at hand, it becomes (a lot more) inelegant again. So is the low-hanging fruit there an Alan Kay strategy of writing DSLs for specific domains (which would put languages like Lisp, OCaml or Haskell in a good place), or should language designers find ways to make the language itself better for these kinds of problems (and is that possible; for instance, advanced types or elegant macros?).
The other comment about Django (which would also work for Rails): a lot of the time, big enough libraries/frameworks written in a language are actually DSLs, which would suggest we lean toward the 'Alan Kay' 'make everything a DSL' solution. The mathematician/CS person in me wants the 'improve the language' solution. But I don't see low-hanging fruit for language design here.
The thing about DSLs (or really any programming language abstraction) is that they make it easier to understand the intent of the code at the expense of making it harder to understand the effect of the code. I find that the cognitive load of maintaining a project with lots of DSLs is high. The advantage of, e.g., Java over Scala is that while Java code is often more verbose and ugly, it holds fewer surprises. Like anything, there is a trade-off here; adding more DSLs can be better or worse.
> I find that the cognitive load of maintaining a project with lots of DSLs is high
Yes, that is currently why I design that kind of scenario in a (possibly blue-sky) DSL 'on paper' (if there isn't one handy) and then translate that back to libraries and structures in the language I'm working in. I just notice that this becomes annoying sometimes if the existing solutions really do not match the DSL I envisioned.
In the best case, the DSL or abstraction doesn't leak and the effect of the code matches the intent so that you don't need to peek under the hood to make sure it's doing what you want/think. In that (rare) case, cognitive load remains low because you can work close to the problem domain and stay there without worrying about the specifics of the underlying code.
> java code is often more verbose and ugly, it holds fewer surprises
Last I checked, things like AOP, reflection, Spring/Hibernate contexts, and other magic metaprogramming and annotation shenanigans were commonplace. At least with macros, you can usually just expand them and read what your code is doing.
I think the continuum is: boilerplate, library, standard library, language.
Boilerplate is the most flexible. Language is most unifying. Things tend to move towards language, but if you move too fast, you are stuck with bad decisions, because changing a language is near impossible.
> ...writing DSLs ... or make the language itself better
Emphatically: YES.
:-)
To clarify, hopefully, just a little bit: the DSL approach certainly appears the most promising so far. But there are a few issues. One is that language creation, maintenance, interoperability and comprehension become critical issues very quickly, even with amazing tools. (One could argue that amazing tools make the problem worse, because they hasten proliferation).
Then you notice that even some apparently pretty radical DSLs actually have quite a bit in common. And the things that separate them are just variants of a few simple architectural elements (dataflow, constraints, storage, data-definition).
You also remember from Guy Steele[1] that languages should not be "a thing", but rather a "pattern for growth", and from natural languages that we don't really invent entire new languages for specific domains; at most we add some vocabulary/jargon. So DSLs should not be a thing; what should be a thing is a language that allows us to build APIs that have the benefits we want from DSLs. And so while Lisp macros certainly point the way, I don't think they're the answer. Smalltalk's keyword syntax is closer, despite or maybe because it is less powerful[2], and Grace's extension of keyword syntax gets us even closer.
So we need a language that allows us to build what we would consider DSLs as APIs. For that, it will also need to abandon call/return as the dominant/only generalized abstraction mechanism[3].
I'm really interested in seeing the evolution of Kotlin with regard to its use in building DSLs.
I saw a talk by Christina Lee and Huyen Tue Dao that builds one on the type system (lambda extension functions) and it feels very intuitive.
https://www.youtube.com/watch?v=OmwjrVawHqA
A nice complement to this is Andrej Karpathy's Software 2.0 essay, namely that it is getting much easier to collect the data than it ever will be to write the complex programs which can anticipate every case.
> In Java, 70% of duplicates were due to code generation
I don't think, in Java's case, that fixing this is low-hanging fruit. Java is more reliant on code generation than comparable languages for a reason. Most of the codegen I've encountered is stuff that would be hard to implement in a more elegant way due to limitations of the platform's runtime environment (such as type erasure) that make it hard to handle a lot of the dumb verbose mapping work and stuff like that at run time.
Someone else mentioned Clojure. I haven't tried Clojure yet, but I'm willing to believe it - dynamic typing seems like it could easily be a secret weapon on the JVM, since the platform is so halfhearted at static typing. But I think that teams that are committed to using a static, infix, non-S-expression language may be painted into a bit of a corner here.
A lot of the duplication in our Java projects is due to Java classes generated from WSDLs or XSDs or similar. I quite like it, actually. You can generate it from a known XSD and version it as its own module.
How could a language solve this? I know F# can compile some stuff using things online, but this means the build is suddenly not reproducible. And we could always generate that code on the fly when compiling, but that always wreaks havoc on some tooling or IDEs when the code you're referencing isn't there until compile time.
Something like F#'s type providers would work on the JVM. You can make the build reproducible by just making sure that the inputs the type provider uses are consistent. Practically speaking, the biggest difference between that approach and more traditional codegen is that you don't have to add a whole bunch of build steps, and you don't have to have team arguments over whether generated code gets checked into source control or not. It's actually great for the IDE, which does know all that type information at development time because it can invoke the type provider as a service. F# is the only language I've seen where you can write inline SQL queries and get a red squiggly if you mistype a field's name. (IntelliJ comes close, but, as far as I've been able to find, it only works if you stick all your SQL queries in resource files.)
The other approach is to just handle these sorts of things at run time, using the richer run-time information that your code has access to. So, e.g., if you're binding an XML file to a List<Record<Employee>>, in C# it's possible to actually express something like typeof(List<Record<Employee>>), because generics aren't erased. The mapping function can just look at that and figure out how to map the data dynamically. You, the developer, then get to own your own domain objects instead of having to rely on ones that are auto-generated by some code-generation process. You don't have to accumulate a bunch of XML files and XML schema files and associated obnoxious-to-maintain clutter. And you can do it without resorting to a bunch of custom weirdness like Jackson's TypeReference class.
Duplicating my comment from the other discussion of this article:
> In Java, 70% of duplicates were due to code generation. For C++ is was 18%, for Python 85%, for JavaScript 70%.
C++ is the real outlier there. That could be because C++ code is much harder to generate, but I don't buy that it's that much harder than Java. Or it could be because C++ templates aren't considered "code generation". Or it could be because C++ doesn't get used for projects with that much boilerplate code. Or...
A C++ project that features generated code will probably run the generator during the build process - and possibly build it too, if it's written in C++ itself - and so the generated code might well not end up in the repo.
Duplicating code is acceptable, IMHO, if it is either clear the two files are going to diverge quickly, or if it looks like they will not be changed at all in the foreseeable future.
Trouble starts if there are two source files that do almost the same thing but slightly different, and then you need to change them; now you probably need to change these duplicate files in lock step, and that is a rather error-prone process. IIRC, this kind of problem was the reason for adding templates to C++. And say what you will about C++, but templates are a very powerful tool.
I fail to see the connection to language design, though. Is the author saying that one should add some feature to make duplication as unnecessary as possible (like templates in C++)? Or that the tooling around the language should be better suited to automatic code generation?
FWIW, I think code generation is a very powerful tool; in a way, code generation is meta-programming. (Is there a distinction at all?) And I think that there is a lot of potential in this area. Go has supported the "go generate" command for a while now, and I have seen a few very interesting use cases (e.g. ffjson, which generates code to serialize/parse Go data types to and from JSON more efficiently than the builtin reflection-based mechanism).
Someone once said (I forgot who), "I'd rather write programs that write programs than write programs." That sums it up pretty well, I think. ;-) Okay, okay, so now I do see the connection to language design. Sorry, my fingers were faster than my mind this time.
C# is really bad about this. They seem to be heavy into the Magical Boilerplate coding style, in which your "empty" project consists of thousands of lines of duplicated/generated code which you then make slight edits to. My opinion is that if some piece of code is so common that it needs to be inserted into every project, then it should be a library call. Apparently that's a minority opinion, though...
This isn't my experience with C# (an empty project starts with a `Class1.cs` file if it's a library or `Program.cs` if it's an application), so you might be using something on top of C# that needs all of that boilerplate.
Sounds like project templates in Visual Studio - which can populate your project with vast amounts of stuff. This can good or bad depending on the context.
No. There are two files, Program.cs and Startup.cs. Program.cs is 26 lines long, Startup.cs is 35 lines long. Project file is 15 lines long.
Of course, there is a whole web server, Kestrel, referenced somewhere, via NuGet and DLLs full of MSIL. But describing "depending on compiled library code" as "contains _ lines of code" is, especially in the context of this conversation, arguably dishonest.
No. Just looked, and the only large file generated is the dependency cache (~7000 lines of json) which you never touch and can always be regenerated with dotnet restore.
The rest is ~100-150 lines code+configuration and a readme file (~180 lines).
Look, I get that a lot of this stuff (jQuery...) is libraries, and some of it is the Visual Studio solutions file, but if it's libraries, then why was it copied into my project directory?
That's not an empty .NET Core project; that's a sample project that you can run to see a whole website, which is itself documentation on creating a .NET Core site.
However, I'm not sure Visual Studio gives you any way to avoid this when you use the GUI to start a new core project. It is however possible to create a non-Core ASP.NET project with literally nothing in it.
project.assets.json (and everything in /obj) is a build artifact, not source code. Would you include .o files in the weight of a sample C project?
applicationhost.config (and everything in .vs) is local configuration for your editor (Visual Studio 15, it appears), not source code. Would you include .emacs.d in the weight of a sample project?
Every folder in /wwwroot/lib/ that contains a .bower.json file is a local package cache, not source code. You don't have to commit it; just run bower when building your solution (if it's not done automatically for you) to restore them. If those were .dll or .pdb files instead of .min.js and .map, would you count them?
The remaining two hundred lines of code are the contents of the "Sample" project, written to illustrate ASP.NET Core.
I never built the project, so I don't see how any of this stuff can be build artifacts. There should definitely not be copies of any "package caches" in my project folder. Why would I want a separate copy of jQuery for every new project? It all shows up as part of the project when in fact that stuff is a logically separate library. Only code that is unique to my project should be in the project folder. Nor should editor config appear in the project folder. Including jQuery should be just one line of source code:
Using jquery;
I really don't care why VS put that stuff in my project, but as far as I'm concerned, if it's code of any kind, and it's in my project, and I didn't write it, then it's boilerplate. If it looks like a duck, and it quacks like a duck, then it's a duck.
I mean, if I'm introduced to some other programmer's project, I now have to wade through this rat's nest of package caches and auto-generated classes and other trash trying to figure out which little bits are actually unique to the project itself. I really shouldn't have to do that.
There's a lot more to a programming language than the ability to avoid duplicated code, and making the language more expressive is not always better. The flexibility of the C++ preprocessor makes tooling (editor support, static analysis, incremental builds, etc) much more difficult to write, and as a result, state-of-the-art C++ tooling has a lot of disadvantages compared with (say) Java tooling.
I would argue that many of the large extensions, plug-ins, and packages are already their own programming languages. For example you don't just "learn Python", you can "learn Django".
There is an interesting related paper, "Copy and Paste Redeemed" [1], that is based on the assumption that a lot of copy/paste happens (and discusses why this is not necessarily a bad thing). [1] investigates how copy/paste can be auto-detected, how similar pieces of code can automatically be merged together, and how abstractions can be automatically created from copy/paste code.
I have not used the tool myself, so I cannot comment on how well it works in practice. But I found the idea intriguing.
If you don't check in the generated code, that means that you have to run the generator every time you build (or at least every time you get a clean copy of the tree and build). For code that changes very rarely, that may not be a net win.
Also, if you don't check in the generated code, and then you upgrade to the latest version of the generator, surprising things can happen (or even if different people have different versions of the generator installed).
So there can be cases where checking in generated code can be at least a reasonable thing to do.
The pain of implementing an ORM and the generated code between OO and relational databases might be low-hanging in its explanation, but writing a language that does this is probably a Hard Problem. I wouldn't call that low-hanging, but I might be really happy to use such a language. (https://blog.codinghorror.com/object-relational-mapping-is-t...)
I was trying to express an inchoate thought the other day, something to the effect that, "total volume of code in the world should be shrinking about now". I think this article could be interpreted as pointing to that.
I imagine a Grand Refactoring... The "great compression" of '23, or something...
Repetition (like in DRY) is duplication without reuse. When I think of duplication in the negative sense, I think of repetition, not reusing a module or library. Another negative thing about duplication is taking up more disk space, but a good file system will take care of that! Then there is "DLL hell", or "library hell", where two programs use the same library, but different versions... How do you reuse without duplication?
I'm a lisper at night, so I tend to make heavy use of macros, as well as rather abstract patterns you can use in any language. This works for me because
a) I'm the only maintainer of my own side projects (most of the time I never finish nor publish them ... including a cool wysiwyg mouse-driven Clojure POC debugger)
b) I tend to perform multiple full rewrites of my projects (up to 4 times).
c) My code tends to get obscure very quickly since I enjoy giving it an "ontological" twist. Eventually this leads to less code, but the code gets less understandable from the static perspective of the source file it sits in, so to get a good grasp of what it does, one needs to run the code and observe what it does when it is evaluated (at macro-expansion time or run time): this is why I semi-successfully attempted to write the debugger mentioned above.
Meanwhile at work, I have not performed a full rewrite yet (understandably so) and the goal is to keep the code as flat and linear as possible so that anyone in the team can grasp what it does in one glance. Obviously this leads to a lot of repetition, but this is for the greater good.
Currently I'm working on improving my dev experience with Clojure from two angles:
1) Saner macros: I've been tweaking Clojure's reader so that code generated with the backquote reader macro gets printed in its original form when using pprint. For instance, `(a b c) expands to (my-ns/a my-ns/b my-ns/c) and becomes `(a b c) again when printed with pprint. I'm also thinking about expanding macros into temporary files in order to get sane stack traces for code generated by macros, but that is a more delicate piece of surgery.
2) "Macros" that expand and persist in the very file they are written in. This allows for in-file debugging and should address point c) from above. Example:
At first the content of your file looks like:
(debug
(+ 1 2))
When you evaluate the file, it turns into (i.e. the file gets rewritten as):
(debug
(+ 1 2)
;;= 3
)
Since the debugging/inspecting of how the code behaves at runtime gets persisted in the file along with the code itself, it should allow for a broader and more direct understanding of what the code does.
Since this can also be used as a language/library level snippet system, I've also been considering using these in-file persisting macros as a templating engine for code. In particular, if it is augmented with a conflict management system à la git, one should be able to have flat/linear yet automatically generated code and still be able to overwrite what's in a code template expansion without losing the benefits of automatic code generation.
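Just to ground the DEBUG idea above in code: this is not the Clojure tool being described (there is no file rewriting here), only a plain Common Lisp sketch of the runtime half, renamed DBG to avoid clashing with the standard CL:DEBUG symbol.
;; DBG evaluates a form, prints it next to its value, and returns the value.
(defmacro dbg (form)
  (let ((val (gensym "VAL")))
    `(let ((,val ,form))
       (format t "~&~s ;;= ~s~%" ',form ,val)
       ,val)))

;; (dbg (+ 1 2)) prints "(+ 1 2) ;;= 3" and evaluates to 3.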
I highly recommend you read about composability, the various levels of abstraction, and how it all plays together. A good programming language should strive for clean interfaces and expose relatively low-level methods that can be composed in different ways to build more complex logic. E.g. Ruby/Python expose the methods they expose now, gems/packages use them to provide higher-level constructs, and Django/Rails combine these gems/packages to expose even higher-level constructs.
So there is no point in including the most common auto-generated code in the programming language itself, because the tool that generates it can keep evolving independently of the language and actually uses the lower-level constructs exposed by the language. By including that logic within the programming language we would just bloat the language and lose out on the beauty of composability.