It's not what programming languages do, it's what they shepherd you to (nibblestew.blogspot.com)
493 points by ingve on March 27, 2020 | 339 comments



This is what people mean when they say Haskell is "opinionated."

Haskell shepherds you into separating out IO code from library code to such an extent that literally any function that has an IO action taints the value returned from that function, causing it to be an IO value, and trying to pass that IO value into another function makes the return type of that function IO, too. Parametric polymorphism is the default, too, so it also shepherds you into writing general purpose code. Haskell is full of these little decisions where it just won't let you do something because it's not "correct" code, and they kind of don't care if that makes coding in it a fight against the compiler.
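A minimal sketch of what that tainting looks like (illustrative names, nothing from a real codebase):

    -- countWords is pure; its type says nothing about IO.
    countWords :: String -> Int
    countWords = length . words

    -- The moment readFile is involved, IO appears in the signature,
    -- and any caller that wants this Int has to be in IO as well.
    wordsInFile :: FilePath -> IO Int
    wordsInFile path = fmap countWords (readFile path)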

Rust took that philosophy and applied it to pointers. Every value has a lifetime and an owner, which makes it quite hard to do things that aren't memory safe.

Both Rust and Haskell wrap values that can fail in little boxes, and to get them out you have to check which type of value it is, and in C# there's nothing stopping you from returning null and not telling anyone that you can return null, and just assuming people will check for null all the time. Haskell has a philosophy of "make invalid states unrepresentable": because the value lives in a box, rather than null being a possible value of every type, it's impossible to use it without first getting it out.
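In Haskell terms, a small sketch (made-up data): you can't touch the wrapped value until you've said what happens in the Nothing case.

    lookupAge :: String -> Maybe Int
    lookupAge name = lookup name [("alice", 30), ("bob", 25)]

    -- There's no way to use the Int without unwrapping the Maybe first.
    greeting :: String -> String
    greeting name = case lookupAge name of
      Just age -> name ++ " is " ++ show age
      Nothing  -> "no idea how old " ++ name ++ " is"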

People who write Go love that concurrency is easy and that gofmt has enforced a single canonical style. Building these sorts of things into the language goes a long way toward getting them adopted and becoming the norm.

I think we saw a rise of the easy, anything-goes, screw-performance scripting languages. I think the next fashion seems to be in enforcing "correct" coding style. They all have their place.


I'm very much in agreement with your assessment here. If you rely on programmers to "do the right thing", some people will break those rules, and systems will suffer varying levels of quality decay as a result. Language-level enforcement of key concepts prevents folks from making bad decisions, or at least makes it harder, in whatever areas those concepts apply. Clojure is another good example: it provides concurrency primitives that let you avoid all the major pitfalls you typically see in multithreaded Java programs. As a result, most Clojure code does concurrency the "right way", and in 10+ years of using it, I've never seen a deadlock.


>I think we saw a rise of the easy, anything-goes, screw-performance scripting languages. I think the next fashion seems to be in enforcing "correct" coding style. They all have their place.

Having freedom to do what you want is great. Even if you shoot yourself in the foot, you learn your lesson and become a better developer. But as you work with increasing numbers of people, many making the same mistakes you have made, and especially as you end up having to fix their mistakes, you begin to look for a tool that takes away their ability to shoot themselves in the foot. That was one appeal of Rust when I was learning it. It is a pain to fight the compiler over memory, especially coming from a garbage collected background, but it both protected me from myself and protected me from others. At a certain point, at least on large enough group projects, the benefits of that protection outweigh the costs.


   > "It is a pain to fight the compiler over memory, especially coming from a garbage collected background"
What I don't understand: why don't people stick with garbage collected languages whenever possible?


Lots of reasons.

- You don't want to spend time tuning your GC.

- Response Latency REALLY matters

- Response Throughput REALLY matters

- Memory footprint REALLY matters

- Application + runtime footprint matters

- Memory isn't the only resource you need to manage

- Cost matters

I get that you said "whenever possible" but figured I'd list reasons for "not possible" because I think they're closely tied together.

In particular, I could imagine that cost is going to be a driving factor in the future for people wanting languages like Rust. They want the absolute fastest app runtime with the lowest resource footprint, because AWS charges you for the memory and CPU time you consume. In that case, the most economical thing to do is to favor the languages that are fastest to start and run while using the smallest amount of resources.

You might technically be able to do the same job with Python, but if you can reduce your operating cost by 10x by switching to (or starting with) a slimmer language, why not?


Because GC addresses only one type of resource: memory. There are many other types of resources, and handling them correctly is a PITA in most GCed languages.
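For instance, even in a GC'd language like Haskell you still end up managing non-memory resources by hand; a sketch using bracket from base:

    import Control.Exception (bracket)
    import System.IO (IOMode (ReadMode), hClose, hGetLine, openFile)

    -- The GC will eventually collect the Handle, but nothing says *when*
    -- the underlying file descriptor is released; bracket guarantees it.
    readFirstLine :: FilePath -> IO String
    readFirstLine path = bracket (openFile path ReadMode) hClose hGetLine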


> in C# there's nothing stopping you from returning null and not telling anyone that you can return null, and just assuming people will check for null all the time

You mean there was nothing?

https://docs.microsoft.com/en-us/dotnet/csharp/nullable-refe...


Doesn't that break a lot of legacy code?

Also, it is certainly an improvement, but having `Foo?` as a type is still less explicit than having `Maybe<Foo>` as a type. If you miss the question mark, you can still have null pointer exceptions.


> Also, it is certainly an improvement, but having `Foo?` as a type is still less explicit than having `Maybe<Foo>` as a type. If you miss the question mark, you can still have null pointer exceptions.

A perfect example of Stroustrup's Rule.

  * For new features, people insist on LOUD explicit syntax.
  * For established features, people want terse notation.
A question mark is concise, but it's just as explicit. The risk of people glossing over it isn't much worse, and avoiding tons of repeated keywords has benefits to comprehension.


The only problem I have is that the new syntax has trained me to expect a ? when null is possible and to assume null is not allowed if the ? is missing, but in older libraries not yet updated to C# 8 nullable reference types, not having a ? means null is allowed. So reading the syntax in my IDE is a lot harder, as sometimes an un-annotated type means null is allowed and sometimes not. I wish cross-over projects had the option to use ! to say null shouldn't be allowed, visually (and temporarily) distinguishing new from old code; to that end, the ! could be inserted as an overlay by my IDE, I suppose... Maybe I should try to write an IDE plug-in for this, or have a look for one.


> `Foo?` as a type is still less explicit than having `Maybe<Foo>` as a type.

I actually disagree with this. As long as `Foo?` is checked by the compiler, I think they are almost identical in use. It doesn't matter if you don't notice the `?` if it's a compile error to miss the null check.


It's the same if your Maybe<Foo>=Just<Foo>|Nothing, and in fact in that case I often prefer the nullable version, unless there's a dedicated, terse syntax for Maybe-checking built in to the language, the equivalent of null coalescing (Kotlin's ?: (Elvis operator), Typescript's ??), along with optional chaining calls (?.).
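For comparison, Haskell's library equivalents of those operators (a sketch; the Address type here is made up):

    import Data.Maybe (fromMaybe)

    newtype Address = Address { street :: Maybe String }

    -- fromMaybe plays the role of the Elvis operator (?:) ...
    portOrDefault :: Maybe Int -> Int
    portOrDefault mp = fromMaybe 8080 mp

    -- ... and (>>=) chains optional accesses the way ?. does.
    streetOf :: Maybe Address -> Maybe String
    streetOf maddr = maddr >>= street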

If you made a Maybe with a Nothing<String> that comes with a reason why nothing was returned, or any more complicated structure like that, then it's better[1] to use that instead of approximating it with exceptions, null, callbacks with an optional error argument, etc.

[1] In most cases. There's always exceptions, no pun intended.


My view is that `foo?` is a nice sugar for the common Option/Maybe/null case, but that a language is severely missing out if it doesn't also offer general sum types. I don't understand why more languages don't offer them; it seems like it'd be a fairly easy feature to add without breaking backwards compatibility.
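For anyone unfamiliar, a sum type is just "one of several alternatives, each carrying its own data"; a Haskell sketch with made-up constructors:

    data Payment
      = Cash
      | Card String    -- masked card number
      | Invoice Int    -- net days until due

    describe :: Payment -> String
    describe Cash        = "paid in cash"
    describe (Card num)  = "card " ++ num
    describe (Invoice n) = "invoice, net " ++ show n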


Rich Hickey's talk "Maybe Not" has some interesting thoughts on this: https://youtu.be/YR5WdGrpoug


I spent three years making iOS programs in Swift, which uses Foo! and Foo?, and never, NEVER, missed a question or exclamation mark. Even if I did, the compiler would complain.


It does break legacy code so you need to opt-in per file or per project. It also doesn't fix legacy code automatically. Any of your dependencies that haven't yet opted in (and thus added the right annotations to their assemblies) to strict null checking are assumed `Foo?` (as they've always been) and may still give NullReferenceExceptions.

Almost all of the BCL (base class library; the system libraries) has been annotated at this point, but there will still be plenty of libraries on NuGet that haven't been.

If you miss the question mark and you've opted in to strict null checks you won't get a null pointer exception, you'll get a compiler error when trying to assign null. (That's why you have to opt-in: it makes `Foo` without the question mark non-nullable.)


I recall `Nullable<Foo>` is equally valid c# if we want a verbose syntax.


Yes it does, but you enable it with a flag, per file or per project. Converting an existing solution can make you find a lot of bugs...


I hope that this will be widely adopted and not end up like many other good things with "you know, our code currently 'works' .. why would we invest so much effort to make the compiler happy?"


Microsoft is currently updating a lot of code to add this feature, and in the end the community will pressure library authors to do the same. I guess that in a few years all popular libs will use the nullable feature.


See also: adoption of TypeScript in the JavaScript community. There is pressure from the TS community for libraries to either be created in TS or for popular libraries to adopt it, and it has very quickly become the way a plurality of JS devs write JS.


Or it may introduce new ones? This is seriously the most confusing feature I've ever seen in C#.

I'll try it on a new project, but converting my existing code that is working fine, no way.


Haskell is not opinionated. All in all, it's probably easier (but misses much of the point of Haskell) to just write "IO" and "do/return" on every function in your program than to use IO in a disciplined way. Haskell even supports this with special do-syntax (and the fortuitously-named "return") to make monadic code look more imperative!
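To illustrate the point, nothing stops you from writing Haskell in an undisciplined, everything-in-IO style (a sketch):

    -- Both of these could be pure functions; the compiler won't object
    -- if you smear IO and do/return over everything.
    addTax :: Double -> IO Double
    addTax x = do
      return (x * 1.2)

    total :: Double -> Double -> IO Double
    total a b = do
      a' <- addTax a
      b' <- addTax b
      return (a' + b')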

Paying that IO/do/return syntax tax (sin-tax? syn-tax?) is still cheaper than the signature/return boilerplate in competing compiled languages like C and Java. Haskell invites you to avoid that syn-tax by writing principled IO.

One of the major complaints about Haskell is that it is so expressive and powerful that there are so many incompatible ways of architecting modules (see the incompatibilities in implementations of Monad Transformers / Effects, Lens, etc).

Rails is perhaps the original "opinionated" system. https://guides.rubyonrails.org/getting_started.html


I wonder how much of an uptick Adacore has seen with people using Ada and especially Spark in projects lately. Ada has a different niche than Haskell and Rust, but they're obsessed with software quality and provability. I've only played with Ada, but really liked the code that came out of it. If only they could make strings less painful to deal with.


I'd be very curious if, in the same way that "TypeScript is a superset of JavaScript", there could be a superset of TypeScript that encouraged one to annotate when they're performing IO operations (reading from/writing to the DOM, the network, workers, storage, etc) and if you consumed a function that had said annotation it would further encourage you to annotate that function as well. Something like:

    function writeStringToLocalStorage(key: string, value: string): void, *localStorage {
      // impl
    }

    function persistUsername(username: string): void {
      // ...
      writeStringToLocalStorage('username', username)
      // ...
    }
^ The compiler complains that persistUsername writes to localStorage but does not have the *localStorage IO annotation. While I feel like TypeScript does a great job of working with arguments and return values, there's still a whole class of issues, like unexpected DOM manipulation, that it would be useful to detect.


> Haskell is full of these little decisions where it just won't let you do something because it's not "correct" code, and they kind of don't care if that makes coding in it a fight against the compiler.

They care more about predictability and compositionality than about a novice's struggles. Professional programmers should prioritise those things. Certainly you can choose not to care about those things for personal projects.

That said, Haskell could of course use plenty of ergonomic improvements, but the ones you describe are not among them.


> I think we saw a rise of the easy, anything-goes, screw-performance scripting languages. I think the next fashion seems to be in enforcing "correct" coding style. They all have their place.

This is the sane way of looking at it.

We noticed that a lot of tasks were not worth the trouble of ensuring correctness, and so dynamic languages took over.

But then a lot of systems scaled to a point where complexity was hard to manage. And those systems had huge economic impact, which made perf and correctness valuable again, especially since they had a lower price of entry.

I do a lot of Python, usually with dynamic types and mixing IO everywhere. It works surprisingly well for a ton of cases, and can scale quite far. But I recently wanted to make a system that would provide a plugin system including a scriptable scenario of input collection, chained up to a rendering of some sort. This required completely disconnecting the logic of the scenario (controlled by the 3rd-party dev writing it) from the source of the input, and making the API contract very strict.

It was quite a pleasant experience, seeing that Python was capable of all that. Type hints work well now, and coroutines are exactly made for that use case. You can make your entire lib Sans I/O using coroutines as contracts. The automatic state saving and step-by-step execution is not inelegant.

But you can feel that it's been added on top of the original language design. It's not seamless. It's not shepherding you at all, you need to have discipline, and a deep understanding of the concepts. Which is the opposite of how it feels for other more core features of Python: well integrated, inviting you to do the right thing.

I'm hoping the technology will advance enough so that we eventually get one language that can navigate both sides of the spectrum. Giving you the ease of Python/Ruby scripting, data analysis, and agility for medium projects, but letting you transition to Haskell/Rust safety nets and perf, with strict typing + memory safety without GC + good I/O hygiene, in a progressive way. Something that can scale up, and scale down.

Right now we always have to choose. I've looked at Swift, Go, V, Zig, Nim, various Lisps and JVM/.NET based products. They always have those sweet spots, but also those blind spots. Which of course people who love the language often don't see (I know some people reading this comment will want to chime in with their favorite as a candidate; don't bother).

Now you could argue that we can't have it all: choose the right tool for the right job. But I disagree. I think we will eventually have it all. IT is a young field, and we are just at the beginning of what we can do.

Maybe as a transition, we will have a low level runtime language that also includes a high level language runtime. Like a Rust platform with a Python implementation, or what v-lang does with v-script. It won't be the perfect solution, but I'd certainly use something like that.


> a lot of tasks were not worth the trouble of ensuring correctness, and so dynamic languages took over.

I'm not sure about this ... to me it seems the dominant effects were firstly that Javascript is the only language allowed in browsers, and secondarily that nobody had really cracked the usability issues to give us a language that had type-safety and no explicit precompile phase and easy integration with the webserver.

(It doesn't help that a lot of experienced developers are either actively hostile to the concept of developer-usability, or think that their own idiosyncratic habits are the definition of usable and cannot be improved!)

People instead moved their correctness work to unit-testing.


Javascript is only dominant in the browser.

Other scripting languages (python, bash, ruby, php) dominate other parts of automation.


>I'm hoping the technology will advance enough so that we eventually get one language that can navigate both sides of the spectrum. Giving you the ease of Python/Ruby scripting, data analysis, and agility for medium projects, but letting you transition to Haskell/Rust safety nets and perf, with strict typing + memory safety without GC + good I/O hygiene, in a progressive way. Something that can scale up, and scale down.

[...]

>Now you could argue that we can't have it all: choose the right tool for the right job. But I disagree. I think we will eventually have it all.

The idea of One Language that covers everything (or even most scenarios) is a seductive goal but it's not mathematically possible to create because the finite characters we use to design a language's syntax forces us to make certain concepts more inconvenient than others. This inevitably leads to multiple languages that emphasize different techniques. Previous comments about why this is unavoidable:

https://news.ycombinator.com/item?id=15483141

https://news.ycombinator.com/item?id=19974887

(One could argue that the lowest level of binary 0s and 1s is already the "One Programming Language For Everything" because it's the ancestor of all subsequent languages but that's just an academic distinction. Working in pure 0s and 1s is not a realistic language for working programmers and they'd inevitably find the syntax too inconvenient and thus invent new languages on top of it such as assembly code, Lisp, etc.)


> not mathematically possible to create because the finite characters we use to design a language's syntax forces us to make certain concepts more inconvenient than others

There is a huge number of combinations possible, especially once keywords come into play. I don't think that's the limitation.

One big limitation is that people who are good at creating dynamic languages are bad at creating strict ones, and vice versa.

You can see in a comment below that some people suggest Swift, Raku, and Kotlin as solutions to this problem (as I mentioned in my post would happen). But of course, they don't have the I/O solution Haskell has, the borrow checker Rust has, nor the agility or signal/noise ratio of Python. They have a compromise. A compromise that can be good. Those languages are well designed. But it's not "the ultimate solution", because they don't navigate either end of the spectrum.


">There is a huge number of combinations possible, especially once keywords come into play. I don't think that's the limitation."

The mathematical limitation still remains even if you switch from 1-character symbols like '+' to verbose words like "plus()". You can attempt to invent a new programming language using longer keywords but you'll still run into contradictions of expressing concepts because there's always an implicit runtime assumption behind any syntax that hides the raw manipulation of 0s and 1s. If you didn't hide such assumptions (i.e. "abstractions"), then that means the 0s & 1s and NAND gates would be explicitly visible to the programmer at the syntax layer and that's unusable for non-trivial programs.

There's a reason why no credible computer science paper[0] has claimed to have invented the One Programming Language that can cover the full spectrum of tasks for all situations. It's not because computer scientists with PhDs over the last 50 years are collectively dumb. It's because it's not mathematically possible.

[0] e.g. one can search academic archives: https://arxiv.org/archive/cs


It would be like saying you can't have high level languages because assembly has only a limited set of register combinations. You can always abstract things away.

In fact, you are making the assumption that there are no more general logical concepts we can discover that can abstract or merge what now appear to be conflicting paradigms.

I imagine people said that to Argand. Nah, you can't do that sqrt(-1) thing, you'll run into contradictions.

Given that we have been at the task for less than a century, which is like 1 minute of human history, I'm not inclined to such categorical rejection of the possibility.


Unlambda makes everything equally difficult!


I believe this is a failure of imagination. I'm not a beginner programmer. I've played with a lot of different languages and read about a lot more, and I still have a strong gut feeling that they can be unified if we just figure out the right framework. Note that I, for one, would consider a language that shepherds you towards highly interoperable DSLs to be (close to) a success; something like Racket that could generate efficient native binaries would be really close...

I don't believe for a second that syntax is the obstacle. You can express all the complexity you need with composition. Also, the One True Language will obviously have macros.


> I, for one, would consider a language that shepherds you towards highly interoperable DSLs [...] Also, the One True Language will obviously have macros.

But that just motivates someone else who isn't you to prefer another language that has that "DSL" and macros as the baseline syntax of the new language for convenience. Now you have 2+ languages again.

If someone else prefers not to type out extra parentheses ")))))" to balance things and/or requires highest performance of no GC, then a "Racket/Lisp-like" language can't be the basis of The One True Language.

>You can express all the complexity you need with composition.

True, and to generalize further, a Turing Complete language can be used to create another Turing Complete language. But the ability to build any complexity by composition is itself the motivation to create another programming language that doesn't require the extra work of composition.

For example, one can program in the C Language and combine its primitives to build the "C++ Language" (the first C++ compiler was C-with-classes), and the C++ Language can be used to build a Javascript interpreter (the Netscape browser was written in C++). And then Javascript can be used to build the first Typescript compiler. Thus we might say (via tortured logic) that C Language can let you write a "DSL" as complicated as C++ and Javascript and Typescript. Even though that's true in a sense, people don't think of "C Language" as the One True Language. It's the same situation as not thinking of the low-level 0s and 1s of NAND gates as being the One True Language, even though composition of NAND gates will let you build any other language.


> But that just motivates someone else who isn't you to prefer another language

Sure, they'll want to, and they probably will, but they won't have to due to performance or other constraints, which is how I understood the goal.

> But the ability to build any complexity by composition is itself the motivation to create another programming language that doesn't require the extra work of composition.

That's what the macros are for. All part of the plan.

> ... Even though that's true in sense, people don't think of "C Language" as the One True Language.

C is certainly not a convenient language for hosting DSLs, due to insufficient abstraction capabilities, but the real missing ingredient is interop between the DSLs. C doesn't make it easy to pass data between them, etc.

NAND gates are not a great comparison. You want to be composing abstractions to create other composable abstractions. You could extend the analogy to composing circuits into bigger circuits, but that's really just converging back to a high level language.


>, but they won't have to due to performance or other constraints,

They have to because a GC runtime is too heavy for embedded environments with low resources.

>That's what the macros are for. All part of the plan.

But there's still the motivation for another language that doesn't require creating the macros. E.g. Lisp macros are so powerful that they can recreate C#'s syntax feature of LINQ queries. That's true -- but C# doesn't require making the macros.

Each programming language has a different "starting point" of convenience. If you try to invent the One Language that can create all other languages' convenience syntax via macros, you've simply motivated the existence of those other languages that don't require the macros.

And the NAND gate is an abstraction. It's an abstraction of decidable logic based on rules instead of thinking about raw voltages. We do combine/compose billions of NANDs abstractions to create higher abstractions.


GC is obviously optional for OTL. That's definitely one of the tricky bits (note: not a syntactic problem).

For the rest, I think your goal definition is too narrow. Removing every last desire for people to create alternatives is an unreasonable bar for literally any artifact, based on human psychology alone. That's explicitly not my goal (see previous comment), and with anyone who does have that goal you need to have an entirely different conversation (which, again, does not involve syntax).


>GC is obviously optional for OTL. That's definitely one of the tricky bits (note: not a syntactic problem).

If a GC language lets the programmer "opt out of GC" to mark a variable or block of memory as "fixed" so the GC doesn't scan it or move it to consolidate free space, how would one annotate that intention unless there's extra syntax for it?

Likewise in the opposite direction: if a non-GC language lets one "opt into GC", you will have ambiguities if you have syntax that allows raw pointers of dynamically calculated addresses to point to any arbitrary offset into a block of memory. That means that memory can't be part of the optional GC, which would invalidate the pointer. If you restrict the optional-GC language to ban undecidable dynamic pointers, it means you've created the motivation for another language with syntax that lets you program with the freedom of dynamic pointers!

The general case of "optional GC" that covers all situations and tuning its behavior is tied to syntax because you can't invent a compiler or runtime that can read the mind of the programmer.


Set a flag during the (optional) compile phase that tells the compiler to error out if it can't statically determine where to allocate/free. (No, it's not Halting-complete because the compiler has the option of bailing due to insufficient evidence) Without that flag, it still tries but will include a GC if needed. Same for types, btw.

Ok, fine, you probably want some annotations (like for types). You got me. There's syntax involved. It's still fundamentally a semantics problem. If that can be solved, the syntax will follow.

Your post reads like, "You want to add features? But you'll have to add syntax! It's impossible!" Even if syntax is necessary for GC-obliviousness (it isn't for type inference), it implies no more about whether the project is possible than that for any other feature. Note how far we've strayed from mathematical absolutes about possible strings.

Even on the semantics side, 50 years is far too early to declare defeat. There are no actual impossibilities stopping this, unless you have a formal proof you're not telling us about. Even that would just be a guide of how to change the problem definition, in the same way that Rice's Theorem tells us to add the "insufficient evidence" output to our program verification tools. Have some more imagination.


>Note how far we've strayed from mathematical absolutes about possible strings.

Well sure, we can just theoretically concatenate all the existing programming languages' syntax today into one hypothetical huge string and call _that_ artificial mathematical construct, The One True Language. But obviously, we don't consider OTL solved so "mathematical impossible strings" is constrained to mean "nice strings" advantageous to human ergonomics: reasonable lengths that are easy to read, and easy to type, with no ambiguous syntax causing contradictions in runtime assumptions, no long compile times, etc. E.g. I have no problem typing out balanced parentheses for Lisp but I don't want to do that when writing a quick script in Linux so Bash without all those parentheses is much more convenient.

>There are no actual impossibilities stopping this, unless you have a formal proof you're not telling us about.

The mathematical limitation is that all useful higher level abstractions must have information loss of the lower level it is abstracting. This can be visualized with surjection: https://en.wikipedia.org/wiki/Bijection,_injection_and_surje...

In the wiki diagram, we can think of 'X' as low-level assembly language and 'Y' as higher-level C Language. In C, a line of code to add 2 numbers might be:

  a = b + c;
In the wiki diagram we see X elements '3' and '4' both mapped to Y element 'C'. X-3 and X-4 may be thought of as strategy #3 vs strategy #4 for picking different cpu registers before the ADD instruction and Y-C is the "a=b+c" syntax. In assembly, you manually pick the registers but in C Language you don't because gcc/clang/MSVC compilers do it. Because there are multiple ways in assembler to add numbers that collapse to the equivalent "a=b+c", there is information loss. Most of the time, C Language programmers don't care about registers which is why the C Language abstraction is useful but sometimes you do, and that's why raw assembly is still used. You can't make OTL with the syntax that handles both semantics of assembly and C. If you argue that C can have "inline assembly", that doesn't cover the situation of not having the C Runtime loaded at all that runs prior to "main()". Also, embedding asm in C is still considered by programmers as 2 languages rather than one unified one.

Or we can also think of 'X' as the low-level C/C++ language that has numeric data types "short, int, long, float, double". And 'Y' is the higher-level Javascript that only has 1 number type, an IEEE 754 double-precision float, which maps to C's "double". This means that Javascript's "information loss" is the fine-grained usage of 8-bit ints, 16-bit ints, and 32-bit ints.

If programmer John attempts to design an OTL, he will have to choose which information in the lower layer is "lost" in the runtime assumptions of the higher-level OTL. Since John's surjection can't cover all scenarios, it motivates another programming language being created. An assumption of GC in the language runtime creates some information loss. Even an optional GC is an abstraction that also creates information loss of how to manually manage memory at a lower level of abstraction.


OTL does not need to be surjective onto the set of all binary programs. You only get "information loss" when you try to go backwards, from the end result to the intent. That's reverse engineering, not programming. Now, during translation, the compiler might fill in some information you didn't care about. If you do care about specific instructions and registers for some part of your program, supply them. You probably want to have an assembly DSL that knows about how to integrate with the other code rather than embedding strings. You probably can generate any assembly this way, if just by writing exclusively in the assembly DSL, but the actual requirement is to correctly translate all valid specs.


> You only get "information loss" when you try to go backwards, from the end result to the intent. That's reverse engineering, not programming.

Instead of "information loss", another way to put it is "deliberate reduced choices to make the abstraction useful to ease cognitive burden". That way, it doesn't have connotations about reverse engineering because limitations of surjective mapping is very much about forward engineering.

E.g. I look at Javascript and think forward to engineer how I want to use integers that are larger than 2^53. Javascript's "simpler abstraction of 1 number type" loses the notion of true 64-bit int with a range up to 2^64. Therefore, I don't use Javascript if I need that capability. This means Javascript can't be the OTL for all situations. Your suggestion of Racket-like language as a candidate for OTL has the same problem: it will always have gaps in functionality/semantics/runtime that make others not want to use it and therefore they create Another Language with the desired semantics.

Supplementing the gaps via the ability to write custom DSLs and macros don't solve it. Lisp already has that now and it's not the OTL. If programmer John extends Lisp with macros to simulate monads, he'll spell the macro his way but programmer Bob will spell his macro differently. Now they've created 2 personal dialects of Lisp instead of a larger unified One True Language.

Rereading your comments, I think you're really saying that it's possible to invent the OTL for you, andrewflnr. That's probably true, but unfortunately, that's not a useful answer when the programming community is confused as to why there isn't a universal OTL yet. They're talking about the OTL that everybody can use that covers all scenarios from low-level embedded C Language to scripting to numeric computing to 4GL business languages where SQL SELECT statements are 1st class and don't require double quotes or parentheses or loading any database drivers. Such a universal programming language, if it could exist, would make the "One" in "One True Language" actually mean one.


Most/all languages today take away options. Any OTL would just provide defaults. Details are optional but always possible. I thought I was pretty clear about that re assembly. That's barely even one of the hard parts.

I'm well aware of what it means to have a language for everyone to use. I'm thinking of everything from bootloaders to machine learning to interactive shells. The reason there isn't one yet is that it's really hard. Lots of basic theory about how to think about computation is still being sounded out. Unifying frameworks have been known to take a few decades after that. Still no reason to think it's impossible.

You're just repeating that there will always be gaps, with no evidence except gaps in languages produced by today's rushed, history-bound ecosystem. You're trying to use JS as an illustration of an OTL, which is baffling. Having a limited set of integer sizes would obviously not fly.

I'm apparently not getting the vision across. This is not even a type of thing that exists today, which is why I keep saying to use more imagination. Racket is only close due to its radical flexibility in inputs and outputs.


>Any OTL would just provide defaults.

And you will inevitably have defaults that contradict each other which motivates another language.

Another way of saying "default" is "concepts in the programming language we don't even have to explicitly type by hand or have our eyeballs look at."

What should the OTL default be for not typing any explicit datatype in front of the following _x_ that works for all embedded, scientific numeric, and 4GL business?

  x = 3
Should the default _x_ be a 32-bit int, a 64-bit int, or a 128-bit int? Or a 64-bit double-precision float? Or an arbitrary-precision decimal (512+ bits, memory expandable) or an arbitrary-size integer (512+ bits, expandable)?

Should the default for x be const or mutable? Should the default for x have overflow checks or not? Should default for x be stored in a register or on the stack? Should the name 'x' be allowed to shadow an 'x' defined at a higher scope? What about the following?

  x = 3/9
Should x be turned into approximation of 0.3333... or should x preserve the underlying symbolic representation of 2 rationals with a divide operator (3,div,9)?

The defaults contradict each other at a fundamental level. The default for x cannot be simultaneously be both a 32-bit int and a 512-bit arbitrary precision decimal at the same time. We don't need a yet-to-be-discovered computer science breakthrough to understand that limitation today.

If we go meta and say that the default interpretation for "x = 3" is that it's invalid code and the programmer must type out a datatype in front of x to make it valid, then that choice of default will also motivate another language that doesn't require manually typing out an explicit datatype!

Therefore, we can massively simplify the problem from "One True Language" to just the "One True Datatype" -- and we can't even solve that! Why is it unsolvable? Because the OTD is just another way of saying "read the mind of the programmer and predict which syntax he doesn't want to type out explicitly for convenience in the particular domain he's working in". This is not even a well-posed question for computer science research. Mind-reading is even more intractable than the Halting Problem.

As another example, the default for awk language -- without even manually typing an explicit loop -- is to process text line-by-line from top-to-bottom. This is not a reasonable default for C/Javascript/Racket/etc. But if you make the default in the proposed OTL to not have implicit text processing loop in the runtime, it motivates another language (such as awk) that allows for it. You can't have a runtime that has both simultaneous properties of implicit-text-loop and text-loop-must-be-manually-coded.

Whatever choice you make as the defaults for OTL, it will be wrong for somebody in some other use case which motivates another language that chooses a different default.

>Details are optional but always possible.

Yes, but any extra possibilities will always require extra syntax that humans don't want to type or look at. Again, it's not what's possible. It's what's easy to type and read in the specific domain that the programmer is working in.

>You're just repeating that there will always be gaps, with no evidence except gaps in languages produced by today's rushed, history-bound

Are you saying you believe that abstractions today have gaps but tomorrow's yet-to-be-invented abstractions can be created without gaps and we just haven't discovered it yet because it's really hard with our limited imagination? Is that a fair restatement of your position?

Gaps don't just exist because of myopic accidents of history. Gaps must exist because they are fundamental to creating abstractions. To create an abstraction is to create the existence of gaps at the same time. Gaps are what make the abstraction useful. A map (whether fold paper map or online Google maps) is an abstraction of the real underlying territory. The map must have gaps of information loss because otherwise, the map would be the same size and same atoms as the underlying territory -- and thus the map would no longer be a "map".

The mathematical concept of "average or mean" is an abstraction tool of summing a set of numbers and dividing by its count. The "average" as one type of statistics shorthand, adds power of reasoning by letting us ignore the details but to do so, it also has gaps because there is information loss of all the individual elements that contributed to that average. The unavoidable information loss is what makes "average" usable in speech or writing. You cannot invent a new mathematical "average" which preserves all elements with no information loss because doing so means it's no longer the average. We can write "the average life expectancy is 78.6 in the USA". We can't write "the average life expectancy is [82,55,77,1,...300 million more elements divided by 300 million] in the USA" because that huge sentence's text would then be a gigabyte in size and incomprehensible. You can invent a different abstraction such as "weighted average" or "median" or "mode" but those other abstractions also have "information loss". You're just choosing different information to throw away. We can't just say we're not using enough imagination to envision a new type of mathematical "average" abstraction that will allow us to write an alternative sentence that preserves all information of individual age elements without the sentence being a gigabyte in size.

>JS as an illustration of an OTL, which is baffling.

No, I was using JS as one example about surjection that affects forward engineering. When I say "this means Javascript can't be the OTL for all situations", it's saying all programming languages above NAND gates will have gaps and thus you can't make a OTL.

What's baffling is why anyone would think Racket's (1) defaults + (2) DSLs + (3) macros would even be a realistic starting point for the universal OTL. The features (1,2,3) you propose as ingredients for a universal OTL are the very same undesirable things that motivate other alternative languages to exist! Inappropriate defaults motivate another language with different defaults. The ability to write DSLs motivates another language that doesn't require coding that DSL. The flexibility of coding macros motivates another language that doesn't require coding the macro.


I don't think syntax is the big limitation here; it's library and behavior design.

The One Language concept could still be considered to be accomplished by a language with two syntaxes provided they have a low friction to interoperability.


This sounds more like a you-problem than a programming language problem.

The fact of the matter is that there can be no "perfect" programming language in the sense that it perfectly fits all possible use cases.

So rather than trying to develop or hoping for such language to be developed, a programmer should become multi-lingual. Experiencing first-hand how different programming languages and paradigms approach problems not only broadens the horizon, but also helps with choosing the right tool for the job.

No sane contractor would build a house using nothing but a hammer after all.


A programming language is not a tool, like a hammer, with which you build a house. It's a truck full of toolboxes, containing sets of tools made to work in harmony, that you use to build the parts of the house that require a human to deal with them manually.


> I'm hoping the technology will advance enough so that we eventually get one language that can navigate both sides of the spectrum.

Me too! I'd love a language at the JavaScript/Python/PHP/Perl level, but in a Swift/Rust style. Possibly with some kind of gradual typing. TypeScript is pretty close to this, but alas its type system isn't sound. And it has to deal with the legacy of JS semantics (like exceptions).


Perhaps Raku (https://raku.org) could hit your sweet spot?


I think Swift more or less hits this sweet spot for me. As mentioned Kotlin is also very close but comes with some baggage. Crystal and Nim are on the horizon and are promising this kind of combination of ergonomics, correctness and performance.


Swift has a long road ahead for the very "low" end, i.e. replacing C or even C++. It's missing, or has extremely awkward versions (`UnsafeOMGPointer`) of various pieces right now, [and some may never even be added][0].

[0]:https://forums.swift.org/t/bit-fields-like-in-c/34651/7


Yeah that makes sense, I guess in my mind I don't see it as a C or C++ replacement as with Rust. To me it fits as a slightly higher level, safe, general purpose language with pretty good performance for most tasks you throw at it. After working with it for a year I feel very productive and that I can trust my code will work if it compiles. A similar feeling to Elm or maybe Rust but yet to spend a longer time with Rust.


Yeah, fair, and I agree with your take; there's just this longstanding idea/goal that Swift can (or will be able to) do it all.


Have you tried Kotlin?


Sounds like you'd like Haxe.


Instead of a reaction to scripting languages, or maybe in addition to, I think the current trends of shepherding languages are reacting to the flexibility of C and, even more so C++. C++ in particular is such a mind-boggling huge language. It presents so many choices that designing anything new involves searching a massive solution space. A task better left to experts.

Newbies (speaking from experience) need a framework to lean on. Something that provides a starting point for solving problems. Opinionated languages provide that out of the box.


I think the "C++ is huge" complaints are a bit overblown. C++ is huge, but most of its new features are designed with backwards compatibility in mind - if the size of the language bothers you, then you can write the limited subset of whatever C++ you know, or even just straight-up C, while making use of new features (auto, foreach, smart pointers) as you see fit. It's an all-you-can-eat standard library buffet.


> Both Rust and Haskell wrap values that can fail in little boxes, and to get them out you have to check which type of value it is, and in C# there's nothing stopping you from returning null and not telling anyone that you can return null, and just assuming people will check for null all the time.

F# is the .NET citizen that does the equivalent of the Rust or Haskell stuff.

Either you use an option type (https://fsharpforfunandprofit.com/posts/the-option-type/) which is an easy way of making a function that says 'user says to find a record with the name of Bob, and you will either get a return type of Some record(id:1,name:bob), or you will get a return type of None'

     let GetThisRecord(name) =
          let result = SomeDatabaseLookup(name)
          if result.IsSome then
               Some result.Value // not idiomatic, but it works; a match would be cleaner
          else
               None
Or you use the Success/Failure type (see railway oriented programming)


The Haskell situation sounds like generally a good thing, but I am not sure I would like it very much if this also applies to logging... It does not sound like great fun to have to change the signature of a function when it needs to log something, and then change it again if it no longer needs to.


For logging, you can use unsafePerformIO. Of course, you would call it inside a special function that can do logging. In fact, there are functions in Debug.Trace that do exactly that (to standard error).
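A sketch of what that looks like with Debug.Trace (the function here is made up):

    import Debug.Trace (trace)

    -- trace writes its message to stderr via unsafePerformIO,
    -- so the function keeps its pure type signature.
    expensiveStep :: Int -> Int
    expensiveStep x = trace ("expensiveStep called with " ++ show x) (x * x)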

Similarly, I used unsafePerformIO (again wrapped in a convenient function) to save checkpoint data in a large computation. The computation is defined as pure, but it calls the function to checkpoint, which in fact does IO under the covers.

Remember, type safety is there to help you. As long as the function performing the I/O doesn't affect the outcome of the computation, everything is safe.


> As long as the function performing the I/O doesn't affect the outcome of the computation, everything is safe.

except it's not! Your IO action may not affect the outcome of the computation, but it may launch the missiles in the background, which changes everything. A less contrived example would be: "the computation is fine and is not affected, yet we have our [production cluster deleted / disk space run out / money sent to wrong recipients] by the IO action".


Despite the name, unsafePerformIO isn't automatically akin to undefined behavior in C. It can cause undefined behavior if misused, the most obvious example being the creation of polymorphic I/O reference objects which act like unsafeCoerce—but that would be affecting the outcome of the computation. If the value returned from unsafePerformIO is a pure function of the inputs then the only remaining risk is that any side effects may occur more than once or not at all depending on how the pure code is evaluated. As long as you're okay with that there isn't really any issue with using something like Debug.Trace for its intended purpose, debugging.

There are better ways to handle logging, of course—you generally want your log entries to be deterministic, and the ability to produce log entries (as opposed to arbitrary I/O actions) should be reflected in the types.


Debug.Trace doesn't launch missiles or delete clusters or send money. It might run you out of disk space, but so can safe IO.


Honestly, that will depend on what exactly both you and the GP are calling "logging".

Usually "logging" is semantically relevant, and it had better be reflected in the return type. But well, it's pretty useless to log the execution of pure code anyway.

I agree that the GP seems to be talking about print-debugging (one doesn't go changing one's mind about semantically relevant logging), so everything in your comment is spot on, but generalizing this can lead to confusion.


Standard functional programming methods apply, in this case you would use inversion of control to limit the access to I/O.

If you need to do "semantically relevant" logging from a pure function, just create a pure function to process the semantically relevant part into something generic (like a Text), and call the simple unsafe logging function on the generic result.


Thinking about the systems I work with I can only think of a few cases where logging is semantically relevant (the way I understand it).

One is replaying critical failed requests when a downstream was offline and the other is gathering tracking statistics from apache access logs.

Everything else I would classify as diagnostic, wondering if you would consider that semantic as well.


To clarify: what I mean by semantically relevant is that it is part of the user requirements. It's not semantically relevant if it's there just to make the developer's life easier. So, it seems we are using the same definition.

Every kind of software has some error log, long lived servers tend to have some usage log too, databases tend to have journaling logs, and distributed computing tends to have a retry log. There are other kinds of them, like all those lines a compiler outputs when it tries to work on a program, or the ones a hardware programmer shows while working. Every one of those are there for the user.


Okay, that does sound like a workable solution.


There's a tendency to be very idealistic when talking about IO in Haskell; people talk about launching missiles when you ask to print a string, and it makes you think we're purist fools. For debugging you can easily drop print statements in without affecting type signatures (with the Debug.Trace package), and this is really helpful, but in production you almost never want logging inside pure functions. Think about it: why would you want to log runtime information inside a function that does arithmetic or parses a JSON string? The interesting stuff is when you receive a network request or fail to open a file.


If you have a large application written in Haskell, you're probably already using some sort of abstract or extensible monad for your "business logic", and that means it's usually not hard (in practice) to add a MonadLogger instance to your code.
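A minimal sketch of that pattern, assuming the monad-logger package:

    {-# LANGUAGE OverloadedStrings #-}

    import Control.Monad.Logger (MonadLogger, logInfoN, runStdoutLoggingT)

    -- The business logic only states that it needs *some* logger...
    step :: MonadLogger m => Int -> m Int
    step n = do
      logInfoN "running step"
      return (n + 1)

    -- ...and the concrete logger is chosen at the edge of the program.
    main :: IO ()
    main = runStdoutLoggingT (step 41) >>= print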

Also, when you've written Haskell for long enough, you start to write your code in such a way that it's astronomically unlikely that you need to add logging to a pure function. I haven't found myself wanting to do that in years. Haskell has a library to do logging in pure code, but it's unpopular for a reason.


You generally would not put logging into pure functions as that would be fairly pointless. You only log in the IO actions where you can log freely anyway.


In my experience, it actually is a good thing to have to do that, especially in a context-logging world. The actual refactoring is rarely at all difficult in my experience, and by doing so you can make it so logging context is automatically threaded everywhere more ergonomically than other languages even!

And usually when you're logging, it's near other IO anyways. So that makes it even easier.


This.

People just want to get things done, and at some point you start fighting the language, except that the language wins and you lose.

One thing I like about PowerShell is that functions are surprisingly complex little state machines with input streams, begin/process/end pipeline handling, and multiple output streams.

Everything is optional and pluggable. So if you want to intercept the warnings of a function, you can, but it won't pollute your output type.

So in Haskell and Rust, you have "one channel" that you have to make into a tuple. E.g. in Rust syntax:

   fn foo() -> (data,err)
Imagine if you wanted verbose logs, info logs, warnings, errors, etc! You'd have to do something psychotic like:

   fn foo() -> (data,verbose,info,warn,err)
In PowerShell, a function's output is just the objects it returns. E.g. if you do this:

    $result = Invoke-Foo
The $result will contain only your data. Warnings and Errors go to the console. But you can capture them if you want:

    $result = Invoke-Foo -WarningVariable warn -ErrorVariable err
    if ( $warn ) { ... }
In some languages, like Java, strongly typed Exceptions play a similar role. You can ignore them if you like and let them bubble up, or you can capture them, or some subtree of the available types. The only issue is that this mechanism is intended for "exceptional errors" and is too inefficient for general control flow.

There have been proposals for extensible, strongly-typed control flow where functions can have more than just a "return". They can also throw exceptions, raise warnings, yield multiple results, log information, etc... The calling code can then decide how to interact with these in a strongly typed manner, unlike the PowerShell examples above which are one-way and weakly typed.

I'm a bit saddened that Rust didn't go down this path, instead preferring to inherit the current style of providing only a handful of hard-coded control flows, some of which are weakly typed. For example, there's only one "panic", unlike typed exceptions in Java.


> You'd have to do something psychotic like:

You wouldn't have to do this. First of all, if you're talking about something that can error, you'd use a Result, not a tuple (I'm going to use Rust names here):

  fn foo() -> Result<Data, Error> {
Note that you choose both of these types. You can make them do whatever you want. If you wanted to be able to stream those non-fatal things back to the parent, you'd either enhance the Data type to hold them, in which case there'd be no changes, or you'd create a wrapper type for it. You still end up with Result.

Rust also doesn't like globals as much as many languages, but doesn't hate them as much as haskell. Most logging is sent to a thread-local or static logger, so you don't tend to have this in the signature.

In general, many people consider the Result-based system Rust has to be much closer to Java's checked exceptions than most other things. I don't personally because the composability properties feel different to me, but it's also been a long time since I wrote a significant amount of Java code.


> People just want to get things done

If you let people "just get things done", they usually do a shitty job, as we've seen from the last 50 years of software development. People need at least one of unfailing mechanical guidance or impressive levels of restraint. Most people don't have that much restraint (and it's exhausting to keep it up all the time), so the practical option is to have the compiler keep us in check.

If I'm not using Haskell (or equivalent), I usually end up thinking "eh, a quick print statement in the middle of this function won't hurt anybody" and before I know it I've lost the compositionality that makes me love Haskell programming.

> strongly-typed control flow where functions can have more than just a "return"

This sounds to me like what monads give you. ContT, MTL stacks, effect monads, take your pick. There are several ways to get strongly-typed advanced control flow in Haskell.
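For instance, with an mtl-style stack (a sketch, made-up types) the signature itself advertises "can fail with ParseError, can emit warnings", and the caller decides what to do with each channel:

    import Control.Monad.Except (ExceptT, runExceptT, throwError)
    import Control.Monad.Writer (Writer, runWriter, tell)

    newtype ParseError = ParseError String deriving Show

    parseAge :: String -> ExceptT ParseError (Writer [String]) Int
    parseAge s = do
      tell ["parsing " ++ show s]
      case reads s of
        [(n, "")] -> return n
        _         -> throwError (ParseError s)

    -- Returns (Either ParseError Int, [String]): the result plus the warning log.
    demo :: (Either ParseError Int, [String])
    demo = runWriter (runExceptT (parseAge "42"))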


Hum... Haskell is the one language where people use pluggable middleware everywhere.

But if you program like in C#, you really won't be able to.


> literally any function that has an IO action taints the value returned from that function, causing it to be an IO value, and trying to pass that IO value into another function makes the return type of that function IO, too. Parametric polymorphism is the default, too, so it also shepherds you into writing general purpose code. Haskell is full of these little decisions where it just won't let you do something because it's not "correct" code, and they kind of don't care if that makes coding in it a fight against the compiler.

From a Haskell perspective, and a correctness perspective, and also Rust with its pointer tracking, all this makes sense. It's very helpful for correctness.

Yet, the IO monad "virality" reminds me of Java checked exceptions. Checked exceptions mean every function type signature includes the set of exceptions that function might throw.

When that was introduced, it was thought to be a good idea because it's part of the type-safety of Java and will ensure programmers write code that deals with exceptions correctly, one way or another.

But some years later, people started to argue that listing exceptions in the type signature is causing more software engineering problems than it solves (and C# designers took the decision to not include checked exceptions). Googling "checked exceptions harmful" yields plenty of essays on this.

For checked exceptions, there are people arguing both sides of it. Yet they are pretty much all fans of static typing for the rest of the language; it isn't an argument between people who favour static vs. dynamic typing.

So why are checked exceptions considered harmful by some? On the face of it, there's an argument against verbosity. But the deeper one is about software engineering. What I call "type brittleness".

When you have a large codebase, beautifully and carefully annotated with exact, detailed checked-exception signatures, then one day you have to add a trivial little something to one little function that might throw an exception not already in that function's signature... You may have to go through the large codebase, updating signatures on hundreds or thousands of functions which use the first little function indirectly.

And that's if you have the source. When you have libraries you can't change, you have to wrap and unwrap exceptions all over the place to allow them to propagate via libraries which call back to your own code. Sometimes there is no exception type explicitly allowed by the libraries, so you wrap and unwrap using Java's RuntimeException, the one which all functions allow.

The "viral effect" of so much effort for sometimes tiny changes is a brittleness issue. It leads people to resort to "catch and discard all" try-blocks, to confine the virality Sometimes it's "temporary", but you know how it is with temporary things. Sometimes it isn't temporary because the programmer can't find another clean way to do it while not modifying things they shouldn't or can't.


> When you have a large codebase, beautifully and carefully annotated with exact, detailed checked-exception signatures, then one day you have to add a trivial little something to one little function that might throw an exception not already in that function's signature... You may have to go through the large codebase, updating signatures on hundreds or thousands of functions which use the first little function indirectly.

And you know what? That's probably a good thing. How else can you be sure that all those functions can deal with that exception correctly? If you're adding a new exceptional case to an operation, and rather than handle it locally you decide to punt the issue up the call stack, you should expect that to have far-ranging effects on the rest of the codebase. At that point you have two options for limiting the impact: you can handle the error close to the source, or you can rethrow it as a more general-purpose exception type which is already part of the function's signature (i.e. RuntimeException in Java) with the understanding that any handling of that exception will likewise be generic—typically cancelling or retrying the entire operation.

Of course, libraries which call back in to the user's code can be an issue. (More so in Java than Haskell—so far as I know Java doesn't have any way to make library functions polymorphic in the kinds of exceptions they can throw, whereas in Haskell the exceptions are just part of the type signature so there's no issue with saying "this function throws the same exceptions as the callback".) You may need to temporarily convert the exception into a return value or even provide some out-of-band channel to smuggle it across the library boundary.


> Java checked exceptions

Actually CLU checked exceptions, Modula-3 exception sets, C++ exception specifications.


Good points, all.

I thought of Java only because I'd been reading essays about Java exceptions considered harmful, and then one day I recognised the problem it described, where to change one small function I had to do an absurd number of boilerplate-like edits elsewhere.

I found it quite thought-provoking about "type brittleness" with regard to aspects of the dynamic vs. static typing debate.

I've written in Haskell and SML too, where it didn't feel like the same level of brittleness. Perhaps it's to do with the size of applications and libraries, and how they evolve.

That's why I think of it as a software engineering getting-the-balance-right thing, rather than a correctness vs. prototype-in-a-hurry thing as static-vs-dynamic is often portrayed.


I jump between Java and .NET languages depending on the project/customer, and one thing that bothers me in .NET land, or in JVM guest languages, is having to hunt for exceptions, because documentation in some libraries is hardly up to date.

So one ends up putting a couple of catch all handlers in critical code paths, just in case.


"Rust makes it quite hard to do things" generally as a result of that decision. Even just syntactically it's a large overhead. It does force you to explicitly manage lifetimes at every place in your code. Which is a good example of the wrong implementation of the wrong objective.


I agree with your assessment 100%. Does anyone else out there get frustrated with "bare hands" conventions? That's where you have to follow a verbose convention or write glue code by hand, when the compiler/runtime could do more of the heavy lifting for us.

For example, say we want to hide low-level threading primitives due to their danger. So we implement a channel system like Go. But we run into a problem where copying data is expensive, so the compiler/runtime has an elaborate mechanism to pass everything by reference and verify that two threads don't try to write the same data. I'm glossing over details here, but basically we end up with Rust.

But what if we questioned our initial assumptions and borrowed techniques from other languages? So we decide to pass everything by value and use a mechanism like copy-on-write (COW) so that mutable data isn't actually copied until it's changed. Now we end up with something more like Clojure and state begins to look more like git under the hood. But novices can just be told that piping data between threads is a free operation unless it's mutated.
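For what it's worth, the copy-on-write half of this is expressible in today's Rust as well; here's a rough sketch using the standard library's `Arc::make_mut` (not a claim about how Clojure or any game engine actually implements it):

    use std::{sync::Arc, thread};
    fn main() {
        // Handing the data to another thread is just a refcount bump...
        let data = Arc::new(vec![1, 2, 3]);
        let reader = {
            let data = Arc::clone(&data);
            thread::spawn(move || data.iter().sum::<i32>())
        };
        // ...and the deep copy only happens at the moment shared data is
        // actually mutated; the reader keeps seeing the original values.
        let mut mine = Arc::clone(&data);
        Arc::make_mut(&mut mine).push(4);
        println!("reader sum = {}", reader.join().unwrap());
        println!("writer sees = {:?}", mine);
    }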

To me, the second approach has numerous advantages. I can't prove it mathematically, but my instincts and experience tell me that both approaches can be made to have nearly identical performance. So on a very basic level, I don't quite understand why Rust is a thing. And I look at tons of languages today and sense those fundamental code smells that nobody seems to talk about like boxing, not automatically converting for to foreach to higher level functions (by statically tracing side effects), making us manually write prototypes/headers, etc etc etc.

I really feel that if we could gather all of the best aspects of every language (for example the "having the system on hand" convenience of PHP, the vector processing of MATLAB, the "automagically convert this code to SIMD to run on the GPU" of Julia <- do I have this right?), then we could design a language that satisfies every instinct we have as developers (so that we almost don't need a manual) while at the same time giving us the formalism and performance of the more advanced languages like Haskell. What I'm trying to say is that I think that safe functional programming could be made to look nearly identical to Javascript, or even some of the spoken-language attempts like HyperTalk.

The handwaving around the bare hands stuff is what tires me out as a coder today because fundamentally I just don't view it as necessary. I really believe that there is always a better way, and that we can evolve towards that.


This is my main issue with C++. For a while my job was to get game engine codebases running, integrate tools and move on. So I saw a lot of big C++ codebases. Nearly every one had the same bad behaviors. Tons of globals. Configuring build options from code. Header mazes that made it clear people didn't actually know what code their classes needed.

I then worked for a while developing a fairly fresh C++ code base. The programmers I worked with were very willing to write maintainable code and follow a standard, and it was still really damn hard to keep up things like header hygiene.

When I go back to the language I can't believe how much time I spend dealing with minor issues that stem from the bad habits it builds. For years I would refuse to say any language was good or bad. I always insisted you use the right tool for the job. And there are some features of C++ that, when you need them, mean you have to use that language or maybe C in its place. But the shortcomings are unrelated to those features; the language's issues largely seem to come from a focus on backward compatibility. And so even used in its right application it seems incredibly flawed. And I pretty much believe it's a bad language now.

Disclaimer: I learned to program with C++, I understand its power and for years I loved the language. I also understand there are situations where despite its shortcomings it is the right choice.


Why are globals considered bad? I'm seriously asking. I, too, have been told hundreds of times over the course of my career, and I never questioned it. I want to question it now, because I've never understood why people work SO HARD to remove and avoid globals. I seriously doubt that the time and effort I've seen spent on removing and avoiding globals has been time well spent. And I'm quite sure that the effort spent on that is not comparable to the amount of problems prevented by not having globals. There's just no way globals can be dangerous enough to justify the size of globals-cleansing efforts I've seen.

Game development often has a very large global state, and game problems are often inherently global state manipulation problems; you need globals in order to even have the game in many cases.


Imagine a kitchen where a hundred cooks are trying to make the same pot of soup, same pile of ingredients and utensils. Now imagine they all have telekinesis. That’s global state.

The problem is that when disparate bits of code directly affect the details or internals of a state machine, it's pretty much impossible to ever maintain a valid state at all times. Throw in threading and the whole mess becomes nondeterministic to boot.

All state management tools and procedures seek to handle this by encapsulating details and establishing rules for updates. Some, like finite state machines, are more fixed and formalizable. Some, like Redux, are looser but stay deterministic.


That is such a fun image! Are any teachers taking note? I think this is a fun metaphor to use in a classroom setting.


As you mentioned, state machines and patterns like reducers allow you to make state changes deterministic, solving the 'telekinesis' problem for global state. Conclusion?


There isn’t really a conclusion - each solution pattern allows you to trade off progressively less control for more rigidity and determinism. Pick a system that matches your use case the best.

Think of all your state as a state machine. Is there a finite number of possible states you can be in, with clearly defined ways to go to each from each one? You have a finite state machine. Lots of libraries will be available in your language.

Are your state combinations unbounded and unknowable, but still subject to validation and sequencing? This is pretty much any UI - a Redux style system helps you organise changes and make them linear. Any number of states are possible, but they’re all deterministic and can be reproduced.

Can’t linearise the states but still have validation rules for correctness? Sounds like an RDBMS-type system - set up your constraints and foreign keys and go to town with any number of threads.

There’s really no right answer. I just try to understand the problem as well as I can and see if the solution presents itself.


There’s also one step after RDBMSs, which is the Redis style key value or data structure stores that allow some level of client based or cooperative structuring, using conditional sets and gets or CAS operations.

Then finally there’s the Wild West of everyone do whatever they want.


Global state is nearly impossible to test in any decent automated fashion. When writing unit tests, globals are the bane of your existence.

If you’re relying on globals for passing data, they are also difficult to reason about in multithreaded code.

There are means by which you can share data: if it's instantiated at the code entry point, it can be shared in such a way that you never need globals, relying on decent patterns for sharing between code points.

Yes, there is a trade-off in adding parameters to functions and references in classes, but these can be avoided by adopting patterns like inversion of control, etc.

Basically globals are a bad pattern because they make it hard to test and hard to reason about data access patterns.


This is only true in a case where you don't spin up and tear down your program per test case. And I don't want to defend globals.

Globals are bad because they are just often used poorly. In large part because they require you to think about the whole system as you make changes.

Ironically, the best changes are done with the whole system in mind. Such that sometimes establishing a few core globals and some rules for how they will be treated can actually help your logic. So it really is a tradeoff. With a great slogan of "think globally, but act locally."


It's about scope. The "ideal" design pattern is supposed to be separation of concerns - the devolution of performance and responsibility into units that can be built and tested independently.

This is fine when that design pattern fits the domain. But some domains require global context, and it isn't useful or possible to strictly enforce separation - because you end up passing parameter bundles around, and managing all those local scopes introduces more bugs than implementing a global context.

Multithreading is a different issue, and is a different kind of domain requirement. If you need multithreading and have a global context, you have a very interesting problem to solve.


> This is only true in a case where you don't spin up and tear down your program per test case.

Well, yes, but then the unit tests end up taking twenty minutes.


Not only that, you also can't trust that the test results will apply to any situation where the user doesn't restart the program after every action—i.e., to normal operation.

Don't restart the program between tests. Randomize the order of the test cases between runs. Try running the same test multiple times on occasion.


You shouldn't have your tests artificially limit the life cycle of parts. Either for artificial reuse or artificial termination.

To that end, if you have variables that live as long as your program, or longer, have your tests reflect that.


I agree that globals are usually a bad pattern, but there are situations where judicious use of globals is warranted and can actually improve readability.

An example is small scripts, where the scope of the script limits the scope of the global. The overhead of an abstraction doesn't pay off in that case.

Another example are "near constants" like a locale setting, an environment variable that gets detected once at startup, or a development feature flag. The "proper" way to structure those is to create a settings object and pass it to every function that needs them, but judicious use of a well-documented global can prevent a lot of boiler plate code.

Of course, as soon as the code base needs to be touched by many devs, especially less experienced ones, it's safer to say "never do it" than "judiciously use", so I understand why most textbooks say this.


> Another example is "near constants" like a locale setting, an environment variable that gets detected once at startup, or a development feature flag. The "proper" way to structure those is to create a settings object and pass it to every function that needs them, but judicious use of a well-documented global can prevent a lot of boilerplate code.

In small programs, globals are ok, but in larger programs a better approach would be a global accessor that gives you read-only access:

    printf("%s\n", Environment()->Host);
This doesn't require passing an object to every function, and the application still can't trample on these variables.
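A similar read-only accessor in Rust might look like the sketch below, using the standard library's `OnceLock` (the Environment struct and the HOST variable are just placeholders): the value is detected once at startup and every reader gets an immutable reference, so nothing can trample on it.

    use std::{env, sync::OnceLock};
    struct Environment {
        host: String,
    }
    // Read-only accessor: initialized once, then only shared references.
    fn environment() -> &'static Environment {
        static ENV: OnceLock<Environment> = OnceLock::new();
        ENV.get_or_init(|| Environment {
            host: env::var("HOST").unwrap_or_else(|_| "localhost".into()),
        })
    }
    fn main() {
        println!("{}", environment().host);
    }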


I don't understand. If you have a good understanding of the code you're writing, you won't put yourself into a position where globals cause problems unless you're being very stupid, and if you do, normal use of the program will detect those problems, right? Certainly bug reproduction steps and a debugger will figure out what's going on.

You mentioned unit tests, and these are another thing I don't fully understand. Obviously testing your code is important, and automated tests are good. My beef with unit testing comes with the requirement that all methods and functions have multiple tests each for success and failure conditions, and that results in test code which outweighs tested code by several times. When you discover that the architecture you've been putting together isn't going to work, which is something that happens approximately 100% of the time if you're doing anything real, you now have (say) 5,000 lines of code that needs rework, and 50,000 lines of test code that need to be thrown away and rewritten.

That is A LOT of effort to shove onto yourself to avoid a few global variables, to me. That's so much effort that many projects will just not make the change and ship software that they know is insufficient, and then they'll graft on whatever functionality can't be attained natively with the given architectural decisions rather than redesign.

The ability to paralyze yourself with the weight of unit test code seems like an extremely high price to pay to avoid some global variables.


I think that globals are not a problem when "you have a good [enough] understanding of the code you're writing." The problem is that when code bases grow, references to globals can start to appear in a lot of different places, and the exact use of a particular one can be hard to reason about. (Strictly talking about mutable global state here.)

As code bases grow, and developers come and go, eventually no one will have a "good [enough] understanding." Mutable global state is fundamentally hard to reason about since it can be changed at any time by any part of the program. When you first start out the codebase, you can just remember where all the usages are. But eventually that is not a good approach.

I consider the testing stuff orthogonal and muddying the issue. Mutable globals are hard to reason about, therefore they can make code hard to debug. Thus they should be avoided. No need to bring testing into the picture.


I think most of the problems with globals can be solved at the language level. Immutable references to globals are practically never a problem, so if your language forces you to explicitly mutably borrow a reference to a global variable, you force the programmer to think about every place in the code where they modify the global.

This also enables tooling to, for example, syntax-highlight these things differently. An immutable global looks like any other variable, but a mutable global is bold red.

You bet your ass that people will think about whether they really need it mutable in that case, and they'll know everywhere it's made mutable and therefore error prone.

Again this comes back to shepherding. Globals in Rust aren't the same as globals in C++ because the languages shepherd you differently.
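A tiny sketch of that shepherding in current Rust (no extra tooling assumed): the immutable global reads like any other variable, while the mutable one has to go through a lock, an atomic, or `unsafe`, so every mutation site stands out.

    use std::sync::Mutex;
    static GREETING: &str = "hello";            // immutable: freely readable
    static COUNTER: Mutex<u64> = Mutex::new(0); // mutable: must be locked
    // static mut RAW: u64 = 0;                 // raw mutable global: every
    //                                          // access would need `unsafe`
    fn bump() -> u64 {
        let mut c = COUNTER.lock().unwrap();    // the mutation is explicit
        *c += 1;
        *c
    }
    fn main() {
        println!("{GREETING}: {}", bump());
    }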


Remember the old phrase, “imagine the person who maintains your code is an axe wielding murderer who has your home address”?


One of the problems I’ve encountered in the wild is that globals often mean that you have to check the entire program when things go weird. You’re right: if you have a complete understanding of the entire codebase, then it probably won’t be an issue. But software grows and ages; globals won’t hurt you (much) early on in the project, but they start to in the long term. Your coworker modifies it in a place where you don’t realize it’s being modified, and things that worked fine yesterday stop working. The coworker might be you when you’re tired :)


Not sure what your gripe on testing has to do with what the comment is saying. Globals make testing hard.

The simple answer is that globals are expensive. Literally, they cost a lot of money. They introduce bugs that are harder to find, reproduce, and fix. That means introducing a global is a high risk, since it increases the expected value of your non-recoverable engineering costs.

Rejecting globals is about lowering risk and cost because it's so easy to not use them and toss them out of code review, and it's really easy to work around that limitation.

Gonna remove my more uncivil remark. Basically relying on bug reports and debugging is the software equivalent of waiting for your engine to seize before you change the oil.


> When you discover that the architecture you've been putting together isn't going to work

One of the underappreciated benefits of unit tests is that they quickly teach you how to write good code. It turns out testable code is also code that tends to be well architected and doesn't need to be rewritten. Basically, writing tests leads to you being a better programmer.


Unit tests are perhaps good for instilling a decent sense of function decomposition, but make no mistake, you can go too far in this direction and not develop the sense of an integrated system. It's a hard problem to avoid, especially when starting out. That's one of the reasons I generally find type-driven development better for seeing how parts are actually interacting.

Not to discount testing, naturally, but I also prefer property-based testing to unit testing for the same reason (i.e. a function can be a mini-system with relationships between internal values that may not be exposed by unit tests).


That's a myth. It teaches you to write code which is easily unit testable. That may be a better architecture than the one you would have used, but often it's just a different architecture, sometimes even markedly worse.

I have seen far too many code bases with simple things chopped up beyond recognition to make the code unit testable.


"[S]imple things chopped up beyond recognition" sounds like a hyperbolic argument to me. What is an example? When is the maxim "A function should do one thing well" not applicable?


> One of the underappreciated benefits of unit tests is that they quickly teach you how to write good code. It turns out testable code is also code that tends to be well architected and doesn't need to be rewritten. Basically, writing tests leads to you being a better programmer

In the majority of situations, this holds (apart from the "doesn't need to be rewritten" part!). But there's a large minority of situations where it doesn't.


I have never witnessed that in 15 years of working at places which write unit tests. I've witnessed a LOT of unit tests which test nothing and manually return the pass/fail result desired so the indicators stay green.


That's very unfortunate.


I think the "leave the site better than you found it" advice applies here. Whenever you need to touch a piece of code, write the proper tests (hell, add some fuzzy testing if you don't want to write them by hand) and then improve that code.


Try reasoning about a code base that is 5 million+ LOC, with tens of thousands or hundreds of thousands of functions, and 300+ people working on it. This:

>If you have a good understanding of the code you're writing, you won't put yourself into a position where globals cause problems

Becomes basically impossible.


Because of spooky action at a distance.

Consider the following code fragment:

  glob = 5;
  f();
  // what is the value of glob here?
The problem with globals is that you can't know. f might change glob, directly or indirectly, and there is no way you can keep in mind all possible changes (especially with multiple people working on the same codebase).

(The same problem can happen, on a more limited scale, with class fields - which is why some of us insist on requesting that classes are kept small and cohesive.)

Note that this does not happen as often with database values (which are also globals that can be changed from any point in the program) because of expectations. When using those, we have all kinds of mechanisms - like transactions and isolation modes - that let us specify how much we want a value we have written to stay like that until we're done with it; when we don't use those mechanisms we generally expect that "this value could change between one statement and the next".


I think one of the main complaints about global variables is that because you can change them from anywhere within the code, you are tempted to actually do so, which can get into some pretty nightmare debugging scenarios. If you truly have global state, I think the preferred solution is to have one piece of code which changes/updates the state, but everywhere else may simply read it. Then you at least know where the problem has to be if your state updates are buggy.


A loss of local reasoning. You don't know which functions will touch the variable and when.

You might know at a high level on paper, but you won't have clean, easy-to-read and easy-to-predict life times. Then you'll have race conditions.


Related to this: the great advantages of pure functions.


> You don't know which functions will touch the variable and when.

How do you not know? You have the source code. You can run it through a debugger. "grep" can find where that variable is used. Of course you know...


I don't think that person means you literally can't know, just that it increases the difficulty of reasoning through the code.

I was debugging some code earlier today. Someone had put a global variable that is either altered or used in 4 or 5 different functions across our codebase. I had to literally draw out the paths a user could go down to figure out what the value of this global variable would be at the time I was trying to call one of those functions. It was not awesome.

I figured it out, so you're right. I do know which functions touch the variable and NOW I know when. But I still can't guarantee the value of the variable.

Needless to say, tomorrow will see a little refactoring.


I was dealing with a hard problem earlier this week, which I'm pretty sure was causing a thread to crash without logging anything, but the program to stay running. Unfortunately, only seen in production and only once every few days.

The program does several stages of data processing in parallel batches, initially loading and eventually saving to a database. It's basically a "continuous" and complicated ETL.

There is effectively a set of global state variables to track progress of each input item through the stages. The values in this global state can depend on the data, execution order, and can be modified from a dozen places in the code.

I narrowed down several potential crash points, which were basically things like: if the global state contains x and a db lookup in thread 2 times out, and thread 3 accesses the value before thread 2 starts the next batch, it could get a null reference. Another was based on a decision to insert or update: in theory, the two global state values that effectively made this decision could never be set to states where it would do the wrong thing (getting either a foreign or duplicate key error), but that state is still possible to represent.

If I were to run in a debugger using the massive production data stream I might eventually get lucky and see the data that triggers this. However, I could also sit for days and get nowhere, or the act of debugging and inspecting might be enough to prevent a race condition and not trigger the bug.

I still don't know for sure what's happening (though now there's instrumentation and better error handling in those spots so hopefully I will), but the point here is it's nearly impossible to reason about in a definitive way.


This works fine on small scales.

When dealing with millions of lines of code, I do not have the time to read the whole thing and internalize its whole state. Understanding the call graph can help, but diving through every abstract interface and callback and abstraction is a non-starter. Even if I had time to read the entire codebase line by line, I wouldn't be able to fit it all in my head, and I often have enough coworkers that changes are occurring faster than I can read and understand them all.

Even the codebases I work on are dwarfed by much larger ones.


> How do you not know? You have the source code.

For instance, concurrent accesses and modifications could occur in any order.


You lose local reasoning, as was already said.

In theory you have the source code and you can know everything just by reading it all and debugging it all. In practice it becomes overwhelming.

Even intelligent people can only fit a little bit of information into working memory in their heads at a time. Mere mortals have no chance. We need things to be bite size and local and simple so we can fit it in our heads and reason about it.

Global variables force you to do global reasoning, which a human mind just doesn't have the capacity to do.


There are lots of ingenious ways to accidentally hide where a variable is used. Start passing some pointers around and storing them off under different names.

And of course with a race condition in a multithreaded context knowing where a variable is accessed is about 1% of the battle.


Reasoning about code requires reasoning about relevant state. On the one extreme, you have pure functional programming, where all state is passed in and returned out - all relevant state is explicit and "obvious". On the other extreme, you might use global state for everything - relevant state requires diving into all your code. This sounds unthinkable in the modern era, but similar styles aren't entirely uncommon in sufficiently old codebases that didn't really bother to use the stack.

This is part of the reason why memory corruption bugs can be so insidious in large codebases - if anything in your codebase could've corrupted that bit of memory, and your codebase is millions of lines of code, you have a large haystack to find your bugs in, and your struggle will be to narrow down the relevant code to figure out where the bug actually is. This isn't hypothetical - I've had system UI switch to Chinese because of a use-after-free bug relating to gamepad use in other people's code, for example.

(EDIT: Just to be clear - globals don't particularly exacerbate memory corruption issues, I'm just drawing some parallels between the difficulty in reasoning about global state and the difficulty in debugging memory corruption bugs.)

> Game development

John Carmack on the subject, praising nice and self contained functional code and at some point mentioning some of the horrible global flag driven messes that have caused problems in their codebase, mirroring my own experiences: https://www.youtube.com/watch?v=1PhArSujR_A&feature=youtu.be...

> you need globals in order to even have the game in many cases.

Simply untrue unless you're playing quite sloppy with the definition of "globals" and "global state". The problem isn't that one instance of a thing exists throughout your program, it's that access is completely unconstrained. Game problems do often involve cross cutting concerns that span lots of seemingly unrelated systems, but globals aren't the only way to solve these.


> Game development often has a very large global state

Not any more!

I'm currently playing Doom Eternal, and I've got to take my hat off to its developers: It's ridiculously well optimised! I played the previous version of Doom on the same hardware, and it was a stuttering mess at 4K, but now it's silky smooth with Ultra Nightmare quality settings. Wow.

They achieved this by breaking up the game into about a hundred "tasks" per frame, and each task runs in parallel across all available cores. These submit graphics calls in parallel using the Vulkan API.

There is just no way to write an engine like this with a "very large global state". No human is that good at writing thread-safe data structures.

The only way to do it is to separate the data and code, making sure each unit does its own thing, independently of the others as much as possible.


Hum... May I ask how those different tasks communicate with each other?


I have no idea about how Doom Eternal does it, but John Carmack has some ideas on how to parallelize game engines here: https://www.youtube.com/watch?v=1PhArSujR_A&feature=youtu.be...


Haha, love that game, but it gives me high blood pressure ;-).


Where would be a good place to read about that?


I'll try to address things other replies haven't. Global variables are not just a problem for understanding code; they also have a large potential for causing incredibly hard-to-debug bugs. Say you're writing a parser and decide to use `strtok`, which uses global variables. Everything works fine, but then you try to improve performance using multiple threads and suddenly your Linux and Mac users are seeing all kinds of weird incorrect behavior. It turns out strtok uses thread-local storage on Windows but not on other platforms, so your parallel strtok calls were all overwriting each other's state.


It's a good question, and we should always question our assumptions.

Avoidance of globals and singletons stems from long-term experience. Their design tends to lead to write-only code: because any part of the code can at any time access and modify them, globals quickly become disconnected from your main program flow!

This property makes them more complex to reason about, while overall codebase complexity tends to increase as well. From a complexity standpoint, at some point globals become an untenable nightmare to develop further and maintain. Because of the lack of foresight and design, you get stuck with too much code that people are scared to properly refactor. The tunnel to clean up the "mess" will be long and dark. Bugs may also be introduced, making it tempting to rebuild everything from scratch, which comes with its own caveats and troubles. If you lacked design the first time, how sure are you that you'll hit the nail the second time? It's costly and doesn't benefit from an iterative approach with a rapid feedback cycle.

For small scripts / one-offs, globals and singletons are OK. Good coders know they're there, how to remove them, and that nobody else is going to build air traffic control software on top of them.

Btw, encapsulating globals/singletons with OO CRUD or REST doesn't make them any less distasteful. You end up exporting complexity to all the different parts of the whole codebase, instead of encapsulating behaviour within its own domain.


A simple, quick answer that I'm sure you've heard is that global variables pollute namespaces. The qualities of design choices are rarely apparent outside the real world.

A big problem with globals is that they're often abused as a workaround. Restricting access is an abstraction: the user isn't expected to alter this value, so why should they be allowed to? What's more critical is that the programmer might not realize that what they wanted to be simply accessible is in fact static as well. So you now have a variable that's not only accessible, but state-dependent, and anyone using it has to be mindful of that.

Unless there is C code being called as well, in C++ you should rarely use globals. It's much more manageable to have a game class object where, inside it, what used to be global could now just be a private member that's global to that class.

Your team put in all that effort to remove globals because it takes even more effort to get rid of all the trivial errors tied to that choice to begin with.

It all comes down to writing reusable code, objects that manage themselves. People shouldn't have to be cautious when reusing code. This doesn't only apply to other programmers but to yourself 6 months down the line.


> It's much more manageable to have a game class object where, inside it, what used to be global could now just be a private member that's global to that class.

That's still a global, except now it has lipstick.


If it's private in the class then why are you saying it's global?


"a game class object" is 99.95% likely to contain the whole game. Doesn't matter that it's labeled private, it's basically global to all the code of the ... game.


> 99.95% likely to contain the whole game

That doesn't sound like a wise assumption. It is common to have the actual engine of the game, and even the game itself separate from other architectural components.


At first blush a lot of games systems and code look like they're global but aren't really. If you think about a game as the flow through a frame you can break things down and it turns out a lot of things are not as global as you first think.

For a naive example game flow is basically:

- Get input.

- Update game state.

- Render.

If each stage only consumes data generated by the prior stage then it doesn't need to be dependent on how that data was generated. Nothing needs to be global in this case.
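A sketch of that flow with no globals at all (Input, GameState and Frame are placeholder types, not anything from a real engine): each stage is a function that only consumes the previous stage's output.

    struct Input { jump_pressed: bool }
    struct GameState { player_y: f32 }
    struct Frame { draw_player_at: f32 }
    fn get_input() -> Input {
        Input { jump_pressed: true } // would poll the OS/controller for real
    }
    fn update(state: GameState, input: &Input) -> GameState {
        let dy = if input.jump_pressed { 1.0 } else { 0.0 };
        GameState { player_y: state.player_y + dy }
    }
    fn render(state: &GameState) -> Frame {
        Frame { draw_player_at: state.player_y }
    }
    fn main() {
        let mut state = GameState { player_y: 0.0 };
        for _ in 0..3 {
            let input = get_input();          // stage 1
            state = update(state, &input);    // stage 2: consumes stage 1
            let frame = render(&state);       // stage 3: consumes stage 2
            println!("drawing player at {}", frame.draw_player_at);
        }
    }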

There's nothing inherently wrong with modelling this using globals, though; it's just that they require more discipline on the part of programmers to stick to the application design. It's sorely tempting to just reach in and tweak something when it's easy, and then suddenly your entire application is a spiderweb of little tweaks. Not using globals, and only having available the systems and data that you need to use, makes the design harder to break and makes it much easier to detect the spiderweb creeping in.

This isn't limited to globals, though; dependency injection, IoC and other application patterns suffer from the same problems as well. Lots of software ends up passing around a 'context' or injecting de facto globals everywhere, which results in the same spiderweb except you can't even navigate the codebase sanely.

The problem with the spiderweb is that it's harder to maintain and can make things more difficult down the line if you want to re-architect things for example to make the game multithreaded.

More generally the harder we make it to mess stuff up the less stuff will get messed up and the easier it will be to find. That's partly why static types, lifetimes and immutability are popular. They of course come with tradeoffs in performance or ease of use that need to be weighed. Software design choices are just a less strict version of the same.


One fun class of bugs that occurs on 8-bit systems is when you have a 16-bit global variable (C makes this easy), and a read access is actually 2 separate reads (one for each 8-bit half). This is invisible from the C code. Now let's say there is a separate thread or an interrupt that writes the variable in between the two read phases. Most of the time it's fine, but every so often you get garbage (often double or half the value you expected).


The more local the state is, the easier it is to reason about since less code can reach it. And invalid state is a major source of bugs.

And the more global the state is, the less modular the system becomes, which makes it more difficult to test and adapt to new requirements.

That being said, it's more important to know why than memorizing/following rules, every good decision is some kind of compromise.


It's usually a sign that you don't have clear component boundaries or well-defined interfaces, which means your code is going to be harder to test and harder to debug. Every place you read and write global state is also a potential race condition in multi-threaded code.

Of course there are places where global state is unavoidable (even if it's just "the filesystem"), but by confining your global state to a small corner of your codebase and having the rest of the code interact with this component instead of touching all the global variables directly, you can reduce the number of potential problem spots.


Coupling. If any part of the program can touch a global variable, then the only way to understand how that global is used and when and why it changes is by understanding the entire program. Limiting the variable’s scope (e.g. to a module or class) makes it easier to reason about, as there’s less to learn and mentally model all at once.

Have you got Steve McConnell’s Code Complete? Read the chapter on coupling and cohesion. If not, you should. (You can nab a first edition off eBay for a few bucks.) Good for the “Why”s of software construction.


Global state == shared state if threading (and you probably will be eventually) == a mess.

Global state == lots of refactoring if you want to make your program a library (OpenSSH is a poster child for this).

Write it like it needs to be a library. Write it to be thread-safe. Write it to use async I/O. Do these things and you'll save yourself a ton of work later. Learn to do these well and you'll always do this from the get-go.


> Why are globals considered bad? I'm seriously asking.

I think outside of special cases they're bad. I use globals for embedded code because I don't have a heap.

What I've found is that as long as globals are used to hold state and not to pass data via spooky action at a distance, they're okay. A good test: if you can trivially refactor them out, they're okay.

Example: you have one UART.

   UartInit(baud_rate, bits, parity, stop);
Now you have two, so it gets refactored:

   UartInit(port, baud_rate, bits, parity, stop);
What's terrible is shit code like this:

   foo.bar = 2;
and somewhere else in the code

   if(foo.bar == 2)
   {
      foo.bar = 0;
      ...
   }
A note: Game programs to me look like really big embedded programs.


Games and UIs are special when it comes to state. They're weird in the space of all programs because their domain specifically concerns itself with maintaining and transforming a bunch of state over time. State is the point, in a way that it isn't for the vast majority of programs.

There are still lots of cases in games where state shouldn't be global, but there are also lots of cases where it's very natural and legitimate.


In addition to the reasons mentioned, one reason is that, by design, you can only have one instance of a global variable.

That might be fine today, but who knows what tomorrow's requirements might entail.

At the very least, put global variables in a context object, and pass that around. Then it's clear what is affected by and can affect the "global" state, and it's easy to create multiple context instances if you suddenly find you need to.
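As a small hypothetical sketch (Context and format_price are made-up names): the "globals" live in one struct, anything that needs them takes it as a parameter, and a second instance costs nothing.

    struct Context {
        locale: String,
    }
    fn format_price(ctx: &Context, cents: u64) -> String {
        if ctx.locale == "de-DE" {
            format!("{},{:02} EUR", cents / 100, cents % 100)
        } else {
            format!("${}.{:02}", cents / 100, cents % 100)
        }
    }
    fn main() {
        let us = Context { locale: "en-US".into() };
        let de = Context { locale: "de-DE".into() };
        println!("{}", format_price(&us, 1999));
        // A second "global" context needs no refactoring at all:
        println!("{}", format_price(&de, 1999));
    }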


In my view it makes the code extremely difficult to understand when someone other than the original author tries to read/modify.

The side effects of changing a global variable's value are very difficult to glean from the code.

It is as if some inputs to a function are getting passed to it implicitly, and it isn't obvious what value it has, who has set it, and what effect will be produced if you change its value.


Globals, similarly to "goto", are considered bad, because people tend to abuse them. But, same as with goto, they are not inherently bad and have their use. There are just lot of bad programmers who have been told that using globals (or goto) is dangerous and take it as "NEVER USE GLOBALS (or GOTO)" and spread this warped message further.


Probably it takes a lot of experience to use both correctly. If we are talking about small programs, no threads, no chance of reuse (no modularity) - in other words "Keep It Simple and Stupid" - then it is fine.

But KISS is difficult to achieve: there's Hubris that pushes you to do "powerful" things rather than getting the job done, there's "anticipatis" that pushes you to have an answer ready for all future changes instead of solving the problem you actually have now, there's deadlines, and there's invasions of external unwanted complexities (silly requirements, interfacing with buggy software/hardware...).

That's why generally speaking "don't" is the safe piece of advice. But those who think they have the basics down can try it (in a harmless context like personal tools) and see what happens for themselves.


One example of a good use of globals is very lightweight pub/sub, where you keep the rule that only one place can write to the global (preferably with something like an atomic write) and every other place only reads.
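A minimal sketch of that rule in Rust, using a standard-library atomic (LATEST_PRICE and the numbers are made up): one thread is the sole writer, everyone else only loads.

    use std::{sync::atomic::{AtomicU64, Ordering}, thread, time::Duration};
    static LATEST_PRICE: AtomicU64 = AtomicU64::new(0);
    fn main() {
        let writer = thread::spawn(|| {
            for price in 1..=5u64 {
                LATEST_PRICE.store(price, Ordering::Release); // the only writer
                thread::sleep(Duration::from_millis(10));
            }
        });
        let reader = thread::spawn(|| {
            for _ in 0..5 {
                let p = LATEST_PRICE.load(Ordering::Acquire); // readers never write
                println!("observed price: {p}");
                thread::sleep(Duration::from_millis(10));
            }
        });
        writer.join().unwrap();
        reader.join().unwrap();
    }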


I use this kind of blackboard system still when I can't avoid globals. The main thing I find it helps with is that you still have to know when, and in what order, your systems are being set up.

Ran into so many bugs from people creating static instance globals and thinking it was good that they didn't have to care when systems were set up.

I hate create-on-access with a passion. For god's sake, just new the damn thing at the beginning of main if nowhere else.


Singletons (globals) considered stupid, a Yeggie classic: https://sites.google.com/site/steveyegge2/singleton-consider...


If your game state is a global because every action in the game changes the global state, then that's great, your game will be alive. There are so many valid states and perhaps rather few invariants, or you are okay with invariants being enforced once every few seconds. You do what you have to.

Not every program is like this. Consider something like TeX, whose goal is perfect bit for bit reproducibility of documents across every run on every machine. Same with a compiler.

When you say globals, I imagine those kinds of programs having code like this:

    add_to_current_doc_index();
    fn document_map_overflow() {
        ERROR_CODE = 46;
        print_current_error();
        exit();
    }
    literally_every_function_could_write_an_error_code();
    do_not_call_another_one_without_checking_it();
    if ERROR_CODE != 0 { return ...; }
This is what XeTeX code looks like. You don't have to do it like that! You can write this:

    pub struct EnforcesOwnInvariants { private_data: [u8; 65536] }
    impl EnforcesOwnInvariants {
        fn get_first_something(&self) { ... }
        fn flush_cache(&mut self) { ... }
    }
    static GLOBAL_DATA: Mutex<EnforcesOwnInvariants> = ...;
The second kind is better, because you can at least say that any internal invariants in the global data should be upheld in very specific code.

But when you use the first kind, you're completely giving up on being able to point to the line of code responsible for global data having bugs in it. Obviously you can use globals without this problem if you encapsulate them effectively, but you'd need your language to "shepherd" you towards this. All the C codebases that had no such shepherding seem to end up looking exactly like this, and it is truly awful trying to find the source of bugs. You'll notice the languages that shepherd you away from globals (Rust) do so because they want your programs to work when you decide one thread is not enough. This has the side benefit of shepherding you away from global data generally, and mutability rules restrict which code can modify, so there is a huge impact overall on how you look for bugs.

Essentially, you're having the same discussion the original article is saying is fruitless. Globals can be good or bad! You can make them accessible everywhere without actually accessing them everywhere and causing debug problems. But do they make good code easy to write and bad code hard? Absolutely not. They are bad shepherds, pied pipers that offer you easy solutions that make your codebase worse.


You will end up with code like this:

https://github.com/elonafoobar/elonafoobar/blob/develop/src/...

A lot of those variables could have been grouped into a struct, like all those key_<action> variables. Even if you think global state is fine, you'd at least want only one way of accessing it. It would be closer to this:

game_instance.key_mapping.charainfo

but I never see things like that. All I ever get to see is projects with almost thousands of global variables.


Globals are bad when they are used together with the include pattern. So you are reading code and see variable foo and have no idea what it does, can't find it when searching in the file, then you find it in an include file two levels down. You try to refactor, only to find it's used elsewhere too, and sometimes included twice, and sometimes overwritten (but you are not sure if that is a bug or not).


In any decent IDE the operations you mentioned are one keyboard shortcut each - go to definition and find all uses. This is really a non-problem.


IDEs are good at treating the symptoms. But it's also possible to write the code so that you don't need an IDE to untangle it: for example, keeping all variables within (file) scope, and abstracting out into reusable (reusable elsewhere) libraries.


Do you never have to check all the calls to a given function in large-ish programs?


You can make functions pure and specific so they rarely need to change. And use namespaces and naming conventions so the variables can be found with grep (find in files).

Let's say you are upgrading an API, let's call it "HN", to a new major version, which has made a breaking change by renaming HN.foo to HN.bar. Now, if you have always named the API "HN", you can just do a "replace in files" operation where you replace HN.foo with HN.bar - after you have already checked that there is no HN.foobar (to prevent HN.barbar).

Even sophisticated IDEs will have trouble following functions in a dynamic language when they are passed around, renamed, returned, etc. So I would never trust an IDE to find all call sites.

Heavily depending on an IDE or tooling can also lead to over-use of patterns and boilerplate that the IDE handles well. And unnecessary work like adding annotations just to satisfy the tooling.


Not GP, but for most programs I write myself I cannot find all the call sites of a certain function because of using first class functions a bunch. When I worked in nginx I had a smaller amount of similar trouble, since nginx frequently but not pervasively uses function pointers to decide what to do next.


Globals are not inherently evil, but "shared, mutable state" is, basically if any part of the code is able to scribble over any global at any time.

If your globals are constants, or the globals are only visible inside a single compilation unit where it's easy to keep the situation "contained", they are perfectly fine.



We cannot discuss globals without pinning own exactly what we mean by globals.

Is a global a piece of information of which there is one instance?

Or is it a variable which is widely scoped: it is referenced all over the place without module boundaries?

See, for instance, in OOP there is the concept of singletons: objects of which there is one instance in the system. These objects sometimes have mutable state. Therefore, that state is global. Yet, the state is encapsulated in the object/class, so it is not accessed in an undisciplined way by random code all over the place. On the other hand, the reference to the object as a whole is a plain global: it's scoped to the program, and multiple modules use it. Ah, but then the reference to the singleton is not a mutable global; it is initialized once, and points to the same singleton. Therefore, singletons represent disciplined global state: a singleton is an immutable reference to an object (i.e. always the same object), whose mutable state (if it is mutable) is encapsulated and managed. This is an example of a "good" global variable.

Another form of "good" global variable is a dynamically scoped variable, like in Common Lisp. Reason being: its value is temporarily overridden in on entry into a dynamic scope and restored afterward (in a thread-local way, in multithreaded implementations). Moreover, additional discipline can be provided by macros. So that is to say, the modules which use the variable might not know anything about the variable directly, but only about macro constructs that use the variable implicitly. Those constructs ensure that the variable has an appropriate value, not just any old value.

Machine registers are global variables; but a higher level language manages them. A compiler generates code to save and restore the registers that must be restored. Even though there is only one "stack pointer" or "frame pointer" register, every function activation frame has the correct values of these whenever its code is executing. Therefore, these hardware resources are de facto regarded as locals. For instance, a C function freely moves its stack pointer via alloca to carve out space on the stack, as if the stack pointer register belonged only to it.

Global variables got a bad name in the 1960's, when people designed programs the Fortran and COBOL way. There is some data, such as a bunch of arrays. These are global. The program consists of a growing number of procedures which work on the global arrays and variables. These procedures communicate with each other by the effect they have on the globals. The globals are the input to each procedure and its output. When one procedure finishes, it places its output into the globals, and then when the next one is called, it picks that up, and so on.

The global situation was somewhat tamed by languages that introduced modules. A module could declare variables that have program lifetime, but are visible only to that module, even if they have the same name as similar variables in another module. In C, these are static variables. C static variables and their ilk are considerably less harmful than globals. A module with statics can be as disciplined as an OOP singleton. The disadvantage it has is that it cannot be multiply instantiated, if that is needed in the future, without a code reorganization (moving the statics into a structure).


> See, for instance, in OOP there is the concept of singletons: objects of which there is one instance in the system. These objects sometimes have mutable state. Therefore, that state is global. Yet, the state is encapsulated in the object/class, so it is not accessed in an undisciplined way by random code all over the place. On the other hand, the reference to the object as a whole is a plain global: it's scoped to the program, and multiple modules use it. Ah, but then the reference to the singleton is not a mutable global; it is initialized once, and points to the same singleton. Therefore, singletons represent disciplined global state: a singleton is an immutable reference to an object (i.e. always the same object), whose mutable state (if it is mutable) is encapsulated and managed. This is an example of a "good" global variable.

Lol no it's not. It has all the problems of any other global: unsafe to use concurrently, difficult to test, difficult to reason about.


It's best not to conflate global variables and their problems with the issues of shared, mutable state.

The difficulties caused by global variable are related to them being shared, mutable state. But global variables are recognized as causing additional problems, in the context of programming with shared, mutable state. So that is to say, practitioners who accept the imperative programming paradigm involving shared mutable state nevertheless have identified global variables as causing or contributing to specific problems.

In an OOP program based on shared mutable state, singleton objects having shared mutable state do not introduce any new problem. The global variable they are bound to doesn't change, so the variable per se is safe.

(There can be thread-unsafe lazy initializations of singleton globals, of course, which is an isolated problem that can be addressed with specific, localized mechanisms. Global shutdown can be a gong show also.)

A singleton could be contrived to provide a service that is equivalent to a global variable. E.g. it could just have some get and set method for a character string. If everyone uses singleton.get() to fetch the string, and singleton.put(new_string) to replace its value, then it's no better than just a string-valued global. That's largely a strawman though; it deliberately wastes every opportunity to improve upon global variables that is provided by that approach.


I disagree; as far as I know the specific problems of global variables (over and above shared mutable state in general) are things that apply just as much to singletons. Things like absence of scoping, lack of clear ownership, and as you mentioned initialisation and shutdown, are just as much a problem for singleton objects as they are for non-object global variables.

Objects containing mutable state have some advantages over plain mutable variables (e.g. the object can enforce that particular invariants hold and invalid states are never made visible), but as far as I know those are just the generic advantages of OO encapsulation, and there's not really any specific advantage to encapsulating global variables in a singleton that doesn't equally apply to encapsulating a bunch of shared scoped variables into an object.


I generally strive to avoid singletons but there are cases of API usability where they're useful. If you can carve out the responsibility of what state is being tracked in the singleton then it's useful.

It's also not difficult to test as long as you write it to be testable. It may be more verbose & cumbersome but it's not actually difficult. That means you provide hooks testing the singleton implementation to bypass the singleton requirement but in all other cases it acts like a singleton.
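A rough sketch of what such a hook can look like (hypothetical Python, not the Android code discussed below):

    class Telemetry:
        _instance = None

        @classmethod
        def instance(cls):
            if cls._instance is None:
                cls._instance = Telemetry()
            return cls._instance

        # test-only hooks that bypass the singleton requirement
        @classmethod
        def _set_instance_for_testing(cls, fake):
            cls._instance = fake

        @classmethod
        def _reset_for_testing(cls):
            cls._instance = None

        def record(self, event):
            pass  # talk to the real backend in production

    # in a test:
    #   Telemetry._set_instance_for_testing(FakeTelemetry())
    #   ... exercise the code under test ...
    #   Telemetry._reset_for_testing()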

As an example, consider Android JNI. The environment variable is very cumbersome to deal with in background threads & to properly detach it on thread death. It also requires you to keep track of the JavaVM & pipe it throughout your program's data flow where it might be needed. It's doable but it's conceptually simpler to maintain the JavaVM object in a global singleton and have the JNIEnv in a thread-local singleton with all the resource acquisition done at the right time. It's still perfectly testable.


> It's also not difficult to test as long as you write it to be testable. It may be more verbose & cumbersome but it's not actually difficult. That means you provide hooks testing the singleton implementation to bypass the singleton requirement but in all other cases it acts like a singleton.

At that point you're adding complexity that has a real risk of bringing in bugs in the non-test case. Nothing is impossible to test if you try hard enough, but the more costly testing is, the less you'll end up doing.

> As an example, consider Android JNI. The environment variable is very cumbersome to deal with in background threads & to properly detach it on thread death. It also requires you to keep track of the JavaVM & pipe it throughout your program's data flow where it might be needed. It's doable but it's conceptually simpler to maintain the JavaVM object in a global singleton and have the JNIEnv in a thread-local singleton with all the resource acquisition done at the right time. It's still perfectly testable.

Not convinced - to my mind the conceptually simple thing is for every function to be passed everything it uses. If you instead embed the assumption that there's a single global JavaVM that could be touched from anywhere, then that adds complexity to potentially everything, and any test you write might go wrong (or silently start going wrong in the future) if the pattern of which functions use the JavaVM changes (or else you treat every single test as a JavaVM test, and have the overhead that goes with that). For some codebases that might be a legitimate assumption, just as there are some cases where pervasive mutable state really does reflect what's going on at the business level, but it's certainly not something I'd introduce lightly.


> If you instead embed the assumption that there's a single global JavaVM that could be touched from anywhere, then that adds complexity to potentially everything, and any test you write might go wrong (or silently start going wrong in the future) if the pattern of which functions use the JavaVM changes (or else you treat every single test as a JavaVM test, and have the overhead that goes with that)

Not sure I follow. If you expect any code to invoke JNI then you are still responsible for explicitly initializing the singleton within the JNI_OnLoad callback. If you don't, the API I have will crash, so it's definitely not a silent failure. There's no external calling pattern to this API that can change to break the way this thing works. As for why this is needed, it has to do with the arcane properties of JNI:

1. Whatever native thread you use JNI on, the JNIEnv must be explicitly attached (Java does this automatically for you when jumping from Java->native as part of the callback signature).

2. Attaching/detaching native threads is a super expensive operation. You ideally only want to do it once.

3. If you don't detach a native thread before it exits, your code will likely hang.

4. If you detach prematurely you can get memory corruption accessing dangling local references.

5. It's not unreasonable to write code where you have a cross-platform layer that then invokes a function that needs JNI.

If you're avoiding all global state you only have the following options:

A. Attach/detach the thread around every set of JNI operations. This stops scaling really quick & gets super-complicated for writing error-free composable code (literally manifests as the problem you're concerned about with code flow changes resulting in silent bugs).

B. Anytime you might need to create a native thread, you need to pass the JNIEnv to attach it. If the native thread is in cross-platform code, suddenly you're carrying two callback function pointers + state as a magic invocation: the first thing to do on new thread creation & the last thing to remember to do just before thread exit. Also you have to suddenly carry that opaque state through to any code that may be invoking callbacks that require JNI on that platform. This hurts readability & risks not being type-safe.

At the end of the day you're actually also lying to yourself and trying to fit a square peg in a round hole. JNI is defined to use global state implicitly throughout its API - there's defined to be 1 global JavaVM single instance. Early on in Java days JNI was in theory designed to allow multiple JVMs in 1 process but that has long been abandoned (the API was designed poorly & in practice it's difficult to properly manage multiple JVMs in 1 process correctly with weird errors manifesting). This isn't going to be resurrected. In fact, although not implemented on Android, there's a way to globally, at any point in your program, retrieve the JVM for the process.

In principle we're in agreement that singletons & globals shouldn't be undertaken lightly but there are use-cases for it. It's fine if you're not convinced.


> A. Attach/detach the thread around every set of JNI operations. This stops scaling really quick & gets super-complicated for writing error-free composable code (literally manifests as the problem you're concerned about with code flow changes resulting in silent bugs).

Sounds like a monad would be a perfect fit, assuming your native language is capable of that. That's how I work with e.g. JPA sessions, which are intended to be bound to single threads.

> At the end of the day you're actually also lying to yourself and trying to fit a square peg in a round hole. JNI is defined to use global state implicitly throughout its API - there's defined to be 1 global JavaVM single instance.

Of course if you're using an API that's defined in terms of globals/singletons then you'll be forced to make at least some use of globals/singletons, but I wouldn't say that's a case of singletons being "useful" as such. And if you're making extensive use of such a library, then I'd look to encapsulate it behind an interface that offers access to it in a more controlled way (using something along the lines of https://github.com/tpolecat/tiny-world).


For many singletons it does not matter at all. E.g. 99% of all desktop gui programs and 99.9995% of games have a single main window by design - trying to abstract that with an API that simulates that you could have more than one just makes the code harder to read for no benefit (as no widget system except beOS' can be used outside the main thread anyways)


> E.g. 99% of all desktop gui programs and 99.9995% of games have a single main window by design - trying to abstract that with an API that simulates that you could have more than one just makes the code harder to read for no benefit

Being able to test UI behaviour is a huge difference maker. (Also even if you do believe that a singleton is ok in this case, it's clearly no different from a global variable).

> as no widget system except beOS' can be used outside the main thread anyways

Which is a problem in itself.


> Being able to test UI behaviour is a huge difference maker.

obviously UI tests are being run today so this is not really an issue, right ?

> (Also even if you do believe that a singleton is ok in this case, it's clearly no different from a global variable).

yes, that's global state all the same

> Which is a problem in itself.

maybe, does not prevent writing a lot of very useful apps.


> obviously UI tests are being run today so this is not really an issue, right ?

UI tests are notoriously slow, flaky and generally worse than other kinds of tests. They're absolutely a significant pain point in software development today.

> maybe, does not prevent writing a lot of very useful apps.

People write useful code with global state. People wrote useful code with gotos, with no memory safety... that computers are useful does not mean there isn't plenty of room for improvement.


Avoiding singletons in the app implementation will not put a dent in UI testability. If you instantiate the MainWindow as a local variable in the top-level function, and pass that object everywhere it is required as an argument, external testing of your UI is not any easier.


It's a step in the right direction, and it gives some immediate value: you can see which functions don't actually need the MainWindow and can therefore be tested conventionally (you might argue that those were never actually UI tests, but in practice you'll end up using your UI testing techniques for things that don't actually use UI if you can't tell), and you're nudged towards only passing it where it's needed; also you could try to mock or stub it, which might cover at least some of the simple cases.
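A small Python-flavoured sketch of the idea (all names hypothetical):

    class StubWindow:
        def __init__(self):
            self.title = None
        def set_title(self, text):
            self.title = text

    def format_title(doc_name, dirty):
        # needs no window at all, so it can be tested conventionally
        return doc_name + (" *" if dirty else "")

    def refresh_title(window, doc_name, dirty):
        # the window dependency is explicit, so a stub is enough in tests
        window.set_title(format_title(doc_name, dirty))

    def test_refresh_title():
        stub = StubWindow()
        refresh_title(stub, "notes.txt", dirty=True)
        assert stub.title == "notes.txt *"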


Global variables are fine, mutable global state is considered bad style.


You've heard it hundreds of times over the course of your career and yet you never once questioned it? Either you're exaggerating to make a rhetorical point, or you have such an apathetic attitude towards the issue that you can't (or haven't tried to) reason about why it polarizes people.

Taking your comment in good faith, not all global state manipulation is equivalent. Depending on how you do it, structured global state manipulation could have you end up with something like Postgres, where you have orderly read and write interactions that you can reason about with set theory and transaction monotonicity. It could also mean something like using an in-memory cache or session store to persist temporarily durable data. Any kind of structuring like this around what you can read and write and for how long gets you further away from the idea of globals, and that's the point. It's a tool that doesn't reward reaching for it prematurely.


I write or deal with a lot of C, unfortunately. I try very hard to not have global variables, and to minimize sharing between threads. When I pick up a C codebase, one of the first things I do is build it and inspect the object files to see what globals exist. The same can be done in C++, and should be. Use inheritance sparingly. Don't use exceptions if at all possible. Use modern C++ as much as possible, and borrow ideas from Haskell/Rust as much as possible. I'm thinking of https://stackoverflow.com/questions/9692630/implementing-has...


I am so glad that when I got to use C, I already had a good schooling in modular programming languages behind me.

On my own projects, like university assignments, I would treat each translation unit as a kind of module: anything that for whatever reason could not be in a handle structure would be an internal static (years later I started using TLS instead), and in some cases I used incomplete structs as a means to avoid the temptation to directly access internal data.


Game development is a rapid prototyping adventure that is fuelled by the fact that what you are producing is ultimately a form of art. Architectures are based on abstraction, and abstraction is ultimately mindful ignorance; in this case of specific requirements or specific goals, which are going to change because you are creating art. You are going to find out as you continue to develop that technical debt builds because the changing requirements create conflicting workflows, which is why you get spam in the header. It's a lot faster to prototype something through duplication, cobbling, or refactoring, then later on use automation to remove the chunks of code that are not used and reduce line count by creating utility functions, because at that point part of the project is set in stone and the project is going in one direction. Things will gyrate back and forth between messy and clean, and hopefully you have the budget to refactor to clean before you ship, as modders don't like dirty game code.

Games are a simulacrum of reality and reality doesn't say properties of two different objects can never, ever, interact with each other; that's why you have the abuse of global variables to store state and also why there's a rich speedrunning community using all sorts of hacks in games to speed up their playtime due to unforeseen edge cases. If you build a model of reality, you're going to be doing R&D learning how it interacts with itself, just like we do today!

Nobody wants to play a game with a static workflow.


What stands out is how apologetic you are for pointing out that a language might be worse (gasp!) than another language. When did this "all languages are roughly equal, and if you say anything else you're a zealot" ideology get so widely entrenched in our industry?


As someone who came up as a C++ game dev, I ran into tons of people who acted like people using managed languages were automatically inferior programmers. I even imbibed this belief a bit myself.

This was a view purely sourced from ignorance. There were people creating awesome things with Java and Python at the time that I and my contemporaries could probably never have coded up.

It was quite embarrassing when I came to realize the combo of ignorance and arrogance I was working from. So now I tend to bias toward assuming most languages people are working with are useful and warrant some amount of respect. I try to only criticize languages I'm extremely familiar with and have had the opportunity to see bad patterns repeatedly emerge from in a variety of code bases.

Basically, I think we can call some languages "bad" or "good"; it just takes a lot of evidence, and I'd rather avoid ranking them altogether.


> I also understand there are situations where despite its shortcomings it is the right choice.

Would you say the reasons for choosing it are not inherent to the lang itself but come down to things like experience of the team, availability of libraries/ecosystem, and the need for mature/fast compilers?


Can't speak for parent, but in our case it's the only choice with zero-overhead abstractions and good cross platform support (Obj-C++, Android NDK, WebAssembly, Linux for tests). I wish Rust were there, but it's not.


Exactly. I remember having discussions with folks about using D for games like 10 years ago and it's never gotten there either.


That's a great metaphor for language smells! Some more anecdotes:

- Python shepherds you into using list comprehensions, even when it's almost always premature optimization and much harder to read than a loop. As a language smell that's not bad, it's just the worst I could think of in a post-v2 world. Luckily there's `black`, `flake8`, `isort` and `mypy`.

- Bash shepherds you into using `ls` and backticks, useless `cat`s, treating all text as ASCII, and premature portability in the form of "POSIX-ish". Luckily `shellcheck` takes care of several common issues.

- Lisp shepherds you into building your own language.

There's also tool shepherding:

- IDEA shepherds you into refactoring all the time, since it's the only IDE which does this anywhere near reliably enough. (At least in Java. In other languages renaming something with a common name is almost guaranteed to also rename unrelated stuff.)

- Firefox shepherds you into using privacy extensions.

- Chrome shepherds you into using Google extensions.

- Android shepherds you into installing shedloads of apps you hardly ever use.

- *nixes other than Mac OS shepherd you into using the shell and distrusting software by default.

- Windows and Mac OS shepherd you into using GUIs for everything and trusting software by default.


> Lisp shepherds you into building your own language.

the feeling of the racket community is that you build a DSL or DSLs in your code all the time, in any language, so why not take it seriously and codify your DSLs?


My personal feeling about this is that shared base language mechanisms such as functions, classes, control structures, interfaces, imports, properties etc. allow you to reason locally about some files. You don't need to read the whole library in order to understand one piece of it.

With macros this goes out of the window. You have to read all the custom macros before you can understand what their behavior is.

With Haskell the same issue exists with complex monad stacks and control libraries like lens.

The ability to analyse a tiny piece of a big system is a major factor in building those in a manageable way.


The beauty of macros is that you can just expand them in-place and read the expanded code.

Of course, for really complex macros the expanded code might be hard to read, but I guess that means "write nice macros".


I don't think list comprehensions are used to improve performance. One reason to use them is to improve readability, as the execution doesn't jump around with continue/break etc.
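For example, these two produce the same result, but the comprehension tells you up front that it only builds a new list, with no breaks, continues or other side effects to look out for:

    words = ["foo", "bar", "baz", "quux"]

    # comprehension: a single expression, no control flow to trace
    short_upper = [w.upper() for w in words if len(w) <= 3]

    # loop: you have to read the whole body to see nothing else happens
    short_upper_loop = []
    for w in words:
        if len(w) <= 3:
            short_upper_loop.append(w.upper())

    assert short_upper == short_upper_loop == ["FOO", "BAR", "BAZ"]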


> - Android shepherds you into installing shedloads of apps you hardly ever use.

I disagree with you on this one but I could be in the minority. I have about 8 apps that I trust and that rarely change. I don't go looking for new apps to install and I resist attempts to use the app version of a website.

> Lisp shepherds you into building your own language

I see the propensity of developers to build DSLs in all languages. I think the act of programming shepherds us into creating elaborate abstractions.


> and I resist attempts to use the app version of a website.

If you weren't being shepherded into installing apps, why are you resisting?

I could install apps on my old Nokia dumb phone, but I never resisted installing apps. It never really seemed like it was worth the trouble to install one.

I actually looked into it once, despite the system shepherding me away from installing apps.


Bash shepherds you into ignoring errors and using maybe-blank values everywhere.


You can improve that situation somewhat by starting scripts with `set -o errexit -o noclobber -o nounset -o pipefail` and `shopt -s failglob` to fail fast.


Yes, this makes things better.

But then you will have to deal with the non-negligible number of programs that use the exit value to return information. It is still better to place exceptions on failing code than to ignore errors that can wipe out your entire system, but it's not really good either way.


I never missed any IDEA features in Eclipse and NetBeans, including refactoring.

And yes, I do know IntelliJ, as I have to put up with it on Android Studio.


>since it's the only IDE which does this anywhere near reliably enough.

Have a look at the Language Server Protocol implementations for java (eclipse jdtls and to a lesser extent boot-ls). Not all the features from IDEA are there, but the gap is closing.


> Python shepherds you into using list comprehensions

I'd argue LCs have more limits than loops, and hence are easier to read (because you can make more assumptions about what it does). That said, I find nested comprehensions harder to read.


> Windows and Mac OS shepherd you into using GUIs for everything and trusting software by default.

Not sure how familiar you are with modern Windows, but you can try PowerShell. You can manage everything in Windows with it, and it’s a very cool shell language with a lot of features.

(It’s also open source and multi platform)


I agree with what you said, but I also agree that Windows shepherds you into using GUIs for everything. It's possible to use powershell, but Windows makes GUIs seem like the "natural" way to do things.


Do you have any good sources for places to start learning PowerShell? I'm pretty comfortable in bash, but PowerShell scares me.


As someone who used PowerShell for a year and has used Bash for 10+, PowerShell is much less scary than Bash. The scariest thing about PowerShell is .NET, which (at least 8 years ago or so, so cue someone correcting that below) had extensive but often low quality documentation, with undocumented features you basically had to use to write useful code, uselessly trivial code examples (think "`0 + 0 == 0`" as an example of arithmetic) and some bad names. That's not to say Bash is better, just that they still had some way to go.


Microsoft online documentation is very good. I initially spent just a weekend trying to create a nice prompt, just because I was annoyed by the ugly “PS >” when using Windows. The online documentation had everything I needed, and I enjoyed it so much that I switched all my systems to using PowerShell as their default shell.

You can start at https://docs.microsoft.com/en-us/powershell/scripting/learn/....

A quite nice feature is the command Get-Help:

Get-Help <a command> -Online

That will direct you to the documentation page specific to a command. That way you can discover the environment little by little by experimenting.

Also, the auto completion for all commands and their arguments is helpful to learn what is possible.

Also, if you want to keep your bash habits, be sure to install powershell 7, and enable the emacs edit mode (which is similar to bash defaults, with C-a, C-e, etc):

Set-PSReadlineOption -EditMode Emacs

https://docs.microsoft.com/en-us/powershell/module/psreadlin...


The term for this in the field of human machine interaction is (perceived) affordance, popularized by Norman.

This sounds like less active guidance than nudging or shepherding. Creating affordances is still an active design choice though.

https://en.wikipedia.org/wiki/Affordance

http://johnnyholland.org/2010/04/perceived-affordances-and-d...


Not sure the concept applies cleanly here. Affordance is about perceiving possibilities from an interface or environment.

For the subject of programming language idioms, the main factors are restrictions or patterns the language offers, and how naturally they fit within their context, not just perception by the user.


I’m curious, how do you make the distinction that restrictions/patterns/context are not just as much perceived properties of the interface to the device you are programming?

Are you perhaps perceiving perception as just visual perception? (See what I did there? :)


Paul Dourish reviews some work on this topic in his book Stuff of Bits, see pages 8-9: https://mitpress.mit.edu/books/stuff-bits

It's related to, but definitely not the same as, linguistic relativism. Programming "language" might be a bit of a misnomer, because it creates a false equivalency to natural language. Just as different subfields of mathematics were created to solve different problems, so too were different programming "languages" inspired by different subfields and their notations. With that view, it's unsurprising that some ways of doing things highlight certain methods of solving problems and obscure or impede others.


Working in a Java/Kotlin environment, everyone always handles all null cases when working in Kotlin, but they are frequently overlooked in the Java applications. Many of the Java apps compensate with more levels of catch-all exception handlers targeting unexpected NPEs. The only time we get NPEs in Kotlin is when Kotlin allows them because of the Kotlin/Java interop problem.

Working with Javascript/Typescript, we need to rely on linters to enforce safe practices in Javascript.


Something I like about Rust is that it shepherds you to fast-running programs and away from null pointer errors.

Something I like about Go is that it shepherds you to write code any other Go programmer can follow easily.

Something I dislike about C# is that it has the tools to let you write very, very fast code but shepherds you to use non-devirtualized interfaces over heap-allocated classes tied together with LINQ overhead.


> Something I like about Go is that it shepherds you to write code any other Go programmer can follow easily

Sure, the syntax and indentation levels are all the same, but those aren't really the difficult parts of programming. The difficulty comes from abstractions, indirections and other abstract things that Go, just like any language, lets you do however you want.

There are of course codebases made in Go where the indirections makes no sense and are hard to follow, just as in any language.

What Go shepherds you into is making really verbose code (well, compared to most languages except Java I guess), where everything is explicit, unless hidden by indirection. This is both a blessing and a curse.


Go limits the number of available abstractions--yes, at the cost of verbosity--but compared to a language like C++, it's minuscule in size. The end result is that you can keep the entire language in your head, and you don't have to go digging into the bowels of the internet to figure out how "Turing-complete template metaprogramming" works since the last developer decided to use that cool feature they just discovered.


> Go limits the number of available abstractions

Again, it doesn't, as abstractions are not built from syntax of the language but from the indirections developers create with the syntax provided. I agree that the abstractions the standard library provides are smaller than in other languages, but outside of that, anyone can create their own abstractions (as it should be).


I would say Go doesn't limit the number of abstractions, it limits the number of ways in which abstractions can be created. Or maybe better put, it doesn't provide a very rich set of abstraction facilities.


Anecdotally I always find Go code bases very easy to read compared to almost any other language. Is it not reasonable to assume this is because of "shepherding"?

Otherwise, what other effect would cause this? Perhaps my sample is unrepresentative, or perhaps Go programmers are somehow more competent?


Do you find it easier to read or to understand? I find Brainfuck code incredibly easy to read (there are only eight characters/commands in the language!) but almost impossible to understand.


I find it both easier to read and to understand.


That is the beauty of languages like C#: productivity and security first, while providing the tools to go down the performance well if actually needed.


To allocate things on the stack you have to either use only value types or use unsafe code, which is fine for small performance-critical sections but will introduce bugs and hinder productivity a lot if used across large code bases.


You no longer need unsafe to stack allocate if using one of the Span types.


I spend a fair amount of time in C# and don’t think about performance a lot unless it’s obvious, O(N^2) type of stuff. I’m always trying to level up so I would appreciate some tips.

What tooling are you referring to that will make C# really fast?

Also, what are you referring to with non-devirtualized interfaces vs heap classes with LINQ?


He talks mainly about stack vs heap allocation. Stackalloc, pointers and such.


What's the best way to get started becoming a C# developer? What kinds of C# programmers are there? I know fullstack web stuff and some Python, but no idea where to start with more proper languages.


There's a crazy amount of high quality guidance out there, a mere Google away. It's a popular language!

Step one: Visual Studio Community Edition is free. Go install it. Don't waste time with Visual Studio Code, it's a toy compared to the proper VS.

When I learn a language, I like to start with the basic primitives: functions, loops, variables, etc...

Then, explore the types and the standard library. Note that the .NET Framework has a fantastically huge base library, way bigger than other languages with the exception of Java. It's already a bit of a task just to flip through the list of available classes let alone functions!

Do programming challenges like Advent of Code.

Start poking away at Real Problems.

Go from there...


C# does these things out of the box:

- Desktop application development (Windows and macOS)

- Web Development (API's with .NET Core, traditional Rails-style stacks with ASP.Net)

- Game development (Unity)

If you want to get into C#, traditionally, these are the 3 things you can pick up and find lots of resources on.

Keywords to search on:

- Desktop Apps: WPF, XAML, Xamarin

- Web Dev: ASP.Net, .NET Core, Blazor

- Game Dev: Unity Game Engine

Good luck and happy learning/hunting!


It is the language plus the community. And not just the language.

As an example, there is nothing about Ruby that makes it more or less prone to monkey-patching than many other dynamic languages. But once a certain number of popular frameworks did that, there was no getting away from that. (Rails even has a convention around where you put your monkey patches.)


> there is nothing about Ruby that makes it more or less prone to monkey-patching than many other dynamic languages.

Python disallows making changes to fundamental types like `int` and `list`. It’s not possible for a Python framework to support something like Rails’ `2.days.ago`.

Interestingly, I don’t think this was an explicit decision made when designing Python - it’s just a side effect of the built-in types being written in C rather than in Python itself.


> Python disallows making changes to fundamental types like `int` and `list`.

Not directly via Python, but you can achieve it if you really want: https://github.com/clarete/forbiddenfruit :D


True, more things are objects in Ruby than Python. But you can still monkeypatch classes in Python.
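For example (contrived), patching a class defined in Python works just like in Ruby; it's only the C-implemented built-ins such as int and list that refuse:

    class Greeter:
        def greet(self):
            return "hello"

    # monkey patch: swap the method on the class at runtime
    Greeter.greet = lambda self: "hello!!!"
    assert Greeter().greet() == "hello!!!"

    # ...whereas the C-level built-ins reject attribute assignment:
    try:
        int.days = property(lambda self: self)
    except TypeError as e:
        print(e)  # setting attributes on built-in/extension types fails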


For anyone (like me) who doesn't know what monkey patching is, wikipedia says it is "dynamic modifications of a class or module at runtime, motivated by the intent to patch existing third-party code as a workaround to a bug or feature which does not act as desired"

https://en.wikipedia.org/wiki/Monkey_patch


For anyone further curious, this is how Python's `mock` module works - structured application and removal of monkeypatches.
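Roughly like this (a small illustrative sketch using the stdlib's unittest.mock):

    from unittest import mock
    import os

    def entry_report():
        return "%d entries in cwd" % len(os.listdir("."))

    # mock.patch swaps os.listdir out and guarantees it is restored
    # afterwards -- structured application and removal of a monkeypatch
    with mock.patch("os.listdir", return_value=["a.txt", "b.txt"]):
        assert entry_report() == "2 entries in cwd"

    # here the original os.listdir is back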


You're talking about ActiveSupport probably, and I really love it. It augments Ruby in a very beautiful way.

I really like 2.days.ago

There were zero times where I wished it wouldn't do that.

But to each his own.


Supporting postfix operators, extension methods, etc. is different from monkey patching. Scala has "2 days ago" conversion to Duration type via implicits, but it doesn't monkey patch Integer for that. C# has extension methods and it can do the same without monkey patching.

And monkey patching is not necessarily the end of the world, it's just error prone if multiple libs try to do it (on the same targets) without being careful.


Never said you can't reach the same outcome with other languages. I guess C# can do DateTime.Today.AddDays(-3). It's all turing complete, so yes, you can do it in assembly as well.

I just really love Ruby's (Rails actually) 2.days.ago and never saw the harm in that. It's readable, easy and nice.


Readable yes. I don't feel it is writable. Discovery only happens when you read what someone else did or you guess unless you read over the entire docs related to $thing. It is a major reason I dislike Ruby as these things are encouraged. In testing, it is even worse.

Made-up DSL: Expect(myFunc).WithParams(4).To().BeCalled(1).Returns(8).InUnder(2).Milliseconds().WithEpsilon(30).Microseconds().

You have to be fully familiar with a custom DSL to write this kind of test. I prefer Go's principle of "tests are just code." It is more verbose to write, but no custom DSL to learn. The more verbose code is often not harder to read either, as it is the same code style as the application too.


You'd be lucky if it was all just a chain of methods. In reality one of those links would be replaced by a space, which is an implicit method call (invisible parentheses).

I find this infuriating because the DSL isn't discoverable. You have to know what each method returns. Is it self, or is it a value.

Spot the difference: Expect(myFunc).WithParams(4).To BeCalled(1).Returns(8).InUnder(2).Milliseconds().WithEpsilon(30).Microseconds().


Why isn't it discoverable? You know what type is returned by "To()" and if that has a BeCalled() method you can call it :)

Now, the problem that requires an IDE, and that many curse Scala for, comes in when you have implicit things. So even if "To()" returns a WhateverTestDslElement, that might not tell you anything, because there might be a hundred implicit functions that convert WhateverTestDslElement to OtherFancyDslThingWithTheMethodYouAreLookingFor.

Of course, at this point you are basically back to reading the documentation, which is kind of what C programming looks like.

That said efficient and useful and really "domain specific" layers are great, and many systems would benefit from one. (Which is sort of what the whole clean code "entities" is about - https://blog.cleancoder.com/uncle-bob/images/2012-08-13-the-... )


Ruby's default test suite is minitest, not RSpec. I agree that RSpec can be overkill sometimes. And well, libraries and frameworks expect you to go over their docs if you wanna use them, that's kinda universal. Can you write a Spring MVC app without going over the docs?


Ever encountered code where you have to load 2 modules in a particular order or else they don’t work? Or maybe you didn’t figure out why a module that works fine for others doesn’t for you.

Monkey patching is one of the top causes of that. And is part of why Ruby projects tend not to scale in complexity as well as Python.


It's quite rare to be honest, so no. And I've been doing Ruby for a long time. It's not like every Ruby developer out there tries to monkey patch esoteric things just to confuse the enemy. And again, monkey patching is possible in every dynamic language.


I"ve seen a developer break 20000 websites at once because he monkey patched a jquery method in a widget our company built, so lets not make this out to be a Ruby thing.


If it is rare, then I must be unlucky. Because I'm not a Ruby developer, but I have had to help people diagnose exactly that problem more than once.


I like the concept and I particularly like the way I feel Nim shepherds me:

* I very rarely need to come up with a name for a function or other identifier. The correct name can be reused for multiple use cases thanks to the type system and proc overloading

* to spend a little time designing the interface before jumping into the code

* but also to think about what I really need to accomplish and get to it instead of building a grandiose architecture

* to have consistent APIs

* to steer away from OOP

* to rely on minimal dependencies and to be kind of minimal in general

* to use the correct tool for the problem (macro are not easy to write and that’s good otherwise you will abuse them. Instead they are great to use)

* to build maintainable code

* ...

I would be interested in what other nimmers consider good shepherding.

One might also think of what the bad shepherding of Nim is, although nothing comes to mind at the moment.


> * to use the correct tool for the problem (macro are not easy to write and that’s good otherwise you will abuse them. Instead they are great to use)

Just wanted to add: Nim has Macros (which are comparable to lisp forms) and Templates, which are closer to C and Lisp macros and are much harder to abuse; It also has built-in inlines and generics, which are essentially the prime use for templates and macros in languages that lack those.

It also has term rewriting macros, which let you state things like "whenever you see x*4 and x is an integer, use x shl 2 instead", so that you CAN apply your knowledge in the places where you know better than the compiler, while still writing the code you meant to write (multiply by 4) and not the assembly you wanted to generate.

Right tool for the right job is a very good description, which I don't think many languages can claim - definitely not minimalist ones like K, nor kitchen sink ones like C++ (no, an accidentally turing complete template expansion system is NOT the right tool for compile time computation).


The author is fairly insightful on various human behaviors in computing systems.

You might enjoy his Meson build system[0], which has a full manual [1]. It's already used in a bunch of high-visibility projects [2].

[0] https://mesonbuild.com/

[1] https://meson-manual.com/

[2] https://mesonbuild.com/Users.html


When I program in C++, there are lots of things one must consider. Should this be const, public/private, virtual, should I create a class, should I first create an abstract base class, should I create a factory, should I implement the PIMPL idiom, should this be a template function. The list of concerns is nearly endless. When I write in Python I tend to mainly think about solving my problem. In C++ I will naturally think more about performance and in Python that concern comes only if something seems slow. I make no claims about which is better, just that the language definitely affects me and the approach I take.


This also can change over time. For example, 15 years ago PHP shepherded you to include every file you were using explicitly, making it hard to reason about a given project if you weren’t the creator.

A big effort ensued to change that — class autoloading became the standard, and a large community arose around that standard.

Similarly, JavaScript shepherded you towards some bad practices that the community has now found ample remedies for.


> This also can change over time. For example, 15 years ago PHP shepherded you to include every file you were using explicitly, making it hard to reason about a given project if you weren’t the creator.

Huh, that seems backwards to me? Wouldn't the explicit approach make it more obvious what scripts were relevant?


That's been my experience; in the before times, I could step in and look at a server with issues and easily find the source (as long as nobody was too clever about paths); with autoloading I just have to grep the whole thing and hope.

I don't like a lot of what 'Modern PHP' has become though, so clearly the community is going a different direction without me, and I guess it's working for them.


As someone who has written a lot of PHP and is still maintaining a few PHP codebases... I don't miss the old ways at all. But I can see how the parent comment can be misleading for someone who's not too familiar with PHP (apologies if my assumption is wrong).

TL;DR in many ways modern PHP is more explicit than old PHP

The old way wasn't as explicit as the comment makes it. It was just painful. Imports in PHP are global, so every file can use any function/class already imported. You could explicitly define all dependencies at the top, but that's not what the language shepherds you to. In reality most imports were implicit (explicit in ANOTHER file), and you'd just import what you needed that hadn't been already imported (I don't think I've ever worked in a codebase where every file has all dependencies declared at the top). The old ways had so many downsides:

- moving things around could produce fatal errors just because the implicit imports changed

- any framework/CMS using require instead of require_once would limit your ability to import files (including the same class twice creates a fatal error)

- very poor support for IDEs

The modern way is quite pleasant to work with:

- everything should be in a namespace

- every used class should be declared at the top of the file (like Java)

- great IDE support (auto imports, auto complete, click to go to definition)

Yes there's some extra complexity in the autoloader, but in my experience it's negligible. If you use an IDE you might never even see it.

So yeah I don't think PHP got any less explicit. The opposite actually, modern libraries/frameworks tend to be much easier to navigate.


Yeah, my explanation wasn't good. Yours is much better – thanks!

If anyone's curious what the old way looks like, have a look at WordPress's codebase – lots of imports scattered around everywhere with no one single class-loading system.


Yes, modern PHP is different and better than the PHP everyone cranked out in the 00s, but auto-importing is orthogonal to the improvements. The big change is to use classes everywhere.


Using classes everywhere was made much easier with the auto-importing.

In fact the auto-importing means that regular functions (which have no auto-importing mechanism) are now used less than they probably should be.


Oftentimes not, because you'd end up importing everything as a precautionary measure, and there was a big temptation to mix together many classes and functions in the same file.


Yes! Another example of this is C# and F#. Both build on the same .NET base and both are Turing-complete languages with elements of OOP and FP, but wow, the code that those communities write is completely different. They each steer you in different directions.


Usually people say language X "encourages" you to do Y, or "biases" you toward it, but "shepherds" is a fine word too.

C, C++, and Rust shepherd the programmer to use array-of-structures memory layout (though it's usually not great for vectorization).


Do you have examples of languages which don't encourage you to use array of structures?


An imo really nice but lengthy explanation of this is given by Robert Martin here: https://cleancoders.com/video-details/clean-code-episode-0

He calls this restriction by paradigms.


Thank you for sharing this. This is a much more thoughtful overview of the topic.


Is shepherding present in every aspect of life?

My guess would be that it is and that shepherding is what we talk about when we say that we can learn from every aspect of life.

If what I wrote is correct, then shepherding is the teacher of reality. But I guess it's on us to decide when we've learned enough and move on.

I was at first wanting to ask if shepherding is present in video games as well, but then I realized what shepherding could actually be.

SHEPHERDING; The part of an aspect that _can_ teach you something.

_can_, because it's up to you to decide if you'll learn anything.

Is shepherding always negative or can it be positive?

Also, if we would always strive to fix what shepherding teaches us, would that mean that in infinite amount of time we would reach perfection?

And, last question I swear, is shepherding subjective or objective. Or is it both?


Shepherding is when some behavior is encouraged by being made easy/rewarded, or by other behavior being made harder/punished. So shepherding absolutely happens in every aspect of life - systems move toward low energy states, water flows downhill and living creatures tend to follow the path of least resistance. When parents do it, we call it parenting. When governments or companies do it (via taxes/subsidies or pricing), we call it nudging. When groups do it, we call it socialization.

It's a really useful tool to analyze systems and organizations with - not by looking at what they make possible, but also by what they encourage and discourage - most of the time, the latter are what really matter (the average case is usually what determines the long-term impact of something, not the best or the worst case). When Netflix has auto-play at the end of a stream, they shepherd you toward binging. When Animal Crossing has things you need to wait wall-clock time for, they encourage you not to binge. When free-to-play games come with loot boxes, they don't force you to do anything, but they might still be shepherding gambling-like behavior.


https://en.wikipedia.org/wiki/Nudge_theory

"Nudge is a concept in behavioral science, political theory and behavioral economics which proposes positive reinforcement and indirect suggestions as ways to influence the behavior and decision making of groups or individuals. Nudging contrasts with other ways to achieve compliance, such as education, legislation or enforcement. "

"A nudge makes it more likely that an individual will make a particular choice, or behave in a particular way, by altering the environment so that automatic cognitive processes are triggered to favour the desired outcome."


This is also why CMake, even Modern CMake, is so bad. You have to fight tooth and nail for something even remotely maintainable.


Title reminds me of "guide you to the pit of success" (i.e., a slippery slope with a positive ending), which IIRC I first encountered in a post by Zeit cofounder G Rauch, writing about NextJS.


That's exactly what I thought about. I think this is the blog post that made this phrase popular: https://blog.codinghorror.com/falling-into-the-pit-of-succes...


This is exactly what I thought of while reading the OP article. I love this metaphor and use it often


" Yet, in practice, many Perl scripts do XML (and HTML) manipulation with regexes, which is brittle and "wrong" for lack of a better term. This is a clear case of shepherding. Text manipulation in Perl is easy. Importing, calling and using an XML parser is not."

Importing, calling, and using an XML parser is straightforward in Perl. The same for HTML: think about what you want to process, what you want to ignore, and write your callbacks accordingly.


This struck me as a clueless part of the article.

Perl coding was a significant portion of my career, and much of that involved web services and XML. I've never come across code that attempted to parse XML with regexes in Perl. It was always done with easy-to-import-and-call XML DOM or SAX parsers.


You read the code. The computer programs you.


There is probably more truth in this than people realise at first glance.


This is a very good take. I've had a whiff of this idea for a while now, but never fully formed it. As the author says, it's something that's been missing in lots and lots of language and framework debates and has caused people to talk past each other over and over. It's also a very helpful tool when examining why the cultures around certain technologies are the way they are.


I believe these things, and that is why I think different programming languages are good for different purposes and why I use different programming languages rather than only one kind. (For example, Haskell isn't a programming language I use much, although I do use it sometimes, when what I'm making seems like a good fit for it.)


Haha, the idea that a "proper" scripting language is more portable than a shell pipeline is amusing. Sure it is, if you can guarantee everyone has the same version of Python, and all the right libraries at the right versions, etc. But if you can do that, you can probably have them all install the GNU userland on their Macs, too.


Be the shepherd, use Lisp :D

Seriously though, what other languages other than Lisp (all the mainstream ones at least) give you the freedom to change the language and/or create DSL's with the same ease? And you can still do your 'bare metal' in C if you really really need to and bring it in.


I actually think the OP perfectly explains the core problem with Lisp: Part of the value of shepherding is getting everyone on the same page. When everyone's their own shepherd, nobody is on the same page.


Javascript is this on steroids: It wants to be inspired by Lisp, by Self, by Java, and the resulting melange smells like every paradigm while fitting none of them. Trying to write applications in Javascript is trying to corral the minds of hundreds of people who each had a different idea of what Javascript is and what it does and what it wants you to do. The resulting system is doomed to total incoherence.


JavaScript started out as a sort of "Java-flavored Scheme", actually

However, I'd say that modern JavaScript - the language - is much more like C++ than Lisp. It isn't a void of shepherding, so much as shepherding-by-committee.

However, due to how easy its ecosystem has made package management (in contrast with C++), much of that quagmire has been papered over with much more strongly-opinionated (shepherding) frameworks and dialects. This hasn't completely solved the "C++ problem", but it's gone a long way towards mitigating it. Working in modern JavaScript may be wildly different between frameworks, but it's reasonably consistent between projects that use the same frameworks.


In other programming languages there is a consistency between frameworks and a default way to do things: there's no shitload of frameworks each reinventing the same wheel.

Some things are easily doable without frameworks.

In Javascript land you don't learn the language, you learn frameworks. And, some "JS frameworks" like Angular do not even promote JS, due to how terrible the language is.

If your current framework goes out of fashion in favor of the next shiny thing, you are out of luck.

I was unfortunate enough to have had to learn one of the JS frameworks, because our web apps are APIs in the backend, I have to do the frontend too, and the people who started the projects were fans of a particular JS framework.

For upcoming projects I'll use Blazor after it becomes production ready, no more JS frameworks for me.

I don't dislike JS; I quite enjoyed old-school ES6 + jQuery. I even like Vue because it is very customizable. I have a strong dislike for big opinionated frameworks like Angular.


What's mostly resulted in Javascript land is people using opinionated (and often severely constrained) boilerplate generators like create-react-app, with opinionated linters like Prettier (alongside transpiling Javascript from other languages like TypeScript), that force a coding style despite the myriad ways to achieve things. I've actually found that to mitigate a lot of these issues well, but it can't deal with the boatload of "bad" advice online to wade through.


Even with opinionated generators, what happens when you pull in a library that comes from a very functional mindset, a second that treats JS like Smalltalk, and a third that was thrown together by a novice and has become the de facto standard for its purpose?

The issue doesn't just lie within the baseline of the code you write, but in how many disjoint dialects that code must interact with and partially conform or contort to in order to engage with.


If you have a say in the matter and you have to use a framework, take a look at Vue. It's not opinionated, it doesn't get in the way, and it doesn't come with the kitchen sink.


>> When everyone's their own shepherd, nobody is on the same page.

I think that's more of a management problem than a language problem. cheers


It's been said, "Lisp takes things that are technical problems in other languages, and turns them into social problems." Don't underestimate the human factor of team-based development.


Haskell, OCaml, probably F# too.

Can also do 'bare metal' as well.

I think the advantage to Lisp is that the programmer can generate and evaluate arbitrary expression trees at run time.

I'm not sure about the others but I recall Haskell has some difficulty with this. It's possible but it's not supported and not trivial to do.


>> I think the advantage to Lisp is that the programmer can generate and evaluate arbitrary expression trees at run time.

That's the one! Code is data - data is code. While this can be done in other languages, it isn't done without considerable effort or going 'off road' so to speak; macros are Lisp. cheers.


Isn't eval part of the cause of many security holes in other languages, e.g. Javascript? That is, not eval alone but combined with an unintended path from unvalidated user input to the snippet that gets eval'ed.

Is there anything special Lisp does to prevent user input from getting into eval'ed data-code? Or any sandboxing provided by Lisp's eval?


You never ever eval user input.


I thought this was super common knowledge, yet I reviewed a co-worker's pull request the other day and found files filled with eval (easily 30 usages) as well as global state (treated and mutated as if it were thread-local).

The most annoying part was that what they wanted to do didn’t even require eval and they refused to fix it even after I’d found a safe, non-eval way to do it.


This rule is true for Javascript and PHP too, yet it happened all the time. If Lisp relies heavily on eval, what specific protection measures does it employ?


Mainstream Lisp dialects do not "rely heavily" on eval. Its use is broadly discouraged in favor of other mechanisms like apply or macros.

Newbies sometimes learn about backquote before learning about apply, and when they need to pass a dynamic list of arguments to a function, they end up writing (eval `(fun 1 2 ,@args)) instead of (apply fun 1 2 args). Or doing some metaprogramming using (eval `(defun ...)) in the middle of a function, instead of making a function-defining macro.

In JavaScript and PHP, not to mention numerous other languages, eval is the only meta-programming you have. If you need to generate code, you end up using eval. Moreover, eval is textual. Textual code requires very careful escaping to avoid injection problems. Lisp's eval is AST-based, so it doesn't suffer from that.

If I have a user-data variable that holds untrusted user data and put it into an evaluated code template, as in (eval `(list ... (some-api ',user-data) ...)), I can completely trust that user-data will not "go live". It's inserted under a quote, and that's that. There is no way that content can bypass the quote, no matter what object it is.


> Haskell, OCaml, probably F# too.

Lisp has macros, read-macros, and eval. These things enable DSLs. You're saying strongly-typed functional languages have features of equivalent power for this purpose?


Yes, for example F# has quotations, type expressions and type providers.

Also, .NET has attributes, compiler plugins and expression trees, which allow you to do similar stuff even in C#, although not as straightforwardly as in Lisp.

On the Java side, you also get the attributes (aka annotations), compiler plugins, AOP (yay CLOS interceptors).

Haskell has Template Haskell and OCaml has ppx extensions.


Those are so distant from what Lisp macros achieve. Also, I can't believe you brought up Java annotations in this discussion.


They are, but as they say back home, those who don't have dogs, hunt with cats.


Writing EDSLs (E for embedded) is fairly common in Haskell. I don't know if it's as easy as with Lisp macros though since I have no exposure to that.

Many DSLs are written in Haskell (Elm and PureScript come to mind) as well.


It's all about ergonomics though. In Lisp, it's super easy, and even fun with macros. They can probably do roughly equivalent things in the other languages, but it's so fucking painful it's relegated to rare use and for relatively short code sections.

For Lisp users, it's like holding the knife by the handle. For users of the other languages, it's like holding the knife by the blade.


Hum... Yes.

Not exactly equivalent, but monad-based DSLs have almost the same power as Lisp macros.

You can't escape the syntax restrictions, the same way you can't escape Lisp's syntax restrictions. And it's a bonus for Lisp (in power) that its syntax is much more flexible. But in semantics they are equivalent.


> what other languages [...] give you the freedom to change the language and/or create DSL's [...]

FORTH. It's a very similar language in that respect. One of the first things you learn how to do as a FORTH developer is to rewrite the interpreter/compiler words.


I really like Forth, and if I ever get into IoT that's what I would use over C any day. Forth works best from a clean slate though, I think (from my admittedly limited experience); working with a host OS takes a bit of the fun away, I found.


almost all languages can pull in c, so that doesn’t differentiate.

being able to dsl, you have to ask how often is that useful? what happens when 10 dsls are built into a code base and you hire a new person? how hard is it to make sense of everything?


The comment about C was more to do with perceived speed issues with Lisp, but for most situations, Lisps that compile to native code are more than fast enough for general application programming.

The point of the article rings true: languages like C/C++ and those based on them do shepherd you into how they work, and if you're doing low-level programming for drivers etc. then they work well; that's their domain (and you wouldn't need half of C++ if it stayed in that arena!). When trying to 'express' or abstract a problem, you have to shoehorn your thoughts into the language.

>> being able to dsl, you have to ask how often is that useful? what happens when 10 dsls are built into a code base and you hire a new person? how hard is it to make sense of everything?

Probably not as hard as trying to decipher swaths of source code in a language that doesn't make it easy to create the DSLs; instead you end up with APIs and code that try to hide the ugliness of being pushed around by the language.

With Lisp you are not so much creating APIs or DSLs as extending the language to suit the problem, instead of waiting for the language to catch up. This is freedom!

cheers.


This is freedom, and it does enable creativity and can indeed increase productivity. But I've also seen it hurt maintainability: the original programmer moves on and now nobody can decipher their genius DSL.


That's a shame. You hear this a lot about Lisp code, and I think the problem is that the people who really grok Lisp and write 'genius' code can be a bit blasé about documenting it.

Simple documentation for standard function definitions is fine, but macros definitely need special attention. Some say macros are over-used, but I think it's more that they're under-documented. Even if the name of the macro gives you a fair idea, documenting how it works and what it generates, with examples, goes a long way to deciphering them for maintenance. cheers.
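
For instance, a hypothetical macro documented that way might look like this (WITH-RETRIES and FETCH-PAGE are both made up for the example):

    (defmacro with-retries ((n) &body body)
      "Run BODY, retrying up to N times if it signals an ERROR.
    Returns NIL when every attempt fails.  Example:
      (with-retries (3) (fetch-page url))
    expands into a DOTIMES wrapping a HANDLER-CASE around the body."
      (let ((i (gensym "I")))
        `(dotimes (,i ,n)
           (handler-case (return (progn ,@body))
             (error () nil)))))

The docstring states the contract, and the example shows roughly what the expansion does, which is usually the part a maintainer has to reverse-engineer otherwise.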


Absolutely. In fact I am currently writing a library in Elixir (for internal company usage) that mandates generating boilerplate to help a user project. It's a really fine balance between (a) not being too clever, keeping the macro code readable, and (b) documenting the intent of the macro, its input and output, and why it is actually useful.

Many people are like "I'll document it later" and it just almost never happens. Which is, as you said, a shame.


> what other languages [...] give you the freedom to change the language and/or create DSL's [...]

And I would add Rebol & Red.


Working on a project with a C# DSL really makes you appreciate the magic of Lisp.


An even broader look at this is which run-time properties the resulting _typical_ code has.

Most languages tend to produce distinct behavior patterns and properties, such as (in no particular order):

- multi-threaded / single-threaded / multi-process

- frequency of memory safety bugs

- broken locale / unicode handling

- broken or non-standard network settings handling

- only synchronous I/O calls (or in contrast only async I/O)

- releasing memory back to OS (or not)

- fragmenting heap over time (or not)

- fast / slow text processing

- fast / slow binary codecs


"shepherding: An invisible property of a programming language and its ecosystem that drives people into solving problems in ways that are natural for the programming language itself rather than ways that are considered "better" in some sense"

Seems like most could lend a little more weight to: 'does this language align with how you want to think about/represent the problems you're solving'


The corollary for language and API designers: whatever you make easiest is what people will do.

If there is a 'right' way to do something, make that the default or the simplest calling pattern. If there is a new 'right' way, don't route the new way directly through the old way. People will think they're cutting out the middleman by keeping the old calling convention as long as possible.


> If there is a new 'right' way, don't route the new way directly through the old way. People will think they're cutting out the middleman by keeping the old calling convention as long as possible.

It's best if the old way becomes a shim onto the new way: that shows the new way works and encourages moving to it. Nobody wants to keep a shim they don't need, and if the new way can do everything, the compatibility shim isn't needed.
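
A sketch of that pattern (names invented, Lisp only for illustration):

    ;; The new entry point carries the real behaviour...
    (defun render-v2 (thing &key (stream *standard-output*) (pretty t))
      "New API: explicit stream and pretty-printing options."
      (write thing :stream stream :pretty pretty))

    ;; ...and the old one survives only as a compatibility shim over it.
    (defun render (thing)
      "Old API; deprecated, forwards to RENDER-V2 with the legacy behaviour."
      (render-v2 thing :pretty nil))

Callers of the old name keep working, while every code path exercises the new implementation.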


I totally agree.

I'd also like to add that a lot of the time in open source, it's as simple as having the examples be full examples with the error handling needed to use it in a production system. Quite often the examples shown are the "simple" ones, but they leave a lot out. I wish it weren't the case but those are often also pasted into production code.


JavaScript shepherds you into monkey patching


Shimming and polyfilling (i.e. ensuring standard functions are present and whatnot), sure. But I think most codebases avoid monkey patching (i.e. adding new functionality to built-ins).

Maybe it's up for debate if the language itself guides people in that direction, but no one seems to do it anymore, so I personally don't think it does.


The new culture around modules actively shepherds people away from monkey-patching. Whether or not Webpack is considered "part of the language" for these purposes is fairly academic, but yes, the reality is that people don't really do that any more.


There's a similar blog post by Gabriel Gonzalez which focuses on the incentives set in Haskell:

http://www.haskellforall.com/2016/04/worst-practices-should-...


I would have liked to have seen more examples of what different languages shepherd you to do.

Python, in my experience, shepherds you to import a lot of modules rather than write your own. There are so many helpful ones available, and that's kind of a reality shock if you switch to some other languages.


I like this. The insight is true of tools generally, not only languages. It's true at the feature/affordance level. Also at the tool level. "To a man with a hammer everything looks like a nail."


My take was always that language quality is not about expressive power, because it's fairly difficult to come up with a non-Turing-complete system anyway; it is about what is easy to understand and modify.


Isn't it a balance?

VB6 code was easy to understand on one level, but its lack of expressiveness led to a lot more code. On the other hand, when I look at Ruby code there is too much syntactic sugar for non-Ruby programmers to understand it without looking some things up.


ZX Spectrum Basic shepherded you into using the GOTO keyword, which I've heard is the single most evil thing to do in software development.


In Speccy Basic a GOTO jumped to a line number, so if you added code that changed your numbering, the GOTO broke. That was managed by using numbers that left space (10, 20, and so on), but it was still horrible in a big program. Another issue was that you couldn't come back (that's what GOSUB was for). GOTOs also made code hard to follow because the execution jumped around all over the place, but that was less of a problem.

These days languages that have GOTO usually jump to a label so it's not quite as bad. They're still likely to end up as spaghetti though.

And yes, I am old.


Modern languages usually only support goto within a single function. This significantly reduces the potential for spaghetti, compared to 8-bit BASICs that allowed you to jump anywhere within a program.
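
Common Lisp's GO is one concrete example of that restriction: the target tag must sit in a lexically enclosing TAGBODY, so a jump can never leave the function.

    (defun count-up (limit)
      (let ((i 0))
        (tagbody
         again
           (when (< i limit)
             (incf i)
             (go again)))   ; legal: AGAIN is in the enclosing TAGBODY
        i))

    ;; (count-up 5) => 5
    ;; (go elsewhere) with no enclosing TAGBODY tag is simply an error.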


While this topic is fun, it smells like complete BS to me.

Why? Because the problem of people choosing the wrong tool (not the tool "shepherding" the person into using it) is older than programming [1].

We can do better than to (once again) make up an elaborate way of blaming our tools.

[1] https://en.wikipedia.org/wiki/Law_of_the_instrument


What is the right tool in each of these cases? If you have a Perl codebase, do you tackle XML with a totally different language?


From 2004 to 2008 I worked at The Press Association - we were processing gigabytes of XML a day, much of it in real-time feeds pushed out to the likes of the BBC, large newspaper publishers, and betting shops.

All of this was done with Perl. None of it was done with regular expressions - it was done through XML::LibXML, and thus libxml. Whenever I hear other Perl devs complain about XML processing my answer is always the same - just use XML::LibXML.

Choosing a different language might gain some advantages, but if you still choose the wrong libraries you're still going to have pain.


But if "choose the right tool" includes "use the right libraries", how does it differ from "write good code"? At that point arguing that you should use the right tool is arguing about being disciplined?


Typo

> shepherding An invisible property of a progamming language


If all you have is a hammer, everything looks like a nail.


My favourite when working in teams is: "use the right tool for the right job", which presupposes you know many tools and have experience working with them ... and that the shared set of tool knowledge in the team is more than just a hammer and a nail.


Facts.


I agree with all the points brought up in this article except for this curveball:

> Turing complete configuration languages shepherd you into writing complex logic with them

Everything discussed in this post is a consequence of language design or best practices. I don't think shoving complex logic into YAML files would be considered either. That is more a possibility that falls out of Turing completeness; I don't think the language design has anything to do with it. All the other "shepherding" examples are clearly intentional choices by the language designers.


Although not obvious from the context, the writer of the article is the author of a build system (https://mesonbuild.com/) used in a bunch of large non-trivial high profile projects (https://mesonbuild.com/Users.html) including systemd, Nautilus and so on.

While I dislike proof-by-eminence, the author has a history as an original, recognized contributor in this space.


I think that the author of the article is specifically speaking of configuration DSLs that were intended to be Turing-incomplete but somehow grew full Turing-completeness and therefore became bastard languages. YAML as a config DSL is sub-par (but I'll take it over JSON any day), and truly declarative, Turing-incomplete config languages are great, as is just using a fully-featured scripting language such as a Lisp or Python, but the middle ground of a custom config language that is also Turing-complete is usually just bad.

Not sure I have any examples of what that might be. Perhaps nginx? Or Apache? Lots of complex software has very complex config languages that might accidentally be Turing-complete, but only fools would actually use them like that.


Often the Turing completeness comes not from the config _format_ but from the semantics. A good example of this is Logstash config, where your config is an array of steps to apply, each of which can filter things in certain ways based on certain conditions, and before you know it you've used 100 lines to badly and buggily implement a 5-line script in a real language.


It is a funny choice of examples. I configure Emacs with script. And, while I certainly have some bitrot in my config, it is still way more manageable than any other config I have had to deal with.


Fwiw, Emacs is usually single-user, and self-selects users better able to understand and manage their own complexity.


I view it as yet another thing I have to maintain. Show some frugality in how you do it. And recognize it is a yak.



