The bug in the program reveals a poor understanding of object lifecycles by whoever wrote it. The `obj` argument to `simple` is not globally unique and so it makes a poor location to store global state information (a count of how often `simple` is called, in this example).
Never tie global state information to ephemeral objects whose lifetime may be smaller than what you want to track. In this case, they want to know how many times `simple` is called across the program's lifetime. Unless you can guarantee the `obj` argument or its `counter` member exists from before the first call to `simple` and through the last call to `simple` and is the only `obj` to ever be passed to `simple`, it is the wrong place to put the count information. And with those guarantees, you may as well remove `obj` as a parameter to both `simple` and `complex` and just treat it as a global.
State information needs to exist in objects or locations that last as long as that state information is relevant, no more, no less. If the information is about the overall program lifecycle, then a global can make sense. If you only need to know how many times `simple` was invoked with a particular `obj` instance, then tie it to the object passed in as the `obj` argument.
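A minimal JavaScript sketch of both options, assuming a hypothetical `simple` that only needs counting (the article's actual `simple`/`complex` bodies aren't quoted here). A module-level counter covers the program lifetime. A `WeakMap` keyed by the `obj` instance covers per-object counts: it doesn't keep `obj` alive, so each count lasts exactly as long as its object.

```javascript
// Sketch: `simple`, `simpleCallsFor` and `totalCalls` are hypothetical names.

// Program-lifetime count: lives as long as the module does.
let totalSimpleCalls = 0;

// Per-instance counts: a WeakMap does not keep its keys alive, so once an
// obj is garbage-collected its count goes with it.
const simpleCallsByObj = new WeakMap();

function simple(obj) {
  totalSimpleCalls += 1;
  simpleCallsByObj.set(obj, (simpleCallsByObj.get(obj) ?? 0) + 1);
  // ... the real work on obj would go here ...
}

function simpleCallsFor(obj) {
  return simpleCallsByObj.get(obj) ?? 0;
}

function totalCalls() {
  return totalSimpleCalls;
}
```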
> it is the wrong place to put the count information.
I'd argue this is the case regardless of lifetime. It's trying to squash two unrelated things into one object and should have been two different arguments.
Way more obvious if "obj" is replaced with some example object instead of an empty one:
let person = { name: "Foo Bar", age: 30, counter: counter };
My JS is terrible, but it seems like once you make the counter a global variable, it is just better to give it a dedicated, atomic count function. So instead of incrementing the counter in simple, a globalCount() function gets called that isolates the state. Something like:
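{
  let i = 0; // block-scoped: invisible outside the braces
  var counter = function () { // var escapes the block, so counter is reachable outside it
    console.log(++i);
  };
}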
Then call counter() to count & log and document that something horrible and stateful is happening. I wouldn't call that a global variable although the article author disagrees.
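A variant of that idea, sketched with hypothetical names (`callCounter`, `increment`, `current`): the count is private to an immediately invoked closure, callers can only go through the two exposed functions, and binding the result with `const` stops anyone from reassigning it later. Returning the value instead of only logging it keeps the state private while still letting callers read it. Since a synchronous JavaScript function runs to completion, two increments can't interleave.

```javascript
// Sketch: callCounter, increment and current are hypothetical names.
const callCounter = (() => {
  let count = 0; // only reachable through the functions below
  return Object.freeze({
    increment() {
      count += 1;
      return count;
    },
    current() {
      return count;
    },
  });
})();
```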
Because there is no intent, need or reason to vary it.
Nearly everything in RAM is technically a global variable to someone who is keen enough. Programming depends on a sort-of honour system not to treat things like variables if it is hinted that they shouldn't and it doesn't make sense to. Otherwise we can all start making calls to scanmem & friends.
Right but closing over a state with a function and giving that closure a name seems to quack like a duck to me. Maybe it's just because I'm a scheme guy and we're more upfront about this.
The situation is ugly because there is global state by design. But I don't see why the fact that the closure is stored in a mutable location would be a concern for you. Can you think of any conditions where someone would modify it? I'm not really seeing it, and I don't know what your alternative suggestion is going to be to stop them technically.
I can tell you when someone would modify a counter variable - every time it needs to be incremented. It leaves open a lot of room for bugs in unexpected parts of the code. Gating access behind a stateful function makes doing the wrong thing more expensive.
I think we're talking past each other here. The issue is that the enclosed state of counter is exposed globally and can be accessed without synchronization. I suppose, yes, one could also reassign counter to a different function (and you'd need to be aware of that), but my point is that you still need some kind of sync if you're calling counter all over the place, just like you would with a global variable.
Could you tell me where this was posted? I thought no one would see this after I got no comments the first day.
No one I showed this to complained about the first example, but online many people did. I wrote a completely different article, which I think is much better, that uses the examples I would have used in the follow-up. I'll post that article next week.
Second chance pool. This post is, per your submission history, from 2 days ago. HN has a second chance pool that lets articles continue collecting upvotes (more easily than digging through the full history). Some of those articles will get their timestamp updated to artificially inflate their ranking. This brings them to the front page again and gives them their "second chance". After a few hours or whatever time, the timestamp is reverted and they'll start falling back into their natural place in the rankings.
In this case, yes. Its scope should be the lowest necessary scope. Does JS provide static variables in functions? If not, then that forces you to lift it to some other scope like file or module scope or the surrounding function or class if that's viable.
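JavaScript has no `static` local variables inside functions. One common stand-in, sketched here with a hypothetical `simple`, is a property on the function object itself. Unlike a C static local, though, it is readable and writable by anyone holding a reference to the function, so a closure or module scope is the tighter option.

```javascript
// Sketch: `simple` and its `calls` property are hypothetical.
function simple() {
  // Functions are objects, so the count can hang off the function itself.
  // On the first call simple.calls is undefined, so ?? falls back to 0.
  simple.calls = (simple.calls ?? 0) + 1;
  return simple.calls;
}
```

Any code can do `simple.calls = 0`, which is exactly the kind of outside modification a C static local hides unless you hand out its address.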
From the article> "Static Function Variable: In C inside a function, you can declare a static variable. You could consider this as a local variable but I don't since it's on the heap, and can be returned and modified outside of the functions. I absolutely avoid this except as a counter whose address I never return."
These variables are not on the heap. They are statically allocated. Their visibility is the only thing that differentiates them from global variables and static variables defined outside of functions.
I think such variables can be useful, if you need a simple way of keeping some persistent state within a function. Of course it's more state you are carrying around, so it's hard to defend in a code review as best practice.
Amusingly, you can modify such variables from outside the function, simply by getting your function to provide the modifying code with a pointer to the variable, eg by returning the variable's address. If you do that though you're probably creating a monster. In contrast I think returning the value of the static variable (which the author casts shade on in the quote above) seems absolutely fine.
Edit: I should have stated that the number one problem with this technique is that it's absolutely not thread safe for obvious reasons.
> These variables are not on the heap. They are statically allocated. Their visibility is the only thing that differentiates them from global variables and static variables defined outside of functions.
In C++, there is a difference: function-static variables get initialized when control first passes into the function. That difference matters in cases where the initial value (partly) comes from a function call or from another global. For example, in C++ one can do:
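#include <cstdlib>

int bar = -1;

void foo() {
    // Initialized on the first call to foo(), using whatever bar holds then.
    // C++ needs the cast: malloc returns void*.
    static char *baz = static_cast<char *>(std::malloc(bar));
    …
}

void quux() {
    bar = 13;
}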
That’s fine as long as the code calls quux before ever calling foo.
They added that feature not to support such monstrosities, but because they wanted to support having static variables of class type.
If they were to design that today, I think they would require the initialization expression to be constexpr.
Just in case anyone still doesn't understand what it means to be statically allocated: it means they are allocated in the data segment (or .bss for zero-initialized variables), which is a separate area of virtual memory from the stack and the heap.
You're right. I know mutable and read-only data are separate, I know I've seen gcc and clang put a static int and a global int in the same page, and I know I've heard people say global variables are on the heap thousands of times. I guess I forgot that global writable data doesn't necessarily mean the heap, nor the read-only section of memory.
Sorry, I owe the author an apology for saying that they "cast shade" on the idea of returning the value of the static variable. Actually they, quite correctly, cast shade on the idea of returning the address of the static variable. I'd edit the original message, but I am (far) too late. I just noticed my mistake. All I can do is add this apology, for the record.
I didn't mind. I knew I'd hear a lot of disagreements and incorrect thoughts. My next article has a lot of examples which should make things easier to digest.
One thing I dislike about compsci is that everyone has different definitions for everything. If people say static variables are not on the heap, fine, but you can easily see the address of a static variable and a global variable being in the same 4K page.
Static variable addresses are an extremely important tool in static analysis, for proving certain program invariants hold. In C, a const static often will be able to tell you more about the structure of your code at runtime than a #define macro ever could!
Though unless you're programming extremely defensively (e.g. trying to thwart a nation state), I see no reason why you would use them at runtime.
static consts in C carry their identity through their (fixed, unchanging) pointer address. Let's say you have a business rules engine that's only meant to ingest threshold values from a certain module. You want to know if the 3.0 you're using is coming from the correct place in the code, or if there's a programming error. With a define, there's not enough additional information to tell. With a static const, you can just have a const static array of pointers to the valid threshold constants, and use that in your tests for the rules engine.
I work in a highly regulated field, so often this level of paranoia and proof helps us make our case that our product works exactly the way we say it works.
So it's easier to guarantee the identity is correct? That doesn't stop you from using the incorrect identity, though.
I'm not quite sure I understand how this is better than a define - if you know enough to know whether you're using the correct value, it can just as easily be a define?
I think I can see the idea, but I'm probably not paranoid enough to understand how it's helpful XD
Global variables (in languages where they otherwise make sense and don't have footguns at initialization and whatnot) have two main problems:
1. They work against local reasoning as you analyze the code
2. The semantic lifetime for a bundle of data is rarely actually the lifetime of the program
The second of those is easy to guard against. Just give the bundle of data a name associated with its desired lifetime. If you really only need one of those lifetimes then globally allocate one of them (in most languages this is as cheap as independently handling a bunch of globals, baked into the binary in a low-level language). If necessary, give it a `reset()` method.
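A minimal sketch of that advice, with hypothetical names: counters that share one lifetime are grouped in a class named after it, exactly one instance is allocated because only one such lifetime is needed, and `reset()` lets tests (or a new session) start clean.

```javascript
// Sketch: SessionStats and its fields are hypothetical.
class SessionStats {
  constructor() {
    this.reset();
  }

  // Start a fresh lifetime, e.g. between tests.
  reset() {
    this.requests = 0;
    this.errors = 0;
  }

  recordRequest(ok) {
    this.requests += 1;
    if (!ok) this.errors += 1;
  }
}

// Only one session lifetime is needed, so allocate exactly one instance.
const sessionStats = new SessionStats();
```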
The first is a more interesting problem. Even if you bundle data into some sort of `HTTPRequest` lifetime or whatever, the fact that it's bundled still works against local reasoning as you try to use your various counters and loggers and what have you. It's the same battle between implicit and explicit parameters we've argued about for decades. I don't have any concrete advice, but anecdotally I see more bugs from biggish collections of heterogeneous data types than I do from passing everything around manually (just the subsets people actually need).
I don't think #1 is necessarily true. Take a common case for a global variable, a metric your app is exposing via prometheus. You have some global variable representing its value. Libraries like to hide the global variable sometimes with cuteness like MetricsRegistry.get_metric("foo"), but they're globally readable and writable state. And in your function you do a little metric.observe_event() to increment your counter. I think having this value global helps reasoning because the alternative is going to be a really clunky plumbing of the variable down the stack.
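The pattern described above, sketched with hypothetical names (this is not the API of the Prometheus client or any other real library): a process-wide registry hands out named metrics, and any function can fetch one and record an event without it being threaded through the call stack. The trade-off is that `handleRequest` now depends on state that lives outside it.

```javascript
// Sketch: MetricsRegistry, getMetric, observeEvent and value are hypothetical.
// Globally readable and writable state, hidden behind a lookup function.
const MetricsRegistry = (() => {
  const metrics = new Map();
  return {
    getMetric(name) {
      if (!metrics.has(name)) {
        let count = 0;
        metrics.set(name, {
          observeEvent() {
            count += 1;
          },
          value() {
            return count;
          },
        });
      }
      return metrics.get(name);
    },
  };
})();

function handleRequest() {
  // No parameter plumbing: the metric is fetched by name where it's needed.
  MetricsRegistry.getMetric("requests_total").observeEvent();
  return "ok";
}
```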
It helps with reasoning in some sense, and the net balance might be positive in how much reasoning it enables, but it definitely hurts local reasoning. You need broader context to be able to analyze whether the function is correct (or even what it's supposed to be doing if you don't actively work to prevent entropic decay as the codebase changes). You can't test that function without bringing in that outer context. It (often) doesn't naturally compose well with other functions.
Of course #1 is not necessarily true, it depends on one's coding style, and using globals for application-scoped services like logging/metrics is tentatively fine... although I also think that if we're going to dedicate globals almost exclusively to this use, they probably should have dynamic scoping.
On the other hand, I have seen quite a lot of parser and compiler code from the seventies and eighties, and let me tell you: for some reason, having the interface between lexer and parser, or lexer and buffered reader, or whatever else be "call void NextToken(void), it updates global variables TokenKind, TokenNumber, TokenText to store the info about the freshly produced token" was immensely popular. This has gotten slightly less popular, but even today, e.g., Golang's scanner has a method next() that updates the scanner's "current token"-related fields and returns nothing. I don't know why: I've written several LL(1) recursive-descent parsers that explicitly pass the current token around the parsing functions, and it works perfectly fine.
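For contrast with the `NextToken(void)`-plus-globals style, here is a small sketch of the explicit alternative: a recursive-descent parser for a hypothetical toy grammar of integers, `+` and `*`. Instead of a hidden "current token", each parsing function receives the token list and a position and returns the parsed value together with the next position.

```javascript
// Sketch: the grammar and all function names are hypothetical.
// expr := term ('+' term)*    term := number ('*' number)*

function tokenize(src) {
  return src.match(/\d+|[+*]/g) ?? [];
}

// Each parser takes (tokens, pos) and returns { value, pos }.
function parseNumber(tokens, pos) {
  const tok = tokens[pos];
  if (tok === undefined || !/^\d+$/.test(tok)) {
    throw new SyntaxError(`expected number at token ${pos}`);
  }
  return { value: Number(tok), pos: pos + 1 };
}

function parseTerm(tokens, pos) {
  let { value, pos: next } = parseNumber(tokens, pos);
  while (tokens[next] === "*") {
    const rhs = parseNumber(tokens, next + 1);
    value *= rhs.value;
    next = rhs.pos;
  }
  return { value, pos: next };
}

function parseExpr(tokens, pos) {
  let { value, pos: next } = parseTerm(tokens, pos);
  while (tokens[next] === "+") {
    const rhs = parseTerm(tokens, next + 1);
    value += rhs.value;
    next = rhs.pos;
  }
  return { value, pos: next };
}

function evaluate(src) {
  const tokens = tokenize(src);
  const { value, pos } = parseExpr(tokens, 0);
  if (pos !== tokens.length) {
    throw new SyntaxError(`unexpected token ${tokens[pos]}`);
  }
  return value;
}
```

Because no function touches shared state, any of them can be called and tested on its own with a hand-built token array.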
Global variables are a programming construct, which like other programming constructs is neither bad nor good. Except, due to the takeover of workplaces by the best practices cult, instead of reasoning about tradeoffs on a case by case basis (which is the core of coding and software engineering), we ascribe a sort of morality to programming concepts.
For example, global variables have drawbacks, but so does re-writing every function in a call-stack (that perhaps you don't control and get callbacks from).
Or if you are building a prototype, the magic of being able to access "anything from anywhere" (either via globals or context, which is effectively a global that's scoped to a callstack), increases your speed by 10x (since you don't have to change all the code to keep passing its own abstractions to itself as arguments!)
Functions with long signatures are tedious to call and create poor horizontal (which then spills over to vertical) code density. This impacts your ability to look at code and follow the logic at a glance, and perhaps spot major bugs in review. There's also fewer stuff for say copilot to fill in incorrectly, increasing your ability to use AI.
In the end, every program has global state, and using almost any programming construct, from function calls (which may overflow the stack) to the modulus operator (which can cause division by zero) to sharing memory between threads (which can cause data races), requires respecting some invariants. Instead, programmers will go to great lengths to avoid globals (using singletons or other made-up abstractions, all while claiming the abstractions originate in the problem domain) to represent global state, because someone on the internet said it's bad.
A global variable in a language with parallel operation is often a terrible idea. The problem with globals and parallel operations is they are an inherent risk for race conditions that can have wild consequences.
In some languages, for example, a parallel write to a field is not guaranteed to be consistent. Let's assume in the above example `counter` was actually represented with 2 bytes. If two threads write an increment to it without a guard, there is no guarantee which thread will win the upper byte and which will win the lower byte. Most of the time it will be fine, but 1 in 10k there can be a bizarre counter leap (forwards or backwards) that'd be almost impossible to account for.
Now imagine this global is tucked away in a complex library somewhere and you've got an even bigger problem. Parallel calls to the library will just sometimes fail in ways that aren't easy to explain and, unfortunately, can only be fixed by the caller wrapping the library in a synchronization construct. Nobody wants to do that.
All of these problems are masked by a language like JavaScript. JavaScript is aggressively single-threaded (everything is run in a mutex!). Sure, you can do concurrent things with callbacks/async blocks, but you can't mutate any application state from 2 threads. That makes a global variable work in most cases. It only gets tricky if you are dealing with a large amount of async while operating on the global variable.
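A sketch of that tricky case, with hypothetical names: a lost update on a single thread. Each unsafe call reads the global, yields at an `await`, and then writes back a value computed from its stale read, so three overlapping calls leave the counter at 1. Doing the read-modify-write in one synchronous step after the `await` avoids it.

```javascript
// Sketch: hits, tick, recordHitUnsafe, recordHitSafe and runThree are hypothetical.
let hits = 0; // module-level ("global") state

// Yield to the event loop so other pending tasks can run.
const tick = () => new Promise((resolve) => setTimeout(resolve, 0));

async function recordHitUnsafe() {
  const seen = hits; // read
  await tick(); // the other calls run here and read the same old value
  hits = seen + 1; // write back a value based on a stale read
}

async function recordHitSafe() {
  await tick();
  hits += 1; // read and write happen in one synchronous step
}

async function runThree(record) {
  hits = 0;
  await Promise.all([record(), record(), record()]);
  return hits;
}
```

`runThree(recordHitUnsafe)` resolves to 1 and `runThree(recordHitSafe)` to 3, with no threads involved.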
Yes, mixing some concepts in programming is a terrible idea.
Perhaps this is also widely unpopular, but it's the parallelism that needs to be treated with care and the additional caution, as often the parallelism itself is the terrible idea.
Concurrent code often has unpredictable performance due to cache behavior and NUMA, unpredictable lock contention, and the fact that often there is no measure of whether the bottleneck is CPU or I/O.
What most people want from concurrency (like computing the response to independent HTTP requests) can be done by separate processes, and the OS can abstract the issues away. As another reference, the entire Go language is designed around avoiding shared memory in favor of message passing (even though it doesn't use processes for separation, it encourages coding as if you did).
But also sharing memory between processes can be handled with care via mappings and using the OS.
> Except, due to the takeover of workplaces by the best practices cult, instead of reasoning about tradeoffs on a case by case basis (which is the core of coding and software engineering), we ascribe a sort of morality to programming concepts.
You're just strawmanning here. Maybe some of the people who say that global variables should be avoided ("should be avoided" never means "absolutely can't be used ever", btw) are people who have experience working on large projects where the use of implicit state routinely makes code hard to reason about, causes concurrency issues and introduces many opportunities for bugs.
> There's also fewer stuff for say copilot to fill in incorrectly, increasing your ability to use AI.
That argument makes no sense to me. If some piece of code is relying on implicit global state to have been set, why would copilot be any better at figuring that out than if it had to pass the state as an argument, something which is clearly stated in the function signature?
I think this article broadens the definition of global variable and then says "Look, the things I added to the definition aren't bad, so global variables aren't always bad."
If you just look at what people normally mean by global variable, then I don't think the article changes minds on that.
To make it worse: "Look, the things I added to the definition aren't bad in this specific language and use-case, so global variables are not the problem".
To me, the author either has a very narrow field of focus and honestly forgets about all the other use-cases, practicalities and perspectives, or they choose to ignore it just to fire up a debate.
In any case, these constructs only hold for JavaScript (in Node.js, whose setup avoids the usual threading issues), and fall flat in a multithreaded setup in just about every other language.
If I were to port this to Rust, first, the borrow checker would catch the described bugs and not allow me to write them in the first place. But secondly, if I really insist on something global that I need to mutate or share between threads, I can do so, but would be explicitly required to choose a type (Mutex, RwLock, with Arc or something) so that a) I have thought about the problem and b) chose something that I know to work for my case.
> If I were to port this to Rust, first, the borrow checker would catch the described bugs and not allow me to write them in the first place. But secondly, if I really insist on something global that I need to mutate or share between threads, I can do so, but would be explicitly required to choose a type (Mutex, RwLock, with Arc or something) so that a) I have thought about the problem and b) chose something that I know to work for my case.
Agreed. Not the specifics, as I don't know Rust, but it makes sense.
I find the concept of a context structure passed as the first parameter to all your functions with all your "globals" to be very compelling for this sort of stuff.
This is very similar to dependency injection. Separating state and construction from function or method implementation makes things a lot easier to test. In my opinion it's also easier to comprehend what the code actually does.
That just seems like globals with extra steps. Suddenly if your context structure has a weird value in it, you’ll have to check every function to see who messed it up.
First, that's true for globals as well. Second, with "context structure" pattern, the modifications to it are usually done by copying this structure, modifying some fields in the copy and passing the copy downwards, which severely limits the impact radius and simplifies tracking down which function messed it up: it's either something above you in the call stack or one of the very few (hopefully) functions that changes this context by-reference, with intent to apply such changes globally.
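A minimal sketch of that copy-and-pass-down style, with hypothetical field names (`requestId`, `logPrefix`, `tenant`): a callee that needs different values makes a shallow copy with spread syntax and passes the copy downward, so the caller's context is never touched.

```javascript
// Sketch: handleRequest, renderPage and the ctx fields are hypothetical.
function handleRequest(ctx, path) {
  // Narrow the context for the subtree below without touching the caller's copy.
  const childCtx = { ...ctx, logPrefix: `${ctx.logPrefix}[${path}] ` };
  return renderPage(childCtx, path);
}

function renderPage(ctx, path) {
  return `${ctx.logPrefix}render ${path} for tenant ${ctx.tenant}`;
}

// Freezing the root context makes accidental in-place writes throw in
// strict mode (and silently do nothing otherwise).
const rootCtx = Object.freeze({ requestId: 1, logPrefix: "", tenant: "acme" });
```

If a value looks wrong inside `renderPage`, the only places that could have produced it are the functions above it on the call stack.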
This plus immutable data is what makes doing web apps in Elixir using Phoenix so nice. There is a (demi-)god "%Conn" structure passed as the first parameter that middleware and controller actions can update (by returning a new struct). The %Conn structure is then used in the final step of the request cycle to return data for the request.
For non-web work, GenServers in Elixir have immutable state that is passed to every handler. This is "local" global state, and since GenServers guarantee the ordering of requests via the mailbox, handlers can update state by returning a new state value, and you never have race conditions.
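A loose JavaScript analogue of that GenServer pattern, sketched with hypothetical names (this is not Elixir's API): messages queue up in a mailbox and are handled strictly one at a time, and the handler receives the current state and returns a new one instead of mutating shared data.

```javascript
// Sketch: handle, createServer, send and drain are hypothetical names.

// Pure handler: (state, message) -> new state.
function handle(state, message) {
  switch (message.type) {
    case "increment":
      return { ...state, count: state.count + 1 };
    case "reset":
      return { ...state, count: 0 };
    default:
      return state;
  }
}

// Minimal mailbox: messages are processed strictly in arrival order.
function createServer(initialState) {
  let state = initialState;
  const mailbox = [];
  return {
    send(message) {
      mailbox.push(message);
    },
    drain() {
      while (mailbox.length > 0) {
        state = handle(state, mailbox.shift());
      }
      return state;
    },
  };
}
```

Because `handle` is pure, it can be tested with plain values and no server at all.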
That's exactly why I used this specific example. I've seen many codebases that use clone to avoid mutation problems, so I wrote this specifically to show it can become a problem too.
I wrote a better article on globals. I plan on posting it next week.
This seems more an issue with not understanding structuredClone than one of understanding globals or lack thereof. There’s nothing wrong with the example; it does exactly what the code says it should. If you want counter to be “global”, then structuredClone isn’t the function you want to call. The bug isn’t in how counter was in obj; the bug is in calling structuredClone when its behaviour wasn’t wanted.
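A hypothetical reconstruction of the behaviour described (the article's real `simple` and `complex` aren't quoted in this thread, so the bodies below are assumptions). `structuredClone` deep-copies everything, including the embedded counter, so increments made on the clone never reach the caller's object. Passing the counter as its own argument keeps it out of the copy.

```javascript
// Sketch: simple, complex and their "Fixed" variants are hypothetical.

// Buggy shape: the counter lives inside the data that gets cloned.
function simple(obj) {
  obj.counter.count += 1;
}

function complex(obj) {
  const copy = structuredClone(obj); // deep copy: copy.counter is a new object
  simple(copy);
}

// Fixed shape: the counter is a separate argument and is never cloned.
function simpleFixed(obj, counter) {
  counter.count += 1;
  // ... work on obj would go here ...
}

function complexFixed(obj, counter) {
  const copy = structuredClone(obj);
  simpleFixed(copy, counter);
}
```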
With that said, it seems obvious that if you want to globally count the calls, then that count shouldn’t live in an argument where you (the function) don’t control its lifetime or how global it actually is. `simple` has no say over what object obj.counter points to; it could trivially be a value type passed into that particular call, so if you know you want a global count, then of course storing it in the argument is the wrong choice.
Global has two conflated meanings: global lifetime (i.e. lifetime of the whole program) and global access (which the article states). `simple` needs global lifetime but not global access.
You rarely “need” global access, although for things like a logger it can be convenient. Often you do need global lifetime.
If I have 500 functions, I don't want to extrapolate out the overhead of passing a state object around to all of them. That's a waste of effort, and frankly makes me think you want to code using an FP paradigm even in imperative languages.
Module-level and thread-level "globals" are fine. You gain nothing (other than some smug ivory tower sense of superiority) by making your functions pure and passing around a global state object to every single method invocation.
If that’s so useful, make your language support the concept of lexical environments instead. Otherwise it’s just manual sunsetting every day of the week. Our craft is full of this “let’s pretend we’re in a lisp with good syntax” where half of it is missing, but fine, we’ll simulate it by hand. Dirt and sticks engineering.
(To be clear, I’m just tangentially ranting about the state of things in general, might as well post this under somewhere else.)
You can have it either way, it’s not for you but for people who disagree with what they deem a preference that is the only option when there’s no alternative.
I got into this argument with my former coworkers. Huge legacy codebase. Important information (such as the current tenant of our multi-tenant app) was hidden away in thread-local vars. This made code really hard to understand for newcomers because you just had to know that you'd have to set certain variables before calling certain other functions. Writing tests was also much more difficult and verbose. None of these preconditions were of course documented. We started getting into more trouble once we started using Kotlin coroutines which share threads between each other. You can solve this (by setting the correct coroutine context), but it made the code even harder to understand and more error-prone.
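For the JavaScript side of this, Node's closest equivalent to a thread-local that survives `await` is `AsyncLocalStorage` from `node:async_hooks`. This sketch, with hypothetical `currentTenant` and `loadInvoices` helpers, reproduces the hidden precondition described above: the helper only works if some caller up the stack remembered to call `run()`. An explicit-parameter version is included for contrast.

```javascript
// Sketch: currentTenant, loadInvoices and loadInvoicesFor are hypothetical.
const { AsyncLocalStorage } = require("node:async_hooks");

const tenantStorage = new AsyncLocalStorage();

// Deep in business code: silently depends on context some caller must set.
function currentTenant() {
  const store = tenantStorage.getStore();
  if (store === undefined) {
    throw new Error("no tenant set: caller forgot tenantStorage.run()");
  }
  return store.tenant;
}

async function loadInvoices() {
  await Promise.resolve(); // the context survives the await
  return `invoices for ${currentTenant()}`;
}

// Explicit alternative: the dependency is visible in the signature.
async function loadInvoicesFor(tenant) {
  await Promise.resolve();
  return `invoices for ${tenant}`;
}
```

Nothing in `loadInvoices`'s signature tells a caller that `run()` is required; the explicit version can't be called without a tenant.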
I said we should either abolish the thread-local variables or not use coroutines, but they said "we don't want to pass so many parameters around" and "coroutines are the modern paradigm in Kotlin", so no dice.
You know what helps manage all this complexity and keep the state internally and externally consistent?
Encapsulation. Provide methods for state manipulation that keep the application state in a known good configuration. App level, module level or thread level.
Use your test harness to control this state.
If you take a step back I think you’ll realize it’s six of one, half dozen of the other. Except this way doesn’t require manually passing an object into every function in your codebase.
These methods existed. The problem was that when you added some code somewhere deep down in layers of layers of business code, you never knew whether the code you'd call would need to access that information or whether it had already previously been set.
Hiding state like that is IMHO just a recipe for disaster. Sure, if you just use global state for metrics or something, it may not be a big deal, but to use it for important business-critical code... no, please pass it around, so I can see at a glance (and with help from my compiler) which parts of the code need what kind of information.
I’m having a difficult time understanding the practical difference between watching an internal state object vs an external one. Surely if you can observe one you can just as easily observe the other, no?
Surely if you can mutate a state object and pass it, its state can get mutated equally deep within the codebase no different than a global, no?
What am I missing here? To me this just sounds like a discipline issue rather than a semantic one.
> To me this just sounds like a discipline issue rather than a semantic one.
Using an explicit parameter obviates the need for discipline since you can mechanically trace where the value was set. In contrast, global values can lead to action at a distance via implicit value changes.
For example, if you have two separate functions in the same thread, one can implicitly change a value used by the other if it's thread-local, but you can't do that if the value is passed via a parameter.
> These would be just as traceable in your IDE/debugger.
A debugger can trace a single execution of your program at runtime. It can't statically verify properties of your program.
If you pass state to your functions explicitly instead of looking it up implicitly, even in dynamically typed languages there are linters that can tell you that you've forgotten to set some state (and in statically typed languages, it wouldn't even compile).
If your global state contains something that runs in prod but should not run in a testing environment (e.g. a database connection), your global variable based code is now untestable.
Dependency Injection is popular for a very good reason.
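A minimal dependency-injection sketch, assuming a hypothetical `db` interface with a single `findUser` method: the function receives its database handle as a parameter instead of reaching for a global connection, so a test can pass an in-memory fake.

```javascript
// Sketch: greetUser, makeFakeDb and the findUser method are hypothetical.

// The dependency arrives as an argument; there is no module-level connection.
async function greetUser(db, userId) {
  const user = await db.findUser(userId);
  return user ? `Hello, ${user.name}` : "Hello, stranger";
}

// In-memory fake for tests; production code would pass a real client
// exposing the same findUser method.
function makeFakeDb(users) {
  return {
    async findUser(id) {
      return users.get(id) ?? null;
    },
  };
}
```

Each test can build its own fake, so no state leaks from one test into the next.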
Sure. And a million programmers have all screamed out in horror when they realize that their single test passes, but fails when run as part of the whole suite. Test contamination is a hell paved with global variables.
Just need to make sure your module doesn't get too big or unwieldy. I work in a codebase with some "module" C files with a litany of global statics and it's very difficult to understand the possible states it can be in or test it.
I agree that so long as overall complexity is limited these things can be OK. As soon as you're reading and writing a global in multiple locations though I would be extremely, extremely wary.
I did not say you are targeting OP. I meant that you are degrading your parent commenter.
This:
"You gain nothing (other than some smug ivory tower sense of superiority) by making your functions pure and passing around a global state object to every single method invocation."
...is neither productive nor actually true. But I'll save the latter part for your other reply.
I could not initially reply to you. Your comment rubbed me the wrong way, because I had no intention of trying to degrade anyone, and frankly, I was offended. But I thought better of my hasty and emotional response. I would rather take a deep breath, re-focus, re-engage, and be educated in a thoughtful dialog than get into a mud slinging contest. I am always willing to be enlightened.
A tip, in your profile you can set a delay which is a number of minutes before your comments will become visible to other people. Mine is set to 2 right now. This gives you time to edit your comment (helpful for some longer ones) but also to write some garbage response and then think better and delete it before anyone's the wiser.
It's also helpful for re-reading a mostly good, but maybe not polite, response and toning it down.
"You gain nothing (other than some smug ivory tower sense of superiority) by making your functions pure and passing around a global state object to every single method invocation."
So, I believe we've already cleared up that this tone doesn't invite discussion and shows an emotional bias that has no place in technical discussions. You said you are open to having your mind changed. Let me give you a few loosely separate (but ultimately bound to each other) arguments in favor of passing state to each function individually.
- All functions that operate on such state are trivially testable in isolation. This is not a theoretical benefit; I've experienced it hundreds of times since I started working mainly with Elixir almost 9 years ago (though I still code Golang and Rust). The number of bugs I ended up being paid to fix was confusing even to me, just for utilizing this one way of working.
- Explicit dependencies, though this one is muddy because, e.g., in Elixir, which is strongly but dynamically typed, this benefit is nearly non-existent; I am talking mostly about statically typed languages here, especially Rust. If you have to operate on stuff that implements this or that trait, then that's a very clear contract and the code becomes that much clearer, with scarcely any need for documenting those functions (though docs do help in other ways, e.g. "how do we use this so it's, you know, useful", but that still means you get to skip documenting trivia, which is still a win).
- LANGUAGE-DEPENDENT: ability to construct pipes (specific to Elixir, OCaml, F# and probably a few others). Consider this:
...while passing around the same state (piping passes each result down as the first argument of the next call, akin to currying) makes for super terse and readable code. It was and still is a game changer for many. Piping is what I sorely miss in Golang and Rust; gods, the code is so much uglier without it, though their method chaining gets you almost there as well -- fair is fair.
Also, piping almost completely negates the inconvenience that you hinted at.
- Generally giving you a better idea of the dependency graph in your code. Again, a game changer for me. In my 9 years with Java this was my biggest pain. At one point you just give up and start throwing crap at the wall until something works (or doesn't). Not that I minded the longer dev times of Java, but the productivity was just abysmal with all the DI frameworks. I am aware that things have improved since then, but back when I finally gave up on Java (gradually, in 2009-2011) it was still terrible.
OK, I don't have much to go on from your otherwise fairly small comment and I already extrapolated quite a lot. Let me know what you think.
But, one rule: "I don't like it" is not allowed. It's not about "liking" stuff, it's about recognizing something that helps productivity and increases clarity.
Even in a single-threaded environment, passing a struct around containing values allows you to implement dynamic scoping, which in this case means you can easily call some functions with some overridden values in the global struct, and that code can also pass around modified versions of the struct, and so on, and it is all cleanly handled and scoped properly. This has many and sundry uses. If you just have a plain global variable, this is much more difficult.
Perl 5, though, has a nice feature where any global variable can be treated this way with the `local` keyword. It makes global variables almost useful, and in my experience it makes people who accidentally splatter globals around do a lot less damage than they otherwise would, because they don't have to make explicit provision for scoped values. I don't miss many features from Perl, but this is on the short list.
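A minimal JavaScript sketch of both ideas, assuming hypothetical names throughout. The first half passes a context object and overrides one field for a single call tree; the second half is only a rough, synchronous analogue of Perl's `local` built from `try`/`finally`, not a real dynamic-scoping mechanism.

```javascript
// Hypothetical root context, passed explicitly instead of read from a global.
const rootCtx = Object.freeze({ logLevel: "info", retries: 3 });

function describeFetch(ctx) {
  return `retries=${ctx.retries} log=${ctx.logLevel}`;
}

function batchJob(ctx) {
  // Override for this call tree only; the caller's ctx stays untouched.
  const jobCtx = Object.freeze({ ...ctx, retries: 1 });
  return describeFetch(jobCtx);
}

// Rough analogue of Perl's `local` for a plain module-level object:
// replace one value while fn runs, restore it afterwards even on throw.
// Not suitable for async fn: the old value is restored as soon as fn
// returns its promise, before code after an `await` runs.
const globals = { verbose: false };

function withLocal(obj, key, value, fn) {
  const saved = obj[key];
  obj[key] = value;
  try {
    return fn();
  } finally {
    obj[key] = saved;
  }
}

function report() {
  return globals.verbose ? "detailed" : "summary";
}

console.log(describeFetch(rootCtx)); // retries=3 log=info
console.log(batchJob(rootCtx)); // retries=1 log=info
console.log(withLocal(globals, "verbose", true, report)); // detailed
console.log(report()); // summary
```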
And do you always know beyond any reasonable doubt that your code will be single-threaded for all time? Because the moment this changes, you're in for a world of pain.
Wrapping the globals into a context struct under #ifdef MULTI-THREADED and passing this ctx as the first argument of each call is a matter of minutes. I've done this multiple times.
Much worse is protecting concurrent writes, e.g. to an object or hash table.
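To illustrate the difference in JavaScript terms: for a single integer, the standard `SharedArrayBuffer` and `Atomics` APIs give race-free updates across worker threads (browsers additionally require cross-origin isolation). There is no comparable atomic operation for an object or `Map`, which are copied rather than shared when posted to a worker, so protecting one needs a hand-built lock. The sketch below only exercises the counter API from one thread; the `increment` name is illustrative.

```javascript
// A counter that several worker threads could update concurrently.
const shared = new SharedArrayBuffer(Int32Array.BYTES_PER_ELEMENT);
const counter = new Int32Array(shared);

function increment() {
  // Atomics.add is an atomic read-modify-write; a plain counter[0]++
  // could lose updates when two threads race on it.
  return Atomics.add(counter, 0, 1); // returns the previous value
}

increment();
increment();
console.log(Atomics.load(counter, 0)); // 2
```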
If the discussion here is meant to be solely about JavaScript, then I'll happily consider all my comments in this thread to be obsolete, as I don't have particularly strong opinions about that language; I don't use it a lot.
I was under the impression that many people here were discussing the usage of global variables more generally, though.
> The problem is data access. Nothing more, nothing less.
I agree with this, but the problem with global variables is precisely that they make bad data access patterns look easy and natural. Speaking from experience, it’s a lot easier to enforce a “no global variables” rule than explain to a new graduate why you won’t allow them to assign a variable in module X even though it’s OK in module Y.
You might like the article I wrote for next week. Could you tell me where this post is linked from? I didn't think anyone would see this, since no one commented on the first day.
Because as a programmer I have responsibility for the technical soundness of the program, and I don't create threads haphazardly.
> when i thought i understood multi-threaded code, but didn't.
All the more reason to carefully plan and limit shared state among threads. It's hard enough to get right when you know where the problems are and impossible if you spray and pray with mutexes.
Yeah but just because you do the right thing doesn’t mean that others will. That one person that creates threads haphazardly will wreak havoc and at scale this will happen. It’s an unstable equilibrium to put the onus of program soundness on the contributors to a program.
> That one person that creates threads haphazardly will wreak havoc
What if someone comes along and starts adding threads and doesn't check what they are accessing? And doesn't read the documented invariants of the design?
Well, I don't think any project can succeed if that's the level of disorganization.
Are there certain kinds of bugs that are easy to regress last minute? Yes. A brand new thread without limited state is not one of them.
> put the onus of program soundness on the contributors to a program.
Who then is responsible for making sure the program is correct?
> Who then is responsible for making sure the program is correct?
I’d say this is mostly a function of the language or framework.
After that, it’s up to the tech leads to provide access patterns/examples that align with the needs of a particular project.
My point is not so much that you shouldn’t ever think about the implications of your code, just that contributors are humans and any project with enough contributors will have a mix of contributors with different knowledge, experience and skill sets. If you are leading one of those, it would behoove you to make doing the right thing easy for later contributors.
Most programming languages written after 1990 let you initialize global variables lazily. The main problem is that the initialization order might be unexpected or you may run into conflicts. Singletons make the order slightly more predictable (based on first variable access), although it is still implicit (lazy).
But singletons are still a terrible idea. The issue with global variables is not just initialization. I would argue it's one of the more minor issues.
The major issues with global variables are:
1. Action at distance (if the variable is mutable)
2. Tight coupling (global dependencies are hard-coded, you cannot use dependency injection)
3. Hidden dependency. The dependency is not only tightly-coupled, it is also hidden from the interface. Client code that calls your function doesn't know that you rely on a global variable and that you can run into conflict with other code using that variable or that you may suddenly start accessing some database it didn't even know about.
Singleton does not solve any of the above. The only thing it ensures is lazy initialization on access.
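A small sketch of that point, using a hypothetical lazily created "connection": laziness fixes when the object is built, but `countUsers` still hides its dependency, while the parameterized `countUsersWith` makes it explicit and lets a caller substitute a fake. All names here are illustrative.

```javascript
// Hypothetical "connection" singleton, created lazily on first access.
let instance = null;
let creations = 0;

function getConnection() {
  if (instance === null) {
    creations += 1;
    instance = { id: creations, query: (sql) => `ran: ${sql}` };
  }
  return instance;
}

// Hidden dependency: nothing in this signature says it touches the connection.
function countUsers() {
  return getConnection().query("SELECT count(*) FROM users");
}

// Explicit alternative: the dependency is a parameter, so a caller
// (or a test) can pass any object with a query method.
function countUsersWith(conn) {
  return conn.query("SELECT count(*) FROM users");
}

const fake = { query: (sql) => `fake: ${sql}` };
console.log(countUsers()); // ran: SELECT count(*) FROM users
console.log(countUsersWith(fake)); // fake: SELECT count(*) FROM users
```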
The burden is on the programmer adding a new thread to know what they can safely access.
The conclusion of your argument looks like 2000s Java - throw a mutex on every property because you never know when it will need to be accessed on a thread.
Designs that spread complexity rather than encapsulate it are rarely a good idea.
I agree that dependent sequences of events coordinated through a global are bad. But there are other usages which are not error prone, for example an allocator, a logger, read-only settings, or a cache.
You may hate my article next week; it's meant to replace this article. If you want, you can email me for early access and tell me how I can improve it. Let's say you can guess my email if you're emailing the right domain.
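One reading of the "cache" case, as a hedged sketch: a module-level cache in front of a pure function. Since the result depends only on the input, a cached answer and a fresh one are indistinguishable to callers, so the shared state cannot cause action at a distance. `slowSquare` and the counter are hypothetical stand-ins for expensive work.

```javascript
// Module-level cache for a pure function.
const cache = new Map();
let computeCount = 0;

function slowSquare(n) {
  computeCount += 1; // stand-in for expensive work
  return n * n;
}

function cachedSquare(n) {
  if (!cache.has(n)) {
    cache.set(n, slowSquare(n));
  }
  return cache.get(n);
}

cachedSquare(12);
cachedSquare(12);
console.log(computeCount); // 1
```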
Of course in a real life program, it may be lost in other code logic and, most importantly, the function performing the clone may not be so explicit about it (e.g. an "update" function that returns a different copy of the object).
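A hypothetical reconstruction of that failure mode, loosely modelled on the `simple`/`complex` counter example discussed upthread (the original article's code may differ): `update` looks like it modifies its argument but returns a copy, so an increment made through the copy never reaches the caller's object.

```javascript
// Looks like a mutation, but returns a new object; `obj` is left untouched.
function update(obj, changes) {
  return { ...obj, ...changes };
}

function simple(obj) {
  obj.counter += 1;
}

function complex(obj) {
  const updated = update(obj, { label: "x" }); // silently a copy
  simple(updated); // increments the copy's counter, not obj's
  return updated;
}

const original = { counter: 0 };
simple(original);
complex(original);
console.log(original.counter); // 1, not 2: the second increment went to the copy
```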
Mutable global variables are intrinsically incompatible with a multithreaded environment. Having mutable shared state is never the right solution, unless you can basically live with data races. And that's before taking maintainability into consideration too.
I think the author forgot the most useful use case for globals: variables that have to do with the context the program is running under, such as command line arguments and environment variables (properly validated and, if needed, escaped).
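A minimal Node.js sketch of that pattern: parse `process.argv` and `process.env` once at startup, validate, and freeze the result so the rest of the program can only read it. The `--port` flag and the `APP_MODE` variable are hypothetical.

```javascript
// Parse once at startup into a frozen object; everything else only reads it.
function loadConfig(argv, env) {
  const portArg = argv.find((a) => a.startsWith("--port="));
  const port = portArg ? Number(portArg.slice("--port=".length)) : 8080;
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`invalid port: ${portArg}`);
  }
  const mode = env.APP_MODE === "production" ? "production" : "development";
  return Object.freeze({ port, mode });
}

// In Node, the first two argv entries are the node binary and the script path.
const CONFIG = loadConfig(process.argv.slice(2), process.env);
```

Keeping `loadConfig` a pure function of its inputs means it can be tested with fake arguments, while `CONFIG` remains the single read-only global.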
Any program that uses a database has a very similar problem to global variables.
As Gilad Bracha has pointed out, types are antimodular, and your database schema can be considered one giant type that pervades your program, just like globals can be.
I don't think we have tools to compositionally solve this, across different programming languages.
> The problem is data access. Nothing more, nothing less. There's a term for this that has nothing to do with global variables: "action at a distance."
I mean yes, using global variables is just one of the ways to cause action-at-a-distance and that is... apparently a big reveal?
Otherwise sure, there is no pattern that cannot be utilized 100% correctly and without introducing bugs. Theoretically. Now let's look at the practical aspects and how often indiscriminately using such tempting patterns like global variables -- and mutexes-when-we-are-not-sure and I-will-remember-not-to-mutate-through-this-pointer -- lead to bugs down the road.
The answer is: fairly often.
IMO the article would be better titled as "There is no pattern that a lazy or careless programmer cannot use to introduce a bug".
What do you suppose is the recommended way to use the encapsulation? ;) This is partly why there's a "defining global variables" section; I know people will consider some usage as not using a global.
What the post discusses can be distilled into the concept of Referential Transparency[0]:
> A linguistic construction is called referentially transparent when for any expression built from it, replacing a subexpression with another one that denotes the same value does not change the value of the expression.
Any construct which is not referentially transparent can be classified as potentially effectual, which is not a Bad Thing(TM): if a program had no observable effects, it would be useless.
What makes programs easier to prove correct and reason about is when effects are localized to well-defined areas often "at the edges" of the architecture.
What makes programs impossible to prove correct and very difficult to reason about is when effectual logic is pervasive throughout, such as most of the "Global Variable Use Cases" the post advocates.
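A small illustration, using hypothetical pricing functions: the first reads a mutable global, so a call cannot safely be replaced by its value; the second depends only on its arguments; and the one effect (printing) is kept at the edge in `main`.

```javascript
// Hidden input: the result depends on a mutable global, so a call cannot be
// replaced by its value once taxRate might change.
let taxRate = 0.25;
function priceWithTaxImpure(net) {
  return net * (1 + taxRate);
}

// Referentially transparent: output depends only on the arguments, so
// priceWithTax(80, 0.25) can always be replaced by 100.
function priceWithTax(net, rate) {
  return net * (1 + rate);
}

// Effects (here, printing) stay at the edge; the core stays pure.
function main(net, rate) {
  const total = priceWithTax(net, rate);
  console.log(`total: ${total}`);
  return total;
}

main(80, 0.25); // prints "total: 100"
```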
Globals are essential for tracking the state of variables at intermediate steps when debugging.
I create a unique object on the global object and store intermediate variables in it, so I can inspect values when something goes wrong. Since it's a single, uniquely named object, it won't overwrite any existing variables and thus doesn't affect existing programs.
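A sketch of that debugging namespace, assuming a hypothetical key `__myAppDebug` (pick something unlikely to collide). `debugRecord` returns its value so it can wrap expressions inline.

```javascript
// One uniquely named object on the global object collects intermediate values.
const DEBUG_KEY = "__myAppDebug";
globalThis[DEBUG_KEY] = globalThis[DEBUG_KEY] || {};

function debugRecord(label, value) {
  globalThis[DEBUG_KEY][label] = value;
  return value; // pass-through so it can wrap expressions inline
}

function average(xs) {
  const sum = debugRecord("sum", xs.reduce((a, b) => a + b, 0));
  return debugRecord("average", sum / xs.length);
}

average([2, 4, 9]);
// Inspect later, e.g. from a debugger or REPL:
console.log(globalThis.__myAppDebug); // { sum: 15, average: 5 }
```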
I've recently found it helpful to think of problems at the scope of an individual process, rather than a set of functions in a library or framework. This makes it much clearer when a global variable makes sense or not, and generally where to initialize things, and place state.
Perhaps a unicorn doesn't die as soon as you first use a global var. But it has two .45s pointed at its cranium, left and right. And at any random moment Danny DeVito will start blasting.
Global variables are fine when you read them often, write to them sparingly, and have mechanisms to detect and retry when an old value was read and used while an update happened.
I'm the author. No, I completely disagree. "No one ever complains about logs and malloc, except about having too much of them" is a line I removed from this article.
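One possible shape for "read often, write rarely, retry on stale reads", sketched as a hypothetical versioned global with compare-and-set. In single-threaded JavaScript, other code can only run at the `await`, which is exactly where a stale read can happen.

```javascript
// A read-mostly global holding a value plus a version number.
let current = Object.freeze({ version: 0, value: { rate: 1.0 } });

function read() {
  return current; // immutable snapshot; safe to hold across awaits
}

// Writers must present the version they read; a stale version is rejected.
function tryCommit(expectedVersion, newValue) {
  if (current.version !== expectedVersion) return false; // someone wrote first
  current = Object.freeze({ version: expectedVersion + 1, value: newValue });
  return true;
}

async function updateWithRetry(transform, maxAttempts = 5) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const snap = read();
    const next = await transform(snap.value); // other code may run here
    if (tryCommit(snap.version, next)) return next;
  }
  throw new Error("too much contention");
}
```

With two concurrent `updateWithRetry` calls, one commit succeeds and the other is rejected once, re-reads the new value, and retries, so no update is lost.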
The next article has examples and IMO much better, but makes no mention of action at a distance, and currently makes no mention of how people use clone to prevent mutability.
Moving state into the global namespace and accessing it directly from that space makes it much more difficult to test, instrument and integrate.
Sure, if you're building disposable toy software, do whatever is easiest. But if you're building software for others to use, at least provide a context struct and pass that around when you can.
For those cases where this is challenging or impossible, please sequence your application or library initialization so that these globals are at least fungible/assignable at runtime.
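A minimal sketch of both suggestions, with hypothetical names: a module-level dependency (here a clock) that stays assignable at runtime, plus a context-struct variant where callers pass the dependency explicitly and fall back to the module default.

```javascript
// Default: the real clock. The global stays assignable so tests and
// embedding applications can swap it during initialization.
let clock = () => Date.now();

function setClock(fn) {
  clock = fn;
}

function isExpired(token) {
  return clock() >= token.expiresAt;
}

// Context-struct variant: the dependency is passed explicitly.
function isExpiredCtx(ctx, token) {
  const now = ctx && ctx.clock ? ctx.clock() : clock();
  return now >= token.expiresAt;
}
```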
What does the ideal look like then? Data storage encapsulation in the app? Perhaps different DB users with granular access, accessed from different non-global parts of the program. Chuck in some views later when you need performance pragmatism!
Man, the bugs they prevent. Vs everyone rolling their own multithreaded file reader/writer code in C. How many programmers would think to journal transactions and ship them for backup for example.
SQL or 15000 lines of C++ or Go or Rust doing whatever with files.
Well personally I favour event sourcing and so in my systems the SQL database (if there is one) is only written to by the event rollup component. But even if you're not that extreme you probably want to have some clear structure to where your database writes happen rather than randomly writing it from any arbitrary line of code. The important thing is to have a structure that everyone working on the code understands so that you all know where any write to the database must have come from, not the specifics of what that structure is.
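A toy sketch of that single-writer structure, assuming a hypothetical bank-balance domain: components only append events, and `rollup` is the one function that writes the derived read model (a `Map` standing in for SQL tables). It rebuilds from scratch each time for simplicity; a real system would more likely apply new events incrementally.

```javascript
const eventLog = [];
const readModel = new Map(); // stand-in for the SQL tables: account -> balance

function emit(type, payload) {
  eventLog.push(Object.freeze({ type, payload })); // append-only
}

// The only code that writes the read model.
function rollup() {
  readModel.clear();
  for (const { type, payload } of eventLog) {
    const prev = readModel.get(payload.account) || 0;
    if (type === "deposited") readModel.set(payload.account, prev + payload.amount);
    else if (type === "withdrew") readModel.set(payload.account, prev - payload.amount);
  }
}

emit("deposited", { account: "a", amount: 50 });
emit("withdrew", { account: "a", amount: 20 });
emit("deposited", { account: "b", amount: 5 });
rollup();
console.log(readModel.get("a")); // 30
```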
I have gotten shit before for using global variables, and sometimes that is justified, but I almost never see anyone given shit over treating Redis as a big ol’ global map.
Yeah. Honestly, I think Redis is very often overused and likely makes stuff slower.
If your application is only running on one computer, and especially if it's only running in one process, you will likely get better performance using a big ol' thread-safe hashmap that's global/singleton. You pay basically no latency cost, no (de)serialization costs, and the code is likely going to be simpler.
I've seen people who seem to think that just inserting stuff into Redis will somehow automatically make their code better, only for the code to actually become slower because of network latency.
I've also seen people use Redis as a way to have global variables because it's too hard to figure out how to scope stuff correctly.
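For the single-process case mentioned above, a hedged JavaScript sketch of an in-process cache with expiry: a `Map` lookup has no network round trip and no (de)serialization. Within one Node event loop no locking is needed, but note that a `Map` is per thread; worker threads do not share it. `TtlCache` and `sessionCache` are hypothetical names.

```javascript
class TtlCache {
  constructor(ttlMs, now = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now; // injectable clock, handy for tests
    this.entries = new Map();
  }

  set(key, value) {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  get(key) {
    const e = this.entries.get(key);
    if (e === undefined) return undefined;
    if (this.now() >= e.expiresAt) {
      this.entries.delete(key); // lazily evict expired entries
      return undefined;
    }
    return e.value;
  }
}

// One shared instance for the process: the "big ol' global map".
const sessionCache = new TtlCache(60000);
sessionCache.set("user:1", { name: "Ada" });
```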
The problem with global variables is that they imply that there's only one of that thing. This assumption - no matter how certain it seems - will inevitably be proven false in any sufficiently long-lived software component.
Agreed, global variables are fine up to a certain scale, especially if they're only used in the main file. They only really become a problem if you start modifying them from inside different files.
The real underlying problem is 'Spooky action at a distance'; it's not an issue that is specific to global variables. If you pass an instance between components by-reference and its properties get modified by multiple components, it can become difficult to track where the state changes originated and that can create very nasty, difficult-to-reproduce bugs. So this can happen even if your code is fully modularized; the issue is that passing instances by reference means that the properties of that instance behave similarly to global variables as they can be modified by multiple different components/files (without a single component being responsible for it).
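A minimal illustration of that by-reference problem, with hypothetical component objects: two components hold the same object, so one component's write silently changes what the other sees. Handing out a copy (here via `structuredClone`, available in Node 17+ and current browsers) keeps each component's state its own.

```javascript
// Two components receive the same object by reference.
const sharedPrefs = { theme: "light" };
const header = { prefs: sharedPrefs };
const sidebar = { prefs: sharedPrefs };

// The header changes "its" prefs...
header.prefs.theme = "dark";
// ...and the sidebar changes too, with nothing in its own code saying why.
console.log(sidebar.prefs.theme); // dark

// Handing out a copy keeps each component's view independent.
const basePrefs = Object.freeze({ theme: "light" });
const footer = { prefs: structuredClone(basePrefs) };
footer.prefs.theme = "dark";
console.log(basePrefs.theme); // light
```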
That's partly where the motivation for functional programming comes from; it forces pass-by-value all the time to avoid all possibility of mutations. The core value is not unique to FP though; it comes from designing components such that they have a simple interface which requires mostly primitive types as parameters. Passing objects is OK too, so long as these objects only represent structured information and their references aren't being held onto for future transformation.
So for example, you can let components fully encapsulate all the 'instances' which they manage and only give those components INFORMATION about what they have to do (without trying to micromanage their child instances). I avoid passing instances or modules to each other, as it generally indicates a leaky abstraction.
Sometimes it takes some creativity to find a solution which doesn't require instance-passing, but when you find such a solution, the benefits are usually significant and lasting. The focus should be on message-passing. For example, with logging, the code will be easier to follow if all the errors from all the components bubble up to the main file (e.g. via events, streams, callbacks...) and are logged there, because then any developer debugging the code can find the log inside the main file and trace it back to its originating component.
Methods should be given information about what to do; they should not be given the tools to do their job. If you catch a taxi in real life, you don't bring a jerrycan of petrol and a steering wheel to give to the taxi driver. You just provide them with information: the address of your desired destination. You trust that the taxi driver has all the tools they need to do the job.
If you really do want to pass an instance to another instance to manage, then the single-responsibility principle helps limit the complexity and the possibility for spooky action. It should only be passed once, at initialization, and then the receiving component needs to have full control of and responsibility for that child. I try to avoid this as much as possible, though.
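A sketch of the "bubble errors up to main" idea using plain callbacks (one of the mechanisms listed above). The importer component receives data to process and reports problems upward; only the main-level `logProblem` does any logging. All names are hypothetical.

```javascript
// Component: reports problems through a callback supplied by its owner.
function createImporter(onProblem) {
  return {
    importRow(row) {
      if (row.id === undefined) {
        onProblem({ source: "importer", message: "row without id" });
        return false;
      }
      return true;
    },
  };
}

// "main": the one place where problems from every component get logged.
const logLines = [];
function logProblem(p) {
  const line = `[${p.source}] ${p.message}`;
  logLines.push(line);
  console.error(line);
}

const importer = createImporter(logProblem);
importer.importRow({ id: 1 });
importer.importRow({});
```

The `source` field is what lets someone reading the log in the main file trace the message back to the component that raised it.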