A nasty bit of undefined timezone behavior in Golang (dolthub.com)
95 points by fanf2 on Sept 4, 2021 | hide | past | favorite | 49 comments



This does not appear to be undefined at all.

The issue is using reflect.DeepEqual to compare structs with unexported fields. The fields are an implementation detail, and comparing time.Time values is generally incorrect.

Here are the actual docs from https://pkg.go.dev/time#Time.Equal

> Equal reports whether t and u represent the same time instant. Two times can be equal even if they are in different locations. For example, 6:00 +0200 and 4:00 UTC are Equal. See the documentation on the Time type for the pitfalls of using == with Time values; most code should use Equal instead.

and

> Time instants can be compared using the Before, After, and Equal methods. The Sub method subtracts two instants, producing a Duration. The Add method adds a Time and a Duration, producing a Time.

and

> Note that the Go == operator compares not just the time instant but also the Location and the monotonic clock reading. Therefore, Time values should not be used as map or database keys without first guaranteeing that the identical Location has been set for all values, which can be achieved through use of the UTC or Local method, and that the monotonic clock reading has been stripped by setting t = t.Round(0). In general, prefer t.Equal(u) to t == u, since t.Equal uses the most accurate comparison available and correctly handles the case when only one of its arguments has a monotonic clock reading.

If I were reviewing this code, the bug squarely lands on the shoulders of the author who chose to use reflect.DeepEqual (or similar function available in many test helper libraries) here. Frankly, I find code that uses such blind comparisons to be lazy and prone to bugs exactly like this one.

For readers unfamiliar with Go, this pattern is not unique to the time library. In general you avoid these sorts of comparisons, at least without reading the docs. It's one thing if it's your own type or a fully-exported type or something obvious, but this is just so clearly wrong.

Furthermore I find this entire blog in pretty bad taste. Read the docs, own up to your own mistake, don't make blind assumptions about data types, don't lean on reflection to be lazy. Trying to throw shade at the Go authors crosses the line for me here, and name-calling Rob Pike is ultra immature.

Trying to be cute, I guess? But ultimately I now think poorly of this business and its engineers. Bad form, folks.


Absolutely spot on.

And just to pile on a bit more, this part of the blog post gives me the heebie jeebies:

  There are some actual foot-guns hidden in the runtime:
     * Reads from a nil map are fine, writes panic
     * Reads from a closed channel block, sends on a closed channel panic
     * nil sometimes does not == nil
     * Any panic in any goroutine kills the entire process
     * Can't safely store references to loop variables
Calling these footguns is an exaggeration. All are natural outcomes of the design of the language.

* Zero values are a key concept in Go, and avoid uninitialised memory. Reading a missing key from a map returns the zero value of the map's value type. Writes panic because there is no underlying object.

* Channels are another key concept in the synchronisation of goroutines. Block/panic behaves this way because these are programming logic errors.

* Equality of interfaces (another key concept) includes the dynamic type.

* I'm confused about what the blog author expects from `panic()`?

* This is the same behaviour as any language with pointers, such as C. Pretty fundamental stuff.
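A small sketch of the first and third points (nil-map reads and typed-nil interfaces), showing how both follow from zero values and dynamic types:

```go
package main

import "fmt"

func main() {
	// Reading a missing key from a nil map returns the zero value
	// of the map's value type; only writes panic.
	var m map[string]int
	fmt.Println(m["missing"]) // 0
	// m["k"] = 1 // would panic: assignment to entry in nil map

	// An interface holding a typed nil pointer is not == nil:
	// the interface value carries a dynamic type (*int), so the
	// interface itself is non-nil even though the pointer is.
	var p *int
	var i interface{} = p
	fmt.Println(p == nil, i == nil) // true false
}
```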

If the author finds this list surprising I wonder about the quality of their software.

And this is just cringingly arrogant:

> stumped an entire room of seasoned industry veterans for a solid hour


> Trying to throw shade at the Go authors crosses the line for me here, and name-calling Rob Pike is ultra immature.

The whole blogpost is written in a tongue in cheek fashion, I don't think it's meant as name-calling.


I agree with you, but "tongue in cheek" when you are wrong can come off as obnoxious.


Yeah absolutely, I'm not trying to defend the blog post in its totality. Just that one thing that really rubbed him the wrong way.


The main problem I got out of the article was that the stateful timezone of the entire process does not get set until `time.Format` gets called. I think that's a legitimate criticism - which part made you think that `reflect.DeepEqual` is at fault?


DeepEqual is simply inappropriate here. It's fundamentally problematic to do deep comparisons on types you don't have visibility into. You're not entitled to manage the fields, so you lack any guarantee that the implementation matches your expectations. Maybe comparisons require holding a lock, maybe there's a pointer to a shared object (like in this instance), maybe there's some reference to CGo stuff. It's just not the right way to go about it.

If the language lacked reflection, how would you solve this problem?


Can you try addressing what I said? You’re really hung up on the deep equality check. As I said, I don’t think that’s the root of the issue. The root issue is time.Format changing the global time zone unexpectedly.

Do you have an opinion on that?


This isn't so much undefined behavior as unexpected statefulness, but yeah, it shouldn't have been designed like this.

I've had a couple of run-ins with the Golang runtime myself, which were actual bugs. The first one was an issue with C interop triggered by my first production Go app; it had been reported but never successfully debugged, so I did so. It took a couple evenings to get it figured out, and then a month to get the fix merged upstream. To be honest, that put me off of using Go for anything serious; I just never got the feeling that it's a solidly designed language and runtime.

The second one I ran into as a user, and I think has ended up on HN a few times:

https://marcan.st/2017/12/debugging-an-evil-go-runtime-bug/


Frankly, you did some amazing debugging.

> and then a month to get the fix merged upstream. To be honest, that put me off of using Go for anything serious; I just never got the feeling that it's a solidly designed language and runtime.

This seems a little picky, though. Ian mailed the CL one day after you found the root cause. The CL was merged a couple days later. The fix was added to the very next patch release. That's a good turnaround time for a bug, especially a vDSO corner case.

Go has its fair share of (sometimes quite annoying) oddities. In hindsight, some of the design and implementation choices were definitely incorrect. But they were still well thought out and I can't say I'd have made a better argument or decision at that point in time, either.


That was the second bug. The first bug was this one (CL sent on 2014-08-19 and merged on 2014-09-24):

https://github.com/golang/go/issues/7978 https://codereview.appspot.com/131910043

I particularly like this comment:

"Ping. Could somebody review this?

Internal Google users suspect this is the cause of some of their problems."

I guess once fixing Google production, always fixing Google production (I ran into this at my next job after leaving Google).

But after that first bug, and having to dive deep into the golang runtime... it just didn't look well engineered to me, at all. Things like having architecture-specific details scattered open-coded all through stuff like the stack tracer (IIRC when they added ARM support they had to add an `lr` argument to a ton of codepaths). That first app was doing some Cgo stuff, and after looking carefully through how Cgo is supposed to work and how it actually works and what guarantees there are... it was just a mess. The only way to guarantee memory wouldn't be yanked out from under C was, apparently, to use malloc() and copy, which means you cannot do zero-copy calls into C code. There was no formal specification for what memory pinning is guaranteed or not.

This is a general theme in Go. It feels like an ad-hoc language with a bunch of interesting design decisions, but largely superficial; in the end the deeper you dive into it, the more the answer to how things work is "shrug" (and "may change in the future"). It's clearly written by C people with a C mindset, trying to make a "better C" without letting go of the bad habits that C brings with it, and this is more evident the deeper you look. And I say this as primarily a C and Python coder.

Then there's the whole "reinventing libc" insanity, even on macOS (where no such ABI stability was guaranteed, but they did it anyway, and that ended up with a macOS update breaking all Go apps). On Windows they can't get away with that, so they use Cgo instead, and then we're back at the Cgo mess/overhead. This design decision is also, ultimately, how the vDSO problem happened.

I've also seen a tendency towards bloat in Go (see: the stories about Go binary size forever increasing). I find it particularly crazy that these days they need to have metadata that describes stack layout at every possible instruction in the program, to make their GC work.

I don't mean all of this in a "Go is a bad language that should go away" sense; it has things it does well, and if I ever have to put together a high-performance concurrent network server with no external dependencies I might choose Go. No language is perfect. I just find that, after ending up deep in the bowels of Go, I'm not really inclined to default to it for anything that isn't very clearly its forte.

I'm really looking forward to learning Rust, which seems much more serious in this regard, but I had a very different problem there; after writing a relatively simple Rust app once, I tried to engineer a more complex application in it, and ended up unable to wrap my head around how to make an abstract interface work with proper ownership rules. I should go back to doing something in Rust some day, something a bit less ambitious...


Or maybe go back to the ambitious application and take a fresh run up at it. Maybe first identify one or two aspects you struggled with most and brush up on those, the Crust of Rust series does some nice worked examples, live but recorded of some harder to grasp features: https://www.youtube.com/playlist?list=PLqbS7AVVErFiWDOAVrPt7...

Still, even if Rust is "much more serious" there are coherency problems right at the heart of what we do still in programming computers. Notice that some vital elements of the core of Rust are basically hand-waved, for example the details of how Memory works. Other "serious" languages like C hand-wave this stuff too, typically describing a non-existent abstract machine and saying your program will perform as if on that machine, which is fine right up until your low-level code could not exist on such a hypothetical machine and instead very much depends on the details of the real machine the program runs on.


It's been a long time so I'm sure Rust is nicer now too, though that project is well in the back burner stack now so I'd probably want to find something else anyway (also, in the mean time, a platform that would make a nice base for that project is already being developed... with a Rust codebase... so I might as well wait for that to make progress).

And yes, you can shoot yourself in the foot in any language. The deeper you go into memory models, the more bizarre everything gets. I spent two weeks trying to understand ARM memory ordering guarantees, and that actually wound up with me finding an on-paper bug in Linux's atomic operation implementation on ARM64 (written by the ARM employee doing all this formal memory model work!), complete with a formal proof of the bug (though I'm not sure if any real CPU actually triggers it, so it hasn't been fixed yet, but the maintainer at least knows about it now).

To be clear, C is a bad language in this day and age too; I continue to use it because it's supported everywhere and I'm familiar with it and it does some things well like any other language. The reason why I don't use Go is because I find it does not fix enough of C's problems, and introduces some of its own. Rust does seem to do a much better job here, making much harder guarantees and having more well thought-out design decisions. I find it particularly unfortunate, as someone doing low-level embedded work, that Go is completely unfit for that purpose, as it has a hard requirement for an OS. Rust does not, which means I will be able to use it in almost any context I use C today.


Definitely post on the /r/rust subreddit if you're new to Rust and struggle to architect your code. They're super helpful, and you often get responses from well known figures within the Rust community.

Perhaps also of interest, the Generic Associated Types (GATs) feature is due to land on stable soon (hopefully before the end of the year), which should bring quite a bit more flexibility to trait definitions, including around lifetime issues.


> Go also happens to have a (rather insane, in my opinion) policy of reinventing its own standard library, so it does not use any of the standard Linux glibc code to call vDSO, but rather rolls its own calls (and syscalls too).

May I ask why you think this is insane? I think it's the most interesting thing about Go. It produces code that stands on top of the Linux kernel directly.

Are all languages doomed to be implemented in C and use ancient libc functions for everything?


The problem is the "write everything in Go" mentality doesn't result in the best result. Syscall interfaces aren't easy, and glibc has been doing this properly for ages; Go reinvented it and screwed up vDSO calls.

I want whatever programming language I use to allow me to make use of the best possible code available. These days, that means C libraries very often, especially when performance is concerned (this isn't a "C is fast" thing, it's a "people have already spent decades optimizing C libraries, often with architecture-specific SIMD and assembly" thing). Not only that, but this is required on Windows for syscalls. Go started off with this idea of prioritizing reinventing the world over a good FFI, and is now paying the price in overhead on Windows, because they couldn't reinvent the world there. If they had a good FFI, they also wouldn't have had to reinvent glibc, and would avoid the potential bugs that come with that.

In the end, I just don't agree with treating FFI as a second class citizen. You need to be able to interoperate well with the C world (Rust is a good example of this).


> Syscall interfaces aren't easy

The Linux system call binary interface is pretty easy though. Put a bunch of values in registers, system call instruction, get result back in register. I'm surprised the Go compiler can't emit code that follows this calling convention. I've seen projects here that do just that and it's really nice.

I agree that the vDSO is more complex. It's a shared object that exports symbols with the C binary interface. Much harder to interface with.

> These days, that means C libraries very often, especially when performance is concerned (this isn't a "C is fast" thing, it's a "people have already spent decades optimizing C libraries, often with architecture-specific SIMD and assembly" thing).

I can respect that.

Go always had a focus on eliminating dependencies though. Using system calls directly allows Go executables to be freestanding, they don't even require system libc. I think this is an interesting proposition for systems programming. Great for executables that must work even when the rest of the system is screwed up.

> Not only that, but this is required on Windows for syscalls.

> Go started off with this idea of prioritizing reinventing the world over a good FFI

I completely agree with your criticism of Go here. They shouldn't have attempted this on other systems. I read your description of how some macOS update broke Go software because the kernel ABI changed. That's insane and should never have happened.

Linux is the only kernel with a stable system call binary interface, the only system where reinventing the world is possible.

> In the end, I just don't agree with treating FFI as a second class citizen.

Absolutely agree with you. Rust does this much better, it has proper support for foreign calling conventions. Don't think the vDSO problem would have happened in Rust. It also supports freestanding code!

Thanks for replying. I agree with you. I think this is a good idea on Linux specifically but as you noted it's problematic if applied to other platforms.


For the first time I've seen some seemingly valid criticism of Go (thanks Marcan) and some good counter reasoning in the replies.

To summarize: if I don't use CGO, don't mind the increasingly large binary, and run my Go production application only on Linux, I wouldn't be facing the issues Marcan faced?


> If I don't use CGO

Well, the vDSO bug Marcan described happened because the Go runtime itself was supposed to properly interoperate with a C library but didn't.

https://github.com/torvalds/linux/blob/master/Documentation/...

> These functions are called just like ordinary C function according to your platform's ABI.

> Call them from a sensible context.

Go wasn't calling them from a sensible context. The functions didn't have enough stack space.

Not sure if there are more ABI problems like that in the Go runtime. Probably not since the vDSO is Linux-specific and it's already been fixed.

> run my Go production application only on Linux

You should be fine. Linux system call binary interface is stable.

https://github.com/torvalds/linux/blob/master/Documentation/...

If Linux is all you're depending on then you're good.


Understood, thanks for being precise and succinct.

> Not sure if there are more ABI problems like that in the Go runtime. Probably not since the vDSO is Linux-specific and it's already been fixed.

Thanks to people like Marcan who put in the effort to debug them and get them fixed. Although it's a shame that the turnaround didn't seem reasonable to Marcan, and it appears to have influenced his decision not to take on such work again.


On Linux, this is fine. On some OSes (BSDs, MacOS, etc) the only stable interface to the kernel is libc. Linux doesn't doom you to a C interface forever, but the BSDs do. RedoxOS will likely require Rust forever. OSes don't have to define a direct interface to their kernel, they can (and often do) provide an official required abstraction layer.


> On Linux, this is fine. On some OSes (BSDs, MacOS, etc) the only stable interface to the kernel is libc.

I agree with you. Linux is the only kernel with a stable user space binary interface and it's a huge reason why it's so interesting to me. It's possible to recreate the entire Linux user space using any language.


I can see why: that's pretty deep and detailed. I'm sure the resulting threads have been pointing the finger at all imaginable components.


The problem here isn't with time.Time, it's with reflect.DeepEqual. Using it in your code to compare arbitrary data is almost always incorrect. And IMO, adding it to the stdlib was a mistake.

time.Time values should only be compared for equality using the Equal method (or IsZero), but reflect.DeepEqual does not know that.

This can happen with any type, not just time.Time. For example, I have a decimal library. It can represent 200 as either (mantissa=2, exponent=2) or (mantissa=200, exponent=0). Comparing those two numbers for equality will return true, but comparing them with reflect.DeepEqual will return false.

So, it's not a "nasty bit of undefined timezone behavior." It's somebody misusing an API.


This feels to me like maybe a good example for operator overloading. Often the "good" case of operator overloading is given as some fancy numeric type, like Complex or Matrix or something, for which some arithmetic makes sense but the fundamental integer and floating point operators don't cut it.

But here is a case where programmer defined overloading of equality shows a benefit, if each type (including time.Time) takes responsibility for knowing how to compare that type then nobody else needs to know this detail and the natural comparison operators work, rather than needing to awkwardly define == and Equal separately and then forever live with people choosing the wrong one by mistake.

After all, in the absence of such an overload (and I agree having it would interfere with Go's goal of simplicity) this would likely happen even without reflect.DeepEqual because people want to do this, and "You can't do this safely" is unlikely to have better results in Go than other languages.


Further thought: It's probably also necessary to spell out what exactly "equality" means for your operator so that programmers know what they're supposed to be implementing, and what they're getting from other programmers. It's interesting for example that Rust gets away with

  assert_eq!(vec![1, 2, 5, 9], [1, 2, 5, 9]);

That's claiming a vector and an array are equal since they have the same integers in them. Neither of Go's notions of equality think these are equal but sure enough they do feel pretty equal in Rust, and in lots of places these would be interchangeable.


This excerpt from the Go docs is an excellent case for (some) operator overloading:

> Note that the Go == operator compares not just the time instant but also the Location and the monotonic clock reading. Therefore, Time values should not be used as map or database keys without first guaranteeing that the identical Location has been set for all values, which can be achieved through use of the UTC or Local method, and that the monotonic clock reading has been stripped by setting t = t.Round(0). In general, prefer t.Equal(u) to t == u, since t.Equal uses the most accurate comparison available and correctly handles the case when only one of its arguments has a monotonic clock reading.


Misusing the API did not cause anything. What it did was surface some unexpected behavior:

"But calling Format on a time object, for some format strings (such as time.RFC3339), needs to load this information, so will do so implicitly as a side effect. After such a call to time.Format, all time.Time structs with the time.Local location will include this info, when they didn't before.

This is the kind of surprising, spooky action-at-a-distance nonsense that well-designed libraries tend to avoid. Or they at least try!"

That's the actual complaint, and that behavior remains whether or not anyone calls DeepEqual.


I feel bad for that new guy. I hope that was not the actual conversation.


>I demonstrated outstanding leadership and blamed him without any further evidence.

At least the Author seems to know it wasn't OK.


One thing I would recommend when someone blames a bug on you -- without hesitation, always say "yes, I will take a look at it".

This way if you have to push back and say it wasn't you, you have much more impact because you have investigated the problem.

Edit: but I agree, that is no way to treat a colleague.


Thanks! I felt the same way. Glad I’m not the only one.

I’ve had people attack me like that before and it’s not great. It’s always someone in a position of power, too.


It’s embarrassing and shameful behavior by the author, and even more embarrassing that he was willing to share it publicly. If it’s the culture of that startup that this kind of stuff is just typical day-to-day employee interaction, I don’t know who would want to work there, especially given how hot the job market is now.


Oh, this might explain the rare[1] local-only[2] "m=+0.00blah" test fails on deep `time.Time` comparisons we sometimes get talking to Firebase.

(We don't actually use `time.Local` anywhere but I'm guessing one of the support libraries does.)

[1] Once a month, maybe

[2] But never on the CI - only when running the tests on developer machines


Storing timezone info by reference, and having dynamic timezones (like Local) are both great features for dealing with the messiness of time. But there are always tradeoffs...


Unexpected for go, but very much like what happens with C locales when they change.


Every once in a while I open this commit [1] and read it, just to remind myself to try and avoid making stupid decisions within my power, hopefully preventing my APIs from becoming the standardized mess that C locales are.

[1]: https://github.com/mpv-player/mpv/commit/1e70e82baa9193f6f02...


> But locales give a delicious sweet spot of brokenness, where things are broken enough to cause neverending pain, but not broken enough that enough effort would have spent to fix it completely.

Oh, this feels familiar. I most recently had this feeling when reading about the git staging area, where that term is common everywhere but the man pages.


Why is implicit global state still a thing in modern languages?


> Reads from a closed channel block, sends on a closed channel panic

"never block"?


Maybe they confused closed and nil channels. Reads from a nil channel block, as do sends.
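A sketch of the difference (closed channels yield the zero value immediately; nil channels block forever):

```go
package main

import "fmt"

func main() {
	// Reads from a *closed* channel never block: they return the
	// element type's zero value immediately, with ok == false.
	ch := make(chan int)
	close(ch)
	v, ok := <-ch
	fmt.Println(v, ok) // 0 false

	// Reads from a *nil* channel block forever, as do sends.
	// (Uncommenting the next lines would deadlock.)
	// var nilCh chan int
	// <-nilCh
}
```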


Why not just explicitly use UTC everywhere?


Yeah, reading this article, it seemed like using a wildcard time zone was the real bug.


That message exchange screenshot is unfortunately seen too often in companies.


If you ask me, the problem lies in SQL datetime (without timezone) data type, not golang.


I'm in the middle of getting myself a quote for


Really loved the voice of this article. Very funny.

Just wrapping up a read of “the go programming language”, and this article called out literally every aspect of go I was surprised/annoyed to discover. The validation is much appreciated


Just for reference, complete opposite for me. I found it rambling and missing the mark.

I expected a simple repro case exposing the bug. I got many paragraphs of random relevance and still not positive what the bug actually is. Even a link to an issue would have been helpful.


And the memes. A couple of particularly relevant and funny memes is ok, but this article had a lot that were neither.



