In a very globalised world, where your developers can be working in a multitude of countries, I've found that the only sane thing to do is a) run all your boxes with a TZ of Zulu and b) allow only RFC-3339 as the date format in logs.
Sure, you can format datetimes in localised formats for end users, but standardising on UTC+0 and RFC-3339 for programming and maintenance purposes just prevents so much confusion and hassle.
Not that I've ever had to spend a day trying to troubleshoot a bug that our monitoring system alerted on at 2020-11-09T09:11:05Z (as it was defaulting to UTC), while the relevant logs were timestamped at 10:11:05 09/11/2020 (I only wish they'd used the normal German period separators in the date to give me a clue), because of course your German colleagues want their timestamps in German format and defaulting to CET...
When you have colleagues in Europe, Oceania, and distributed from coast to coast in the US, doing everything in UTC means everyone only has to do one mental conversion, from their local timezone to UTC, as opposed to trying to remember whether a colleague is in LA or New York, or on Mountain Time, or in the parts of the US that are on Mountain Time but don't observe daylight savings.
> Not that I've ever had to spend a day trying to troubleshoot a bug that was alerted by our monitoring system at 2020-11-09T09:11:05Z (as it was defaulting to UTC), but the relevant logs were timestamped at 10:11:05 09/11/2020
It gets even more fun when such events happen during the switch between daylight saving time (like CEST) and standard time (like CET).
When I worked closely with Germans there was a special time of year where our daylight savings changes overlapped, so for a few weeks we'd have a 10 hour time difference, then an 11 hour time difference, then 12. We tried to minimise meetings during that period, as someone always got the timing wrong.
CloudWatch at least lets you choose in which timezone to display datetimes. Other AWS services don't let you choose and don't even show in which timezone they display datetimes, which is somewhat infuriating to me, when I'm in a hurry anyway and suddenly have to wonder which timezone the shown dates are in.
I also remember that AWS used to display some datetimes in Pacific Standard Time, which is completely useless to somebody outside the US; however, I haven't seen that in a while.
Similar to the global language setting, I wish the AWS Management Console would also allow choosing globally in which timezone to display datetimes.
Or zero mental conversions because it's trivial to write a simple script that ingests RFC-3339 and spits it out in your local time.
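For what it's worth, a minimal sketch of such a script (mine, not from the thread): read RFC-3339 timestamps on stdin and print them in the machine's local zone.

    import sys
    from datetime import datetime

    for line in sys.stdin:
        # fromisoformat() before Python 3.11 doesn't accept a trailing "Z",
        # so normalise it to an explicit +00:00 offset first.
        ts = datetime.fromisoformat(line.strip().replace("Z", "+00:00"))
        # astimezone() with no argument converts an aware datetime to local time.
        print(ts.astimezone().isoformat())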
"2020-11-09T09:11:05Z" uniquely identifies a point in time. "10:11:05 09/11/2020" is some time 11 minutes past the hour on either September 11th or November 9th. It doesn't identify anything.
The first format also makes it easy to do counts of the number of entries within November 2020. The second makes life considerably more difficult (not impossible, though).
I did always love a quick and easy `WHERE ts like '2020-09%'` when flinging data into Athena or Spark SQL etc. and treating the column as text for ease of loading.
Am I the only one who would expect the behavior in the top example?
I don't want my equality operator doing timezone conversions. If one type has no timezone attached and one does, then you probably shouldn't be able to compare them at all. Likewise, for a type that has a timezone attached, if the timezone differs between a and b, then I want a == b to return false, even if it represents the same instant in time.
Equality can be a tricky thing. Named functions are the way to go when two different developers might intuit different behaviors.
The problem is not that the equality behavior is wrong, it's that I would expect "utcnow()" to return a timestamp with a UTC timezone. The problem is that it does not: it has no timezone at all, and therefore is generally less compatible than creating a timestamp with a timezone from a UTC timezone object.
I work on the Python API of a timeseries database and it’s very frustrating to have to accept timestamp objects without time zone because you don’t know what to do with them. People aren’t aware of this constraint, which leads to unexpected behavior.
Timestamps must have an implicit timezone, otherwise they are meaningless, because they cannot identify a certain moment in time.
Only time differences do not have a timezone.
If Python continues to use the word "timestamp" for a time value with an unknown time zone, then that is Python's fault, because the name is completely inappropriate.
I assume that whoever made "None" a possible value for the timezone did not really expect anyone to ever use a timestamp with an unknown timezone; rather, in such cases there is a timezone known to the programmer, which is not stored with the timestamps, and it is the responsibility of the programmer to restore the timezone whenever necessary.
In this case we're talking about Unix timestamps though, which count the number of seconds since 1970 UTC. Adding a timezone to it is nonsensical. The number is always the same, no matter which timezone you're in. It's in the conversion to year-month-day, etc. that timezone becomes relevant.
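To illustrate the point (my own sketch, not from the thread): the same number renders as different wall-clock times depending on the zone you convert it with, but the number itself never changes.

    from datetime import datetime, timezone, timedelta

    ts = 1604913065  # the 2020-11-09T09:11:05Z example from upthread
    print(datetime.fromtimestamp(ts, tz=timezone.utc))                  # 2020-11-09 09:11:05+00:00
    print(datetime.fromtimestamp(ts, tz=timezone(timedelta(hours=1))))  # 2020-11-09 10:11:05+01:00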
That's not exactly right. I don't live in UTC; when I use timestamps, they don't automatically (whether implicitly or explicitly) refer to UTC.
A Unix timestamp is a difference in time, not a time point. It's a number of seconds. It's a duration, not an instant, which is why it doesn't in and of itself need a time zone attached.
That being said, people do use timestamps to represent time points sometimes, and that does require picking a particular timezone. That's fine too, but it's not the only use or meaning of a timestamp. It doesn't make timestamps without time zones meaningless, and doesn't mean they implicitly refer to the UTC epoch.
Durations and time differences are timezoneless numbers. When you add a duration to a time point (like the Unix epoch in your preferred timezone), you get another time point, in the same timezone.
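A small sketch of that, assuming nothing beyond the stdlib: the epoch written down in two different zones, plus the same duration, gives the same instant.

    from datetime import datetime, timedelta, timezone

    epoch_utc = datetime(1970, 1, 1, tzinfo=timezone.utc)
    epoch_est = datetime(1969, 12, 31, 19, 0, tzinfo=timezone(timedelta(hours=-5)))
    duration = timedelta(seconds=1604913065)

    # Both sums denote the same instant; aware datetimes compare by instant.
    print(epoch_utc + duration == epoch_est + duration)  # True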
> A Unix timestamp is a difference in time, not a time point. It's a number of seconds. It's a duration, not an instant, which is why it doesn't in and of itself need a time zone attached.
The number of seconds since what, exactly?
It appears to be the number of seconds since Jan 1, 1970, at midnight... in some timezone.
The point is that the question 'since what?' is a misleading intuition; it's not at all essential or implied.
You want to know how long something takes. You take a timestamp before and one after.
A timestamp since what?
Well, by definition, the number of seconds since the unix epoch.
Which unix epoch, exactly? In what timezone?
Unspecified. You could have picked any timezone, and the result would not change, as it is not essential. Even if there is no timezone, not even one that you picked in secret, it still works.
Asking the question for durations is a subtle error. You only need a timezone for a time point. Timestamps are often immediately converted to time points, but that is neither necessary nor inherent.
This seems just like kicking the can down the road, pretending that time zones don't matter because you don't want them to.
If I'm comparing system time one year ago to system time now, that's a duration, and the timezones don't matter. But if I'm not comparing, just recording, then the point of comparison is the unix epoch, which is a point in time.
Time zones don't matter to durations, only to points in time, okay. The unix epoch is a point in time. Saying it isn't a point in time, or that its timezone doesn't matter doesn't actually make it so.
"It's the very precise number of seconds since an event that took place 52 years ago, and it has a margin of error of 86,400 seconds," sounds crazy to me.
Comparing seconds-since-epoch "now" to the same from a year ago: timezone doesn't matter so long as I'm consistent. Pick UTC, pick CDT, pick nothing, just do the same thing both times. This is the vast majority of usages of seconds-since-epoch, I think.
Comparing seconds-since-epoch "now" to epoch: in the abstract, timezone still doesn't matter, except, well, how do you use the same timezone as epoch unless you know which timezone epoch was in? You could be off by 82,800 seconds! In practice, assuming that the epoch was Jan 1, 1970, midnight UTC seems to work for those rare cases, leading to the widespread (but technically incorrect) belief that epoch is in UTC.
If you refer to that date, then UTC. But you may just as well say that it's the number of seconds since 1969-12-31 7:00 pm EST. There is nothing special about UTC wrt Unix timestamps.
I agree with you and I think the above poster needs to adjust their thinking.
What they're saying is correct in the most technical sense, but our industry has landed on timestamps not needing timezones. If they want to work in this industry, they need to accept that.
It's ok to be surprised by something like this, it's less ok to argue about it.
That is true, but a timestamp is intended to specify a particular time, and a simple temporal interval fails to do that just as a simple spatial interval fails to specify a location; both are incomplete (the former on account of a missing starting point, and your counter-example on account of a missing azimuth.)
On the other hand, Unix (POSIX) time does not have leap seconds, while UTC does, so it is no longer "in" UTC.
POSIX time isn’t “in UTC” regardless of that. UTC is just one of many timezones you can use when specifying which point in time timestamp 0 refers to. You could just as well use EST or anything else.
You are right - through this discussion, I have come to see that unless a value expresses a specific time (as opposed to interval) in a form that includes hours and minutes, then timezone is not an applicable concept. POSIX time does not do this.
So the analogy is "Distance from London in feet" to "Time since epoch in seconds"? Ok, but "epoch" has a timezone, which is UTC, so measurements relative to that have the same timezone.
It doesn't. Equivalent definition of Unix timestamps would be "seconds since 1969-12-31 7:00 pm EST". It doesn't mean Unix timestamps have an implicit timezone of EST.
This does exist as a concept, though I agree that "timestamp" is not a good name for that concept. The Java standard library calls it LocalDateTime. It is necessary to attach time zone information in order to determine an Instant.
The big problem with the python API, in my opinion, is that they conflate these two very different concepts in a single type.
utcnow() does not return an integer timestamp; it returns a datetime object with hour and minute fields, in both Python 2 and 3. Regardless of whether you regard Unix (POSIX) time as implicitly timezoned, the return value of utcnow() unambiguously is.
Django solves this by having the default timezone in the framework settings. Any datetime without a timezone is assumed to be using the default one. If you don't set a default timezone, the server timezone is used, and you are responsible for making sure your data is congruent.
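For reference, a sketch of the Django knobs being described (a fragment; it needs a configured Django project, it is not standalone):

    # settings.py
    USE_TZ = True        # store and compare datetimes as UTC internally
    TIME_ZONE = "UTC"    # default zone assumed for naive datetimes

    # application code: make the assumption explicit instead of implicit
    from django.utils import timezone
    aware = timezone.make_aware(naive_dt)  # naive_dt: some naive datetime from elsewhere;
                                           # make_aware() uses the default timezone when no tz is passed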
Yes I know, but that doesn’t help pretty much any of our Python users that are all heavily reliant on Pandas and/or definitely not writing web services.
As of now, we just reject any timestamp without an explicit timezone and try to explain to our users why utcnow() is bad. People are generally very surprised that this issue exists at all.
Yeah, naive datetimes should not even exist in a high-level API IMO. They're like byte strings with no encoding, or geospatial data without a projection name field.
A timestamp doesn't require TZ data because the timezone of the unix epoch is defined in UTC. To convert a timestamp into a meaningful datetime value, we need a timezone, otherwise there is loss of information.
The main issue is that .utcfromtimestamp and .fromtimestamp both return naive datetimes, instead of aware ones with the timezone property set to UTC or the local timezone respectively.
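The aware counterparts do exist in the stdlib; a quick sketch of what I reach for instead:

    from datetime import datetime, timezone

    ts = 1604913065
    utc_dt = datetime.fromtimestamp(ts, tz=timezone.utc)   # aware, in UTC
    local_dt = datetime.fromtimestamp(ts).astimezone()     # aware, in the local zone
    print(utc_dt == local_dt)                               # True: same instant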
If I’m calling .utcfromtimestamp then by the name I expect it to be doing _more_ than just attaching a time zone (which it also doesn’t do). The only other behaviour that seems sensible is that it’s doing… some sort… of conversion. Exactly what I’d need to look up at the time I wanted to use it.
I have, however, been using Python so long that I've gotten used to the way things sort-of-work and developed a healthy caution around its unintuitive built-in time zone handling.
It's actually worse than I had initially suspected. Python's datetime timezone handling is, depending on your opinion, either hard to reason about or just outright broken - certainly it tries to be clever in unintuitive ways. This is why there are a couple of external libraries, like dateutil, that try to make timezone handling easier - which is even mentioned at the beginning of Python's datetime documentation!
In this case, I believe that it's actually the inverse of what I thought - utcfromtimestamp is doing... nothing; it just looks different because doing nothing is different from what everything else does. It looks like .fromtimestamp builds a "naive" datetime in local time. So, it assumes the timestamp is UTC and "helpfully" converts it to the local date and time. When you call .timestamp it reverses this process and gives you back the timestamp value in UTC by converting implicitly from your local time zone. Thus, if you call .utcfromtimestamp it doesn't do this conversion, and gives you a naive datetime "in UTC".
This is horrible and I fully support the recommendation never to use it.
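You can see the asymmetry directly (a quick sketch; it assumes the timestamp doesn't sit on a DST transition, and the second result depends on your machine's local zone):

    from datetime import datetime

    ts = 1604913065
    # Round-trips, because .timestamp() undoes the implicit UTC->local conversion:
    print(datetime.fromtimestamp(ts).timestamp() == ts)      # True
    # Does not round-trip: the naive "UTC wall time" gets re-read as local time:
    print(datetime.utcfromtimestamp(ts).timestamp() == ts)   # False unless your local zone is UTC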
The comparison is being done between x_ts, which is a float, and x, which is a datetime object. Call me crazy, but I think allowing this comparison to begin with is the problem.
That's not the code from the article. The code in the article is comparing two floats. Though the equality operator would return `False` for the comparison `x == x_ts` as well, instead of throwing an exception.
> Likewise, for a type that has a timezone attached, if the timezone differs between a and b, then I want a == b to return false, even if it represents the same instant in time.
Really? So if you want to compare that two times are really the same time for two different timezones, you'd want to convert them to UTC first? What's the use case for this?
You don't need to convert them to UTC, just one to the timezone of the other, but yes. "a.is_same_instant(b)" is a lot more clear to me than "a == b", especially in a dynamic language where it might not be obvious what the types involved are.
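Something like this, say (is_same_instant is hypothetical, not a real datetime method):

    from datetime import datetime

    def is_same_instant(a: datetime, b: datetime) -> bool:
        # Refuse to guess: naive values carry no zone, so the question is undefined.
        if a.tzinfo is None or b.tzinfo is None:
            raise ValueError("both datetimes must be timezone-aware")
        # (For aware datetimes Python's == already compares instants; the explicit
        # conversion is here to document the intent.)
        return a.astimezone(b.tzinfo) == b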
But an API should never get into a place where your personal preferences matter and various incompatible interpretations potentially make sense to different people.
No one else has to guess (or go and inspect what an overloaded comparison operator actually does) if there is a meaningful name attached to the comparison, like in the parent comment.
And that reason is that there is a canonical definition of what it means for two numbers to be equal.
The fact that your intuition of "these two timestamps are equal" is "these two timestamps denote the same instant" seems problematic when we know that the notions of timestamps and time are not in fact aligned (because, for example, of leap seconds).
It would be turning a really bad API that is a constant source of errors into a slightly better and less error prone API. It's totally possible that "pythonic" means "error prone", but that would not be something to tout proudly if so.
I think a better API would be to have a different types for timezone- naive vs. aware datetimes, and a canonical way to convert either of those types into an instant in time (requiring a timezone be attached for the naive type). Then the instants in time could be compared with the equals operator without confusion.
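A rough sketch of the shape that could take (all names here are mine, not from the article or the stdlib):

    from dataclasses import dataclass
    from datetime import datetime, timezone, tzinfo

    @dataclass(frozen=True)
    class Instant:
        unix_seconds: float              # unambiguous point in time; == is safe

    @dataclass(frozen=True)
    class NaiveDateTime:
        wall: datetime                   # wall.tzinfo must be None

        def to_instant(self, tz: tzinfo) -> Instant:
            # The zone has to be supplied explicitly; there is no default.
            return Instant(self.wall.replace(tzinfo=tz).timestamp())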
> It's totally possible that "pythonic" means "error prone"
Those who say that don't know what it means. And given how much I have to deal with crap APIs that disregard those principles, it's safe to ignore discussions leading elsewhere.
Strongly implied but not explicitly stated unless I missed it: the actual surprising behaviour here is that utcnow returns a naive datetime, not a timezone aware one, despite the appropriate timezone to use being obvious.
I was about to argue for your initial version! I would say that utcnow() is marginally acceptable (though only on backwards-compatibility grounds), but only if the conversion from a naive time to an aware time raises an error when done without an explicit timezone specification. It is the combination of keeping utcnow() as-was (or, more generally, the concept of naive time at all) while introducing a default for the conversion that turns it into a subtle trap.
Unfortunately, now that the default has been introduced, there seems to be no way to get out of this situation while preserving backwards compatibility. The lesson here, perhaps, is to not make it easier to use that which should have been deprecated.
Technically correct (best kind of correct) but irrelevant here. "utc" (lowercase) is a timezone in the sense that it is a named instance of python's `timezone` class provided by the standard library, and the obvious choice if utcnow() were to return a timezone-aware datetime.
It's compatible with the datetime API, but it has sane defaults, nice tools to convert between timezones, some cool date adjustment stuff, and can humanize time in several languages.
Basically, datetime has the same problem as text vs raw bytes in python 2.7, except it has never been fixed.
Timezones aren’t really the problem, implicit locales are the problem. All this stuff would be a lot easier if timezones always had to be stated explicitly.
> All this stuff would be a lot easier if timezones always had to be stated explicitly.
In Java, you can use the forbidden-apis build plugin (https://github.com/policeman-tools/forbidden-apis) to fail the build whenever a timezone, locale, or charset is not specified explicitly (it forbids the methods from the Java API which use an implicit timezone/locale/charset). I don't know whether there's something similar for Python; it might be harder because Python is much more dynamic (though it might be possible to use monkeypatching to warn whenever the bad methods are used).
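I'm not aware of a drop-in Python equivalent, but a crude sketch of one is easy enough (my own code, flagging the methods discussed in this thread):

    import ast
    import sys

    # datetime methods that bake in an implicit timezone and return naive values.
    FORBIDDEN = {"utcnow", "utcfromtimestamp", "today"}

    class NaiveDatetimeChecker(ast.NodeVisitor):
        def __init__(self, filename):
            self.filename = filename

        def visit_Call(self, node):
            # Crude: flags any attribute call with one of the forbidden names.
            if isinstance(node.func, ast.Attribute) and node.func.attr in FORBIDDEN:
                print(f"{self.filename}:{node.lineno}: call to naive-datetime API .{node.func.attr}()")
            self.generic_visit(node)

    for path in sys.argv[1:]:
        with open(path) as src:
            NaiveDatetimeChecker(path).visit(ast.parse(src.read(), filename=path))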
The real problem is that datetime.datetime.now() returns a naïve datetime.
If only it returned something that contained the UTC offset in which it was created, then all of this would be much easier to deal with. Conversions to timestamps would just work, and comparisons would just work.
Note that I said "UTC offset", not "timezone". datetime doesn't deal in UTC offsets, it deals in timezones. And timezones are a zillion times more complicated, which is why no one wants to deal with them on the critical path, and no one has ever proposed having now() return an object with .tzinfo set. Logical though it may be.
This kind of stuff really needs to be made explicitly clear in any/all API documentation as it inevitably leads to underlying issues later down the line. It's amazing how something as "simple" as official time/date libraries still have footguns like these.
I'm not bashing anyone that's involved as I realize we're all human and prone to mistakes, I just wish we could all do better.
Both datetime.utcnow() and datetime.utcfromtimestamp() have red box warnings in the documentation. People copying code from StackOverflow won’t see them, of course.
I should mention that the warnings are relatively recent, added in 3.8.
I suspect that the author of Jodatime (which effectively became the java.time API), who was notorious for being almost Linus Torvalds-like in his attitudes and approaches, was probably once a very nice and kind individual. Until he started implementing a datetime library.
I mean, he nailed it, but at what cost? java.time.* is my second favourite datetime API, the first being Postgres'.
I'm actually surprised the source code of Jodatime isn't entirely in Zalgotext, technically it'd be valid Java code (I'm pretty sure, anyway).
It doesn't surprise me. Timezones - and dates and time in general - are extremely complicated. Deceptively so. Although we interact with timezones, dates, and time every day, we don't think about layers of complexity and edge cases. More importantly, we don't practice them.
We have courses about compilers, databases, data structures, algorithms, cryptography. It's surprising we don't have courses about dates and time.
It's not surprising they didn't make an appearance in the academic world. They are utterly boring. Insanely complex, of course, but there's nothing that can be built upon. Everything in a CS curriculum is an extendable domain.
True, but often you need something that explains things at a bit more of an overview level. IntelliSense helps when you know the object and want to explore the methods.
Most languages have something similar, that's why it's important to use static analysers as part of your process to identify usage errors. Everyone is human, mistakes happen. Having tooling as a backup is great.
"If self is naive, it is presumed to represent time in the system timezone."
"Presumed" is kind of a bad word in Python. I would never write an API today that "presumed" something that could be lots of other things, and I would not allow any tz conversion on a naive datetime without requiring that the existing known timezone be passed.
As mentioned in the article, this was a deliberate change in Python 3. If you read this article: https://blog.ganssle.io/articles/2022/04/naive-local-datetim... you will see why this actually makes quite a bit of sense, given the design constraints the authors were working with.
I tried to read it, will try again later but I wasn't really getting it. the idea would be, .astimezone() is bad, and should be something equivalent to .convert_between_timezones(from, to), where there's some convenient way for "from" to indicate "the system time zone", but you still have to be explicit that you consider this arbitrary number to be in a particular time zone.
not following why "this is the system timezone" must be hardcoded as an invisible assumption, as opposed to something somewhat explicit. I mean, this is literally not far off from a simple name change of ".astimezone()".
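Something along these lines, maybe (the name is hypothetical; `source`/`target` instead of `from`/`to` only because `from` is a Python keyword):

    from datetime import datetime, tzinfo

    def convert_between_timezones(dt: datetime, source: tzinfo, target: tzinfo) -> datetime:
        # Refuse to guess the source zone: the caller has to state it, even if
        # "it" is just the system timezone obtained explicitly.
        if dt.tzinfo is not None:
            raise ValueError("expected a naive datetime; pass its zone as 'source'")
        return dt.replace(tzinfo=source).astimezone(target)

One way to name the system zone explicitly would then be something like datetime.now().astimezone().tzinfo, which at least makes the assumption visible at the call site.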
Actually it's sticking with backwards compatibility that is causing most of the problems.
It's not hard to come up with an improved library design. But it is extremely hard to improve it while remaining compatible with the billions of lines of Python code out there.
What libraries are folks using for dates/datetimes these days in python? I've been using arrow but I believe it also has some issues and doesn't work well with pandas/etc.
Depends what version python you are on and what work you are doing. Our application deals with user calendars and scheduling across multiple calendar providers (google, microsoft, zoom, you name it), so I deal with the full brunt of timezone complexity all the time.
I use the stdlib datetime.datetime with pytz, with a few in-house helper functions, and it's frankly kinda awful across the board but we deal with it.
Pendulum is really nice, and I want to use it as my daily driver, but I unfortunately found some timezone bugs [1] in v2 that make me hesitant to trust it for what we are doing in production (it screws up DST when doing `.utcoffset()`). If you aren't constantly dealing with data in timezones from all over the world, it's probably fine, and it's the only one that does the least surprising thing at each step.
Pytz has the whole .localize footgun.
The std datetime lib has horrible ergonomics and couldn't even cope with IANA timezones until the zoneinfo module was introduced in 3.9. Every single day it bugs me that the classes `datetime` and `time` are lowercase, and there is no consensus on `datetime.datetime` vs `from datetime import datetime`.
I tried using arrow, and while I applaud them for moving away from the horrible datetime class, that makes it pretty much incompatible with any other time library, including things like pydantic and orjson.
Seems to have started in 2.x but is fixed in the 3.0 alpha. You have to be specifically using utcoffset, which in general is probably not something you should do (use the timezone types to do conversions instead), but it was enough to give me pause.
Whenever possible I "shift right" and return a unix timestamp. If I have to read something in I try to understand the context and document that context, and turn it into a unix timestamp.
Mostly I'm not operating at the application / presentation layers.
This does cause friction when taking things at "interface value" (Sherry Turkle): just like everybody (I'm sure) I `grep -E '^2022-10-09'` or `grep -E '^Oct 09'` (looking at YOU, journalctl!). But I can also `perl -ne 'BEGIN { use Time::Local; $TIME = timelocal(0,0,5,7,8,122); } $t = (split /\s+/)[0]; $t > $TIME && $t < ($TIME + 60) && print;'`.
I have a little python script which does pretty much the same thing, with some attempts to recognize and fill in for rando date patterns just to make it a bit more universal.
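In case it's useful, a bare-bones version of that kind of script (mine, with a made-up command-line interface): keep log lines whose first field is a Unix timestamp inside a one-minute window given as a local wall-clock time.

    import sys
    from datetime import datetime, timedelta

    # Usage (hypothetical): python window.py '2022-09-07T05:00:00' < app.log
    start = datetime.fromisoformat(sys.argv[1]).astimezone()  # naive argument -> local zone
    end = start + timedelta(seconds=60)

    for line in sys.stdin:
        fields = line.split()
        try:
            ts = float(fields[0])
        except (IndexError, ValueError):
            continue  # not a timestamped line; skip it
        if start.timestamp() <= ts < end.timestamp():
            print(line, end="")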
I would counsel against storing human-readable dates in databases; use unix time instead.
Arrow makes the unfortunate conflation of Date, Time, and Datetime types, which causes all sorts of subtle errors when working with Dates that don't have times attached, and vice versa.
The best Datetime lib I've seen in any language is Chrono in Rust. Maybe there's a way to port it to Python via FFI.
This article does not make any sense. You don't get a timestamp out of nowhere without knowing what kind of time it references: local, UTC, or fixed to something like client time.
For Python, the important thing to understand is the difference between tz-aware and naive datetime objects. But after that, you can do whatever suits your needs.
For example, if you make sure to convert your inputs to UTC and also output UTC, you can easily just work with utcnow() by dealing only with UTC internally.
You would have to do that to store datetimes in MySQL, for example.
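That pattern looks roughly like this (a sketch; it assumes a plain DATETIME column with no zone information):

    from datetime import datetime, timezone

    now_utc = datetime.now(timezone.utc)               # aware, UTC
    db_value = now_utc.replace(tzinfo=None)            # naive UTC, what the column stores
    restored = db_value.replace(tzinfo=timezone.utc)   # re-attach UTC when reading back
    assert restored == now_utc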
> You don't get a timestamp from nowhere without knowing the kind of time it references: local or UTC or fixed to anything like client time.
Imagine you have software which deals with CSV files sent by clients. Clients are international corporations. What's the timezone the datetimes in the CSV files belong to?
Something generated this file, no?
Timestamps have nothing to do with the problem.
If you get dates without a timezone or any other info in your CSV, you can't deduce anything. That has nothing to do with the API used in Python to decode it...
I was familiar with the issue described, but it was useful to learn the shortcut of passing a timezone to `now()`, since I used to rely on `utcnow().replace()`, which was needlessly verbose.
# what I used to do:
>>> datetime.utcnow().replace(tzinfo=timezone.utc)
# what I'll be doing from now on:
>>> datetime.now(timezone.utc)
Okay, but my most common case for datetime.utcnow() is that I want the time in UTC for an external system that doesn't understand timezones; you really want me to do datetime.now(tz=timezone.utc).replace(tzinfo=None)?