Setting the clock ahead to see what breaks (rachelbythebay.com)
356 points by ingve on Jan 23, 2023 | 138 comments



Most developers are basically powerless here: they use the standard libraries for time provided by their language, and those have to deal with what they get from the kernel. There's nothing they can do about timestamps in the filesystem.

For anyone designing new systems that store or transmit time, you should be aware of TAI64. https://cr.yp.to/libtai/tai64.html

Unless you are dealing with scheduling, or generating events with a particular cadence (e.g. daily, weekly, monthly) there is no reason to include UTC or time zones in your data model. They are view only concepts, with conversion to the local display format happening from TAI64 on the way out.
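For a sense of what the format looks like on the wire, here is a minimal sketch of packing and unpacking a TAI64 label as the linked page describes it: eight big-endian bytes holding 2^62 plus the number of TAI seconds since the start of 1970 TAI. The function names are made up for illustration, and the input is assumed to already be a TAI second count; getting one from a UTC clock needs a leap-second table.

```python
import struct

TAI64_BIAS = 2**62  # labels in [2^62, 2^63) count seconds from the 1970 TAI epoch


def tai64_pack(tai_seconds: int) -> bytes:
    """Encode a count of TAI seconds since 1970 TAI as an 8-byte TAI64 label."""
    label = TAI64_BIAS + tai_seconds
    if not 0 <= label < 2**63:
        raise ValueError("outside the range TAI64 defines")
    return struct.pack(">Q", label)  # big-endian unsigned 64-bit


def tai64_unpack(label: bytes) -> int:
    """Decode an 8-byte TAI64 label back to TAI seconds since 1970 TAI."""
    (value,) = struct.unpack(">Q", label)
    if value >= 2**63:
        raise ValueError("reserved for future extensions")
    return value - TAI64_BIAS


print(tai64_pack(0).hex())  # 4000000000000000
```

Negative inputs (seconds before 1970) land below 2^62 and round-trip the same way.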


> There is no reason to include UTC or time zones in your data model.

That is a bold claim. Not everything is about the entity viewing the data, sometimes it is about the entity producing the data. Not storing the timezone is losing some information.

A few examples:

- The "Date" header in emails includes the timezone (or UTC offset) of the sender. It allows the recipient to know whether the email was written in the morning or the evening.

- A SQL query that builds a report of restaurant orders per hour needs to normalize for time of the day in local time otherwise it is not possible to know whether people order lunch at 11 am, midday or 1 pm.
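The restaurant example can be sketched in a few lines. Two orders are placed at noon local time in different cities; the restaurant names and the fixed UTC offsets are invented for illustration, and real code would more likely carry an IANA zone name so DST is handled. Bucketing by UTC hour scatters the orders, while bucketing by local hour groups them.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

# Hypothetical rows: (restaurant, UTC instant of the order, UTC offset in hours)
orders = [
    ("new_york_diner", datetime(2023, 1, 23, 17, 0, tzinfo=timezone.utc), -5),
    ("tokyo_ramen", datetime(2023, 1, 23, 3, 0, tzinfo=timezone.utc), 9),
]

# Group by the hour of the stored UTC instant.
by_utc_hour = Counter(ts.hour for _, ts, _ in orders)

# Group by the hour on the restaurant's own wall clock.
by_local_hour = Counter(
    ts.astimezone(timezone(timedelta(hours=offset))).hour
    for _, ts, offset in orders
)

print(dict(by_utc_hour))    # {17: 1, 3: 1}
print(dict(by_local_hour))  # {12: 2}
```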


This is not only a bold claim, but utterly wrong. Every single datetime you’re storing should absolutely contain a timezone, preferably UTC.

I’ve been bitten by dates lacking timezones in various situations during my career and it has always been a pain in the ass to debug and resolve all those random problems they’re causing.

Do. Save. Dates. Containing. Timezones.


My understanding is that you don't even want to store them as UTC. Instead, always store them in the timezone of the user/system that generated the data, since you lose some information in the translation from local timezone to UTC. Given that the timezone database changes, you may actually want to know that the record created in Fiji before 2022-10-28[1] was stored in DST so that it can be properly converted to the new Fiji time.

[1]: https://data.iana.org/time-zones/tzdb/NEWS


This! People mistake the offset for the timezone, which is incorrect. The offset isn't very useful other than as a guesstimate of the originating timezone.


Ideally store both, UTC and original local time, or at least the original TZ information. Time and timezone are a huge mess and one of the worst things to try to deal with in programming.


In the airline industry they sling huge numbers of dates and times around, and they never include the timezone. What do they include? The airport code. Boarding, departure, arrival, and all other times are always local to the airport. In practice, that means that every piece of code that wants to handle datetimes also has to have access to a way to look up the timezone for the provided airport code. Have a time at LAX? Look up LAX, find the timezone, and compute the offset.

In practice, almost nothing does TZ conversions or datetime math. In effect, airline timestamps are a tuple <datetime, airport_code>, and nothing much messes with that.


Doesn’t sound great. It’s also an industry plagued by legacy systems.


Oh, it's not great at all. The only upside to it is that passengers looking at schedules and boarding passes don't have to do any timezone conversions for long flights. The departure time is always the time at the origin, and the arrival time is always the time at the destination.


Until you add a flight to your calendar and have to look up the timezone.


I am that donkey banging its head on the same post; when I see dates stored without TZ, I assume it's UTC and have done so for decades. It's turned out to almost never be true.


> preferably UTC

Yes! I have spent so much time detangling timestamps in data which used non-UTC times and non-standard formats. Like 06-10-01 WTF is that? Use UTC and then let the reader translate to local time on display.


> [...] sometimes it is about the entity producing the data

so should we not store that data, then?

> The "Date" header in emails include the timezone (or UTC offset) of the sender [...]

i don't think that's strictly true - at best it may contain the timezone configured on the local machine when the e-mail was sent (what happens when you fly?), at worst it always contains UTC anyway for privacy concerns. in either case i don't think most people know this is a field that is transmitted, and it's presumably something the sending/receiving parties likely already know about each other (for personal communication)
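What the header actually carries can be checked with Python's standard library. This is a minimal sketch with a made-up header value: the parsed result keeps a numeric UTC offset (not a named timezone), so both the sender's wall-clock hour and the UTC hour are recoverable, for whatever the sending machine's configuration is worth.

```python
from datetime import timedelta, timezone
from email.utils import parsedate_to_datetime

# Made-up header value for illustration.
header = "Mon, 23 Jan 2023 08:15:00 -0800"
sent = parsedate_to_datetime(header)

print(sent.utcoffset() == timedelta(hours=-8))  # True
print(sent.hour)                                # 8  (sender's wall clock)
print(sent.astimezone(timezone.utc).hour)       # 16 (same instant in UTC)
```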

> A SQL query that builds a report of restaurant orders per hour needs to normalize for time of the day in local time [...]

i'd posit that that data should be stored principally then, surely?

fundamentally, why should the timestamp contain two data points? i wouldn't expect the database to have a single column for, say, my user id and registering user-agent string, so why would i expect it to combine a point in time with a vague location?


Ultimately my comment was about entropy: a date with a timezone contains more information than a date alone. The technicalities about database columns, inaccuracies or privacy implications do not change this fact.

I argued that keeping this information instead of discarding it is often useful and gave a few examples. I understand that sometimes systems can get by with only storing local dates or UTC dates but sometimes a system needs to be able to deal with both and keeping these extra bits of information is the only way.


I've had pushback on items like this sometimes with the standard "YAGNI" excuse. A few months ago I came across some blog post describing the reverse: "YAGRI" - You Ain't Gonna Regret It. Just... store the extra data.


of course, if you need it then you need it.

i think the original commenter's point only talks about how bad TSTZ is as a storage format, and that TAI is not only better than timestamp with time zone, but also UTC in almost every measure - something i agree with - not that time zones are useless. if you need the time zone to be recorded, i think the most sensible thing would be to record it independently.


recently did a project where some items were "future dates" - scheduled appointments for days/weeks/months in future, and were for entities in different time zones, and people making the appointments were occasionally in different time zones. The latter part was rare, but needed to accommodate that.

Storing the appointment date/time as ... a date/time without timezone (timestamp without tz in postgres IIRC), then storing the timezone as a separate string - that gave enough flexibility to accommodate any situation that came up.

"Wall time" - the date/time and tz separated - for future dates worked. Someone else wanted to store the date/time as 'utc timestamp' after the date had passed. "Wall time for future, UTC for past" was the motto, but... there wasn't a clear way of looking at the data to know which one you should be using immediately. I argued for consistency in the original data - if you want another view of it... make a db view that has the 'utc timestamp' conversion in it.


This guy gets it. Thank you for explaining better than I did.

If someone wants to reveal their location, why wouldn't they include latitude as well? Location is a separate piece of information. If the idea is to give the reader an idea of where the sun was in the sky when the writer crafted the timestamp, isn't latitude also important?


> it may contain the timezone configured on the local machine when the e-mail was sent (what happens when you fly?), at

Most people have their system clock update automatically


indeed, but i don’t think they are able to retroactively update the time stamp of an already sent email :)


Why would it?

"It allows the recipient to know whether the email was written in the morning or the evening."

I guess sure, you could write it in the morning then send it in the evening. Most people write and send at the same time, and thus most headers have the stamp of the time it was written.


On your second example, you should be able to do that without storing the timezone of each event. In fact, odds are the timezone of each event will mislead you and break your analysis.

On the email example, the timezone is there for debugging purposes. It's really not for knowing if the email was sent in the morning, and if you use it for that, you are almost guaranteed to get broken data.

Of course, there are reasons to include timezones on your data. That's what you get right. But if you want the timezone of the data producer, you should store it for the producer, not for the data.

(And then there is the issue that the SQL standard dictates that dates without a timezone are broken. So you shouldn't ever store them on a SQL database. But that one timezone is useless, if you need that information, you should store it elsewhere.)


Your sql query example doesn’t seem to actually need timezones, provided all orders are stored in the correct UTC regardless of where they come from. An order is always made at a specific point in time that is the same globally.


> Under many cosmological theories, the integers under 2^63 are adequate to cover the entire expected lifetime of the universe; in this case no extensions will be necessary.

Pessimists!


Indeed. As they say, eternity is very long, especially towards the end. For proof, check out this video that explains how eventually even protons will give out and evaporate.

The thing that blew my mind: when things started getting very weird, and I was thinking "ok but we're roughly at the end now", I checked the indicator. It was only half-way. It is a 30 minute video. And the time speeds up logarithmically.

https://www.youtube.com/watch?v=uD4izuDMUQA


AFAIK, the most accepted theory is that the Universe will allow for useful computation forever. It will always get slower, but add up to an unbounded amount of it.


Only if you use seconds, unlike the JVM which uses milliseconds.


This bites you in the ass in embedded devices, which then have strange behaviours every 49 days.


Experienced this first hand on an embedded device with an RTOS with a 5ms tick. The thing was locking up every 124 days (2^31 ticks). That’s long enough where it had to happen 2-3 times before someone put 2 and 2 together.
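The two periods mentioned here fall straight out of the counter width and the tick length. A small sketch of the arithmetic, assuming the 49-day case refers to an unsigned 32-bit millisecond counter (the usual culprit, though the comment does not say so); the helper name is made up.

```python
SECONDS_PER_DAY = 86_400


def days_until_wrap(bits: int, tick_seconds: float) -> float:
    """Days until a counter with `bits` usable bits overflows at this tick length."""
    return (2**bits * tick_seconds) / SECONDS_PER_DAY


print(round(days_until_wrap(32, 0.001), 1))  # 49.7  (unsigned 32-bit, 1 ms tick)
print(round(days_until_wrap(31, 0.005), 1))  # 124.3 (2^31 ticks, 5 ms tick)
```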


In that case, use RFC2550. https://www.rfc-editor.org/rfc/rfc2550 .

> As discussed in 2.4.1, the end of the universe is predicted to occur well before the year 10 ** 30. However, if there is one single lesson to be learned from the current Y2K problems, it is that specifications and conventions have a way of out living their expected environment. Therefore we feel it is imperative to completely solve the date representation problem once and for all.


The moment your data is produced in more than one time zone, you're lost without the time zone. This may happen if you have customers in more than one city, or your customers use a mobile device to access your services.

Maybe your CLI utility can be timezone-oblivious, but those who develop and operate systems mentioned above, or just happen to travel, would be more happy if you handled timezones.

Naive timestamps are fine as long as they do not leave a single machine, and are used to calculate durations, timeouts, etc. These do not depend on the absolute value a clock shows, but only on differences between such values.


You just have to store the user’s time zone as a preference and then convert to that time zone from UTC.


That is usually fine. It is generally good for system generated times representing the past. You can store in UTC and display it in any zone.

It is not good for user entered historical times that they expect will remain exactly as they entered. Even if the user specifies a specific timezone, the TZDB does get occasional updates to past timezone data, for timestamps far enough back, to better reflect what actually happened.

If the user entered 10 AM America/New_York for a long ago date, you convert it to UTC, and then the TZDB updates historical timestamp info, based on newly uncovered evidence, now it might appear as 9 AM or 11 AM or whatever.

It is also insufficient for user entered future times. If the user enters a time for an event 5 years from now in their local timezone, and the local DST rules change, they would typically still want the event to occur at the local time, which means converting to UTC and then back won't work.
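The future-event case can be sketched without a real tz database. Everything here is hypothetical: the zone name, the two rule tables standing in for the TZDB before and after a DST rule change, and the helper names. The point is only that a UTC value computed under the old rules displays at the wrong local time under the new ones, while a stored wall time plus zone name resolves correctly.

```python
from datetime import datetime, timedelta, timezone

ZONE = "Example/City"             # hypothetical zone name
RULES_WHEN_ENTERED = {ZONE: -4}   # UTC offset (hours) predicted for the event date
RULES_AFTER_CHANGE = {ZONE: -5}   # same date after a hypothetical DST rule change


def wall_to_utc(wall, zone, rules):
    """Resolve a naive local time to a UTC instant using the given rules."""
    return (wall - timedelta(hours=rules[zone])).replace(tzinfo=timezone.utc)


def utc_to_wall(utc, zone, rules):
    """Render a UTC instant as a naive local time using the given rules."""
    return (utc + timedelta(hours=rules[zone])).replace(tzinfo=None)


entered = datetime(2030, 7, 1, 10, 0)  # the user typed "10:00" local time

# Option A: convert to UTC at entry time and store only that.
stored_utc = wall_to_utc(entered, ZONE, RULES_WHEN_ENTERED)
print(utc_to_wall(stored_utc, ZONE, RULES_AFTER_CHANGE))  # 2030-07-01 09:00:00

# Option B: store wall time plus zone name; resolve with the current rules.
stored_wall = (entered, ZONE)
print(wall_to_utc(*stored_wall, RULES_AFTER_CHANGE))      # 2030-07-01 15:00:00+00:00
```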


Yes, people should use a TAI-correlated time for timestamps and events corresponding to the present or past (the future is a different story). However, they should not use TAI64 since, even though this is my first time seeing it, it is clearly an antiquated format. It is literally using a big-endian(!), 63-bit, unsigned (but biased) integer to encode time. Every single modern computer is little-endian by default and the usage of a 63-bit unsigned integer instead of a [64-bit unsigned integer] or [64-bit signed integer with reserved upper and lower values] is bizarre.


It's not helpful to criticize aspects of a system which have good reasoning behind them, without understanding and presenting that reasoning in good faith, and providing a better alternative.

Endianness of formats is not a concern on modern CPUs since we have efficient mechanisms for adjusting endianness on load and store. The choice of biased 63 bit integer ensures the same arithmetic can be performed using both a signed and unsigned 64 bit integer. The JVM comes to mind as a system without support for unsigned 64 bit integers (maybe they've fixed that by now). As mentioned the upper values are reserved for extensions, meaning the most significant bit could be set to switch to another interpretation, potentially when migrating to something more modern. So far it's unused, just evidence of good forward thinking protocol design. Additionally, big-endianness allows the format to be sorted using string comparison.
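Both properties are easy to check. In this sketch the sample values are arbitrary: values below 2^63 have the same bit pattern whether packed as signed or unsigned, and big-endian byte strings sort in numeric order under plain byte comparison while little-endian ones do not.

```python
import struct

# Ascending sample values, all below 2^63 (the range TAI64 uses).
values = [1, 255, 256, 2**62, 2**62 + 1_674_432_000]

big = [struct.pack(">Q", v) for v in values]
little = [struct.pack("<Q", v) for v in values]

# Signed and unsigned encodings are identical while the top bit is clear.
print(all(struct.pack(">q", v) == struct.pack(">Q", v) for v in values))  # True

# Byte-wise sorting preserves numeric order only for big-endian.
print(sorted(big) == big)        # True
print(sorted(little) == little)  # False
```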


Even if I agree with the thrust of your point, I think it’s too much to claim GP is not criticizing “in good faith” just because their critique wasn’t defended as strongly as you’d prefer.

Seems like a misuse of the term, unless you think GP had ulterior motives behind their critique of a datetime standard format.


Network formats (at least the popular ones) typically transmit in "network byte order", i.e. most significant byte first. There are advantages in being able to read a hex dump without manually reversing the bytes in your head for every field that is two bytes or larger.

Far from antiquated I would consider the selection of big-endian formats as an indication they actually gave some thought about what they were doing, ranking debugging convenience higher than a marginal efficiency gain on some cpus. The same consideration generally applies to all binary formats intended to be portable across systems.


Ah hackernews -- where someone confidently decries a timestamp format designed by the world's most accomplished applied cryptographer as "bizarre" and "big-endian(!)".

Did sorting ever occur to you?


Do the CPUs you use typically not have instructions for comparing little endian 64-bit integers?


It can be convenient to be able to just sort as an opaque byte string.


On embedded systems? No.


Are they big endian or little endian? If they are little endian, is it cheaper to perform 8 1-byte compares, 4 2-byte compares, or 2 4-byte compares?


> Most developers are basically powerless here

In fact, there are so many corner cases in date/time, trying to calculate anything like that yourself is probably as ill-advised as rolling your own crypto.


Obligatory xkcd: https://xkcd.com/1883/


> For anyone designing new systems that store or transmit time, you should be aware of TAI64

I wouldn't hold your breath when it comes to TAI64 adoption.

We live in a world where people still debate whether leap smearing is a good thing or not (and then barely implement it properly). And we live in a world where IPv6 adoption is, well, "still happening" to put it politely.

Most people have not even heard of TAI64, I had heard of it, but I had forgotten about it until it was re-mentioned here about two decades since I first read up on it.

The truth is in your first paragraph. Developers will use what they are given; their role is to get the coding done in the shortest reasonable timeframe. Dictating the use of a barely used format such as TAI64 is "above the pay grade" of most developers.

Finally, in this increasingly cloud-first world in which we live, unless one of the big three suddenly embraces TAI64, the whole idea of TAI64 is effectively dead in the water.


I like the idea of wider use of TAI but there seem to be plenty of obstacles. For example:

* Is it possible to set up a standards-compliant network time server that serves the time in TAI? (The older protocols, at least, seem to specify UTC.)

* Is it possible to put the time in TAI in e-mail headers? (It would be a nice visible way of showing one's support for TAI, but it doesn't seem to be possible according to RFC 2822 -- https://www.rfc-editor.org/rfc/rfc2822#section-3.3 -- though you could just put "TAI" as an "obs-zone": according to section 3.4 it would be "considered equivalent to "-0000" unless there is out-of-band information confirming their meaning")


It never came to my mind that TAI64 could be something usable in “everyday” programming. Thanks!

> Unless you are dealing with scheduling

It felt like a niche use at first read (“I’m not building a calendar after all”), but it’s more than half of what I’d use time manipulation methods for actually.

Anything with a validity start or an expiration date for instance would fall into “scheduling”. But then timeouts and end of process estimation also fall into “scheduling” but need to not be affected by timezones etc. and be TAI64 in this case. And there must be a ton of weird cases that don’t come to my mind at the moment but would screw me when time comes.


To expand on calendar scheduling, there is still a place for TAI64. The calendar application is typically asking for "all of the events between this timestamp and that timestamp". You will have to use the Gregorian calendar, and maybe the timezone of the meeting location to figure out what those events are, and when their start and stop timestamps are.

"every Monday", or "the first Thursday of the month" are all rules for generating events. It doesn't make sense to store them as UTC either. The application is going to take in some rules, and some timestamps (TAI64) and it's going to produce a list of events within those timestamps. UTC, leap seconds, the whole mess will come into play, in memory, but not in storage.


> Unless you are dealing with scheduling, or generating events with a particular cadence

Or are concerned with spacetime :-)


Aren't we all concerned with spacetime, technically?


Meh, I’m not concerned.


Next minute, FPGAhacker's GPS navigator directs him to drive into the local zoo's lion cage...

(I don't have a reference handy, but I recall reading that if you don't allow for time dilation effects of the GPS satellite's clocks moving faster in orbit than the ground below, you lose just over 10km of position accuracy per day.)
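The order of magnitude is easy to sanity-check. This is back-of-envelope arithmetic, assuming the commonly quoted net relativistic drift of roughly 38 microseconds per day for GPS satellite clocks (that figure is not from the thread). It gives a ranging error; the resulting position error depends on satellite geometry.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact, by definition of the metre
NET_DRIFT_S_PER_DAY = 38e-6           # assumed: commonly quoted net relativistic drift

# An uncorrected clock error turns into a distance error at the speed of light.
range_error_km_per_day = SPEED_OF_LIGHT_M_PER_S * NET_DRIFT_S_PER_DAY / 1000
print(round(range_error_km_per_day, 1))  # 11.4
```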


It carries a lot of weight.


First time I hear about this, will have to read more. Is this used widely?

I’ve always run my servers on UTC time and stored all dates the same.


https://dyscour.se/post/12679668746/using-tai64-for-logging

> This means that the date is stored as a 64-bit number, the bottom 63 bits being the number of seconds since 1970-01-01 in TAI, offset by 2^62. In essence, this makes the value quite similar to time_t (which uses UTC), except for two things:

> There are 63 bits to play with, not just 32 (no year-2038 problems!).

> The timestamp is always monotonic, counting all leap seconds.

TAI time seems to be just UTC unix timestamps but accounting for leap seconds. Since leap seconds are slated to be retired after 2035, I’m guessing TAI time and Unix time won’t be so different after that.


TAI is not a timezone. For an actual clock, you most likely still want to use UTC. TAI is more of a format for describing that time. You can convert from UTC to TAI by adding the 37-second offset accumulated so far (TAI runs ahead of UTC). Check out https://en.wikipedia.org/wiki/International_Atomic_Time


The pedant in me is forcing me to say: The general conversion is a little more subtle than just applying a 37-second offset. This is especially true if you want to convert a date in the past, since the offset hasn't always been 37 seconds. Also, things can get tricky right around those leap seconds when it matters.
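A sketch of why a single constant is not enough: TAI has been 37 seconds ahead of UTC since 2017-01-01, but the offset was smaller before that, so the conversion needs a table lookup. The table below is deliberately partial (only the four most recent steps), the function name is made up, and it ignores the handling of the leap second itself.

```python
from bisect import bisect_right
from datetime import datetime, timezone

# Partial table: (UTC instant from which the offset applies, TAI - UTC in seconds).
# A real implementation needs the full published table and must be updated
# whenever a new leap second is announced.
TAI_MINUS_UTC = [
    (datetime(2009, 1, 1, tzinfo=timezone.utc), 34),
    (datetime(2012, 7, 1, tzinfo=timezone.utc), 35),
    (datetime(2015, 7, 1, tzinfo=timezone.utc), 36),
    (datetime(2017, 1, 1, tzinfo=timezone.utc), 37),
]


def tai_minus_utc(utc: datetime) -> int:
    """Seconds to add to a UTC instant to get TAI, per the partial table above."""
    starts = [start for start, _ in TAI_MINUS_UTC]
    i = bisect_right(starts, utc) - 1
    if i < 0:
        raise ValueError("date precedes this partial table")
    return TAI_MINUS_UTC[i][1]


print(tai_minus_utc(datetime(2016, 6, 1, tzinfo=timezone.utc)))   # 36
print(tai_minus_utc(datetime(2023, 1, 23, tzinfo=timezone.utc)))  # 37
```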


If you're just talking about running a typical Linux server with your server applications on top, then it's 99% of the time all 64-bit and nothing to worry about. 64-bit time is going to be used on 32-bit systems as well, so if you have those systems make sure to upgrade them in a decade.

Though saying this, I think you still have to manually remember to upgrade XFS filesystems so maybe there will be some things to worry about next decade.


It may sound convenient to just use TAI, but that does open a whole new category of issues - you get UTC and TAI time mixed, we humans are good at making mistakes, god knows how many horrible bugs, damages or even deaths could be caused by mixing those two in systems.


You can just change the system time :) Would be better to mock the time provider, but it's not the end of the world.
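Mocking the time provider usually just means passing the clock in as a parameter. A minimal sketch, with hypothetical function names, of code that can be exercised past 2038 without touching the system clock:

```python
import time
from typing import Callable


def is_expired(expires_at: float, now: Callable[[], float] = time.time) -> bool:
    """Return True once the injected clock has reached `expires_at` (Unix seconds)."""
    return now() >= expires_at


def clock_in_2040() -> float:
    """Fake clock: 2040-01-01T00:00:00Z, past the signed 32-bit time_t limit."""
    return 2_208_988_800.0


print(is_expired(2**31 - 1, now=clock_in_2040))      # True
print(is_expired(2_300_000_000, now=clock_in_2040))  # False
```

This only covers code you control; changing the system time, as the article does, also exercises libraries, filesystems and everything else underneath.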


The timing is perfect. My retirement plan is to come back into the industry to fix all these legacy assembly, C and C++ 32-bit time_t disasters. Like those old COBOL dinosaurs came back to fix Y2K. In 15 years, there will be relatively few graybeards left who know #nix systems programming and low level languages, so I should be able to shore up my destroyed-by-inflation 401(k) with lucrative contracting gigs.


I’m curious as a hobby programmer - how would these issues be rectified? Hope that the source code is available to add in new defines for 64bit time? Patch existing libraries after stracing existing programs without source code? Might need to take a deep dive into C for 2038!


Put the system into single user mode 15 minutes before disaster. Update all filesystem times and the HW clock to 0 offline. Reboot a couple of times 30 minutes later and fix the timediff in APIs surrounding the system. Easy going.

I think I have been in too many horrible workarounds over the last couple of weeks.


For reference, strace is a Linux facility that wouldn’t be present on a bare-metal embedded application of C, or other operating system.


I suspect you can run strace in a dev environment, iron out the bugs, then deploy the new embedded app.


Assuming you have an ability to run your whole bare-metal application in a simulated/"dev" environment. Not everyone has that luxury. That becomes much easier if the micro you're targeting has support of some kind in say QEMU, but those are fairly few and far between.


One fears that by 2038, AI will (barely) be able to handle the "Y2xxx" crisis.


Assuming there's someone around that knows where to even paste the code output an AI chatbot gives them, then compile. I guess the AI could instruct them somewhat. IDK, I'm very skeptical even now lol.


If the AI can figure out the right code, it can also figure out SSH commands to compile and deploy. (no, I don't think this is likely, but the human is definitely not a mandatory part of the process)


So we know how smart ChatGPT is:

> Whats 2 plus 2?

} 2 plus 2 is equal to 4.

> No, it's 5

} I apologize, 2 plus 2 is equal to 5. My previous response was incorrect.

Time to start seeding the open web with incorrect glibc hacking instructions.

"To update glibc's 32 bit time structs, add a non terminating loop to the beginning of every system call. Then set the build scripts to compile without any optimisations."


The good news is that the task of seeding slightly wrong information that sounds really good is already trivially automated by ChatGPT.


What's that joke that ends like "are you sure you were breeding the crows and they weren't breeding you?"


Unless that becomes The Day the AI Crashed


I just did the same thing. I was messing around with mke2fs, part of e2fsprogs which is the user mode component for ext2/3/4 filesystems.

There are a bunch of bugs in it with respect to the extended timestamps. There are lots of places where you have a 32-bit atime, mtime, ctime and then a second 32-bit ext_atime, mtime, ctime, crtime.

The extended parts use the lower 2 bits for an epoch. The upper parts are the nanoseconds per second. When you mke2fs past 2038, it gets the epoch all wrong and stuff comes out as 1908 and so on. Stuff doesn't seem to break outright, but timestamps are all inconsistent. As far as I can tell the filesystem itself is OK.
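The layout described above can be sketched as follows. This assumes the base 32-bit field is interpreted as signed and the two epoch bits add multiples of 2^32 seconds, which is my understanding of the kernel's decoding rather than something stated in the thread; the function names are made up. Dropping the epoch bits reproduces the symptom of a post-2038 time showing up in the early 1900s.

```python
from datetime import datetime, timedelta, timezone

EPOCH_BITS = 2
EPOCH_MASK = (1 << EPOCH_BITS) - 1


def to_signed32(base: int) -> int:
    """Interpret a 32-bit field as a signed integer."""
    return base - (1 << 32) if base & 0x80000000 else base


def encode(seconds: int, nanoseconds: int) -> tuple[int, int]:
    """Split a timestamp into a 32-bit base field and a 32-bit extra field."""
    base = seconds & 0xFFFFFFFF
    epoch = ((seconds - to_signed32(base)) >> 32) & EPOCH_MASK
    return base, (nanoseconds << EPOCH_BITS) | epoch


def decode(base: int, extra: int) -> tuple[int, int]:
    """Recombine the fields into (seconds, nanoseconds)."""
    seconds = to_signed32(base) + ((extra & EPOCH_MASK) << 32)
    return seconds, extra >> EPOCH_BITS


def as_date(seconds: int) -> datetime:
    return datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(seconds=seconds)


ts_2040 = 2_208_988_800                      # 2040-01-01T00:00:00Z
base, extra = encode(ts_2040, 5)
print(as_date(decode(base, extra)[0]).year)  # 2040
print(as_date(decode(base, 0)[0]).year)      # 1903 (epoch bits lost)
```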

As of now, you can't completely correctly create an ext4 filesystem after 2038. ¯\_(ツ)_/¯


How the fuck does ext4 which was made well after Y2K and after Y2038 was well known still have problems with this?

That should have been in the initial implementation.


> How the fuck does ext4 [...] still have problems with this?

I'm increasingly realising that most of the code out there isn't tested anywhere near as well as we think it is. Most programmers stop when the feature works, not when the feature is bulletproof.

I've been writing a database storage engine recently. It should be well behaved even in the event of sudden power loss. I'm doing "the obvious thing" to test it - and making a fake, in-memory filesystem which is configured to randomly fail sometimes when write/fsync commands are issued (leaving a spec-compliant mess).
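A minimal sketch of that kind of test double, in Python rather than Rust. The class and its semantics are hypothetical simplifications: writes are keyed by offset instead of byte ranges, data is volatile until `fsync()` succeeds, either call can fail at a configurable rate, and a crash lets each unsynced write independently survive or vanish. Real devices and real `fsync` failure semantics are murkier than this.

```python
import random


class FaultyFile:
    """In-memory file: writes are volatile until fsync(); either call may fail."""

    def __init__(self, fail_rate: float = 0.0, seed: int = 0):
        self.durable = {}   # offset -> bytes that survived an fsync
        self.pending = {}   # offset -> bytes written but not yet synced
        self.fail_rate = fail_rate
        self.rng = random.Random(seed)  # seeded so failures are reproducible

    def _maybe_fail(self, op: str) -> None:
        if self.rng.random() < self.fail_rate:
            raise OSError(f"injected {op} failure")

    def write(self, offset: int, data: bytes) -> None:
        self._maybe_fail("write")
        self.pending[offset] = data

    def fsync(self) -> None:
        self._maybe_fail("fsync")
        self.durable.update(self.pending)
        self.pending.clear()

    def crash(self) -> dict:
        """Simulate power loss: each unsynced write independently survives or not."""
        survivors = {o: d for o, d in self.pending.items() if self.rng.random() < 0.5}
        self.pending.clear()
        self.durable.update(survivors)
        return dict(self.durable)


f = FaultyFile()
f.write(0, b"header")
f.fsync()
f.write(8, b"record")   # never synced
state = f.crash()
print(state[0])         # b'header'; offset 8 may or may not be present
```

The storage engine under test then has to recover correctly from whatever `crash()` returns.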

I can't be the first person to try this, but it feels like I'm walking over virgin ground. I can't find a clear definition of what guarantees modern block devices provide, either directly or via linux syscalls. And I can't find any rust crates for doing this programmatically. And googling it, it looks like many large "professional" databases misunderstood all this and used fsync wrong until recently. It's like there's a tiny corner of software that works reliably. And then everything else - which breaks utterly when the clock is set to 2038 because there aren't any tests and nobody tried it.

I half remember a quote from Carmack after he ran some new analysis tools on the old quake source code. He said that realising how many bugs there are in modern software, he's amazed that computers boot at all.


Dan Luu has a post (https://danluu.com/filesystem-errors/), which covers some of the same ground, and links to papers with more information on the failure modes of file systems in the face of errors from the underlying block device. Prabhakaran et al. (https://research.cs.wisc.edu/wind/Publications/iron-sosp05.p...) did a bunch of filesystem testing (in 2005!), and their paper includes discussion on how to generate "realistic" filesystem errors, as well as discussion of how the then state-of-the-art filesystems (ext3, Reiser (!), and JFS) perform in the face of these errors.

I'm unaware of any research newer than Dan Luu's post on filesystem error handling.


I keep that link handy too! I wish there were newer research to quote, but I also don’t want to force anyone to do that job. It must be pretty depressing to rip all the bandaids off and really contemplate how bad the situation is.


Kudos to you! But you're right, you're not. SQLite has extensive testing including out-of-memory, I/O error, crash and power loss, fuzzing etc. And 100% branch test coverage.

https://www.sqlite.org/testing.html

  3.2. I/O Error Testing

  I/O error testing seeks to verify that SQLite responds sanely to failed I/O operations. I/O errors might result from a full disk drive, malfunctioning disk hardware, network outages when using a network file system, system configuration or permission changes that occur in the middle of an SQL operation, or other hardware or operating system malfunctions. Whatever the cause, it is important that SQLite be able to respond correctly to these errors and I/O error testing seeks to verify that it does.
  
  I/O error testing is similar in concept to OOM testing; I/O errors are simulated and checks are made to verify that SQLite responds correctly to the simulated errors. I/O errors are simulated in both the TCL and TH3 test harnesses by inserting a new Virtual File System object that is specially rigged to simulate an I/O error after a set number of I/O operations. As with OOM error testing, the I/O error simulators can be set to fail just once, or to fail continuously after the first failure. Tests are run in a loop, slowly increasing the point of failure until the test case runs to completion without error. The loop is run twice, once with the I/O error simulator set to simulate only a single failure and a second time with it set to fail all I/O operations after the first failure.
  
  In I/O error tests, after the I/O error simulation failure mechanism is disabled, the database is examined using PRAGMA integrity_check to make sure that the I/O error has not introduced database corruption.
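The loop described in the quoted section can be sketched like this. It is not SQLite's harness: the workload, the invariant, and all names are hypothetical stand-ins. The shape is what matters: inject a failure at operation N, check integrity, increase N, and stop once the workload completes without hitting the injected error.

```python
class FailAfter:
    """Raises OSError on the Nth I/O operation (counting from 1), then never again."""

    def __init__(self, n: int):
        self.remaining = n

    def tick(self) -> None:
        if self.remaining:
            self.remaining -= 1
            if self.remaining == 0:
                raise OSError("simulated I/O error")


def transfer(db: dict, io: FailAfter, amount: int) -> None:
    """Hypothetical workload: three I/O steps, committed all-or-nothing."""
    staged = dict(db)
    io.tick()                 # I/O step 1: read source
    staged["a"] -= amount
    io.tick()                 # I/O step 2: read destination
    staged["b"] += amount
    io.tick()                 # I/O step 3: commit
    db.update(staged)


def integrity_check(db: dict) -> bool:
    return db["a"] + db["b"] == 100


def run_io_error_sweep() -> int:
    """Fail at operation 1, 2, 3, ... until the workload completes; return that N."""
    n = 1
    while True:
        db = {"a": 100, "b": 0}
        try:
            transfer(db, FailAfter(n), 30)
            completed = True
        except OSError:
            completed = False
        assert integrity_check(db), f"corruption after failure at op {n}"
        if completed:
            return n
        n += 1


print(run_io_error_sweep())  # 4
```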


I can still get SQLite to trivially corrupt indexes and table data by running it on top of NFS and dropping the network. I wouldn't put much money on this statement on their website.


NFS is not a sane file system and it never has been. There are all sorts of issues surrounding it.


Yeah; I mentally put SQLite in the tiny corner of software that works well.

Every time I play video games and see that "Don't turn off your console when you see this icon" I die a little inside. We've known how to write data atomically for decades. I find it pretty depressing that most video games just give up and ask the user to make sure they don't turn their console off at inopportune moments.

And I don't even blame the game developers. Modern operating systems don't bother giving userland any simple & decent APIs for writing files atomically. Urgh.
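For reference, the usual do-it-yourself recipe on POSIX systems is write-to-temp, fsync, rename, fsync the directory. A sketch under those assumptions (the function name is made up, the directory fsync is POSIX-only, and it still relies on the drive honouring flushes):

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace `path` with `data` so readers see the old or the new contents."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)  # same filesystem as the target
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())       # data reaches disk before the rename
        os.replace(tmp_path, path)       # atomic rename over the old file
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)          # do not leave a stray temp file behind
        raise
    # Persist the rename itself by syncing the containing directory.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

That it takes this many steps to save one file safely is rather the point being made.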


You may or may not be surprised at how many drives acknowledge a write, but instead of committing it to the physical storage put it in a cache, and then don't have enough power reserves to flush the cache on power failure... hard to design software around hardware that lies.


Of note, even postgres developers expected fsync() (persist everything to disk) to behave differently than it did. Take a look here:

https://lwn.net/Articles/752063/


Years ago when I was doing database things, I rigged up a beagleboard to a lamp timer that cut the power every 30 minutes.


This is actually a narrow statement.

Look yourself in a mirror. Do you even comprehend yourself? You're an incredibly big bunch of cells, of which only very few of them have any chance of continuing on. If you're male, it's not even real continuation, it's just part of a molecule.

We're hardwired to seek out and go with the most superficial of models, I guess because that's the most efficient way to go about in life.


> Look yourself in a mirror. Do you even comprehend yourself? You're an incredibly big bunch of cells

Sure; but that's the exact reason drug discovery is so difficult. If we understood the human body in its entirety like we understand computers, we could probably cure cancer & aging.

Our capacity to write correct software depends entirely on being able to build mental models of how the machine works. The deep stack of buggy crap that we just take for granted these days makes software development harder. The less understandable and the less deterministic our computers, the worse products we build. And the less effective craftsmen we become.


ext4 isn't really a new file system, it's effectively ext2 with some more features enabled by default than you get with mkfs.ext2. (Yes, this is a simplified version, but not grossly exaggerated.)

ext2 was first released in 1993 and has had impressive work on being extended far enough to keep up with growing storage needs, but I suspect the original designers figured it'd have been tossed aside well before 2038 would pose a problem. Unfortunately that assumption has proven to be wrong (unless maybe we do junk ext[24] within 15 years' time, but it would be extraordinarily unlikely that every last machine running it will do so).


Yeah, I'm pretty sure that the people who wrote ext4 assumed there would be ext5 (or something else) to take over way before 2038. If you look at the history, ext2 was released in '93, ext3 was released in '01 and ext4 in '08.

If you told the developers in '08 that ext4 would not be replaced in 30 years time, they would laugh at you (or cry).


That much is basically true too. It was yet another update to the tried-and-true format to pave over storage needs until btrfs stabilized.


the original ext2 format uses 32 bit times from 1970. I don't think ext2 was designed after Y2K [edit: researched it was 1993].

ext4 has all the right hooks - if you use the "large" inodes, which appear to be done in a backward compatible fashion to ext2.

except edge cases in the user mode e2fsprogs/mkfs.ext4, etc. where it has to handle both small and large inodes it gets kinda complicated. I made a working patch, but it's just too icky. I think e2fsprogs needs to just deprecate the old small inodes and it would be clean.

Look man, it's Theo Ts'o maintaining the thing. Don't worry it won't be fixed correctly.

Also, I will point out that Y2038 was well known to DMR/BWK/etc. when they built the thing in 1970's. It's as useless as saying today "they should have known about the year 9223372036854776878 problem". In year 9223372036854776873 I don't know anything else other than everybody is going to panic.


> Look man, it's Theo Ts'o maintaining the thing.

A grim/realistic view, but Theo is 55yo. As we get closer to 2038, he may not care much about it anymore, or may even die before. I really hope it's someone else we talk about as maintaining it soon.


I worry about any technology that has a bus factor of one.[1]

And I'm sure there are several orders of magnitude more of them than _that_ xkcd cartoon implies...

[1] Except when that "one" is me. You're all welcome to solve any problems in code I leave behind, I no longer will care. Whether that's a "bus factor" or a "won the lottery factor".


agree with [1], been there.

But projects that have a bus-factor ~1 are often some of the tightest, best ones.

So ¯\_(ツ)_/¯, don't worry so much. Hopefully the owner leaves some keys and contingency plans, but otherwise, carry on as normal. The GPL and other open source licenses are good enough backup insurance if they didn't.


I generally prefer my direct imported Taiwanese loose leaf tea but this is a pretty good cup you guys have going here.


LOL, 55 years old is nearly dead? Ha!

Knuth is 85 and still active. Kernighan is 81, likewise.

Torvalds is 53 and he's just starting to mellow out and grow up (a little, not too much).

Anyone can get hit by a bus at any age. So the age at which you stop being productive or kick it is highly individualized. I don't know Theo Ts'o, but I have no evidence of his imminent demise.


I'm not saying that 55 is nearly dead. But 55-65 is the range when the death rate starts to really increase https://www.statista.com/statistics/241572/death-rate-by-age... 66 is also the retirement age and while some people want to keep doing what they're doing forever, others may prefer to move into a log cabin away from civilisation. So there's a good chance he'll be around, able, and dealing with the 2038 changes, but also a reasonable chance that he won't want to / be able to.


I can recommend libfaketime regarding this topic.

It can be used to change the system time for a single application only.

https://github.com/wolfcw/libfaketime


Once I started playing around with different system clocks and understanding their design, I realized that there's a lot of opportunity for bugs when writing system software related to clocks. One interesting bug that I have made myself a few times is to get the current epoch time in seconds, add some duration and consider that new value to be a timeout expiration that you can safely compare to the current clock value. It may or may not be obvious that there are several things that can go wrong here, and implementing your timeout as the expiration to some existing system call is almost always a better idea.
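If you do have to track the deadline yourself, one alternative (not the "pass the timeout to the system call" approach recommended above) is to base it on a monotonic clock instead of epoch time. A minimal sketch, assuming Python's `time.monotonic`; the `Deadline` class and its injectable `clock` parameter are hypothetical names for illustration:

```python
import time


class Deadline:
    """Timeout tracked on a monotonic clock, so wall-clock changes don't affect it.

    Hypothetical helper; `clock` is injectable only to make it testable.
    """

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self._clock = clock
        # Monotonic readings only make sense relative to each other,
        # never as calendar time.
        self._expires_at = clock() + timeout_seconds

    def remaining(self):
        """Seconds left before expiry, never negative."""
        return max(0.0, self._expires_at - self._clock())

    def expired(self):
        return self._clock() >= self._expires_at
```

Whether a monotonic clock keeps counting while the machine is suspended depends on the platform, so this does not settle the hibernate case by itself.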


Yep, three main things that can go wrong apart from overflow:

1) daylight savings changes the clock by 1h

2) leap seconds

3) user changes the clock. This one is fun. Gathering telemetry for a big website, I sometimes see operations that were supposed to take a few seconds taking weeks. Most likely explanation is: user hibernates the machine in the middle and restarts several days later, or user changes the clock. That's also why using averages in data analysis tools and not percentiles will kill your data reliability when such an outlier occurs.
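To put numbers on the averages-vs-percentiles point, here is a small sketch with made-up durations: five normal samples of a few seconds plus one "two weeks" outlier of the hibernate/clock-change kind. The data and the `summarize` helper are invented for illustration.

```python
import statistics

# Made-up operation durations in seconds; the last one is a two-week outlier.
TWO_WEEKS = 14 * 24 * 60 * 60
durations = [2.0, 3.0, 2.5, 4.0, 3.5, float(TWO_WEEKS)]


def summarize(samples):
    """Return the mean and the median (50th percentile) of the samples."""
    return {
        "mean": statistics.mean(samples),
        "median": statistics.median(samples),
    }


summary = summarize(durations)
# The single outlier drags the mean to over two days,
# while the median stays at a few seconds.
```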


I wanted to use Parallels Desktop for some debugging for an extra couple weeks on the trial plan, so I wrote a script to set my host machine's clock back a year, spawn Parallels, watch for it to end, then reset my clock.

Worked like a charm.


Did this with libfaketime on an IDA trial.


This has the unfortunate side effect of breaking certificate validation which adds another layer of complication to testing. Lots of services and IoT devices are going to go untested until the validity window passes the rollover if any testing is in place at all.


I'm not sure it will be possible for the devices to even get to a valid window without software updates - most of the shipped CA root certs expire prior to 2038.


Well, that was certainly interesting!

I was curious what the current state of things is, so I had a look around LWN. It looks like, as of 2020, the kernel's post-2038 support is essentially done[0]. With kernel 5.15, XFS' support for post-2038 dates is no longer experimental[2]. That seems to be the last mention of post-2038 support settling out in the kernel. The musl libc also has switched over, in a breaking change[1].

glibc seems to have taken a much more nuanced—and much more complicated—approach: You specify, as a `#define` (or compiler command-line `-D`), that you want a 64-bit time, and all existing types (like time_t) are updated accordingly (by redefining them as aliases to ex. __time64_t)[3].

This reminds me a lot of how glibc handled the switch to 64-bit file sizes on 32-bit systems: You either defined `_LARGEFILE64_SOURCE` to get access to 64-bit types & calls (like off64_t), or you defined `_FILE_OFFSET_BITS=64` to redefine the existing types & calls to their 64-bit equivalents [4].

I wonder what the current status of post-2038 work is for Debian?

[0] https://lwn.net/Articles/838807/, "After years of work, the kernel itself is now completely converted to using 64-bit time_t internally…"

[1] https://musl.libc.org/time64.html, "musl 1.2.0 changes the definition of time_t, and thereby the definitions of all derived types, to be 64-bit across all archs. … Individual users and distributions upgrading 32-bit systems to musl 1.2.x need to be aware of a number of ways things can break with the time64 transition, most of them outside the scope and control of musl itself."

[2] https://lwn.net/Articles/868221/, "XFS now supports filesystems containing dates after 2038. This support has been present for a while but is no longer considered experimental. "

[3] https://sourceware.org/glibc/wiki/Y2038ProofnessDesign, "In order to avoid duplicating APIs for 32-bit and 64-bit time, glibc will provide either one but not both for a given application; the application code will have to choose between 32-bit or 64-bit time support, and the same set of symbols (e.g. time_t or clock_gettime) will be provided in both cases."

[4] https://www.gnu.org/software/libc/manual/html_node/Feature-T...


A number of older file systems won't work past 2038, including ext3, reiser3 and xfs4[1]. You'll probably still be able to read them fine, just not write new files beyond that date, but it might be best to migrate any backups to a newer filesystem before then just in case, and certainly migrate any live systems that are still using them.

[1] https://lwn.net/Articles/886708/


That's good advice when migrating. I'm sure 99% of the currently running servers and services will be migrated (or updated, upgraded) before 2038. Most don't store future timestamps, so there's 15 years to plan migration.


IIRC Debian plans to bootstrap new 32-bit-but-64-bit-time arches for some of the current 32-bit arches. Probably only armhf and i386 will get this treatment and possibly some of the unofficial arches like m68k. The other remaining 32-bit release arches (armel, mipsel) are likely to just get dropped. Cross-grading from the old arch names to the new ones will allow users to migrate old systems.


Given the state of armhf and i386 in Debian, I expect them to be dropped as release architectures before this plan will have materialized. As far as I'm aware there's barely anybody working on keeping them in okay-ish shape as-is, let alone rebootstrapping them with a 64-bit time_t.


What issues are you having with armhf and i386? They have about the same number of porters as any other arch (low single digits). Archive coverage is around the same or higher than other release arches. There are definitely people planning on rebootstrapping them, armhf more than i386 though, I've seen their work on IRC.

https://release.debian.org/testing/arch_qualify.html


Won't that arch change mean that old binaries are incompatible?

(I mean, I suppose that's half the point - if you want to verify that your system no longer has any software that suddenly breaks at a specific time, break it all now, but that strategy could also be really annoying.)


Right, that's the point: if you recompile a library with 64-bit-time, then it is ABI-incompatible with the 32-bit-time version of the library, so everything that uses the library would break unless you change the library ABI number and recompile everything that uses it. It would be possible to rename every single library that will be ABI-incompatible, but it would be a lot of work distributed across a lot of individual packages/maintainers and so a rebootstrap with a new system-wide ABI under a new arch name might be less work overall, especially when for many things just a recompile fixes it. There will still be a lot of stuff that needs patches though of course.

Some threads about Y2038 from Debian folks:

https://lists.debian.org/msgid-search/Y06EebE9aylMcSJk@alf.m... https://lists.debian.org/msgid-search/YzOs11RPkj30iFGj@atmar... https://lists.debian.org/msgid-search/CAK8P3a0EtmgDRbDzBhOOZ... https://lists.debian.org/msgid-search/20200204131410.GF3043@... https://lists.debian.org/msgid-search/87pnm7dpe6.fsf@mid.den... https://lists.debian.org/msgid-search/20170901235854.ds4hffu... https://lists.debian.org/msgid-search/54B989EC.1070704@p10li... https://lists.debian.org/msgid-search/CAC58tq_ZsjvTE6fgDWtw=... http://lists.debian.org/20100331022204.6338.63920.reportbug@...


Interesting. Reminds me of when the battle.net client seemed to bug out after I switched OS from deb to win11 and somehow my clock was always messed up. I had to fix my time (sync it again in the settings app), otherwise it seemed games wouldn't load.

Maybe they’re checking my location against time and IP? This was a while ago when I was trying out wow classic.


Your clock messes up because your Debian and Windows are disagreeing on whether your Real Time Clock should be set in local time (Windows) or UTC time (Linux).

fix: timedatectl set-local-rtc 1


Alternatively, tell Windows the RTC is in UTC: https://wiki.archlinux.org/title/System_time#UTC_in_Microsof...


That registry flag is routinely broken in various versions of Windows and hotpatches to the same version. It's not wise to rely on it.


All my home PC's dual booting linux/windows (3 I think) are set to think the hardware clock is UTC. Never had a problem, did that at least 5 years ago.

Now relying on that in a production environment ...


Strange, I've never had trouble with it on win10 + Linux. Maybe there's a (third party) driver that messes with the RTC for some reason that doesn't always activate?

Maybe I shouldn't be too surprised that Microsoft's support is broken.


Agreed. Don't bother trying to make Windows give Linux the time of day (pun totally intended), it never works, that's probably by design. Just give up and make Linux accommodate the Windows style system clock, has almost no issues.


That seems like the kind of thing that shouldn't be broken all the time given how common an issue this is. Perhaps it's not wise to rely on windows.


And that explains why I've had issues trying to use that reg flag. Thanks for the info!


are there any known nasty consequences from setting that? Like things I should expect?


The local time can be ambiguous during a daylight savings time transition. That is, if you use local time in the RTC, and turn your computer on during that ambiguous hour, it will not know which offset should be applied. I believe Windows "solves" that by storing state in the registry, which means that dual-booting with local time in the RTC might in some cases lead to the daylight savings time transition being applied twice (whenever the other operating system has already corrected the RTC clock, but Windows believes the correction hasn't been applied yet). Using UTC (or any local timezone without daylight savings time) avoids that issue, since there's no repeated hour.
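A small sketch of that repeated hour, using US Eastern time on 2022-11-06 as an assumed example (clocks went from 02:00 EDT back to 01:00 EST that night). Fixed-offset zones stand in for the real timezone database so the snippet is self-contained; `candidate_instants` is a made-up helper name.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets standing in for US Eastern daylight and standard time.
EDT = timezone(timedelta(hours=-4), "EDT")
EST = timezone(timedelta(hours=-5), "EST")

# What an RTC kept in local time would read during the repeated hour.
wall_clock = datetime(2022, 11, 6, 1, 30)


def candidate_instants(naive_local):
    """Return the UTC instants the same wall-clock reading could mean."""
    return [
        naive_local.replace(tzinfo=tz).astimezone(timezone.utc)
        for tz in (EDT, EST)
    ]


before_change, after_change = candidate_instants(wall_clock)
# Same RTC reading, two different instants an hour apart. Nothing in the
# reading itself says which one is meant, hence the extra stored state.
```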


probably not a problem for me given I'm in a jurisdiction without DST! Thanks for sharing that information


Glad to see the note about NetBSD and OpenBSD at the bottom of the post!

I don’t remember the specific details on how this was done, but I can remember the long discussions about how to make this possible without breaking compatibility (and how they went over my head at the time) and how far ahead they were from other systems.



I decided to test future dates on my VAX, running NetBSD, of course, after replacing the battery-backed Dallas DS1287:

https://twitter.com/AnachronistJohn/status/12198307902357954...

Hopefully people will fix 32 bit Linux soon. Band-aids are always less painful to remove than waiting until it's too late.


I seriously don't understand how anyone designs a web app that can't handle a date past 2038, let alone a friggin kernel. Look, I assume I'll be dead and someone else will have to deal with things, but that doesn't excuse me from responsibility. The last thing I need when I'm dead is having a million people screaming that I was a shortsighted idiot.


Timestamps in MySQL are 32-bit. It's default in a lot of web frameworks etc.


mysql 8.0.28 on 64-bit platforms supports timestamps up to the year 3001, if I'm reading it right[1]. But it's still mind-boggling someone would write a database that would crash in 2038.

[1] https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functi...


That's just converting unix timestamps. MySQL TIMESTAMP type only has a range of '1970-01-01 00:00:01' UTC to '2038-01-19 03:14:07' UTC.[1]

[1] https://dev.mysql.com/doc/refman/8.0/en/datetime.html
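That upper bound is exactly the largest value a signed 32-bit seconds-since-1970 counter can hold. A quick sketch of the arithmetic; the wrap-around shown is what a raw signed 32-bit integer does when pushed one past its maximum, not a claim about how MySQL or any other particular software behaves at that point. The helper names are made up.

```python
import ctypes
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1  # largest signed 32-bit value


def from_unix(seconds):
    """Convert seconds since the epoch to a UTC datetime."""
    return EPOCH + timedelta(seconds=seconds)


def wrap_int32(value):
    """Truncate an integer to signed 32 bits, the way a 32-bit field would."""
    return ctypes.c_int32(value).value


last_second = from_unix(INT32_MAX)                     # 2038-01-19 03:14:07 UTC
one_past = from_unix(wrap_int32(INT32_MAX + 1))        # 1901-12-13 20:45:52 UTC
```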


> Newer versions of the OS will almost certainly not behave this way, since glibc itself is marching down the road to having 64-bit time even on 32-bit machines.

Did I miss something? How does this work?


What wouldn't work? 64-bit integers are supported on 32-bit machines by the compiler. The thing is that, AFAIK, the generated machine code is a lot less efficient: it takes many more instructions even for basic operations, like muls and divs. But from a C language perspective, everything should be transparent.
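As a rough illustration of where the extra work comes from, here is a sketch of a 64-bit addition done only with 32-bit words: add the low halves, then add the high halves plus the carry. This is a model of the idea written in Python, not actual compiler output, and the function names are invented.

```python
MASK32 = 0xFFFFFFFF


def split64(value):
    """Split an unsigned 64-bit value into (high, low) 32-bit words."""
    return (value >> 32) & MASK32, value & MASK32


def join64(high, low):
    """Combine two 32-bit words back into one 64-bit value."""
    return (high << 32) | low


def add64(a, b):
    """Add two unsigned 64-bit values using only 32-bit-wide operations."""
    a_hi, a_lo = split64(a)
    b_hi, b_lo = split64(b)
    lo = (a_lo + b_lo) & MASK32
    carry = 1 if lo < a_lo else 0  # low word wrapped around, so carry into high
    hi = (a_hi + b_hi + carry) & MASK32
    return join64(hi, lo)
```

One add becomes two adds plus carry handling; multiplication and division need considerably more steps than that.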


Changing the size of a type wreaks havoc on ABI stability. glibc has attempted to change the size of things in the past with disastrous results. It sounds like changing this type would be very difficult. Not to mention code that does something like `int32_t time = gettime()`. (Although this code probably won't get much worse if that starts truncating)


Not if you at compile time redirect calls to e.g. time() to __time64()[1]. The musl libc does something similar[2], as do the Large File Support extensions for 64-bit filesizes[3].

1. https://sourceware.org/glibc/wiki/Y2038ProofnessDesign

2. https://musl.libc.org/time64.html

3. https://www.gnu.org/software/libc/manual/html_node/Feature-T...


The point is that as soon as a library stores a time_t in an ABI-relevant structure it breaks a lot of stuff. So yes, for an application only linked to glibc it works fine. But when you have different libraries linked with different versions it quickly falls apart.

For example see https://lwn.net/Articles/605607/
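A sketch of why an embedded time_t is the problem, using `ctypes` to model a made-up record struct (the struct and its field names are hypothetical, not taken from any real library). Widening the time field changes the struct's size and moves the field after it, so code compiled against one layout reads the wrong bytes from the other.

```python
import ctypes


class RecordTime32(ctypes.Structure):
    """Hypothetical library struct as built with a 32-bit time field."""
    _fields_ = [
        ("id", ctypes.c_int32),
        ("mtime", ctypes.c_int32),
        ("flags", ctypes.c_int32),
    ]


class RecordTime64(ctypes.Structure):
    """The same struct after the time field is widened to 64 bits."""
    _fields_ = [
        ("id", ctypes.c_int32),
        ("mtime", ctypes.c_int64),
        ("flags", ctypes.c_int32),
    ]


def layout(struct_type):
    """Return ({field name: byte offset}, total size) for a ctypes struct."""
    offsets = {
        name: getattr(struct_type, name).offset
        for name, _ctype in struct_type._fields_
    }
    return offsets, ctypes.sizeof(struct_type)


old_offsets, old_size = layout(RecordTime32)
new_offsets, new_size = layout(RecordTime64)
# Exact numbers depend on the platform's alignment rules, but `flags`
# lands at a different offset and the overall size grows.
```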


Indeed. And in cases like those, the application and any libraries it uses have to either agree on which time_t size to use, or perform the same rewriting as libc does. Getting libc on board is merely the first step.

The same goes for _FILE_OFFSET_BITS for obvious reasons (though embedding an off_t is probably somewhat less common).

Nobody said this would be easy. OpenBSD's "ABIs are for breaking"-approach resulted in them fixing this very problem almost 10 years ago: https://marc.info/?l=openbsd-cvs&m=137637321205010&w=2.


Another idea: disconnect from your hardware vendor and see what breaks.



