gettimeofday() should never be used to measure time (pp.se)
95 points by aethertap on Jan 18, 2015 | 41 comments



Except that when doing a pcap, I actually want the actual time of day. Maybe in some cases you'd explicitly want an offset, but not in general. It's really a plea to get your clocks synced up, so you aren't forced to choose between reporting an incorrect time or an incorrect duration. If I'm running a pcap and the system time drifts by several seconds over a day, I'd prefer each packet to report the closest thing to the right time instead of being further off as time goes by.

Not to mention: if the monotonic clock could keep such accurate timing, then everyone would just use that and NTP would not be so necessary.

Really: under what conditions do you have a usefully functioning system when the clock is so far off that you need multi-minute jumps? Even Hyper-V, with the utterly atrocious w32time, manages to keep it within a minute or two (and a Linux guest can easily have ~ms accuracy).

The leap second point is valid, but that's an argument against leap seconds, which serve no purpose in today's society other than to introduce unnecessary problems. Even Google just gives up and deliberately introduces inaccuracies in their clocks for a day so that when the leap second comes around they're synced again. A leap hour would be a far better solution: it's something many people are (unfortunately) used to from DST, and it wouldn't bother us for a dozen centuries.


> Under what conditions do you have a usefully functioning system when the clock is so far off that you need multi-minute jumps?

One example is embedded systems. Many don't have an RTC, or boot after the RTC has lost power. If a network connection finally comes up, NTP will instantly fast-forward the clock by years.


The simple solution would seem to be setting the clock first then doing the packet capture instead of setting the clock in the middle of the packet capture.


For Debian-based embedded systems, the fake-hwclock package is helpful here (it's a script that periodically saves the current time and restores it on boot). You'll still have big jumps after a power loss, but probably not years. It's also helpful in case you ever change the motherboard on a regular system with an RTC.


Many embedded systems don't have any writable durable storage.


> if the monotonic clock can keep such accurate timing, then everyone would just use that and NTP would not be so necessary.

Stable and accurate are not the same thing.

And if you have two NTP servers, and one is off (example from the article), then yes multi-minute jumps do happen. A misconfigured NTP server caused TCP sessions in a load balancer to drop and ping commands to just hang. That is not OK.

Side-note: The misconfiguration in that example was "the NTP server ran on a virtual machine, where timer ticks from hardware to VM drifted, and the time difference to upstream became so big that the NTP daemon stopped trusting it and ignored it". I'm not defending that design, I didn't do it, and I fixed it when I found it.


I wish that HN would use the Public Suffix List (https://publicsuffix.org/) in its algorithm to display domain names of submissions. That way, we wouldn’t get things like this, where the domain given (pp.se) does not say anything about what the actual site is.


I think the HN admins/devs will have a better chance of seeing your suggestion if you send it to hn@ycombinator.com.

Edit: Good suggestion BTW :)


OS time handling is feeble across the industry. It's inherited from 20-year-old ideas about APIs. Not just the antique time structures that are useful for rendering but not much else; also the abominable Sleep() and such.

Imagine you want to do something every second. You Sleep(1000) or some such. But it takes time to do the thing, so it's actually a bit longer between loops. Maybe it doesn't matter; maybe it does. But you're stuck doing stuff like that.

Why not Wait(timetowaitfor)? Not a duration; the actual time you want to be woken up. Now it still takes time to wake up and run. And it takes time to make the call. But now, your stuff actually runs, say, 60 times per minute (e.g. if you wait for successive seconds), hour after hour and day after day.

Also, what's with the limited resolution on the time? It's due to the common implementation of timers as a counter of ticks, where a tick is whatever regular interval some hardware timer is set to interrupt at. Why not instead interrogate a free-running counter? And if I want to wait 1 second plus 150 nanoseconds, then I Wait for that time to arrive, and the library (or OS) sets a real timer interrupt to go off when that time has arrived. Sure, there's latency in calling me back; that's inevitable. What's not inevitable is some limited multi-millisecond tick resolution.

Anyway, whenever I'm in charge of designing an OS or application environment, I provide real timers like this. It's about time the big OS providers catch up to the 21st century.


> Why not Wait(timetowaitfor)? Not a duration; the actual time you want to be woken up. Now it still takes time to wake up and run. And it takes time to make the call. But now, your stuff actually runs, say, 60 times per minute (e.g. if you wait for successive seconds), hour after hour and day after day.

There are a lot of "what ifs" that need to be answered for something that seems so simple:

* What clock are you using? Machine ticks? Wall clock? Is it correct? Is it stable enough?

* What if the clock misses the time that I asked for? Do you run it anyway? Do you skip that invocation?

* What if the clock moves backwards? Will you trigger twice? Will you even notice?

* What if I have a leap second so there are 61 seconds in the hour? What if a second is removed so there are 59 seconds in the hour?

The reason people don't touch this stuff, or get it wrong, is that it's really hard. There are a lot of corner cases when it comes to time handling.


Really hard things are what OS code is FOR. Get it right once; then apps call it and it works.


You can already do this in POSIX.1-2001-conforming systems like Linux using `clock_nanosleep` (http://man7.org/linux/man-pages/man2/clock_nanosleep.2.html).


Still just a Sleep(), right? So the latency in the callback gets added to the next interval, with resulting drift. The idea is, I know when I want to be called back, as an actual number. I want a sleepuntil() that I can pass the actual time to.


Unless I misunderstood, that's what the TIMER_ABSTIME option is for. This is actually what I use for soft real-time loops, when I'm concerned about both drift and jitter.
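
Roughly like this (a minimal sketch, assuming POSIX clock_nanosleep(); error handling omitted, and the function names are just illustrative):

    /* Minimal sketch: a drift-free once-per-second loop using an absolute
       deadline on CLOCK_MONOTONIC. */
    #include <time.h>

    void every_second(void (*do_work)(void)) {
        struct timespec next;
        clock_gettime(CLOCK_MONOTONIC, &next);
        for (;;) {
            next.tv_sec += 1;  /* advance the deadline, not "now" */
            clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
            do_work();         /* run time of do_work() does not accumulate */
        }
    }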


Ok thanks, I reread it, that seems to be the case! I hope the seconds field is honored along with ns. WinCE had a bug where they used a long. If latency made your call for a small interval occur after the interval had elapsed, you slept for 19.5 minutes instead (nearly a full count). Combining the ns with a second counter should be enough bits to avoid that.


You probably want to look at setitimer.


Why? That still uses intervals instead of real time indexes.


It's a similar situation on iOS, where new developers sometimes use (in Objective-C) `[[NSDate date] timeIntervalSince1970]`, which is natural, but wrong. NSDate draws from the network-synchronized clock and will occasionally hiccup when re-syncing against the network, among other reasons.

If you're looking to measure relative timing (for example for games or animation), you should instead use `double currentTime = CACurrentMediaTime();`. That's the correct way.


Ironic that the "What to use instead" part doesn't even feel the need to check return values of functions.

Am I missing something ?


Example code often skips error handling, because that's not the point.

Of course clock_gettime() should have its return value checked, just like you always check the return value of gettimeofday() and time().


In an article that tries to tell us we're doing it wrong, doing it right would be nice.


According to the API, clock_gettime() can only error out for two reasons: EFAULT when the second argument points to invalid memory, or EINVAL when the clock is not supported on the system.

The first is ensured in the code; the second can be ensured by, for example, checking for the existence of CLOCK_MONOTONIC in an initialization function, which we don't see in this example.

So I'd say that this is a case where you can forgo checking the return value without feeling bad about it.

The same goes for sleep/usleep: all possible situations (sleeping correctly, and being interrupted) are handled correctly by the code (if you don't intend to handle signals, of course...).
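
If you did want that initialization check, a minimal sketch might look like this (the function name is just illustrative):

    /* Verify once at startup that CLOCK_MONOTONIC works, so later
       clock_gettime() calls can skip the return-value check. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    static void check_monotonic_clock(void) {
        struct timespec ts;
        if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0) {
            perror("clock_gettime(CLOCK_MONOTONIC)");  /* e.g. EINVAL: unsupported clock */
            exit(EXIT_FAILURE);
        }
    }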


Let's talk about the sad state of clocks today. There are a few ways to query NTP time on Linux: (1) directly from the NTP daemon, (2) the adjtimex syscall, and (3) the ntp_gettime call. I found it hard to find many codebases using these properly. In fact, codebases that need reliable time, like Cassandra and OpenLDAP, don't use the NTP time APIs to check whether the system clock is in sync, or to get accurate time. Even if we were to make PTP accessible to the world, it would be some time before its usage actually became ubiquitous. The understanding of timekeeping and clock behavior in our community is a sore point.
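
For reference, a rough sketch of querying the kernel's NTP sync state on Linux via adjtimex (read-only mode; constants and fields as in sys/timex.h):

    /* A return value of TIME_ERROR means the clock is not synchronized;
       esterror/maxerror are in microseconds. */
    #include <stdio.h>
    #include <sys/timex.h>

    int main(void) {
        struct timex tx = { .modes = 0 };  /* modes = 0: read-only query */
        int state = adjtimex(&tx);
        if (state == -1) {
            perror("adjtimex");
            return 1;
        }
        printf("clock %s synchronized, est. error %ld us, max error %ld us\n",
               state == TIME_ERROR ? "is NOT" : "is", tx.esterror, tx.maxerror);
        return 0;
    }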


I think NTP usually synchronizes the system time, so programs don't have to use any NTP-specific API to get NTP time. PTP won't help here: all it does is increase the accuracy from milliseconds to microseconds, which is not much use if the Linux scheduler tick is 1 ms. PTP time is mainly useful together with hardware event timestamping, where stuff like interrupt latency can be excluded. If you want, for example, to send a frame every two seconds from the moment the device boots, you should still use CLOCK_MONOTONIC; otherwise you produce a glitch when PTP ramps up.


I think the article and much of the discussion miss a larger point: time is hard, and very, very hard when there are multiple systems with different clocks. The APIs are the way they are because there just aren't good solutions, especially since all systems ultimately have unreliable connections to good time sources.

The miserable APIs are New Jersey/Worse-is-better answers to intractable problems.


That's just not the point of this article.

Basically, my takeaway from this is: if you /can/ avoid the complexity by dealing purely with relative timings (if something takes 2 seconds, then it takes 2 seconds, even if one of them is a leap second), then you /should/.

And the best way to do that is by using the techniques mentioned in the article.


This is but one chapter of what could be a huge book on time programming, I agree.

But it is self-contained advice that should be adhered to.


the semantics you'd like the OS + standard library to provide would be some kind of gettime() call that returns a time thingie, and a secondsbetween(a,b) call that reliably tells you the time between the two time thingies.

the fact that it doesn't already work this way is a design fail.

all the nonsense about NTP and clock slew and monotonicity are implementation details that should be hidden below this layer.


The last time I tested them, on Red Hat 6, clock_gettime(CLOCK_REALTIME) and gettimeofday were slightly faster than clock_gettime(CLOCK_MONOTONIC), and gettimeofday is much faster than any clock_gettime on older platforms.
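
For anyone who wants to re-run this, a rough micro-benchmark sketch (the function name is just illustrative; results will vary with kernel, vDSO and clocksource):

    /* Average cost of back-to-back clock_gettime() calls for a given clock. */
    #include <stdio.h>
    #include <time.h>

    static double ns_per_call(clockid_t clk, long n) {
        struct timespec start, end, tmp;
        clock_gettime(CLOCK_MONOTONIC, &start);
        for (long i = 0; i < n; i++)
            clock_gettime(clk, &tmp);
        clock_gettime(CLOCK_MONOTONIC, &end);
        return ((end.tv_sec - start.tv_sec) * 1e9 +
                (end.tv_nsec - start.tv_nsec)) / n;
    }

    int main(void) {
        long n = 10 * 1000 * 1000;
        printf("CLOCK_REALTIME:  %.1f ns/call\n", ns_per_call(CLOCK_REALTIME, n));
        printf("CLOCK_MONOTONIC: %.1f ns/call\n", ns_per_call(CLOCK_MONOTONIC, n));
        return 0;
    }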


This isn't even right....

The correct call is to clock_gettime(CLOCK_MONOTONIC_RAW...


Well, that depends. CLOCK_MONOTONIC_RAW does avoid NTP slewing, but NTP doesn't just adjust the time scale to adjust for offsets - it also attempts to correct for frequency error in your system clock itself. If you use CLOCK_MONOTONIC_RAW, you don't get that correction; you get something that on desktop hardware can easily be tens of ppm off from actual time. Of course, your MONOTONIC_RAW clock is still subject to the vagaries of physics; as temperature or system load changes, depending on the quality of your time source, you might get significant changes in the rate of CLOCK_MONOTONIC_RAW as well (which NTP will correct for in CLOCK_MONOTONIC, given enough time to adjust).
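
A rough way to see the difference on a Linux box (sketch only; the printed delta reflects the frequency correction applied to CLOCK_MONOTONIC but not to CLOCK_MONOTONIC_RAW):

    /* Sample both clocks across the same interval and compare how much
       each advanced. */
    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    static double elapsed(const struct timespec *a, const struct timespec *b) {
        return (b->tv_sec - a->tv_sec) + (b->tv_nsec - a->tv_nsec) / 1e9;
    }

    int main(void) {
        struct timespec m0, r0, m1, r1;
        clock_gettime(CLOCK_MONOTONIC, &m0);
        clock_gettime(CLOCK_MONOTONIC_RAW, &r0);
        sleep(60);
        clock_gettime(CLOCK_MONOTONIC, &m1);
        clock_gettime(CLOCK_MONOTONIC_RAW, &r1);
        printf("MONOTONIC %.9f s, MONOTONIC_RAW %.9f s, diff %.1f us\n",
               elapsed(&m0, &m1), elapsed(&r0, &r1),
               (elapsed(&m0, &m1) - elapsed(&r0, &r1)) * 1e6);
        return 0;
    }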


You usually don't want RAW. The speeding up/slowing down of the clock is a good thing.

Hardware clocks don't always tick at one second per second.


    while (ts_remaining.tv_nsec > 1000000000) {
        ts_remaining.tv_sec++;
        ts_remaining.tv_nsec -= 1000000000;
    }
should be

    int secs_in_nsec = ts_remaining.tv_nsec / 1000000000;  /* whole seconds carried in tv_nsec */
    ts_remaining.tv_sec += secs_in_nsec;
    ts_remaining.tv_nsec %= 1000000000;
Right? I mean maybe he microbenchmarked it and looping is faster because no div or mod, but intuitively this seems like it would be better. If the loops are a result of benchmarking, it should probably be called out in comments.


I sometimes deliberately write really slow, stupid C code when dealing with time or other things that are both extremely important and (if I'm honest with myself) unlikely to be tested exhaustively.

Apple has really awesome developers who don't need to do stuff like this, and that's probably why iPhone alarms fail to go off every other leap year and reliably sound at 2 AM on January 32nd of years ending in '3'. Time-related code is like rolling your own encryption, in a sense. It's a trap for amateurs and pros alike.


I don't really understand how ts_remaining.tv_nsec > 1000000000 could ever be true. I thought we had a-b, with a < 1000000000 and b >= 0.

For the negative check, the loop should only be able to run once, so could be changed to an 'if' for clarity. Regarding the timing, branching is probably faster.

It is my understanding that the C percent operator can produce a negative number, by the way, so your code wouldn't work.


OP or mods: please add [2010] tag to the title.


If adding the year to every post that's not "news" is to become a convention, it should be added to the guidelines. I doubt this is going to happen because, really, what difference does it make for an article like this?


Old posts tend to contain outdated information, although I don't think that's the case here. The real issue is that almost every old post has already been discussed on HN previously, and posting it again just wastes time by restarting the discussion from scratch.


On articles more than a year old, I find the year helps me know at a glance whether a familiar-sounding title is something I've read before or fresh content with a coincidentally familiar headline.


getyearofpost() should never be used to measure timeliness


Ha!

Indeed. I'll go back and update posts if they are no longer true.

/Author



