
Probably floating-point oddities. The bane of deterministic programming everywhere.



Quake 1 & 2 largely used fixed point arithmetic, for speed, AFAIK.


They've got fixed point code in a few spots (especially where it makes sense if you're e.g. indexing into a texture), but the original Quake source really says float all over the place.


Floating point is deterministic.


I am one of the authors of Fire Fight game, which had the same multi-player model as Doom.

Main source of multi-player sync loss was a differing number of rand() calls between machines, e.g. a different animation sequence for your own character vs. an enemy's. If your own animation makes one rand() call too many, the next shared rand() returns different values on your machine vs. the enemy's machine. And you essentially start playing two different games together ;-)
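A toy illustration (not the Fire Fight code, just the mechanism): two "machines" run the same deterministic generator from the same seed, one of them makes a single extra call, and every value after that disagrees.

    #include <stdio.h>

    /* Tiny LCG standing in for rand(); each "machine" has its own state. */
    static unsigned next_value(unsigned *state)
    {
        *state = *state * 1664525u + 1013904223u;
        return *state >> 16;
    }

    int main(void)
    {
        unsigned machine_a = 42, machine_b = 42;         /* same seed: in sync */

        next_value(&machine_a); next_value(&machine_b);  /* both consume one value */
        next_value(&machine_b);                          /* B alone consumes an extra one */

        for (int i = 0; i < 3; i++)
            printf("A=%u  B=%u\n", next_value(&machine_a), next_value(&machine_b));
        /* From here on A and B never agree again: two different games. */
        return 0;
    }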

That being said, we fixed all bugs. There were no additional checks to synchronize state other than keyboards. It was way easier than we anticipated!

Good times!


That was what, 1996?

Unfortunately, many of the edge cases that cause non-deterministic behavior have gotten substantially more exploited by compilers since then.


I really want to disagree. Assuming the code is correct:

- proper handling of system calls and EINTR,
- not calling async-signal-unsafe functions in signal handlers,
- proper synchronization to avoid race conditions,
- etc.

it will work just fine.
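For the first point, a minimal retry-on-EINTR sketch (a generic POSIX idiom, not code from the game):

    #include <errno.h>
    #include <unistd.h>

    /* Like read(), but transparently retries if the call was interrupted
       by a signal before any data was transferred. */
    static ssize_t read_retry(int fd, void *buf, size_t count)
    {
        ssize_t n;
        do {
            n = read(fd, buf, count);
        } while (n == -1 && errno == EINTR);
        return n;
    }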

The problem with today's systems does not lie in compilers. It lies in the vastly increased complexity of the code itself.

Back in '96 we had a few sprites, with some animation sequences and sound effects. The state of an object was kept in a few integers. Collisions were done using a really stupid algorithm that would probe a special mask from the level builder with a circle consisting of 16 points, so we could get a vector pointing away from the wall. Etc.
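A hypothetical reconstruction of that probe (names and details are mine, not the original code, which per a later comment used their own fixed-point library; floats here are just for readability): sample 16 points on a circle around the object against the level's collision mask and sum the directions of the solid hits, negated, to get a push-away vector.

    #include <math.h>

    #define PROBE_POINTS 16

    typedef struct { float x, y; } vec2;

    /* Assumed helper: returns nonzero if the level mask is solid at pixel (x, y). */
    extern int mask_solid(int x, int y);

    /* Returns a vector pointing away from nearby walls; zero means no contact. */
    vec2 push_away_vector(float cx, float cy, float radius)
    {
        vec2 push = { 0.0f, 0.0f };
        for (int i = 0; i < PROBE_POINTS; i++) {
            float angle = (float)i * (2.0f * (float)M_PI / PROBE_POINTS);
            float dx = cosf(angle), dy = sinf(angle);
            if (mask_solid((int)(cx + dx * radius), (int)(cy + dy * radius))) {
                push.x -= dx;   /* a solid probe pushes in the opposite direction */
                push.y -= dy;
            }
        }
        return push;
    }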

Today, collisions, physics, presentation - all this is thousands of times more complex. More code, more bugs, more issues.

For that reason, it would be very difficult to write keyboard-state driven multiplayer today.


Ah, you weren't using floating-point. That makes things substantially easier.

W.r.t. floating-point, good luck. See https://news.ycombinator.com/item?id=9430958


Ah, yes, you are correct. We built our own fixed point library.


Couldn't you reset the rand seed every now and again, say every 10 seconds, on every machine, without too much of an effect?


Nope. Once sync was lost, you were playing two different games: player A on machine A is in room 1, while player A on machine B is in room 2. Resetting the rand sequence would not correct the error that had already been introduced.

The only way to correct such an error would be to change the model to sync the actual state among machines. But that means completely different code.

All in all, I remember we added the multiplayer code at the very end. If we had decided to sync state rather than exchange keyboard+mouse input, we would not have finished on time.


So are rand() and similar functions. The point is that Heisenbugs and such at the app level are hard to debug because behavior is not absolutely repeatable under what seem to humans to be identical circumstances.

That most certainly gives the appearance of non-determinism.

And floating point is infamously tricky to deal with, which I think is the obvious point here, without arguing over terminology.


Different machines have different internal precision. See http://gafferongames.com/networking-for-game-programmers/flo...

Games I've worked on have used libraries e.g. http://nicolas.brodu.net/programmation/streflop/index.html


If only. Even something as simple as this, where x and y and a are floating-point values:

    if( a == x/y )
      assert( a == x/y );
Is legally allowed to trigger (!). Worse, it's also legally allowed to not trigger.

That's right. The above can be legally compiled to always trigger, or never trigger, or even to sometimes trigger (although this is rare in practice).

Why? C / C++ allow floats to be stored with excess precision, and they lose that excess precision when spilled to RAM. And GCC isn't deterministic about it. So x/y may be pulled out as a common subexpression and spilled to memory after the first usage, and it triggers. And then the next time it isn't, and it doesn't. Effectively, it's the difference between this:

    temp = x/y
    if( a == temp) {
      spill(temp);
      assert( a == temp );
    }
and this:

    temp = x/y
    if( a == temp) {
      assert( a == temp );
    }
Yes, it'll generally (though the standard doesn't dictate that it will) be deterministic in the sense that a given example in a single binary always triggers or never triggers. But that's irrelevant, as binaries aren't cross-platform, and you aren't always loading exactly the same binaries of the dynamic libraries you've linked against.
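A complete, compilable variant of the pattern, purely as a sketch: whether the assert can actually fire depends entirely on the compiler, the flags, and whether x87 or SSE math is in use (32-bit x86 GCC with x87 math and optimization is the classic setting; with SSE math or -ffloat-store you generally won't see it).

    #include <assert.h>
    #include <stdio.h>

    /* 'volatile' keeps the compiler from folding the division at compile time. */
    volatile double x = 1.0, y = 3.0;

    int main(void)
    {
        double a = x / y;        /* nominally rounded to 64 bits */
        if (a == x / y) {        /* may compare against an 80-bit temporary... */
            assert(a == x / y);  /* ...which may or may not survive to here */
            puts("consistent this time");
        } else {
            puts("the two evaluations of x/y already disagreed");
        }
        return 0;
    }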

You can (most of the time) work around this by manually setting the FPU precision (note that you need to limit the precision of both the mantissa and the exponent, as otherwise you can still end up with issues). Except that, in C++ at least, you have to work deep magic: you have to set it before main even runs, because otherwise you can hit problems during static initialization. And in the process you lose compatibility. And you find that random code you call occasionally resets things. Or random code you don't call (a classic example: the printer driver!). And you have to use exactly one of float or double throughout. And you have to hope that none of the libraries you use ever resets the FPU precision, or uses the other of float or double itself. (And hope that your OS properly saves / restores the FPU state everywhere, though this has largely become a non-issue.)

And, worst of all, you've got to hope that the compiler performs constant folding with the same level of care as you've just taken.
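For reference, the precision-setting part usually looks roughly like this on x86 with glibc (a sketch only: the header and macros are glibc- and x87-specific, MSVC has _controlfp instead, and none of this affects SSE math):

    #include <fpu_control.h>   /* glibc, x86 only */

    /* Force the x87 FPU to round to 53-bit (double) precision.  This only
       narrows the mantissa; the extended exponent range remains, and any
       library (or printer driver...) can silently change the control word back. */
    static void set_x87_double_precision(void)
    {
        fpu_control_t cw;
        _FPU_GETCW(cw);
        cw &= ~_FPU_EXTENDED;   /* clear the precision-control field */
        cw |= _FPU_DOUBLE;      /* select 53-bit precision */
        _FPU_SETCW(cw);
    }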

Look at these:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=323

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=37845

http://yosefk.com/blog/consistency-how-to-defeat-the-purpose...

http://gafferongames.com/networking-for-game-programmers/flo...

http://www.yosoygames.com.ar/wp/2013/07/on-floating-point-de...


Yes, we got bitten by this when using an algorithm with a k-means clustering step in the middle; minor perturbation in the least significant bits could change which cluster an item got assigned to, which would then result in massive changes to the eventual output.

(Relatedly, several ASIC and FPGA tools suffer from this kind of nonlinearity so badly that they let you specify the random seed at the start and/or do multiple runs with different seeds to pick the best result.)


That bad? Yow.


This is like a case study in why any computer architecture from the metal to the top should try like hell never to lose determinism.

If only to have deterministic test cases...


Yep, and we (mostly) manage to have determinism all the way from the metal up... Only to lose it at the language level.


Which is why you should never use == with floats/doubles, but always <=, >=, <, > and why modern compilers have warnings you can enable to tell you when you do ==.
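For example, in GCC and Clang the relevant warning is -Wfloat-equal; a minimal snippet it flags:

    /* Compile with: cc -Wfloat-equal -c example.c
       GCC and Clang warn about the '==' comparison below. */
    int exactly_equal(double a, double b)
    {
        return a == b;   /* -Wfloat-equal flags this comparison */
    }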


While true, that's irrelevant to my point. The same thing occurs with this:

    if( a <= x/y )
      assert( a <= x/y );
Or with any other of the operators you mention (<, <=, >, >=).


Not necessarily. The x87 implementation (which is relevant to everything PC, prior to SSE) of IEEE 754 internally uses an 80-bit extended precision representation for floating point numbers. Numbers are rounded to 64 bits for loads and stores of course...

Now, what do you suppose happens during a context switch, when a second process wants to use floating point math?


You can save/load 80-bit values to memory with the regular x87 instructions, it's just called "extended precision". And the accelerated "save FPU state" instructions save the full 80-bit state. Otherwise it would be pretty nutty.

Where you can get unpredictable (though still repeatable) behaviour is when your compiler spills into 64-bit memory slots, and does it a little differently with each little modification to the code.


For distributed scenarios where the architecture, language and library are unspecified and effectively random, it is not. Not so long ago x87 80-bit precision was common, while non-x87 systems typically used single and double precision. Finally, IEEE 754 does not specify transcendental functions, so once you use those your results vary all over the place as the approximations vary. The fact that Go does not use the CRT for FP will demonstrate this in the real world.
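A quick way to see the transcendental-function point (a sketch; the exact digits depend on the libm, so any two systems may or may not disagree) is to print results to full precision and compare across machines:

    #include <math.h>
    #include <stdio.h>

    int main(void)   /* link with -lm on most Unix systems */
    {
        /* Enough digits to expose the last bits; different libm
           implementations may round these final bits differently. */
        printf("%.17g\n", sin(1e10));
        printf("%.17g\n", exp(0.1));
        return 0;
    }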


On a single chip, yes. But different chips handle some corner cases differently, leading to divergence between machines.


Actually, with C / C++ it can be different even with the same chip. It can even be different using the same binary and still be standard-compliant.

Something like this:

    if( a == x/y )
      assert( a == x/y );
Is allowed to assert (!). It's also allowed to not assert. Worse, the following is a legal "optimization" of the above code:

    if( a == x/y )
      if ( getRandomBooleanFromSomewhere() ) 
        assert( a == x/y );
(I'd use rand, but I don't want to get into matters of state here. Just assume that getRandomBooleanFromSomewhere, well, pulls a random boolean from somewhere. No side effects.)

Why, you might ask? It's allowed to use extra precision, but not required to. And it's not required to even be consistent about it. So: if it uses extra precision, and it decides to lose the extra precision (say, by spilling x/y to the stack) between the first and second uses, the assert could potentially fire. If it doesn't do both of those, the assert cannot fire. So it's legal to remove the assert, but also legal to keep it. And also legal to randomly decide which to do at runtime.

Now, this is a stupid thing for it to do in this case. But nonetheless, it is allowed. And there are cases where analogous optimizations actually make a plausible amount of sense.

You can work around it to an extent with FPU flags. But that way lies madness. To put it mildly. ("It desyncs when I print a document.")


AIUI the C standard mostly regards floating-point as out of scope / platform-specific. Your vendor should (hopefully) provide stricter specs for how it works on your particular machine.


The particular machine tends to be deterministic.

It's the compiler that's mainly the problem here. Or rather, the language standard.



May not be the same on different machines.


May not even be the same on the same machine. At least if you're using C or C++ (or anything that ever passes through C or C++, i.e. almost everything.)


Deterministic but too hard to predict is a form of non-determinism, as far as humans are concerned.


It's not even deterministic.

At least for anything that ever passes through C or C++, or uses code that does. (I.e. everything, close enough.)

See: https://news.ycombinator.com/item?id=9431372



