1500 Archers on a 28.8: Network Programming in Age of Empires and Beyond (2001) (gamedeveloper.com)
196 points by larsiusprime on Oct 16, 2021 | 72 comments



The article seems to have been cut short. When I viewed it, it ended mid-sentence, at the words "None of".

Here's an older snapshot from back when the site was still called Gamasutra, which includes the full text of the article:

https://web.archive.org/web/20180719170411/https://www.gamas...


I'll contact the editor in chief and see if it can be fixed.


Thank you :)


Oh, I didn't realise they had renamed themselves until I saw this post. Having read their article on why, the reasoning makes sense, though I'd never made that connection before. They could have picked a less generic new name, though; it feels a bit like the naming equivalent of the flat logos that currently have an article on the front page too.


I don't know if Paul was the only guy working on the network code, but I believe he was responsible for most or all of it.

He wrote that code when he was 18 or 19 years old I believe. But unlike some of the other devs at Ensemble, he never had a big ego, and he was just generally nice and humble.

He (much) later created the early mobile hit Words With Friends, which went on to spawn many copycats.


As a fairly high rated AoE2 player (~1800), I can attest that the network architecture has not stood the test of time. Lockstep causes tonnes of problems. Desynchronisation is still possible; when that happens the simulation needs to backtrack (I think?), which can cause units to jump multiple tiles. Because your machine is resolving the movements of other players, you run into problems as the player count grows. The wire protocol is widely known and easily hacked, in a way that's hard to detect in game.

AoE2 DE is still lockstep, but through a server: all players are connected to the same server. This has slightly improved on the previous P2P setup, but it hasn't really addressed the underlying problems.


> Desynchronisation is still possible; when that happens the simulation needs to backtrack (I think?)

Sounds more like client-side prediction to smooth things over than actual simulation desync. I have a hard time believing a deterministic game with such a large state was able to backtrack and resync the sim back to determinism. I did not think that lockstep RTS games would need client-side prediction (the indirect and long-term commands in RTS help hide latency), but I guess if your gameplay lends itself to high APMs then it becomes necessary.


Well, you could just take state snapshots at regular intervals and then verify that both sides' hashes agree. It's only a couple of thousand entities, so it's really not so bad. You can probably ignore those that haven't deviated from the previous snapshot (which would also keep the cost of reconstructing the state down).

RTS games have had replay mechanisms at least as far back as StarCraft: Brood War, so a journal of player inputs is likely going to be recorded anyway.
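Roughly the idea, as a minimal sketch (made-up entity layout and hash choice, nothing AoE-specific): every peer hashes the same bytes in the same order every N turns and compares the result.

    #include <cstddef>
    #include <cstdint>
    #include <vector>

    // Hypothetical, heavily simplified entity state; the real games store far more fields.
    struct Entity {
        int32_t x, y;        // fixed-point position
        int16_t hitpoints;
        int16_t orderId;
    };

    // FNV-1a over the raw entity data; any deterministic hash works, as long as
    // every peer hashes the same bytes in the same order.
    uint64_t hashState(const std::vector<Entity>& entities, uint64_t turn) {
        uint64_t h = 1469598103934665603ull;
        auto mix = [&](const void* p, std::size_t n) {
            const unsigned char* b = static_cast<const unsigned char*>(p);
            for (std::size_t i = 0; i < n; ++i) { h ^= b[i]; h *= 1099511628211ull; }
        };
        mix(&turn, sizeof(turn));
        for (const Entity& e : entities) mix(&e, sizeof(e));
        return h;
    }

    // Every N simulation turns each peer sends hashState(...) around; the first
    // turn where the hashes disagree is roughly where the sims diverged, and a
    // full state dump from that turn can be compared offline.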


Yes, the journal of player inputs, sure, but the intermediate states are a different matter. AoE2's state size is peanuts for a modern machine, but I would say that at the time it was quite significant, and it would have been too costly to store it on the fly. I certainly did not dare try that in the RTS-like deterministic games I worked on (Commandos and Praetorians).


We're still talking about a few dozen kilobytes of data here. A dozen or so bytes worth of global state per player, up to 200 units per player with a few bytes worth of state (position, order, action, action target, hitpoints), maybe a hundred projectiles, order of 1000 static entities with just hitpoints.

Gotta keep in mind that these games were written in languages that did not have modern garbage collection, so almost certainly stored entity information in arrays to avoid heap fragmentation and malloc costs.

A few dozen KB is far beyond what you can push over a modem in real time, for sure, but memcpy'ing a couple of kilobytes' worth of arrays was still plenty fast in the late '90s/early 2000s.

They weren't running these games on a 6502.


It's actually a lot more: an uncompressed world state from a recorded game is 1.6 MB. Compressed (AoE uses deflate) it's only 153 KB, but that's still a lot.


I think you're severely underestimating the task and the level of detail required, but happy to leave it at that.


I think they compute checksums every once in a while over their world state, fog-of-war state etc., and if these checksums don't match it desyncs. Then it creates an out-of-sync save, probably for just before the desync occurred.


Nah, the desync is when two floating-point operations do not produce the same outcome; the checksum detects when that butterfly has caused a measurable thunderstorm of diverging game states. That can happen fairly late, depending on what is hashed.

The strategy of occasional save-game storage and backtracking only works if the cause is rare and not deterministic.


I'd need to look into it again, but I think pretty much everything object-state-wise gets hashed.

Edit: The checksum for the player includes the content of each attribute of the player, the object state for each object owned by the player, the master object id of that object, the number of attributes they carry (which I think is, for example, the resources that villagers carry) and the world x/y/z position.


Aren't all other RTS also lockstep?


Total Annihilation was asynchronous; Supreme Commander was synchronous lockstep (sending user interactions only); Planetary Annihilation said they were client-server, with the server only sending updates for units a player could see.

Still a Total Annihilation fanboy at heart.

https://www.pcgamer.com/planetary-annihilation-interview/2/

https://www.forrestthewoods.com/blog/tech_of_planetary_annih...



(2001) - Not some strange content collector, Gamasutra rebranded

Definitely worth a read. The deterministic simulation is easy to understand (though not to do), clever and allows for great networking performance.

Afaik this approach is still useful and used today (e.g. in fighting games).


While lockstep is useful for some games, it was never really a good choice for fighting games due to their fast, timing-sensitive nature. Rollback is an improvement over it that adds speculative execution for remote inputs; this is what should be the standard, although Japanese developers have only recently gotten on board with it.


I guess it depends on how high a level you're looking at it from. Both lockstep and rollback are variants of the general approach of "shared deterministic simulation" where you're mostly just sending inputs. ("Mostly" 'cuz on reconnect or desync or whatever you'll probably want to send full state.)
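To make the "mostly just sending inputs" point concrete, here's a compressed sketch (invented names, trivial stubs, not any particular engine): both schemes feed the same input history into a deterministic step function; rollback just adds snapshots and replay.

    #include <cstdint>
    #include <map>
    #include <vector>

    struct Input { uint8_t player; uint16_t buttons; };
    struct State { int32_t tick = 0; /* positions, health, RNG state, ... */ };

    // Deterministic step: same state + same inputs must yield the same result everywhere.
    State simulate(State s, const std::vector<Input>& /*inputs*/) { ++s.tick; return s; }

    // Lockstep: don't advance frame F until every player's input for F has arrived.
    // Rollback: advance immediately with a guessed remote input, keep recent
    // snapshots, and when the real input turns out to differ, rewind and replay.
    std::map<uint64_t, State> snapshots;                  // rollback only
    std::map<uint64_t, std::vector<Input>> inputHistory;  // shared by both schemes

    void onLateRemoteInput(uint64_t frame, const Input& real,
                           uint64_t currentFrame, State& present) {
        inputHistory[frame].push_back(real);    // correct the history...
        State s = snapshots[frame];
        for (uint64_t f = frame; f < currentFrame; ++f)
            s = simulate(s, inputHistory[f]);   // ...and re-simulate up to the present
        present = s;
    }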

I really enjoyed this GDC talk on rollback networking:

https://www.youtube.com/watch?v=7jb0FOcImdg&t=536s

(From 2019.)


One improvement is to train a model on the player inputs and thus predict behavior. It takes some time, but if done right, the game after a while feels really "instantaneous" even when it comes to fast-paced, high-APM action.


Complementing this article, there's also a great post mortem article about Age of Empires:

https://www.gamedeveloper.com/pc/the-game-developer-archives...

In the section "Things That Went Wrong Or We Could Have Done Better", it mentions as the final point an insight I often think about:

"8. We didn’t take enough advantage of automated testing. In the final weeks of development, we set up the game to automatically play up to eight computers against each other. Additionally, a second computer containing the development platform and debugger could monitor each computer that took part. These games, while randomly generated, were logged so that if anything happened, we could reproduce the exact game over and over until we isolated the problem. The games themselves were allowed to run at an accelerated speed and were left running overnight. This was a great success and helped us in isolating very hard to reproduce problems. Our failure was in not doing this earlier in development; it could have saved us a great deal of time and effort. All of our future production plans now include automated testing from Day One."


Very clever but it only tests actions the AI takes.


It is true that actual interaction is not tested in this way. Still, AI actions are not all it tests: For example, just by keeping such games running for a long time, it implicitly also tests resource management (such as garbage collection), robustness of the networking code, synchronization etc.

What I found appealing in the description is the idea of keeping your program running for a long time and have it perform all kinds of actions automatically. Nowadays, I often hear about "test-driven development", "unit tests" etc., and it often turns out that they test very specific things out of a vast universe of all possible things. That does not mean that they are useless. As I see it, it means - as they phrased it in this article - that we often do not take enough advantage of automated testing.

Personally, when I test software, I always try to follow the general idea stated in the post mortem: Keep it running, and perform all kinds of actions automatically. I found several crashes and memory leaks in this way, which were not noticed during manual interaction because they only became significant when the actions were repeated thousands of times.


The AI actions are the same actions that players would take manually; they just get computed by the AI instead of a brain.

Afaik the AI logic gets processed on the host instance, and the rest just follow the commands sent through the network.


Based on my personal experience of automated testing in AAA games, even if you "only" count the overlap with the actions humans take, it's a huge overlap in practice. And while there are edge cases only humans find, automated testing finds bugs humans don't.

The point of that section is that the earlier you find bugs, the quicker you fix them, and you end up with less overall pain for the length of the project.


This article is way outdated in the context of today's games. Look to https://gafferongames.com/post/deterministic_lockstep/ instead; you get bonus animated GIFs there.


This article paved the way. FWIW I had written a fully deterministic (non networked though) game engine around... 1991.

HN user "dfan" here wrote a fully deterministic game engine for the DOS game Terra Nova in 1996.

There were probably others, but resources about deterministic game engines were very rare, since nearly nobody had one, so this article about AoE was a gold mine for many.

Some terms may have changed, but the techniques are basically identical.

Blizzard had basically solved that issue for RTS once and for all with Warcraft III in 2002 (and maybe with StarCraft before that?), and not much has changed since then.


What specifically is outdated? If you look under the hood of modern RTS games they're all going to look pretty similar, and this article has a bunch of info Gaffer's doesn't. I would read both.


For example you really don't need to enforce a 200ms turn rate, and their notion of what makes a 'communications turn' is unclear given the terms gamedevs use nowadays.


Yeah, I used 30hz for my turn rate, but the principles are the same.

The terminology is dated, but there's some real gold in there. The bits about deer facing snowballing into out-of-sync errors, out-of-simulation code corrupting simulation state, comparing state dumps... this is exactly what your life will be like if you build your games this way. I know this because I have (although complicated by the addition of rollback).

The Gaffer article barely scratches the surface.


It's amazing what people can do under very tight constraints (8-player real-time multiplayer on a 28.8 modem!) when they have to.

I probably can't even do a (modern) Google search on a 28.8k modem without it feeling absolutely painful.


Subspace was another impressive one from around that era. 50-100 concurrent players on 28.8/56k.

They used a totally different technique (dead reckoning + game design that was about prediction), but it was pretty smooth given what average pings were back then.


It's also amazingly still going - I've played since 2003 (beta launch was 1996) - probably the oldest "action" MMO https://store.steampowered.com/app/352700/Subspace_Continuum...

The techniques used in the game ended up being replicated in many other games and apps. Its networking was also key to the low lag: it is entirely based on a custom UDP network stack that can send both reliable and unreliable packets.
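The reliable half of such a stack looks roughly like this; a generic sketch with an invented header and names, not Subspace's actual protocol: reliable packets get a sequence number and are resent until acked, unreliable ones are fire-and-forget.

    #include <cstdint>
    #include <map>
    #include <vector>

    // Illustrative header prepended to every UDP datagram.
    struct PacketHeader {
        uint8_t  reliable;   // 1 = must be acked, 0 = fire-and-forget (e.g. position updates)
        uint32_t sequence;   // only meaningful when reliable == 1
    };

    struct Pending { std::vector<uint8_t> bytes; double lastSent; };

    // Reliable packets stay here until the peer acknowledges their sequence
    // number; a timer walks the map and resends anything unacked for too long.
    std::map<uint32_t, Pending> unacked;

    void onAck(uint32_t sequence) { unacked.erase(sequence); }

    void resendStale(double now, double timeoutSeconds) {
        for (auto& [seq, pkt] : unacked) {
            if (now - pkt.lastSent > timeoutSeconds) {
                // sendUdp(pkt.bytes);  // hypothetical socket call: same bytes, same sequence
                pkt.lastSent = now;
            }
        }
    }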


It's more sad than amazing, the amount of computing resources we waste for no real reason. We have crazy fast technology that runs software just as slowly as it did in the 2000s.


Not having to care about performance (because hardware is fast) allows us to build more software than ever before. Although optimized software saves on computing resources, it is wasteful on the most important resource of all: Software Engineers' Time. Not to say it's not important, but saying we are wasting computing resources for "no real reason" is not fair. There are a lot of delicate trade-offs involved.


That's also the sad part. It's no longer about taking pride in one's craftsmanship and caring about the end user's experience. It's about cranking out janky products to make money.


No, it's about delivering products faster or prettier.

There are fields where squeezing every tiny bit of performance matters — media compression, 3D software (including game engines), data processing, hardware.

Engineering is not art; it's about building things that can be useful.


Unfortunately, as much as I love it, the new AoE2 DE has fallen into the same trap. The old AoE2 uses a fraction of the power (and disk space) compared to the new one.


Eh, I disagree. On the same computer, the original AoE2 would lag like crazy even on a LAN with 4-5 players and a population limit of 200; with AoE2 DE it's muuuch smoother.


UserPatch 1.5 had insane improvements to LAN, so good that it actually felt like single player if the ping was low enough.

Unfortunately, most of the scripters' work never got used for HD and DE2.


I’ve been waiting 20 years for this article.

I used to have 28.8. I’d play this game for hours on there!

I was always in awe that this ever worked, even more so over time as software has become more complex...


Appropriate, since it was written 20 years ago :)


For anyone looking for further reading on the subject, this article popped up on HN a few months ago, which I thought was fascinating:

https://ki.infil.net/w02-netcode.html


I remember that BBSes added SLIRP and PPP in order to dial in and access their Internet connection. It was mostly pay BBSes with MajorBBS or some commercial software. We played Doom that way as well.


Did anybody else here play Doom on DWANGO [0]?

My claim to fame was a 4-way deathmatch with Thresh [1], myself, and two other randos (like me) on Doom 2, I think on the map E2M7? The rules were first to a score of 100, rocket launchers only. Final score: Thresh, 100; me, 7; the other opponents, -3 and -6.

[0] https://doomwiki.org/wiki/DWANGO [1] https://en.m.wikipedia.org/wiki/Thresh_(gamer)


I played a couple of times at my friend's house; I wasn't allowed to tie up the phone line


thanks for reminding me of DWANGO. I did.


The mention of Doom reminds me - games with this form of perfect synchronization allowed for VERY small replays - basically all they needed to record was keypresses and timestamps.


Unfortunately it also struggles with patching, because after balance changes the recorded commands no longer work as they are supposed to. For example, back in the day StarCraft rebalanced the Spawning Pool to cost 200 minerals instead of 150, to delay early zergling rushes by several seconds (which makes a massive difference). But when watching an old replay, it will still try to build the Spawning Pool at 150 minerals, and from there everything breaks.
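A replay in these games is basically just a version tag plus a journal of (turn, command) pairs, replayed through the same deterministic sim. A toy sketch (invented names) of why that version tag matters:

    #include <cstdint>
    #include <string>
    #include <vector>

    // A replay is tiny because it stores only what the players did, not what happened.
    struct ReplayCommand { uint64_t turn; uint32_t playerId; uint32_t commandId; };

    struct Replay {
        std::string gameVersion;             // the patch the game was recorded on
        std::vector<ReplayCommand> commands;
    };

    // Playback only stays in sync if the simulation rules (costs, speeds, ...) are
    // exactly the ones the replay was recorded under.
    bool canPlayBack(const Replay& r, const std::string& currentVersion) {
        return r.gameVersion == currentVersion;  // otherwise the sim diverges, as with
                                                 // the Spawning Pool cost change above
    }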


Some games handled this with "rule sets", but it was usually handled by referencing the version it was played on.


A replay is just another client to sync with — just arbitrarily far behind.. :)


I recall using SLIRP back in the day with a dialup ISP that provided shell accounts in the base package but charged extra for PPP (IIUC this was a fairly common business model). With SLIRP I was able to access the internet directly from my PC without paying for PPP access.


"Part of the difficulty was conceptual -- programmers were not used to having to write code that used the same number of calls to random within the simulation (yes, the random numbers were seeded and synchronized as well)."

What do they mean by "same number of calls to random within the simulation"? I didn't get it.


The random number generator needs to give the same outcomes on both ends so that you don't need to transfer results. To this end, you have to use the same deterministic generator code and seed, and ensure you call it the same number of times on each system. This last point is because queries to a random number generator also change its state. If systems A and B start with the same seed, but system A calls it 30 times and system B 29 times, the next call will yield a different result on each system.


I suspect it pertains to the fact that most RNGs tend to be pseudo-random number generators that maintain an internal state. Each call of the RNG mutates the internal state. If you have two processes, and one calls the RNG 2 times while the other calls it 5 times, the RNG states will have diverged and the simulation is no longer consistent between them. This is true even if the simulation ends up using only 1 value out of those RNG calls, because the RNG state itself is implicitly part of the simulation.


If you call a pseudo-random number generator, you get the next random number in a strictly defined sequence of numbers derived from the seed. So the generator has state, and needs to be called exactly the same number of times in the same order on all clients to make sure that the random events happen the same on all clients.
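A tiny self-contained demonstration of the drift (plain C++ <random>, nothing game-specific): same seed on both "machines", but a different number of draws, and every value from then on differs.

    #include <cstdio>
    #include <random>

    int main() {
        std::mt19937 rngA(12345), rngB(12345);  // same seed on both machines

        rngA(); rngA(); rngA();                 // machine A draws 3 numbers this turn
        rngB(); rngB();                         // machine B only draws 2 (e.g. skipped a check)

        // The next "same" random event now gets different values on A and B,
        // and the two deterministic simulations silently drift apart.
        std::printf("A: %u  B: %u\n", (unsigned)rngA(), (unsigned)rngB());
    }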


For those who may not be aware, the "upcoming" RTS3 mentioned in the article is Age of Mythology from 2002.


I found RTS3 had a lot more network issues than AoE2. It would more consistently fail to hole punch (fail to join a lobby game with certain people for no discernible rhyme or reason) and/or drop connection during play.

I could play RTS3 with most people around the world, it seemed, except a few of my best friends. They could individually play with random people all over the world, but not each other or myself, depending on the day of the week or something like that.

Perhaps their decision to move away from DirectPlay was a mistake or premature optimization!


Age of Mythology always worked for me on LAN and usually WAN without too much port forwarding. AoE (both 1 and 2) never seemed to work well over the internet for me without loads of manually forwarded ports.

The rewrites/updates have definitely helped in this situation, but I think the AoM code was built for and tested on networking hardware that was a lot more error-prone than modern routers are, which in turn affects how it behaves even today. Clever NAT bypasses work better now than they did fifteen or twenty years ago, but they're still not guaranteed to work. The rise in CGNAT usage also paints a grim picture for anyone playing video games together.

I've also had my fair share of peering issues between ISPs. At one point I discovered that I could send UDP packets to my friend, but my friend could not send UDP packets to me, no matter the port forwards and even after forcing the modem into bridge mode. TCP worked fine in both directions, though. Something upstream seemed to filter the packets out, I'm guessing to prevent DDoS attacks, so whenever we played P2P games one of us needed to run some kind of VPN.

IPv6 can solve many of the problems I remember having to deal with setting up games back in the day, but most games have become cloud-only anyway. The age of custom servers and P2P gaming is over, killed by lootboxes and shitty NAT implementations.


For anyone wondering, RTS3 refers to Age of Mythology from 2002.


Ah, I thought they meant AoE 3. My bad.


I love it. I wonder how much of this they applied to AOE2 DE, which is currently thriving, and uses the same basic game mechanics, artwork, and balance as "AOK".


It's the same engine with some updates, except game messages are routed through a centralised server. All of the stuff about lockstep simulation still applies.


Isn't the engine modified to accommodate for the new physics elements?


If you're referring to building destruction, they are just baked animations; the physics simulation was done in an external 3D modelling program. Which is why the enhanced graphics pack for the game is about 30 gigabytes.


Thanks. That makes a ton of sense now.


If each command is only executed two communication turns in the future, and a communication turn is ~200 ms, isn't there ~600 ms of lag for each command?

> For RTS games, 250 milliseconds of command latency was not even noticed -- between 250 and 500 msec was very playable, and beyond 500 it started to be noticeable.

The article mentions this as well, so was ~600ms latency just expected?

Why doesn't it work to execute commands _one_ communication turn in the future?
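For reference, my current reading of the article's scheme, as a toy sketch (invented names): a command issued during turn N is broadcast right away but tagged to execute on turn N+2, so with 200 ms turns you can wait up to ~200 ms for the current turn to end plus 2 x 200 ms before execution. Executing only one turn ahead would leave less than a full turn for the command to reach every peer, so a slow link would stall the lockstep.

    #include <cstdint>
    #include <map>
    #include <vector>

    struct Command { uint32_t playerId; uint32_t action; };

    std::map<uint64_t, std::vector<Command>> turnQueue;  // commands keyed by execution turn
    const uint64_t kTurnDelay = 2;                       // execute two communication turns later

    // Called when the local player clicks during `currentTurn`: the command is
    // sent to the other peers immediately, but tagged to run at currentTurn + 2,
    // which gives roughly a full turn of slack for the packet to arrive.
    void issueCommand(uint64_t currentTurn, const Command& cmd) {
        turnQueue[currentTurn + kTurnDelay].push_back(cmd);
        // network.broadcast(currentTurn + kTurnDelay, cmd);  // illustrative
    }

    void executeTurn(uint64_t turn /*, GameState& state */) {
        for (const Command& cmd : turnQueue[turn]) {
            (void)cmd;  // state.apply(cmd) in the real game
        }
        turnQueue.erase(turn);
    }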


I've built a couple of demos using this article as a guide to understanding how the net architecture works. It's pretty fun to run through. Making your sim deterministic is a neat challenge.


Very cool. Just a remark: I'm using Brave, and the website won't load until I remove the shield.


[2001]



