MMO Architecture: clients, sockets, threads and connection-oriented servers (prdeving.wordpress.com)
185 points by cpeterso 11 months ago | 78 comments



Amazed to see connection-oriented protocols still being used here. As someone who's been in the real-time media space for decades, we switched everything to UDP about 15 years ago and achieved much higher throughput and lower latency. Our server software architectures use a single dedicated thread on a UDP port, which reads packets and distributes them to lockless queues, each owned by a dedicated single-thread handler exclusively bound to a separate physical core (a rough sketch of this fan-out follows at the end of this comment). Similarly for the outbound side. This allows us to scale vertically to maximize a single machine's resources, while allowing us to essentially cap the upper bound on latency in high-load scenarios. Our architecture enables us to have hundreds of thousands of users PER MACHINE. We've been running this architecture in production successfully for over 10 years now. While we do have logic to scale horizontally, we minimize this and use it only when it's required for scaling further or for traversing specific network clouds, depending on where a user is able to connect. Worth mentioning: we use general Linux distros on physical hardware. No virtual machines. We also use kernel bypass techniques.

Any time you add more servers to spread load, you're increasing latency because for each hop you're traversing the entire software network stack twice plus hops through hardware switches.

Nobody uses a dedicated thread per client anymore (if they do, it's a poor design).
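
A minimal Go sketch of the single-reader fan-out described above, under assumptions the comment doesn't spell out: channels stand in for the lockless SPSC queues, runtime.LockOSThread only roughly approximates pinning a handler to a physical core, and sharding by source port is an illustrative placeholder for however clients are actually mapped to handlers.

    package main

    import (
        "log"
        "net"
        "runtime"
    )

    type packet struct {
        addr *net.UDPAddr
        data []byte
    }

    func main() {
        const workers = 4
        queues := make([]chan packet, workers)
        for i := range queues {
            queues[i] = make(chan packet, 1024)
            go func(in <-chan packet) {
                runtime.LockOSThread() // rough stand-in for binding the handler to a core
                for p := range in {
                    _ = p // process this worker's share of clients here
                }
            }(queues[i])
        }

        conn, err := net.ListenUDP("udp", &net.UDPAddr{Port: 9000})
        if err != nil {
            log.Fatal(err)
        }
        buf := make([]byte, 1500)
        for {
            n, addr, err := conn.ReadFromUDP(buf)
            if err != nil {
                continue
            }
            data := append([]byte(nil), buf[:n]...) // copy before handing off
            // shard by source port so a given client always hits the same handler
            queues[addr.Port%workers] <- packet{addr: addr, data: data}
        }
    }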


> Our server software architectures use a single dedicated thread on a UDP port, which reads packets and distributes them to lockless queues each owned by a dedicated single thread handler exclusively bound to separate physical cores.

As an internet backseat network performance person...

Have you considered one thread per core/NIC queue receiving packets, with RSS (receive side scaling)? If your bottleneck is network I/O, that should avoid some cross-cpu communications. Otoh, if you can't align client processing to the core its NIC queue is handled on, then my suggestion just adds contention on the processing queues; although maybe receive packet steering would help in that case. But, I'd also imagine game state processing is a bigger bottleneck than network I/O?


We used to use RSS but switched to kernel bypass instead, which increased throughput 10x easily. I imagine we also have a much higher bandwidth requirement than what MMOs use (we can do 40Gbps). Every stream requires encryption and decryption (AES-GCM ciphers), so there is a lot of user-space CPU processing involved too (openssl). That's where kernel bypass helped a lot, because it offloaded all of that network I/O to 2 single cores (input/output) and left all the other cores available for user-space processing of streams.

Also worth mentioning: we wrote everything in C++. Anything else is too slow.


If latency is so important, why use AES-GCM instead of AES-OCB?

openssl speed -evp CIPHER for example: (ECB only for reference, don't use)

    type               16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
    AES-256-GCM         655193.56k  1747223.44k  3399072.36k  4490100.61k  5033129.30k  5108596.96k
    AES-256-OCB         631357.15k  2268916.74k  4794610.30k  6492985.36k  7174274.14k  7301540.60k
    AES-256-ECB         997960.96k  3972424.35k  8096120.70k  8105542.89k  8179659.94k  8188882.64k
    AES-128-GCM         747463.90k  1856932.41k  3762591.34k  4700335.25k  5224533.22k  5157661.06k
    AES-128-OCB         702729.38k  2479151.86k  5919529.91k  8719316.46k  10545305.23k 10536095.61k
    AES-128-ECB         1291715.10k 5180010.63k  11093258.14k 11815558.81k 11913592.61k 11947322.76k
OCB also won the CAESAR competition for "High-performance applications" portfolio. It is much older than the competition and is no longer patent-encumbered.


We have external dependencies requiring this. We can't change the specific cipher suites ourselves.


> Our server software architectures use a single dedicated thread on a UDP port, which reads packets and distributes them to lockless queues each owned by a dedicated single thread handler exclusively bound to separate physical cores.

LMAX disruptor?


This is really interesting.

Could you explain more about your lockless queues?

I recently wrote a lock-free ring buffer inspired by LMAX Disruptor, but it is only thread-safe one thread to one thread (SPSC). It has latency between threads of 80-200 nanoseconds on a 1.1 GHz Intel NUC.

I have ported Alexander Krizhanovsky's ringbuffer to C but I haven't benchmarked it.

https://www.linuxjournal.com/content/lock-free-multi-produce...
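
For anyone curious what an SPSC ring buffer like that looks like in outline, here is a minimal Go sketch (the names and the power-of-two capacity choice are mine, not the commenter's; Go's sequentially consistent atomics make the publish/consume ordering safe for exactly one producer and one consumer):

    package main

    import (
        "fmt"
        "sync/atomic"
    )

    // spscRing is a single-producer/single-consumer ring buffer.
    // Capacity must be a power of two so indexing can use a mask instead of modulo.
    type spscRing struct {
        buf  []any
        mask uint64
        head atomic.Uint64 // next slot to read; advanced only by the consumer
        tail atomic.Uint64 // next slot to write; advanced only by the producer
    }

    func newSPSCRing(capacity uint64) *spscRing {
        return &spscRing{buf: make([]any, capacity), mask: capacity - 1}
    }

    // Push must only ever be called from the single producer thread.
    func (r *spscRing) Push(v any) bool {
        t := r.tail.Load()
        if t-r.head.Load() == uint64(len(r.buf)) {
            return false // full
        }
        r.buf[t&r.mask] = v
        r.tail.Store(t + 1) // publish the slot after the write
        return true
    }

    // Pop must only ever be called from the single consumer thread.
    func (r *spscRing) Pop() (any, bool) {
        h := r.head.Load()
        if h == r.tail.Load() {
            return nil, false // empty
        }
        v := r.buf[h&r.mask]
        r.head.Store(h + 1)
        return v, true
    }

    func main() {
        r := newSPSCRing(8)
        r.Push("hello")
        v, _ := r.Pop()
        fmt.Println(v)
    }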


Games use UDP quite a lot as well.


Sounds amazing! Did you implement congestion control per connection, and if so, which algorithms did you use? I can imagine that CC could really affect throughput at this scale.


"switched everything to UDP 15 years ago" ME: <Nods head approvingly in 25-years-ago-LAN-party>


I feel like the design you explained is the same, just swapping connection-oriented for connectionless.


Interesting, but talking to the clients isn't the bottleneck for big-world systems. It's deciding who needs to know about what, and how often. If you tell everybody about everything that's happening, you have O(N^2) traffic. For large N, that dominates the O(N) load imposed by clients doing things. So there needs to be something that only tells some users what other users are doing, based on where they are and maybe which direction they are looking.

Second Life has 50,552 users connected right now. Typical Saturday. They're all in the same world. But they're not near each other. The biggest crowds are about 130 users. There are a lot of filters. Each client is connected to the server for the region it is in, plus a few nearby regions, so it can see past region boundaries. Within a region, the server sends updates to each user as objects move. Updates are frequent, up to 45Hz, for nearby objects in the viewing frustum. Lower for distant objects, and much lower outside the viewing frustum. That avoids the O(N^2) load problem.

The clients overload before the servers do, incidentally. The classic clients are single-thread and do too much CPU work per visible avatar.
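
The usual building block for that kind of interest management is a spatial index over the players. Here is a hedged Go sketch of the simplest version, a uniform grid where an event is only sent to players in the surrounding cells (the cell size and 3x3 neighborhood are arbitrary illustrative choices, not how Second Life actually filters):

    package main

    import "fmt"

    const cellSize = 64.0 // world units per grid cell (illustrative)

    type cell struct{ x, y int }

    type player struct {
        id   int
        x, y float64
    }

    // grid maps each cell to the players currently standing in it.
    type grid map[cell][]*player

    func cellOf(x, y float64) cell {
        return cell{int(x / cellSize), int(y / cellSize)}
    }

    // subscribers returns players in the 3x3 block of cells around (x, y);
    // only they receive an update, so traffic scales with local density, not N.
    func (g grid) subscribers(x, y float64) []*player {
        c := cellOf(x, y)
        var out []*player
        for dx := -1; dx <= 1; dx++ {
            for dy := -1; dy <= 1; dy++ {
                out = append(out, g[cell{c.x + dx, c.y + dy}]...)
            }
        }
        return out
    }

    func main() {
        g := grid{}
        near1 := &player{id: 1, x: 10, y: 10}
        near2 := &player{id: 2, x: 30, y: 40}
        far := &player{id: 3, x: 5000, y: 5000}
        for _, p := range []*player{near1, near2, far} {
            c := cellOf(p.x, p.y)
            g[c] = append(g[c], p)
        }
        fmt.Println(len(g.subscribers(near1.x, near1.y))) // 2: the far player is filtered out
    }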


See also: Patrick Wyatt interviewed by Casey Muratori at HandmadeCon: https://youtu.be/1faaOrtHJ-A

He was the 2nd employee at Blizzard, did the netcode for their games and also for Guild Wars.

His blog is great https://www.codeofhonor.com/blog/tough-times-on-the-road-to-...


Unrelated to the post here, but I think both Pat and this guy[1] say that they built the multiplayer for Diablo by themselves…

Anyhow, both talks are fascinating and surely worth a watch.

[1] https://youtu.be/Mlrrc_vy79E?si=Ga50gQyyUXjHwZz0


Right around 37 minutes in, Brevik mentions that Mike O'Brien (along with Pat, one of the three co-founders of ArenaNet) was the brains behind Battle.net, and then "a few of the guys from [Blizzard South] moved up north during the last six months of development and started making Diablo into multiplayer and integrating Battle.net into the entire thing."

That lines up with Pat's assessment:

> Initially Collin Murray, a programmer on StarCraft, and I flew to Redwood City to help, while other developers at Blizzard “HQ” in Irvine California worked on network “providers” for battle.net, modem and LAN games as well as the user-interface screens (known as “glue screens” at Blizzard) that performed character creation, game joining, and other meta-game functions.


Thanks for the link, but I think David Brevik said a couple of guys from Blizzard came up to help for 6 months, and that he had zero multiplayer code built, as well as it being his first C program.


That interview was exhausting, I couldn't finish it. I have no idea why the interviewer treated this like a systems design interview at a FAANG, with constant interruptions and unnecessarily rephrasing what the interviewee had already explained. The GDC talk linked in another reply, while only tangentially related, was much more enjoyable.


I feel like some HFT/fintech concepts are ~ideal MMO architectures.

With a 2 tier setup, you could theoretically have a central server processing batches of events that are aggregated across multiple players/regions/etc. A single thread can service upwards of half a billion events per second, and if each of those events covers multiple potential players, then I'd argue we are in a pretty good position.

For me, having truly 1 consistent global universe is the only thing that would get me to consider an MMO in 2023+. The technical limitations forcing sharded worlds were excusable when WoW was released, but I don't have patience for those arguments anymore. Not if you want me to expressly burn my time and pay for it on a recurring basis.


That's what I was hoping for from the "metaverse" crowd - really big, seamless worlds. Didn't happen. For the amount of money spent, the results are very disappointing. Facebook/Meta spent how many tens of billions of dollars on this? The crypto crowd produced zero functioning high-quality big-world systems.

I had some hopes for Improbable. Improbable has a big seamless world metaverse back end system. It involves a lot of remote procedure calls between multiple servers. The result seems to be excessive server costs. Five indy games, some quite good ("Worlds Adrift" was one) tried it, went live, and went broke. Improbable, a VC-funded company with about $400 million, has been thrashing around ever since. They tried setting up their own game studio, and produced "Scavengers". That was a game where a few thousand players all charge the same goal. More of a crowd tech demo than a game. After that flop, Improbable pivoted to making simulators for the British military, which apparently worked, the military not being too concerned about a few dollars per hour server cost during war games. That job done, they tried hooking up with Yuga Labs, the Bored Ape / Otherside people. Did another zerg rush demo for them. But cost per user per hour was so high they only ran that twice, for a few hours each time. Now they're pivoting again, to servicing baseball ("MLB", as the baseball industry calls itself) so that fans can watch games in VR, or something like that.

So what went wrong? The business problem was that ad-supported metaverses don't work. There is no role for "brands" in a highly immersive world. Quite a few companies have now figured this out the hard way. Ignoring that, though, what are the technical problems with scaling?

Having spent too much time inside Second Life client code, and written my own client, I can answer that. First, the user needs a "gamer PC" and serious network bandwidth to deal with a highly detailed dynamic world. With user created content, a key metaverse feature, there's far less instancing, and you need about 3x the GPU memory of a curated game. So you need roughly an NVidia 1060 and a few hundred Mb/s of network connection. The average Steam user has that (see Steamcharts), but the average Facebook user does not. If you want to support WalMart $99 PCs and phones, there's a big problem.

"Cloud gaming", where the GPU is in a data center, lets anything that can play NetFlix play AAA title games. The problem is cost. Most of the cloud gaming hosting services gave up. Even Google gave up. NVidia GeForce Now remains, but they've raised their prices several times. Each user is using a dedicated PC-class system with a good GPU, so this isn't cheap. So trying to solve the user cost problem with cloud gaming doesn't work.

Server side is actually less of a problem. The Second Life server architecture isn't bad. Second Life was supposed to be the "next Internet" when it was designed over 20 years ago, and as a result, it's overdesigned compared to most MMOs. User-developed clients are not only encouraged, most users use a third party viewer, rather than the open source Linden Lab viewer.

The networking architecture to the client is not too unusual. The time-critical stuff is on UDP, and the bulk data transfers are on HTTP. The UDP system supports out of order delivery to eliminate head of line blocking. (Reliable delivery, in-order delivery, no head of line blocking - pick two.) The trouble with allowing out of order delivery is that higher level operations with state can get into trouble. Hence discussions like this.[1]

The most unusual thing is that each client talks directly to multiple region simulators in the same area. This is what produces the seamless world illusion. The simulators also talk to each other, but mostly about objects crossing the boundaries between regions. Inter-simulator traffic is thus manageable. It's more of a state locking problem than a bandwidth problem. The design predates the theory of eventually consistent systems and conflict-free replicated data types, leading to immersion-breaking out-of-sync problems, including teleport failures and getting stuck crossing the boundary between regions.

With this architecture, there's no major limit to the size of the world. The internal traffic within the data center per region simulator does not grow with the size of the world. Nor does the per-client traffic. Traffic to the asset servers does grow, but that's cached (AWS + Akamai). Shared services (login, billing, etc.) have multiple servers with load balancers.

So that's the successful path to a big, seamless world.

There's much trouble with seemingly random sluggishness, but that turns out to be due to various specific problems, some of which are being fixed.

[1] https://community.secondlife.com/forums/topic/503010-obscure...
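
As an aside on the "reliable delivery, in-order delivery, no head-of-line blocking - pick two" point above: a common way to live with out-of-order delivery for state updates (a generic technique, not Second Life's actual wire protocol) is to tag each object's update with a sequence number and simply drop anything older than what has already been applied. A minimal Go sketch:

    package main

    import "fmt"

    // update carries a per-object sequence number so stale packets can be dropped
    // instead of stalling behind a retransmit (no head-of-line blocking).
    type update struct {
        objectID int
        seq      uint32
        x, y     float64
    }

    type stateChannel struct {
        latest map[int]uint32 // highest sequence number applied per object
    }

    func (s *stateChannel) apply(u update) bool {
        if last, ok := s.latest[u.objectID]; ok && u.seq <= last {
            return false // stale: a newer position already arrived
        }
        s.latest[u.objectID] = u.seq
        // ...move objectID to (u.x, u.y) here...
        return true
    }

    func main() {
        ch := &stateChannel{latest: map[int]uint32{}}
        fmt.Println(ch.apply(update{objectID: 1, seq: 2, x: 10, y: 10})) // true
        fmt.Println(ch.apply(update{objectID: 1, seq: 1, x: 5, y: 5}))   // false: arrived late
    }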


> The business problem was that ad-supported metaverses don't work.

Have any of them tried to add online shopping via affiliate links?


Yes! 1 consistent global universe would get me to jack into whatever matrix they were selling.

It could even be multiple games connected with sufficient break in context. Eve for space and another game for ground assault. But they actually interact live.

Then throw in next gen VR and you basically have ready player one.


I can't help but share the MMO we're working on that fits that mold! https://www.seed-online.com

EVE is, by the way, also working on an FPS, Vanguard, which is supposed to interact with the game world of EVE.


CCP's previous FPS, DUST 514, also linked with EVE. All the resources were in the EVE economy and ships could perform orbital bombardments. Although it mostly demonstrated how horrible it is to try to intertwine two very different games together!

I worked with Oddur and Ivar there quite a few years ago now so am looking forward to SEED!


I originally bought a PS3 to play Dust 514, and only got to play for a few months before it got shut down. I really like the concept of different games sharing state and universe. I hope they eventually crack this nut.


> I feel like some HFT/fintech concepts are ~ideal MMO architectures.

Can you explain/link to some?

I suspect MMO servers, particularly when server-authoritative, end up having to do a lot more processing per event and tick. Distributing world state updates is also an extremely thorny issue with lots of inter-dependencies.


See concepts like LMAX Disruptor and https://aeron.io

Whatever is good for low-latency trading systems (i.e. where you are paying contractually for microsecond-level guarantees) is also maybe good for gaming.

The whole concept with these patterns is that there is only ever a single writer at any given moment. The single-writer principle with batching is what gives you enormous uplift in throughput, even when you are necessarily constrained to fully serialized business semantics. I'd argue that an MMO does not need to be as strictly serialized as whatever CBOE et al. are doing.
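
A hedged Go sketch of that single-writer-plus-batching idea (channels and goroutines are just one way to express it; the Disruptor itself uses a pre-allocated ring buffer rather than a channel): all mutation happens on one goroutine, which drains whatever has accumulated on each pass so per-event overhead is amortized.

    package main

    import (
        "fmt"
        "sync"
    )

    type event struct{ id int }

    // runLoop owns the state exclusively, so no locks are needed to mutate it.
    func runLoop(in <-chan event) {
        state := 0
        for ev := range in {
            batch := []event{ev}
        drain:
            for { // drain without blocking, to batch up whatever has already arrived
                select {
                case e, ok := <-in:
                    if !ok {
                        break drain
                    }
                    batch = append(batch, e)
                default:
                    break drain
                }
            }
            for _, e := range batch {
                state += e.id // apply the whole batch in one pass
            }
            fmt.Println("applied", len(batch), "events, state =", state)
        }
    }

    func main() {
        in := make(chan event, 1024)
        var wg sync.WaitGroup
        wg.Add(1)
        go func() { defer wg.Done(); runLoop(in) }()
        for i := 1; i <= 10; i++ {
            in <- event{id: i}
        }
        close(in)
        wg.Wait()
    }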


Interesting, thanks! Skimming the Disruptor paper, it's very similar to how you keep an audio context fed: using a ring buffer to write ahead as the audio context chews through behind you, so the entire thing can remain lock-free and low latency. Typically in a separate thread, as the latency and timing requirements are very strict.

Aeron on the face of it looks very similar to the architecture diagrams you'd get popping up designing a networking layer in a game already. I suspect the cross-pollination in both directions could be interesting.

Neither seems to particularly help with the intensive/interesting bit of an MMO, which is running the actual simulation. For example, big fleet fights in EVE, as far as I remember, have never been I/O bound but rather single-core performance bound. It's common for the bigger corporations to contact CCP ahead of big fights so the node it's likely to happen on can be moved to a bigger machine. It's actually quite interesting because a lot of it is written in Stackless Python, so not the most performant of languages, and trapped in a single-threaded context, so there is (nominally at least) quite a lot of headroom.

For big seamless worlds how you distribute the simulation is the tough bit. Hence companies like Improbable trying to solve that generally.


I had this crazy idea to write an MMO server using Seastar with ScyllaDB and try to fit a huge isometric RPG shard on one server. It seems that modern servers could handle that load, and a unified server opens up some interesting possibilities. Kind of makes you wonder what the possibilities for text MUDs would be if you dedicated more server time to player experience and game systems. Things people would have liked to do back in the day with open worlds might be more possible now.


I have been working on a resurrection project for an old MMO for a while now. It's nice to know my design is actually decent. I have no MMO or game backend dev experience. I am a backend and DevOps engineer though.

I'm building the server in Go. I use goroutines a lot. I use a shrinking/growing ring buffer for requests and responses. It uses UDP, and I just receive a request and push it onto the queue, keeping the address of the client along with the packet. The queue is read by another goroutine, and each request gets processed and sent to a packet handler, which does game state work and queues a response, or passes it off to another handler. Then the goroutine for the responses picks it up, reads the address and port of the client, and sends it the packet (a rough sketch of this pipeline follows below).

In all I have maybe 5-6 threads running in parallel, and I can already log into the game and walk around with others. There's tons of work still needed. But it's nice to know my underlying design is sufficient to maybe handle the dozens of people playing for nostalgia.
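
For readers who want the shape of that pipeline in code, here is a hedged Go sketch (buffered channels stand in for the growing/shrinking ring buffer, and handlePacket is a hypothetical placeholder for the real packet handlers):

    package main

    import (
        "log"
        "net"
    )

    type message struct {
        addr *net.UDPAddr // client address kept alongside the packet
        data []byte
    }

    func handlePacket(in []byte) []byte { return in } // placeholder for game-state work

    func main() {
        conn, err := net.ListenUDP("udp", &net.UDPAddr{Port: 7777})
        if err != nil {
            log.Fatal(err)
        }
        requests := make(chan message, 4096)
        responses := make(chan message, 4096)

        // handler goroutine: pops requests, does the work, queues a response
        go func() {
            for req := range requests {
                responses <- message{addr: req.addr, data: handlePacket(req.data)}
            }
        }()

        // response goroutine: sends each reply back to the stored client address
        go func() {
            for resp := range responses {
                conn.WriteToUDP(resp.data, resp.addr)
            }
        }()

        // receive loop: read a packet and push it onto the request queue
        buf := make([]byte, 1500)
        for {
            n, addr, err := conn.ReadFromUDP(buf)
            if err != nil {
                continue
            }
            requests <- message{addr: addr, data: append([]byte(nil), buf[:n]...)}
        }
    }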


Very cool! What game are you resurrecting? Are you using the original client applications? Is the client/server protocol documented or are you having to reverse engineer both the protocol and game server state?


It's Black Moon Chronicles Online: Winds of War.

I have client sources as I am working with the person that owns the rights.

Protocol is not documented, so reverse engineering, but having the clientside helps.

The server software was lost to time and accidents. But we have dumps of quest data, times, mobs, etc and all assets in original form. So as soon as I get the server in a working state. We have the full game.

We plan to open source/creative commons everything when we get something playable.

I've done 4 massive refactors as the code grows, to accommodate changes in understanding and design flaws.


What about this project here? https://github.com/jeanbmar/black-moon-rewind Is this far behind?


We are aware of their work. We've been working on ours off and on for many more years than this attempt. We have just been slowed down by real life stuff and I'm solo working on the backend. So if I get busy, not much work happens.

We had reached out to him when we discovered his work back then. But he refused to join our effort unless we 100% open source everything then and there. Due to legal reasons at the time (related to the T4C owners and T4C using similar code to BMC), we had to stay closed source. He still refused to work with us even with a promise that we plan to open source eventually.

We shared with him all the packet data models and such but he was kind of unresponsive after our initial conversations.

He is also working 100% blackbox while we have sources for the client and assets/server data.

Hopefully all that makes sense. I haven't really paid attention to his project since then and it seems he stopped all work efforts on it 2 years ago.

But to answer your question, it's quite a bit far behind what we have. But there is so much work that goes into an MMO. We have solved the network stuff, we are just working now on world state as we have the ability for multiple people to log in and do stuff with dummy data. I have been writing tools and importers for the server data. Like zone files, creature, NPC, item, etc data. All the monster spawners are in the zone files for example and then needs to be cross-referenced with other data, and then assets to actually draw spawn mobs.


One thing I was hoping for from cloud gaming was the ability to trust the client and offload all of the authentication from the server. Shame Stadia died; I feel like there was innovation possible in that direction that could have created a killer app for the service.


Has there ever been a case where trusting the client ends well?


With cloud gaming there is no information leaked as the client is rendering video sent from a server and sending back inputs. You can still run an aimbot using computer vision techniques but the surface area for the client to be untrustworthy is much lower.

I also think there is a compelling design space where cheating doesn’t really matter or make sense where trusting the client is fine.


I made an MMO in 2001 and came to the conclusion that one thread per socket was not going to scale, and thought I would mothball the project until that was solved in Java.

2023 and now we have virtual threads that replace blocking IO with non-blocking automatically! Haven't tested it yet though! But I am reviving the 2001 MMO now!

In between, I made my own MMO backend that uses NIO; my conclusion is that an event-based network and validation are the only way to scale action MMOs past 100 players.

Also you need to share memory atomically between cores, so Java or C.


I feel like Erlang/OTP would be perfect for the MMO use case. It's what I'd try first if building from scratch.


I have an MMO demo in Elixir!

I haven’t updated it in some time because busy from work but take a look:

http://rotb.io


Erlang performance would be a big problem. It's subpar compared to C++ by an order of magnitude.


Would an MMO backend be CPU bound though? I feel the use case of having a large number of active connections with the very distributed nature of passing events around between them all would be perfect for Erlang/OTP. Anything CPU intensive (like dungeon generation or similar) could be offloaded to another service if needed.


You want low latency action, and for that you need one CPU with all cores talking to all the memory without locking. Erlang can't do that, and neither can any other language, except C (arrays of 64-byte structures) and Java (the concurrency/memory-model rewrite of the JVM in 1.5, which was adopted for C++11 but failed there because you need a VM + GC for that model to be meaningful).


Does it really need to be low latency? It's not an FPS - so I don't think high tick rate is important. I'd rather have a wider synchronized global state and acceptable server side latency (<500ms) than instant server updates. The client can interpolate everything.


Then you are making a slideshow, not a game.

The whole point of games is real-time action.

It does not need to be violent, but it needs physics that give you the feeling of being there.

Multiplayer > VR


Lots of mmos have low tick rates. I believe Eve Online is once per second. Originally WoW was like 2.5/s. You don’t need high tick rates in MMOs (but you certainly have to design the game systems around expected server tick rate).


depends on the kind of game you are making. People made MMOs in the year 2000 when CPUs themselves were an order of magnitude slower.


Erlang/OTP seems perfect for a diablo-esque game where you have lots of instances with a handful of players each. But a giant shared world with thousands of players doesn't seem to play into its wheelhouse as much.


No, Erlang cannot share memory between cores; it has to copy it.


I think binaries are refcounted and shared in Erlang? I'm not sure if this happens between schedulers/cores, though. They're also obviously immutable, so might not be what you mean by "sharing memory atomically".

Other than Erlang, Pony[1] might be an interesting choice. It allows sharing memory, even mutable memory, but it tracks the sharing in the type system. It's been a long time since I looked at it, and it definitely had a bunch of rough edges, but I really liked where the things were going. I hope it got even better since.

[1] https://www.ponylang.io/


For me C (1970) and Java (1990) are the only options for eternity.

I use some C++ features and dabble in JS when I make HTML, but really, since applets and Flash have been removed, the browser is just a bloated waste of time.

Make your apps fast by allowing them to share memory without latency!

I have also coded Perl, VisualBasic, PHP and C#, and I'm pretty sure this is it.

What we need now are VMs that can take any bytecode/instruction set and translate it to all others. For true cross-platform development. But that requires us to dump dynamic allocation. Arrays of 64-byte structures FTW!


> VMs that can take any bytecode/instructionset and translate it to all others

You're gonna love this: https://en.wikipedia.org/wiki/WebAssembly


I would if anything worked.

Still waiting for a Windows release of this: https://github.com/bytecodealliance/wasm-micro-runtime

Don't hold your breath.

It also needs RISC-V and ARM 32/64.


I didn't realize that one-thread-per-socket was still something people considered; especially given that many programming languages nowadays have excellent support for async/await which does a great job at managing multiple concurrent connections within a single process/thread and doesn't have the overhead of context-switching.


In an MMO backend architecture, the way client connections are handled is really just a tiny implementation detail at the edge of the architecture diagram, only affecting a very small part of the overall architecture. Ideally you'd move that stuff out into proxy servers sitting between the clients and the actual game servers, with "one connection per client" on the public-network side and "one connection per game server" on the internal-network side (a rough sketch follows below). The proxy servers would also deal with validating the incoming traffic (on the message protocol level at least), encryption and of course routing messages between clients and game servers.

PS: now that I actually RTFA, the same concept is already mentioned there, it's just called Frontend Server.
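
A minimal Go sketch of the client-facing half of such a proxy, under assumptions the comment doesn't specify: TCP on both sides, a made-up framing of 4-byte client id plus 4-byte length on the multiplexed upstream connection, and the game-server-to-client return path omitted for brevity.

    package main

    import (
        "encoding/binary"
        "log"
        "net"
        "sync"
        "sync/atomic"
    )

    func main() {
        // one connection to the game server on the internal-network side (assumed address)
        upstream, err := net.Dial("tcp", "127.0.0.1:7000")
        if err != nil {
            log.Fatal(err)
        }
        var writeMu sync.Mutex // serialize writes onto the single upstream connection
        var nextID atomic.Uint32

        // one connection per client on the public-network side
        ln, err := net.Listen("tcp", ":6000")
        if err != nil {
            log.Fatal(err)
        }
        for {
            c, err := ln.Accept()
            if err != nil {
                continue
            }
            id := nextID.Add(1)
            go func(c net.Conn, id uint32) {
                defer c.Close()
                buf := make([]byte, 4096)
                for {
                    n, err := c.Read(buf)
                    if err != nil {
                        return
                    }
                    // frame: [client id][payload length][payload]
                    hdr := make([]byte, 8)
                    binary.BigEndian.PutUint32(hdr[0:4], id)
                    binary.BigEndian.PutUint32(hdr[4:8], uint32(n))
                    writeMu.Lock()
                    _, werr := upstream.Write(append(hdr, buf[:n]...))
                    writeMu.Unlock()
                    if werr != nil {
                        return
                    }
                }
            }(c, id)
        }
    }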


Context switching will happen regardless of where your thread implementation lives. Reimplementing scheduling in a bespoke fashion in usermode will not give you the gains that you hope for.

Basically, learn how an OS works first before trying to reimplement it badly.


I guess the idea is that an async/await context switch would always be cheaper than an OS thread context switch.

Whether that is actually true probably depends a lot on the specific operating system and async/await runtime (I think there's a lot of mysticism involved seeing async/await as some sort of silver bullet instead of relying on cold hard performance numbers alone).


There's no reason to believe it should be cheaper. The contents of the context and the process of switching it is the same regardless of where in the software stack you're doing it.

(Async/await was invented in interpreted languages to circumvent their global interpreter locks, which is a whole different problem not related to context switching at all.)


That's not how I heard it.

Event loops were old hat in GUIs, and web servers like Nginx adopted them so they could service thousands of requests concurrently without allocating stack space for thousands of threads.

And the context is not the same. Switching threads requires you to go through the kernel scheduler, so it has to change page tables and stuff in the CPU, right?

It lets you write simpler code too, because some events can be handled in the loop without involving mutexes and thread safety.

If async/await is, as you claim, a pointless hack for interpreted languages, why did Nginx, written in C, get so much traction with essentially the same architecture?


There are two primary costs to a preemptive, kernel-driven context switch. First is the userspace-to-kernelspace transition cost, which, while it has been going down for a while, is still non-trivial and actually got worse with the various Spectre mitigations. The second cost is that preemptive context switches must by necessity be conservative and need to save and restore a lot of state that might not be necessary on a cooperative switch.

Stackless context switches (i.e. async/await, as opposed to green threads or stackful coroutines) have the additional advantage that they reuse most of the stack and only suspend a single stack frame. This means that the rest of the call stack (shared between contexts) can stay hot in cache.

Whether any of these costs matter depend a lot on the application, the amount of context switches and the amount of work done between switches.


I’m not sure where you’ve gotten this information but I would recommend you do some more research.

There is a lot of great info about operating systems - specifically ring 0 vs ring 3 for the context switch overhead.

Meanwhile, only Python and Ruby had GILs. Async/await became popular through JavaScript and C#. Now very popular in Rust as well. None of these runtimes have a GIL.

Async/await or Fibers (C++, Go, Java 21) are entirely about context switching overhead.


> Async/await became popular through JavaScript and C#. Now very popular in Rust as well. None of these runtimes have a GIL.

To be fair, while typical JS implementations do not have a GIL, it's usually unnecessary due to the runtime being single-threaded.

In case anyone else is curious when the above languages grew this feature:

(F# piloted this in the 1.x days, back in 2009 or earlier.)

It seems like C# developed this feature first in 5.0 in 2012, followed by JS (Dec 2016 in Chrome, widely supported across other major browsers within a few months). Python 3.7 came out in 2018, Ruby 3.0 came out in 2020.

Async/await landed in C++20 via coroutines, and Java 21 landed last month. Rust grew this feature in 2018, which landed in the stable channel in 1.39 (Nov 2019). Golang has had goroutines / channels since day 1.


> To be fair, while typical JS implementations do not have a GIL, it's usually unnecessary due to the runtime being single-threaded.

Yeah, "we avoid a GIL because we don't offer threads at all" is, like pre-1.9 Ruby's "we don't have a GIL because our threads are 1:N green threads", both technically "no GIL" but also worse for thread-based parallelism than "we have native threads, but there is a GIL, which native code can release".

> Ruby 3.0 came out in 2020.

Ruby 3.0 doesn't have async/await, though. There's a third-party colorless Async gem, and built in constructs (that require a scheduler implementation) for lower-friction asynchronous programming (async as an approach has been popular for Ruby for quite a while), but not with async/await syntax in either case.


> Ruby 3.0 doesn't have async/await, though. There's a third-party colorless Async gem, and built in constructs (that require a scheduler implementation) for lower-friction asynchronous programming (async as an approach has been popular for Ruby for quite a while), but not with async/await syntax in either case.

(I haven't written anything in Ruby in roughly a decade, so I'm happy to defer to anyone who's actually followed the language the whole time)

That's fair. I had skimmed and found some reference to the async gem relying on Fiber::SchedulerInterface which was introduced in 3.0, although it apparently existed back in 2017.

There's similarly the async-await gem by the same org (also first released in 2017), which at least partially supports that syntax. But in either case, it's not a native language feature.


You should take your own advice, because you have no clue what a context switch even is.

P.S. C++ async is entirely about running on embedded hardware, not about "overhead".

P.P.S. All garbage-collected languages have a GIL of some sort. This is the real reason GC languages aren't used in systems programming, not GC pauses.


The one client = one socket assumption is also bad. You could use a single UDP server socket to handle all clients.


async/await is just hiding the threads from you. There are threads behind the scenes. You want a thread per IO point, whether explicitly or using async/await.


No it's not. You can set up async runtimes that run on a single thread and service thousands of long-running connections with slow clients. Tokio, the most popular Rust async runtime, starts one thread per CPU core by default, and you can have thousands of tasks running on just a few OS threads.


"No it's not" Proceeds to say exactly what the guy above said


Question with regards to the FIFO queue and processing it:

Say a user does two actions in quick succession - action A that is something that inherently takes a bit longer to process on the game server (ie a complicated transaction of some kind) and then right after does action B, one that requires little to no processing.

How is it best handled to make sure that the order of actions by the player is maintained with processing? With this scenario, it is possible action B will be processed before action A, leading to the game server considering them in the wrong order. I'm assuming the solution is to have some additional logic to make sure you are only processing one action per player at any given time, but I was wondering if anyone knows any better ways?


One way to solve this is to have per-player task queues in the game server which process tasks sequentially (but concurrently with other players' task queues; this is actually a perfect use case for async/await-style code).

Critical actions should also have a sequence counter, to make sure they are enqueued in order and to detect whether actions got lost somewhere (especially when using an unordered networking protocol like UDP). Message ordering usually already happens at a lower level than the game code, in the layer that sits on top of the UDP sockets, but a higher-level per-player action counter can't hurt either, if only for debugging.
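
A hedged Go sketch of those per-player queues (one goroutine per player stands in for the async/await-style tasks; the struct and field names are illustrative): actions from the same player are strictly ordered, while different players proceed concurrently.

    package main

    import (
        "fmt"
        "sync"
    )

    type action struct {
        player string
        seq    int
        name   string
    }

    type dispatcher struct {
        mu     sync.Mutex
        queues map[string]chan action
        wg     sync.WaitGroup
    }

    func newDispatcher() *dispatcher {
        return &dispatcher{queues: make(map[string]chan action)}
    }

    // submit enqueues an action on its player's queue, creating the queue
    // (and its worker) on first use. Per-player order is preserved because a
    // single worker drains each queue sequentially.
    func (d *dispatcher) submit(a action) {
        d.mu.Lock()
        q, ok := d.queues[a.player]
        if !ok {
            q = make(chan action, 64)
            d.queues[a.player] = q
            d.wg.Add(1)
            go func() {
                defer d.wg.Done()
                for act := range q {
                    // slow action A always finishes before quick action B
                    // that the same player submitted after it
                    fmt.Println(act.player, act.seq, act.name)
                }
            }()
        }
        d.mu.Unlock()
        q <- a
    }

    func (d *dispatcher) close() {
        d.mu.Lock()
        for _, q := range d.queues {
            close(q)
        }
        d.mu.Unlock()
        d.wg.Wait()
    }

    func main() {
        d := newDispatcher()
        d.submit(action{"alice", 1, "complicated trade"})
        d.submit(action{"alice", 2, "quick move"})
        d.submit(action{"bob", 1, "quick move"})
        d.close()
    }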


Only if player tasks are independent. For example, if both are trying to mine a resource (assuming a shared world) and you do it in parallel, you might end up with the wrong person winning the mining race condition.


The server side can "pre-roll" resources, and clients make a roll when mining the resource. The winner is the one that is closest to or matches the pre-rolled value on the server. In situations where there's no contention, the user is by default the closest to the pre-roll. In situations where there is contention and events happen close together, precise timing isn't required.

The server also doesn't need super precise time slices. Some of that can be hidden on the client with animations, cooldown timers, and other visual tricks. An MMO will have different timing expectations than say a twitch shooter.


Yeah, synchronization between task queues is left as an exercise to the reader :D But I guess in that specific case it doesn't really matter; the server could just as well roll a dice to decide who wins the race (or rather: access to shared resources between task queues needs to be handled like a regular data race in multithreading; what's important is that the server is the only source of truth).

Latency-sensitive games also might want to speculate ahead by running the local player's task queue on the client, and roll back when the authoritative state coming in from the server disagrees with the speculated client state.
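
In code, that client-side prediction with rollback boils down to remembering unacknowledged inputs and replaying them on top of each authoritative snapshot. A minimal Go sketch (one-dimensional movement and the field names are purely illustrative):

    package main

    import "fmt"

    type input struct {
        seq int
        dx  float64
    }

    // client applies its own inputs immediately (prediction), remembers them,
    // and reconciles whenever an authoritative server snapshot arrives.
    type client struct {
        x       float64
        pending []input
    }

    func (c *client) predict(in input) {
        c.x += in.dx
        c.pending = append(c.pending, in)
    }

    // onServerState rolls back to the server's position, then replays every
    // input the server has not acknowledged yet.
    func (c *client) onServerState(serverX float64, ackedSeq int) {
        c.x = serverX
        remaining := c.pending[:0]
        for _, in := range c.pending {
            if in.seq > ackedSeq {
                c.x += in.dx
                remaining = append(remaining, in)
            }
        }
        c.pending = remaining
    }

    func main() {
        c := &client{}
        c.predict(input{seq: 1, dx: 1})
        c.predict(input{seq: 2, dx: 1})
        c.onServerState(0.5, 1) // server applied input 1 but landed slightly differently
        fmt.Println(c.x)        // 1.5: rolled back to 0.5, then input 2 replayed
    }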


I think your last chart is slightly incorrect: the game server is sending a message directly to the user, whereas it should theoretically only talk to the intermediate message queue (or frontend server)


I wonder what an MMO using Golang and goroutines would look like?


Anecdotal, but from what I've seen spending a couple of weeks at a Chinese mobile games company in 2017, Golang game servers are quite popular in China but not elsewhere (in combination with Unity for the game client and Protobuf for the message protocol). I have no idea why that specific tech combo is "big in China" but not outside.

AFAIK in the rest of the world, Photon Server is the most popular off-the-shelf solution (not necessarily for true MMOs though). Maybe that's good enough for most games to not want to write their own server backend (https://www.photonengine.com/)


Well, I am building an MMO server in Go, and I am in the US. Probably because I am a backend/DevOps engineer and I do a lot of Go as it is. I did a small rundown elsewhere in this thread: https://news.ycombinator.com/item?id=37972638


I actually am building an MMO server for an old MMO. And I am using Go and Goroutines for it. I have a small rundown posted here in this thread: https://news.ycombinator.com/item?id=37972638



