MMO Architecture: Source of truth, Dataflows, I/O bottlenecks and how to solve (prdeving.wordpress.com)
234 points by buba on Sept 29, 2023 | 74 comments



The "cache, lots of cache" statement is about the truest thing you can say about any MMO architecture we can build. I did some optimization work earlier this year on a project where the single back-end server is now handling 2 billion requests per minute and has around 3TB of RAM for cache (I think the final production system was aiming for 12TB of RAM).

There are concerns around race conditions, as you pointed out: message passing from client to server and from server to server, and client hand-off between sharded servers. Those synchronization problems will haunt your dreams.

I think the biggest issue I still struggle with is tracking those ephemeral problems that only happen on one shard, or only when going from this shard to that shard but not the other way. One useful trick is message prioritization, with different messages heading to different servers, though these days I'd put a message router in front of the shards and have the router handle persistent connections, rather than the direct connections I've used in the past.
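A minimal sketch of the router idea, assuming each message carries a numeric priority and a shard key (both hypothetical fields, as is the routing function): the router owns the connections to the shards, modeled here as per-shard outbox lists, and always forwards the most urgent pending message first.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class Message:
    priority: int  # lower value = more urgent (hypothetical convention)
    seq: int       # tie-breaker keeps FIFO order within one priority
    shard_key: str = field(compare=False)
    payload: str = field(compare=False)


class MessageRouter:
    """Sits in front of the shards and owns the persistent connections."""

    def __init__(self, shard_for_key):
        self._queue = []
        self._seq = itertools.count()
        self._shard_for_key = shard_for_key  # hypothetical routing function
        self.outboxes = {}  # shard name -> payloads "sent" over its connection

    def submit(self, priority, shard_key, payload):
        heapq.heappush(self._queue, Message(priority, next(self._seq), shard_key, payload))

    def pump(self, budget):
        """Forward up to `budget` messages, most urgent first."""
        for _ in range(min(budget, len(self._queue))):
            msg = heapq.heappop(self._queue)
            shard = self._shard_for_key(msg.shard_key)
            self.outboxes.setdefault(shard, []).append(msg.payload)


router = MessageRouter(lambda key: "shard-a" if key < "m" else "shard-b")
router.submit(5, "zeus", "chat: hello")
router.submit(1, "alice", "combat: hit")
router.submit(1, "bob", "combat: block")
router.pump(budget=2)
print(router.outboxes)  # {'shard-a': ['combat: hit', 'combat: block']}
```

The low-priority chat message stays queued until the next `pump`, so combat traffic is never stuck behind it.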

I've contributed to an MMO game that involves waving light sabers around, another where you defeat the ultimate prime evil (though I was more on the fraud detection side on that one), an unpublished MMO that unceremoniously died during the 2008 financial crash, a "shared world" game that involved animals, an open-world game that involves driving cars and running pedestrians over, and a few "internet scale" websites. I'm currently lead back-end on another MMO. Our database requirements are relatively simple this time around, but it is still, again, read-at-start-up, write-only-when-necessary.
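A sketch of "read-at-start-up, write-only-when-necessary", assuming a plain dict stands in for the database and records are replaced wholesale on update (the `PlayerStore` class is hypothetical): everything is loaded once, updates only mark records dirty, and a flush writes just the dirty ones, so repeated updates to the same record coalesce into one write.

```python
class PlayerStore:
    """In-memory player state: loaded once at start-up, flushed only when dirty."""

    def __init__(self, backend):
        self._backend = backend       # dict standing in for the database
        self._cache = dict(backend)   # read everything at start-up
        self._dirty = set()

    def get(self, player_id):
        return self._cache[player_id]

    def update(self, player_id, **changes):
        # Replace the record in memory; the database is not touched here.
        self._cache[player_id] = {**self._cache[player_id], **changes}
        self._dirty.add(player_id)

    def flush(self):
        """Write only the records that changed since the last flush."""
        written = sorted(self._dirty)
        for player_id in written:
            self._backend[player_id] = self._cache[player_id]
        self._dirty.clear()
        return written


db = {"p1": {"gold": 10}, "p2": {"gold": 5}}
store = PlayerStore(db)
store.update("p1", gold=25)
store.update("p1", gold=30)
print(db["p1"])       # {'gold': 10}  (nothing written yet)
print(store.flush())  # ['p1']  (two updates, one write)
print(db["p1"])       # {'gold': 30}
```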


Let's say that among developers there was a history of how-to-implement gameplay traditions (like how to implement third-person player movement, "gameplay ability systems", etc.) in programming languages besides C++, like C# and Java, the memory-managed, friendly ones with good tooling. And let's say you're forbidden from reinventing C++ inside C# or Java, like Unity's Burst does (so-called HPC#). But you can "do ECS": there are C# and Java ECS frameworks that even use those languages' respective arena allocation techniques well. You just aren't allowed to reinvent C++, but you can use high-performance middleware that does, like Netty.

Would you choose to author an MMO backend in one of those friendlier ecosystems?

Do you think there's value in having access to other Java applications, to embed as libraries of your grander "in memory" ideas?


On the backend, you aren’t limited to frame times under 16ms and can use languages that trade performance for productivity. I have infinite cloud compute, but you only have 4 GHz and 16 GB of RAM. I’ll send you highly optimized C++ and use Ruby as my cloud language; it doesn’t matter.

You’ll frequently find Java, C#, Go, and Python as backend rpc/tcp game servers.


Interestingly enough, Jeff Kesselman was working on Java middleware for MMOs at Sun Microsystems before the Oracle acquisition. See https://en.m.wikipedia.org/wiki/Project_Darkstar

There's definitely some value in using more productive languages for the backend services, as they're not as latency sensitive as rendering.


How did they get the architecture so wrong on that "open-world game that involves driving cars and running pedestrians over"?


I assume that is a tongue-in-cheek description for Grand Theft Auto (whichever one they made online). That game series has had various moral discussions / controversy surrounding it since before the first game came out like 25 years ago.


I assume he got that and he was referring to the jankiness of GTA Online.

For example, it took 5+ minutes to load for years, until some random guy fixed it for them [1]. Though I suppose that had little to do with the overall architecture.

[1] https://news.ycombinator.com/item?id=31681515


It uses a client-authoritative peer-to-peer architecture, which has surely led to billions in lost revenue. I'm surprised selling GTA money has done as well as it has, considering anyone could get as much as they wanted for free.


Not Carmageddon?


> Not Carmageddon?

Carmageddon is not an MMO. :-)


Well, neither is GTA to be honest, at least not by default.


Could be APB honestly.


I wrote about some of these issues client-side, in a previous post about a Rust metaverse client. This is a much worse problem in a metaverse system, because there are no static game level maps. Every object in the world is in a database somewhere.

Second Life / Open Simulator makes a big distinction between assets, inventories and area state. Assets (meshes, textures, animations, sounds) are immutable, and are stored more or less permanently. (There's a garbage collection batch job that runs monthly or so.) Those are basically files. There's a vanilla web service running on an AWS web server, plus Akamai, both heavily cached.

Inventories are like file directories. They have asset UUIDs and some metadata (name, etc.) Those are in a database, but that data is dynamic and not cached. Each user has an inventory, of course, and it can be huge. 50,000 items are not unheard of. This is a metaverse; you can build stuff.

Area state is in server memory for each region. That's saved periodically, once a minute or so. This is a backup file, not a database. If you wanted a more continuous save process, you could keep a log of recent changes on a different machine than the server. After a crash, reload the server state and rerun the recent changes. Area state is under a gigabyte per region (A region is 256x256 meters).
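The snapshot-plus-log recovery described above could look roughly like this. The class and method names are made up for illustration, and the log is an in-process list rather than the separate machine the comment suggests.

```python
import copy


class RegionState:
    """Region state kept in memory, with periodic snapshots plus a change log."""

    def __init__(self):
        self.objects = {}
        self.snapshot = {}
        self.log = []  # changes applied since the last snapshot

    def apply(self, obj_id, props):
        self.log.append((obj_id, props))  # record the change first...
        self.objects[obj_id] = props      # ...then apply it in memory

    def take_snapshot(self):
        self.snapshot = copy.deepcopy(self.objects)
        self.log.clear()  # everything so far is covered by the snapshot

    @staticmethod
    def recover(snapshot, log):
        """After a crash: reload the last snapshot, then rerun the recent changes."""
        state = copy.deepcopy(snapshot)
        for obj_id, props in log:
            state[obj_id] = props
        return state


region = RegionState()
region.apply("chair", {"x": 1})
region.take_snapshot()
region.apply("chair", {"x": 2})
region.apply("lamp", {"x": 7})
# Simulate a crash: in-memory objects are lost, but snapshot and log survive.
restored = RegionState.recover(region.snapshot, region.log)
print(restored)  # {'chair': {'x': 2}, 'lamp': {'x': 7}}
```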

With a three-level system like this, none of the levels are severely overloaded. The greatest data volume is from the asset store, and because that's immutable, it can be and is cached extensively. There are three levels of caches: asset server, CDN, and client. It's still a problem getting assets out to the clients fast enough, but with prioritization and concurrency, that's solvable. The inventory database is mostly-read, so the usual scaling techniques for mostly-read databases work. Area state is in memory. The main trick is taking a clean backup without visibly freezing the system.
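A rough sketch of the three cache levels for immutable assets, assuming assets are keyed by UUID and each level is a plain dict (in reality the levels would be the client's disk cache, the CDN and the asset server). Because an asset never changes once it has a UUID, a hit at any level is always valid and there is no invalidation logic at all.

```python
class TieredAssetCache:
    """Looks an immutable asset up through client, CDN and asset-server caches."""

    def __init__(self, origin):
        self.origin = origin  # uuid -> bytes, the permanent store
        self.levels = {"client": {}, "cdn": {}, "asset_server": {}}
        self.origin_reads = 0

    def fetch(self, uuid):
        missed = []
        for name, cache in self.levels.items():  # nearest level first
            if uuid in cache:
                data = cache[uuid]
                break
            missed.append(name)
        else:  # no level had it: go to the permanent store
            self.origin_reads += 1
            data = self.origin[uuid]
        for name in missed:  # fill every level that missed; safe because assets never change
            self.levels[name][uuid] = data
        return data


cache = TieredAssetCache({"tex-123": b"texture bytes"})
cache.fetch("tex-123")            # cold: read from the origin, fills all levels
cache.levels["client"].clear()    # a different client with a cold local cache
cache.fetch("tex-123")            # served by the CDN level
print(cache.origin_reads)         # 1
```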


Oh, I feel for the author. I’ve made games. I’ve made distributed web platforms. I have profound respect for modern MMO architectures because of one nasty, “I wish this wasn’t a thing” class of data: state. Who, where, what animation, what modeled entity, who is in my party, on my screen, under my axe. Synchronized playback of my swing to my party members, so we all yell in excitement at the same time when the boss falls. This level of synchronization across shards (clusters of servers) is enormously complex.

Not to mention just writing “net code” in general. Network latency is the biggest issue, and often TCP isn’t enough. You need network prediction: where will they be, based on position, direction, etc., until I receive the next packet? I can then check the prediction against the actual state and correct it. If UDP is available to you, you use it so you can deliver that state as fast as possible, with no ACK back and forth. A combination of UDP peer-to-peer state transfer for animation and basic state, TCP connections for services and server state, REST for that auction house, and SQS or pub/sub for item delivery, party/matchmaking and world chat. It’s a beast of a problem.
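The prediction step is usually called dead reckoning. A minimal 2D sketch, assuming each packet carries position, velocity and a timestamp (a hypothetical packet shape): between packets the client extrapolates, and when a packet arrives it measures the prediction error and snaps to the authoritative state. Real clients usually smooth the visible correction rather than snapping.

```python
import math
from dataclasses import dataclass


@dataclass
class RemoteEntity:
    x: float
    y: float
    vx: float  # velocity from the last packet, units per second
    vy: float
    t: float   # timestamp of the last packet, seconds

    def predict(self, now):
        """Extrapolate position from the last known position and velocity."""
        dt = now - self.t
        return (self.x + self.vx * dt, self.y + self.vy * dt)

    def correct(self, now, x, y, vx, vy):
        """Apply an authoritative update and report how far off the prediction was."""
        px, py = self.predict(now)
        error = math.hypot(x - px, y - py)
        self.x, self.y, self.vx, self.vy, self.t = x, y, vx, vy, now
        return error


e = RemoteEntity(x=0.0, y=0.0, vx=2.0, vy=0.0, t=0.0)
print(e.predict(0.5))  # (1.0, 0.0)
err = e.correct(1.0, x=1.5, y=0.5, vx=0.0, vy=1.0)
print(round(err, 3))   # 0.707
```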

Rewind 15-20 years and look at all the folks who wanted to make a game, their first game, and wanted it to be an MMO. None of them succeeded. Not one. The only ones since were from people who knew the ask. Or had a crowdfunded Ponzi scheme.


> Rewind 15-20 years and look at all the folks who wanted to make a game, their first game, and wanted it to be an MMO. None of them succeeded. Not one.

I guess it depends on how you define success, but I would posit FOnline [1][2] as a success story. FOnline is a fan-made MMO of Fallout by a single guy, using the assets of the original Fallout 1 & 2 single-player games. Having these assets and the general game mechanics already finished definitely played a huge role in getting it to a playable state in reasonable time. Still, FOnline is a from-scratch code base, not a mod of the originals. It also changed plenty of mechanics, most notably being real-time while the original games were turn-based.

It's still being pushed forward even today, after 20 years of development, by this one guy, but it was already playable in the late 2000s. Peak concurrent players that I remember seeing was a few thousand. Definitely not AAA level, but way past simple multiplayer. It would have gone higher due to the hype at the time, but the server started to really struggle at that point. After a few years as a closed-source free game, it got converted into an SDK and spawned a dozen new fan games using that engine.

Perhaps even more importantly, it was extremely fun in the early days. PvP gained you experience and all the other player's loot. Later on the PvP was limited due to PvE lobbyists, but perhaps it made the game more fun for PvE lovers.

Here's a random screenshot from my personal archive that shows a bunch of players on the screen at once. [3]

In any case, I view it as a great example of a single person MMO success.

--

[1] https://fonline.ru/

[2] https://falloutmods.fandom.com/wiki/FOnline_Engine

[3] https://imgur.com/9sMJNE5


While I commend his efforts, it’s not even remotely MMO. A few thousand can be handled by one server. One beefy server, but one server nonetheless.

I’m talking about 10,000+ players. Where you need clusters of servers and synchronization techniques.

There have been some small multiplayer games that have tried to pass as MMOs, but without a player base in the 10k+ range, you never encounter certain classes of engineering problems.


Going by your logic, nothing is an MMO. Minecraft is a significantly more demanding game than some cooldown-based ability activator where you can only hit things you click on or hit with an area attack, and yet it is possible to have thousands of players on one big server.

Your "synchronization techniques" pale in the face of destructible terrain.


Except in Minecraft the world is voxel-based, so to sync destructible terrain I only need to send the x,y,z of the removed block.
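To illustrate how small that message can be, here is a hypothetical wire format (not Minecraft's actual protocol): a one-byte opcode plus three little-endian signed 32-bit coordinates, 13 bytes in total, applied identically by every receiver.

```python
import struct

# Hypothetical format: opcode (unsigned byte) + x, y, z (signed 32-bit ints), little-endian.
BLOCK_REMOVED = 0x01
_FORMAT = "<Biii"


def encode_block_removed(x, y, z):
    return struct.pack(_FORMAT, BLOCK_REMOVED, x, y, z)


def decode(message):
    opcode, x, y, z = struct.unpack(_FORMAT, message)
    if opcode != BLOCK_REMOVED:
        raise ValueError(f"unknown opcode {opcode}")
    return x, y, z


def apply_block_removed(world, message):
    """Every peer applies the same deterministic edit to its copy of the grid."""
    world.pop(decode(message), None)


world = {(10, 64, -3): "stone", (10, 65, -3): "dirt"}
msg = encode_block_removed(10, 64, -3)
print(len(msg))  # 13
apply_block_removed(world, msg)
print(world)     # {(10, 65, -3): 'dirt'}
```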

Going by my logic, there are MOGs and there are MMOs. Guild Wars 2, World of Warcraft, EverQuest 2: games where thousands of people can be on screen at once, where that server instance is synchronized with the rest of the cluster for one seamless virtual experience. I’d say only half of self-proclaimed MMOs ever really reach massively multiplayer status. Games like Dota 2 and CS2 are played by millions, but I wouldn’t call them MMOs because each match is 5v5.

Realm vs Realm or World vs World mechanics explain my perspective perfectly.


> World of Warcraft

The original release of World of Warcraft had a limit of a few thousand per server too. It's only much later that this got increased. In the classic MMO era there's really only EVE Online that pushed to 10k and beyond, and never in the same star system. Single star system was limited to around 500 people for the game to still be playable. It's only later that they added time dilation which allowed for thousands to be in a single system at once.
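EVE's time dilation, roughly: when a tick costs more CPU than its budget, the game clock slows down in proportion instead of the server falling behind. This is a sketch under assumptions. As far as I know, CCP has said TiDi bottoms out at 10% of normal speed, which is the `floor` default here, but the formula and the parameter names are illustrative.

```python
def dilation_factor(tick_cost_ms, tick_budget_ms, floor=0.1):
    """Fraction of real time the simulation advances per tick.

    If a tick needs more CPU than its budget, slow the game clock in
    proportion instead of dropping updates; `floor` caps how slow it gets.
    """
    if tick_cost_ms <= tick_budget_ms:
        return 1.0
    return max(floor, tick_budget_ms / tick_cost_ms)


def advance(game_time, real_dt, factor):
    """Advance the game clock by the dilated amount of real time."""
    return game_time + real_dt * factor


print(dilation_factor(50, 100))    # 1.0   (under budget: normal speed)
print(dilation_factor(400, 100))   # 0.25  (4x over budget: quarter speed)
print(dilation_factor(5000, 100))  # 0.1   (clamped at the floor)
```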


Yeah and Ultima Online was the same. I’m not talking about 1999’s definition of an MMO, when people were still using dialup.


None of those MMOs can handle thousands of players on the screen at once.


> Rewind 15-20 years and look at all the folks who wanted to make a game, their first game, and wanted it to be an MMO. None of them succeeded. Not one. The only ones since were from people who knew the ask. Or had a crowdfunded Ponzi scheme.

This is an odd statement given that Ultima Online came out in 1997 and WoW came out in 2004. Are you referring to hobbyists?


Both of which were developed more than 20 years ago. I also think they knew the ask since they made games prior. I can see how words are confusing and math is hard.


RuneScape might be a better example. They had made hobby games, no professional experience, and launched a massively successful MMORPG.

If you are arguing they made it more than 20 years ago, so it was easier, I'd like to learn more about how things degraded in the early 2000s. I assure you, I am not confused by words or math. :)


> the I/O bottleneck in the database

I thought most MMOs kept everything in memory and only offloaded to the database periodically. I clearly remember rollbacks to fix times (XX:00) when things went down.

Edit: Sorry, should have read the whole article before commenting.


Exactly, there is no way the state can be persisted on absolutely every change. It has to be periodically dumped to the database.


Speaking about WoW specifically, since I'm not familiar with the others, I've always been curious about their quest system. Specifically, keeping track of what's available for the character efficiently, along with the event system to flag quests as completed, etc.

There's so many of them. I have to assume they're spatially limited. You enter a zone, or an area, and the system loads up all of the quests located in that space, then it runs through to determine whether you qualify for them.

As for quests that you're on, that's a bit more straightforward, since you're so limited in how many you can carry around with you at any one time. Then, every event can practically just be brute forced across your pending quests to see which ones get progressed, etc.

But given the sheer number of quests available, I was always curious how almost anything can trigger quest progress.

There's also the whole achievement system, which perhaps is similar in design.


> Specifically, keeping track of what's available for the character efficiently, along with the event system to flag quests as completed, etc.

This is not necessarily that difficult, at least the first part. A lot of games will have quests given out by a 'quest giver' character of some sort, or they will activate at specific interaction points on the map. For example, when a quest giver first comes into view range, you can run some cheap 'has-player-finished-quest' checks to decide whether it should show UI indicating a quest is available. Quests with more initial conditions can hide their checks behind the interaction with the quest giver.

Doing quest progression can be a bit more challenging. You need to determine when to do the checks for progress, and also how comprehensive you want them to be. The more complex the check, the less often you can run it without affecting game performance. I've seen designers use all sorts of tricks depending on the specific quest: interaction volumes that run checks, periodic ticks, on-entity flash messages, etc.

> Then, every event can practically just be brute forced across your pending quests to see which ones get progressed

This only works for games that have a small number of active quests and not a lot of events. And with MMOs, you really need to be wary of the accidental quadratic performance problem.
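One way to avoid the quadratic trap is to index active objectives by the event they wait for, so each event touches only the objectives listening for it. A sketch, assuming events are tuples like `("kill", "boar")` and quests are plain string IDs (all hypothetical; this is not how WoW or TrinityCore store quests):

```python
from collections import defaultdict


class QuestTracker:
    """Per-player quest progress, with objectives indexed by the event they wait on."""

    def __init__(self):
        self._by_event = defaultdict(list)  # event key -> [(quest_id, event key)]
        self.progress = {}                  # (quest_id, event key) -> [count, needed]

    def accept(self, quest_id, objectives):
        """objectives: {event_key: count_needed}, e.g. {("kill", "boar"): 5}."""
        for event_key, needed in objectives.items():
            self._by_event[event_key].append((quest_id, event_key))
            self.progress[(quest_id, event_key)] = [0, needed]

    def on_event(self, event_key):
        """Cost is proportional to the objectives listening for this event only."""
        advanced = []
        for slot in self._by_event.get(event_key, []):
            count, needed = self.progress[slot]
            if count < needed:
                self.progress[slot][0] = count + 1
                advanced.append(slot[0])
        return advanced

    def is_complete(self, quest_id):
        return all(c >= n for (q, _), (c, n) in self.progress.items() if q == quest_id)


tracker = QuestTracker()
tracker.accept("boar_hunt", {("kill", "boar"): 2})
tracker.accept("herbs", {("loot", "peacebloom"): 1})
tracker.on_event(("kill", "wolf"))  # no listeners: touches nothing
tracker.on_event(("kill", "boar"))
tracker.on_event(("kill", "boar"))
print(tracker.is_complete("boar_hunt"), tracker.is_complete("herbs"))  # True False
```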


Sorry in advance for not answering your question directly, but you may want to check the sources of the MaNGOS/TrinityCore family of WoW servers for that.

In short, from what I know, yes, quests are stored in "quest log" fields of character data in server DB, and they are tracked by the clients and checked by the server. Some simple auto quests like "find this item" are not even tracked by server and only stored on completion. Since both client and server have all game data, the client knows about all possible quests and only shows to the player what is appropriate at a current state.


You missed the opportunity to talk about WoW private servers like TrinityCore.

The TrinityCore emulator can handle 10k+ players on a single server.


I've seen those so-called 10k players per server, and in reality it just does not work; it's pretty much a lag fest. You should see how those servers run in China (where private WoW servers are very popular).


Is it the server that's laggy or the client?


I don't understand how they are able to replicate all the interaction rules. There's a bunch of things going on in WoW, and unless you did a heck of a lot of experiments how would you uncover the rules governing all the interactions between different things? For instance you can bubble hearth on classic but not in hardcore. How is an external server developer going to know that?

Also, how do they place all the NPCs and hook up the quests and loot tables? Do they scrape Wowhead?


(Trial + error) * hardcore players.

Yes they scrape as much as they can, but the originals of these (almost as old as WoW itself) were absolutely not 1:1 behaviour. By hardcore players noticing the minute differences, taking the time to detailed bug report, then the admins/devs looking with care, they've come pretty close.

Never underestimate a group of nerds with a passion.


I used to play Ragnarok on private servers. The explanation is simple: they know because they are super fans.


EVE Online has been doing this for 20 years now.

The size of 'big engagements' that are still playable has gone from a few hundred to a few thousand in that timeframe, though players tend to hit the ceilings and "playable" is a matter of opinion.


To be honest, I know nothing about private WoW servers, but I promise I'll check them out!

Thanks!


AzerothCore is probably the best and most polished, if you don't mind WotLK.


the last expansion that was any good?


Depending on who you ask.

Tangent: Imho, the only reason it is good is because it's not as grindy and / or the community just didn't put as much emphasis on min-maxing things. GearScore was a thing of course, but theory crafting wasn't anywhere close to what we have now.


Not sure that's true for me: very good raids (minus the Trial). Ulduar and Icecrown were the highlights for my guild, though I didn't mind the single-boss ones either. Trial was a little janky, but the fights were fun. The big progression guild we were part of up through Burning Crusade wanted to leave for greener server pastures, and a handful of us were kinda done with hard progression and the cutthroat nature of it. We picked up a few more people and did 10-mans/hard modes mostly. We hooked up with another guild like us for 25-man, but we only cleared those hard modes once. 25-man was still a management nightmare...

Played through pandaland, skipped warlords, came back for some of legion and then finally broke the habit.


Early MMOs (WoW, Asheron's Call, EverQuest, DAoC) were very impressive in terms of distributed computing.


They were not really distributed, though. I think one of the first that really was is Guild Wars 2.

https://ubm-twvideo01.s3.amazonaws.com/o1/vault/gdc2017/Pres...

DAoC was basically a Linux box with a bunch of processes connected to MySQL.

https://www.gamedeveloper.com/disciplines/postmortem-mythic-...


Would this count? https://www.gamedeveloper.com/design/classic-postmortem-i-as...

> One of the most impressive features of the Turbine engine is the continuous outdoor environment. This is made possible thanks to dynamic load balancing, which is a scalable serverside architecture. The easiest way to appreciate the need for dynamic load balancing is to consider the following scenario.

> Dynamic load balancing solves this overloaded server problem. Instead of assigning a static geographic area to each server, the individual servers can divide up the game world based on the relative processor load of each server. In the previous example, instead of remaining idle, all four servers would divide the load equally among themselves, ensuring the most efficient use of the hardware’s processing capacity.
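A toy version of load-based partitioning, assuming the world is a 1D strip of columns with a measured load per column (real engines work in 2D or 3D and also have to migrate entities across the moved boundaries). The greedy cut-at-each-quantile rule is an illustration, not Turbine's actual algorithm.

```python
def partition_by_load(column_loads, servers):
    """Split a strip of world columns into `servers` contiguous ranges of similar load.

    Returns (start, end) column ranges, end exclusive.
    """
    if not 1 <= servers <= len(column_loads):
        raise ValueError("need between 1 and len(column_loads) servers")
    total = sum(column_loads)
    ranges, start, running = [], 0, 0
    for i, load in enumerate(column_loads):
        running += load
        cuts_left = servers - len(ranges) - 1           # boundaries still to place
        columns_left = len(column_loads) - i - 1        # columns after this one
        target = total * (len(ranges) + 1) / servers    # ideal cumulative load at next cut
        # Cut once we reach the target, or when every remaining server needs a column.
        if cuts_left > 0 and (running >= target or columns_left == cuts_left):
            ranges.append((start, i + 1))
            start = i + 1
    ranges.append((start, len(column_loads)))
    return ranges


loads = [1, 1, 1, 1, 10, 10, 10, 10]  # players crowd into the east half
print(partition_by_load(loads, 2))     # [(0, 6), (6, 8)]
```

A static half-and-half split of this strip would give the two servers loads of 4 and 40; the load-based split gives 24 and 20.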


That's a useful technology. Second Life / Open Simulator do not have that, and need it. It's good to hear about a success with that approach.

Improbable tried that, dividing the world into regions but moving the region boundaries around based on player density. This worked, but apparently required huge amounts of inter-server traffic. The system was too expensive to operate. (Running it on Google Cloud with metering for every client/server transaction didn't help.) Five indie free-to-play games, some of them good (look up Worlds Adrift), went bust because of server cost.

Improbable then pivoted to simulators for the UK military, a much less cost-sensitive market. That worked, but they had way too much company and funding for that niche. Then they tried to pivot to crypto metaverses, two years too late, and hooked up with the Yuga Labs (Bored Ape, Otherside) crowd. Lately, they're trying to do something with US Major League Baseball. Their solution to the cost problem is to only run special events that last a few hours, for which they can short term rent some huge number of servers from AWS or somebody.

There's still no good off the shelf solution for this kind of scaling, with big worlds and big moving crowds. Epic and Roblox were making noises about working on this problem a year ago, but not much has been heard recently. Now both are in money-losing and layoff mode.


To be fair, GW2 was able to do that by instancing the world per zone, with loading screens to switch server connections, rather than having a seamless open world.


And the techniques in the article are basically how we did it. (I say “we” but I’m just a system administrator who got to work with some cool people at Turbine.)

As Eumenes notes, the Asheron’s Call engine was significantly ahead of its time with the seamless zoning. The cost was high, though — we needed quite a few servers to run a world, I think many more than our competitors. There are business reasons why we weren’t quite as cost-conscious as we perhaps should have been.

The other factor involved in determining how often you persist is item duplication. If it’s possible to transfer an item between players without persisting state, and if there are known exploits that crash servers (not world, but individual servers), you wind up with an exploit that can duplicate items. But I’m sure that’s just hypothetical.


I love reading the source code of WoW private servers. It's not the official code, but TrinityCore, for example, is quite nice C++ code.


The WoW development diary talks a bit about their server code and how they handled the load.


Do you have a link? I’m intrigued.


It's a book, https://whenitsready.com/wowdiary/, originally sold through a Kickstarter by one of the original Blizzard mappers and made from notes they wrote while working on the original WoW. They did some AMAs[1][2] with other WoW developers that touch on some of the topics discussed here.

[1] https://www.reddit.com/r/wow/comments/9huows/ama_former_wow_...

[2] https://www.reddit.com/r/classicwow/comments/9fb2bo/john_sta...


WoW is not in the same league as EQ, AC, DAoC.

WoW is a second-generation MMORPG, or third-generation if you count Meridian 59 as the first.

It was not the first to have seamless maps, but IMHO it did them best. EQ2, while not having seamless maps, would be in the same league as WoW.


Does anyone know any more good resources for designing an MMO architecture? I would love to do an MMO as a side project, but I'm a bit daunted by the unknowns of architecture development.


I made a video series on networking theory for virtual worlds that has been well received: https://youtu.be/0wOZusuMIIM

I've also been working on an engine for the past few years if you want some code examples: https://github.com/Net5F/AmalgamEngine


Wow that series is really good! Please do more, you explain things very clearly and the visuals are easy to follow.


I'd wager a guess that getting players will be the most difficult part by far, at least in the beginning. Make an MVP and focus on building a playerbase first, then come back to architecture when you're suffering from success if you get that far.


> then come back to architecture when you're suffering from success if you get that far.

Then it will be too late because you will essentially have to rewrite half of your project while your userbase is leaving due to unplayable game. Better to make good architectural decisions from the start, and make small optimizations when needed.


Just keep it simple and avoid optimization like the plague. You can spend 1 month building a prototype or >3 months perfecting a single piece. Just know that it is all smoke and mirrors, and don’t worry about that.


The basic advice in the article is sound really, make writes to db very async and keep all your state in memory.

Also important for most online games: any action relevant to core gameplay needs to be server-authoritative. Usually you send RPCs from client to server, resolve the result on the server, then broadcast it to the relevant clients.
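A minimal sketch of that RPC flow, assuming a 1D world, a made-up attack range and damage value, and in-memory inboxes standing in for client connections: the client sends only its intent, the server validates it against its own state, resolves it, and broadcasts the result to nearby clients.

```python
from dataclasses import dataclass, field


@dataclass
class Player:
    name: str
    x: float
    hp: int = 100
    inbox: list = field(default_factory=list)  # messages "sent" to this client


class Server:
    """Clients only send intents; the server validates, resolves and broadcasts."""

    ATTACK_RANGE = 5.0  # hypothetical game rule
    DAMAGE = 10         # hypothetical game rule

    def __init__(self, players):
        self.players = {p.name: p for p in players}

    def rpc_attack(self, attacker_name, target_name):
        attacker = self.players[attacker_name]
        target = self.players.get(target_name)
        # Validate against server state, never against what the client claims.
        if target is None or abs(attacker.x - target.x) > self.ATTACK_RANGE:
            attacker.inbox.append("attack rejected")
            return
        target.hp = max(0, target.hp - self.DAMAGE)
        event = f"{attacker_name} hit {target_name} ({target.hp} hp)"
        for p in self.relevant_clients(attacker):
            p.inbox.append(event)

    def relevant_clients(self, source, radius=50.0):
        """Only clients near the event need to hear about it."""
        return [p for p in self.players.values() if abs(p.x - source.x) <= radius]


a, b, far = Player("a", 0.0), Player("b", 3.0), Player("far", 500.0)
server = Server([a, b, far])
server.rpc_attack("a", "b")
server.rpc_attack("a", "far")  # out of range: rejected
print(b.hp, a.inbox, far.inbox)  # 90 ['a hit b (90 hp)', 'attack rejected'] []
```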


I have some other posts planned on this topic. I don't know when, or even if, I'm going to deliver, but you are free to follow the blog and get the update if I ever do.


In most data architectures the DB is only the backing store, because no matter how fast your database is it's going to be slower than RAM.

Once you start caring about the performance the second thing you do is stick a cache layer of one sort or another in front of the database; the first thing should be making sure you have the correct indexes.

In any case, it sounds like a distributed cache problem. I wonder if you could just abuse redis for your game backend?


You can use redis or memcached, but every MMO or online game I've been involved with, unless it was a "web game", has eschewed those for the most part. The game server maintains the state, knows all the objects in the universe, or at least its portion of the universe, and is responsible for retrieving and updating those objects. Even redis and memcached would be considered slow by comparison. Those game objects/world objects/MOBs may eventually be pushed out to a key-store server, but generally are not. The only portion of the database on any MMO I've worked on that has cared about "proper indexes" has been the area dealing with account retrieval. Traditional databases, at least on the non-web MMOs I've been involved with, when it comes to game state, are not normally used. RDBMS are used for boring things like account management, customer management, and so forth. Our database on the current (non-web) MMO uses a few more web technologies than I have in the past for this particular problem, but once the shard is loaded, and the user is connected, it is back to tradition, for the most part.


What protocols are used to stream updates to the client? Or is it simulation on the client and a state dump on failure?


It may help to read the article; Redis is mentioned.


Interesting approach to data ownership philosophy, I/O techniques and source of truth fuckery in MMO-like systems


I liked the article! Do you know of MMOs that try to use database techniques like write-ahead-logs and log-sequence-numbers for persistence/replication? The nice thing about these techniques is that you can replicate state in a consistent way - so you could have multiple game state services all providing equivalent reads
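For reference, a toy version of the WAL/LSN idea, assuming a single primary and in-process replicas (all names hypothetical): the primary appends each change to a log with an increasing log sequence number, replicas apply entries strictly in LSN order, and a read that must see a given write is only served by a replica that has applied at least that LSN.

```python
class Primary:
    """Owns the authoritative state; every change is logged with an increasing LSN."""

    def __init__(self):
        self.state = {}
        self.log = []  # (lsn, key, value); LSNs start at 1 and have no gaps

    def write(self, key, value):
        lsn = len(self.log) + 1
        self.log.append((lsn, key, value))  # append to the log first (write-ahead)...
        self.state[key] = value             # ...then apply to the in-memory state
        return lsn

    def entries_after(self, lsn):
        return self.log[lsn:]  # entry with LSN n sits at index n - 1


class Replica:
    """Applies entries strictly in LSN order, so it always matches a prefix of the log."""

    def __init__(self):
        self.state = {}
        self.applied_lsn = 0

    def catch_up(self, primary):
        for lsn, key, value in primary.entries_after(self.applied_lsn):
            if lsn != self.applied_lsn + 1:
                raise RuntimeError(f"gap in log before LSN {lsn}")
            self.state[key] = value
            self.applied_lsn = lsn

    def read(self, key, min_lsn):
        """Serve the read only if this replica has applied everything up to min_lsn."""
        if self.applied_lsn < min_lsn:
            raise RuntimeError("replica is behind; retry or read elsewhere")
        return self.state.get(key)


primary = Primary()
primary.write("p1.gold", 10)
lsn = primary.write("p1.gold", 25)
r1, r2, lagging = Replica(), Replica(), Replica()
r1.catch_up(primary)
r2.catch_up(primary)
print(r1.read("p1.gold", min_lsn=lsn), r1.state == r2.state == primary.state)  # 25 True
# `lagging` never caught up, so lagging.read("p1.gold", min_lsn=lsn) would raise.
```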


Are you the author?


yep


…and you wrote a comment complimenting your own article?


haha, no I think they were giving a heads up to potential readers as to what they think is the interesting or unique points in the post. Still funny though :)


Yep, I noticed. It's an interesting approach tho xD


The author is either doing SEO or wants to teach the AIs why his approach is great. :-)


>And sooner or later, we will hit hell, an evil that lurks behind every MMO, the I/O bottleneck in the database.

When I worked on an MMO with a five-figure concurrent player count, we got by fine with a single database server.

The much bigger I/O bottleneck was the load balancers.


DaFluffyPotato made a pretty good video on simple multiplayer game dev recently https://www.youtube.com/watch?v=_hh7Oe1ohQU He even goes over cost based on CPU usage.



