Ask HN: What is the ops architecture like for AAA multiplayer game servers?
234 points by qyirius on Sept 8, 2019 | 75 comments
I’d be really interested in reading about multiplayer servers for big-title FPS games with matchmaking and lots of players.



Halo 4 and 5 use a project developed “in house” at Microsoft called Orleans. Roughly speaking it is similar to Akka. It’s an actor-based distributed system which attempts to hide the implementation of distributed networking. In essence, each match and each player get their own “network attached” object (a “grain” in Orleans terms). So:

    var player = grainFactory.GetGrain<IPlayer>(123);
is actually a reference to a player grain that is instantiated _somewhere_ on the network (not necessarily locally on the calling server). Then operating on that object, such as:

    await player.FireWeapon();
tells the server (“silo” in Orleans terms) which owns that player instance to invoke that method on it.

In this way, it is very easy to quickly update game state without obsessing over the throughput of a single machine.


Orleans provides higher-level abstractions than Akka. It is literally a plug 'n play distributed application kit and consequently very opinionated.

It took me quite some time to shift my mental model and decide I prefer it over Akka. It's also worth noting that Orleans is open source. There is also an excellent operational dashboard named OrleansDashboard.



The dashboard is visually great but misleading in functional usefulness. Much like with any distributed computing framework, it’s marginally useful, at best, to view a single computation’s flow or result. The dashboard shows grain (actor) success/failure calls, but as there can be any number of topological arrangements of grain execution calls, it really is mostly an aesthetic dashboard.

I’m only pointing this out because it’s easy to spend a lot of time trying to use the dashboard as a dev tool instead of focusing on good logging techniques. There has been a lot of great work put into the dashboard and I certainly don’t want to take away from that.


I’m pretty sure Orleans is only used for the score board (~90% confident).

I use Orleans, have contributed to it, and have interacted with the dev team quite a bit.


We used Orleans for our service framework for Breach, a 4v1 asymmetric action RPG built with Unreal Engine 4. While all the real-time simulation was implemented authoritatively as part of an Unreal Engine 4 server, we used Orleans/SignalR for the service backend. Services included: matchmaking, (voice)chat, character/account persistence, quests, microtransactions, customer service, and friends among others.

It's a great framework and I hope others adopt it for backend services. My only complaint, and it isn't particular to Orleans, is that there aren't a lot of examples that demonstrate best practices for building large applications with it. There is definitely a "right" and a wrong way to build with Orleans, and it can take you a while, and a lot of refactoring, to discover that.


I actually started putting together a book on Orleans, its use, and best practices. I should probably finish it.


> which attempts to hide the implementation of distributed networking

To be fair, this is precisely what Akka doesn't do, often citing A Note on Distributed Computing (1994) which explains why that approach is problematic.


Halo 5 has some of the best net code I've encountered recently. The hit detection really is top notch compared to other FPS games.


This is so sick. Wonder what was done for Halo


Not FPS specifically, but you might enjoy Riot Games' tech blog https://technology.riotgames.com/ which has articles on a variety of game technology things such as service deployment, network infrastructure, game performance monitoring etc.

(Disclosure: I work there)


The Riot Games tech blog is an awesome read. The recent post on fps (as in frames per second) performance monitoring in League of Legends was a great example of why it's worth a read.


I thought by FPS, the OP meant First Person Shooter


I think that's true and gp was clarifying that the post he was referencing is about the other fps.


Hello fellow Rioter!


There's a slide deck and presentation that's been posted to HN a number of times about Call of Duty's servers using Erlang.

https://www.erlang-factory.com/upload/presentations/395/Erla...

https://news.ycombinator.com/item?id=14120506

https://news.ycombinator.com/item?id=2671755

https://vimeo.com/26307654


I can highly recommend this talk by Respawn, Multiplay, and Google on how Titanfall 2 does multiplayer server management. It's geared more at the infrastructure side as opposed to actual dev but worth a watch: https://www.youtube.com/watch?v=p72GaGq-B_0


If you want to learn more about this in person, in 2020 there will be the first "Online Game Technology Summit" at the Game Developers Conference.

(Disclaimer: I'm one of the summit advisors)

We're trying to grow the education in this space of game development, because currently it is very sparse. AFAIK, this is the only event that is dedicated to the technical aspects of online, connected or multiplayer games.

Details, and CFP, which is currently open: https://www.gdconf.com/summits/c4p


So happy to see this happening! I've also been advocating for more community development and knowledge sharing in this space. I believe there's a huge unfulfilled need to connect the industry more on these topics. I've also been organizing an event in the space, and looking at the possibility to land a special interest group in larger organizations that can support our community. I'll drop you a line!


I'm probably gonna submit a talk to this summit this year! I've been going to GDC as a volunteer since 2005, and I've wanted to talk more about online services development for some time. So glad to see this happening!


I'm very glad to see this summit is happening. I missed GDC Online after that got discontinued a few years back.


Spent half a dozen years working in the AAA realm for multiplayer games and an equal amount of time working on my own indie projects, and as you can see already in this thread, there are really two (often confused) pieces to this conversation. First, there's Multiplayer Gameplay Engineering. This is typically a single process handling the 2~64 people shooting each other in a single game. Second, there's Online Services Engineering for Games. This is typically orchestrating the above process, hundreds or thousands of times, as well as things like matchmaking systems, party systems, storefronts, etc.

Below are two articles which I think can be really valuable "baby's first" introductions to the topics of Multiplayer Gameplay Engineering and Online Services for Games. I wrote them from the perspective of approaching these problems for the first time as a college student many years ago, and continued with how my approach evolved through experience. I also include lots of links to seminal articles/talks on the topics that I read as I attempted to "do my homework" while making my own indie game.

Multiplayer Gameplay Engineering https://www.gamasutra.com/blogs/DruErridge/20181004/327885/B...

Online Services for Games https://www.gamebreakingstudios.com/posts/dedicated-game-ser...


This question sounds like it's pointed directly at me.

However, I can only speak for one AAA gaming company, and my team operates a bit differently than most in the company.

My team operates the infrastructure for "Tom Clancy's The Division" video game series (1&2).

Most of the programming effort is spent on doing the cheapest possible thing (in terms of CPU); everything is C++.

Things like matchmaking will ideally happen on a single machine with no shared state; everything happens in memory, which is much faster and can be more reliable than any distributed state or failover.

(It's less reliable if you're in matchmaking and the server or service dies; but then everyone's client will reconnect to the newly scheduled matchmaking instance and repopulate the in-memory state.)
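
In sketch form the idea is just this (illustrative C++ only, not our actual code): the whole matchmaking state is ordinary process memory, and losing the process only costs players a re-queue on reconnect.

    #include <cstdint>
    #include <deque>
    #include <optional>
    #include <utility>

    // All matchmaking state lives in one process, in memory. Nothing is
    // persisted or shared; if the process dies, clients reconnect and
    // their tickets are simply re-added.
    struct Ticket {
        uint64_t player_id;
        uint32_t skill;
    };

    class Matchmaker {
    public:
        void Enqueue(const Ticket& t) { queue_.push_back(t); }

        // Pop two compatible players. A real matcher would also weigh
        // skill spread, latency, party size, and so on.
        std::optional<std::pair<Ticket, Ticket>> TryMatch() {
            if (queue_.size() < 2) return std::nullopt;
            Ticket a = queue_.front(); queue_.pop_front();
            Ticket b = queue_.front(); queue_.pop_front();
            return std::make_pair(a, b);
        }

    private:
        std::deque<Ticket> queue_;  // the only "database" there is
    };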

We use a lot of Windows; nearly every machine that doesn't handle state is a Windows server. This has pros and cons. From my ops perspective I try to treat Windows like cattle, but Windows doesn't like that: they have their own way of operating fleets of machines, which includes using SCCM and packaging things. There are nice GUIs, but we use Saltstack, and we removed AD because it was a huge burden to create a machine, link it to AD, reboot it, and finally get a machine worth using.

From a dev perspective, Windows is good. IO Completion Ports (IOCP) are superior to Linux's epoll in terms of interface and performance, so we can have machines that take 200,000+ connections.
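
For anyone who hasn't seen IOCP, the core pattern is roughly this (a minimal Windows-only sketch, error handling omitted): a small fixed pool of threads services completions for any number of sockets, instead of one thread per connection.

    #include <winsock2.h>
    #include <windows.h>

    // Worker threads block here and wake up whenever *any* overlapped
    // operation on *any* socket attached to the port completes.
    DWORD WINAPI Worker(LPVOID port) {
        for (;;) {
            DWORD bytes = 0;
            ULONG_PTR key = 0;         // per-connection context we registered
            OVERLAPPED* ov = nullptr;  // per-operation context
            if (!GetQueuedCompletionStatus((HANDLE)port, &bytes, &key, &ov, INFINITE))
                continue;  // failed I/O or dropped connection: clean up `key`
            // ...dispatch: a WSARecv/WSASend issued earlier on `key` finished...
        }
    }

    int main() {
        // One completion port shared by every socket and worker thread.
        HANDLE iocp = CreateIoCompletionPort(INVALID_HANDLE_VALUE, nullptr, 0, 0);
        SYSTEM_INFO si;
        GetSystemInfo(&si);
        for (DWORD i = 0; i < si.dwNumberOfProcessors; ++i)
            CreateThread(nullptr, 0, Worker, iocp, 0, nullptr);
        // Each accepted SOCKET s is attached once with
        //   CreateIoCompletionPort((HANDLE)s, iocp, (ULONG_PTR)conn, 0);
        // after which overlapped WSARecv() calls complete on the workers above.
        Sleep(INFINITE);
    }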

Which dedicated server you connect to is up to your client. It does a naive check as it's logging in, doing a TLS handshake with a random gameserver in each region. (During the login phase we send your client a list of all currently active gameservers and an int to represent the region.)

This works fine until there's packet loss on a particular path, because your single ping might be fine but your overall experience could be better elsewhere; if you're not able to ping anything, then we fall back on GeoIP.

That said, if you have friends in another datacenter, we try to put you on the same server, so that if you join groups or whatever, it's just a phase transition rather than a full server sync.
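
Ignoring that friends-in-another-datacenter override, the client-side pick boils down to something like this sketch (illustrative; `TimedTlsHandshake` is a stand-in for the real probe):

    #include <chrono>
    #include <string>
    #include <vector>

    struct Region {
        int id;                  // the int the login service sends down
        std::string probe_host;  // a random gameserver in that region
    };

    // Stand-in for the real probe: TLS handshake against the host, returning
    // the elapsed time, or a negative duration on timeout / total packet loss.
    std::chrono::milliseconds TimedTlsHandshake(const std::string& host);

    int PickRegion(const std::vector<Region>& regions, int geoip_fallback) {
        int best = -1;
        auto best_rtt = std::chrono::milliseconds::max();
        for (const auto& r : regions) {
            auto rtt = TimedTlsHandshake(r.probe_host);
            if (rtt.count() >= 0 && rtt < best_rtt) {
                best_rtt = rtt;
                best = r.id;
            }
        }
        return best >= 0 ? best : geoip_fallback;  // couldn't ping anything
    }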

Everything is orchestrated with a "core" set of servers which handle player authentication and matchmaking centrally, then each of the gameserver datacenters (geographically distributed to be closer to players) is attached via ipsec VPN.

In Division 2 we spread out into GCP as well as having physical machines, so we developed a custom autoscaler. The autoscaler checks the capacity of a region and how many players are currently in the region, keeps a record over 10 minutes, and makes a prediction. If the prediction goes over the current capacity in 20 minutes or less, it will create a new instance (since making Windows servers on GCP takes longer than Linux servers).

If the prediction goes lower than the capacity of a server, it will send a decommission request to the machine, which takes up to 3 hrs to complete (to give people time to leave the server naturally).
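
Conceptually the decision is something like this simplified sketch (naive linear extrapolation stands in for the real predictor):

    #include <deque>
    #include <utility>

    class Autoscaler {
    public:
        // Called periodically with the current player count; keeps a
        // rolling ~10-minute window of samples.
        void Sample(double minute, double players) {
            history_.emplace_back(minute, players);
            while (history_.front().first < minute - 10.0)
                history_.pop_front();
        }

        // Naive linear extrapolation from oldest to newest sample.
        double PredictPlayers(double minutes_ahead) const {
            if (history_.size() < 2)
                return history_.empty() ? 0.0 : history_.back().second;
            const auto& a = history_.front();
            const auto& b = history_.back();
            if (b.first <= a.first) return b.second;
            double slope = (b.second - a.second) / (b.first - a.first);
            return b.second + slope * minutes_ahead;
        }

        enum class Action { kNone, kSpawnInstance, kDecommissionOne };

        // The 20-minute horizon covers the lead time of a Windows VM on GCP.
        Action Decide(double capacity, double per_server_capacity) const {
            double predicted = PredictPlayers(20.0);
            if (predicted > capacity) return Action::kSpawnInstance;
            if (predicted < capacity - per_server_capacity) return Action::kDecommissionOne;
            return Action::kNone;
        }

    private:
        std::deque<std::pair<double, double>> history_;  // (minute, players)
    };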

Idk, I've been doing this for 5 years now, so I can talk at length about how we do it, but ultimately our biggest challenges are the fact that we can't use cool shit or new trends, because latency matters a lot and we use Windows everywhere.

--

As an aside: the overwhelming majority of other Ubisoft games (exception: For Honor) use something very similar to what we released open source in collaboration with Google to do matchmaking: https://agones.dev/site/


What is the rationale behind using Windows on the servers? I'm wondering because the argument I usually hear is that someone high up in the ops team hierarchy declares "but we need everything in the AD, including all the servers, otherwise I can't sleep well". In your case however this apparently doesn't apply, as you do not register them in the AD.

Is it so the game devs don't need to write cross-platform server code?


Real reasons for:

* One platform to develop on; back-end coders tend to go back and forth between client and server programming.

* Faster iterations. (just hit F5 in Visual Studio)

* IOCP

Stupid reasons for:

* Old IT director denied the use of virtualisation software. (MAC address randomisation wasn't great, and it caused a switch crash if two people had the same MAC.)

* Windows licenses are really cheap compared to developer time. (until we went to cloud, where Microsoft charges insane amounts for licensing)


I no longer do Windows development, but I miss it. Its strengths are greater than you portray.

The C# / .NET libraries are really beautiful and well thought out. I most often program in Python, and while the Python language is more concise and elegant than C#, the Python libraries are not nearly as well organized and consistent as C# / .NET. Java libraries are better than Python but worse than C#, and the Java language is worse than both C# and Python. As far as dev environments, Visual Studio is far better than Eclipse and roughly equal to Python's PyCharm. Because C# is compiled, it's far more performant than Python and a bit better than Java.

The downside of course, is that once you start down that road, you are pretty locked into the MS world.

If I was starting a new project and I was a scrappy startup, I'd probably go with Python. But if I had resources (i.e. $$$) to support MS licenses and performance actually mattered, I'd go with MS. Side projects for experimentation may lead to Node or Rust. But sorry, Java, I can't really imagine a scenario where I'd start with you.


I share your high regard for C# and the .NET libraries but not your disdain for Java. Java the programming language is pretty ugly but the JVM is robust and there are other languages which compile down to Java bytecode. In your "scrappy startup" scenario I would pick Scala or Clojure long before Python, mainly because I'm skeptical of the idea of using a dynamically typed programming language for large-scale backend development - for human scalability reasons, not performance ones. Personally I'd also have to pass on Windows even in the AAA bigcorp scenario, because I prefer the tooling and open ecosystem around Linux and epoll isn't that much worse than IOCP.


Python supports gradual typing now, and it's made the experience for me much more manageable. Lots of good IDE integration too. It's not a replacement for Java's or Go's (etc...) type system, but it really increases the amount of Python you can support given the human scalability constraints.


I was an exclusive Microsoft developer for a little over 20 years. It wasn’t until I started doing cloud deployments - like the original poster implied - that I started avoiding Windows as often as I could. Once you add Windows to the mix, everything gets worse - startup time, resource requirements, automation, and costs.

Luckily, you can do C# without having to use Windows with .Net Core.


Ooh, you may be the perfect person to ask a question that I've been wondering about. Is .NET Core on Linux viable yet for serious use? I've done some tinkering with it and gotten a project working on MacOS but I did the development in Visual Studio on Windows. Is it realistic to do your development in Rider, with the open source version of msbuild for CI, and have it perform reasonably on Linux in production, without ever paying Microsoft anything?


The kind of development we do probably wouldn't count as "serious" here on HN, but since the other comment didn't really answer your question, I'll pitch in. We've been developing using .NET Core/Linux/Rider for the past two years. It's been pretty great. Once you get to know the IDE really well, I'd say Rider beats VS in terms of functionality and usability.

.NET Core was developed with CLI usage in mind and common operations (like migrations/boilerplate generation) can be driven from CLI or Rider, you don't need VS for that.

You may have some problems with 3rd-party libraries. Not all of them have been ported to .NET Standard. I'd recommend checking your project's dependencies in advance. (We did have some problems with legacy COM-based cryptographic libraries and had to build a separate "microservice" hosted under Windows to offload this stuff to. COM stuff works fine under .NET Core, but only under Windows (obviously).)


If you are just running a few static Windows servers, the cost benefit of running Linux for .Net Core is really not that great. Anything more dynamic, where you are rapidly bringing “servers” up and down, is where you see the real cost and performance benefits. Of course you can’t run Windows instances with either Fargate or Lambda, but even with regular ECS (Docker) or autoscaling EC2 instances, where you can, Windows costs more, takes more resources, and is slower to launch.


My “Linux deployments” with C# have been Docker and Nginx with Fargate (AWS Serverless Docker) [1] and Lambda. I haven’t had to worry about performance, stability, or scalability. With either technology, you get scalability for free and with Fargate you don’t have to worry about cold starts.

I develop on Windows with Visual Studio and deploy to “Linux”. We use CodeBuild to compile and create the zip file (lambda) or Docker Container (Fargate).

CodeBuild basically is “serverless builds”. It launches a prebuilt or custom Docker container, and you just run the .NET Core “publish” command line; it gets all of your NuGet packages and then you run your standard packaging commands.

But Rider should be good. I’m a huge fan of R# and you can cross target Linux, Mac and Windows from either host using msbuild.


Curious, how have your cross-platform .Net Core experiences been? I've not done much with Linux, but when I was benchmarking writing to MS SQL with the same binary on Linux and Win10, .Net Core on Linux had about half the throughput to a remote SQL Server instance compared to Win10. Not sure if others have seen this, but .Net Core seems to run significantly slower on Linux vs Windows, in my experience.



Is C# more performant than Java? It was my understanding that HotSpot's JIT performs significantly more optimizations, almost bordering on esoteric magic, compared to the CLR.

I can understand how value types would help a lot, but it is possible to optimize Java to avoid object headers on the heap at great pain, i.e. avoiding classes and doing index arithmetic on flat arrays of values. Value types coming to the JVM will obviously help here.

Curious to hear about other advantages the CLR has.


As a complete noob yet somehow armed with a VS enterprise license...could you elaborate on

>* Faster iterations. (just hit F5 in visual studio)

Is this some sort of code sync to servers/cloud?


Does io_uring change the equation a little?


Yes, but I can't talk about future developments.

Suffice it to say: my job is getting easier in the future (thanks mostly to Stadia).


Kinda surprised to see someone say a positive thing about Stadia. I did a fair amount of network programming back when I was in the industry, and I just don't see how it can work.

I see enough congestion on just the local 2.4GHz spectrum out here that it doesn't really matter how fast the datacenter is. Without some form of dead reckoning you don't have much room for error.


Oh, certainly. It's a serious undertaking for everyone. I'm skeptical of how it can work when we're often measuring screen latency and input latency in milliseconds (and obviously not getting sub-millisecond latencies on those significantly faster/less congested subsystems).

However, Stadia is Linux, so if you want to be on Stadia, you need a Linux version of the game, which means primitives need to be ported. Once primitives are ported for the client, they can be optimised for the server, and then you have a Linux gameserver. Which is good for me.


>>(until we went to cloud, where Microsoft charges insane amounts for licensing)

Did you move to Azure?


No, it's cheaper on Azure (and they give us some huge discounts because we're so MS heavy).

But they really gouge you on the other cloud providers (GCP/AWS) with licensing; more than a third of our infra costs are just Windows licenses (when looking only at cloud).


> more than a third of our infra costs are just windows licenses

Wow, that is brutal.


What I'd be curious to hear about is how your servers handle simulating the game world. How many players are simulated per world and what are some of the tricks that you use to ensure the simulation doesn't fall behind, or if it does how do you make sure the clients don't notice?

I'm currently writing a (much, much simpler, agar.io-like) multiplayer game and finding it quite challenging to make the experience smooth. I haven't implemented any lag compensation yet, though, so maybe that's why, but I wonder if that is the only problem. Do you or anyone else have any resources that could help me?



yeah, this is where I got the knowledge about lag compensation from :)


Speaking to language usage (just as one topic) for Online Services for Games: I now contract in the game services space (after half a dozen years at a AAA studio) and have gotten a bit of an overview of how folks approach it. For small and medium studios, I see a lot of the Python/Django approach to the problem, which is quick to develop in and has a lot of problems solved for you. I also think Python is a frequent choice because a lot of the engineers are pulled from gameplay engineering, and Python is familiar to a lot of folks from scripting their tools or plugins (e.g. Maya). Node/Express pokes its head out for some small studios as well, for similar reasons to Django. Go is increasingly popular for both AAA and indie companies for pretty obvious reasons (online services is literally what it was built for). C++ and then Java/Scala are common choices for older and more established studios.

In my experience, all of these languages are fairly well equipped for the problem space - even the interpreted/garbage-collected ones handle these problems well, as they are (in my experience) infrequently CPU-bound, and frequently the scaling considerations come in around your usage of databases and caches (network- or IO-bound). When you have enough players to hit CPU boundaries, you also typically have enough money to buy some pretty beefy CPUs. The caveat to this is that concurrency gives huge wins, since a lot of the requests/data are not related to each other (beyond the 10-20 people in a game), and so languages or frameworks that don't solve concurrency and/or async IO well are at a disadvantage. This puts Python/Django and Node.js further down the totem pole for me. Of course, there are ways to resolve those problems even within those languages, so they are far from ill-equipped.

When I get my choice, I choose something garbage collected in a VM, with mature web frameworks (basically Java & C#) for online services in games. Knowing an engineer's errant null pointer isn't going to tip over your whole process is pretty handy, and the CPU gains in C++ or Go (native code) aren't put to as good of use here as they are in gameplay engineering where you're trying to squeeze cycles out of client machines. Client machines don't really get more powerful the more players you have, but your servers can, haha.


What's your take on using AWS Flexmatch & GameLift for matchmaking and autoscaling? Do you see any trouble spots in their approach?


If they can do all of their matchmaking on a single box, doesn't seem like it's a big pain point.


What is AD? Sorry - hard to google...


Active Directory


Some network-related info from the PUBG devs:

https://steamcommunity.com/games/578080/announcements/detail...


This is a great read. I always wondered if AAA battle royale games did something like that, since updating distant players every frame, especially if they're behind the local player, seems like a waste.


Not related, but relatable.

I often wonder, between games and enterprise software, which has the more complex backend architecture?

At the outset, it seems like games are complex as hell, in the sense that when a gamer shoots another gamer dead, there has to be freeing and re-allocation of processes/resources; and with tens or hundreds of people playing the same game over the internet, both the gaming architecture and the network infrastructure are gonna be complex.

Would someone help me out here?


It is a very different kind of complexity. Games are super complex in a very limited set of boundaries (the entirely virtual world of that single game) and have very rigid timing requirements, but usually the requirements to a "released" game don't change that much anymore. Enterprise apps usually implement stuff that seems simple (buy some item in a web shop), but gets complex due to a nearly boundless world which makes it necessary for them to interact with nearly every other system out there (which also changes all the time after release, just like the requirements to our app do). Thus the complexity of enterprise apps usually can be found in their interfaces to other systems (including humans) and in the way in which the app handles requirement changes over time, while the complexity of games is found in the simulation of their specific game world (including graphical representation of that world). It's like games are introverts while enterprise apps are extroverts.

If you want to explode the complexity of each of these "classes" of applications, just add the thing that makes the other class difficult to the requirement:

- A game that allows players in its world to interact tightly with an entirely different game world (like "really different", with entirely different rules and game play) which also evolves independently over time and on an uncontrollable schedule

- An Enterprise app that implements super complex processes with thousands of different moving parts, modeled out into extreme detail and without the possibility to resort to "this is handled organizationally"


Games are complex, but nothing compared to large enterprise systems, especially financial.

Typical (simplified) day for me: either download files or expect files to be delivered by start of business (typically 8am). Files need to be cleaned and uploaded. Changes need to be sent to untold numbers of other systems. 8:30am-3pm: various trading activity, new instruments being added, quants wanting new benchmarks, etc., more data loaded. 3pm: trading ends, mad rush to calc end-of-day returns, pricing, and positions. All of this has to be sent to external accountants and auditors within about an hour.

I'd imagine games largely don't have many external dependencies (maybe on a service such as Steam, Xbox Live, PS Network, etc.), and they don't have to process and/or deliver files from/to numerous sources for the game to work. They just need the service live. They don't have to wait 15 minutes to check Bloomberg SFTP to find out that their request failed for some untold reason.

I wouldn't be surprised to find AAA gaming and financial networking to be similarly, but differently complex for different and yet similar reasons. Both will want to reduce latency to the greatest extent possible, but for very different reasons.


Just a thought: games have the advantage of being constrained by a particular purpose and design, while enterprise architectures are more free to (d)evolve into a sprawling, incoherent mess of systems.

Can you imagine anyone at a game company wanting to replatform the hit-detection system onto Hadoop?


I've thought about that, but at the same time I wonder just how much of each game (from a single studio) is actually reusable. There's the networking/infra side for multiplayer games, but even the engines themselves seem to get significant rework for each new game. It's all insanely optimised C++ (which I imagine can be reused in many places until the hardware is updated) and then a huge amount of scripting to piece the dialogue, story, cutscenes together, etc. right?

I imagine it must be quite difficult switching careers between games and (particularly larger scale) web projects. Imagine defaulting to Ruby on Rails to handle your multiplayer infrastructure and trying to model the game state through REST.


I never worked with games infrastructure, but I can tell you firsthand about enterprise software (I am a manager & architect for a ~1000-server platform): I think it is easier for us. We have very few cases that are really time-sensitive (mostly in manufacturing execution systems); the responsiveness of the rest of the systems is quite relaxed, and nobody cares about a few seconds here and there.

At the same time, we are seen just as a cost, not as revenue enablers (no comment on this), so we are very limited in the money and technologies we can buy. It is also a very limited market where most companies buy, not build, so you are usually stuck with the 3-5 real offers for any area. I personally know 2 products we use that are considered industry best, but they are pretty bad; the rest of the similar products on the market are worse - about 10 years in the past.


I work on BMC Software’s ITSM module, and the most time-sensitive application is Service Level Management.

For obvious reasons, SLAs are an integral part of the IT services industry, and if I am not wrong, BMC Software’s SLM module is still written in C++; at least that’s what the documents say.

So far, I have never really seen any C++ stack trace in any of the log files, but some day I hope to. Furthermore, I have never even seen any C++-related file in any of the configuration files.


I always think about how game programming must be so much more hardcore than business stuff in general. Even the menu interfaces are all so custom, etc., while in web and business it takes us so much effort even for generic stuff, lol. But then again, in general the teams are much more specialized and probably better staffed in quantity.



Seconded. Eve's server architecture is amazing given the original constraints of the network... they managed to find a way to make a single worldwide server work and to make the game treat everyone fairly no matter what their ping was.

The architecture used has limitations (the biggest one is that fps-style real time isn't really possible) but it's aged very well.


I created an RPG game backend for a game called Path of Exile. Not an FPS but similar challenges. I don’t know how similar any of this is to other game backends, but I’ll supply a few details.

Our backend consists of a few somewhat large services that are broken up mostly around how they are sharded.

The biggest one is the account authority, which contains most of the account/character/item data and handles the vast majority of traffic.

We have 5 shards of that (sharding on account id), with 2 read-only replicas of each of those. All the read-only requests go to one replica, and the other replica is for redundancy.
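
Routing a request in a setup like that is conceptually trivial; in sketch form (illustrative only - the real key-to-shard mapping isn't something I'm describing here, modulo is just for the example):

    #include <cstdint>

    constexpr int kShards = 5;

    struct ShardEndpoints {
        int primary;       // writes and authoritative reads
        int read_replica;  // all read-only requests go here
        int standby;       // redundancy only; takes no traffic
    };

    // Illustrative only: account id -> shard by modulo, with replica ids
    // laid out at fixed offsets from the primary's id.
    ShardEndpoints RouteAccount(uint64_t account_id) {
        int shard = static_cast<int>(account_id % kShards);
        return ShardEndpoints{shard, kShards + shard, 2 * kShards + shard};
    }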

There are also other services like the party manager, ladder authority, instance manager, etc.

All of those shard on different things, which is why they are separate services.

The instance managers handle creating game instances which are the servers that the players actually play on.

We have a pile of servers, which we call instance servers, each of which runs an instance spawner. When it starts up, the instance spawner adds its capacity to an instance manager and creates a process called a prespawner. This prespawner loads the entire data set of the game and then waits for orders from the instance spawner.

When the instance spawner wants to create a new game instance, the prespawner runs fork(), and then the new process generates its random game terrain, which takes a few hundred milliseconds.

Because all the game resources are loaded before the fork, they are all already available in memory that is shared between all of the instances running on the machine. Therefore each instance only takes 5-20 MB of memory, which is mostly the generated terrain and monsters.

We typically run about 500 instances on the min-spec, cheap, single-processor Xeon servers we rent. This used to be around 1600 instances in the early days, but the game got more and more CPU-intensive as it got more hectic over the years.
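
The prespawner loop is conceptually just this (a simplified sketch; the helper functions are stand-ins for the real thing):

    #include <cstdint>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    void LoadEntireGameDataSet();                   // stand-in: loads all assets once
    uint64_t WaitForSpawnOrder();                   // stand-in: blocks on the instance spawner
    void GenerateRandomTerrainAndRun(uint64_t id);  // stand-in: the per-instance work

    int main() {
        LoadEntireGameDataSet();  // everything below the fork() shares these pages

        for (;;) {
            uint64_t instance_id = WaitForSpawnOrder();
            pid_t pid = fork();
            if (pid == 0) {
                // Child: assets stay shared copy-on-write; only the freshly
                // generated terrain and monsters (~5-20 MB) are private pages.
                GenerateRandomTerrainAndRun(instance_id);
                _exit(0);
            }
            while (waitpid(-1, nullptr, WNOHANG) > 0) {}  // reap finished instances
        }
    }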

All the instances connect to routers. There is one per instance machine; those connect to a few routers per datacenter, which in turn connect to a set of core routers that also have all the backend services connected to them.

These routers are important because they know where everything and everyone currently is.

The routers work sort of like internet routers do, but instead of IP addresses, you address your requests to logical entities or groups, which can move around and which the router network is tasked with keeping track of.

So for example, when you whisper someone, you are sending a message to Account:123, and it will find its way to whatever server currently has Account:123 on it right now. If you send a message in global chat to GlobalChat:1, it will be multicast through the network to all the servers which have currently registered an interest in hearing GlobalChat:1.

If you add someone to your friend list, what that means is that the server you are on will register interest in the multicast group AccountSession:123, which is a group that account 123 will multicast all its status updates to, like moving between zones or leveling up or whatever.

Parties, leagues, guilds, etc. - all of these things have multicast groups associated with them.
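
The core bookkeeping in a router like that is tiny; something like this sketch (illustrative, not the real code):

    #include <cstdint>
    #include <functional>
    #include <string>
    #include <unordered_map>
    #include <unordered_set>

    // Messages are addressed to logical names ("Account:123", "GlobalChat:1")
    // rather than hosts; the router tracks which connections currently host
    // or are interested in each name.
    using Address = std::string;
    using ConnectionId = uint64_t;
    using SendFn = std::function<void(ConnectionId, const std::string&)>;

    class Router {
    public:
        // A server registers where an entity lives, or interest in a group.
        void Register(const Address& addr, ConnectionId conn) {
            subscribers_[addr].insert(conn);
        }
        void Unregister(const Address& addr, ConnectionId conn) {
            subscribers_[addr].erase(conn);
        }

        // Whispers and multicast are the same operation; a unicast address
        // simply has a single subscriber (wherever that entity lives now).
        void Route(const Address& addr, const std::string& payload, const SendFn& send) {
            auto it = subscribers_.find(addr);
            if (it == subscribers_.end()) return;  // nobody there right now
            for (ConnectionId conn : it->second) send(conn, payload);
        }

    private:
        std::unordered_map<Address, std::unordered_set<ConnectionId>> subscribers_;
    };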

If you have any more questions then feel free to ask.


Tala Moana, exile!

Very interesting to see a GGG answer here! I admit to being very curious about the Path of Exile architecture, and your answer has barely whetted my appetite. I have some questions that might give me better clarity over the architecture, if you are up for answering them:

1. How is data replicated across regions? And how is trade across regions handled? Do the instance servers hand over character data to the account authority in the new region?

2. I remember speculation about some builds that caused extreme amounts of server-side compute and slowed things down; was this compute performed on the instance servers? Like poison/chain/monster damage calculations?

3. Is there any sort of automated detection of inconsistent game states done by the instance servers? Duping protections or some such?

4. What is the scaling plan like at GGG? Does the system have obvious bottlenecks that are known, or is it easy to scale for the near future?


What kind of authorisation mechanisms do you leverage to achieve intra-process/intra-server trust?

Have there been any protocols written from scratch because an out-of-the-box solution simply didn't do what was needed?

Do you leverage any kind of out-of-the-box solutions to coordinate services, e.g. Apache ZooKeeper?

Can you describe how nodes are added to or removed from your swarm based on demand? I presume this is non-linear?

Edit: I've been playing since pre-alpha. Glad it's still going strong.


How do you structure your database schema and handle things like upgrades or versioning? Also curious how the game instances interact with the database and at what frequency and granularity.

Thanks for taking the time to write up that overview, very cool to read!



Check out this talk from Andrew, who worked for Hulu and Riot Games. I enjoyed the session at QCon SF: https://www.infoq.com/presentations/microservices-pitfalls-l...


I believe Activision uses a studio dedicated to multiplayer servers called Demonware: https://demonware.net/

Maybe they have some information out there on the methods they use if you google for them (I haven't searched myself).


At Electronic Arts, the ancillary stuff like matchmaking, stats, etc. is C# microservices in AWS.


It varies quite a bit between studios and games. Some legacy games have C++ service backends. There is a lot of Java throughout the org. I am currently working in Node/TypeScript.


I hesitate to comment because I don't have AAA experience (thank god), but I made this simple platform from scratch that would be good enough for an AAA FPS: https://github.com/tinspin/fuse

Among the unique features are globally distributed hot-deploy of server-side code/resources and a globally distributed JSON database over HTTP. It's completely async/concurrent, and all threads work on all memory, which is not what most systems do if you look under the hood.

Only Java (begin the downvote without arguments) has a good enough non-blocking IO and memory model for networked concurrency to be workable on the server side, but most implementations are bloated.

The real discussion we should have is tick-based vs. event-based protocols; I'm 100% convinced we need to move away from tick-based protocols.
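
To make the distinction concrete, here's a minimal sketch of the two styles (hypothetical types and helpers throughout):

    #include <chrono>
    #include <thread>
    #include <vector>

    struct Snapshot { /* full or delta world state */ };
    struct Event    { /* "player 7 fired", "door 3 opened", ... */ };

    void BroadcastSnapshot(const Snapshot&);  // stand-ins for real network I/O
    void BroadcastEvent(const Event&);
    Snapshot SimulateOneStep();
    std::vector<Event> DrainPendingEvents();  // assume this blocks until there is work

    // Tick-based: state goes out at a fixed rate whether or not anything changed.
    void TickLoop() {
        using namespace std::chrono;
        for (;;) {
            auto next = steady_clock::now() + milliseconds(50);  // 20 Hz tick
            BroadcastSnapshot(SimulateOneStep());
            std::this_thread::sleep_until(next);
        }
    }

    // Event-based: nothing is sent until something actually happens, so an
    // idle scene costs ~zero bandwidth and bursts aren't quantized to a tick.
    void EventLoop() {
        for (;;) {
            for (const Event& e : DrainPendingEvents())
                BroadcastEvent(e);
        }
    }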


> Only Java (begin the down vote without arguments) has good enough non-blocking IO and memory model for networked concurrency to be workable on the server side

I think you should post some sources or more detailed arguments if you are going to make that claim.



