
Splitting matrix.org into smaller homeservers wouldn't necessarily help (even if we had account migration), as you'd just end up switching client<->server traffic for server<->server traffic. As New Vector (the startup we set up to support Matrix) we're also working on providing homeservers-as-a-service, though, which should help.



That's of course true for the massive rooms, such as #matrix:matrix.org, but if there are enough 1:1 conversations to contribute significantly to the overload on matrix.org, splitting the servers up and 'randomly' assigning users to servers would decrease the load.

That's a number of 'ifs' there, though.


The problem is that the 1:1 conversations are a negligible weight relative to the massive 15,000-user rooms like #matrix:matrix.org - and spreading those 15,000 users over 15,000 servers rather than 1,000 is just going to swap a client<->server API overload problem for a server<->server API overload problem...
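
Back-of-the-envelope arithmetic (a toy sketch under the assumption that each event in a federated room has to be pushed from the originating server to every other participating homeserver, and every user still has to sync it):

    # Toy fan-out numbers for a 15,000-user room -- illustrative only, not
    # measured Synapse figures.
    users = 15_000
    for servers in (1, 1_000, 15_000):
        client_syncs = users            # every user still syncs each event
        federation_sends = servers - 1  # origin server pushes to each peer server
        print(f"{servers} servers: {client_syncs} client syncs, "
              f"{federation_sends} federation sends per event")

The total work per event doesn't shrink as you add servers; it just shifts from the client<->server API onto the server<->server API.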


Fair enough, that's also why I included: "but if there are enough 1:1 conversations to contribute significantly to the overload on matrix.org"

Matrix.org is still running, so no complaints there. You'll figure things out soon enough.


Do you know what exactly the bottleneck is on Synapse? Is it syncing everything to users, or the actual room processing? I wonder how much of an impact JSON (de)serialization has, assuming you don't cache serialized requests.


We do. The main bottleneck is merge resolution when unifying your copy of your room with everyone else’s. If the room starts to fragment due to netsplits or unreliable servers then this can get incredibly resource intensive. https://github.com/matrix-org/synapse/pull/3122 is the fix which switches the algorithm from roughly O(N) to O(1).

For context, a typical Synapse actually only uses around 300MB of RAM. It only spikes up to 1-2GB when trying to resolve state on big rooms like Matrix HQ, and then Python doesn't relinquish the RAM.
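
For a rough intuition of why fragmentation is so expensive (a toy sketch only - this is not Synapse's actual state resolution algorithm or data model), the costly part is re-resolving the full room state across every fork on every pass, which memoising on the set of forks avoids repeating:

    from functools import reduce

    def resolve_two(state_a, state_b):
        # Toy merge rule: for each (type, state_key) pair the deeper event wins.
        merged = dict(state_a)
        for key, event in state_b.items():
            if key not in merged or event["depth"] > merged[key]["depth"]:
                merged[key] = event
        return merged

    def resolve_naive(fork_states):
        # Re-resolves every fork's full state each time it is called: cost grows
        # with both the number of forks and the size of the room state.
        return reduce(resolve_two, fork_states, {})

    _resolution_cache = {}

    def resolve_cached(fork_ids, fork_states):
        # Memoise on the set of fork identifiers, so a fragmented room that keeps
        # presenting the same forks only pays for resolution once.
        key = frozenset(fork_ids)
        if key not in _resolution_cache:
            _resolution_cache[key] = resolve_naive(fork_states)
        return _resolution_cache[key]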

We do cache responses in JSON to avoid serialisation overheads.
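
The shape of that caching is roughly this (a minimal sketch with hypothetical names, not Synapse's actual code): keep the serialised bytes keyed on the response identity, so identical responses skip json.dumps() entirely.

    import json

    _serialised_cache = {}

    def get_response_bytes(cache_key, build_response):
        # Serialise once and reuse the encoded bytes for identical responses,
        # instead of re-running json.dumps() on every request.
        if cache_key not in _serialised_cache:
            _serialised_cache[cache_key] = json.dumps(build_response()).encode("utf-8")
        return _serialised_cache[cache_key]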


Am I reading this right that this is the algorithmic fix, but it still needs a concrete implementation - and then the main server should be back down to "vertically scalable"?

Without going and reading the doc (yet): does this relate to the following?

https://jneem.github.io/merging/

It would seem that a simpler, deterministic merge algo would be possible to parallelize - but I'm not sure if it's easy to match Matrix's idea of merges with what's discussed in that post/paper?


The algorithm is already being implemented in Synapse (but got delayed by dealing with some security bugs). There's also a Rust test jig for playing with merge resolution algorithms at https://github.com/erikjohnston/rust-matrix-state/.

It's not directly related to the Categorical Theory of Patches paper - the merge resolution here is much simpler than reasoning about VCS patches, although the idea of taking a formal mathematical approach is similar :)





