
Perf problems arise with Synapse if those 2000 rooms include massive ones with tens or hundreds of thousands of users (or other state events). The number of local users is fairly irrelevant.

If you grep the logs for state-res you will probably see that some room is consistently chewing resources (these days we explicitly log the worst offenders); the easiest bet is to ask your users to leave that room or use the shutdown API to remove it from the server.
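For reference, removing a room via the Admin API looks roughly like this (a sketch only - the exact endpoint and parameters depend on your Synapse version, so check the Admin API docs; the homeserver URL, admin token and room ID below are placeholders):

    # Rough sketch: block/delete a problem room via the Synapse Admin API.
    # Endpoint and body match recent Synapse versions - check the Admin API
    # docs for yours. The URL, token and room ID below are placeholders.
    import requests

    HOMESERVER = "https://matrix.example.com"
    ADMIN_TOKEN = "<access token of a server admin>"
    ROOM_ID = "!problemroom:example.com"

    resp = requests.post(
        f"{HOMESERVER}/_synapse/admin/v1/rooms/{ROOM_ID}/delete",
        headers={"Authorization": f"Bearer {ADMIN_TOKEN}"},
        json={"block": True, "purge": True},  # block rejoins and purge history from the DB
        timeout=300,
    )
    resp.raise_for_status()
    print(resp.json())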

Otherwise, it may be that there’s so much data flying around due to the busy rooms that the in-memory caches are blowing out. This makes everything slow down as the DB gets hammered, and unintuitively uses more RAM as slow requests stack up. The solution is to increase the cache factor, much as you would on an overloaded DB. We’re currently looking at autotuning caches so you don’t have to do this manually.
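In the meantime, bumping it is a small change in homeserver.yaml (or via the SYNAPSE_CACHE_FACTOR environment variable) - roughly like this, with the exact value being something to experiment with against your available RAM:

    # homeserver.yaml - sketch only; pick a value that fits your RAM
    caches:
      global_factor: 2.0   # multiplier applied to all cache sizes (default is 0.5)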

If it’s still slow, then there’s probably a bug or other weirdness (ancient Python?) - my personal server has about 5 users on a 4-core CPU, uses about 2 GB of RAM without a dedicated DB node, and is in thousands of rooms (including busy ones).

(Also, it hopefully goes without saying that all bets are off if you aren’t on the latest Synapse release - we are constantly landing improvements at the moment; e.g. the auth chain cover algorithm eliminates most of the known perf edge cases in state resolution.)




Thanks for the pointers - I’m on the latest release with Python 3.8, and roughly 2-5 of those rooms are on the larger end of the spectrum.

Sounds like I should tune some caches then - I have memory to spare if it turns out to make a difference.

BTW, I just noticed there is an option to add Redis - would that be a significant improvement compared to just using the in-process caching?


So you’ll want to try dialling up the overall cache factor a bit.

Redis is only useful if you split the server into multiple worker processes, which you shouldn’t need to do at that size (and even then, it doesn’t provide shared caching yet, although there’s a PR in flight for it - we currently just use Redis as a pubsub mechanism between the workers).

Highly recommend hooking up Prometheus and Grafana if you haven’t already, as it will likely show a smoking gun for whatever the failure mode is.
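If metrics aren’t already enabled, it’s just a metrics listener in homeserver.yaml plus a Prometheus scrape job pointed at it - roughly like this (the port and bind address here are arbitrary examples):

    # homeserver.yaml - sketch of exposing Synapse's Prometheus metrics
    enable_metrics: true
    listeners:
      # ...existing client/federation listeners...
      - port: 9000
        type: metrics
        bind_addresses: ['127.0.0.1']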

Are the logs stacking up with slow state-res warnings? Stuff like:

    2021-02-25 23:15:26,408 - synapse.state.metrics - 705 - DEBUG - None - 1 biggest rooms for state-res by CPU time: ['!YynUnYHpqlHuoTAjsp:matrix.org (34.6265s)']
    2021-02-25 23:15:26,411 - synapse.state.metrics - 705 - DEBUG - None - 1 biggest rooms for state-res by DB time: ['!YynUnYHpqlHuoTAjsp:matrix.org (148.6s)']



