Message order in Matrix: right now, we are deliberately inconsistent (artificialworlds.net)
136 points by whereistimbo 9 days ago | 117 comments





I'm the author of the spec issue this blog post is based on: https://github.com/matrix-org/matrix-spec/issues/852

In my implementation for the Conduit Matrix server, the /sync order is used for everything. The timeline is just one list that grows on one end for incoming events and on the other end for backfilled events.
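
A minimal sketch of that single-list timeline, for illustration only. The `Timeline` class and its method names are hypothetical and are not taken from Conduit's code; the only idea borrowed from the comment is that live events grow one end of the list and backfilled events grow the other, so items already in the list never change their relative order.

```python
from collections import deque


class Timeline:
    """Hypothetical single-list timeline; names are illustrative only."""

    def __init__(self):
        self._events = deque()

    def on_sync(self, event_id):
        # New events from /sync grow the list at the "newest" end.
        self._events.append(event_id)

    def on_backfill(self, older_event_ids):
        # Assumption: a backfill page arrives newest-first, so each event
        # is pushed onto the "oldest" end in turn.
        for event_id in older_event_ids:
            self._events.appendleft(event_id)

    def ordered(self):
        return list(self._events)


timeline = Timeline()
timeline.on_sync("$c")
timeline.on_sync("$d")
timeline.on_backfill(["$b", "$a"])  # one page of older history
timeline.on_sync("$e")
print(timeline.ordered())
```

Because nothing is ever inserted in the middle, a client reading this list never sees an already displayed message move.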

I think it's important that the message order does not change, because that's very difficult to communicate to the user.


A few years ago, I started writing a Matrix client library for Kotlin. At one point, I had to make an API decision based on how messages are ordered. When I found this issue, I subscribed to it and planned on continuing with my library when the spec was clarified. Given how foundational this ambiguity in the spec is, I thought it wouldn't take too long.

Well.


One idea of mine was to continue when Matrix 2.0 would be stable. Might still have some time.

Oh that’s neat (TIL), am also working on a HS that also does this [1].

Not only does it feel like the most correct (I don’t think there is a perfect) behaviour for the user but also makes implementation much simpler. Synapse has a LOT of ordering foo and magic in the code I still don’t fully understand and I’ve gone fairly deep into synapse at times for work.

[1] https://github.com/Beeper/babbleserv


This is something that many chat apps get wrong and I'm not sure this article is moving in the right direction. The UX is fairly clear in my mind:

1. All up-to-date clients should be displaying the same message order.

2. A single client should not send messages in the wrong order.

Yes a client may be out of date and therefore show something different, but once it becomes up to date it should be showing the same state even if that means amending history. Why? Because the humans reading it will be confused otherwise! An app getting more data is something we intuitively understand, but if my client shows something and yours shows something else, we will conclude different meanings from it.

Additionally there are some clients that treat each message input by the user as a retriable thing in isolation, which is also clearly incorrect. If I send two messages and the first fails to go through, I almost certainly don't want to retry the second until the first has gone through, otherwise my client has literally sent out of order messages! I use Beeper for chat and this is one of the most frustrating things it does.
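
One way to picture the alternative is a send queue that only ever retries its head. This is a rough sketch under stated assumptions: `Outbox` and `flaky_send` are hypothetical names, and the transport is simulated by a function that returns `True` on success.

```python
from collections import deque


class Outbox:
    """Hypothetical client-side send queue that preserves send order."""

    def __init__(self, send):
        self._send = send        # callable(message) -> bool, True on success
        self._pending = deque()

    def enqueue(self, message):
        self._pending.append(message)

    def flush(self):
        """Send from the head and stop at the first failure, so a later
        message is never delivered ahead of an earlier one."""
        delivered = []
        while self._pending:
            if not self._send(self._pending[0]):
                break
            delivered.append(self._pending.popleft())
        return delivered


attempts = {}


def flaky_send(message):
    # Simulated transport: only the first attempt at "first" fails.
    attempts[message] = attempts.get(message, 0) + 1
    return not (message == "first" and attempts[message] == 1)


outbox = Outbox(flaky_send)
outbox.enqueue("first")
outbox.enqueue("second")
first_try = outbox.flush()   # "first" fails, so "second" is held back
second_try = outbox.flush()  # the retry succeeds and both go out in order
```

The trade-off is head-of-line blocking: one stuck message delays everything behind it until it succeeds or the user removes it.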


Agreed, absolutely. I was surprised by the author's aside that mentioned a colleague that disagreed, claiming that it's ok or even useful (what?) that two clients (or even the same client in different situations) could show different message orderings... and acknowledging that this point is debatable.

It's not debatable! There is, actually, in reality, one true message ordering (at this time I think we can safely ignore relativistic effects), and that message ordering should be the one that is always displayed. Our technology and the nature of distributed systems may make it impossible to always faithfully determine what the true message ordering is, but at the very least, the implementation should decide on a single message ordering, and always present that same message ordering. (This does mean that clients connected to different homeservers might see different orderings; this is unfortunate but probably unavoidable. But clients connected to the same homeserver should all see the same ordering.)

Anything else is just terrible, awful, horrible UX, that will ultimately confuse users. And it will also reduce user faith in the system as a whole: if they notice the inconsistent ordering, they will assume the system is buggy and unreliable.

At the risk of sounding way too absolutist: this is not debatable, and anyone who thinks it is, is wrong.


> (This does mean that clients connected to different homeservers might see different orderings; this is unfortunate but probably unavoidable. But clients connected to the same homeserver should all see the same ordering.)

I don't think it's unavoidable at all, this is exactly the kind of problem CRDTs solve beautifully. If you model your chat as a CRDT (basically, each message has a Lamport timestamp on it, and you have some sensible way to resolve ties deterministically), then assuming all peers have gotten all messages, the ordering should be perfectly consistent. What may happen is that you type a message, press "enter", and then some message might pop up before it once everyone finally syncs, but I think that is acceptable UX; I see versions of this in things like Slack and Discord frequently.
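
A small sketch of the Lamport-timestamp idea, under the assumption that ties are broken by sender name. The `Peer` and `Message` classes are hypothetical and this is not how Matrix itself orders events; it only shows that two peers holding the same set of messages sort them identically, whatever order the messages arrived in.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    lamport: int   # logical clock value when the message was created
    sender: str    # used only as a deterministic tie-breaker
    body: str


class Peer:
    def __init__(self, name):
        self.name = name
        self.clock = 0
        self.messages = set()

    def send(self, body):
        self.clock += 1
        msg = Message(self.clock, self.name, body)
        self.messages.add(msg)
        return msg

    def receive(self, msg):
        # Lamport rule: move the local clock past any timestamp seen.
        self.clock = max(self.clock, msg.lamport) + 1
        self.messages.add(msg)

    def timeline(self):
        # Total order: logical time first, then sender name to break ties.
        ordered = sorted(self.messages, key=lambda m: (m.lamport, m.sender))
        return [m.body for m in ordered]


alice, bob = Peer("alice"), Peer("bob")
a1 = alice.send("hi")            # timestamp 1 from alice
b1 = bob.send("hello")           # timestamp 1 from bob, concurrent with a1
bob.receive(a1)                  # bob's clock moves to 2
b2 = bob.send("how are you?")    # timestamp 3
alice.receive(b2)                # delivered to alice out of order
alice.receive(b1)
print(alice.timeline())
print(bob.timeline())
```

Note that alice's own "hi" stays first here only because "alice" sorts before "bob"; with a different tie-break, the concurrent "hello" would pop up above it, which is the reordering described above.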

All peers having the same message order assuming they all have received the same messages is absolutely required for a chat application in 2024, distributed or not.


> clients connected to the same homeserver should all see the same ordering

Ideally, yes. In the real world you get to deal with such inconveniences as unreliable transport, slow networks, server-to-server communications, eventual consistency, routing glitches, reconnections, clock skew, queues, concurrency, retries, and always, ALWAYS the infuriatingly slow speed of light. Oh yes, also multiple client implementations. For the record, many years ago I was involved in writing both a chat server and a client. (Albeit those were two different projects.)

From a pure UX standpoint, you want the client to always show any messages it has sent but has not received back. Even for a single server and two clients, all synchronised to the same clock, you can get ordering conflicts. Let's say that you are sending a message at a fixed two-second interval, and the other client is sending messages at non-fixed, power-law-distributed intervals. That's your happy path.

Now consider the same with dozens, or hundreds of clients across hundreds of different networks, each with their own debatable quality.

You want to see the messages you've sent, so they need to be visible on your screen. Having your own messages disappear into the void and only appear once they have been sent back is terrible UX. So you keep a local order and interleave received messages as they come in. But once you receive the message back from the server, you obviously want to reorder the known quantities. With two clients you will see the occasional "jump" where one of your messages is moved to its canonical position. With hundreds of clients each user will see those jumps constantly - and with sufficient volume a decent fraction of their own sent messages can "disappear" at any time from their screen as they are reordered and don't fit on the screen.
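
The "jump" can be reproduced with a very small model. This is a sketch under assumptions: the `ChatView` class is hypothetical, the server is assumed to deliver events in its canonical order, and each locally sent message is assumed to carry a transaction id that the server echoes back.

```python
class ChatView:
    """Hypothetical client view that shows local echoes immediately."""

    def __init__(self):
        self.items = []  # (txn_id, body, confirmed) in display order

    def send_local(self, txn_id, body):
        # Show our own message right away, before the server has seen it.
        self.items.append((txn_id, body, False))

    def on_server_event(self, body, txn_id=None):
        # If this event is the echo of one of our own messages, drop the
        # unconfirmed copy so the message moves to the server's position.
        if txn_id is not None:
            self.items = [i for i in self.items if i[2] or i[0] != txn_id]
        self.items.append((txn_id, body, True))

    def render(self):
        return [body if ok else body + " (sending)" for _, body, ok in self.items]


view = ChatView()
view.send_local("t1", "mine")
view.on_server_event("theirs")              # the server saw "theirs" first
before = view.render()
view.on_server_event("mine", txn_id="t1")   # our echo arrives afterwards
after = view.render()
print(before)
print(after)
```

Between `before` and `after`, "mine" moves from above "theirs" to below it. That is a single jump for one user and one message; the comment's point is how this scales with many clients.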

Now add lots of bad networks and a latency floor in the hundreds of milliseconds. Network connections and routes fluctuate constantly, so even messages sent by the same client less than two seconds apart can arrive in a different order at the server. (The client reconnected between the two, and the message sent over the first connection arrives several seconds later than the other one.) The user is confused, because the server is very clearly showing them their own messages in the wrong order.

For one inconvenienced user the server being wrong occasionally is mildly annoying. But when that can happen to any number of users, concurrently, at any time, the overall effect is outright infuriating.

> this is not debatable, and anyone who thinks it is, is wrong

Your server has a known order it sent the messages out. Any disagreement means the client must be wrong.

Each of the clients connected to your server has a known order in which they sent their messages out. Any disagreement means the server must be wrong.


> Yes a client may be out of date and therefore show something different, but once it becomes up to date it should be showing the same state even if that means amending history. Why? Because the humans reading it will be confused otherwise! An app getting more data is something we intuitively understand, but if my client shows something and yours shows something else, we will conclude different meanings from it.

That's interesting because I have the complete opposite take and would hard disagree with this. I intuitively understand that if we both write messages at the same time, we will see them in different order. Snail mail has worked this way for centuries, and I very much prefer this to an app silently altering the content as time goes. It is confusing when it happens under my eyes (something moved at the top of the screen while I was reading the bottom, what was it?) and easily leads to missed messages especially in group conversations (my buddy sent a message with a poor connection at 11am, it is retried and sent at 2pm and appears before the lengthy discussion others had at noon).


Snail mail has never claimed that a history of all messages, with that history having a current state, exists. If you send a paper letter, you don't have it yourself anymore. You might keep a copy, but that's a _copy_, not the letter you sent.

Messenger apps claim that such a history exists by showing you, well, that history. In the same way, messengers claim that a message order exists, by showing you the messages in that order. If something exists, then it is independent of the viewer. So the assumption that the message order is the same for all viewers is founded in how two people look at physical objects.


Messenger apps don't claim that this history should be global and consistent. The order in which messages were sent and received by my device is a perfectly fine (and I'd say intuitive) history. It is the order people (and their records, if they have some) would have had in mind in the old time.

I take a different conclusion from the way people look at physical objects: since your device (or even my other device) is a different physical object than my device, I'd be wholly unsurprised to find a different order there.


> Messenger apps don't claim that this history should be global and consistent.

The fact that we're talking about multiple people looking at the same chat - the fact that we do conceptualise it as "the same chat" and "the history" - implies that we think of it as a single thing. And I think messenger apps generally nudge us that way - e.g. setting the name of the chat usually sets it for everyone.

> It is the order people (and their records, if they have some) would have had in mind in the old time.

I don't think it is. If I pull my correspondence with person X out of my drawer or file, the only dates I have to order them by are the dates written on the letters - which are the dates they and I (if I keep carbons of the ones I send) wrote them on, not the dates I received them. If they sent me a postcard while on holiday and then a letter after returning that arrived sooner, I'll read them in one order on receipt and in a different order when looking back. Likewise if I have a memo of a phone call with them, that may be from before I received a letter that is nevertheless dated earlier.


> I think messenger apps generally nudge us that way - e.g. setting the name of the chat usually sets it for everyone.

That's a good point - maybe it's actually email that warped my mind.

> I'll read them in one order on receipt and in a different order when looking back

Also a good point, I was thinking more about business communication where the date the letter is received matters. Thinking back on it, I think the main difference is that the messenger apps might happily reorder messages before (or while) I read them. And if only one order is to be available, the one of most use to me for an instant messaging app is the one I received the messages in, but I get how for other use cases it would be different.


Well, this really depends on the protocol and architecture of the system.

If it's a system where the server is merely store-and-forward, where it forgets its knowledge of messages after the recipients have received them, then sure, your stance is reasonable. The client will decide on message ordering; it can either just display in the order received from the server, or use any timestamps stored in the messages to order them (including possibly reordering if messages arrive out of order). The client has no other source of truth it can draw from, and so different clients may order things differently. (Even in this case, though, for many systems like this I would expect the server to timestamp the messages, and for all clients to honor those timestamps, so in practice everyone should see the same ordering.)

But Matrix is not such a system: each homeserver is the system of record for what messages have been received, and what order they go in. In this case, I would expect all users to see the same ordering, assuming all clients are able to query and receive the full history from the server.


> I intuitively understand that if we both write messages at the same time, we will see them in different order.

I think you are thinking like a distributed systems designer. I would assume that if you asked 10 "random Americans", 9 of them would assume that someone managed to send their message first and would be surprised if their phone and their friend's phone showed them messages in different orders.


Many people have had the experience of "I tried to call you, but you were calling me!"; I don't imagine they'd be surprised if "something weird" happened when you both tried to send a message at the "same" time.

I would assume 9 of them would not care either way.

I don't think you're arguing the same point. I agree with you that when two people write a message at nearly the same time, they may (initially) see those messages in a different order. But the server should decide what the ordering is, and inform the clients, which should update their view of the world.

The ordering the server decides may not be "correct" (for whatever definition of correct matters to you), but what is most important in this situation is consistency.


You need to know what ordering the other user is seeing, otherwise dangerous misunderstandings could result, for example you reply "yes" but it's in reply to a different message than the other side is seeing.

If I see that another message has arrived and my message could be misunderstood, I can correct that by sending another message. But if I don't know what ordering the other side is seeing then I don't know if my message is ambiguous or not.

The only way to achieve that is to show a consistent ordering to both participants (or to force every message to be in reply to another message, but that's too nerdy).


I don't use these apps so maybe my solution wouldn't work, but after reading the article, it seems that having a visual indicator of messages that are new but in the past would be a reasonable solution.

Especially if there were simple controls to flip into a mode that minimizes the ones already seen (collapsed and grey for example) while highlighting all of the ones inserted in the past.

Or, if there are many messages already seen and few inserted into history, show the inserted ones with a small sampling of the already seen (so the user can anchor to already familiar data in the timeline) along with a "72 messages hidden that were previously seen" type of thing in between the inserted ones to condense the view.
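
As a rough illustration of that indicator, a client could compare the ids it has already shown with the current ordering and flag anything new that landed above a message the user has already seen. The function name `mark_late_inserts` and the id format are hypothetical.

```python
def mark_late_inserts(previous_ids, current_ids):
    """Return (id, inserted_in_past) pairs in the current display order.

    A message counts as inserted in the past when it was not shown before
    and it now sits above at least one message that was shown before.
    """
    seen = set(previous_ids)
    last_seen_index = max(
        (i for i, msg_id in enumerate(current_ids) if msg_id in seen),
        default=-1,
    )
    return [
        (msg_id, msg_id not in seen and i < last_seen_index)
        for i, msg_id in enumerate(current_ids)
    ]


flags = mark_late_inserts(["a", "b", "c"], ["a", "x", "b", "c", "d"])
print(flags)
```

Here "x" is flagged because it appeared between messages the user had already read, while "d" is an ordinary new message at the end and is not flagged.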


> Additionally there are some clients that treat each message input by the user as a retriable thing in isolation, which is also clearly incorrect. If I send two messages and the first fails to go through, I almost certainly don't want to retry the second until the first has gone through, otherwise my client has literally sent out of order messages!

I don't think that's clearly incorrect. If you sent two messages you presumably want them to be two messages and they should be retried as such. If what you wanted to send was a single, multi-line message, surely you would have just done that?


Human communications are more nuanced than DB transactions. If I forget to mention something important I send a new message rather than editing the already sent one, to make sure I catch their attention. Edits can go unnoticed. Imagine this scenario:

[12:00 / sent] Sell the house.

[12:05 / failed] Please feed the baby.

[12:06 / sent] Oh and the cat too.

Now the receiver's going to sell my cat [example inspired by The Art of Multiprocessor Programming].


They sure are, but I hate when Slack combines my messages when I wanted two separate messages on purpose. If I send two messages, it's because I did it on purpose.

Not at all. The separation of messages is part of communication, not me trying to game a network protocol. Maybe it's to emphasise a point, maybe it's to time a joke, maybe it's to send a photo and a text message separately.

I think you are perhaps not aware of how people use messaging apps in the real world. Many people (myself included, sometimes) will break sentences or sentence fragments into different messages. If the messages are displayed out of order on the recipient's side, it would be pretty hard to understand.

And even in the case where people do tend to send one complete thought per message, it still matters: like maybe I send a message, and then have an extra thought, and send a follow-up message that clarifies my first message. If they are displayed out of order, that will be confusing.

Even if two messages are completely unrelated and completely separate thoughts (honestly this feels like a much less common case than the alternative), messages just should be displayed in the order they were sent, because that's what reflects reality best.


> If what you wanted to send was a single, multi-line message, surely you would have just done that?

No. danpalmer is correct; the break between messages is an integral part of the communication.


I break up my messages, as do many people.

Do you do so with the expectation that they might arrive out of order, or one fails?

Out of order no, failing and having to manually re-send which makes it out of order is acceptable.

I find that unacceptable and frustrating, personally. If a message fails to send, I want the client to hold back any later messages until the failed message is resolved somehow. It should auto-retry and (hopefully) eventually succeed, or I can manually delete it and "release" the following messages.

I do so with the expectation that they should arrive in order if no one fails, but apparently it is "debatable" if this is a reasonable expectation

How far back should you be able to amend history? What if a malicious client adds messages to a conversation that happened in the past? Imagine for example I'm at work and notice a critical mistake that I missed, and so I retroactively add messages to the old conversation to make it look like I'm not liable, should that be permitted by the protocol?

> How far back should you be able to amend history? What if a malicious client adds messages to a conversation that happened in the past? Imagine for example I'm at work and notice a critical mistake that I missed, and so I retroactively add messages to the old conversation to make it look like I'm not liable, should that be permitted by the protocol?

I believe that's impossible? At least if you design it correctly.

For ordering/interleaving purposes, what matters isn't the time you claim to send the message, it's the time the message is received by the server. If you want, you can display the claimed send timestamp beside the message (and prominently highlight it if it is e.g. out of order, or with a long delay, etc.), but that is irrelevant to the ordering.

The point here is that there should be a single consistent order on the server, and that's what all clients ought to be displaying. Any messages not yet acknowledged by the server should be displayed differently so that users are aware they haven't been seen yet, and any messages that arrive before those are sent would obviously get inserted above those.
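
A sketch of that design for the single-server case only. The `RoomLog` class and its field names are hypothetical; the point is that position comes from a server-assigned sequence number, while the client's claimed timestamp is kept purely for display and for flagging.

```python
import itertools


class RoomLog:
    """Hypothetical single-server room log ordered by arrival, not by claimed time."""

    def __init__(self):
        self._next_seq = itertools.count(1)
        self.entries = []

    def receive(self, sender, body, claimed_ts):
        # Latest claimed time logged so far (or this one, if the log is empty).
        latest_claim = max((e["claimed_ts"] for e in self.entries), default=claimed_ts)
        entry = {
            "seq": next(self._next_seq),   # decides the position in the room
            "sender": sender,
            "body": body,
            "claimed_ts": claimed_ts,      # stored for display only
            # Flag claims that point earlier than something already logged,
            # so a client can highlight them.
            "backdated": claimed_ts < latest_claim,
        }
        self.entries.append(entry)
        return entry


log = RoomLog()
log.receive("alice", "ship it", claimed_ts=1000)
log.receive("bob", "shipped", claimed_ts=1010)
late = log.receive("alice", "wait, don't ship", claimed_ts=900)  # backdated claim
print([e["body"] for e in log.entries])
```

The backdated message still lands last and is marked, so it cannot be slipped into the earlier conversation. This says nothing about what happens with several servers that each have their own arrival order.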


> what matters isn't the time you claim to send the message, it's the time the message is received by the server

There's no "the" server here. If you use the time the message is received by the server, you'll get different views on different servers, and you may see messages from months ago appearing as new, if connectivity breaks down and is later restored.


> There's no "the" server here.

Can't you assign every conversation to a single authoritative server for handling?

Also, how large of a time skew are you imagining would exist between different servers? That stuff ought to be accurate to at least milliseconds if not micro...


> Can't you assign every conversation to a single authoritative server for handling?

The whole point of Matrix is to be decentralised. In particular people should be able to keep talking when on different sides of a netsplit, by design.

> Also, how large of a time skew are you imagining would exist between different servers? That stuff ought to be accurate to at least milliseconds if not micro...

The question isn't how much time skew there can be between server A and server B, it's how long they can be cut off from each other over the network, which could be hours at least. (And even when things are working well, a normal ping is a few hundred ms, which is enough to change the order of messages)


> The whole point of Matrix is to be decentralised. In particular people should be able to keep talking when on different sides of a netsplit, by design.

OK but I still don't see the problem. Even with a fully decentralized system where the servers are just pure relays with no authority, you have two options:

1. Display messages in the order in which they claim to have been sent, or

2. Display messages in the order of arrival

Case #2 is the obvious/uninteresting one, there's nothing to say about it.

Case #1 is what people are saying is so impossible to achieve a global order for, but really, what's the big deal? If a client claims to have sent a message at an unusual time (say, > 10 seconds in the past, or after the app was already quit, or whatever criteria you want to set), then just insert it at that point in the conversation, and visually indicate to the user the discrepancy. And clock skews won't really be much of a problem because messages can easily indicate prior messages in the conversation, so that a mere clock skew doesn't insert them before preceding messages.

What's so hard to make user-friendly/intuitive here?
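
For what it's worth, case #1 with the "indicate prior messages" safeguard can be sketched as a timestamp-ordered topological sort. Everything here is an assumption for illustration: each message references at most one prior message, `order_messages` is a hypothetical helper, and ties are broken by message id.

```python
import heapq


def order_messages(messages):
    """messages maps id -> (claimed_ts, prev_id or None); returns ordered ids.

    Messages are sorted by claimed time, except that a message is never
    placed before the message it names as its predecessor.
    """
    children = {}
    ready = []
    for msg_id, (ts, prev) in messages.items():
        if prev is None or prev not in messages:
            heapq.heappush(ready, (ts, msg_id))
        else:
            children.setdefault(prev, []).append(msg_id)

    ordered = []
    while ready:
        _, msg_id = heapq.heappop(ready)
        ordered.append(msg_id)
        # A message only becomes eligible once its predecessor is placed.
        for child in children.get(msg_id, []):
            heapq.heappush(ready, (messages[child][0], child))
    return ordered


room = {
    "a": (100, None),
    "b": (105, "a"),
    "c": (90, "b"),    # skewed clock claims an early time, but follows "b"
    "d": (103, None),
}
print(order_messages(room))
```

Message "c" claims the earliest time of all, yet it still sorts after "b" because it references it, so plain clock skew cannot push a reply above the message it replies to.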


> If a client claims to have sent a message at an unusual time (say, > 10 seconds in the past, or after the app was already quit, or whatever criteria you want to set), then just insert it at that point in the conversation, and visually indicate to the user the discrepancy.

I don't know that OP would be happy with that, and certainly someone would need to a) actually design the UI for it b) figure out what information the client needs from the server to implement that, and whether it's possible for the server to provide that information.

I think you're probably right, FWIW, but someone needs to actually do the legwork of designing and implementing what you're suggesting rather than just handwaving it.


Why can't we have a sent time and a receipt time?

The time the sender claims the client sent the message can be appended to the message itself.

Let it arrive 500ms or 2 seconds later.

If the skew between the sending time and the receiving time is acceptable, we just accept the sending time.

Edit: what this could do is this: when the sender sent the message, they were aware of x earlier messages, and with the clocks in sync for existing messages, their message, even if received 2 seconds later, would be put in the original order the sender intended.
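
A tiny sketch of that rule. The function name and the 2000 ms default are assumptions taken from the numbers above; a claimed time in the future is treated as out of tolerance.

```python
def effective_timestamp(claimed_ms, received_ms, max_skew_ms=2000):
    """Use the sender's claimed time only if it is at most max_skew_ms
    before the receive time; otherwise fall back to the receive time."""
    if 0 <= received_ms - claimed_ms <= max_skew_ms:
        return claimed_ms
    return received_ms


print(effective_timestamp(10_000, 10_500))   # within tolerance
print(effective_timestamp(10_000, 15_000))   # too old, receive time wins
print(effective_timestamp(20_000, 15_000))   # claimed in the future
```

Picking the tolerance is the hard part: too small and users on slow links get reordered anyway, too large and backdating becomes practical.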


Aside from the concerns with decentralized servers that the other poster mentioned, this has the disadvantage that your messages are going to get constantly reordered to not match the intended flow of the conversation when you have poor connectivity, which is a bad user experience

Wasn't the whole point here that the messages wouldn't get reordered? There would be one definite order that everyone would see. Again, if the message isn't timestamped by the server, it would need to appear visually different, so that everyone knows about this. And nobody says the server has to accept messages with arbitrary delays either.

My point is that some limited reordering maybe should be allowed, but not too much. That is to say, the problem isn't as simple as just doing it one way or another way. Every approach has some disadvantages.

I know you were trying to reach that conclusion, but my point was that the design I suggested neither seemed to have the problem you suggested, nor is reordering a necessary outcome, from what I can tell.

You can amend displayed order for humans (what matters for 99% of usage), while still allowing anyone interested to see when the message actually arrived at the homeserver (making the suggested gambit impractical).

I don't think this is really a problem, at least in the case of client->homeserver connections. The homeserver should not be trusting the client's sent timestamp. The homeserver should consider the message sent at the time it receives it, and the client should know this and update the sent timestamp displayed to the user when it is finally able to connect to the homeserver, and the homeserver acknowledges receipt of the message.

The bigger problem is how to handle homeserver<->homeserver comms. My initial feeling is that the homeserver where a destination room is hosted (let's call this one "A") should have the final say, and if there are people on another homeserver ("B") that have joined the room, and are chatting while there's a break in connectivity between the two homeservers, then A should just append all the messages from B to the end of the record (with correspondingly "later" timestamps) to the "official" record, when B is able to communicate with A again.

But this feels messy too; presumably all of those new messages (a conversation that may have been going on for tens of minutes or hours) would be smooshed in to have their timestamps all appear nearly at the same time? No, that's not great either.

Or perhaps B just shouldn't accept messages for that room while it can't communicate with A? That doesn't seem great either.


Then instead imagine this: the user really is innocent and just happened to coincidentally send the message right after the start of a long period of poor connectivity (like a flight, or a road trip, etc). If you just allow it to go through after an arbitrary delay, with only a log of the received time for liability purposes, then the user wouldn't have any indication of this scenario occurring.

Wouldn't it be better in that case to show an error so that they can make sure the situation is addressed appropriately?


You have when the client claims it was sent (so where it goes in the displayed history) and can see when it was received. What else could you possibly do?

For example, you could reject the message and show the user an error but only if there's a discrepancy of >X minutes. But how much discrepancy should be allowed? I don't know, I only mean to show why I think the solution isn't as simple as it appears

> you could reject the message and show the user an error but only if there's a discrepancy of >X minutes

No you can't, not in a federated and decentralised system like this.

The sender can wait for a read receipt from a given receiver user, if the receiver is willing to make those public. But if the message left client A and didn't arrive at client B, there's no objective fact of the matter about whether the message "was sent" or not.


Seems like a design deficiency of Matrix then. When IRC federation breaks, everyone can see it, except for the rare people who aren't in a shared channel with anyone on the other side of the break.

> When IRC federation breaks, everyone can see it, except for the rare people who aren't in a shared channel with anyone on the other side of the break.

Well sure, you could see that something was going on, if you were paying attention. But how does that solve the problem? Does your IRC client stop you from sending messages if it detects a netsplit?


I don't think IRC is a good analogy here because there's no "message resync" that happens when the netsplit is resolved. If there are two people on opposite sides of a split, and they both send messages to a channel while things are still split, they will not see the other's messages when the split is over.

In the Matrix case, if a homeserver disappears for a while, it will sync any missed messages when it comes back online.


Which is a design deficiency when those messages are 6 months old, and are spam, from people who got banned 5 months ago.

I think so, but I'm not sure that OP would agree, given that they apparently want to see the exact same scrollback on all their devices.

> Wouldn't it be better in that case to show an error so that they can make sure the situation is addressed appropriately?

If you want a single centralised server then you can set things up that way. Presumably if you're using a setup with multiple servers, and took one of the servers on the flight/road trip, you wanted the people on the flight/road trip to be able to keep talking to each other over that server, even though that server is disconnected from the one in the office.


> How far back should you be able to amend history?

If user U1 sends a message M1 at time T1, then U1 must be able to modify/delete that message M1, in some reasonable sense, at any conceivable time from T1 forwards.

Any protocol that doesn't support some reasonable form of message modification/deletion in this sense, is a toy protocol, and will never be widely adopted.


I'm not talking about message editing but rather posting new messages with a backdated time.

I suppose every message has a few timestamps, including

- The timestamp that the user specified as part of the message

- The timestamp of the server that the user submitted the message to, directly

- The timestamp of the server that first received that message

- Any additional timestamp(s) of additional server(s) that received that message

The user can I guess backdate a time, but that would apply only to the first thing, and none of the others?


The first one is the only one that can accurately describe where the user intended for the message to land in the conversation. All the others are subject to network delays, so if you order messages by anything other than the first one, you are going to get a bad experience if you try to participate in a busy conversation with a poor connection.

The first one is entirely un-trustable, because the user-submitted timestamp can be at any arbitrary point in time, from t=0 to t=infinity. So the receiving system can use that timestamp as an important bit of signal, but it can't really treat it as strictly authoritative, at least not if it expects to maintain a coherent set of events overall.

Exactly, that's why I'm saying it's a more challenging problem than it appears and there's no one solution that always gives the best experience in every case. I personally think a hybrid approach of allowing some limited discrepancy between user and server timestamps is probably the best you can do.

This shouldn't be an issue for systems where a server mediates communication: the server should be timestamping messages, not the clients.

This could indeed be a potential problem for a decentralized system, or one where the server for some reason cannot (or cannot be trusted to) timestamp messages. In that case, I think the best behavior for a client would be to always display messages in the order they've arrived, regardless of any timestamp provided by the sender.

But this problem shouldn't exist for a system like Matrix. Matrix is (somewhat) decentralized, but each homeserver can still decide on the message ordering it will present to its own clients.


If you don't allow the client to specify the time they sent the messages, then anyone who has a poor connection is going to be subject to an annoying behaviour where their messages are constantly going out of the intended order during busy conversations.

Obv it depends, but one way to "solve" this is to show an edit history, or at least the latest edit timestamp along with some visual indicator that the message was edited recently

I'm not talking about edits, I'm talking about sending new messages which are backdated to appear as part of an older conversation

icic, yeah that definitely shouldn't be allowed

Matrix already allows editing weeks-old messages.

But there's a flag which indicates they've been edited and you can see the edit history, right? So that's not useful for this scenario.

I'm throwing some shade here, but this reeks of backend engineers not caring about UX.

this reeks of backend engineers not caring about UX designers who don't understand the problem while the UI designers who do understand are barred from attending meetings for bad behavior. I'm not throwing shade.

I don't agree. There's no technical reason why the different API endpoints can't return the same ordering. The current top comment here[0] is from someone who has implemented this (IMO) correctly in a different homeserver implementation.

[0] https://news.ycombinator.com/item?id=42325737


Having dealt with this problem at work for several years now, I feel the pain of keeping different clients in sync - it's extremely difficult. Not sure if it's possible in Matrix, but consider having a message ID that increments by one on every message in a room. That lets the client know pretty quickly if there's a gap or a misordering.
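To illustrate what a per-room counter buys the client, here is a minimal sketch. It assumes a single authority hands out consecutive integers per room, which (as far as I can tell) Matrix does not provide; both function names are hypothetical.

```python
def find_gaps(seen_ids):
    """Return the sequence numbers missing between the lowest and highest seen."""
    if not seen_ids:
        return []
    present = set(seen_ids)
    return [n for n in range(min(present), max(present) + 1) if n not in present]


def is_misordered(ids_in_arrival_order):
    """True if any message arrived before one with a smaller sequence number."""
    pairs = zip(ids_in_arrival_order, ids_in_arrival_order[1:])
    return any(earlier > later for earlier, later in pairs)
```

With IDs like `[1, 2, 4, 5, 7]` the client knows right away that 3 and 6 are missing and can ask for them.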

Not really getting this point though:

  The /sync API returns events in an order "according to the arrival time of the event on the homeserver".

  The spec for /messages says it returns events "in chronological order. (The exact definition of chronological is dependent on the server implementation.)".

Why would those two return different results? When does the chronological order of two messages differ from the arrival time of the event on the homeserver?

What I think you're missing is that Matrix runs as a distributed system. There's no central authority to assign IDs to messages, and it's possible for a single group chat to run in a split-brain configuration if two homeservers lose connectivity to each other. When those homeservers reconnect, users connected to each one will see messages appear "in the past" which were sent by users on the other side of the split.

Perhaps I'm wrong about how Matrix works, but my understanding was that at least public rooms still had a "primary" homeserver, like for example I can connect to #debian:matrix.org from any number of federated servers but matrix.org is still where that room "lives".

If that understanding is correct, then IMO the answer is simply that the canonical timeline is what that server says it is. Poorly connected users or those on other servers experiencing issues or delays with federation may temporarily see a different sequence of events but once everyone's had a chance to sync back up the state should generally be what the primary server for the room saw it as.

Perhaps there should be some sort of flag for "this message has been reordered during a resync" that clients which initially had a different state due to whatever reason could store to make it clear what happened, and likewise if the central homeserver receives messages with a timestamp significantly off real time it could flag those messages as possibly having been received out of order while still displaying them in the order they were received.


AFAIK there is no primary homeserver. The human-readable name of a channel has a homeserver part, but this is only for discovery purposes and maps to an alphanumeric random ID.

Split-brain scenarios can be resolved using an odd number of nodes (or voters) to achieve a majority consensus to agree on the state of the system, stopping the services on the minority side to prevent conflicting operations. Once communication is restored, the stopped nodes can rejoin the cluster and synchronize their data. Vector clocks are a great abstraction for ensuring correct ordering as well.

yeah, having eventual consistency for messages across homeservers makes the work on the client harder. I guess they just have to accept that messages will "appear in the past" as you said.

But at least for messages sent within the same homeserver, I would think that those two apis should return the same data


I think you basically want a partial order for federated chat: messages should arrive after the messages that cause them but not necessarily after messages that didn’t cause them. In the case of a network partition, this allows people on either side of the partition to continue communicating at the cost of non-determinism when the partition is resolved.

I'd maintain that an important property is for the system to be eventually consistent with regards to history. You don't want a transient network event to potentially result in two users permanently seeing messages in a different order.

I don’t think you can prevent that without centralizing on a single server

You can, but it results in the situation the article is complaining about.

During a netsplit, people chatting on opposite sides of the netsplit continue to be able to chat (by design), but will (obviously) see a different history from each other. So when the netsplit heals, you have a dilemma: either you splice the history from the other side in, giving eventual consistency at the cost of changing the history that people have already read, or you keep permanently different histories on servers that were on one side or the other.


You could put the other side into something that looks visually like a thread. Each side will have a different history. They will also have a marker that says the history was split here and click here to view the other side.

If you can come up with a good design for what a client that does that should look like, and what information it would need from the server to do that, please do write it up and publicise it. I think ultimately something like that has to be the solution, but it would have to be actually fleshed out into something that's possible to implement.

That makes the problem harder, but not impossible.

I'm pretty sure this is actually impossible in a distributed system with independent operation, and if it were possible, it would be terrible UI anyway.

Problem one is if you want to order events chronologically, you need to precisely decide what the time of the event means. Probably not the time the client hit send, because you can only measure that on the client and client clocks are at best approximately accurate. You could consider the time the server received it, and assume your server times are accurate, but that's still problematic because even in a well functioning system, if a user sends message A to server.wdc around the same time as a user sends message B to server.lax, users connected to server.wdc will get A then B, and users connected to server.lax will get B then A, and this leads to problem two:

Problem two is messages generally display in order of receipt. If you get a message that slots in earlier in the thread, you may need to scroll up to see it. In a busy thread, it's going to be hard to read all the messages because of the back and forth. If you send a message, it may need to be reordered too. If you go back to the thread later, new messages may be in different places. This is more disorienting IMHO than different message orders for different viewers.

Problem two gets even worse when you don't just have distance between servers, but also some network or other operational issues. If a server accepts a message, but is unable to forward it immediately, you probably want it to forward it whenever it can... if there's a significant delay, now the message is again going to be displayed in a place where it's difficult to see.

You can kind of solve this by forcing messages to a group to go through a single queue which forces an ordering, but that makes accepting messages for a group a lot more difficult.


As long as the speed of light remains constant for all observers, who cares if everyone agrees on simultaneity? Distributed systems don't need to know what time it is, just what happened.

Well known systems are implemented this way, and the UI is great, people barely even notice.


The problem is that we don't just want to know what events happened, but also the order of events.

Between the posters in this group, we've got some ideals we'd like to meet

a) messages should be displayed as soon as possible when they arrive (this one isn't written much, but I think it's generally agreed)

b) messages should be displayed in a globally consistent order

c) message order should not change after display / newly arrived messages must be at the bottom/top

Unfortunately, we can't meet all these ideals unless instant messaging becomes actually instant instead of just pretty fast messaging subject to the speed of light and other sundry delays.

You can get all the properties if you serialize messages through some single queue somewhere, but of course that means additional delay and a spof.

You can most likely get b and c if you compromise a, and just don't display messages for long enough that you probably have everything and can order it according to the globally consistent ordering algorithm.

If you compromise b, you can definitely do a and c. Messages go to the end when received. Easy peasy. You can't leave placeholders for messages that will be sorted earlier but haven't arrived yet, because you generally won't know they exist until they arrive; although there are some cases where you could know. If a user receives message X and sends message Y, but due to delays and what not, you receive Y before X ... Y could indicate the presence of X, and a receiver who gets Y first could reserve a place for X... but that doesn't work for simultaneous messages.

If you compromise c, you can do a and b, messages are inserted into the ordered list on arrival based on the consensus ordering algorithm. Easy enough.
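The two compromises differ only in where a newly received message goes. A minimal sketch, assuming every message carries some agreed sort key; the `(timestamp, sender)` tuples below are just an illustration, not a real consensus ordering.

```python
import bisect


def append_on_arrival(timeline, order_key, body):
    """Compromise b: keep a and c. New messages always go at the end."""
    timeline.append((order_key, body))
    return len(timeline) - 1


def insert_in_order(timeline, order_key, body):
    """Compromise c: keep a and b. New messages slot in by the agreed key."""
    entry = (order_key, body)
    index = bisect.bisect_right(timeline, entry)
    timeline.insert(index, entry)
    return index


local_view, global_view = [], []
for key, body in [((10, "@a:x"), "first"), ((30, "@b:y"), "third"), ((20, "@c:z"), "second")]:
    append_on_arrival(local_view, key, body)
    insert_in_order(global_view, key, body)
```

After the loop, `local_view` reads first, third, second (arrival order), while `global_view` reads first, second, third, at the cost of "second" appearing above a message that was already on screen.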

Of course, there's not widespread agreement on b and c. So half of the thread is people saying b is clearly non-negotiable, another half is saying c is clearly non-negotiable, and the other half is saying why can't we just have everything we want.


Give up on the idea of "the timestamp" of an event. There is no such thing. Clocks are unreliable, and even if every clock in a system is perfectly in-sync (via atomic transponders or whatever) they're still subject to speed-of-light discrepancies that make it impossible to define "the time" of any event.

Two nodes separated by 10000km require ~33ms to send information in one direction, and ~66ms to do a roundtrip. Send X=1 to node=A from a client that's 5ms away from A at client-local time T1, and then send X=2 to node=B from a client that's 4ms away from B at client-local time T1-1ms -- when were these values sent, and what is the value of X? There is no answer, X is both 1 and 2, depending on when and who you ask.
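For anyone checking the arithmetic, a quick sketch using the vacuum speed of light. Real links are slower (light in fibre travels at roughly two thirds of c, and routes are not straight lines), so treat these as lower bounds.

```python
SPEED_OF_LIGHT_KM_PER_S = 299_792.458  # in vacuum


def one_way_ms(distance_km: float) -> float:
    """Minimum one-way latency in milliseconds over the given distance."""
    return distance_km / SPEED_OF_LIGHT_KM_PER_S * 1000


def round_trip_ms(distance_km: float) -> float:
    """Minimum round-trip latency in milliseconds."""
    return 2 * one_way_ms(distance_km)
```

`one_way_ms(10_000)` comes out at about 33.4 and `round_trip_ms(10_000)` at about 66.7, matching the figures above.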

You can define a leader node C, which receives updates from child nodes A and B, and that leader node can serialize updates in a way that produces a single linearizable sequence of updates, sure. But then that sequence of updates as defined by C needs to be propagated to child nodes A and B, which takes (let's say) 66ms round-trip minimum. So when your client sends X=1 to node=A, it has to wait for at least 66ms before it can make a correct read from that same node -- X may actually be 2!

Logical ordering of events in a distributed system is a solved problem. The solution is vector clocks (or something like them).


> Logical ordering of events in a distributed system is a solved problem. The solution is vector clocks (or something like them).

Vector clocks don't solve this problem, as they only provide a partial ordering. When multiple messages are in flight in the same 'simultaneity window', different observers may receive them in different orders and vector clocks can't determine a consistent order of those events.
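To make "partial ordering" concrete, here is a minimal sketch of a vector clock comparison. Clocks are plain dicts from node name to counter, and a missing node counts as 0; the representation is my own choice for illustration.

```python
def compare(a: dict, b: dict) -> str:
    """Compare two vector clocks.

    Returns "before" if a happened before b, "after" if b happened before a,
    "equal" if they are identical, and "concurrent" if neither dominates.
    """
    nodes = a.keys() | b.keys()
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"
```

`compare({"A": 1}, {"A": 1, "B": 1})` is `"before"`, but `compare({"A": 2}, {"A": 1, "B": 1})` is `"concurrent"`: the clocks can tell you the two events are unordered, and that is all they tell you.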

Vector clocks could be used to determine the order is inconsistent. But what do you do with that information? You might likely have follow on events where A and B are unordered, and A1 was sent only seeing A, B1 sent only seeing B, and C was sent seeing A and B without seeing A1 and B1. It only gets more complex from there.

I'm not a UX person, but I can't imagine how to show this to users without causing massive confusion and information overload. There's a very small set of people that have studied distributed communication that would get this. And I haven't seen any UI that shows similar information in a coherent way... maybe git graphs, but I don't see how you make that fit on a phone screen where you're having a group chat.

Maybe just some indicator that says other people may see these messages in a different order, but then if it's not an immediately obvious signal, it's not really going to help users understand.


> When multiple messages are in flight in the same 'simultaneity window', different observers may receive them in different orders and vector clocks can't determine a consistent order of those events.

That's right! Vector clocks only provide partial order. But partial order is the only actual truth in any distributed system. Total order is a fiction, which only exists in the context of a specific node, based on that specific node's experience of reality (message receipt).

In any non-trivial distributed system, there is no consistent (total) order of events, at least not without a consensus protocol. There are lots of ways to hack a (fake) total order, and many of those approaches are enormously successful nearly all of the time. But, still, you know.


I don't understand why you seem to agree with me but also said

> Logical ordering of events in a distributed system is a solved problem. The solution is vector clocks (or something like them).

A partial order solves the problem when messages are not simultaneous and so the partial order provides a total order. That there is in fact no total ordering of simultaneous messages doesn't solve the problem that users would like to have messages arrive in a consistent order without delay. This is unsolved, because it's unsolvable, therefore vector clocks aren't the solution.


Vector clocks provide a deterministic partial order, but partial order doesn't provide any kind of total order. That's true, yes.

But (as I'm sure you're aware) there is no single deterministic total order of events in a distributed system. The system can assert some specific total order, based on some specific criteria -- say, LWW based on node identity -- but that's system-specific and arbitrary.

That "users would like to have messages arrive in a consistent order without delay" is a nice and valid expectation, but is literally impossible, in the general case. Vector clocks solve a lower-level problem; nothing can solve the higher-level problem.

(Again, as I'm sure you're aware.)


There are relatively straightforward decentralized consensus algorithms for ordering events if we assume cooperation. If we assume malicious peers, then we're in the space of the byzantine generals problem, but there are solutions to that too.

Now there's some property that you have to give up, for example an immutable ordering. You might think the message came in one order, then reconnect with the network and discover the order was flipped. So long as the UI can handle that update, there are consensus algorithms that will deliver a consistent view even in the edge cases.

You don't need a single timestamping queue.


> This is more disorienting IMHO than different message orders for different viewers.

the parent post is arguing this is less disorienting, and I agree.


I feel like when an important message comes in out of sequence, but you had already sent a response to the chat with what was visible at the time, it will be very confusing when that gets reordered.

Ex:

A@T0: User X is abusing our service, we should send them a sternly written letter.

B@T60: Yes, I'll do it right away.

C@T2 (received later): No, we should just shadowban them.

When B sent their message, their intent was clear to them. But when they review their message after C's message is received, if the display ordering is changed, the meaning of the communication has changed, and how can B show that sending the warning was reasonable when they clearly said they were going to shadowban the user? (Maybe this group should use something else with a guaranteed ordering to track abuse and response, but that's a different question)

If C's message is displayed earlier than B's in some cases and not others, that makes for a confusing situation, but each person can look at their messages and easily see what they saw when they argue about a breakdown in communication in the aftermath.


It makes an incrementing message ID impossible.

Only if you're not ok with eventual consistency and renumbering in the case of discovered conflicts or net splits.

What's mysterious here? One ordering is dictated (arrival time), another left for the consideration of the server (likely allowing for stuff like pinned messages, etc, that break the strict ordering).

If a Matrix server allows messages to be deleted (by the poster or by a moderator), then increasing IDs with no gaps becomes impossible. If the server allows editing of existing messages, then a sequence with no gaps is not sufficient to reflect all changes. Ideally a server does not do either, but uses more messages to augment existing messages, or mark some as deleted; with that, a sequence with no gaps would suffice.


/messages might be a legacy endpoint compared to a newer /sync. I know Matrix has been working hard on their sliding sync api.

Non-monotonic clocks?

In general we certainly want to be able to change things "in the past". When there is unpleasant spam in a groupchat, you want a moderator to be able to remove or at least hide it, in a way that means people scrolling up won't be exposed to it unless they explicitly want to. (You could argue for having the client deal with all of that, but I don't think there's much benefit).

And if, as in the example at the end, clients on different homeservers will inevitably see different views, then I don't think always showing the same history to the same client, or clients on the same server, solves the "gaslighting" problem - if anything it could make it worse. Maybe clients should make it obvious when messages have been "retconned" into the scrollback, and maybe servers should have certain features to support that. But the idea of having a consistent linear timeline is one of those answers that's clear, simple, and wrong.


This article makes a logical step that I think is incorrect - that message order from the server is the order that a client then displays them in.

Surely that's a presentation issue - you should display messages chronologically, regardless of what order you got them from the server? The author does touch on this a little bit, I don't see how that isn't the "obviously" correct approach?:

> An alternative is to continue providing events in any order, but add some kind of order number that allows clients to sort events into /sync order. MSC4033 proposes this.


That "presentation issue" is quite an issue though; how do you sanely present to a user the fact that there are new messages to read, but they're scattered throughout the history of the message log arbitrarily far back?

Granted, tacking them all at the end isn't necessarily good, but at least the user will see them, and timestamp indicators can help make sense of it.

And I don't even see how placing them chronologically would be particularly useful - given a netsplit there's not gonna be any relation between the previously-present and delayed-received messages at that time interval anyway, you're just interleaving two entirely different discussions for no reason. (ok maybe there can be some unidirectional delay where it could maybe be useful, idk)


> how do you sanely present to a user the fact that there are new messages to read, but they're scattered throughout the history of the message log arbitrarily far back?

Like this[1]. It's how Element does it, and it's perfectly fine. Show the unread messages with a different background color until the user dismisses them if you must distinguish them from previously read messages. Alternatively, add a "jump to last unread message" button (and change the "jump to first" icon into a double up-arrow) that marks said message as read after jumping to it so you can just keep clicking it to hop to each one. If there's anything I'd change about this UI element, it's to display the number of unread messages in the green dot.

[1] https://imgur.com/UMboT1o


This might not sound entirely awful in theory, but I'm fairly certain it'd be entirely awful in practice for a good amount of users.

Having colored backgrounds might look ugly and thus immediately make that a non-option, and, even if viable, are still problematic if new messages are added while reading, as you'd need multiple colors for the reading pass the user's on.

Having to click a button repeatedly is also likely to be entirely unacceptable UX for most users, never mind that you're essentially recreating the receive-order message list, just without displaying it in a sanely traversable format.

This problem is hard enough with linearly added messages to the end, making it 10x worse is awful.


And then there's the issue on switching between clients (e.g. PC and phone) or otherwise having client state lost - if the complete reading progress isn't saved server-side, then it becomes entirely impossible to restart reading; whereas with a receive-order log you can at least manually scroll to the message you read last and continue downwards.

That said, a similar problem already exists with edits, and a universal solution for arbitrary insertion should also be able to improve arbitrary edit notifications.


There are two things that I'm pretty sure are true:

- People think that messages are a timeline and are in chronological order. For non distributed messages (Teams, WhatsApp) that's the reality, and people don't read message timestamps. Old messages being put at the bottom is confusing.

- Matrix is distributed so messages can arrive out of order (sometimes with quite a delay)

Those two things together mean that the client HAS to solve this display problem.


I'd add:

- People think that messages they haven't read will be below ones they have.

which is in direct conflict with your first point. Which of the two is stronger probably depends on the person.

And indeed it's a tradeoff between confusion about chronologicality vs confusion about what's been read, and which is more important will again depend on the person, and the specific discussion. If not reading timestamps is a large concern, there could always be some explicit separator in the log (e.g. a line of "In the past at 2024-12-05 12:34:56"; in my client I have a funny "X (hours|days) earlier..?" when there's a negative time delta between messages, as opposed to a "X (hours|days) later..." on a large positive time delta, though a date would probably be more sensible; and for what it's worth I also have a `/sort` command to reorder the visible log chronologically).
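Roughly what such a separator could look like, as a sketch rather than the actual client code. Timestamps are in seconds, and the one-hour threshold and function name are assumptions for illustration.

```python
from typing import Optional


def time_separator(prev_ts: int, next_ts: int, threshold_s: int = 3600) -> Optional[str]:
    """Return a separator line for a large time jump between adjacent messages.

    A negative delta (the next message is older than the previous one) gets
    an "earlier" marker, a large positive delta gets a "later" marker, and
    anything under the threshold gets no separator at all.
    """
    delta = next_ts - prev_ts
    if abs(delta) < threshold_s:
        return None
    hours = abs(delta) // 3600
    amount = f"{hours // 24} days" if hours >= 24 else f"{hours} hours"
    return f"{amount} earlier..?" if delta < 0 else f"{amount} later..."
```

A message that arrives two hours "in the past" relative to the one above it would be preceded by `2 hours earlier..?`.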


That's something that Telegram always seems to get right, I've never seen messages out of order in different clients, and if I do something like upload a video then immediately send more text messages before it's done, it will shove the video in between the messages where it should be when the upload is done.

I know it's a much harder problem without a central server managing things. But consistency is very important for messages, out of order they could have a very different meaning and be very confusing.


> I know it's a much harder problem without a central server managing things.

In your example it is easy if the structures relating to the video and text contain some way to identify the source node, or just that they belong to the same lineage (you could have a per-thread-per-source-node value, produced from a salted hash of the real information, if source host ID is considered sensitive) and a timestamp taken at that node.

(Caveat: I know little of the specific protocols that are relevant here, so don't know if they do contain any such datum)

Where message ordering gets difficult to the point of impracticality (if not impossibility) is where you are ordering messages from many different sources that may not have fully synchronised clocks. You can make it easier with "in reply to" and "sent after" properties (in each case, the value being a message identifier) so any given message can be sorted by its context, but the order of sibling messages may still not have a single possible ordering. And you have to decide, if using a "sent after" value, if you have the last message received at the time of sending, the last message received before this message was started, the latest opened messages, etc, all of which could give different results.
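Sorting by context like this is a topological sort. A minimal sketch using Python's standard `graphlib`, assuming each message lists the IDs it replies to or was sent after; the data shape and function name are hypothetical.

```python
from graphlib import TopologicalSorter


def causal_order(messages):
    """Order message IDs so that each one comes after everything it references.

    `messages` maps a message ID to the set of IDs it was "in reply to" or
    "sent after". References to messages we have not received are ignored.
    """
    known = set(messages)
    graph = {
        message_id: {parent for parent in parents if parent in known}
        for message_id, parents in messages.items()
    }
    return list(TopologicalSorter(graph).static_order())


order = causal_order({"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}})
```

Here `a` always comes first and `d` last, but `b` and `c` are siblings: either order between them is valid, which is exactly the ambiguity described above.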

To a certain extent you have to get to a point where ordering is good enough and you give up on it being exact & unambiguously consistent, or you'll spend so much time working out the ordering and have no time to send your own messages :)


i had a hilarious argument with the significant other when my messages appeared as a very lame response to messages i didn't receive.

i think the mental model should be what is most useful in court. if a netsplit occurs the state of the room doesn't exist anymore, conversation can continue but it should be a different room populated with working available clients. The main room can be restored and the missed convo can be a 3rd room


I remember when iMessage/Apple Messages did this, back in its first days. Everybody hated it.

My preference would be to avoid even attempting to force all into a single chronology. Instead, imagine something like the output of `git log --graph`, where the network split/rejoin moments are also displayed by lines. It would allow people to tell that two independent conversations were going on, and that certain messages were written while another was not known.

This sounds like a pretty good use case for a consensus algorithm like Paxos or Raft

Those are CP which is impossible in a distributed messaging system where it has to obviously be AP. Otherwise you'd have to guarantee that everyone involved is always online (no partition) to make progress on sending messages!

I think I have this right anyways. (CAP theorem for anyone curious.)


I think the most important property to preserve is causality; that is, if a user sends a message B after they have read (i.e., received) a message A, then B should come after A for everyone, because B depends on A. Basically use a Lamport clock.
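For reference, a minimal Lamport clock sketch. The class and method names are mine; the rule itself is the standard one: increment on every local event, and on receipt jump past the sender's timestamp.

```python
class LamportClock:
    """Logical clock that guarantees a reply is stamped later than what it answers."""

    def __init__(self) -> None:
        self.time = 0

    def tick(self) -> int:
        """Advance for a local event (such as sending a message) and return the stamp."""
        self.time += 1
        return self.time

    def observe(self, remote_time: int) -> int:
        """Merge the timestamp of a received message into the local clock."""
        self.time = max(self.time, remote_time) + 1
        return self.time


alice, bob = LamportClock(), LamportClock()
stamp_a = alice.tick()    # Alice sends message A
bob.observe(stamp_a)      # Bob reads A
stamp_b = bob.tick()      # Bob sends B, which depends on A
```

`stamp_b` is always greater than `stamp_a`, so sorting by stamp puts B after A for everyone. Messages sent without seeing each other can still tie or sort arbitrarily, so a tie-breaker such as the sender ID is needed for a total order.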

It's infuriating how the client must be stateful and have local storage, for both the access_token and the last message received. That's right, as the client you must remember where the last event [1] you've seen was (even if you already told the server to mark it as read), or else the server will happily send you the same messages over and over again across restarts of your client. I kind of miss making IRC bots where things were much simpler and ... quicker honestly (latency wise).
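The state in question is the sync token: /sync accepts a `since` parameter and returns a `next_batch` value to pass next time. A minimal sketch of the bookkeeping, where `fetch` is a hypothetical callable wrapping the HTTP request and `store` is any dict-like object that survives restarts (both are assumptions, not part of any real SDK).

```python
def sync_once(fetch, store):
    """Run one sync round trip and remember where to resume.

    `fetch(since)` is assumed to perform the /sync request with the given
    token (None on first run) and return the decoded JSON response.
    """
    response = fetch(store.get("since"))
    # Persist the token so a restarted client does not replay old events.
    store["since"] = response["next_batch"]
    return response
```

Lose `store` and the next call passes `None` again, which is the "same messages over and over" behaviour.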

[1] https://uhoreg.gitlab.io/matrix-tutorial/sync.html#:~:text=w...



