If I understand correctly, MQTT is just a pub/sub protocol, i.e. it carries the messages but doesn't care about their contents.
Is there a standard for how the chat layer (the content of the published messages) is implemented? (I haven't looked into the source code, but I'd guess the payload isn't just plain text and has some structure to it.)
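
To make the question concrete, here is a rough Python sketch of the kind of structure I have in mind. The JSON encoding, the field names (`sender`, `text`, `ts`), and the helper names are purely my own guesses for illustration, not anything taken from the actual implementation. MQTT itself would only ever see the resulting bytes.

```python
import json


def encode_chat_message(sender: str, text: str, ts: float) -> bytes:
    """Pack a chat message into bytes usable as an MQTT payload.

    The envelope layout is hypothetical; MQTT treats the payload as opaque bytes.
    """
    envelope = {"sender": sender, "text": text, "ts": ts}
    return json.dumps(envelope, sort_keys=True).encode("utf-8")


def decode_chat_message(payload: bytes) -> dict:
    """Reverse of encode_chat_message: bytes back to a dict."""
    return json.loads(payload.decode("utf-8"))


if __name__ == "__main__":
    payload = encode_chat_message("alice", "hello", 1700000000.0)
    print(payload)                       # what a broker would carry
    print(decode_chat_message(payload))  # what a subscriber would reconstruct
```

If the real format is something like this (or protobuf, or anything else with a schema), is it documented anywhere?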