Will the challenge be that you cannot hook into UI/human-interaction events at a low enough level to build an alternative to the DOM? E.g. a Paste menu item is impossible to implement (Google Docs in Firefox needs Ctrl+V instead), you cannot get information about the keyboard typing mode in multilingual setups, and so on.
IIRC Flipboard once tried this approach, rendering everything to a <canvas>. As it stands right now, that is not a viable solution for the reasons you mentioned, along with the fact that you throw literally all accessibility features out the window.