If you had a model that was character-based (instead of the subword encodings they tend to use), you could directly sample without 'e' by masking out the banned character's probability at each step.
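A minimal sketch of what that masking could look like. The model here is a stand-in (`toy_char_model` just returns uniform logits); with a real character-level LM you'd plug its per-character logits in instead:

```python
import math
import random

random.seed(0)

VOCAB = list("abcdefghijklmnopqrstuvwxyz ")
BANNED = {"e"}

def toy_char_model(context):
    # Stand-in for a real character-level LM: uniform logits over VOCAB.
    # A real model would condition on `context`.
    return [0.0 for _ in VOCAB]

def sample_without_e(model, length=20):
    """Sample characters, setting banned ones to -inf before the softmax."""
    out = []
    for _ in range(length):
        logits = model("".join(out))
        masked = [l if c not in BANNED else float("-inf")
                  for c, l in zip(VOCAB, logits)]
        # softmax over the masked logits; the -inf entries get zero mass
        m = max(masked)
        exps = [math.exp(l - m) for l in masked]
        total = sum(exps)
        probs = [x / total for x in exps]
        out.append(random.choices(VOCAB, weights=probs)[0])
    return "".join(out)

text = sample_without_e(toy_char_model)
```

Because the mask zeroes the probability before sampling, the constraint is hard: no output can ever contain an 'e', regardless of what the underlying model prefers.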
Though I'm not sure its output would make much sense, and you might have to use beam search (or something like backtracking).
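A toy version of that beam search, with the constraint applied at expansion time (the scorer is a made-up stand-in; a real setup would use the model's log-probabilities):

```python
BANNED = {"e"}
VOCAB = list("abcdefghijklmnopqrstuvwxyz ")

def toy_logprob(context, ch):
    # Stand-in scorer: a real model would return log P(ch | context).
    # Here we just slightly prefer letters earlier in the alphabet.
    return -0.1 * VOCAB.index(ch)

def beam_search_no_e(score_fn, length=5, beam_width=3):
    beams = [("", 0.0)]  # (text so far, cumulative log-prob)
    for _ in range(length):
        candidates = []
        for text, lp in beams:
            for ch in VOCAB:
                if ch in BANNED:
                    continue  # hard constraint: never expand with 'e'
                candidates.append((text + ch, lp + score_fn(text, ch)))
        candidates.sort(key=lambda t: t[1], reverse=True)
        beams = candidates[:beam_width]
    return beams[0][0]

result = beam_search_no_e(toy_logprob)
```

Keeping several beams alive is what gives you the backtracking-like behavior: if the greedy continuation paints itself into a corner (every good next character is banned), a lower-ranked beam can still win.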
I wonder how you would train a model to directly speak without 'e'. Perhaps you use the general model like above with beam search, and then train a new model to directly predict the first model's beam-searched predictions.