Great question. Right now I'm working mostly on intuition, though I certainly could formalize the process.
One thing I'm planning to do for my current project is to make the solution match the linearized solution for short times. While I haven't proved this, it seems obvious to me that particular terms which are being approximated would need to have the same linearization as the original term. (Incidentally, the approximation I used in my PhD did not have this property, but in that case I wasn't interested in short times.)
Otherwise, the process is going to be trial and error. I'll look through the catalog of solveable nonlinear ODEs, try to fit my square peg into a round hole that might be plausible and see if it works.
If I get multiple possible approximate solutions I'll compare them in terms of accuracy (compare with the numerical computation) and ease of writing the solution out. Some solutions will be more irritating than others. Some could be longer than others. I also would prefer an explicit algebraic equation as the solution to an implicit equation. The case in my PhD started out as an implicit algebraic equation which I later was able to write explicitly through the use of a special function (which fortunately is implemented in Python, Matlab, etc.).
Most likely my new project will be converted to a single nonlinear autonomous ODE. The question then is which autonomous approximation is best.
Also: The approximation I used in my PhD was only valid for a particular range of values of one parameter. It took some physical intuition to recognize the approximation, which I later realized other people had used long before me in different contexts.