To add on to this - they "specifically consider the case
of adding new nodes to pre-train a stacked deep autoencoder", by basically keeping track of when certain layers cannot reproduce their input and then adding more nodes+retraining with both new (not reproduced) and old data. It is quite intuitive, basically the most naive and obvious first attempt at the problem (not meant in a condescending way, just want to point out it's not that generalizable and is pretty ad-hoc).