It is not the ML model that is updated with information; it is the predictive modeler herself who is updated. She now finds parameters that make the model perform well on that specific test set. This yields overly optimistic estimates of generalization performance (and thus unsound science; in business, it is better to report performance that is too low than too high, because a policy built on a model that is overfit like this can ruin a company or a life). For smarter approaches to this problem, see the research on reusable holdout sets.
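A minimal sketch of this leakage, using synthetic data and logistic regression purely for illustration (the dataset, the split sizes, and the hyperparameter grid are all assumptions, not anyone's prescribed setup): the modeler reuses the test set to pick the best of many configurations, and the winning test score comes out higher than the score on data she never touched.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# One noisy dataset split into train / test / fresh.
# "fresh" stands in for truly unseen data the modeler never peeks at.
X, y = make_classification(n_samples=600, n_features=50, n_informative=2,
                           flip_y=0.2, random_state=0)
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=400, random_state=0)
X_test, X_fresh, y_test, y_fresh = train_test_split(
    X_rest, y_rest, test_size=200, random_state=0)

# The modeler "updates herself": she tries many configurations and keeps
# whichever scores best on the test set, reusing it at every step.
best_model, best_test_score = None, -np.inf
for c in np.logspace(-4, 4, 200):
    model = LogisticRegression(C=c, max_iter=1000).fit(X_train, y_train)
    score = model.score(X_test, y_test)  # test set peeked at 200 times
    if score > best_test_score:
        best_model, best_test_score = model, score

# The selected test score is optimistically biased; fresh data reveals it.
print(f"best score on reused test set: {best_test_score:.3f}")
print(f"score on untouched fresh data: "
      f"{best_model.score(X_fresh, y_fresh):.3f}")
```

The gap between the two printed numbers is the price of reusing the test set: by selecting the maximum over many noisy evaluations, the modeler has fit the selection process itself to that one sample.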