The difference between interpolation and extrapolation is arguably the most important concept in all of machine learning practice.
It's vanishingly rare (from what I've seen so far) for a practical machine learning model to perform high-quality extrapolation, under almost any metric of quality.
There are almost always far, far too many confounding variables.
Depends on how you define interpolation and extrapolation.
In high dimensional spaces basically everything is extrapolation including in pixel space and embedding space.
> The notion of interpolation and extrapolation is fundamental in various fields from deep learning to function approximation. Interpolation occurs for a sample x whenever this sample falls inside or on the boundary of the given dataset's convex hull. Extrapolation occurs when x falls outside of that convex hull. One fundamental (mis)conception is that state-of-the-art algorithms work so well because of their ability to correctly interpolate training data. A second (mis)conception is that interpolation happens throughout tasks and datasets, in fact, many intuitions and theories rely on that assumption. We empirically and theoretically argue against those two points and demonstrate that on any high-dimensional (>100) dataset, interpolation almost surely never happens. Those results challenge the validity of our current interpolation/extrapolation definition as an indicator of generalization performances.
> The location of decision boundaries inside the convex hull of training set can be investigated in relation to the training samples. However, our analysis shows that in standard image classification datasets, all testing images are considerably outside that convex hull, in the pixel space, in the wavelet space, and in the internal representations learned by deep networks. Therefore, the performance of a trained model partially depends on how its decision boundaries are extended outside the convex hull of its training data.
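To make the convex-hull definition used in both abstracts concrete, here's a minimal sketch (my own code, not from either paper): a sample x interpolates the training set iff it can be written as a convex combination of training points, which can be checked with a small feasibility LP in SciPy.

```python
import numpy as np
from scipy.optimize import linprog


def in_convex_hull(x, X):
    """True if x lies inside (or on) the convex hull of the rows of X."""
    n = X.shape[0]
    # Feasibility LP: find lambda >= 0 with sum(lambda) = 1 and X.T @ lambda = x
    A_eq = np.vstack([X.T, np.ones((1, n))])
    b_eq = np.concatenate([x, [1.0]])
    res = linprog(c=np.zeros(n), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * n, method="highs")
    return res.success


rng = np.random.default_rng(0)
X_train = rng.normal(size=(1000, 2))                # plenty of samples in 2-D
print(in_convex_hull(rng.normal(size=2), X_train))  # usually True in 2-D
```

Run the same check in a few hundred dimensions and the answer is essentially always False, which is the papers' point.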
The problem with the second paper is that having a "test set" is meaningless when you can access that test set during training - "you" as in the researcher, who develops a system. This is especially so for machine vision datasets like the ones in the paper that have been "done to death". Basically anyone who has access to the test set of a popular benchmark and wants to get their paper published will do everything that can be done to ensure their system does well on the test set.
That is a big flaw in machine learning research in general, but that's for another conversation, I guess. My point above is that if neural nets could generalise well, they wouldn't need so much data. In a sense, even if trained neural net models can generalise to instances outside the dense region of instance space circumscribed by their training set, that is not that important if that region has to be gigantic for the generalisation to be possible in the first place. For one thing, at that point it becomes difficult to separate what is "training" and what is "test", especially when training sets are four times the size of test sets, as is typical practice.
I think that the parent comment is referring to extrapolation in the semantic space (ex: use brown cats as training data and see if the ML algorithm can recognize albino cats).
Edit: or take photos of brown cats indoors from the front, and see if the model recognizes albino cats from the side, outdoors.
You still have to deal with the precise definition of interpolation vs. extrapolation.
And it does not matter what space you are using: as long as you operate under the convex hull definition of interpolation vs. extrapolation, you will need exponentially more samples as the intrinsic dimensionality of the space increases.
This means that even under the manifold hypothesis, as long as the intrinsic dimensionality is reasonably high, i.e. in the low hundreds, models will be doing extrapolation.
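A rough way to see the "exponentially more samples" effect is to fix the training-set size and watch the fraction of fresh samples that still land inside the hull as the dimension grows. This is a toy Monte Carlo of my own, not a result from the thread or the papers; it reuses the same LP feasibility check as the sketch above.

```python
import numpy as np
from scipy.optimize import linprog


def inside_hull(x, X):
    # Same feasibility LP as above: x must be a convex combination of rows of X.
    n = len(X)
    A_eq = np.vstack([X.T, np.ones((1, n))])
    b_eq = np.concatenate([x, [1.0]])
    return linprog(np.zeros(n), A_eq=A_eq, b_eq=b_eq,
                   bounds=[(0, None)] * n, method="highs").success


rng = np.random.default_rng(0)
for d in (2, 5, 10, 20, 40):
    X = rng.normal(size=(2000, d))       # fixed budget of training samples
    fresh = rng.normal(size=(200, d))    # new samples from the same distribution
    frac = np.mean([inside_hull(x, X) for x in fresh])
    print(f"d={d:>2}  fraction interpolated: {frac:.2f}")
```

With a fixed sample budget the interpolated fraction collapses toward zero well before the dimension reaches the hundreds; keeping it constant would require the sample count to grow exponentially in d.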
I think we've just jumped from my "stochastic software engineering tip of the day" to a postgraduate level examination of validation and test set decision boundaries vs dataset boundaries.
The thing is also that any individual high-dimensional case can lie outside the training set's convex hull and still be correctly classified.
However, you would still have to quantify how the dimensions with the highest feature importance relate to that space, which is why the second paper is so fascinating.
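As a toy illustration of that point (my own construction, not from the papers): a linear model trained on a bounded cloud will happily classify points far outside that cloud's convex hull, and here it gets them right precisely because the only dimension with real feature importance is the one along which we extrapolate.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Training data: the label depends only on feature 0; features 1..9 are noise.
X_train = rng.uniform(-1, 1, size=(500, 10))
y_train = (X_train[:, 0] > 0).astype(int)
clf = LogisticRegression().fit(X_train, y_train)

# Test points far outside the training hull (feature 0 pushed to [5, 6]),
# yet still on the correct side of the learned decision boundary.
X_test = rng.uniform(-1, 1, size=(200, 10))
X_test[:, 0] = rng.uniform(5, 6, size=200)
print(clf.score(X_test, np.ones(200, dtype=int)))  # ~1.0 despite extrapolating
```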
From the perspective of the issues product/engineering teams face in the field, I'd definitely maintain that fire alarms should start sounding once you see any sort of extrapolation, and that you should dive deeper.
Unfortunately, the maturity level of this space is still at the point where peer review of dataset transformations before deploying to production, and whether to commit Jupyter notebooks to GitHub, are heated in-office discussions.
The majority of the commercial world is a long way from that kind of best practice.
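For teams that want the fire alarm mentioned above but can't afford a hull check on every request, a per-feature range check is a common (and admittedly much weaker) stand-in: a point can pass it and still be outside the hull, but anything it flags is definitely extrapolation. A hypothetical sketch, all names mine:

```python
import numpy as np


class ExtrapolationAlarm:
    """Flags incoming rows whose features fall outside the training range."""

    def fit(self, X_train):
        self.lo = X_train.min(axis=0)
        self.hi = X_train.max(axis=0)
        return self

    def flag(self, X):
        # Boolean mask: True where any feature is outside the [lo, hi] range seen in training.
        return ((X < self.lo) | (X > self.hi)).any(axis=1)


# Usage: alarm = ExtrapolationAlarm().fit(X_train)
#        suspicious = alarm.flag(X_live)   # route these rows to logging/review
```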