I'm not so sure. Gradient descent refers to iteratively converging on the best solution available. Xkcd seems to be referring to something more like the Bonferroni principle, where you go looking for patterns without a hypothesis and justify them after the fact.
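(For what it's worth, here's a small self-contained sketch of the multiple-comparisons effect the Bonferroni correction guards against; the two-group simulation and the 0.05 threshold are purely illustrative choices, not anything from the comic.)

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Pure noise: 1000 "features" measured on two groups that are actually identical.
n_tests = 1000
group_a = rng.normal(size=(50, n_tests))
group_b = rng.normal(size=(50, n_tests))

# Run a t-test per feature, with no hypothesis picked in advance.
p_values = stats.ttest_ind(group_a, group_b).pvalue

# Uncorrected, roughly 5% of the tests look "significant" by chance alone.
print("uncorrected hits:", np.sum(p_values < 0.05))

# A Bonferroni correction (divide alpha by the number of tests) wipes out
# almost all of those spurious findings.
print("Bonferroni hits: ", np.sum(p_values < 0.05 / n_tests))
```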
I think the point is that data manipulation is about 80% of machine learning work. If an algorithm is giving you crap results on some data set, once you've fiddled with its hyperparameters through cross-validation and so on, there's not much you can do besides data manipulations like PCA, ICA and the like to try and get a better result. Most algorithms work pretty badly on raw data.
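To make that concrete, here's a rough scikit-learn sketch of that workflow; the digits dataset, the SVM, and the parameter grid are just placeholders. The idea is that the transformation (PCA) and the classifier's hyperparameters get tuned together by the cross-validation search.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

# Chain the transformation and the model so one CV search tunes both the
# number of retained components and the classifier's hyperparameters.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA()),
    ("svm", SVC()),
])

grid = GridSearchCV(
    pipe,
    param_grid={
        "pca__n_components": [10, 20, 40],
        "svm__C": [0.1, 1, 10],
    },
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```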
I'm guessing that, in the sciences, that is a big no-no. Imagine if doctors, seeing that a new drug being trialled is failing to cure a disease, simply started chucking out the sick subjects until all the ones that were left were healthy, declaring the sick ones to be "noise" and the trial a success. Somehow, I don't think that would fly...
This is confusing "cherry picking" (selecting some data and discarding other data) with "data transformation". Transformations are applied for perfectly good reasons, such as dimensionality reduction, or to enable the subsequent application of certain statistical tests that rely on assumptions not valid for the data in its original form.
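One illustrative example of that second case (made up here, not taken from the thread): log-transforming right-skewed data so that a test which assumes roughly normal data becomes applicable. The lognormal sample and the Shapiro-Wilk check are just stand-ins.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Heavily right-skewed data (think reaction times, incomes) violate the
# normality assumption behind many parametric tests.
raw = rng.lognormal(mean=0.0, sigma=1.0, size=200)

# A log transform puts the very same observations on a scale where the
# assumption roughly holds; nothing is discarded.
transformed = np.log(raw)

# Shapiro-Wilk: a low p-value means "looks non-normal".
print("raw:        p =", stats.shapiro(raw).pvalue)
print("log-scaled: p =", stats.shapiro(transformed).pvalue)
```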
> I think the point is that data manipulation is about 80% of machine learning work. If an algorithm is giving you crap results on some data set, once you've fiddled with its hyperparameters through cross-validation and so on, there's not much you can do besides data manipulations like PCA, ICA and the like to try and get a better result.
Data manipulation is different from data transformation: manipulation changes the nature of the data, while transformation does not. PCA is a data transformation, not data manipulation.
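A quick numerical sketch of what PCA does (scikit-learn on random data, purely illustrative): with all components kept it's just a rotation of the data that can be inverted exactly; only when you keep fewer components does it become a lossy, lower-dimensional approximation.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 5))

# Keeping every component, PCA is a rotation (plus centering): the
# original data can be recovered exactly, so no information is lost.
pca = PCA(n_components=5)
X_back = pca.inverse_transform(pca.fit_transform(X))
print(np.allclose(X, X_back))   # True

# Dropping components (the dimensionality-reduction case) does discard
# some variance, so the reconstruction is only approximate.
pca2 = PCA(n_components=2)
X_approx = pca2.inverse_transform(pca2.fit_transform(X))
print(np.allclose(X, X_approx))  # False
```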
Tell me what you mean by "the nature of the data" and I'll tell you whether I agree with your definition. I don't see why the dimensionality of a dataset isn't part of its "nature", for example.