But if diagnosis are multimodal and rely upon large, multidimensional analysis of symptoms/bloodwork/past medical history, wouldn't adding more dimensions just increase dimensional sparsity and decrease the useful amount of conclusions you are able to draw from your variables?
It's been a long time since I remember learning about the curse of dimensionality but if you increase the amount of datapoints you collect by half you would have to quadruple the amount of samples you have to retrieve any meaningful benefit, no?
I did mean samples (n size) not the number of features. But also, no your point isn't right. If you have a ton of variables, you'll be better able to overfit your models to a training set (which is bad). However, that's not to say that a fairly basic toolkit can't help you avoid doing that even with a ton of variables. What really matter is the effect size of the the variables you're adding. That is, whether or not they can actually help you predict the answer, distinctly from the other variables you have.
Stupid example: imagine trying to predict the answer of a function that is just the sum of 1,000,000 random variables. Obviously having all 1,000,000 variables will be helpful here, and the model will learn to sum them up.
In the real world, a lot of your variables either don't matter or are basically saying the same thing as some of your other variables so you don't actually get a lot of value from trying to expand your feature set mindlessly.
> if you increase the amount of datapoints you collect by half you would have to quadruple the amount of samples you have to retrieve any meaningful benefit, no?
I think you might be thinking about standard error. Where you divide the standard deviation of your data by sqrt of the number of samples. So quadrupling your sample size will cut the error in half?
It's been a long time since I remember learning about the curse of dimensionality but if you increase the amount of datapoints you collect by half you would have to quadruple the amount of samples you have to retrieve any meaningful benefit, no?