>To avoid benchmark contamination, we follow Guo et al. (2024) to filter out web pages containing questions or answers from English mathematical benchmarks such as GSM8K (Cobbe et al., 2021) and MATH (Hendrycks et al., 2021) and Chinese benchmarks such as CMATH (Wei et al., 2023) and AGIEval (Zhong et al., 2023). The filtering criteria are as follows: any text segment containing a 10-gram string that matches exactly with any sub-string from the evaluation benchmarks is removed from our math training corpus. For benchmark texts that are shorter than 10 grams but have at least 3 grams, we employ exact matching to filter out contaminated web pages.
However, benchmark decontamination is hard, and n-gram matching is often insufficient. See https://arxiv.org/pdf/2311.04850.pdf for some examples of how this approach can fail.
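For reference, the 10-gram filter they describe boils down to something like this sketch (my own reconstruction of the idea, assuming simple whitespace tokenization; not their actual code):

```python
from collections.abc import Iterable

def ngrams(tokens: list[str], n: int) -> Iterable[tuple[str, ...]]:
    """Yield all contiguous n-grams of a token list."""
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i : i + n])

def build_benchmark_index(benchmark_texts: list[str], n: int = 10):
    """Index benchmark data: all n-grams from long texts, plus whole texts
    shorter than n tokens but at least 3 tokens (kept for exact matching)."""
    long_grams: set[tuple[str, ...]] = set()
    short_texts: set[tuple[str, ...]] = set()
    for text in benchmark_texts:
        tokens = text.split()  # assumption: plain whitespace tokenization
        if len(tokens) >= n:
            long_grams.update(ngrams(tokens, n))
        elif len(tokens) >= 3:
            short_texts.add(tuple(tokens))
    return long_grams, short_texts

def is_contaminated(page_text: str, long_grams, short_texts, n: int = 10) -> bool:
    """True if the page shares any 10-gram with a benchmark text, or contains
    a short (3-9 token) benchmark text verbatim."""
    tokens = page_text.split()
    if any(g in long_grams for g in ngrams(tokens, n)):
        return True
    for k in {len(t) for t in short_texts}:  # exact containment for short snippets
        if any(g in short_texts for g in ngrams(tokens, k)):
            return True
    return False
```

And that's exactly the problem: any paraphrase, reformatting, or translation of a benchmark item slips straight past a lookup like this.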
In general, if a benchmark is available online before a model's dataset is collected, I put very little stock into that model's performance on that benchmark. It's just too hard to know what's a true improvement and what's contamination. It's especially true for a paper like this that specifically hunts down MATH-like data.
So I think we're in agreement, and I find very little discussion about this within the community (being a researcher myself). This wouldn't particularly bug me if we were up front that these measurements don't distinguish recall from generalization, but the discussion is always about generalization and AGI, which leaves the public very confused.
Unfortunately I'm just not aware of any metric that can adequately quantify meaningful similarity between data. Curse of dimensionality, I suppose. Personally I try not to lean too hard on benchmark results, not only because of the aforementioned spoilage but because of metric limitations as well. I think our progress has outpaced our ability to properly measure it, and it feels like we've only become more reliant on benchmarks rather than more nuanced in our evaluations (am I alone in this?). I wonder if this will create a stall or plateau (or even a reversal) in practical performance as our measurements become less meaningful while quality increases. I'm in vision, so a good example is how common it is to treat the L2 distance between activations at a norm layer of a classification network (even one better than InceptionNet) as an accurate measurement of visual fidelity. Or to think we have such metrics even in special cases (I guess PSNR or SSIM come closest, but those are more accurately described as reconstruction quality).
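To make the "reconstruction quality" point concrete: PSNR, for example, is just a log-scaled per-pixel MSE against a reference, so it says nothing about perceptual or distributional fidelity. A minimal numpy sketch:

```python
import numpy as np

def psnr(reference: np.ndarray, test: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio: 10 * log10(MAX^2 / MSE).
    A reconstruction-error metric against a reference image, not a measure
    of perceptual or distributional fidelity."""
    mse = np.mean((reference.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10((max_val ** 2) / mse)
```

Two images can look wildly different to a human while landing at similar PSNR, which is why I'd call it reconstruction quality rather than fidelity.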
Btw, I think you might like the second paper I linked. It's a Meta/Stanford paper and mostly deals with vision (LAION), but a bit with C4. The short of it is that they can prune about 40% of LAION and still get good "zero-shot" ImageNet accuracy. I actually found the results for random pruning quite enlightening, especially around all the toy datasets (Fig A4).
"Zero-shot" is in quotes because it's pretty dubious to call ImageNet out of distribution (same with COCO) when a model is trained on LAION, given that LAION covers all the classes (at least an abstracted version of each class, since LAION is more specific; i.e. ImageNet _distribution_ ⊂ LAION _distribution_).
Another pet peeve of mine is arxiv links that go straight to the PDF ;)