Let's say I have a computer that can only store 10 bits worth of information, but generates 100 bits of information. Then obviously the information cannot entirely originate in the computer.
if none terminate on the target bitstring of length 100, then we've eliminated the computation hypothesis for 10 bits and N runtime
if we continue this approach and eliminate all the available storage and time available, then we eliminate the computation hypothesis altogether for our scenario
Yes, I agree there is always a faster Turinng machine for a particular target. With this sort of argument, all bitstrings have a compression of zero bits. I.e. you pick a TM that has the target bitstring as output for the null string. And if it starts with the target on the output tape, there is also zero time. Thus, every bitstring can be generated with zero space and time, with some TM.
So, thats why when talking about compression in general, we keep the reference TM fixed.
When testing, there is the possibility with a single test maybe we have the wrong reference machine. But as the number of tests grows, that probability approaches zero. So, with enough testing we can eliminate the linear speedup theorem loophole, at least with high probability.
As a practical application of this sort of thing, check out the 'normalized information distance'. It is based on algorithmic mutual information, which in theory is not approximatable, as you argue. However, in practice it works surprisingly well, even without prior knowledge of the domain.