The only way to properly replicate a CS paper is to re-implement the code from scratch. Simply re-running someone's code will most likely just give you the same buggy output they got (or, just as likely, a pile of unrelated compile errors). But frequently that isn't even what you want. Many CS papers are of the form "we were able to build software that does XYZ using this design", which isn't really a falsifiable statement in the first place. It just gives future researchers and practitioners data points they can use when building their own software.
That's true, but running their code also lets me look for selection bias in their data.
If I see a paper claiming remarkably good predictions of (say) the performance of a basic block, with no discussion of its flaws, you can bet I'm assuming they didn't test it well enough.
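For instance, one cheap way to probe for that kind of selection bias, assuming you can get the paper's predictions next to your own measurements, is to stratify the error by block category instead of trusting a single aggregate number. Here's a minimal sketch; the categories and numbers are entirely made up for illustration:

```python
# Hypothetical sketch: stratify a basic-block latency predictor's error
# by block category rather than trusting one headline number.
# All names and data below are invented for illustration.
from collections import defaultdict

blocks = [
    # (category, predicted cycles, measured cycles)
    ("scalar-int",  4.0,  4.1),
    ("scalar-int",  6.0,  5.8),
    ("vector",     12.0, 15.5),
    ("vector",      9.0, 13.2),
    ("mem-bound",  20.0, 31.0),
    ("mem-bound",  18.0, 27.4),
]

errors = defaultdict(list)
for category, pred, meas in blocks:
    errors[category].append(abs(pred - meas) / meas)  # relative error

overall = sum(e for es in errors.values() for e in es) / len(blocks)
print(f"overall mean relative error: {overall:.1%}")
for category, es in sorted(errors.items()):
    print(f"  {category:>10}: {sum(es) / len(es):.1%}")
```

A great aggregate number hiding one awful stratum usually means the evaluation set leaned heavily on the easy cases, which is exactly the smell I'm describing.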