Have you never tried to use a tool from a CS paper and found that the code didn't even build, or barely worked? This isn't a question of rigorous versus non-rigorous fields. It's a challenge with artifact evaluation in all fields.
I'd encourage you to read some of these 'papers': there is often nothing to evaluate and nothing to make predictions on. Usually it's just a survey and a statistical correlation test.