They could have done that, but then the training data wouldn't match what the model sees when it is solving real CAPTCHAs. The OpenCV step that locates the characters in each CAPTCHA introduces some messiness into the training data, and that same messiness will be present in the 'real' CAPTCHA data when the system is tested. I'd say training the model on this messy data leads to better results, especially when the letters overlap.
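To make the overlap point concrete, here is a toy sketch in plain Python. It is not the OpenCV code from the article: `split_on_blank_columns` is a made-up helper that stands in for the character-finding step, and the tiny 0/1 grids are invented for illustration. It only shows how a segmenter hands back one merged crop when two letters touch.

```python
def split_on_blank_columns(image):
    """Split a binary image (rows of 0/1, 1 = ink) at fully blank columns.

    Hypothetical stand-in for the character-finding step.
    Returns a list of (start, end) column ranges, end exclusive.
    """
    width = len(image[0]) if image else 0
    # A column "has ink" if any row has a 1 in that column.
    has_ink = [any(row[x] for row in image) for x in range(width)]

    segments = []
    start = None
    for x, ink in enumerate(has_ink):
        if ink and start is None:
            start = x                      # a new segment begins
        elif not ink and start is not None:
            segments.append((start, x))    # a blank column ends the segment
            start = None
    if start is not None:
        segments.append((start, width))    # segment runs to the right edge
    return segments


# Two toy "CAPTCHAs" with two letters each.
separated = [
    [1, 1, 0, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 0, 1, 1],
]
touching = [
    [1, 1, 0, 1, 1],
    [1, 1, 1, 1, 1],  # strokes meet in the middle column
    [1, 1, 0, 1, 1],
]

print(split_on_blank_columns(separated))  # [(0, 2), (3, 5)]
print(split_on_blank_columns(touching))   # [(0, 5)]
```

The second image comes back as a single crop holding both letters. A model trained only on clean, individually rendered letters would never see a crop like that, whereas training on the segmenter's actual output includes it.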