Why harder? My point is, by using some DL framework, you would make this simpler. You don't need to have a big NN for that, or even any NN at all, and could still use some DL framework, and I would assume this is still easier and probably faster. E.g. finding what pixels are a specific color, this would be sth like: