- Use data augmentation to turn the small number of examples into enough samples for the class to be adequately represented in the dataset.
- Add a weighting coefficient to the model's cost function to make misclassifying these examples more expensive.
Note: you can do serious harm to your model with either of these approaches if you don't know what you're doing. The safest solution is to collect more examples of the infrequent class.
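To make the weighting idea concrete, here is a minimal NumPy sketch, not a definitive implementation. It assumes integer class labels and uses inverse-frequency weights, which is just one common heuristic. The helper names `inverse_frequency_weights` and `weighted_cross_entropy` and the 95/5 toy split are made up for illustration.

```python
import numpy as np

def inverse_frequency_weights(labels, num_classes):
    """Weight each class by n_samples / (num_classes * class_count)."""
    counts = np.bincount(labels, minlength=num_classes).astype(float)
    # Avoid division by zero for classes that do not appear in `labels`.
    counts = np.maximum(counts, 1.0)
    return labels.size / (num_classes * counts)

def weighted_cross_entropy(probs, labels, class_weights):
    """Cross-entropy where each example is scaled by the weight of its true class."""
    eps = 1e-12  # keeps log() finite if a predicted probability is 0
    picked = probs[np.arange(labels.size), labels]
    per_example = -np.log(picked + eps)
    w = class_weights[labels]
    # Normalize by the total weight so the loss scale stays comparable.
    return float(np.sum(w * per_example) / np.sum(w))

# Toy data (assumption): 95 examples of class 0, 5 of the rare class 1.
labels = np.array([0] * 95 + [1] * 5)
weights = inverse_frequency_weights(labels, num_classes=2)

# A model that always predicts 50/50, just to exercise the loss.
probs = np.full((labels.size, 2), 0.5)
loss = weighted_cross_entropy(probs, labels, weights)
```

With these weights a mistake on the rare class costs roughly 19 times as much as one on the common class. That is exactly the kind of setting that can hurt if pushed too far, so treat the weights as something to tune.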
Grayscale? Dunno. Maybe if they mean the same thing as some other stop light but have different colors, you would include them there so the NN hopefully figures out blue or green (or something). You might also be able to copy-paste the traffic light onto other images.
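The copy-paste idea could look roughly like this. It is only a sketch under some assumptions: images are NumPy arrays of shape (height, width, channels), you already have the traffic light cropped out, and a plain rectangular paste with no blending is good enough. `paste_patch` is a hypothetical helper, and the blank background and white patch are stand-ins for real images.

```python
import numpy as np

def paste_patch(background, patch, rng):
    """Paste `patch` at a random location in a copy of `background`.

    Returns the new image and the pasted box as (top, left, height, width).
    """
    bh, bw = background.shape[:2]
    ph, pw = patch.shape[:2]
    if ph > bh or pw > bw:
        raise ValueError("patch must fit inside background")
    # Pick a top-left corner so the whole patch stays inside the image.
    top = int(rng.integers(0, bh - ph + 1))
    left = int(rng.integers(0, bw - pw + 1))
    out = background.copy()  # leave the original image untouched
    out[top:top + ph, left:left + pw] = patch
    return out, (top, left, ph, pw)

rng = np.random.default_rng(0)
background = np.zeros((64, 64, 3), dtype=np.uint8)   # stand-in for a street scene
patch = np.full((16, 8, 3), 255, dtype=np.uint8)     # stand-in for a cropped traffic light
augmented, box = paste_patch(background, patch, rng)
```

The returned box is handy if you also need a bounding-box label for the pasted light. Hard rectangular edges can give the network a shortcut to latch onto, so this is probably worth checking on held-out real images.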