What is the best practice for handling infrequent data points like the blue stoplights he mentions?



Two that come to mind are:

- Use data augmentation to turn the small number of examples into enough samples for the class to be adequately represented in the dataset.

- Add a weighting coefficient to the model's cost function to make misclassifying these examples more expensive.

Note: you can do serious harm to your model with either of these approaches if you don't know what you're doing. The safest solution is to collect more examples of the infrequent class.
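For the weighting idea, here is a rough sketch of what I mean, not a recipe. It assumes a plain cross-entropy loss and uses the common "balanced" heuristic, weight = n_samples / (n_classes * class_count). The function names and the toy red/green/blue label counts are made up for illustration.

```python
import math
from collections import Counter

def inverse_frequency_weights(labels):
    """Give each class the weight n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * c) for cls, c in counts.items()}

def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -log p(true class).

    probs is a list of dicts mapping class name to predicted probability.
    """
    total = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, labels))
    norm = sum(weights[y] for y in labels)
    return total / norm

# Toy, invented label distribution: blue lights are the rare class.
labels = ["red"] * 6 + ["green"] * 3 + ["blue"] * 1
weights = inverse_frequency_weights(labels)
```

With these weights a mistake on the single blue example costs six times as much as a mistake on a red one, which is exactly why it can backfire: a mislabeled or unrepresentative rare sample gets amplified too.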


Grayscale? Dunno. Maybe if they are identical in meaning to some other stoplight but differ in color, you would include them there so the NN hopefully figures out that blue and green (or whatever) mean the same thing. You might also be able to copy-paste the traffic light onto other images.
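A minimal sketch of the copy-paste idea, assuming images are plain nested lists of grayscale pixel values and that you already have a cropped patch of the rare light. The helper names are hypothetical, and a real pipeline would also need to handle scale, lighting, and blending.

```python
import random

def paste_patch(background, patch, top, left):
    """Return a copy of background with patch pasted at row top, column left."""
    height, width = len(patch), len(patch[0])
    if top < 0 or left < 0 or top + height > len(background) or left + width > len(background[0]):
        raise ValueError("patch does not fit inside the background")
    out = [row[:] for row in background]  # copy so the original is untouched
    for r, patch_row in enumerate(patch):
        out[top + r][left:left + width] = patch_row
    return out

def random_paste(background, patch, rng):
    """Paste patch at a random valid position; return the image and its box."""
    top = rng.randint(0, len(background) - len(patch))
    left = rng.randint(0, len(background[0]) - len(patch[0]))
    box = (top, left, len(patch), len(patch[0]))  # (row, col, height, width)
    return paste_patch(background, patch, top, left), box

background = [[0] * 5 for _ in range(4)]   # stand-in for a street scene
patch = [[1, 1], [1, 1]]                   # stand-in for a cropped blue light
augmented, box = random_paste(background, patch, random.Random(0))
```

The returned box doubles as the label for a detector, so each pasted image comes with its annotation for free.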



