Usually when someone says “synthetic data” they mean CGI, not simply transformations of existing data. Using synthetic data is fraught (and presumptuous), as you are assuming you understand the problem domain 100% and are also extremely good at reproducing it. There’s a chance the model is using something specific to the CGI (and not the general reality) to produce its results.
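For contrast, here is a minimal sketch of the "transformations of existing data" sense of the term (i.e. data augmentation), assuming Pillow is installed and a hypothetical sample.jpg exists; CGI-style synthetic data would instead mean rendering entirely new scenes:

    from PIL import Image, ImageOps
    import random

    def augment(img):
        # "Transformations of existing data": cheap perturbations
        # of real samples, as opposed to rendering scenes from scratch.
        if random.random() < 0.5:
            img = ImageOps.mirror(img)              # horizontal flip
        return img.rotate(random.uniform(-15, 15))  # small random rotation

    real = Image.open("sample.jpg")  # hypothetical real training image
    variants = [augment(real) for _ in range(8)]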

For winning a computer vision competition it’s probably ok but I’d be very careful about using synthetic data for systems I cared about.




I thought "synthetic data" it's something that rarely shows in training image recognition, and is more like randomly generated user data (name, surname, etc.) or data generated from simulations of some processes?


>I thought "synthetic data" was something that rarely shows up in training image recognition

On the contrary, it is used to train image-recognition models, but it cannot adequately capture the long tail of weird events in the real world, so it cannot be relied on by itself, as the parent commenter alludes to. The tradeoff between data collected from a simulated environment and data from the real world was discussed at some length by Elon Musk and Andrej Karpathy at the Tesla Autonomy Day event a few weeks ago.


I generally agree that there is no substitute for experience running in production, and that more data is better, or at least should be, if you can figure out how to take advantage of it.

The thing is, when it comes to weird events, historical data can't be relied on either. The next weird thing may never have happened before.

Predicting the future is hard no matter what you do. Gathering more data and learning more efficiently from what you have are both important. Training on artificial challenges can also be useful.



