I am very skeptical about pretraining, which seems to be the key point of Nanonets. Sure, it will work better than initializing from random weights, but you will always do better if you collect more data for your problem. This may be fine for problems that do not need optimal classification and fast performance, but I am struggling to see a use case for it.
Many businesses need a custom model, e.g. you want to identify one specific kind of product among similar-looking ones, or find only the defective pieces, and you cannot collect tens of thousands of images from the start. Also, pretraining is not only about initialization; it also improves generalization when you have less data.