Open-Sourcing BiT: Exploring Large-Scale Pre-Training for Computer Vision

lsb · on May 22, 2020

I've been going through the fast.ai course, and pre-training is like witchcraft: it's spookily effective.

You can take a 15MB MobileNet model, add a layer at the end, fine-tune it with half a dozen examples of a few different image classes (in a few minutes on a consumer-grade laptop), and recognize lots of different examples in real time with a web app reading continuously from a webcam.

The advances made in computer vision in the last ten years are mind-blowing.
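
For anyone curious what "add a layer at the end and fine-tune" boils down to, here is a rough, dependency-free sketch. To be clear about the assumptions: `frozen_backbone` is a made-up toy stand-in for the pre-trained network (in practice it would be something like MobileNet with its weights frozen), the three classes and their 2-D "images" are invented, and `train_head` is just plain softmax regression on the frozen features, not what fast.ai or any particular library actually runs.

```python
import math
import random


def frozen_backbone(x):
    """Toy stand-in for a frozen pre-trained network (hypothetical).

    Maps each input value v to the pair (relu(v), relu(-v)). It has no
    trainable parameters, so fine-tuning cannot change it.
    """
    features = []
    for v in x:
        features.append(max(0.0, v))
        features.append(max(0.0, -v))
    return features


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def head_logits(W, b, f):
    """Scores of the new final layer for one feature vector."""
    return [sum(w * fi for w, fi in zip(row, f)) + bias for row, bias in zip(W, b)]


def train_head(features, labels, num_classes, epochs=100, lr=0.1):
    """Fit only the new final layer (softmax regression) with plain SGD."""
    W = [[0.0] * len(features[0]) for _ in range(num_classes)]
    b = [0.0] * num_classes
    for _ in range(epochs):
        for f, y in zip(features, labels):
            probs = softmax(head_logits(W, b, f))
            for k in range(num_classes):
                grad = probs[k] - (1.0 if k == y else 0.0)  # dLoss/dlogit_k
                b[k] -= lr * grad
                for j, fj in enumerate(f):
                    W[k][j] -= lr * grad * fj
    return W, b


def predict(W, b, x):
    """Run the frozen backbone, then the trained head, and pick the top class."""
    logits = head_logits(W, b, frozen_backbone(x))
    return max(range(len(logits)), key=logits.__getitem__)


# Six noisy examples for each of three made-up classes.
rng = random.Random(0)
centers = [(3.0, 0.0), (-3.0, 0.0), (0.0, 3.0)]
inputs, labels = [], []
for label, (cx, cy) in enumerate(centers):
    for _ in range(6):
        inputs.append([cx + rng.gauss(0, 0.3), cy + rng.gauss(0, 0.3)])
        labels.append(label)

W, b = train_head([frozen_backbone(x) for x in inputs], labels, num_classes=3)
train_accuracy = sum(predict(W, b, x) == y for x, y in zip(inputs, labels)) / len(labels)
print(train_accuracy)
```

The point of the split is that only `W` and `b` are ever updated, which is why a handful of examples and a laptop can be enough: the expensive part (the backbone) was learned elsewhere and is simply reused.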

ricklamers · on May 22, 2020

Interesting experiments. Too bad they're not releasing the JFT pretrained models. I guess the cutoff point of what's too valuable to share has been reached.

gwern · on May 22, 2020

Google Brain has been pretty good about releasing models such as EfficientNet (especially compared to, say, DeepMind).

JFT is the exception. I find JFT interesting so I pay close attention to anything using it, and as far as I've noticed, no model has ever been released based on JFT, going back at least to 2015, when it was much smaller. It's always either held back or the released model is based on public datasets (e.g. BigGAN: the released generator was trained on ImageNet, though the paper notes that the JFT BigGAN completely avoided divergence problems, which is very interesting). I've wondered if legal/copyright issues block any release: there's always someone who tries to argue that a model is a derived work, and nothing in the JFT-300M papers mentions having licenses covering public redistribution.

jcjohns · on May 22, 2020

I don't think Google has ever released models trained on JFT. But if you're interested in large-scale vision models, you can check out these models from Facebook trained on 940M Instagram images (several times bigger than JFT!):

https://github.com/facebookresearch/WSL-Images

londons_explore · on May 21, 2020

No comments, probably because everyone is trying to fire up the demo code in colab and trying to make "whose dick is it?" classifiers...

fxtentacle · on May 21, 2020

Well, for my humble needs, the first layers of a pretrained VGG16 were already good enough, so I have little use for yet another, even more resource-hungry visual encoder.