Sub-1-bit weights have been done at least as far back as 2016 for VGG-style networks (my work).
I was able to get 0.68 "effective" bits per weight.
The idea is that in each forward pass you add noise to each weight, drawn independently from a normal distribution, and when you calculate the SNR of the weights, it works out to less than 1 bit. This points to the idea that a stochastic memory element could be used to store the weights.
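To make the SNR argument concrete, here is a rough sketch, not the code from the paper. It assumes the "effective bits" figure comes from the Gaussian-channel capacity formula 0.5 * log2(1 + SNR); that conversion is my assumption for illustration, and the names `effective_bits` and `noisy_dot` are made up. Under that assumption, anything with SNR below 3 is sub-1-bit.

```python
import math
import random


def effective_bits(weights, noise_std):
    """Bits per weight under the ASSUMED formula 0.5 * log2(1 + SNR)."""
    signal_power = sum(w * w for w in weights) / len(weights)
    noise_power = noise_std ** 2
    return 0.5 * math.log2(1.0 + signal_power / noise_power)


def noisy_dot(inputs, weights, noise_std, rng):
    """One forward pass of a single unit: fresh, independent Gaussian
    noise is added to every weight each time this is called."""
    return sum(
        x * (w + rng.gauss(0.0, noise_std))
        for x, w in zip(inputs, weights)
    )


rng = random.Random(0)
# Hypothetical toy data: 1000 weights and inputs with unit variance.
weights = [rng.gauss(0.0, 1.0) for _ in range(1000)]
inputs = [rng.gauss(0.0, 1.0) for _ in range(1000)]
noise_std = 1.0  # noise as strong as the weights themselves, so SNR is about 1

# Three forward passes over the same input give three different outputs,
# because the weight noise is redrawn on every pass.
outputs = [noisy_dot(inputs, weights, noise_std, rng) for _ in range(3)]

bits = effective_bits(weights, noise_std)
print(f"effective bits per weight: {bits:.2f}")
```

The noise level here is arbitrary; in practice you would train with the noise in place and see how high you can push it before accuracy falls off.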
https://arxiv.org/abs/1606.01981