
I don’t know if this mega-chip will be successful, but I like the idea. Before I retired I managed a deep learning team that had a very cool internal product for running distributed TensorFlow. Now in retirement I get by with a single 1070 GPU for experiments - not bad, but something much cheaper, much faster, and with much more memory would help so much.

I tend to be optimistic, so take my prediction with a grain of salt: I bet within 7 or 8 years there will be an inexpensive device that will blow away what we have now. There are so many applications for much larger end-to-end models that will put pressure on the market for something much better than what we have now. BTW, the ability to efficiently run models on my new iPhone 11 Pro is impressive, and I have to wonder if the market for super fast hardware for training models might match the smartphone market. For this to happen, we need a "deep learning rules the world" shift. BTW, off topic, but I don’t think deep learning gets us to AGI.




It's also my impression - from my modest exposure to DL over the past two years as a student taking courses - that deep learning must be overcome to reach AGI.

Specifically, gradient descent is a post hoc approach to network tuning, while human neural connections are reinforced as they fire together. The post hoc approach restricts the scope of the latent representations a network learns, because such representations must serve a specific purpose (descending the gradient), while the human mind generates representations spontaneously at multiple levels of abstraction without any specific or immediate purpose in mind.

I believe the brain's ability to spontaneously generate latent representations capable of interacting with one another in a shared latent space is functionally enabled by the paradigm of neurons 'firing and wiring' together. I also believe it is the brain's ability to spontaneously generate hierarchically abstract representations in a shared space that is the key to AGI. We must therefore move away from gradient descent.
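To make the contrast concrete, here is a toy numpy sketch (my own illustration, with arbitrary numbers) of an error-driven gradient update versus a purely local Hebbian-style update on a single linear unit:

    # Toy contrast of the two update rules on one linear unit.
    # Numbers and learning rates are arbitrary; this only illustrates
    # "post hoc, error-driven" vs "local, fire-together" updates.
    import numpy as np

    rng = np.random.default_rng(0)
    w = rng.normal(size=3)       # weights of one unit
    x = rng.normal(size=3)       # presynaptic activity (input)
    target = 1.0
    lr = 0.1

    # Gradient descent: the update is dictated by an external objective
    # (here, squared error against a target), computed after the fact.
    y = w @ x
    grad = (y - target) * x      # dL/dw for L = 0.5 * (y - target)^2
    w_gd = w - lr * grad

    # Hebbian-style rule: the update depends only on local pre/post
    # activity, with no reference to any objective or target.
    post = w @ x
    w_hebb = w + lr * x * post

    print(w_gd, w_hebb)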


Don't forget the human brain takes about 7 to 8 hours off every day to rejiggle itself, to use a scientific term. The brain's architecture is better than having a separate training stage, but it's by no means able to learn continually without stops and starts.


You see this in young puppies (3-6 months old) a lot as well. They get irritable/exhausted after 15-30 minutes of training, and usually don't seem to learn anything at all during the training activity itself. Then they pass out ("nap") for 30 minutes, and when they wake up they do the trick/skill perfectly.

Same thing as humans, just more obvious/visible.


Commodity deep learning might be a lot closer than you think. Nvidia won't bring us there (without kicking and screaming), but AMD might. You can pick up a Radeon VII for about 600 dollars and use it in your data center without licensing issues (16 GB, about the training speed of a 2080 Ti for ImageNet ConvNets). AMD uses ROCm instead of CUDA, and it's now fully upstreamed into TensorFlow ( https://www.logicalclocks.com/blog/welcoming-amd-rocm-to-hop... ). Disclaimer: I worked on getting ROCm into YARN for Hopsworks.
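If you want to kick the tires, a minimal sanity check looks roughly like this (my sketch; it assumes the tensorflow-rocm wheel is installed on top of a working ROCm driver stack, in which case the Radeon VII shows up as an ordinary 'GPU' device):

    # Rough sanity check that the ROCm build of TensorFlow sees the card.
    # Assumes `pip install tensorflow-rocm` and a working ROCm install.
    import tensorflow as tf

    print(tf.config.list_physical_devices('GPU'))   # expect one entry for the Radeon VII

    with tf.device('/GPU:0'):
        a = tf.random.normal([4096, 4096])
        b = tf.random.normal([4096, 4096])
        c = tf.matmul(a, b)                         # runs on ROCm-backed kernels
    print(float(tf.reduce_sum(c)))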


Deep learning is not going to get us to AGI. But the hardware techniques definitely are going to get us a bit closer.

I did the numbers a while ago, and honestly I don't think we need smaller transistors to get the computational volume of our mushy brains -- although of course, more and smaller transistors are always very nice. I believe the only thing stopping AGI at this point is architecture -- we really have no idea how to connect and structure something as complex as our brains -- and cognitive maturity. The last part is my way of saying "two weeks for training a NN? Wait until you have a kid and have to work on training the little human for decades...".
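For what it's worth, the kind of envelope math I mean, using commonly cited estimates (my own rough assumptions, nothing rigorous):

    # Back-of-envelope: ~86e9 neurons, ~1e4 synapses each, average firing
    # rate on the order of 10 Hz, counting one synaptic event as one "op".
    neurons = 86e9
    synapses_per_neuron = 1e4
    avg_rate_hz = 10

    brain_ops_per_s = neurons * synapses_per_neuron * avg_rate_hz
    print(f"Brain: ~{brain_ops_per_s:.1e} synaptic events/s")        # ~8.6e15

    # Compare with a single modern GPU at ~1e14 FLOP/s (ballpark):
    gpu_flops = 1e14
    print(f"Ratio: ~{brain_ops_per_s / gpu_flops:.0f} GPUs-worth")   # ~86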

TBH, the ethical implications of AGI seem insurmountable to me. Life is a game -- meaning the universe doesn't care about us, nor do we owe it anything -- and for now, it's our game. So I would rather we put all that computing toward improving human life -- including mind uploading -- and put AGI right there with nuclear weapons.


Cerebras is a reaction to the recent deep learning trend: larger networks, supposedly better performance. As someone doing distributed training regularly, I've seen some super inefficient models that take 3x more resources (time / compute / bandwidth) for a 2% bump. I think we'll see a big wave of NN optimization in the near future.


Artificial 'neurons' used in deep learning networks are absolutely worlds apart from real biological neurons. I don't think anyone in the field seriously believes we will get to AGI via DL or our current models.


> I have to wonder if the market for super fast hardware for training models might match the smartphone market.

Intel recently acquired Habana Labs for $2B [1] and could possibly integrate this into upcoming CPUs (Intel certainly sells more CPUs than Apple sells iPhones). However, this was on the inference side, unlike Cerebras, which is making training faster. The most likely products to benefit from this would be Azure or AWS.

1. https://newsroom.intel.com/news-releases/intel-ai-acquisitio...


Habana does both training and inference. Gaudi is for training, Goya is for inference.


> I get by with a single 1070 GPU

Amazon/Google/Microsoft will gladly take your money for time on their Nvidia GPU instances, but they charge tens of cents per hour.



