Understanding Deep Learning (udlbook.github.io)
415 points by georgehill 11 months ago | 99 comments



Most comments here are in one of two camps: 1) you don't need to know any of this stuff, you can make AI systems without this knowledge, or 2) you need this foundational knowledge to really understand what's going on.

Both perspectives are correct. The field is bifurcating into two different skill sets: ML engineer and ML scientist (or researcher).

It's great to have both types on a team. The scientists will be too slow; the engineers will bound ahead trying out various APIs and open-source models. But when they hit a roadblock or need to adapt an algorithm, many engineers will stumble. They need an R&D mindset that is quite alien to many of them.

This is when AI scientists become essential.


> But when they hit a roadblock or need to adapt an algorithm, many engineers will stumble.

My experience is the other way around.

People underestimate how powerful building systems is, and how most of the problems worth solving are boring and require only off-the-shelf techniques.

During the last decade, I was on several teams and I noticed the same pattern: the company has some extra budget and "believes" that its problem is exceptional.

It then goes and hires some PhD data scientists who have a few publications but only know R and are fresh out of a Python bootcamp.

After 3 months with this new team, not much has been done: tons of Jupyter notebooks lying around but no code in production, and some of them do not even have an environment for experimentation.

The business problem is still not solved. The company realizes that having a lot of data scientists but not many data/ML engineers means that they are (a) blocked from pushing anything to production or (b) building a death star of data pipelines + algorithms + infra (spending 70% more resources due to lack of Python knowledge).

The project gets delayed. Some people become impatient.

Now you have a solid USD 2.5 million/year team that is not capable of delivering a proof of concept, because nobody can serve the model via batch jobs or a REST API.

The company has lost momentum and competitors moved fast. They released an imperfect solution, but a solution nonetheless, ahead of ours; they have users on it and they keep improving it.

Frustration kicks in, and PMs and engineering managers fight about accountability. The VPs of Product and Engineering want heads on a silver platter.

Some PhDs get fired and go off to teach at some local university.

Fin.


Would you see these as analogous?

The people who create the models and the people that use them.

The people who create the programming languages and the people that use them.


I think because it's a relatively 'younger' field, there is a bit more need to know about the foundations in AI than in programming. You hit the limits a bit more often and need to do a bit of research to modify or create a model.

Whereas it's unlikely in most programming jobs you would need to do any research into programming language design.


I agree with you. Being a corporate department head, I've led exactly one project that had me digging through my DS&A textbook. But it's much more common to need to go beyond the limits of an off-the-shelf deep learning algorithm. Plus, many of the cutting-edge deep learning advancements have been fairly simple to implement but required serious effort to create, and being able to understand an arXiv paper can have a direct impact on the job you're currently working on, whereas being able to read all of TAOCP will make you a better coder, but in a more abstract way.


This sounds like a sales pitch for an AI scientist.


This sounds like a don't-buy pitch for an AI engineer...

The point the commenter is making is that both schools of thought in the comments are valuable, and that unless you perform both roles yourself (i.e., you're an engineer who is familiar with the scientific foundations), the two are symbiotic and not in contention.


I guess this message is delivered by an AI scientist, sure.

It's almost self-explanatory that when you hit a roadblock in practice you go back to foundations, and good people should aim to do both. In that case I don't see where the ML engineer/scientist bifurcation comes from, except as a way for some to feel good about themselves.


Not at all. It's something I've seen in practice over many years. Neither skill set is 'better' than the other, just different.

There is a need for people who are able to build using available tools, but who don't have an interest in the theory or foundations of the field. It's a valuable mindset and nothing in my original comment suggested otherwise.

It's also pretty clear that many comments on this post divide into the two mindsets I've described.


As a friend from AT&T Dallas told me, 'tis cheaper to turn a mathematician into a programmer than a programmer into a mathematician.


As someone who missed the boat on this, is learning about this just for historical purposes now, or is there still relevance to future employment? I just imagine OpenAI eating everyone's lunch in regards to anything AI related; am I way off base?


The most important thing to learn for most practical purposes is what the thing can actually do. There's a lot of fuzzy thinking around ML - "throw AI at it and it'll magically get better!" Sources like Karpathy's recent video on what LLMs actually do are good anti-hype for the lay audience, but getting good practical working knowledge that's a level deeper is tough without working through it. You don't have to memorize all the math, but it's good to get a feel for the "interface" of the components. What is it that each model technique actually does - especially at inference time, where it needs to be well-integrated with the rest of the stack?

In terms of continued relevance - "deep learning", meaning, dense neural nets trained to optimize a particular function, haven't fundamentally changed in practice in ~15 years (and much longer than that in theory), and are still way more important and broadly used than the OpenAI stuff for most purposes. Anything that involves numerical estimation (e.g., ad optimization, financial modeling) is not going to use LLMs, it's going to use a purpose-built model as part of a larger system. The interface of "put numbers in, get number[s] out" is more explainable, easier to integrate with the rest of your software stack, and more measurable. It has error bars that are understandable and occasionally even consistent. It has a controllable interface that won't suddenly decide to blurt corporate secrets or forget how to serialize JSON. And it has much, much lower latency and cost - any time you're trying to render a web page in under 100ms or run an optimization over millions of options, generative AI just isn't a practical option (and is unlikely to become one, IMO).

I don't have a significant math or theoretical ML background, but I've spent most of the last 10 years working side by side with ML experts on infra, data pipelines, and monitoring. I'm not sure I could integrate the sigmoid off the top of my head, but that's not what's important - I've done it once, enough to have some idea how the function behaves, and I know how to reason about it as a black box component.
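For what it's worth, here is a minimal sketch of what I mean by treating it as a black box (plain Python, nothing else assumed; the sample inputs are arbitrary): the sigmoid squashes any real number into (0, 1), saturates at the extremes, and its antiderivative happens to be the softplus.

    import math

    def sigmoid(x):
        # squashes any real number into (0, 1)
        return 1.0 / (1.0 + math.exp(-x))

    def softplus(x):
        # log(1 + e^x) is an antiderivative of the sigmoid
        return math.log1p(math.exp(x))

    for x in (-6, -1, 0, 1, 6):
        print(x, round(sigmoid(x), 4))  # ~0 far left, 0.5 at the origin, ~1 far right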


Terrific explanation, and it matches my experience running a data science team. I encourage my team to start with the simplest possible approach to every problem, which requires understanding how different algorithms work. Does this project require a t-test, XGBoost, a convolutional neural network, something else? What if we recode the dependent variable from numeric to binary?
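A minimal sketch of what "start simple" can look like (scipy assumed; the two groups and the 12.0 threshold are made up purely for illustration): if a plain two-sample t-test already answers the question, no model is needed, and recoding a numeric outcome to binary is one line.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    control = rng.normal(loc=10.0, scale=2.0, size=500)  # metric before a change
    treated = rng.normal(loc=10.4, scale=2.0, size=500)  # metric after a change

    t, p = stats.ttest_ind(treated, control)
    print(f"t={t:.2f}, p={p:.4f}")  # if this settles the question, stop here

    # recoding the numeric outcome as binary turns the problem into classification
    binary_outcome = (treated > 12.0).astype(int)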


> Sources like Karpathy's recent video on what LLMs actually do are good anti-hype for the lay audience

Which video is this?


I believe OP means "Intro to Large Language Models":

https://youtu.be/zjkBMFhNj_g?si=XQQ3p92ajuQYOyqN


Yep, that's the one I meant - sorry, should have linked.

His series on making a GPT from scratch is also great for building intuition specifically about text-based generative AI, with an audience of software developers.


Awesome response, and a reasoned take.


This is about deep learning, of which LLMs are a subset. If you are interested in machine learning, then you should learn deep learning. It is incredibly useful for a lot of reasons.

Unlike other areas of ML, the nature of deep learning is such that its parts are interoperable. You could use a transformer with a CNN if you wish. Also, deep learning enables you to do machine learning on any type of data: text, images, video, audio. Finally, it can naturally scale computationally.
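To make the "interoperable parts" point concrete, here is a rough sketch (PyTorch assumed; all the sizes are arbitrary) of a CNN feeding a transformer encoder: the convolutions produce a grid of features, which are flattened into tokens the encoder attends over.

    import torch
    import torch.nn as nn

    class CNNTransformer(nn.Module):
        def __init__(self, d_model=64, n_heads=4):
            super().__init__()
            # CNN backbone: 3-channel image -> d_model feature maps, downsampled 4x
            self.cnn = nn.Sequential(
                nn.Conv2d(3, d_model, 3, stride=2, padding=1), nn.ReLU(),
                nn.Conv2d(d_model, d_model, 3, stride=2, padding=1), nn.ReLU(),
            )
            layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=2)

        def forward(self, x):                          # x: (batch, 3, H, W)
            feats = self.cnn(x)                        # (batch, d_model, H/4, W/4)
            tokens = feats.flatten(2).transpose(1, 2)  # (batch, seq_len, d_model)
            return self.encoder(tokens)                # attention over spatial positions

    out = CNNTransformer()(torch.randn(2, 3, 32, 32))  # -> shape (2, 64, 64)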

As someone pretty involved in the field, I lament that LLMs are turning people away from ML and deep learning, and feeding the misconception that there's no reason to do it anymore. Large models are expensive to run, have low throughput, and still generally perform worse than purpose-built models. They're not even that easy to use for a lot of tasks, in comparison to encoder networks.

I’m biased, but I think it’s one of the most fun things to learn in computing. And if you have a good idea, you can still build state of the art things with a regular gpu at your house. You just have to find a niche that isn’t getting the attention that LLMs are ;)


I started off being really excited to learn, but as time went on I actually lost interest in the field.

The whole thing is essentially curve fitting. The ML field is more an art than a science, and it's all about tricks and intuitions for different ways of getting that best-fit curve.

From this angle the whole field got way less interesting. The field has nothing deeper or more insightful to offer beyond this concept of curve fitting.
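Mechanically, the whole field boils down to loops like this toy sketch (PyTorch assumed; the architecture and hyperparameters are arbitrary): fit a small MLP to y = sin(x) by gradient descent, and everything else is the same loop scaled up.

    import torch
    import torch.nn as nn

    x = torch.linspace(-3, 3, 200).unsqueeze(1)
    y = torch.sin(x)

    model = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)

    for step in range(2000):
        loss = nn.functional.mse_loss(model(x), y)  # distance between the "curve" and the data
        opt.zero_grad()
        loss.backward()
        opt.step()

    print(loss.item())  # small residual: the network has fit the curve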


I've found this fun way to think of it: the goal is to invent a faster form of evolution for pattern recognition, learning, and autonomous task completion. I think one needs to treat it more like biology, an empirical science, than pure logic and math. We can discover things that work and only afterwards study them to learn why they work, much as we still don't fully understand the brain.

I think there are some really cool problems, such as:

    1. Is synthetic data viable for training?
    2. How do you make deep learning agents that can do task planning and introspection in complex environments?
    3. How do we efficiently build memory and data lookup into AI agents? And is this better/worse than making longer context windows?


Although it fundamentally is curve fitting, I'd venture to say that at some point, having to handle millions of parameters makes the curve fitting problem unrecognizable... A change in quantity is a change in nature, if you will.

IOW: to me, fitting a generalized linear model is very different than fitting a convolutional network.


Are deep learning and neural networks just curve fitting? I thought those were significantly different.


You could argue all the building blocks are forms of curve fits, but that isn't a terribly useful statement even if true. If you can fit a curve to the desired behavior of any function, or composition of functions (which is itself a function), then you can solve any problem whose desired behavior you can express. Including expressing the desired behavior of some other class of problems. Saying it is just curve fitting is like saying something is just math. The entirety of reality is just math.


By that logic, anything that is predictive is curve fitting, including entire academic fields like physics and climatology. You could say that all automation is curve fitting. I don’t think there’s much to be gained by being that reductive.

From a technical standpoint, it's not a correct analogy either, because it assumes you have a curve to fit. What curve is language? What curve are images? No answer, because there isn't one. Deep learning is about modeling complex behaviors, not curve fitting. Images and language, for instance, are rooted in social and cultural patterns, not intrinsic curves to be fit.

At best, it’s an imprecise statement. But I’d disagree entirely.


Highly relevant if you want to work on ML systems. Despite how much OpenAI dominates the press there are actually many, many teams building useful and interesting things.


From an application perspective, it's more important to understand how the overall ML process works, the key concepts, and how things fit together. Deep learning is a part of that. Lots of this is already wrapped in libraries and APIs, so it's a matter of preparing the correct data, calling the right APIs, and using the result.


Someone will dominate the AI-as-a-service market, but there are so many applications for tiny edge AI that no single player can dominate all of them.

OpenAI is, for example, not interested in developing small embedded neural networks that run on a sensor chip and detect specific molecules in air in real time.



It's like calculus: nothing new in recent years, but is it still important? The answer is still "Yes".

At a glance, it looks like too much for one book. It was probably compressed with the assumption that the reader already knows quite a lot. In other words, it's not easy reading.


This would be like learning how your CPU / Memory works, even though JS is eating everyone's (web front-end) lunch.

So yes, if you are prompt engineering and wondering why X works and sometimes doesn't, and why any of this works at all, it is good to study a bit.


Maybe last week's drama should have been a left-pad moment. For many things you can train your own NN and be just as good without being dependent on internet access, third parties, etc. Knowing how things work should give you insight into using them better.


Which drama of last week are you referring to? The one about the openai guy saying it's all just the data set? Or something else?


Their CEO was fired, hired by Microsoft, took a bunch of people with him, and is now back at the company


Did the "bunch of people" also return (from Microsoft to OpenAI)?


I must have missed the "dataset" news you're referring to, could you elucidate?


I suppose it's not news, but there was a long thread about it; it came from this post from June:

https://nonint.com/2023/06/10/the-it-in-ai-models-is-the-dat...


I wonder if using APIs was more of a first to market move


I came here with the same question. After reading and learning these materials, will I have new job skills or AI knowledge that I can do something with?


This book looks impressive. There's a chapter on the unreasonable effectiveness of Deep Learning which I love. Any other books I should be on the lookout for?


This presentation from Deep Mind outlines some foundational ML books: https://drive.google.com/file/d/1lPePNMGMEKoaDvxiftc8hcy-rFp...

For the impatient, look into slide #123. Essentially, the recommendations are Murphy, Gelman, Barber, and Deisenroth.

Note these slides have a Bayesian bias. In spite of that, Murphy is a great DL book. Besides, going through GLMs is a great way to get into DL.


Reality has a well known Bayesian bias…

Joking aside, these slides are excellent! Is there an associated video or course that they were a part of?


No, these were part of a conference held at Tubingen in 2020.


What is a "Bayesian bias"?


I meant the presenter is discussing ML from a Bayesian point of view, which is interesting, but not something you need if your aim is just to understand deep learning.


Yes, it looks very impressive indeed and it has the potential to be the seminal textbook on the subject.

Fun fact: the famous Attention paper is closing in on 100K citations, and it should reach that milestone by the end of this year. It's probably the fastest paper ever to reach such a significant milestone. Any deep learning book written before the Attention paper should be considered out of date and needs updating. The situation is not unlike an outdated physics textbook that covers Newton's laws but lacks Einstein's mass-energy equivalence.



I wish it wasn't an X post. Can't see responses at all without an account.


Use Nitter to get around X authwalls: https://nitter.net/suhail/status/1728676402864812466


If I start now and start reading up on AI, will I become anything close to an expert?

I'm worried that I'm starting a journey that requires a Master's or PhD.


From reading this book you’d have a very good grasp of the underlying theory, much more than many ML engineers. But you’d be missing out on the practical lessons, all the little tips and intuitions you need to be able to get systems working in practice. I think this just takes time and it’s as much an art as it is a science.


The only guidepost to use in this world of ever-increasing information is to ask yourself "do I find learning this stuff enjoyable?" Questions like "can I become an expert?" are vague and not good guideposts.


Very hard to answer without knowing what your goal is. Do you want to be a practitioner of DL, or do you want to be a researcher?


Not the OP, but I’d like to hear your answer and reasoning for the “practitioner of DL” case.


You probably won't become an expert, but I'm not clear why you'd want to!


I spent a decade working on various machine learning platforms at well-known tech companies. Everything I ever worked on became obsolete pretty fast. From the ML algorithm to the compute platform, all of it was very transitory. That, coupled with the fact that a few elite companies are responsible for all ML innovation, makes it seem almost pointless to me to even learn a lot of this material.


>machine learning platforms

Machine learning platforms become obsolete.

Machine learning algorithms and ideas don't. If learning SVMs or Naive Bayes did not teach you things that are useful today, you didn't learn anything.


Agreed. Look at the table of contents of this book. Whatever fundamental machine learning concepts you learned with SVM or other obsolete algorithms is still useful and applicable today.


Nobody is building real technology with either of those algorithms. Sure, they are theoretically helpful, but they aren't valuable anymore. Spending your precious life learning them is a waste.


>Spending your precious life learning them is a waste

So you really did not learn them.

There is nothing wrong with being a user. You don't have to know how compilers work to use a compiler. But then you should not say you understand compilers.

In the same way, you probably would benefit from a book "Using deep learning", not "Understanding deep learning".


I know them, and I am a founder of a VC-funded AI startup. Nobody is deploying naive bayes algorithms.


> Nobody is deploying naive bayes algorithms

Exactly my point. You are so deep into the user perspective that you think you are arguing against me.


Yes, they’re not deploying them. That doesn’t mean it doesn’t still help to know the fundamentals of the field, especially when you’re trying to innovate.


Yeah. But you wouldn’t build a plane without knowing physics, right?

Nobody deploys a textbook algorithm, because everyone knows textbook algorithms and there is no advantage in them. So, no, there is real value in learning the fundamentals, dear founder.


This brings up an important question: Is a topic useful to learn if you will never use it in your life?

To attempt to answer this question, we can look at LLMs as an analogy. If you include code in the training set for an LLM, it also makes the LLM better at non-coding tasks, suggesting that sometimes learning something makes you better at other things too. I'm not saying the same necessarily applies to learning these "old school" AI techniques, but it's a decent analogy at least.


I started my journey in machine learning fifteen years ago. Ironically, at that time, my professor told me that neural networks were outdated and trying them wouldn't result in publishable research. SVMs were popular and emphasized in my coursework. I concur that SVMs don't hold as much practical significance today. But the progress in AI and ML is generally unpredictable, and no one knows what theory leads to the next leap in the field.


So what? The same fundamental machine learning concepts are still relevant to deep learning.

It's almost like arguing that everything you learned as a Java developer is completely useless when a new programming language replaces it.


Quite a lot of techniques in deep learning have stood the test of time at this point. Also, new techniques are developed either by building on old techniques or by trying to solve their deficiencies. For example, Transformers were developed to address vanishing gradients in LSTMs over long sequences and to improve GPU utilization, since LSTMs are inherently sequential in the time dimension.
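A rough sketch of the parallelism point (PyTorch assumed; the sizes are arbitrary): the LSTM has to consume the sequence one step at a time because step t depends on step t-1, while the transformer encoder processes all positions at once.

    import torch
    import torch.nn as nn

    x = torch.randn(8, 128, 64)  # (batch, time, features)

    lstm = nn.LSTM(input_size=64, hidden_size=64, batch_first=True)
    out_lstm, _ = lstm(x)        # recurrence: step t waits on step t-1

    layer = nn.TransformerEncoderLayer(d_model=64, nhead=4, batch_first=True)
    out_tf = nn.TransformerEncoder(layer, num_layers=1)(x)  # all positions in parallel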


Sure, but if you were an expert in LSTMs, that's nice, you know the lineage of algorithms. But it probably isn't valuable, companies don't care, and you can't directly use that knowledge. You would never just randomly study LSTMs now.


There are plenty of transferable skills you get from being an expert in something that gets made obsolete by a similar-but-different iterative improvement. Maybe you're really good at implementing ideas from papers, you have a great intuitive understanding of how to structure a model to utilize some tech within a particular domain, you understand very well how to implement/use models that require state, you know how to clean and structure data to leverage a particular feature, etc.

Also, being an "expert in LSTM" is like being an "expert in HTTP/1.1" or "knowing a lot about Java 8". It's not knowledge or a skill that stands on its own. An expert in HTTP/1.1 is probably also very knowledge about web serving or networking or backend development. HTTP/2 being invented doesn't obsolete the knowledge at all. And that knowledge of HTTP/1.1 would certainly come in handy if you were trying to research or design something like a new protocol, just as knowledge of LSTMs could provide a lot of value for those looking for the next breakthrough in stateful models.


FYI, LSTMs are not obsolete. They are still the best option in many cases and are being deployed today.


Transformers have disadvantages too, and so LSTMs are still used in industry. But also it's not that hard to learn a couple new things every year.


Highly, highly disagree.

If it became obsolete, then y'all were doing the new shiny.

The fundamentals don't really change. There are several different streams in the field, and there are many, many algorithms with good staying power in use. Of course, you can upgrade some if you like, but chase the white rabbit forever, and all you'll get is a handful of fluff.


Very few things stay the same in technology. You should think of technology as another type of evolution! It is driven by the same type of forces as evolution, IMO. I think even Linus Torvalds once stated that Linux evolved through natural selection.


So, what fundamental stuff should I learn? I understand ML has some general principles that keep on being valid throughout the years. No?


What would you recommend someone read instead?


Better to understand the bounds of what's currently possible, and then recognize when that changes. Much more economically valuable.


Do you think there's a better way to do this than spending some time playing around with the latest releases of different tools?


Even better: change the bounds of what's possible ;)


> Even better: change the bounds of what's possible ;)

... which will be easier if you have a solid grasp of the foundations of the field. If you only ever focus on the "latest shiny" you'll be lost and left floundering when the landscape changes out from underneath you.


It's very hard to judge a book like this... (based on a table of contents?)

Who is the author ?

Have they published anything else highly rated ?

Are there good reviews from people that know what they're talking about?

Are there good reviews from students that don't know anything ?


I can highly recommend the author. His last book, "Computer Vision: Models, Learning, and Inference", is very readable, approaches the material from unorthodox viewpoints, and includes lots of excellent figures supporting the text. I'm buying this on paper!


Some Google-fu for you:

> based on a table of contents?

You can download the draft of Chapters 1-21 (500+ pages) from the linked site.

> Who is the author?

Simon J. D. Prince is Honorary Professor of Computer Science at the University of Bath and author of Computer Vision: Models, Learning and Inference. A research scientist specializing in artificial intelligence and deep learning, he has led teams of research scientists in academia and industry at Anthropics Technologies Ltd, Borealis AI, and elsewhere.

> Have they published anything else highly rated?

Author of >50 peer-reviewed publications in top-tier conferences (CVPR, ICCV, SIGGRAPH, etc.): https://scholar.google.com/citations?user=fjm67xYAAAAJ&hl=en

> Are there good reviews [...]?

The book has not been published; this is literally a free draft that you are looking at. The book is listed on Amazon as a pre-order for 85 USD.


> based on a table of contents?

The entire pdf is available as a free download on that page. First link at the top.

https://github.com/udlbook/udlbook/releases/download/v1.16/U...


Marcus Hutter on his [Marcus' AI Recommendation Page]: "Prince (2023) is the (only excellent) textbook on deep learning."


Hopefully not a dumb question: how do I buy a physical copy?



The PDF figures for 'Why does deep learning work' seem to point to 'Deep learning and ethics' and vice versa.


No chapter on RNNs, but there is one on transformers, which is interesting, having last read Deep Learning by Ian Goodfellow in 2016.


RNNs have "lost the hardware lottery" by being structurally not that efficient to train on the cost-effective hardware that's available. So they're not really used for much right now - though IMHO they are conceptually sufficiently interesting enough to cover in such a course.


That is not completely true. There are RNNs with transformer/LLM-like performance. See e.g. https://github.com/BlinkDL/RWKV-LM.

They are less popular, and less explored. But an interesting route ahead.


> RNNs have "lost the hardware lottery" by being structurally not that efficient to train on the cost-effective hardware that's available.

Which suggests two obvious paths forward:

1. Don't bother learning / using RNNs

2. Co-develop new hardware / new RNN architectures that work together to provide great performance per unit of price.

Now of course nobody is saying (well, I am not saying) that (2) would be easy... or even necessarily possible. But somebody should at least be intrigued by the idea. And in the world we live in today, where FPGAs and other devices make it easier than ever to experiment with custom hardware architectures... it might be worth taking a stab at it.


Yeah, content looks interesting.


Simply great work and making it freely available is outstanding!!


Reading through it, and it def looks accessible.


lit


All machine learning is Hopf convolution, analogous to renormalization. This should come as no surprise: renormalization can be modeled via the Ising model, which itself is closely related to Hopfield networks, which are recurrent networks.


That's an interesting point; are there any resources to learn about this? I have a CS background, meaning we generally only cover first-year physics and very little theoretical math beyond linear algebra, etc.


There are some, yeah. There is a recent book on deep learning via renormalization: https://arxiv.org/pdf/2106.10165.pdf

I have a discord https://discord.cofunctional.ai


Don't know any of these terms, but you gave me some interesting topics to google about. Thanks!




