A non-technical explanation of deep learning (parand.com)
304 points by tworats on April 25, 2023 | 130 comments



Does stuff like this help anyone?

I still haven’t forgiven CGP Grey for changing the title of his 2017 ML video to “How AIs, like ChatGPT, learn”. The video is about genetic algorithms and has nothing to do with ChatGPT (or with anything else in modern AI).


I read this to see if it would be useful to share with my 9 year old. After reading it, I think it is not any more useful (alone) than watching the 3b1b video on this topic. The video is longer, but has more visualizations.

I think that perhaps reading this description after watching the video might make the process more memorable. My guess is that if I had my daughter read this first, it wouldn't do much to make the video easier to parse. Reading this real-world example after watching the video could help solidify the concept.

Disclaimer: I don't know a lot about AI/ML, so it's possible that I am 100% wrong here!


> I think it is not any more useful (alone) than watching the 3b1b video on this topic

This one? https://youtube.com/playlist?list=PLZHQObOWTQDNU6R1_67000Dx_...


Yep! I've only watched the first two in the series so far.


Sorry if this is a personal question, but why would you get a 9 year old into machine learning ?


Fair question. She's very curious about all sorts of things and always wants to know how they work. I also assume she'll find out about ChatGPT in the next year or two, at school. I figure she will probably ask me about how chatbots work, whether they are actually smart, etc. So for now I've been keeping an eye out for explainers that would help her understand things as I've been learning myself.

Sorry if this is a personal question, but how did you choose your username?


Why not? Get them into a little bit of everything, and let them dig further into the topics that they find exciting.

Kids are just as capable as most adults (if not more). Give them a foot in the door and they have all the time in the world to build on that knowledge.


You could just let your daughter see it. To what extent can you "protect" her exposure to the world?


Huh? It’s about efficiency and not wasting time on something that’s not very useful. Should she see A and B, both (in what order), or neither? That’s the question.


I've barely forgiven him for explaining genetic algorithms and acting like they have any relevance to contemporary ML research.

The footnote video was an alright explanation of backprop. If that were part of the main video that would have been reasonable.

I really like his history/geography videos, but anything technical leaves a lot to be desired. And don't get me started on Humans Need Not Apply.


> And don't get me started on Humans Need Not Apply.

Well now you have to tell us. :) Many of the concrete examples in that video are exaggerated and/or misunderstood but the general question it asks - what to do when automation makes many people unemployable through no fault of their own - seems valid.


> what to do when automation makes many people unemployable through no fault of their own - seems valid

Unfortunately the video doesn't answer its own question directly.

The answer for the past 40 years or so seems to be "move them to lower-paying service jobs, or out of the job market entirely."


Another part of the answer over the last 40 (or 200) years, is to repeatedly create totally new industries that employ lots of people, including a large fraction of HN readers.


Yeah, but what does the Venn diagram look like there?

Yes, new technologies create new jobs.

But it’s not usually the people from the old jobs who are taking those new jobs.


> it’s not usually the people from the old jobs who are taking those new jobs

https://news.mit.edu/2022/automation-drives-income-inequalit...


That can be terribly hard on people, while great for other people. What would you suggest would be better?


A progressive income tax that does not exclude or favor capital income, funding targeted transitional assistance, UBI (with a rate ratcheting up with sustained increases in per capita revenue), or, ideally, both, so that the adverse effects of labor market shifts (ones that shuffle or concentrate labor demand, or shift from labor-intensive to capital-intensive methods) are buffered.


How do you prevent UBI being captured by landlords in rent, and similar non-discretionary spending?


> How do you prevent UBI being captured by landlords in rent

Progressive taxation + UBI compresses income ranges rather than adding to them, so there's not really anything to capture unless the market was sharply segmented before and after the change. That's dubious even before UBI, and to the extent “lumpiness” in the income distribution might allow market segmentation, UBI reduces it and weakens capture opportunities.


Competition, both between similar units in similar places and between alternative arrangements. Maybe having a roommate or living with my parents isn't worth it for $X/mo but it is for $2X. Certainly it is for some X for most people. Also the presence of a UBI directly increases "housing" in the sense we mean. We don't really have any shortage of housing - we have a shortage of attractive housing, where a huge part of "attractive" is "sufficiently close to sufficient income" and a UBI makes more things sufficient. And for those in the cities, by elevating the bottom of the income distribution we make developers more interested in building the affordable housing we need.

Will some of the increase still be captured? Probably. More in some areas than others. Will we be right back to square one (or worse)? I really don't think so.


That is an interesting argument: increasing rent might make it financially attractive to move in with roommates or family even if you could afford the rental.

I'm not sure that's always an option if you have to live near your workplace, but it could shake things up a bit.


Yeah, it's certainly not always an option, but it just needs to be enough of an option for enough people in order to put some downward pressure on rents (compared to full capture of the increased income).


In the current US system, much of income is captured by rent and health insurance, regardless of where it comes from.

But I share the same concern that UBI would immediately be added to rent.

And that it might combine poorly with existing perverse incentives:

"Across the nation, landlords with units in poor neighborhoods average nearly $100 a month in net profit, compared to about $50 in affluent neighborhoods, and just $3 in middle-class areas."

https://www.bloomberg.com/news/articles/2019-03-21/housing-e...


Tax robots. At a minimum, at the highest income tax rate bracket.


How? What should a Roomba owner have to pay? Or a dishwasher owner? Or someone who has a calculator?

Computer used to be a job description, do we all owe?

Where would you draw the line?

How would you enforce it?

Edit/Append: Also, how do you calculate the income to be taxed?


> What should a Roomba owner have to pay?

The same amount that someone with a stay-at-home spouse who cleans the house pays.


Robots don’t have income.

Are you going to pay this tax for your washing machine, dishwasher, car, word processor, electric lightbulbs? The first two are robots, and all of them put a lot of people out of work.


I find this discussion fascinating, but YouTube is one of the last places I’d go for that discussion, unless it’s a debate between two highly-regarded minds on the topic (like Chomsky vs Foucault back in the day). I’m not very interested in listening to random people tell me their ideas without any good pathways for critiques or questions.


Humans Need Not Apply is one of the most phenomenal videos on YouTube, what do you think is wrong with it?


What a strange word to use in that context, why would he need to be forgiven by you? How has he wronged you? Seems at worst, an honest mistake in a complicated topic.


You're not using "at worst" correctly. What you describe is an "at best". Worse would be that CGP Grey deliberately picked a misleading title in order to optimize views, algorithm, etc.

This is, I think, the case. But I don't begrudge them too much, YouTube is cutthroat.


Maybe GP is a non-native English speaker? This construct would be a pretty common way for a native French speaker to say they are angry at something. Not sure if it's common in English as well.


This is a pretty common phrase in English as well, it is not meant to be taken literally.


Yep, it may be more common in some English-speaking cultures, but in the midwest of the US it's extremely common to say stuff like, "I'll never forgive him for <thing>" and it's not meant to be taken literally. A more literal translation would be, "I'm very disappointed in this person's decision to <thing>."


The interesting things that one can learn about English usage on HN. ;-)


I have met people who think they understand a particular topic I am versed in, but actually don't. Similarly, I am often wary that I get superficial knowledge about a topic I don't know much about through "laymen" resources, and I doubt one can have an appropriate level of understanding mainly through analogies and metaphors. It's a kind of "epistemic anxiety". Of course, there are "laymen" books I stumbled upon which I think go to appropriate levels of depth and do not "dumb down" the topics to shallow levels, yet remain accessible, like Gödel's Proof, by Ernest Nagel. I'd be glad to read about similar books on all topics, including the one discussed in this thread.

Knowledge is hard to attain...


I find the best way to learn technical topics is to build a simplified version of the thing. The trick is to understand the relationship between the high level components without getting lost in the details. This high level understanding then helps inform you when you drill down into specifics.

I think this book is a shining example of that philosophy: https://www.buildyourownlisp.com/. In the book, you implement an extremely bare-bones version of lisp, but it has been invaluable in my career. I found I was able to understand nuanced language features much more quickly because I have a clear model of how programming languages are decomposed into their components.


> I find the best way to learn technical topics is to build a simplified version of the thing. The trick is to understand the relationship between the high level components without getting lost in the details. This high level understanding then helps inform you when you drill down into specifics.

I agree, but that's a good guide for building a technical understanding of a complex subject, not a sufficient-in-itself toolset for considering questions in that subject.

Especially, people combining some "non-technical summary" of quantum mechanics/Newtonian gravity/genetic engineering/etc with their personal common sense are a constant annoyance to me whenever such topics come up here.


> a constant annoyance to me whenever such topics come up here.

I'll say that thinking about things at the edge of my understanding, where "Eureka!" moments are low hanging fruit, results in the highest dump of dopamine out of any activity. Having silly fun speculating (and I make it clear when I am) my way through some thought process is literally the most fun I can have. Seeing those types of conversations, full of genuine curiosity, thoughtful speculation, and all the resulting corrections/discussions/insight, etc, is why I love HN so much, and I hope it's always a place to nerd out.

One man's trash is another man's treasure, I suppose. :)


Thanks for the link. Could you give an example of something you learned better/easier after having implemented a simplified version?

Side question: Is there an entry-level build-your-own language model or GAN type learning tool out there as well?


There are a million e.g. number-parsing (image to digit) neural network type programs on GitHub. Go pick one in your preferred language, break it apart, and rebuild it, looking up the concepts behind the parts you don't understand. After you finish with the above, look up 'the XOR problem' to see a common practical problem (one that creating a network to replicate XOR illustrates, rather than is) and you'll be well on your way to a nice fundamental understanding, built from the ground up.

One of the most interesting things about this topic is that the fundamental concepts and implementations are all really simple. It's the fact that it actually works that's mind boggling. In any case, the above is not a months-long affair - more like one week of dedicated work.
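
For a concrete flavor of what those from-scratch programs look like, here's a rough sketch of the classic XOR network in plain numpy (layer size, learning rate, and step count are arbitrary choices, and it may need a different seed or learning rate to converge):

    import numpy as np

    rng = np.random.default_rng(0)

    # XOR inputs and targets
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([[0], [1], [1], [0]], dtype=float)

    # one hidden layer of 4 sigmoid units
    W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
    W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
    sigmoid = lambda z: 1 / (1 + np.exp(-z))

    for step in range(10000):
        # forward pass
        h = sigmoid(X @ W1 + b1)
        out = sigmoid(h @ W2 + b2)
        # backward pass: gradients of the squared error, via the chain rule
        d_out = (out - y) * out * (1 - out)
        d_h = (d_out @ W2.T) * h * (1 - h)
        # gradient descent step
        lr = 1.0
        W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(0)
        W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(0)

    print(out.round(2))  # should approach [0, 1, 1, 0]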


I've noticed that the learning curve stays fairly flat when it comes to understanding weights, and layers, and neural networks, heck, even what gradient descent is for... but then when it comes to actually understanding why optimization algorithms are needed, and how they work, things just spiral into very hard math territory.

I do think that maybe it feels inaccessible because we transition from discrete concepts easily digestible by CS grads into some complicated math with very terse mathematical notation, yet the math might not be as hard if presented in a way that doesn't scare away programmers.
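
To that point, the core loop really is only a few lines once the notation is stripped away. A toy sketch (one made-up parameter, hand-derived gradient, arbitrary learning rate):

    # minimize f(w) = (w - 3)^2 by repeatedly stepping against the gradient
    def f(w):
        return (w - 3) ** 2

    def grad_f(w):           # df/dw, worked out by hand
        return 2 * (w - 3)

    w = 0.0                  # arbitrary starting guess
    lr = 0.1                 # learning rate (step size)
    for _ in range(100):
        w -= lr * grad_f(w)  # move a little bit "downhill"

    print(w)                 # ~3.0, the minimizer

The hard math mostly shows up in why (and how fast) this converges for millions of parameters, not in the loop itself.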


> an appropriate level of understanding mainly through analogies and metaphors

I think it's actually worse than that - somebody who doesn't know actually realizes that he doesn't know, but somebody who _thinks_ he understands through analogies and metaphors will confidently come to the incorrect conclusion and then argue with somebody who actually does understand the topic - often managing to convince innocent bystanders because his reasoning is easier to grasp and the so-called expert seems to be getting more and more flustered (as he tries to explain why the analogy is actually correct, but oversimplified).


I am fascinated by this phenomenon, and the double-edged sword that metaphors are.

On the one hand they're jargon used as shorthand for technical concepts understood well by domain experts. And the concision they afford can lead to deeper understanding as they transcend their composite or adapted meanings and become base terminology in and of themselves (I think of e.g. Latin in English legal terminology. "Habeas corpus" has a literal meaning when translated, but the understood jargon has a deeper, and more specific meaning). At that point, they are powerful because of the precision of meaning and concision of expression they afford.

On the other hand, they lift intuitive terminology from a base language that is understood in vaguer terms by a broader audience. And this creates invisible disconnects because the abstraction created by these terms leaks like a sieve unless you know the precise semantics and have the model to use them.

By translating a discourse into a higher metaphoric level, we increase precision and efficiency amongst mutual understanders, but at the same time, we increase the level of ambiguity, the number of possible interpretations, and the availability of terms familiar to (and thus, handles to grab on to) non-understanders. And that latter situation allows non-understanders to string together what sound superficially like well-formed thoughts using jargon terms, but based on the base language semantics. But without the deeper knowledge required to understand whether a given utterance scans or not.

That's how I've been trying to wrap my head around it at least. I hope it doesn't sound like moralizing or condescension, I don't mean it to. I know I'm "guilty" of trying to manipulate metaphoric models that I don't actually understand, based on the lay-semantics of their jargon.


This is how we teach students in every discipline: display a model, then offer a better one, again and again.


You're basically describing a lot of generative AI developers who are applying their technology to fields they don't really understand


Can you please suggest other books similar in spirit to the Nagel book? Would love to read some over summer


> This is how neural networks work: they see many examples and get rewarded or punished based on whether their guesses are correct.

This is closer to a description of reinforcement learning than of gradient-based optimization.

In fact, the entire metaphor of a confused individual being slapped or rewarded without understanding what's going on doesn't really make sense when considering gradient optimization, because the gradient wrt the loss function tells the network exactly how to change its behavior to improve its performance.

This last point is incredibly important to understand correctly since it contains one of the biggest assumptions about network behavior: that the optimal solution, or at least a good-enough-for-our-concerns solution, can be found by slowly taking small steps in the right direction.

Neural networks are great at refining their beliefs but have a difficult time radically changing them. A better analogy might be trying to very slowly convince your uncle that climate change is real, and not a liberal conspiracy.

edit: it also does a poor job of explaining layers, which reads much more like how ensemble methods work (lots of little classifiers voting) than how deep networks work.


Well said re: gradient optimization vs. "getting slapped". However, note that since NN optimization is almost always nonconvex, we are NOT guaranteed to arrive at an optimal (or even close-enough) solution. A major limitation of gradient-based optimization on nonconvex problems is that it is very susceptible to getting trapped in local minima.

But, for now it's the best tool we have, so we just have to hope that we get close enough, or just empirically run lots of times to find the best local minimum we can. Incidentally, this actually is more like a brute-force approach, but at the ensemble level, which is quite different from what the article means.
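
To make the "run it lots of times" idea concrete, here's a toy sketch (a made-up one-dimensional function): gradient descent from several random starts lands in different local minima, and we keep the best one:

    import numpy as np

    def f(w):                      # deliberately nonconvex: many local minima
        return np.sin(3 * w) + 0.1 * w ** 2

    def grad_f(w):
        return 3 * np.cos(3 * w) + 0.2 * w

    rng = np.random.default_rng(0)
    best_w, best_val = None, np.inf
    for restart in range(20):
        w = rng.uniform(-10, 10)   # fresh random initialization
        for _ in range(500):
            w -= 0.01 * grad_f(w)  # plain gradient descent
        if f(w) < best_val:
            best_w, best_val = w, f(w)

    print(best_w, best_val)        # best local minimum found across restarts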


If anyone is looking for a quick overview of how LLMs are built, I highly recommend this video by Steve Seitz: https://www.youtube.com/watch?v=lnA9DMvHtfI.

It does an excellent job of taking you from 0 to a decent understanding without dumbing down the content or abusing analogies.


This really was excellent and just what I was looking for to explain what LLMs are to non-CS people. Thanks!


Funny. In the game Black & White you would slap or pet your avatar to train it. The lead AI programmer on that was Demis Hassabis of DeepMind fame.


The description made me think of Black & White as well. I still have memories of smacking my creature around every time he ate someone.


Somehow he knew AI would be our Gods.


Actually, the god is the AI trainer. The AI is the god's tool and us poor villagers are forced to worship the first.


False idols maybe


I have a few funny analogies that I think kind of work.

1. "gradient descent" is like tuning a guitar by ear and listening to the beat frequencies ("loss") and then decreasing these by tuning a string up or down.

2. the best I can come up with for "backpropagation" is to imagine a clever device that can tirelessly optimize a Rube Goldberg machine for you but as a science, not an art.
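
Pushing analogy 1 a little further into code (a toy sketch, all numbers made up): treat the beat frequency as the loss and nudge the string in whichever direction makes it shrink:

    target_hz = 440.0            # the reference pitch
    string_hz = 427.0            # current, out-of-tune string

    def loss(hz):                # "how bad is the beating?" (squared beat frequency)
        return (hz - target_hz) ** 2

    lr = 0.05
    for _ in range(200):
        # finite-difference gradient: does a tiny turn of the peg help or hurt?
        slope = (loss(string_hz + 1e-3) - loss(string_hz - 1e-3)) / 2e-3
        string_hz -= lr * slope  # turn the peg a little in the helpful direction

    print(round(string_hz, 2))   # ~440.0 once the beats are gone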


I love this, but I'm always confused in these kinds of analogies about what the reward / punishment system really equates to...

Also reminds me of Ted Chiang warning us that we will torture innumerable AI entities long before we start having real conversations about treating them with compassion.


Don't love it, it's not correct.

> what the reward / punishment system really equates to

Nothing, at least as far as neural network training goes. This is an extremely poor analogy regarding how neural networks learn.

If you've ever done any kind of physical training and have had a trainer slightly adjust the position of your limbs until whatever activity you're doing feels better, that's a much closer analogy. You're gently searching the space of possible correct positions, guided by an algorithm (your trainer) that knows how to move you towards a more correct solution.

There's nothing analogous to a "reward" or "punishment" when neural networks are learning.


>There's nothing analogous to a "reward" or "punishment" when neural networks are learning.

Well, deep reinforcement learning.


Yeah but even in that case, "reward" is just the thing a NN is trying to predict. The NN itself is not receiving the reward (or any punishment). Instead, it's following gradient signals to improve that estimate of reward, which is then used as a proxy for an optimal policy decision.


> what the reward / punishment system really equates to

Well, in the article, it says the punishment was a slap. On the other hand, he just says "she gives you a wonderful reward"... so you're left to use your imagination there.


Totally aware that this isn't a fully formal definition of deep learning, but one interesting takeaway for me is realizing that in a way, corporations with their formal and informal reporting structures are organized in a way similar to neural networks too.

It seems like these sort of structures just regularly arise to help regulate the flow of information through a system.


There is research claiming the entire universe is a neural network: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7712105/


You can find theoretical physics research claiming a lot of bonkers things that almost certainly are not true, that's sort of the nature of the field.


That's the sort of blue sky research I'm glad exists.


Indra's Net is, in fact, a neural network


Frankly, stuff like this makes me more skeptical of the ML community. Remember when people thought brains were just really complicated hydraulic systems?


Vanchurin is a physics professor, but ok https://twitter.com/vanchurin?lang=en

I actually think that interdisciplinary work like this can shed light into common structures across physics, biology, neuroscience, CS, etc. If anything, I wish there were more attempts to explore the connections between these disciplines.


The author is basically a crank, it looks like they held some teaching positions in an unrelated subject at various universities yielding the “professor” title, and originally studied/published actual research in cosmology decades back.

Might as well be skeptical of the math community because of circle squarers


> The author is basically a crank

Really not sure why you say this. Is it something personal against him? Are you against people exploring weird interdisciplinary topics?

He has tons of publications [1], including co-authorship with people like this guy: https://en.wikipedia.org/wiki/Eugene_Koonin

Koonin is certainly not a crank, and is a rather well-known interdisciplinary researcher at the intersection of biology and physics.

[1] https://scholar.google.com/citations?hl=en&user=nEEFLp0AAAAJ...


Uh,

The similarity of corporations and neural nets is pretty much only that both are information processing systems. An operating system or missile guidance system is far more like a corporation than a neural network.

Neural networks have no memory and generally don't seek particular goals, they simply recognize, predict and generate similar instances.


Plenty of ways to think about this stuff. IMO a neural network doesn't inherently do anything, it's just a data structure.

Different ways you can interact with that data structure can however provide meaning and store information in the weights etc.


    > Neural networks have no memory and generally don't seek particular goals, they simply recognize, predict and generate similar instances.
Sounds exactly like every corporation I've ever worked in.


Neural networks do have a memory of sorts, that’s why they improve with node size. A mathematician proved something along these lines recently.

The memory isn’t digital bits like we think of now though, but abstractions in higher dimensions.


The problem with deep learning is the opposite. You can understand most of it with just high school math. Advanced math is mostly useless because of the dimensionality of neural nets.


> Advanced math is mostly useless because of the dimensionality of neural nets.

It depends what you mean by advanced math. There is a lot of math that only really comes into play because of the high dimensionality! For example math related to tensor wrangling, low rank approximations, spectral theory, harmonic theory, matrix calculus derivatives, universality principles, and other concepts that could be interesting or bewildering or horrifying depending how you react to it. Of course some of it is only linear algebra of the 'just high school math' kind but that's not how I would normally describe it. If you look at the math in the proofs in the appendices of the more technical AI papers on arxiv there is often some weird stuff in there, not just matrix multiply and softmax.


Yes but do you have examples of "higher" math not being just a curiosity and actually making it into real world models and training algorithms?


Well I suppose that in some sense you are right. You can do deep learning without even knowing any math at all, by plugging together libraries and frameworks that other people wrote.

Also maybe you will say that "higher" math is by definition a curiosity and if it's practical then it's not "higher".

But if those aren't your arguments, then you can consider one example that the tensor 'differentiable programming' libraries used in deep learning use automatic differentiation and matrix calculus. Matrices are taught in high school, and calculus is taught in high school, but matrix calculus generally isn't as far as I know. Or at least not at my high school. https://en.wikipedia.org/wiki/Matrix_calculus
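
For one small, concrete instance of that (a sketch, not taken from any particular library's internals): the matrix-calculus rule d/dW ||Wx - y||^2 = 2(Wx - y)x^T is exactly the kind of identity an autodiff library applies mechanically, and you can sanity-check it numerically:

    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.normal(size=(3, 4))
    x = rng.normal(size=4)
    y = rng.normal(size=3)

    def f(W):
        return np.sum((W @ x - y) ** 2)

    # analytic gradient from matrix calculus: dL/dW = 2 (Wx - y) x^T
    analytic = 2 * np.outer(W @ x - y, x)

    # brute-force numerical gradient, one entry of W at a time
    eps = 1e-6
    numeric = np.zeros_like(W)
    for i in range(W.shape[0]):
        for j in range(W.shape[1]):
            Wp, Wm = W.copy(), W.copy()
            Wp[i, j] += eps
            Wm[i, j] -= eps
            numeric[i, j] = (f(Wp) - f(Wm)) / (2 * eps)

    print(np.abs(analytic - numeric).max())  # tiny, so the rule checks out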


That's like saying you understand state-of-the-art CFD code because you can read Fortran.

There are many aspects to learning systems that we still don't have any kind of grasp on, and will take more than a little advanced math (statistics/probability theory, transport theory, topology, etc.) to understand as a community.

Dunning-Kruger is probably more common in spaces like this one, where people carry social capital for being able to "spin up quickly". But the true meta-skill of upskilling is turning unknown unknowns (UU) into known unknowns (KU), and then into known knowns (KK). It's not enough to just jump from UU to KK through osmosis by reading blog posts on a news aggregator, because there will still be a huge space of unknowns not covered by that approach.


Yes, it’s really rather like alchemy in some sense. Stuff works, and often nobody knows exactly why.


"I don't follow the latest ML scaling and theory research" does not in any way equate to "these things are unknowable".


Hm, watching Neel Nanda videos recently and I do get the feeling that there are lots of unknowns in ML and also in what trained networks have learnt.


can you elaborate further on what you mean by 'dimensionality of neural nets'? Thanks!


Yes, I mean the huge number of trainable parameters.


Non-technical, non-accurate. "Truthy", buzzfeed/huffpo quality.


I asked ChatGPT:

Sure, I can explain deep learning using the analogy of mixing paint colors.

When you mix different colors of paint, you can create new colors that are a combination of the original colors. Similarly, in deep learning, you can combine different types of data, such as images, text, and audio, to create a more comprehensive understanding of the information.

In a neural network, which is the basic building block of deep learning, each layer is like a different color of paint. The input layer represents the initial data, which is like a blank canvas. As the data passes through each layer, the network extracts increasingly complex features, similar to how mixing different colors creates more complex hues.

Each layer in a neural network can be thought of as a different color that is mixed with the previous layer's color. As the colors mix, the resulting color becomes more complex and nuanced, just as the data becomes more sophisticated and detailed as it passes through the layers of the neural network.

At the end of the process, the output layer represents the final result, which is like a finished painting. The deep learning algorithm can then use this final output to make predictions or classifications based on the input data.

So, in essence, deep learning is like mixing paint colors to create new and more complex hues, but instead of colors, the mixtures are made up of data and the neural network layers are the different colors being mixed.


Certainly! Here's an explanation of how a neural network is trained using the analogy of mixing paint colors.

Initialization: The weights and biases of the neural network are like the starting colors of the paint mixture. Just like how you need to start with some base colors to mix new ones, the neural network needs some initial weights and biases to begin the training process.

Forward Pass: During the forward pass, the input data is fed into the neural network, and the network produces an output. This output is like the mixed color of paint that results from combining different colors.

Backward Pass: During the backward pass, the error between the predicted output and the actual output is calculated using a loss function. This error is like a measure of how close the mixed color is to the desired color. Just like how you would adjust the mixture of colors to get the desired hue, the neural network adjusts the weights and biases of the neurons to minimize the error between the predicted output and the actual output.

Repeat: Steps 2 and 3 are repeated many times with different input data points, and the weights and biases of the neural network are updated after each iteration. This process is like adding more colors to the mixture and adjusting the amounts until the desired color is achieved.

Validation: After several epochs, the neural network is evaluated using a validation set of data to determine if it's overfitting or underfitting the training data. This step is like checking the mixture of colors to see if it matches the desired color under different lighting conditions.

Testing: Once the neural network has been trained and evaluated, it can be used to make predictions on new data. This step is like using the paint mixture to create a painting.

In summary, just like mixing paint colors to achieve a desired hue, the neural network adjusts its weights and biases to minimize the error between the predicted output and the actual output until it accurately predicts the output for new data.


As someone who knows barely enough to be dangerous, I like this. I'm sure it leaves enough out to make most experts angry, but it makes a lot of sense to me.


> I'm sure it leaves enough out to make most experts angry

It's not that it leaves out details, it's that the article's metaphors are not actually correct with regard to the way deep learning works.

This post mostly confuses both reinforcement learning and ensemble models with deep learning. If you only know "enough to be dangerous" then this post will steer your intuition in the wrong direction.


> they see 3 spreadsheets of numbers representing the RGB values of the picture.

This needs expanding: it's the sort of thing that's easy for a programmer to say, but few non-{programmers, mathematically trained people} are going to see that an RGB value has 3 parts and so a collection of RGB values could be sliced into 3 sheets.
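
For anyone curious, the slicing literally looks like this (a toy 2x2 "picture", values made up):

    import numpy as np

    # each pixel is a (red, green, blue) triple of numbers from 0 to 255
    image = np.array([
        [[255,   0,   0], [  0, 255,   0]],
        [[  0,   0, 255], [255, 255, 255]],
    ])

    red, green, blue = image[:, :, 0], image[:, :, 1], image[:, :, 2]

    print(red)    # one "spreadsheet" of numbers: just the red amounts
    print(green)  # another: just the green amounts
    print(blue)   # a third: just the blue amounts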


...or know what Ruth Ginsburg Bader has to do with any of it.

The RGB color model and representation of images in it is already technical. Anyone who knows what it means also wouldn't need to be told the following quip:

>Also note that computers see things as multi-dimensional tables of data. They don't look at a "picture" - they see 3 spreadsheets of numbers representing the RGB values of the picture.

...which is the only time RGB is mentioned in the article.

That's before we get to the part where "multidimensional" here is extraneous, and doesn't even match the typical usage (where RGBA is stored as a single 32-bit value). Everything is a tape of 1's and 0's; "multidimensionality" comes from interpretation of data.

The dimension of image data is still 2: each pixel is a sample of a 2D projection of a 3D world, and is related to other pixels in a way that's different than, say, those of letters in a line of text, or voxels (letters don't have a well-defined "up" neighbor, voxels have more well-defined neighbors than pixels do).


Why is violence and praise being used to illustrate gradient descent? Why does each person get to see the entire input data?


This is the funniest refutation of the Chinese Room argument that I’ve seen. Note that at the end, it’s still the case that none of these people can recognize a cat.


Doesn't that mean it supports the Chinese room argument? I'm not sure I follow your reasoning.

(also, popular consciousness forgets that technically the Chinese Room argument is only arguing against the much narrower, and now philosophically unfashionable, "Hard AI" stance as it was held in the 70s)


> the Chinese Room argument is only arguing against the much narrower, and now philosophically unfashionable, "Hard AI" stance as it was held in the 70s

Searle stood behind his argument in the 70s, and in every decade since then too.

The main failure is that most people fundamentally don't believe they are mechanistic. If one believes in dualism, then it is easy to attribute various mental states to that dualism, and of course a computer neural network cannot experience qualia like humans do.

I don't believe in a soul, and thus believe that a computer neural network, probably not today's models but a future one that is large enough and has the right recurrent topology, will be able to have qualia similar to what humans and animals experience.


Searle's argument in the Chinese Room is only that passing the Turing Test isn't enough to prove evidence of Mind (capital 'M' to distinguish it as the philosophical jargon term, and all it entails). He does hold the stance that he doesn't think Computationalism (in the style of Dennett) is correct. I'm not sure if he personally feels the Chinese Room argument refutes that stance as a whole, but I believe the general consensus is that, as originally formulated in his essay, it does not stand as a total refutation of Computationalism, without reading between the lines or squinting your eyes a bit. Searle does have a wider stance that he does not think computations can have things equivalent to mental states, especially intentionality. Obviously there is a whole separate debate as to his correctness there, but I'm skipping over it to just discuss the Chinese Room.

That passing the Turing Test is not enough to exhibit evidence of Mind is not that controversial today. GPT-4 could easily pass the Turing Test as it was originally formulated. There are not many out there who think it possesses consciousness or intentionality or any mental states at all, really. We'd generally agree now that passing the Turing Test is only a step towards creating an actual artificial mind (how large or small a step is still up for debate).

Anyway, all this is a tangent as I still don't understand why the original commenter feels this article provides a refutation of the Chinese Room argument when it seems (to me) to reinforce it. I'm just curious on that perspective and was interested in hearing more.


I find the argument to be essentially assuming the premise (if a system acts conscious, but you look inside it and don’t find anything conscious in there, then it can’t be conscious) but honestly I wouldn’t say I’m sure I understand the argument. Given my understanding, I thought this illustrated the circularity. But I see I’m wrong, because if you buy the premise then you won’t find the situation analogous.

That is, imagine there was a famous philosopher who insisted that cats can only be recognized by some non-computational mechanism, and any computational cat recognizer might “simulate” recognizing cats but could not be said to actually recognize cats. Then you build a neural network that recognizes cats, and they open it up and point out that nothing in it can recognize cats, so therefore it isn’t “really” recognizing cats.


I understand the Chinese Room argument to be that because the human in the room doesn’t understand Chinese, the system doesn’t understand Chinese. In this case, none of the humans can recognize cats, but the collective can.


That's not the Chinese Room argument. The argument says just because a system processes X doesn't imply it has consciousness of X.


The flaw is the unsupported assertion that the whole system being conscious of X depends on a part of the system being conscious of X. The same assertion would fail here in the same way.


At these levels of discussion everything is asserted as axioms to see what the consequences are.

If a human isn’t conscious of Chinese, but the arrangement of paper rules is, one would have to assert that paper + human is conscious.


Are you talking about access consciousness or phenomenal consciousness?


I see, so neural network works like Naruto's Kage Bunshin technique where the learning of the clones will pass to its origin


> They respond that this sounds very convoluted and they'll only agree to do it if you call them "colonel".

Cute.


Made me laugh because it's true, funny. Well done!


This has Three-Body Problem vibes :)


Does this article imply there are circumstances where a spreadsheet is a cat? What a poor example of technical writing.


He’s saying that the spreadsheet represents the “picture” of the cat in terms of pixels and RGB values etc.

The algorithm/workers are not really “looking” at a picture of a cat, they are analysing and looking for patterns in the data that defines the picture of the cat.


Yes I know. The spreadsheet analogy does not work.


Not the author, but in the author's defense, it is meant to be non-technical. And the first paragraph reads as interesting to me.


Most non technical people would think there are zero circumstances where a spreadsheet could be a cat.


It's obvious from context that it's the content of the media. To me at least.

If I play you a song on Spotify and say, "Is this a saxophone?", you wouldn't say, "No, it's an iPhone running Spotify."

If a policeman holds up a photograph of a person and says, "Is this the person who attacked you?", the victim doesn't say, "No, it's an 8 by 10 glossy print."


Yes, but you had to think about it, as did I, and there was a moment where you went "hang on, did this person imply a spreadsheet could ever be a cat?" and thought you were 95% sure they did this very bizarre thing. It was distracting.


mais ceci n'est pas une pipe! ("but this is not a pipe!")


That's part of the explanation. It might not make sense at first, but you'll figure something out to avoid being slapped.


Nothing about LLMs?!


Yeah, I need something to explain those Transformer things to me. I know it was published by Google in 2017 and that it is 'magic'.

End of knowledge.

Maybe I should ask ChatGPT?


2-hour video posted a month or two ago in a comment here: "Let's build GPT: from scratch, in code, spelled out."

https://www.youtube.com/watch?v=kCc8FmEb1nY

(I haven't gotten around to watching it yet)


> Maybe I should ask ChatGPT?

You actually should, it spits out a pretty good explanation (sometimes).


The most concise and intuitive line of explanation I've been given goes along the lines of this:

1 - We want to model data, representative of some system, through functions.

2 - Virtually any function can be expressed by an n-th order polynomial.

3 - We wish to learn the parameters, the coefficients, of such polynomials.

4 - Neural networks allow us to brute-force test candidate values of such parameters (finding optimal candidate parameters such that the error between the expected and actual values of our dataset is minimized)

Whereas prior methods (e.g. PCA) could only model linear relationships, neural networks allowed us to begin modeling non-linear ones.


You don't need neural networks to do polynomial regression. Polynomial regression, perhaps surprisingly, can be implemented using only (multivariable) linear regression. You just include powers of your predictor x as terms in the regression formula:

  y = a + bx + cx^2 + dx^3 + ...
The resulting model is linear, even though there are powers of x in your formula, because x and y are known from the data: they're not what you're solving for; you're solving for the unknown coefficients (a, b, c, d...). This gives you a linear system of equations in those unknown coefficients, which can be solved using standard linear least squares methods.

So fitting polynomials is easy. The problem is that it's not that useful. Deep learning has to solve much harder problems to get to a useful model.
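
A minimal sketch of the easy part, with made-up data and numpy's lstsq standing in for "standard linear least squares methods":

    import numpy as np

    # noisy samples from a cubic (made-up example data)
    rng = np.random.default_rng(0)
    x = np.linspace(-2, 2, 50)
    y = 1 + 2*x - 0.5*x**2 + 0.3*x**3 + rng.normal(scale=0.1, size=x.size)

    # design matrix with columns 1, x, x^2, x^3: still *linear* in the coefficients
    A = np.column_stack([np.ones_like(x), x, x**2, x**3])
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)

    print(coeffs)  # roughly [1, 2, -0.5, 0.3]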


Hm, I don't think that's quite it. I went through my own process of learning how neural networks work recently and wrote this based on my learning: https://sebinsua.com/bridging-the-gap

As far as my understanding goes, you can represent practically any function as layers of linear transformations followed by non-linear functions (e.g. `ReLU(x) = max(0, x)`). It's this sprinkling of non-linearity that allows the networks to be able to model complex functions.

However, from my perspective, the secret sauce is (1) composability and (2) differentiability. These enable the backpropagation process (which is just "the chain rule" from calculus) and this is what allows these massive mathematical expressions to learn parameters (weights and biases) that perform well.
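
A sketch of that shape in code (random weights, forward pass only, sizes are arbitrary): two linear maps with a ReLU in between. Drop the ReLU and the stack collapses into a single linear map, which is why the non-linearity is the key ingredient:

    import numpy as np

    relu = lambda z: np.maximum(0, z)

    rng = np.random.default_rng(0)
    W1, b1 = rng.normal(size=(8, 2)), np.zeros(8)  # layer 1: 2 inputs -> 8 hidden units
    W2, b2 = rng.normal(size=(1, 8)), np.zeros(1)  # layer 2: 8 hidden units -> 1 output

    def network(x):
        return W2 @ relu(W1 @ x + b1) + b2         # linear, non-linear, linear

    print(network(np.array([0.5, -1.0])))
    # without relu(), W2 @ (W1 @ x + b1) + b2 simplifies to one linear map,
    # no matter how many layers you stack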


Mentioning polynomials is a pretty poor way to explain it for two reasons:

- It requires some mathematical understanding so will exclude some part of the non-technical audience

- It is the incorrect analogy. Non-linearities in neural networks have nothing to do with polynomials. In fact, polynomial regression is a type of linear regression, and for the most part, it sucks.

Also, as someone mentioned, all the “serious” alternative ML methods prior to the deep learning revolution allowed modeling non-linearities (even if just through modification of linear regressions, like polynomial regression).


Thanks for the correction. It's been some time since I actively thought about the theory (evidently I didn't digest it correctly the first time!).


> Virtually any function can be expressed by a n-th order polynomial.

But there are many things that are not functions. Like circles. And they tend to crop up a lot in the real world, no pun intended.


Well, technically a circle can't be said to be a function but not for the reason you mean. A circle is a collection or a set of points, for example in a 2d plane, that are equidistant from a center point.

Probably what you are trying to say is that "a circle is not the image of a function", but that is also not true. You're assuming that since in Cartesian coordinates you can solve for y = +/- sqrt(R^2 - x^2), the fact that y is multi-valued means it's not a function. This is what they teach in high school pre-calculus anyway.

But for example, we can associate the points on a circle with the image of the function e^{i theta}. Or equivalently, with the R^2-valued function f(theta) = (cos(theta), sin(theta)).


> Whereas prior, methods (e.g. PCA) could only model linear relationships,

Prior methods also allowed modelling of non-linear relationships, eg. Random Forests.


Except gradient descent is about as far from brute force as it gets


Sure, under the assumption that your parameter space is convex.



