Edge detection doesn’t explain line drawing (aaronhertzmann.com)
246 points by KqAmJQ7 on Aug 2, 2023 | 168 comments



Ten years or so ago I was working on a video chip that had an upscaler feature. While prototyping and simulating it, we first started by applying a mathematically-correct (i.e. information preserving) FIR filter to do the upscale. Then we compared the result with other solutions and found that ours looked worse. We asked our colleagues to blind-test it and they all picked third-party-scaled images over ours.

At first we assumed that we must have had a bug somewhere because the Fourier transform told us that our approach was optimal, but after more testing everything matched the expected output. Yet it looked worse.

So we started reverse-engineering the other solutions and, long story short, what they did better is that they added some form of edge-enhancement to the upscaling. Information-theory-wise it actually degraded the image, but subjectively the sharper outlines were just so much nicer to look at and looked correct-er. You felt like you could more easily tell the details even though, again, in a mathematical sense you actually lost information that way.
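For the curious, here is a minimal sketch of that kind of post-sharpening (Python/Pillow, purely for illustration; the actual hardware pipeline was certainly different, and the filenames and parameters here are made up):

  from PIL import Image, ImageFilter

  img = Image.open("input.png")                     # hypothetical source frame
  # "mathematically correct" upscale with a windowed-sinc style filter
  up = img.resize((img.width * 2, img.height * 2), resample=Image.LANCZOS)
  # edge enhancement on top: unsharp masking, one common way consumer
  # scalers make the result look subjectively sharper
  out = up.filter(ImageFilter.UnsharpMask(radius=2, percent=150, threshold=3))
  out.save("output.png")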

I don't think it makes a lot of sense to reduce human vision to edge detection (we can still make sense of a blurry image like this one after all: https://static0.makeuseofimages.com/wordpress/wp-content/upl... ) but it's clear to me from empirical evidence that edge-detection is a core aspect of how we parse visual stimuli.

As such I'm a bit confused as to why the author seems to see this as a binary proposition. That being said, I could just be misunderstanding completely the point the author is trying to make.


I don't think it's just subjective in this case. The theoretical signal processing approach assumes that the signal is band-limited, i.e. contains no components with a period shorter than two pixels, and it's not. There are lots of sharp edges that have higher frequency components than that.

Another way of looking at it, more along the lines that you're talking about, is that it depends on your error model. The traditional way of measuring error is RMS pointwise in pixels. Doing some sort of interpolation on pixels gives a pretty good result for that. However, another way to look at it is that it may be better to have a positional error, i.e. a particular color or intensity level is in the wrong spot, than to have an intensity/color error, i.e. you have a pixel that has an intensity/color that's not present in the source signal.
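A toy 1-D illustration of that distinction (numpy, with made-up numbers, just to make the two error models concrete):

  import numpy as np

  source  = np.array([0, 0, 0, 1, 1, 1], dtype=float)  # ideal hard edge
  blurred = np.array([0, 0, 0.25, 0.75, 1, 1])          # interpolation-style result
  shifted = np.array([0, 0, 1, 1, 1, 1], dtype=float)   # edge kept sharp, but moved

  def rms(a, b):
      return np.sqrt(np.mean((a - b) ** 2))

  print(rms(source, blurred))  # ~0.14: "wins" on pointwise RMS, but smears the edge
  print(rms(source, shifted))  # ~0.41: worse RMS, yet no intensity levels were invented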

This same basic issue was the basis of a big divide in font rendering for many years, where the Mac would render fonts with the exact geometry of the letters, but then anti-aliased, while Windows would use font hinting to make the shape snap to the pixel grid. Personally I thought that the Windows approach was a lot easier to read on a screen, but the Mac approach had the advantage that the geometry of the text would be exactly the same in print as it was on the screen, back when print was something that was important, especially for Mac users.


Sounds like using the wrong metric: the upscaled image should be compared against the original full-resolution one, not the downscaled one. Obviously you can't know what the full-resolution one looks like when actually upscaling (vs testing), but you can make an educated guess.


You can remove the guesswork. You can start with a high resolution (or even raster export of a vector image for an extreme example), downsample it (various methods and downsampling factors for completeness), then attempt to upscale it.


That's exactly what GP is talking about, the 'guesswork' comes in when you upscale as a function of only the downsampled version.


The difference between the data in the image, and the information? If for instance you upscaled text so large that it became blurry and unrecognizable, you lost information.

Our cortex is all about interpreting what we see. Almost before our brain proper has the data, nerves have begun extracting information (edges etc). Probably because it was the difference between hitting and missing the animal with the spear. Or seeing or missing the tiger in the grass.


Precisely! I also find it interesting how, from an information theory standpoint, audio processing and image processing are effectively the same thing (audio resampling is fundamentally 1D image scaling for instance) but because humans process sounds very differently from images we end up doing things pretty differently.

For instance when we want to subjectively make images more attractive we tend to increase contrast and sharpness, whereas for sound we tend to compress it, effectively reducing "audio contrast".


The old habit of reaching to “increase contrast”[0] as a means of making an image more attractive exists in large part because 1) the dynamic range of modern display media is so tiny compared to the dynamic range of camera sensors and our eyes[1], and 2) the images most people typically work with are often recorded in that same tiny dynamic range.

If you work with raw photography, you will find that, as with audio, the dynamic range is substantially wider than the comfortable range of the available media: your job is, in fact, to compress that range into the tiny display space while strategically attenuating and accentuating various components—just like with raw audio, much more goes into it than merely compression, but fundamentally the approaches are much alike.

[0] Which actually does much more than that—the process is far from simply making the high values higher and low values lower.

[1] Though “dynamic range” is much less of a useful concept when applied to eyes—as with sound, we perceive light in temporal context.


> I'm a bit confused as to why the author seems to see this as a binary proposition

The author mentions this twice:

> This hypothesis is compatible with Lines-As-Edges, while answering many of these questions.


Surely if you're upscaling pixel art you're losing information when you create gradients between pixels. It doesn't seem to me that your metric of information loss was ideal.


Conservation is not just about preserving info, it's also about not adding information that's not there. If you upscale without those gradients (effectively sharpening to the max with nearest-neighbor interpolation) you introduce high frequencies that could not exist in the original data. You've created new information out of nowhere.
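A quick way to see that (numpy sketch, toy signal of my choosing): compare the spectrum of a 2x nearest-neighbor upscale with a 2x linear interpolation of the same band-limited signal.

  import numpy as np

  n = 64
  x = np.sin(2 * np.pi * 3 * np.arange(n) / n)            # band-limited source
  nn = np.repeat(x, 2)                                     # nearest-neighbor 2x ("staircase")
  lin = np.interp(np.arange(2 * n) / 2, np.arange(n), x)   # linear interpolation 2x

  def high_freq_energy(s):
      return np.sum(np.abs(np.fft.rfft(s))[len(s) // 4:])  # energy above the source band

  print(high_freq_energy(nn) > high_freq_energy(lin))      # True: the staircase adds HF content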

But of course you're correct that in this case it may be the desirable outcome. I still think that this idea of creating information using algorithms in order to get a subjectively more pleasant result is really one of the biggest issues of our time. Not a day passes where I don't see AI-colorized pictures, AI-extrapolated video footage, AI-cleaned family portraits, AI-improved smartphone footage etc...

It's both amazing and a bit scary, because in a certain way we rewrite history when we do this, and since the information is not present in the original it's very difficult to ascertain how close we truly are to reality. We're creating a parallel reality, one Instagram filter at a time. Maybe that's the true metaverse.


> in a certain way we rewrite history when we do this

History is sort of inherently rewritten. Memories are (very!) imperfect and even without realizing it we interpret events through our individual biases. Maybe the more precise concern is the increasing _willful_ departure from reality, but we do that naturally too, overlooking parts of reality that would be intolerable if they were always in our face.


Quite, no new information so no “loss” but not the information that needs to be there.

It’s like putting an 8oz coffee brew in a 20oz cup and giving it to the customer as a large, saying they had no coffee loss. While true, it’s not the same as delivering a 20oz coffee.


The upscaled image stores more information than the original image, so it must be possible to keep all the information while still doing edge enhancement!


Stores more data, but the same information, and if there is any interpolation then some of the data is modified, meaning that you lose a little data. In fact even without interpolation I think you change the data.

If you imagine a hard edge that aligns with a pixel [] boundary and then imagine upscaling in various ways, I think it's QED. You change data about the sharpness of the edge.

[] I use an Android phone, with Google's keyboard it genuinely rendered "pixel" with a capital letter. I've never written about Google's device of that name. Silly Google.


Well consider nearest neighbor upscaling. Since we are upscaling, every pixel in the source image will determine one or more pixels in the result image. Consider one of the source pixels that turns into multiple result pixels. If you manipulate one of those while leaving the other(s) intact you can still recover the source image (assuming you know which pixels are still good), meaning no data was lost.
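A tiny numpy sketch of that argument (toy values):

  import numpy as np

  src = np.array([[1, 2],
                  [3, 4]])
  up = src.repeat(2, axis=0).repeat(2, axis=1)  # nearest-neighbor 2x upscale
  up[1, 1] = 99                                 # corrupt one of the duplicated pixels
  recovered = up[::2, ::2]                      # keep one "known good" copy per source pixel
  assert (recovered == src).all()               # the source survives intact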


Surely anything other than duplicating each pixel x times both horizontally and vertically (so 1 turns into 4, or 9, or 16, ...) adds information?

(This submission is going to have me reaching for my old textbooks.. about time really!)


No. Nearest neighbor, bilinear, bicubic, etc. are just encoding the same information in different ways.

You could add noise, or generate new details with an ai upscaler. That would create new information.


Ah, right, anything that's a function purely of what's in the image - no randomness, no external context/'knowledge' to interpret it semantically - is, as you say, 'encoding the same information in different ways'?

If it can be computed (deterministically) from the image alone, then it was already there.


Works in games too. Adaptive contrast will increase noise, but after a while games without it will look blurry and undefined.


It's like playing a familiar song on an old cassette tape on a modern system. The user will likely be tempted to crank up the treble, trying to recover content that isn't there. In the image-scaling example, the HF content wasn't there to begin with -- since you upsampled it, it couldn't have been! -- but there's a strong psychological expectation for it to be there.

With images, it's a bit easier to put the high frequencies back in via judicious use of edge enhancement, perhaps because you have two dimensions to work with rather than one dimension in the audio case.


A lot of smart TVs do that. Being able to spot it (played a lot with edge detection when discovering computer vision) is a curse.


I would put it like this:

Sharpening increases the high frequencies to cover the loss of even higher ones lost in downscaling.


Yes. Similar to pre-emphasis in baseband telecommunication standards in band-limited media.


> Problem #1: What about all the other features?

Is this necessarily a problem in the argument? If we consider e.g. color, and construct a blurry image that has a color distribution very similar to a real face, but doesn't fire the edge detection in the same way, we can still recognize it as a face. This could just mean that a sufficiently close match to other examples of the same class on any one of the strong "dimensions" of the image our brain processes is sufficient to make us recognize it, no? The author makes it sound like if edge detection is what makes us recognize line drawings, this implies that our brain discards every other feature in attempts to recognize visual input, but I don't think that's a sound conclusion.


The "Problem #2" that they mention also has an easy solution.

If the "intermediate" variable / internal representation is just "the input image but with edges only," then of course you can see internal representations.

When you compute that intermediate variable for the line drawing, it will just happen to behave like the identity function for that particular case. So if you have already filtered out non-edges then the transformation is basically a no-op.

The "types" mismatching as they mention is not a concern because the type is just "image" i.e. a big vector of HSL values or something. Edge detection is just a convolution filter so it's going to have approximately the same type as the input.


This stood out to me as well. If I close my eyes and walk into a wall I will still perceive the wall.

Our various senses help build an understanding of what's going on. It's not like we fail to understand anything once you remove one part of the system.

Maybe this person is trying to say something deeper or more nuanced and I fail to understand the meaning behind it.


Yes that stuck out to me as well. The author seems to be setting up a straw man, as if people are arguing that the brain can't distinguish between a line drawing and the real thing (or a line drawing and a photograph).

The other information isn't necessarily discarded. It's just used to identify that this is a line drawing and not the real thing. It's still remarkable that just the lines themselves (I make no claim as to whether it really is edge detection or something else) are still enough information to be able to identify the representation, but it doesn't mean the brain is discarding the other information.


Very frustrating article to read. The article is setting up a straw man and attacking it. He is acting like everyone else thinks:

1) edges are the only important features in images and 2) line drawings can only represent edges.

Who are these brainless absolutists that he is attacking?

Then he's acting like he is the only one with other bright ideas that nobody will listen to.

I think it is obvious to anyone who thinks about this that:

1) edges are a useful feature for recognizing objects in images but not the only useful feature, and 2) lines in line drawings can and often do represent edges, but there are a lot of other things they can represent: light and shading and texture of various kinds.

It would be fine to write an article that goes into depth on the different nuances, but it is annoying that this author pretends that most other experts have naive and simplistic views, with "uncritical certainty", and "no one seems to question it", and the author "has a hard time convincing them otherwise". It is a very condescending tone that comes off sounding like the author is presenting themselves as some brilliant but misunderstood outcast, and the only one who can see the light of truth.

We could do without the drama!


One of the problems right off the bat is not understanding that the classical "edge detection" algorithm doesn't actually detect edges. It detects rapid changes in contrast. To then claim that this computer algorithm's flaws are somehow proof that a psychological theory is wrong is itself a category error.


The author comes across as ignorant at best, but then presenting his own work as the Realism Hypothesis of Hertzmann leans more towards arrogance.


There is a subset of the tech bro that believes everything can be reduced to a problem with clearly defined taxonomy and as such every problem can be solved by an engineer with no subject knowledge. This article very much reads like one of those people wrote it.


To me that’s the very definition of tech bro.

My favorited comments on this site are mostly this phenomenon. It’s annoying because there’s a built-in default assumption that one is such a great thinker there’s no need to waste time seeing if an expert has already solved the problem. One of my favorite examples is the software engineer that spent significant time testing his shower mixer valve and writing a “manifesto”[1] on how to make a better one. A few minutes of googling would have led him to realize why a mixer valve might have such a wide range (inlet temperatures and pressures are not a given) and also to the actual, existing solution (thermostatic mixing valve).

[1] https://news.ycombinator.com/item?id=34611335


Yeah, I have been guilty of that myself sometimes. This XKCD is a reminder to me: https://xkcd.com/793/


A manager once told me he'd never hire a PhD because once they complete the specialized work we hire them for, they inevitably get put on something outside their specialty - like your linked xkcd - and then their acceptance as experts along with that behavior causes real problems.

Another time I had an older PhD moved to my area (outside his) where we were trying to meet a number of objectives. He said in a meeting that "it is mathematically impossible" to achieve one of our performance goals. I quietly went back to the lab and ran my new control algorithm and documented hitting that goal. Never refuted him, just filed the incident away in my head.


Yeah, bad PhDs, bad!

Edit: People being arrogant or know-it-all is probably not especially correlated with having obtained a PhD, but more with overall frame of mind, and I find this comment to be a uselessly negative ad-hominem.


Also people bringing this up remember the one time the PhD was wrong, while discounting the 99 times the PhD was right and kept them from doing a lot of fruitless work.


Sorry, but I think it's correlated in two ways. One is that very bright people, which I think includes most PhD-havers, are especially used to being right. When they have the rare experience of being ignorant and wrong, they may struggle with it much more than others. Two, academia is a bubble. I think that's great; I love that we have a place where people who are deeply interested in something can focus entirely on that. But it necessarily means that they're less likely to know about things outside that bubble.

That's not to say it's a perfect correlation. I know plenty of people with PhDs who don't have the problem in the XKCD cartoon. But I too am careful hiring PhDs in tech jobs. Professional work is just very different than academic work. It takes time to learn it for people whose main focus is the theory. After all, "In theory, theory and practice are the same. But in practice..."


I once worked with a PhD who claimed that basically any novel bit of coding was a "research problem", and thus not worth bothering with. Using a hashtable to speed up an algorithm? Research problem. Using raw TCP instead of HTTP for a long-running connection? Research problem. Implementing a graph algorithm you could read up on Wikipedia? Research problem. I think it was only when I solved three of those "research problems" in one week that he finally shut up.


But you actually don't have a proof you solved these tasks! Where is the arxiv preprint? Make sure all LaTeX is syntactically correct, and double-check that your chosen citation style is according to its latest edition!


I would be interested in your clearly defined taxonomy of tech bros.


>Yet Lines-As-Edge supposes that the vision system discards all of this other information present in an image, for just this one special case. Why?

The "other information" is not discarded, it's just not processed yet

I think this is a misrepresentation of how the visual system works and views the vision system as more "batch" than the continuous process it is.

So if you think about it as a time-series problem: when light hits the retina, "inference and processing" starts with a kind of "fast and rough" inference, then proceeds to fill in details, and contextual coherency follows.

I'd have to go pull out the textbooks, but if you look at the visual interpretation sequence it's something like:

Movement > edges > color > details

So your visual system acting as an object detector - from sensor to inference - makes inferences about movement first, then infers the edges, then infers color and finally additional details in the last few ms

Nothing is "discarded"; it's just less relevant in the first pass, and additional refinements happen mostly sequentially - this all happens in nearly imperceptible time.


As a person that has been practicing and studying line drawing for 15+ years, this article seems like it's way off the mark. It's just not asking the right questions.

The brain uses all kinds of context-sensitive cues to try to link what it's looking at to what it's already seen before. If you happen to look at something that's completely new, it doesn't matter if it's a line drawing, a 3D 4K image or whatever; your brain will be confused. OTOH, when you're looking at something that you've seen before, the brain will do all kinds of tricks and cheats to make that thing as real as possible.

Anyone that draws will know you can use plain black and white lines to depict everything from texture to depth to color.

Line drawing works because literally anything will work. You could see a distorted, flat black silhouette of one of your family members, and you will instantly know who it is. It's your brain that "makes" things work because its job is to take incomplete information and make it fit an existing mental model.

Line drawings carry so much visual information about objects that it's really not that impressive that your brain can "figure out" what it's looking at. It can do so much more with so much less.


The last picture in the article is an interesting counter-example: a line drawing that's constructed in a fairly natural way that is nevertheless almost impossible to interpret. I'm sure there are "line art rules" that picture breaks which are almost second nature to someone who's been in the game for as long as you have; the interesting thing is how that translates to the visual cortex and why those rules work.


That one isn't a line drawing, it's a shadow relief drawing with depth removed, with outlines then applied to the shadows.

Even then, the only reason why it's so "confusing" to us is because they applied some kind of random black shape to the background which extends above his head to the left.

The black background blends with the man's body, making an amorphous blob. If it were removed you'd be able to instantly tell that's a bald man's head.

You are correct about the "line art rules": they're all just basic visual rules that the brain uses, which artists have been playing with for hundreds of years. It's no different than those illusions that look like a grandma and a young woman at the same time. All it takes is one conspicuously placed shape and your whole perspective can be thrown off.

No different from how camo works: https://en.wikipedia.org/wiki/Disruptive_coloration

Anyway, my point being, this isn't some new phenomenon; it's well understood at this point.


The brain's ability to do "shadow removal" is really impressive. You can see it happening in the final example depicted in the article. I think this ability is what was used to explain "the dress". It also comes up in a classic optical illusion [0] where a checkerboard of light and dark gray squares has an object sitting on it and casting a shadow. The shadowed light gray squares are exactly the same shade as the unshadowed dark gray squares, but if you point this out to someone, they'll have a really hard time accepting it because they are so good at unconsciously accounting for shadows.

[0] https://en.wikipedia.org/wiki/Checker_shadow_illusion


Hey thanks, this one is new to me. 100% it is impossible to convince myself that those two squares are the same color.

I have a camera looking over my driveway with motion detection that triggers all the time because of the shadow of a tree waving in the wind. My current weekend project is to use FastSAM to detect cars / animals / people in the driveway instead of just looking at a threshold of changed pixels.

1: https://github.com/CASIA-IVA-Lab/FastSAM


It's because the squares are not the same color.

The alternating squares are different colors... in absence of a shadow.

Your brain is adjusting for the shadow to correctly identify the "true" color.


Eh. To me it looks like the author extends the "lines-as-edges" explanation of why we can understand line drawings to the claim that the visual cortex works almost only on edge detection, which is obviously bunk and so easily refuted by him. Then later he shows a shillouette image we can understand even though the edges themselves are not understandable, and somehow this is supposed to be evidence against lines as edges. No, it just means that we have other methods of understanding images: shillouette, color, texture etc.

In my opinion these kinds of arguments trying to decompose the brain's functioning into a couple of distinct techniques become quite obviously pointless once you look at the activation patterns of neural networks. Just looking at the features that neurons detect throughout the different layers of an image-classifying neural network tells you more than these kinds of papers ever will.

You see that edges, shillouettes, circles, circles with holes, textures, shininess patterns, grid patterns etc. up to complex patterns and then real things like heads or arms are detected.

There is some manifold of all the images a being is somehow likely to see in this world and it has a complicated structure. You can extract the major features of the geometry of this manifold and you come across the usual patterns. At simple complexity you find things like edges and textures, at higher complexity things like eyes or appendages. You try to find these features in an image and hope one of them works well. Maybe edges work, maybe shillouettes work or maybe both or neither.

Look I know ANNs and NNs are quite different, but the experimental evidence with NNs shows that what I described above, a mix of feature detectors that just approximate the structures of the data to deeper and deeper detail and are just all somehow applied to see what works, is much more plausible than some constructed algorithm a neuroscientist or philosopher would write down.


> Look I know ANNs and NNs are quite different, but the experimental evidence with NNs shows that what I described above, a mix of feature detectors that just approximate the structures of the data to deeper and deeper detail and are just all somehow applied to see what works, is much more plausible than some constructed algorithm a neuroscientist or philosopher would write down.

Precisely. Deep CNNs were directly inspired by studying the structure and behavior of visual cortical tissue.

The algorithm is just recursive feature detection, where the visual elements are transformed into abstractions, based on both sensory input and projected expectations. If anything, that's the real exciting part of the visual system, why do illusions occur, how does expectation affect perception, etc. Not "how do line drawings evoke similar concepts to images". That's just bypassing the first few layers of filters. Basic ass tiny MNIST nets can do this.


Thanks, was looking for this.


The word is spelled "silhouette".


"shillouettes" are what politicians use ;-)


thank you :)


I have a different hypothesis. I think line drawings are the representation of vectors. The details we focus on are the vectors with the highest magnitude of change, but that change is not always visual. For example, momentum often has an outsized representation in line drawings. Edges can be a high magnitude change as well, but it's not the only thing.


You’re on to something here. In the past I did a bunch of work on extracting line drawings from images, and the fundamental goal was always to vectorize the lines - then you get an abstract representation of the figure that you can scale up or down.


Interesting; can you elaborate? Also could you clarify what you mean by changes that are "not always visual"? How would non-visual information exist in a purely visual medium?


The example I mentioned was momentum. That's not visual information, that's extrapolated information of position over time. It can be represented in line drawings as motion lines in comics, for example. Interestingly, by simply implying motion, I hypothesize that the brain deprioritizes processing detail on the object that is implied to be moving, and focuses instead on the interactions that will follow.

If I were a researcher, my contrived test of this would be to simply have people recreate drawings of "static" objects, and have others recreate drawings of objects implied to be in motion.

Other non-visual information would be emotions. The shape of eyes and mouth lines is highly critical to conveying emotion. I suspect that people's interpretation of emotion directly impacts how strongly the emotional representation of those parts of the face would be drawn. For example, if a test subject is told to draw the face of a model in front of them, but they are told the person is experiencing an emotion, I hypothesize that the group of people who are told the person is happy would more frequently bias their interpretation of the eyes, eyebrows, and corners of the mouth towards a "happy" representation than those who are told the person is experiencing great inner turmoil.

To be clear though, I'm not saying we only draw based on non-visual information. I'm saying the sum total of all vectors has an influence on the drawing. Colors, in my opinion, have as much of an impact as edges. And it would be interesting to compare the drawings of a person with less common color sensitivities to more common color sensitivities.


Wouldn't a simple explanation of line drawings be that segmentation of images into shapes is an important part of vision, using whatever information is available? The reason it would be somewhat invariant across color and lighting is that those change so much. (e.g. we can see by moonlight or faint illumination, when the color signal is absent)

In some cases, simple segmentation fails (like with the shadowed face in the article), and you have to rely on other features.


As someone (probably like many here) who graduated from a university which taught computer vision and peripheral neuroscience courses, with such titles as "Computational Neuroscience of Vision", I always felt that trying to understand the human brain as a kind of algorithm was a bit of an artefact of how computer scientists approach biology.

The truth is the visual cortex is vast, and not sufficient to explain the human classification and perception of objects visually. Never mind individual neurons or edge perception. Edge detection is an interesting isolated example for study and learning, but you will never come close to explaining human recognition and cognition in such simple terms.


ah, someone who paid attention in lecture!

(incidentally, there’s a fairly deep literature of historians of science who have carefully documented that we describe ourselves as analogous to the most sophisticated technology of the day: see “to lose one’s temper”, “to blow a gasket”, “I got my wires crossed”, “sorry, cache miss”, … as metaphors and idioms of mental state through the centuries that reflect the cool tech of the time in which they were coined)


And also the universe itself is often seen through the lens of contemporary technology.

Are we living on an island that floats on a giant turtle’s back? Or are the heavens like giant clockworks? Or maybe it’s all a computer simulation?

These cosmological speculations are separated by thousands of years, but they are all simply a reflection of what the person finds most awe-inspiring in their everyday life.


Reasoning by analogy is one of the ways we solve the framing problem.

So, when explaining the universe we imagine it's an act of will by a conscious entity (ie., like how we invent). When explaining the mind we suppose it's like one of our inventions.

Absent an analogy of some kind it's quite hard to determine what features are salient. Objects have an essentially infinite number of properties.


> So, when explaining the universe we imagine it's an act of will by a conscious entity

Unless I’m misunderstanding you, that line of reasoning assumes one is religious.


Not necessarily, see for example Nick Bostrom's Simulation Hypothesis.

Maybe one could argue that too requires adherence to some religious dogma (Scientism? Reductionism?)


I don’t think that’s strictly required — atheists/agnostics can still “imagine” the universe is an act of will


We here means "our species".


And absent an understanding of electric fields and meteorology, that lightning bolt over there must have been hurled by Zeus!


Humans are the universe. It’s not surprising the universe uses available metaphors to explain to itself why it might exist.


Each of these are true in a sense. There's no turtle, but we are living on an "island" floating through space. The heavens do follow predictable, clockwork rules. Computer simulations are at least a good way to describe the universe.


The difference is turtles all the way down was literal, but heavenly clockwork and computer program perspectives are clear metaphors.


> turtles all the way down was literal

Was it? What evidence do you have for that? If anything it sounds like the kind of verbal slapdown someone in authority would subject a smart alec to. It is short. Easy to understand. And it closes off that kind of questioning.

I would be very surprised if someone had considered it the literal truth, but of course I have seen stranger things.

> heavenly clockwork and computer program perspectives are clear metaphors.

I don’t know about the clockwork. You will need to find someone who talks about that and ask them if they meant it as a metaphor or not.

On the other hand I know about the computer simulation one. That for me is not a metaphor. I seriously think that it is within the realm of possibilities that this universe we live in (including us) is a literal simulation.

There would be possible physical experiments which, depending on their results, could make me increase or decrease my confidence in that statement. But I don’t consider it a metaphor.

Now of course that is only the viewpoint of a single human, at a single point of time. So it might not matter much. But it shows that it is not that “clear” that everyone considers that view only as a metaphor.


> I seriously think that it is within the realm of possibilities that this universe we live in (including us) is a literal simulation.

Notice how you dropped the “computer” part. Without that qualifier, the “universe is a simulation” hypothesis goes back at least to Descartes and his evil demon [1].

That’s the GP’s point. The “demon,” “clockwork,” and “computer” are just metaphors to help illustrate the point. Hundreds of years ago it was a trickster demon, now it’s computers - the simulation part is the same.

(The world floating on a turtle idea traces to the world turtle and several different creation myths, so it’s safe to say their believers took them a bit more literally)

[1] https://en.m.wikipedia.org/wiki/Evil_demon


Pretty funny to see a constructive comment downvoted faster than sweaty blather. Not a good day for HN.


> now it’s computers - the simulation part is the same.

I don't think so. I do understand this lineage of thought, and I agree with you that they are somewhat similar. But I must insist on saying that what I'm talking about is different.

The trickster demon metaphor talks about a being (you) whose senses are replaced by the demon. But that means there is a you outside of the demon/computer simulation.

I believe if this universe is a simulation, then I am part of that simulation. My mind is not an external observer hooked up to the simulation, but just matter simulated by the simulation according to the rules of the simulation. The thing Descartes was talking about is a Matrix situation. (Or rather, the creators of The Matrix were paraphrasing Descartes.) Neo thinks he is living his life, but in truth his body is lying in a pod in the goo. I don't believe in that. I don't think that is likely true. If this is a simulation then I (and you, and this computer, and all of the people, and the butterflies and the stars) am in the simulation. And not the way Neo is in there, but the way a cubic meter of Minecraft sand is inside a Minecraft world. Inside the Minecraft world it is a cubic meter of sand; outside of it, it is just a few bytes in the memory of some program.

Let me illustrate what I mean when I say that I don't speak about the universe being a computer simulation as a metaphor. Imagine that it is a simulation. What does this computer simulation have to do? Well, it seems that there are particles, and there are forces between them (gravity, electric, weak/strong nuclear force). In every iteration of this simulation it would seem that you need to calculate which particles are close to others, so you can update the forces on them, so you can calculate their new state.

To do this you need to inspect the distance between every two particles. That scales as O(N^2) with the number of particles N. If the universe is a computer simulation it probably runs on a computer of immense power. But even then N^2 scaling is not good news in a hot path. The funny thing is that if the universe you want to simulate is relatively sparse (as is ours), and has an absolute speed limit (as ours seems to have), then you can shard your workload into parallel processes. And then you can run the separate shards relatively independently, and you only need to pass information from one shard to another periodically.
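(A toy sketch of that sharding idea, in Python with invented details: if influence can only travel a bounded distance per step, you can bin particles into cells of that size and only compare neighboring cells rather than all N^2 pairs.)

  from collections import defaultdict

  def candidate_pairs(positions, cell_size):
      # bin particles into a grid of "shards"
      cells = defaultdict(list)
      for i, (x, y) in enumerate(positions):
          cells[(int(x // cell_size), int(y // cell_size))].append(i)
      # only particles in the same or adjacent cells can interact this step
      pairs = set()
      for (cx, cy), members in cells.items():
          for dx in (-1, 0, 1):
              for dy in (-1, 0, 1):
                  for j in cells.get((cx + dx, cy + dy), []):
                      for i in members:
                          if i < j:
                              pairs.add((i, j))
      return pairs  # far fewer than N*(N-1)/2 when the world is sparse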

Now if our universe is a simulation, and it is sharded this way, then you would expect anomalies to crop up on the shard boundaries, where the simulated matter is moved from one executor "node" to another. We could construct small spacecraft and send them far away (perhaps to other solar systems?). We would furnish these small automated spacecraft with sensitive experiments: microscopic versions of Newton's cradle, or some sort of subatomic oscillator, or a very precisely measured interferometric experiment. And the craft would autonomously check constantly that the laws of physics are unchanged and work without glitches.

If we don't see any glitches, then we shrug. Either we don't live in a computer simulation, or the computer simulation is not sharded this way, or the edge cases are very well handled, or the instruments were not sensitive enough, or the shards are even bigger (perhaps we should have sent the same experiments to a different galaxy?). If we see glitching, then we should try to map out exactly where and how the glitches happen, and that would be very interesting. And if we see glitching of this kind, that would increase my confidence in us living in a computer simulation.

Does this make sense? You cannot design an experiment to test a metaphor. It doesn't even make sense. But I think of this as a possible literal truth, in which case you can formulate hypotheses based on it and you can check those with experiments.

> so it’s safe to say their believers took them a bit more literally

I believe you. Did anyone ever propose to solve a famine by sending a hunting party to cut a chunk of the turtle's flesh? Or to send gatherers to collect the dung of the turtle to fertilise the land? Or to send holy people to the edge of the world, to peer down at the turtle to predict earthquakes? If the turtles are meant to be literal turtles these are all straightforward consequences. If nobody ever proposed anything like these, then perhaps the turtles were more of a metaphor?


The "we're living in a simulation" theory is silly and self indulgent. If it's a simulation, then a simulation of WHAT that is REAL and exists OUTSIDE of a simulation? You still have to explain THAT. It's just as stupid and self-justifying and needlessly complex and arbitrarily made-up as any religion.

That is different from "we're living in a computational medium", which doesn't claim it's simulating something else, and is the only level of existence. (i.e. Fredkin et al)


I’m sorry. Writing select words with all-caps and calling the idea names is not making your point more persuasive.

> You still have to explain THAT.

I see your point there. Sadly the universe is not obliged to be easy to understand. “If X then I have further questions, therefore not X” is not a form of logical reasoning I recognise.

What I am saying is that you can’t argue that we are not in a computer because that would bring up a host of questions.

> That is different from "we're living in a computational medium" which doesn't claim it's simulating something else

Interesting. The way I use these they are synonymous in my mind. I don’t claim that there is something else out there which the simulation mimics. If you have some state representation and some rules to describe how the state propagates, then I would describe a computer program which calculates new states based on the old one as a simulator. This is the sense in which I use the word when I say “we might be living in a simulation”. If this bothers you feel free to just imagine that I am saying “we might be living in a computational medium”.

> and is the only level of existence

Now, why exactly do you believe that? Why not 2 levels? Or 3? Why do you feel believing that there is only one level of existence is more justified than those other arbitrary numbers?


I believe the universe we live in right now is no different than a simulation. Subtle difference in belief, but I think it might have a big implication.


There is one key difference between reality and simulation. In reality you have to spend energy to remove noise. In simulation you have to spend energy to add noise. Or perhaps more accurately, all objects interact in reality and energy needs to be spent to prevent interaction, while simulation requires energy to make objects interact.

But it’s even worse than it sounds at first, because you don’t just need to spend energy on calculating the interactions, which is superlinear in the number of objects; you must also spend energy to make it possible for the objects to interact in the first place.


This is an incredibly deep observation that essentially points to the problem with the representations we use to understand the Universe. It feels like the universe is essentially showing us that there is a non-supra-linear representation it uses (based on the kinds or fields of interactions?), and that calculating within this representation (between fields?) is somehow equivalent to calculating all of the interactions for the objects across all of the fields simultaneously.

Almost feels like it's related to P=NP or logic and meta-logic. Is it fundamentally impossible to use the same 'Universe'-al representation inside the Universe, a Gödel-like result limiting us only to the real? Or can we represent and run subsets of smaller universes within without a computational explosion? If so, does it eventually revert back to becoming fundamentally impossible at some limit, and if so, are we there yet? Can we measure how far from the limit we are, somehow?

Fun questions. Thanks for the provocative clarification.


Perhaps a foolish question but does “simulation” necessarily imply calculation or is that just an extension of our current evolution of computing technology as an analogy for what a simulation would be? I’m not convinced the one necessitates the other.


Oh, I don’t know. I mean conceptually a simulation is just a model that changes over some axis, time being a prime candidate. I’ve seen some goofy models that use an axis other than time to create some interesting visuals. There are definitely game makers playing with some of this stuff.

Calculation may be the wrong word for what’s necessary for a simulation, but I don’t think you can have a simulation without something analogous to computing. But the computation may look foreign; think analog vs digital computers. I mean, what would it mean to simulate something if you weren’t interested in finding some measurable thing? How do you separate the ability to observe the simulation and not be able to measure anything? I may be too steeped in engineering to be able to answer this, since the last thing I simulated was an analog circuit. But I also studied artificial life, and even there the goal was to learn something about life.


What I wonder about from your explanation is how a simulation knows where the noise is coming from. My feeling is that inside the simulation one is unable to differentiate the source of the noise.


You're not wrong. But I suspect you'd find inconsistencies if you looked hard enough. Situations where 2 things don't interact in some obvious expected way. And that's just the simple case. If you've played enough video games, you'd know that devs can easily create scenarios where there is no way to get the correct behavior between 2 objects without doing some pretty drastic changes to their game engine. (I play a lot of simulation-centric games.) Basically the number of ways you can poorly implement objects interacting with one another explodes pretty quickly. So that means the bar is pretty high for something living in a simulation to never notice irregularities quickly enough for the simulator runner to fix them, assuming the simulator runner is able to fix them at all.

I think about this a lot, and sometimes wonder if the edges of science can't be solved until some meta being comes along and implements that edge case. And then the edge cases get weirder and weirder. But really, I'm relying on my intuition of superlinearity when I think about this stuff, and I can see certain problems with simulations going to infinity faster than, say, the infinity of the infinite time argument that we must be in a simulation.


For the record I'm in the reflection of reality camp. I think the simulation camp is silly.


I think the reality as simulation camp gets one thing right - reality is virtual. Space and time don't exist, there is only information and relation.


[flagged]


That comment was a wild ride.

I'm curious if there is a way that I could phrase a polite request to you to ask if you're a human (that just happened to create your account 30 minutes ago to post this within the same minute) or if this comment was auto-generated.


Tell that to the people who subscribe to the simulation hypothesis.


Psh - everyone knows it's a flat disc balanced on the backs of four elephants which in turn stand on the back of a giant turtle.


I first heard Turtles all the way down in a spoken story by Kurt Vonnegut. Does anyone else have a source of the story or is that the source?


Quote Investigator traces variants back as far as 1626, though it evolves over time. A "rocks all the way down" variant dates to 1838, "tortoises all the way down" to 1854:

<https://quoteinvestigator.com/2021/08/22/turtles-down/>

The version I'd first heard attributed the story to a lecture by Bertrand Russell and an audience Q&A, though it seems clear that that couldn't have been the first instance.


Your source story sounds like the story I heard/read Kurt Vonnegut tell.


Agreed, though I don't believe that's the context I first heard it.

I've run across the Vonnegut variant more recently. I don't recall where or when I heard the earlier version for the first time, though I suspect it came up in conversation without attribution. Likely sometime ~1980 -- 1999.

That variant may well trace to Vonnegut, though I suspect it had been passed through numerous mouths and ears by the time I heard it.


Dr Seuss's Yertle the Turtle is a metaphor for Hitler.

https://en.wikipedia.org/wiki/Yertle_the_Turtle_and_Other_St...

>Seuss has stated that the titular character Yertle represented Adolf Hitler, with Yertle's despotic rule of the pond and takeover of the surrounding area parallel to Hitler's regime in Germany and invasion of various parts of Europe.[3][4] Though Seuss made a point of not beginning the writing of his stories with a moral in mind, stating that "kids can see a moral coming a mile off", he was not against writing about issues; he said "there's an inherent moral in any story" and remarked that he was "subversive as hell".[5][6] "Yertle the Turtle" has variously been described as "autocratic rule overturned",[7] "a reaction against the fascism of World War II",[8] and "subversive of authoritarian rule".[9]


“The world is on a turtle’s back” is Iroquois cosmology, at least.


Wow, thank you. I’d always thought this was a Pratchett thing.

I see it has a history in India and China too.

https://en.m.wikipedia.org/wiki/World_Turtle


How could at least two cultures without a communication line between them both come up with such a quirky idea? There must be some underlying truth to it.. I'm sold. World turtle is the answer.


Chinese writing began with turtle shells as well; that's why they mostly conform to a sort of grid system, with curves.


"The Turtle Moves"!


I'm sure they stepped on turtles to go back and forth across the bering strait.


> to lose one’s temper

Huh, never thought about that one before.

Linguistic stuff like this is fun to find; these days I mostly spot it via learning German as a second language, so the artifice in artificial intelligence becomes “Künstliches Intelligenz” where “Kunst” is artist and “Kunststoff” is plastic, and in Middle Low German “kunst” is knowledge and ability.

> coined

Deliberate choice to exemplify your point, or accidental because it’s almost impossible to avoid examples like this in modern English?


Something I've heard a few times is that computer "logs" refer to ships log books, but log books themselves refer to the actual wooden logs that would be thrown out of the back of ships to help determine their speed.


I always thought it was related to

https://en.wikipedia.org/wiki/Muhammad_ibn_Musa_al-Khwarizmi

Muhammad ibn Musa al-Khwarizmi

I think he was poring over tables of data when he worked out algorithms.


>wooden logs that would be thrown out of the back of ships to help determine their speed.

see also, knots


Modern submarines still have a "log" which is a pole and sensor that extends outside of the hull to measure the speed through the water and other important measurements.


Just an interesting connection, in English, "plastic" comes from Greek, via Latin (and Medieval Italian) "to mold". We see this meaning show up in phrases like "neural plasticity," which refers to the brain's capacity to learn, (re)grow, and make new connections (e.g. knowledge and abilities).


It's also used when talking about magma.


> where “Kunst” is artist and “Kunststoff” is plastic, and in Middle Low German “kunst” is knowledge and ability.

Kunst means art, and an artist is a "Künstler" in German. (And Intelligenz is grammatically feminine, so there is no trailing s in "künstlich" in "künstliche Intelligenz".) It's a difficult language.


Danke.

I was very surprised when I first found out about the local phrase, "Deutsche Sprache, schwere Sprache".


I think in the case of “to lose one’s temper”, the match isn’t with the technology of the day: the idiom comes from the early medical theory of the four humours, not from metal-working.

Origin of Temper, New Oxford American Dictionary:

Old English temprian 'bring something into the required condition by mixing it with something else', from Latin temperare 'mingle, restrain'. Sense development was probably influenced by Old French temper 'to temper, moderate'.

The noun originally denoted a proportionate mixture of elements or qualities, also the combination of the four bodily humours, believed in medieval times to be the basis of temperament, hence temper (sense 1 of the noun) (late Middle English). Compare with temperament.

https://en.m.wikipedia.org/wiki/Four_temperaments


So tempered steel is well-balanced steel, or mild-mannered.

See also the well-tempered clavier. In music, the temperament is an aspect of the tuning system relating to how the dissonances of different notes are balanced.


How did WTC not come to my mind? It is one of my favorite works! Maybe because it drives my spouse up the wall, so I listen to it less often than I’d like. My favorite recording is the Ishikawa from OpenGoldberg [0].

Reflecting on my response to “lose one’s temper” I can see how a straight line reading of etymology (as I proposed) might be misleading if the specific idiom did come or return from steel or string as an enhancement/extension to the original’s meaning.

[0] https://youtu.be/nPHIZw7HZq4


I think it is more that as technology grows it spreads its terms and contexts to the point of entering pop culture. I'm not sure if the populace uses terms simply because they have heard them before in the same context or due to an understanding.

I have to point out that P.G. Wodehouse is often used as an example of this style in recent "literature". I can't even figure out the words to describe the sources of his terms. Wodehouse uses terms from anywhere in English-language culture (including French, I think). The odd part about it is that Wodehouse's writings are so old I find it easy to miss the references.

I don't doubt we do this, but I expect it is no different than my love being as deep as the ocean.


This really has piqued my interest. Would you care to share an example or two, please?


It's hard to think of any concrete examples but I will set the scene as best I can and recommend listening to some of the 6- or 7-hour audiobooks narrated by Jonathan Cecil.

I just went searching through a bunch of quote lists to try to find examples.

I grabbed some just to show the breadth of metaphors and analogs used. What I think I realized is that some of the best examples are probably descriptions of the scenes.

I'm too young to know but apparently he was pioneering. I certainly find him funny with old world elocution.

I hope you feel satisfied.

His most famous character is Jeeves, the personal gentleman's gentleman of Bertie Wooster. Askjeeves.com was named for Jeeves. The stories are set in post-Great War England/the Continent, and Bertie is young and part of the leisure class.

Bertie is over-educated and deep into night life, pop culture, and sporting. The settings are always over-privileged people trying to work out their issues while Jeeves is the observer and advisor.

<snip> this I thought was good because of the use of props, underpinnings, bird, orphanage, payoff.

Bertie Wooster: I was standing on Eden-Roc in Antibes last month, and a girl I know slightly pointed to this fellow diving into the water and asked me if I didn't think that his legs were about the silliest-looking pair of props ever issued to a human being. Well, I agreed that indeed they were and, for perhaps a couple of minutes, I was extraordinarily witty and satirical about this bird's underpinnings. And guess what happened next. Jeeves: I am agog to learn, sir. Bertie Wooster: A cyclone is what happened next, Jeeves, emanating from this girl. She started on my own legs, saying that they weren't much to write home about, and then she moved on to dissect my manners, morals, intellect, general physique and method of eating asparagus. By the time she'd finished, the best that could be said about poor old Bertram was that, so far as was known, he hadn't actually burnt down an orphanage. Jeeves: A most illuminating story, sir. Bertie Wooster: No, no, no, no, no, Jeeves, Jeeves, you haven't had the payoff yet! Jeeves: Oh, I'm so sorry, sir! The structure of your tale deceived me, for a moment, into thinking that it was over. Bertie Wooster: No, no, no, the point is that she was actually engaged to this fellow with the legs. They'd had some minor disagreement the night before, but there they were the following night, dining together, their differences made up and the love light once more in their eyes. And I expect much the same results with my cousin Angela. Jeeves: I look forward to it with lively anticipation, sir.

<snip>

Jeeves: I hope you won't take it amiss, sir, but I've been giving some attention to what might be called the "amatory entanglements" at Brinkley. It seems to me that drastic measures may be called for. Bertie Wooster: [sighs audibly] Drastic away, Jeeves. The prospect of being united for life with a woman who talks about "little baby bunnies" fills me with an unnamed dread.

<snip> gaming the use of chip-in

Bertie Wooster: Oh, very well, then. If you're not going to chip in and save a fellow creature, I suppose I can't make you. You're going to look pretty silly, though, when I get old Biffy out of the soup without your assistance. <snip>

<snip> this has a few but is a good example of using the reaction of a character in a movie to describe oneself.

“I felt most awfully braced. I felt as if the clouds had rolled away and all was as it used to be. I felt like one of those chappies in the novels who calls off the fight with his wife in the last chapter and decides to forget and forgive. I felt I wanted to do all sorts of other things to show Jeeves that I appreciated him.” ― P.G. Wodehouse, My Man Jeeves

<snip> This is good because he uses Shakespeare.

Bertie Wooster: Well, let me tell you, Mr. Mangelhoffer, that the man that hath no music in himself is fit for... hang on a minute. [goes into the other room, where Jeeves is peeling potatoes] Jeeves, what was it Shakespeare said the man that hadn't music in himself was fit for? Jeeves: Treasons, stratagems, and spoils, sir. Bertie Wooster: [returning] Treasons, stratagems, and spoils. Mr. Mangelhoffer: What? Bertie Wooster: That's what he's fit for, the man that hath no music in himself.

<snip>

Aunt Dahlia: Oh, Bertie, if magazines had ears, Milady's Boudoir would be up to them in debt. I've got nasty little men in bowler hats knocking at my door.


Thank you very much for finding these! I find it surprising that I remember these quotes from the 90s Hugh Laurie and Stephen Fry television adaptation 'Jeeves and Wooster' - few TV programmes are faithful enough to their original book to include the dialogue verbatim!


That series was my introduction and it was truly brilliantly done. I consider it must-watch TV. They managed to preserve everything but the exact context of the stories. Fry and Laurie are what make that show work; I'm not sure anyone else could pull it off. I have seen other adaptations, B&W movies and such, and the magic is lost. That series is the exception that proves the rule.

I once tried to read him and found it difficult, but the audiobooks ended up being a good listen, and they're available on YouTube.

What the books seem to reveal is just how central Wodehouse is to modern comedy. There are obvious Wodehouse references in Seinfeld, such as the surname VanDelay.


When my legs give out from under me, I don’t shout “loss of hydraulic pressure!” like some kind of arthropod.

Yet we’re obsessed with framing ourselves as chains of matmul.


>as metaphors and idioms of mental state through the centuries

After listing a bunch of things from the previous century that barely stretch a bit further back. I would have been impressed if you had come up with a phrase from medieval tech, or Roman/Greek/Egyptian. Hell, I'd settle for pioneer-days tech to allow for "centuries". Otherwise, it just feels like modern-day analogies.


Prior to the scientific age, most theory of mind was religious and/or philosophical, and models typically ran to mind/body duality (Descartes), ideals and essences (Plato & Aristotle, generally), or "spirit" which had numerous associations, many of them textual, which was itself the great technology of the Axial Age in which that concept emerged.

Otherwise, in the scientific and technical era, you have the brain as computer, as AI, as homonculus (which really doesn't explain much), as mechanism or clockwork, as composed of parts (much as a factory or assembly line, I suppose), and the like.


> i would have been impressed if you had come up with a phrase from medieval tech, or roman/greek/egyptian

the dual metallurgical and psychological senses of “temper” are from the mid-14th c.


It's possible a lot of those are not even recognized as such anymore, because the usage has become commonplace and the original meaning was lost to time.


He's an open book?

I needed to let off some steam?


I was thinking of something more like "tied up in knots" or "wolf in sheep's clothing": things with pre-1800s relevance. "Open book" might be a little older, but surely there were phrases older than that.


Do these count?

As the weekend wound down

Back-handed compliment


Ooh, that's very interesting. How would I find the metaphors used before, say, the industrial revolution?


Now the technology is starting to reflect the biology (neural networks). Inception!


Give two very different things the same name and soon enough many people will believe them to be similar in nature.

If you repeat the lie often enough, it will become the truth.


Not really a lie, just biomimicry. The full name is "artificial neural network" after all.

It's a lot closer to biology than the steam engine or clockwork watch.


Abstract algebraic equations are not closer to biology than the steam engine or clockwork watch.

The latter are at least physical. Gradient descent and inference don’t resemble the physical mechanisms that drive neurons at all. Floats and integers aren’t even capable of representing a continuous voltage potential.


There is zero relation between a "neuron" in neural networks and a real neuron. It's entirely marketing. The field of "AI" has always been quick to market their work as magic rather than be open and honest about the reality, which is why we have been through at least 3 AI winters already.


Well, like evolutionary algorithms, the thing was designed to replicate some features of the natural one.


> cache miss

Ooh, I like that one. Stealing it


>trying to understand the human brain as a kind of algorithm was a bit of an artefact of computer scientists as they approach biology

I think the advances in neural networks over the past few years have shown that the failure of such an approach was mostly a matter of scale. Trying to reduce the visual system into a few kilobytes of code is of course a fool's errand, but trying to emulate it with ~10^11 parameters is looking much less foolish.

"The brain is a computer" is a stupid analogy if you think of a computer as a scalar or vector machine, but it's much less stupid if you're thinking in tensors.


Especially not purely RGB cameras. There’s a reason why you automatically focus on something that’s moving or fluttering. I think DVS (event) cameras would have bridged a huge gap in perception sensing, but unfortunately there’s not enough demand for them to scale, so most manufacturers dropped them.


As someone who minored in neuroscience, I took away that edge detection is actually quite important to the way your vision works. Google search "center-surround receptive field of retinal ganglion cells". This happens in your eye, before the signal even enters the optic nerve to go to the brain. The brain itself is not detecting the edges; its input already has that information.

I was also struck by the similarity between the way your cochlea (the organ in your ear that picks up sound waves) functions, and the way a Fourier transform works. They both transform the signal into the frequency domain, but your cochlea does it via its mechanical properties rather than by convolving the signal with a bunch of sine waves.
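For anyone curious what that center-surround behaviour looks like computationally, here's a minimal sketch (my own toy illustration, not from any paper): model the receptive field as a difference of Gaussians, so uniform patches cancel out while intensity edges leave a strong signed response. The sigma values are arbitrary.

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def center_surround(image, sigma_center=1.0, sigma_surround=3.0):
        """Difference-of-Gaussians approximation of an on-center/off-surround cell."""
        center = gaussian_filter(image.astype(float), sigma_center)
        surround = gaussian_filter(image.astype(float), sigma_surround)
        return center - surround  # near zero on uniform patches, large near edges

    # Toy input: a dark square on a light background.
    img = np.full((64, 64), 200.0)
    img[16:48, 16:48] = 50.0
    response = center_surround(img)
    print(response.min(), response.max())  # the strongest responses hug the square's border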


>The truth is the visual cortex is vast, and not sufficient to explain the human classification and perception of objects visually.

There have been experiments which have precisely located individual neurons and sets of neurons responsible for the first layers of image recognition, i.e. a neuron that fires when a specific spot on the retina is stimulated (a single pixel), and neurons that fire to detect lines. This is not theoretical but actual probing of living brains. I'll find the paper(s) later.


Agreed. There certainly is, in CS, no shortage of engineers cramming square pegs into round holes.


Great way to put it

The fields of Cognitive Science would view humans as proto-robots

I'd love to revitalize the field of "cybernetics", because it really answered all of this long ago.


Huh, right up my alley... a while back I was playing with trying to turn photos into sketches ( https://hachyderm.io/@bazzargh/109928618521729073 ); one of the effects I found really noticeable with my super-naive approach was that it tended to overemphasise very dark areas in a way we don't see.

Then last week I saw this, which also does some fill-in shading (but using ML). Worth the click: it's a project that makes it appear as if paper held over an object causes it to be sketched... but in fact it's all post-processing: https://www.youtube.com/watch?v=vArIkzYtW6I

This got me back to wondering about the shading problem, and I ended up down a rabbithole reading papers like https://www.yorku.ca/rfm/pub/2021annrev.pdf (review article on state of the art), https://www.frontiersin.org/articles/10.3389/fpsyg.2022.9156... (recent markov model, with links to code) about lightness and brightness perception, and how our mental model of how the scene is lit can explain a bunch of optical illusions.

While I was going for a charcoal effect, this approach https://openprocessing.org/sketch/486307 - scribble using brightness as 'gravity' on the pen - is pretty nice.

Anyhoo, I'm not going to critique the article, because I'm a total amateur just doing this for fun, but it _is_ fun, and very satisfying when you get the computer to draw something that looks hand-drawn instead of just a Sobel filter.
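For reference, here's a minimal stand-in for what I mean by the super-naive approach (just the idea, not the code behind the link above): take the gradient magnitude of brightness and invert it, so strong edges become dark strokes on white paper. Dark, noisy regions light up everywhere, which is one way this over-emphasises shadows.

    import numpy as np
    from scipy.ndimage import gaussian_filter, sobel

    def naive_pencil_sketch(gray):
        """Invert the brightness-gradient magnitude: white paper, dark strokes at edges."""
        gray = gaussian_filter(gray.astype(float), 1.0)    # mild smoothing first
        gx, gy = sobel(gray, axis=1), sobel(gray, axis=0)  # horizontal / vertical gradients
        mag = np.hypot(gx, gy)
        mag /= mag.max() + 1e-8                            # normalise to [0, 1]
        return 1.0 - mag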


Figure 4 in the article [1] is absolutely fascinating and seems to prove conclusively that the edge detection hypothesis is completely misguided: when one retains only the edges (B) of the original image (A), all meaning is lost and the resulting image is unrecognizable.

What A has that B lacks is shapes. There is obviously a rotund shape of the face that is reconstructed by the brain, that one can almost see, although technically it's not actually present in A. Not present, yet visible. Same for the eyes, smile, cheekbones.

It must trigger some kind of pattern-matching in the brain.

This problem seems to be similar to the famous optical illusion of the old woman / young woman [2] that works well also in strict black and white.

In this optical illusion there is surprisingly little information on the image, yet it can trigger two very different representations (that one can see alternatively, but never at the same time).

I think the brain tries to fit the shapes it sees in one box or another, and when it finds a box it builds a whole concept around it.

I also think that boxes have to pre-exist or be learned: they can't be inferred from an image, if one has never encountered the original representation in the wild.

One piece of evidence for this is that children are completely blind to optical illusions that have one "innocent" representation and one involving some kind of nudity or sexual activity, while adults tend to see the NSFW version first.

And so, to come back to the original question of why line drawing works, I think it's because it triggers concepts. The words "square" and "circle" are unambiguous and designate precise geometric shapes (provided one has learned the relevant concepts of square and circle).

Same with shapes, except that there are an infinite number of different shapes that we can "discuss" using the language of line drawings.

[1] https://aaronhertzmann.com/images/howtodraw/sayim.jpg

[2] https://cdn.mos.cms.futurecdn.net/rQkQZ6pDZbEHz23rxckWPm-320...


Are there any studies on what, if any, animals respond to line drawings (and to what extent)? In particular, do chimpanzees and other apes closely related to humans?

It seems somewhat plausible that “lines-as-edges” was a foot in the door for something that specifically evolved in humans as our ancestors began using paintings and drawings for communication. Maybe it was initially just hijacking edge detection so that some images could be conveyed through drawings, and over time that developed into a kind of “grammar” for artificially depicting more nuanced images in media where realism was impossible.

This would be similar to how recognizing and interpreting different kinds of basic vocalizations (as many animals can) developed into a much more sophisticated mechanism for developing complex language.


I am not a computer vision specialist, but I wanted to add an anecdote from years of parenting.

One thing that amazed me about babies was how early they understand line drawings. Long before a baby can talk, it knows what a dog is (and can imitate a dog sound), and can also identify a dog in a photo, in a realistic drawing, and in a very simple line drawing.

It seems so easy, but in reality those things all look so very, very different. And after 6ish months, babies have mastered that recognition.


The article casts doubt on edge detection as the major reason why we can interpret drawings as what they attempt to portray, because drawings are often missing shading and color.

Drawings also often drop a great amount of detail, and proper proportion information, and remain recognizable.

Think of cave paintings or Picasso’s bull.

But if you consider image interpretation as a competitive classification, then missing information doesn’t present a problem.

If color, shading, detail & proportion are missing, then they are missing from all possible pattern interpretations equally. That leaves the final classification problem relatively unchanged despite all the missing info.
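A toy sketch of that point (made-up feature vectors, purely illustrative): score an observation against class templates using only the features that are present. Dropping a whole channel, say colour, drops it from every candidate at once, so the winner often stays the same.

    import numpy as np

    templates = {                      # [edge-ness, colour, shading] per class (hypothetical numbers)
        "apple":  np.array([0.9, 0.8, 0.6]),
        "pear":   np.array([0.7, 0.6, 0.5]),
        "banana": np.array([0.3, 0.9, 0.4]),
    }

    def classify(observation, mask):
        """Pick the template closest to the observation on the observed features only."""
        return min(templates, key=lambda k: np.sum((templates[k][mask] - observation[mask]) ** 2))

    obs = np.array([0.85, 0.75, 0.55])                    # something apple-like
    print(classify(obs, np.array([True, True, True])))    # all features  -> "apple"
    print(classify(obs, np.array([True, False, False])))  # edges only    -> still "apple"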

EDIT: in fact, if my hypothesis is true we should be able to see patterns with even less information!

For instance, dropping all internal detail, and even most shape. As in seeing a face profile on the side of a cloud that otherwise looks nothing like a human head.

Or dropping edges completely. Perhaps seeing an object where there are only stars creating points of light.

Please post your anecdotal experiments at 404experimentreports.com!


This is fascinating!

I remember reading that most optical illusions don't work on people raised in remote tribes in the Amazon, as their visual perception has been 'fine-tuned' for jungle contours, instead of the straight lines in the west.

Is it possible that we _learn_ how to perceive line drawings in our early years?


I think so and this is true about music as well as other arts.


My completely uninformed hypothesis is that it is edge detection, but on a depth map rather than a color map. A figure in the article even shows a situation where a depth sketch succeeds while edge detection on a color map fails


Kinda both, would be my guess. The brain reconstructs depth from visual information, so where the visual edge can be interpreted as a depth cue, you're faking out that mechanism. It also explains why the pure edge detection image just looks noisy, especially in the hair: most of the edges are effectively incidental colour shifts that don't provide depth information.
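A quick way to see that difference with synthetic data (my own toy example, nothing from the article): apply the same gradient operator to a depth map and to the intensity image of a textured scene. The depth gradients are confined to the occluding contour, while the intensity gradients fire all over the texture.

    import numpy as np
    from scipy.ndimage import sobel

    def edge_magnitude(channel):
        gx, gy = sobel(channel.astype(float), axis=1), sobel(channel.astype(float), axis=0)
        return np.hypot(gx, gy)

    # Synthetic scene: a foreground disc in front of a wall, both covered in texture.
    yy, xx = np.mgrid[0:128, 0:128]
    disc = (xx - 64) ** 2 + (yy - 64) ** 2 < 40 ** 2
    depth = np.where(disc, 1.0, 3.0)                                     # two depth planes only
    rng = np.random.default_rng(0)
    intensity = np.where(disc, 0.6, 0.4) + 0.2 * rng.random((128, 128))  # plus random texture

    print(np.count_nonzero(edge_magnitude(depth) > 0))      # nonzero only around the disc's rim
    print(np.count_nonzero(edge_magnitude(intensity) > 0))  # nonzero nearly everywhere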


> Why do line drawings work? Why is it that we can immediately recognize objects in line drawings, even though they are not a phenomenon from our natural world.

This is wrong, as humans have been drawing and viewing art for tens of thousands of years. Also, for other animals, a form of edge detection is at work in, for example, the paw mark that a tiger leaves on the bark of a tree to mark territory, or simply paw/foot marks on the ground.


Line drawings work because they represent the centers of symmetry for surfaces and volumes. They trigger the same center-neurons that the original shape would.

A hand could appear on your retina in different sizes and orientations. Depending on distance, the size will grow and shrink. But the center of symmetry will stay the same.

This goes further. There's also a center of symmetry between edges, and higher-level features as well. Our brain has no issue detecting these.
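One way to make "center of symmetry" concrete is the medial axis (skeleton) of a filled silhouette: the set of centers of the maximal discs inscribed in the shape. This is just my reading of the idea, not a model the comment specifies; here's a minimal sketch.

    import numpy as np
    from skimage.morphology import medial_axis

    # A filled rectangle as a stand-in for some silhouette.
    silhouette = np.zeros((60, 100), dtype=bool)
    silhouette[20:40, 10:90] = True

    # medial_axis returns the skeleton and, optionally, the distance to the boundary,
    # i.e. the radius of the maximal inscribed disc centred at each skeleton pixel.
    skeleton, distance = medial_axis(silhouette, return_distance=True)
    print(skeleton.sum(), "skeleton pixels; max inscribed radius:", distance.max())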


The human brain also triggers on soft features.

It's a combination.


A drawing of a cup has 'cup-ness', as far as our mind is concerned. It's not about the micro-mechanisms of edge detection or color or whatever; it's about how we recognize the quality of a cup in real cups, in things that push the margins of what a cup is, and in drawings of a cup.


There was a science fiction story of a race of aliens that had to draw what they found instead of photographing it. A photograph was flat and uninformative to them, almost completely lacking in meaning. Their science was based on interpretation at the root.

An interesting concept to put in a story anyway.


Drawing lines around semantic segmentation, such as by using Segment Anything [1], seems to make a lot more sense than just doing edge detection on image brightness.

[1] https://segment-anything.com/
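A hedged sketch of what that could look like: given binary masks from any segmenter (e.g. the per-object masks Segment Anything produces), trace only their boundaries. The mask-generation step is stubbed out below with hand-made masks; the contour drawing uses standard OpenCV calls.

    import cv2
    import numpy as np

    def masks_to_line_drawing(masks, height, width):
        """Render a white canvas with black outlines traced around each binary mask."""
        canvas = np.full((height, width), 255, dtype=np.uint8)
        for mask in masks:  # each mask: (H, W) bool/uint8 array from your segmenter of choice
            contours, _ = cv2.findContours(mask.astype(np.uint8), cv2.RETR_EXTERNAL,
                                           cv2.CHAIN_APPROX_SIMPLE)
            cv2.drawContours(canvas, contours, -1, color=0, thickness=2)
        return canvas

    # Toy usage with two hand-made masks instead of real segmenter output.
    m1 = np.zeros((200, 200), dtype=np.uint8); cv2.circle(m1, (70, 100), 40, 1, -1)
    m2 = np.zeros((200, 200), dtype=np.uint8); cv2.rectangle(m2, (120, 60), (180, 160), 1, -1)
    drawing = masks_to_line_drawing([m1, m2], 200, 200)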


Figuring this out is one of the central questions of phenomenology... how is it that after seeing a couple of trees, we can recognize almost any kind of tree from any angle? What is intrinsic about a tree that makes this possible? Is there a pure form of a tree, and if so, what is it?


Since definitions aren't concrete, and language is explicitly a popularity contest, you are not learning "what makes a tree a tree", but rather "what people call a tree".

You are re-deriving an existing definition, not an actual physical reality. This is a great example, actually, because there is no genetic or biological definition of a tree, and what we call a tree is matched better by some idealized representation like "has a trunk, maybe". Also, the definition you develop is rarely rigorously tested in practice, so you only need about 80%, and basic pattern matching can get you that really easily.


Line drawings are just easy to produce, historically. This makes them embedded in our culture, and kids learn to read them from an early age.

The same skepticism could be raised towards letters and words.

If we had invented the photo camera before the paint brush or pen, things might have looked different.


This was a very interesting read.

From childhood we instinctively draw lines, but I never questioned how we produce them from real objects, and why they work so well at representing real objects. It’s one of those things we just take for granted.


Re: problem 2, running edge detection twice will still give a recognizable result. Maybe some functions DO let you take the result of an internal layer, feed it back to the input, and not get complete nonsense.
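A quick way to try this at home (using OpenCV's Canny as a stand-in for whatever the visual system does, so only a loose analogy): detect edges, then detect the edges of the edge image. Each line from the first pass roughly becomes a pair of thin parallel lines in the second, and the overall shape stays readable.

    import cv2
    import numpy as np

    img = np.zeros((200, 200), dtype=np.uint8)
    cv2.circle(img, (100, 100), 60, 255, -1)       # a filled disc as the "scene"

    edges_once = cv2.Canny(img, 50, 150)           # outline of the disc
    edges_twice = cv2.Canny(edges_once, 50, 150)   # outlines of the outline
    print(edges_once.sum() // 255, edges_twice.sum() // 255)  # edge-pixel counts per pass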


Whilst reading Dr Seuss to kids I observed that even at a very early age they can mimic the expressions on the cartoon characters’ faces, even when they’ve never seen that drawing before.

An easy experiment to reproduce.


When a plane is receding from us, it is more likely to contain color variation, which our eyes see as a line. It is more likely to contain color variation simply because a plane receding from us has more surface area relative to a unit of space in our visual cortex, versus a plane we are viewing along its normal vector. The edges of a 3D object tend to be receding planes, hence we evolved to detect this.


I don't understand the essay at all, perhaps because I'm not a domain expert. Edges are clearly a very strong signal in the visual system -- as the article points out, edge detection is one of the first things that happens in the visual pathway from the eyes. However, edges are clearly not the only signal, as the article demonstrates with cross-hatching, colour, and so on (and is also obvious to presumably all sighted humans).

If you remove edges as a signal -- say, by taking the coloured apple in the article, removing the lines, and blurring the colours around the silhouette -- you'd probably still recognise an apple, but not as quickly. For the same reason, if you defocus your eyes (or take off your glasses -- the popularity of these is a strong indicator that contour matters), you'll have more difficulty navigating, even though many signals (colour, shade, depth) are still present. Clearly edges are important, but are not the only thing we're working with.

Optical illusions also don't invalidate the hypothesis, because they almost by definition rarely occur in nature. Similarly, objects in extreme shadow don't invalidate it either, because, e.g., we are quite capable of recognising half a lion as a lion.

I think possibly there's a difference in interpretation here. The claim the author has an issue with is (his phrasing): "the lines in a line drawing are drawn at natural image edges, where an edge receptor would fire. These lines activate the same edge receptor cells that the natural image would. Hence, the line drawing produces a cortical response that is very similar to that of some natural image, and thus you perceive the drawing and the photograph in roughly the same way."

But the paper the author links to doesn't say that. It says "The likely explanation is that lines trigger a neural response that has evolved to deal with natural scenes." It's not claiming that line drawings and photographs are perceived in "roughly the same way", only that we evolved to recognise outlines and line drawings are outlines or at least contain outlines.

So problem 1 (what about all the other features), problem 3 (what is the benefit), problem 4 (visual art isn't just line drawings), and problem 5 (edge detection is not a line drawing algorithm) don't seem to really address the hypothesis.

That just leaves problem 2, "we can't see internal representations". I'm not sure what the statement is here. The author writes "The idea is that we have neurons that activate for object contours and similar, and that line drawings directly activate these neurons. Lines-As-Edges is a special case of this hypothesis. I don’t understand this claim at all." But that claim, as a hypothesis, seems very reasonable? The author seems to be saying that you can't bypass all the neuronal machinery to get directly to the contour-recognising neurons. But that's not true if all you're bypassing is the stuff to isolate the feature of interest (edges, in this case).


Here's a link to the paper that contains the "my hypothesis" passage; the link in the article runs into a paywall.

https://arxiv.org/pdf/2002.06260.pdf


>A classic answer to this question is what I will call the Lines-As-Edges hypothesis. It says that drawings simulate natural images because line features activate edge receptors in the human visual system.

Why does the explanation need to go in that direction?

Line drawings have a likeness to the thing being depicted.

They're a crude representation of it (compared to a photograph or a photo-realistic oil painting) but are nonetheless a representation.

That's why they work, in the sense of people understanding what they show: they share similar patterns with the things being shown. And we are pattern matchers.

In this case the patterns are edge patterns, but they could just as well be non-edge patterns. Imagine a color drawing of a human face where the ink-drawn edges have been removed, and it's just blocks of color for the head, the eyes, the pupils, the mouth, the nostrils, the ears, etc. We could still tell it's a face, even if we applied some Gaussian blur to those blocks.

>The most basic statement of the problem with Lines-As-Edges is that the human visual system isn’t just an edge detector. You can see colors, you can see absolute intensities. You can tell the difference between a thin black line and the silhouette of a dark object against a light background; we have both kinds of receptors in the primary visual cortex, as well as others. Yet Lines-As-Edge supposes that the vision system discards all of this other information present in an image, for just this one special case. Why?

Isn't this taking things backwards?

It's not the vision system which "discards all of this other information present in an image" in our regular operation.

Rather, it's the line drawing which does away with (discards) all of this other information and focuses only on a thing's edges.

In other words, our visual system has capabilities A, B, C (say edge detection, color, 3d placement, etc). And, a line drawing gives it only A - which is still enough.

When our visual system also gets B and C, it can perceive objects even better. But for merely identifying something, A is apparently enough.

What I described here is totally compatible with Lines-As-Edges hypothesis, and makes sense too, so I don't see where the author's issue is, and why he thinks the fact that "the human visual system isn’t just an edge detector" invalidates the lines-as-edges hypothesis.

>Now you get a sense of the color of the object, and not just its outlines. How would one generalize Lines-As-Edges to account for these different types of depiction? The visual system is no longer ignoring everything aside some gradients; it’s now paying attention to some colors (and not others).

Yeah, so? It just means that the visual system can work with less or with more (and multi-type) information.

Lines-as-edges is a hypothesis for why line drawings "work" (are recognizable as the thing). The hypothesis doesn't say however that edges are the only thing the visual system can understand.

So, there's no need to "generalize Lines-As-Edges to account for these different types of depiction".

Lines-as-edges explains how line drawings are understood, period.

Line drawings with shading and color, add additional information, aside from the edges.

That's fine: no proponent of Lines-As-Edges ever said that the visual system only works with edges. Just that it works with edges when interpreting a line drawing which only offers edges.


People are so used to seeing images that it's impossible to recognize that they're an _illusion_. If you see a drawing of an apple, it is colored pigment on a page, it's not an apple, but it's _impossible_ to look at it and not see an apple.

_Why_ do we recognize it as an apple? It's certainly not an exact duplicate of the light rays that would enter your eye from a real apple. How different can it be from an apple and still be recognizable? What exactly is the mechanism by which it triggers the recognition?

Calling an image a "representation" or saying that it "has a likeness" to the real thing is sort of begging the question. The question to be answered is: "In what way is it a representation? And how does it have a likeness to the thing represented?"


>Calling an image a "representation" or saying that it "has a likeness" to the real thing is sort of begging the question. The question to be answered is: "In what way is it a representation?

Representation or likeness is not some hazy notion though. It means there are features in our drawing that map onto features of the actual thing. And there are: edges, proportions, shapes.

But it's not some close mapping of edges in the literal sense. That is, a drawing doesn't have to follow the actual edges of the thing depicted with any accuracy for it to be recognized as such.

E.g. we could draw a stick figure instead of a detailed line drawing of a person, or a highly stylized "child drawing" style house, and their edges would look nothing like the edges of the real thing. But those line drawings would still be easily recognizable.


> That is, a drawing doesn't have to follow the actual edges of the thing depicted with any accuracy for it to be recognized as such

This is literally what this link is about, but you do still recognize that there are "why" and "how" questions to be answered here?


Yes, just not the why and how questions the author asks, or the way he phrases them...



