I think it's quite easy to distinguish between DALL-E and a human.
The biggest problem with current DALL-E is that it's really, really bad at details, unless they're stereotypical ones. DALL-E delivers a pretty good impression, but almost always fails to fill in the last mile. This is because DALL-E currently doesn't know what "perfection" looks like to a human.
Also, oftentimes DALL-E doesn't have a definitive style, meaning it composes images with different styles in one place. This is quite noticeable in the second image for "Teddy bears, mixing sparkling chemicals as mad scientists": the bubbles are obviously not in the same style. This hardly ever happens with humans because, welp, when someone draws a picture, one person draws with a limited set of tools in a single style. DALL-E itself is not subject to such limitations.
The obvious markers to me are mistakes and skill level. The AI is making mistakes no human would, while perfecting advanced techniques only experts pull off.
For example, the space ones have incredible style, texture, and shading, but the basketball lines are nonsensical. The human ones, meanwhile, are quite basic and artistically primitive, but all of the easy things are flawless.
And then the bears have an eye on their forehead and are missing half their mouth, mixed with extremely good textures everywhere else.
The human drawn ones were much more interesting to look at, as art, usually because of the emotion and hints of narrative conveyed by facial expressions. I also liked the giant cat.
These days famous artists often don't execute the artwork themselves; they leave that to craftspeople. They have the idea, and the fame to imbue it with attention, which connects it with the semantic fabric of the current art scene.
This is mainly because DALL-E 2 generates at 64x64, then uses two separate upscalers to reach 256px and then 1024px. The problem is that the upscalers are much less capable than the main model, and each upscaling stage tends to compound the mistakes of the upstream image.
I imagine DALL-E 3 will generate at 256px directly, and use something like latent diffusion for a free upscale to 1024px. This should eliminate most of the high-frequency artifacts.
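To make that cascade concrete, here's a minimal sketch of the three-stage pipeline. The models are placeholder stubs (the real stages are large diffusion networks), so this only illustrates the data flow and why a flaw in the 64x64 output propagates through both upscalers:

```python
import torch
import torch.nn.functional as F

# Placeholder stand-ins for the three stages of a cascaded pipeline like
# DALL-E 2's: a base generator at 64x64 plus two upscalers. The real stages
# are large diffusion models; these stubs only show the data flow.

def base_model(prompt_embedding: torch.Tensor) -> torch.Tensor:
    """Generate a 64x64 RGB image conditioned on the prompt (stub)."""
    return torch.rand(1, 3, 64, 64)

def upsampler(image: torch.Tensor, target: int) -> torch.Tensor:
    """Upscaler stub: a real diffusion upsampler conditions on the low-res
    input, so any artifact in `image` gets baked into every later stage."""
    return F.interpolate(image, size=(target, target),
                         mode="bilinear", align_corners=False)

prompt = torch.rand(1, 512)           # stand-in for a text embedding
img_64 = base_model(prompt)           # all the semantics are decided here
img_256 = upsampler(img_64, 256)      # stage 1: 64 -> 256
img_1024 = upsampler(img_256, 1024)   # stage 2 compounds stage-1 errors
print(img_1024.shape)                 # torch.Size([1, 3, 1024, 1024])
```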
The best, most concise description of AI generated content (whether textual or visual) I’ve read (I believe it was here on HN) is that “it gets to the topic, but not to the point”.
In other words, while GPT, DALL-E, etc. are good at making things that superficially match the prompt given to them and seem vaguely coherent, the moment you inspect them more closely they don't guide your attention, or create a broader artefact with internally coherent sub-parts, the way a human artist/writer would.
It's hard to tell whether that's an inherent limitation of approaches based on statistical sampling rather than "true understanding". That criticism has held over the last decade of progress, though, so I doubt "more parameters/training data/resolution/etc." is the answer.
A lot of designers just prostitute themselves to making the art as requested - very few infuse it with their own ideas, though they would like to. It won't surprise me if corporations replace them all with AIs. But people will see through it; it won't score as well.

A decade or two might pass where almost all communication art is done by AI. In the end, designers will be in high demand because - Marvel movies to the contrary - you can't replace ideas and substance by just visually manipulating stuff. Even if that form of legerdemain works in the short run, it wears off, because humans are drawn toward novel, human ideas. This is the whole reason 70s disco collapsed and punk music sprang up.

The idea that you can endlessly sell generation after generation complete rip-off garbage runs into a wall, because it requires dumbing down the population to such an extent that e.g. they're all hooked on fentanyl and can't actually buy anything. So you then have to retread and market to the people with brains left, but they won't accept recycled corporate trash, and they're the ones you need to keep your brand alive.
You can buy and shovel out all the Maroon 5 you want but you can't be cool unless you co-opt Rage or GnR or some band that's legitimately cool and meaningful to people and outside the reach of what everyone knows was already for sale. AI art is always gonna be corporate garbage regardless of how perfect it looks because the people who guide it will always be corporate douchebags.
It's not that DALL-E 2 lacks stylistic continuity, or "wholeness", in the image. That's just how it comes across to an untrained eye. That's what engineers are working hard to fix, but fixing it won't fill the uncanny valley.
What you're really perceiving is that every detail in the human work has humor. The humor runs through it; everything, mistakes and all, is shot through with what we visually can see as a sense of the personality of the person who drew it. Not that an AI can't imitate each one, but it can only think inside the conceptual box of what it's trained to imitate.
For instance, the first graphic concept that came to my mind about astronauts playing basketball in space with cats would be to have the whole scene of a cat dunking in space shown in the reflection of an astronaut's visor. The raw literal image is cheap crap without the viewer seeing some human perspective that's trying to be conveyed... in other words, engineers need to go to art school because they're misunderstanding the value proposition behind human-made art.
I think the AI drawings are far better than the human ones.
The human ones are basically a collage of items accurately created to a strict specification, while objects in the AI-drawn ones are all integrated and more artistic and creative.
Although this is probably because the humans were paid workers given 15-30 minutes to make the artwork, who just tried to check off the requirements (I can't possibly imagine a decent artwork being created in 15-30 minutes).
It's very interesting though that humans are more robotic than neural networks!
Superficially, the AI-drawn images seem better. But when I showed these images to my wife, asking if she could tell the difference, only the human ones evoked any sort of emotional reaction from her. The human-drawn ones were more interesting visually and told more of a story. She laughed at a few of them. She couldn't really tell the human from AI, but she definitely enjoyed the human-drawn ones more.
The cats not having space suits is a fun detail; DALL-E seems to 'understand' clothing on things and some other features now, but it doesn't understand why the astronaut has the space suit, or that the cat would obviously need one too. I wonder if 5-year-olds notice that.
Also, Saturn being a basketball is a dead giveaway. DALL-E is terrible at localizing stuff to just one part of an image, so it doesn't know that "space" doesn't have to be basketball-themed as well. It reminds me of when someone tried the prompt "happy sisyphus" and it drew a happy guy carrying a rock with a smiley face on it.
My intuition is that humans will continue to make art that takes advantage of technological advances, just like they always have.
The modern process of producing music would basically be unrecognizable to anyone 40 years ago — it's completely intertwined with technology, and far more automated. Yet music is as important as ever, and amazing music is being made (will politely side-step the pitfall of debating whether music was better 40 years ago!)
So I'm excited to see how visual artists incorporate tools like Dall-E into their artistic process.
> (will politely side-step the pitfall of debating whether music was better 40 years ago!)
I don't think you'll have many takers here suggesting that things were magically better 40 years ago. (We know about survivorship bias, and that back then far less of the design space had been explored, so there was much more room for novelty and amazement.)
I do think we've gained a few more axes of exploration, but the mainstream of music has also gotten much more homogeneous in some ways, which is kinda sad.
Sophisticated tools are a bit of a trap. People tend to create in ways that their tools make easier. Tastes evolve around what's being created. And tools evolve to match those tastes, which in turn optimizes everything toward a specific local maximum. Tools also cause loss of skills and dependency on the tooling.
> I don't think you'll have many takers here suggesting that things were magically better 40 years ago.
Ha, fair point. I must not realize how old I am, because I was attempting to reference the music of the 1960s and 70s, not 1982, which I agree is not many people's idea of the golden year for music ("Come On Eileen" notwithstanding).
> Sophisticated tools are a bit of a trap. People tend to create in ways that their tools make easier.
No doubt. Ableton, Logic, and Pro Tools have drastically altered the norms of what modern music is "supposed" to sound like (i.e. tuned vocals, quantized drums, etc.). I do wonder what the next generation of music tech will bring.
As someone whose high school years spanned 1982–6, I have a bit of a soft spot for that era.¹ I think, also, that in the early days at least, a lot of marginal bands who had videos ended up getting a bit more fame thanks to those videos than they would have pre-MTV.
⸻
1. I have a theory that most people tend to favor the pop music of their high school years. No idea if it’s true or not.
Oh I think your high school pop music theory is spot on. I grew up in the emo era, can actively laugh at the music/fashion now... and still love it more than anything :)
I believe the real AI music revolution is still ahead of us. Most existing tools don't really use modern deep neural nets, and the ones that do and have just been released are quite good (e.g. Synthesizer V). The project I released yesterday uses all the recent research advances (such as diffusion models) and more (https://www.youtube.com/playlist?list=PLoCzMRqh5SkFwkumE578Y...) and I think it's a large leap forward over what existing melody generation software can do.
I think that pretty soon we'll see many hit songs that have a large contribution from AI models. It's possible that the variety can increase as well and new styles can appear, as the "good" generated songs are fed back as training data.
I don't think creative types will be satisfied with reaching a local maximum. They always try to find ways to stretch what's possible by looking for completely different ways to use said tools. The sheer variety of music genres alone points to this.
I see AI fitting in as an effort multiplier. Imagine if a single person were able to produce an entire anime / cartoon TV series, rather than the current situation of hundreds of people being involved. The single human could act as a creative director, ensuring consistent story and style, while the AI does the heavy lifting of pumping out hundreds of thousands of frames.
Currently we have YouTubers putting out crude AI-voice-generated videos with basic visuals, and it's still funny and entertaining because the story is good. But imagine if anyone with a good idea could produce studio-quality works to accompany their meme video.
A good example is Joel Haver's videos on YouTube. He explains his process here: https://youtu.be/tq_KOmXyVDo . He produces animations by rotoscoping live footage, with only a few keyframes done by hand and the rest filled in by AI, using something similar to style transfer.
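For the curious, the classic technique his approach resembles can be sketched in PyTorch. To be clear, this is Gatys-style optimization-based style transfer, not Joel Haver's actual tooling (which propagates hand-drawn keyframes across frames); the file names are placeholders:

```python
import torch
import torch.nn.functional as F
from torchvision import models, transforms
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
vgg = models.vgg19(weights=models.VGG19_Weights.DEFAULT).features.to(device).eval()
for p in vgg.parameters():
    p.requires_grad_(False)

load = transforms.Compose([transforms.Resize((256, 256)), transforms.ToTensor()])
content = load(Image.open("frame.png")).unsqueeze(0).to(device)   # a video frame
style = load(Image.open("keyframe.png")).unsqueeze(0).to(device)  # hand-drawn keyframe

LAYERS = {1, 6, 11, 20, 29}  # relu1_1 .. relu5_1 in VGG19's feature stack

def features(x: torch.Tensor) -> list[torch.Tensor]:
    feats = []
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in LAYERS:
            feats.append(x)
    return feats

def gram(f: torch.Tensor) -> torch.Tensor:
    # Gram matrix of one feature map (batch size 1): captures texture/style.
    _, c, h, w = f.shape
    f = f.view(c, h * w)
    return f @ f.t() / (c * h * w)

style_grams = [gram(f).detach() for f in features(style)]
content_feats = [f.detach() for f in features(content)]

target = content.clone().requires_grad_(True)  # start from the frame itself
opt = torch.optim.Adam([target], lr=0.02)

for step in range(200):
    opt.zero_grad()
    feats = features(target)
    content_loss = F.mse_loss(feats[-1], content_feats[-1])  # keep the frame's layout
    style_loss = sum(F.mse_loss(gram(f), g) for f, g in zip(feats, style_grams))
    (content_loss + 1e4 * style_loss).backward()
    opt.step()

transforms.ToPILImage()(target.detach().squeeze(0).clamp(0, 1).cpu()).save("stylized.png")
```

Running an optimization like this per frame is slow and tends to flicker between frames, which is presumably why keyframe-propagation tools are used for animation instead.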
Before recorded music, if one wanted to listen to music one had to pay skilled artisans for every song. Now music is more or less free and available in unlimited quantities. The same will probably happen for art, and maybe even animation thanks to DALL-E.
It's relatively easy to tell which are AI-made due to the weird artifacts they have that human drawings would not.
For example, the first image has a basketball where the lines don't line up correctly, a visual artifact that lets you immediately know it was an AI. A human artist would have drawn those lines correctly.
The thing that at first glance tipped me off to the AI images was one of the most common issues: several instances of whatever is the most important keyword.
Playing basketball? Multiple balls! The planet is a ball!
I actually like this dollop of creativity. Humans tend to ground their pictures in reality more than necessary for prompts that sit firmly in the realm of fantasy.
In general I would love it as a creative bit of whimsy. The problem is that it's a very specific type of nonsense which after a while becomes recognizable as the art style of AIs in general. This both makes it feel less original and stick out as a recognizable computer artifact.
That’s what made me think the first image was drawn by a human. I thought it was a cheeky and funny way of interpreting a prompt that the AI wouldn’t have done…
Yeah, it's particularly bad at eyes. Most one-year-olds are better than the model at placing eyes on a face, and at stopping at two. But it's got other hallmarks of maturity that are completely discordant with that inability to make a plausible face.
The first image is glaringly AI: the cat's head is weird (there's no way a cat head can look like that), the tail has holes in it (if you don't look closely you might mistake them for stripes), the basket's net is cut up in the middle, and the astronaut's hands are basically broken.
The fourth image is a bit more confusing because the art is abstract to begin with, but the astronaut's hands are again very easy to spot. The second biggest artifact is the cat heads: they're just abstract enough to pass, but there's warping there that no human would do.
> the cat's head is weird (there's no way a cat head can look like that), the tail has holes in it (if you don't look closely you might mistake them for stripes)
Neither of those is the biggest problem with the cat - it has two left front legs and no right front leg.
I thought the biggest tell in the first four images was that the AI makes no attempt to draw correct lines on the basketballs.
Well, that picture captioned "DALL·E or Dalí?" is almost certainly drawn by an AI, because most humans have learned that you don't draw weird shapes anywhere near one's crotch; it's just something a half-decent human won't do.
Joke aside, the AI generated pictures are amazing. See those radishes? Most human-drawn radishes are happy. And those generated ones are happy too, even without explicit instruction to tell the AI to do so. I guess it really captured what humans collectively want.
> I guess it really captured what humans collectively want
Happy radishes? :)
Anyway, I wonder if things like these cause a feedback cycle that just makes everything super boring. Humans sometimes get bored and do unusual things, what about the machines?
I had an odd reaction reading the threads. People consider it worth posting that it's easy to tell which one is AI and which one is by a human precisely because the DALL-E ones are damn good, not because the difference is so obvious that nobody should bother.
I think it's not just that they're good enough to make it worthwhile discussing how they fail. It's also that they fail in the same way that has been the weak point of AI-generated images for years.
It's great and I love it, but the wow factor is fading and the easily recognizable problems remain.
Interesting that the human-drawn lab bears are nearly all clearly mad scientists out to conquer the world, while the AI bears are merely earnest… clowns I think?
Subjectively I find the human versions are almost always better, even the poorly drawn ones, which isn't surprising; even though DALL-E is able to emulate elements of style, it still seems to be lacking some gestalt, or the ability to connect on an emotional level. There's no accounting for taste, it seems.
I find myself wondering: were any of these hundred humans professional artists, was there money accompanying this request to draw these prompts, and if so how much?
The company kind of sounds like “Mechanical Turk for checking AI decisions” so I’m assuming “surgers” are people being paid those kinds of rates for small tasks. Which is definitely less than I charge for art.
Imagine a future where every homemade bedtime story can be turned into an IMAX-quality cartoon…
Alternatively, imagine one where every website is flooded with mass-produced bland "art" like Corporate Memphis... we know what SEO spam looks like today, and I've been occasionally seeing what looks like AI-generated images show up in image searches. This feels like the latter --- good enough to look the part at a brief glance, but clearly "off" on closer inspection (see comments about the lines on the basketballs).
I suspect the art is bland because it is plentiful, consistent, and cheap. This is still a large improvement over many modern spaces, which feature large blank slabs of concrete, white walls, and, if it's "fancy", faux brick.
What I want to see next is Dall-E-produced variations on all the "human" versions in this post. That would illuminate Dall-E's limitations in mimicking original styles.
P.S. Has anyone demonstrated actual style transfer with Dall-E? As in, providing an image by one artist and telling it to redraw it as another artist would? Something along the lines of "Vincent Van Gogh's self-portrait as drawn by Jack Kirby".
I'd seen that post, and just skimmed it again to see if I missed something, but I don't see anything about style transfer, unless you mean writing prompts like "thing as painted by artist". That works when "thing" is a sufficiently well-known work (as I suppose my "Van Gogh self-portrait" example would be), but I meant something more along the lines of "[image] as painted by $ARTIST", or "[image] as drawn in $STYLE".
For me, the DALL-E generated imagery looks better than what an average human can draw, but there is still a gap to the skill of a true artist. However, commissioning artists is expensive, DALL-E less so.
So there's definitely pressure on illustrators on upwork, fiverr etc.
I think the difficult part is actually controlling the output. With an artist on Fiverr you can say, "can you give the bear a red cap" or "can you make them smile a little more", while with an AI it's largely you get what you get, other than tweaking the prompts or randomly adjusting parameters and hoping for what you wanted.
Also worth mentioning that each of those iterations only takes like 10 seconds, not days or weeks as with a real artist.
The one thing DALL·E 2 is missing, however, is specificity. You can tell it to draw Super Mario and it will come up with something that looks somewhat like Super Mario, but it takes a lot of creative freedom along the way. As far as I can tell, it isn't able to draw the same character in different situations. Every image is unique, and there is so far no way to coax it into drawing a series of multiple images that are consistent with each other, as you would need to illustrate a short story, for example. The best example I have seen so far is this, but even that remains very abstract:
Dall-E, like most AI stuff, doesn't have a memory, so you can't just refer to an earlier image; each time you run it, it starts from scratch. You might be able to (ab)use in-painting for this, i.e. have a comic panel with two images, fill the left side with an existing image, and let Dall-E finish the right side (a rough sketch of that trick follows below). Though so far I haven't seen anybody pull it off successfully. In general Dall-E seems to struggle with complex scenes that have multiple characters. For example, here is Dall-E trying to create variations of Super Mario Bros box art:
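For what it's worth, the two-panel in-painting trick could be attempted along these lines. This is an untested sketch: the call shape follows the DALL-E 2 era openai Python library's image-edit endpoint, and the file names and prompt are made up:

```python
from PIL import Image
import openai

# Paste an existing generation into the left half of a wide canvas and leave
# the right half transparent; the edit endpoint treats transparent pixels as
# the region to in-paint, so no separate mask image is needed here.
panel = Image.open("first_panel.png").convert("RGBA").resize((512, 1024))
canvas = Image.new("RGBA", (1024, 1024), (0, 0, 0, 0))
canvas.paste(panel, (0, 0))
canvas.save("composite.png")

# Hypothetical prompt; requires OPENAI_API_KEY in the environment.
result = openai.Image.create_edit(
    image=open("composite.png", "rb"),
    prompt="two-panel comic, the same teddy bear scientist in both panels",
    n=1,
    size="1024x1024",
)
print(result["data"][0]["url"])
```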
The Dall-E pictures are good, interesting, and feel the most human of any AI-produced content I've seen. I don't get the same sense when I read AI-generated text, even the best examples, although those results may be impressive too. Perhaps something about the visual pieces more convincingly imparts the sense of another fellow mind.