I'm only part way through the paper, but what struck me as interesting so far is this:
In other text-to-image algorithms I'm familiar with (the ones you'll typically see passed around as colab notebooks that people post outputs from on Twitter), the basic idea is to encode the text, and then try to make an image that maximally matches that text encoding. But this maximization often leads to artifacts - if you ask for an image of a sunset, you'll often get multiple suns, because that's even more sunset-like. There are a lot of tricks and hacks to regularize the process so that it's not so aggressive, but it's always an uphill battle.
Here, they instead take the text embedding, use a trained model (what they call the 'prior') to predict the corresponding image embedding - this removes the dangerous maximization. Then, another trained model (the 'decoder') produces images from the predicted embedding.
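Roughly, the data flow is something like this (a toy sketch just to show the shape of it - these functions are made-up stand-ins for the trained components, not OpenAI's actual code):

    import numpy as np

    rng = np.random.default_rng(0)

    # Made-up stand-ins for the trained components, just to show the data flow.
    def clip_text_encoder(caption):       # caption -> 512-d CLIP text embedding
        return rng.normal(size=512)

    def prior(text_emb):                  # text embedding -> predicted CLIP image embedding
        W = rng.normal(size=(512, 512)) / 512
        return W @ text_emb

    def decoder(image_emb):               # image embedding -> 64x64 RGB "image"
        W = rng.normal(size=(64 * 64 * 3, 512)) / 512
        return (W @ image_emb).reshape(64, 64, 3)

    image = decoder(prior(clip_text_encoder("a photo of a sunset")))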
This feels like a much more sensible approach, but one that is only really possible with access to the giant CLIP dataset and computational resources that OpenAI has.
What always bothers me with this stuff is, well, you say one approach is more sensible than the other because the images happen to come out more pleasing.
But there's no real rhyme or reason, it is a sort of alchemy.
Is text encoding strictly worse or is it an artifact of the implementation? And if it is strictly worse, which is probably the case, why specifically? What is actually going on here?
I can't argue that their results are not visually pleasing. But I'm not sure what one can really infer from all of this once the excitement washes over you.
Blending photos together in a scene in photoshop is not a difficult task. It is nuanced and tedious but not hard, any pixel slinger will tell you.
An app that accepts a smattering of photos and stitches them together nicely can be coded up any number of ways. This is a fantastic and time saving photoshop plugin.
But what do we have really?
"Kuala dunking basketball" needs to "understand" the separate items and select from the image library hoops and a Kuala where the angles and shadows roughly match.
Very interesting, potentially useful. But if doesn't spit up exactly what you want can't edit it further.
I think the next step has got to be that it conjures up a 3d scene in Unreal or blender so you can zoom in and around convincingly for further tweaks. Not a flat image.
> This is a fantastic and time saving photoshop plugin. But what do we have really?
Stock photography sales are in the many billions of dollars per year and custom commissioned photography is larger still. That's a pretty seriously sized ready-made market.
> But if it doesn't spit out exactly what you want, you can't edit it further.
I suspect there's a big startup opportunity in pioneering an easy-to-use interface allowing users to provide fast iterative feedback to the model - including positional and relational constraints ("put this thing over there"). Perhaps even more valuable would be easy yet granular ways to unconstrain the model. For example, "keep the basketball hoop like that but make the basketball an unexpected color and have the panda's right paw doing something pandas don't do that human hands often do."
I've adopted a practice of having odd backgrounds for video conferences.¹ I generally find these through Google image search, but I often have a hard time finding exactly what I would like. My own use case is a bit idiosyncratic and frivolous, but I can see this being really handy for art direction needs. When I used to publish a magazine, I would often have to commission photographs for the needs of the publication. A custom photograph (in the 90s) would cost from $200–$1000² depending on the needs (and none required models). Stock photo pictures for commercial use were often comparable in cost. Being able to generate what I wanted with a tool like this would have been fantastic. I think that this can replace a lot of commercial illustration.
⸻
1. My current work background is an enormous screen-filling eyeball. For my writing group, I try to have something that reflects the story I'm workshopping if I'm workshopping that week and something surreal otherwise.
2. My most expensive custom illustration was a title for an article about stone carver/letterer David Kindersley which I had inscribed in stone and photographed.
Say I'm looking for photography of real events and places, like a royal wedding or a volcano erupting - does this help me? Of specific places and architectural features? Of a protest?
I think if I was istockphoto.com I'd be a little worried, but that is microstock photography. I'm not sure that is worth billions. In fact I know it isn't.
Besides, once this tech is widely available, if anything it devalues this sort of thing further, closer to $0.
It would probably augment existing processes rather than replace them completely.
If you are doing a photoshoot for a banana stand with a human model with characteristics x,y,z you're still going to get a human from an agency or craigslist to pose. If suddenly the client informs you that they needed human a,b,c instead maybe one of these forthcoming tools will let you swap that out faster. You'd upload your photoshoot and an example or two of the type of human model you wished you had retroactively and it would fix it up faster than an intern.
Shutterstock is a direct competitor of iStock and is a $3B company. I personally pay them $200/mo. Maybe you just don't know enough about this industry?
Seems about right. Their yearly revenue is $700 million, I don't know about iStock as it isn't public. Any other big ones?
My hypothesis is that it could be a partial replacement/competitor and devalue their offering - reasonable to assume you'd be paying $99/mo soon and it will gradually decrease as the tech spreads and more competitors emerge.
Adobe is also in this game (https://stock.adobe.com), they are not unfamiliar with AI. You can see how a lot of people will jump on this if it proves to be lucrative.
I don't claim to be an expert and I didn't say this is worthless.
Yeah, I mean you're right that ultimately the proof is in the pudding.
But I do think we could have guessed that this sort of approach would be better (at least at a high level - I'm not claiming I could have predicted all the technical details!). The previous approaches were sort of the best that people could do without access to the training data and resources - you had a pretrained CLIP encoder that could tell you how well a text caption and an image matched, and you had a pretrained image generator (GAN, diffusion model, whatever), and it was just a matter of trying to force the generator to output something that CLIP thought looked like the caption. You'd basically do gradient ascent to make the image look more and more and more like the text prompt (all the while trying to balance the need to still look like a realistic image). Just from an algorithm aesthetics perspective, it was very much a duct tape and chicken wire approach.
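The "how well does this caption match this image" scoring piece looks roughly like this (a sketch using the openai/CLIP package; the file name and prompt are placeholders). The duct-tape part was then doing gradient ascent on that score while trying to keep the image looking natural:

    import torch
    import clip                       # the openai/CLIP package
    from PIL import Image

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load("ViT-B/32", device=device)

    image = preprocess(Image.open("candidate.png")).unsqueeze(0).to(device)  # placeholder image
    text = clip.tokenize(["a picture of a sunset"]).to(device)

    with torch.no_grad():
        image_emb = model.encode_image(image)
        text_emb = model.encode_text(text)

    # Cosine similarity = "how sunset-like does CLIP think this image is?"
    score = torch.cosine_similarity(image_emb, text_emb)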
The analogy I would give is if you gave a three-year-old some paints, and they made an image and showed it to you, and you had to say, "this looks a little like a sunset" or "this looks a lot like a sunset". They would keep going back and adjusting their painting, and you'd keep giving feedback, and eventually you'd get something that looks like a sunset. But it'd be better, if you could manage it, to just teach the three-year-old how to paint, rather than have this brute force process.
Obviously the real challenge here is "well how do you teach a three-year-old how to paint?" - and I think you're right that that question still has a lot of alchemy to it.
I gotta be missing something here, because wasn’t “teaching a three year old to paint” (where the three year old is DALLE) the original objective in the first place? So if we’ve reduced the problem to that, it seems we’re back where we started. What’s the difference?
I meant to say that Dall-E 2's approach is closer to "teaching a three year old to paint" than the alternative methods. Instead of trying to maximize agreement to a text embedding like other methods, Dall-E 2 first predicts an image embedding (very roughly analogous to envisioning what you're going to draw before you start laying down paint), and then the decoder knows how to go from an embedding to an image (very roughly analogous to "knowing how to paint"). This is in contrast to approaches which operate by repeatedly querying "does this look like the text prompt?" as they refine the image (roughly analogous to not really knowing how to paint, but having a critic who tells you if you're getting warmer or colder).
Well, original DALL-E also worked this way. The reason the open source models use searches is that OpenAI didn't release DALL-E, but only another project called CLIP they used to sort DALL-E output by quality. It turns out CLIP could be adapted to produce images too if you used it to drive a GAN.
There is a DALL-E model available now from another company and you can use it directly (mini-DALLE or ruDALL-E), but its vocabulary is small and it can't do faces for privacy reasons.
I don't think it is actually painting at all but I need to read the paper carefully.
I think it is using a free text query to select the best possible clipart from a big library and blends it together. Still very interesting and useful.
It would be extremely impressive if the "Koala dunking a basketball" had a puddle on the court in which it was reflected correctly; that would be mind blowing.
This is actual image generation - the 'decoder' takes as input a latent code (representing the encoding of the text query), and synthesizes an image. It's not compositing or querying a reference library. The only time that real images enter the process is during training - after that, it's just the network weights.
It is compositing as a final step. I understand that the koala it is compositing may have been a previously nonexistent koala that it synthesized from a library of previously tagged koala images... that's cool, but what is the difference, really, from just dropping one of the pre-existing koalas into the scene?
The difference is just that it makes the compositing easier. If you don't have a pre-existing image that would match the shadows and angles, you can hallucinate a new koala that does. Neat trick.
But I bet if I threw the poor marsupial at a basket net it would look really different from the original clipart of it climbing some tree in a slow and relaxed manner. See what I mean?
Maybe Dall-E 2 can make it strike a new pose. The limb positions could be altered. But the facial expression?
And if the basketball background has wind blowing leaves in one direction, the koala fur won't match; it will look like the training set fur. The puddle won't reflect it. Etc.
This thing doesn't understand what a koala is the way a 3-yr-old does. It understands that the text "Koala" is associated with that tagged collection of pixel blobs and can conjure up similar blobs onto new backgrounds - but it can't paint me a new type of koala that it hasn't seen before. It just looks that way.
> And if the basketball background has wind blowing leaves in one direction, the koala fur won't match; it will look like the training set fur. The puddle won't reflect it.
If you read the article, it gives examples that do exactly this. For example, adding a flamingo shows the flamingo reflected in a pool. Adding a corgi at different locations in a photo of an art gallery shows it in picture style when it's added to a picture, then in photorealistic style when it's on the ground.
Well not so much an article as really interesting hand picked examples. The paper doesn't address this as far as I can tell. My guess is that this is a weak point that will trip it up occasionally.
A lot of the time it doesn't super matter, but sometimes it does.
I might be misinterpreting your use of "compositing" here (and my own technical knowledge is fairly shallow) but I don't think there's any compositing of elements generally in AI image generation. (unless Dall-E 2 changes this. I haven't read the paper yet)
> Given an image x, we can obtain its CLIP image embedding zi and then use our decoder to “invert” zi, producing new images that we call variations of our input.
..
It is also possible to combine two images for variations. To do so, we perform spherical interpolation of their CLIP embeddings zi and zj to obtain intermediate zθ = slerp(zi, zj, θ), and produce variations of zθ by passing it through the decoder.
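For anyone curious, slerp itself is only a few lines (a generic NumPy version, not the paper's code; it assumes the two embeddings aren't parallel):

    import numpy as np

    def slerp(z_i, z_j, theta):
        # Spherical interpolation between two embeddings, theta in [0, 1].
        cos_omega = np.dot(z_i / np.linalg.norm(z_i), z_j / np.linalg.norm(z_j))
        omega = np.arccos(np.clip(cos_omega, -1.0, 1.0))   # angle between the embeddings
        return (np.sin((1 - theta) * omega) * z_i + np.sin(theta * omega) * z_j) / np.sin(omega)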
From the limitations section:
> We find that the reconstructions mix up objects and attributes.
The first quote is talking about prompting the model with images instead of text. The second quote is using "mix up" in the sense that the model is confused about the prompt, not that it mixes up existing images.
ML models can output training data verbatim if they over-fit, but a well trained model does extrapolate to novel inputs. You could say that this model doesn't know that images are 2d representations of a larger 3d universe, but now we have NERF which kind of obsoletes this objection as well.
The model is "confused about the prompt" because it has no concept of a scene or of (some sort of) reality.
If we task "Koala dunking basketball" to a human and present them with two images, one of a koala climbing a tree and another of a basketball player dunking - the human would cut out the foregrounds (human, koala) from the backgrounds (basketball court, forest) and swap them easily.
The laborious part would be to match the shadows and angles in the new image. This requires skill and effort.
Dall-E would conjure up an entirely novel image from scratch, dodging this bit. It blended the concepts instead, great.
But it does not understand what a basketball court actually is, or why the koala would reflect in a puddle. Or why and how this new koala might look different in these circumstances from previous examples of koalas that it knows about.
The human dunker and the koala dunker are not truly interchangeable. :)
I'm not sure that's "compositing" except in the most abstract sense? But maybe that's the sense in which you mean it.
I'd argue that at no point is there a representation of a "teddy bear" and "a background" that map closely to their visual representation - that are combined.
(I'm aware I'm being imprecise so give me some leeway here)
I think deep learning is better thought of as "science" than "engineering." Right now we're in the stage of the Greeks and Arabs where we know "if we do this then that happens." It will be a while before we have a coherent model of it, and I don't think we will ever solve all of its mysteries.
We are getting closer with variational methods and kernel methods to achieving a more holistic framework for understanding machine learning (incl. traditional deep learning) training and inference. There is a deep unity in the fundamentals of machine learning, formed into a cohesive whole by applying the analytical techniques of statistical mechanics and Bayesian probability theory.
This is exactly what they demo - they lock a scene and add a flamingo in three different locations. In another one they lock the scene and add a corgi.
- Select from X variations the new image that looks best to you
- It does the equivalent of a google image search on your "flamingo" prompt
- It picks the most blend-able ones as a basis to a new synthetic flamingo
- It superimposes the result on your image
Very cool don't get me wrong. Now I want to tweak this new floating flamingo I picked further, or have that Corgi in the museum maybe sink into the little couch a bit as it has weight in the real world.
Can't. You'd have to start over with the prompt or use this as the new base image maybe.
The example with furniture placement in an empty room is also very interesting. You could describe the kind of couch you want and where you want it and it will throw you decent options.
But say I want the purple one in the middle of the room that it gave me as an option, but rotated a little bit. It would generate a completely new purple couch. Maybe it will even look pretty similar but not exactly the same.
That's not how this works. There is no 'search' step, there is no 'superimposing' step. It's not really possible to explain what the AI is doing using these concepts.
If you pay attention to all the corgi examples, the sofa texture changes in each of them, and it synthesizes shadows in the right orientation - that's what it's trained to do. The first one actually does give you the impression of weight. And if you look at "A bowl of soup that looks like a monster knitted out of wool" the bowl is clearly weighing down. I bet if the picture had a more fluffy sofa you would indeed see the corgi making an indent on it, as it will have learned that from its training set.
Of course there will be limits to how much you can edit, but then nothing stops you from pulling that into Photoshop for extra fine adjustments of your own. This is far from a 'cool trick' and many of those images would take hours for a human to reproduce, especially with complex textures like the Teddy Bear ones. And note how they also have consistent specular reflections in all the glass materials.
How do you propose we talk about what it is doing if not by using the terminology from the human editing process it is replacing? I'm struggling to express things.
My issue is that it appears to not be possible to explain what the AI is doing at all. If you could, you'd be able to actually control the output. And talking about how the model is trained is interesting but not an answer.
Of course there is a superimposing step, that just means it adds its layer on top of the photo you provide. That's all it means and that's literally what it is doing, that's all I tried to say, heh.
> If you pay attention to all the corgi examples, the sofa texture changes in each of them
Yes, exactly!
> This is far from a 'cool trick' and many of those images would take hours for a human to reproduce
OK, fair enough. I'll try to be more clear:
It is very cool and not a trick and the results are fantastic if you got out exactly what you wanted. Amazing time saver. And if not? Right now this is totally hit or miss.
It would also take hours for a human to reproduce a Vermeer, and this no doubt has those in its training set and would style-transfer onto a corgi instantly. Certainly faster than Vermeer himself could do it.
But Vermeer could explain how he came up with the style, his techniques, choices, etc.
It reads like the advance here is that it will usually synthesize something that looks great but not always the thing that you want. With no recourse.
> Of course there is a superimposing step, that just means it adds its layer on top of the photo you provide. That's all it means and that's literally what it is doing, that's all I tried to say, heh.
It is not doing this. You are wrong. You are mistaken. You are confused. You do not understand what is happening.
(People have tried to tell you this several times, but you're not listening. shrug One more can't hurt.)
I am specifically referring to the flamingo example: "DALL·E 2 can make realistic edits to existing images from a natural language caption."
You provide the background image and a text prompt and it doodles on top of the image you provided as per their demonstration. I wasn't referring to the other examples down the page where it conjures up a brand new image from scratch based on your image input.
It is great that you can tell it to add a flamingo and it fits into the background you provide nicely due to the well tuned style transfer. That part is cool. And it is impressive that sometimes the flamingo it adds is reflected in the water. But sometimes it isn't reflected. And it isn't up to you, it is up to it. And you can't tell it to add a reflection as a discrete step.
Look more carefully. This is more akin to a clipart finder, except if the clipart doesn't exist it uses the most similar thing in its training set to what it guesses you want as a starting point to synthesize new clipart from.
It doesn't add it in like an artist would and you can't control it at all. I don't know how to better express this.
This isn't unimpressive or un-useful but not quite as mind blowing on second glance.
Or am I in denial about how impressive this all really is by reading something slightly different into the static hand selected examples openai teased us with? :)
I'm sure two more papers down the line this thing will do what the true believers are convinced it already does perfectly much more seamlessly if they solve for my new favorite term, panoptic segmentation.
The link was for analogy, like religious people who can't accept science still try to find "gaps" where science can't explain something so they can imply God is doing it.
> But Vermeer could explain how he came up with the style, his techniques, choices, etc.
Often they can't. Ramanujan couldn't explain how he solved math problems, for instance, and humans can forget their own history easily, or even forget how to do something consciously while still doing it through muscle memory.
An ML model wouldn't forget the same way, but it could just lie to you.
Being opaque to human understanding is one of the downsides of existing AI/ML tech, for sure. Check out the video on the page, and notice how the images transition from random color blobs to increasing detail - that's showing you how the image is being generated. It's a continuous process of trying to satisfy a prediction; there are no discrete editing steps.
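That blobs-to-detail transition is what iterative denoising (diffusion) looks like. Here's a toy DDPM-style sampling loop to give a feel for it - `predict_noise` is a dummy standing in for the trained network, and the real schedule and conditioning are more involved:

    import numpy as np

    rng = np.random.default_rng(0)
    T = 1000
    betas = np.linspace(1e-4, 0.02, T)          # noise schedule
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    def predict_noise(x, t):
        # Dummy stand-in for the trained network (a large U-Net conditioned on
        # the timestep and on the text/image embedding).
        return rng.normal(size=x.shape)

    x = rng.normal(size=(64, 64, 3))            # start from pure noise
    for t in reversed(range(T)):
        eps = predict_noise(x, t)               # "which part of this is noise?"
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:                               # re-inject a little noise on all but the last step
            x += np.sqrt(betas[t]) * rng.normal(size=x.shape)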
The kind of tech you're imagining, where the computer has semantic understanding of what's in the picture, and is reproducing something based on a 3D scene, knowledge of physics, materials, etc is probably decades away. In that sense yes, this is just a 'trick'.
While the whole narrative of your comment totally makes sense, I don't really see the difference between the two approaches, not on a conceptual level. You still needed to train this so called "prior" at some point (so, I'm also not sure if it's fair to call it a "prior"). I mean, the difference between your two descriptions seems to be the difference between descriptions (i.e., how you chose to name individual parts of the system), not the systems.
I'm not sure if I'm speaking clearly, I just don't understand what the difference is between training "text encoding to an image" vs "text embedding to image embedding". In both cases you have some kind of "sunset" (even though it's obviously just a dot in a multi-dimensional space, not the letters) on the left, and you try to maximize it when training the model to get either an image embedding or an image straight away.
Yeah, my comment didn't really do a good job of making clear that distinction. Obviously the details are pretty technical, but maybe I can give a high-level explanation.
The previous systems I was talking about work something like this: "Try to find me the image that looks like it most matches 'a picture of a sunset'. Do this by repeatedly updating your image to make it look more and more like a sunset." Well, what looks more like a sunset? Two sunsets! Three sunsets! But this is not normally the way images are produced - if you hire an artist to make you a picture of a bear, they don't endeavor to create the most "bear" image possible.
Instead, what an artist might do is envision a bear in their head (this is loosely the job of the 'prior' - a name I agree is confusing), and then draw that particular bear image.
But why is this any different? Who cares if the vector I'm trying to draw is a 'text encoding' or an 'image encoding'? Like you say, it's all just vectors.
Take this answer with a big grain of salt, because this is just my personal intuitive understanding, but here's what I think: These encodings are produced by CLIP. CLIP has a text encoder and an image encoder. During training, you give it a text caption and a corresponding image, it encodes both, and tries to make the two encodings close. But there are many images which might accompany the caption "a picture of a bear". And conversely there are many captions which might accompany any given picture.
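(As I understand it, "make the two encodings close" is a contrastive objective over a batch of caption/image pairs - roughly like this simplified sketch, not OpenAI's actual training code.)

    import torch
    import torch.nn.functional as F

    def clip_contrastive_loss(text_embs, image_embs, temperature=0.07):
        # text_embs, image_embs: (batch, dim); row i of each comes from the same caption/image pair
        text_embs = F.normalize(text_embs, dim=-1)
        image_embs = F.normalize(image_embs, dim=-1)
        logits = text_embs @ image_embs.T / temperature    # pairwise similarities
        targets = torch.arange(len(text_embs))             # the i-th caption matches the i-th image
        # Pull matching pairs together, push the rest of the batch apart, in both directions.
        return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets)) / 2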
So the text encoding of "a picture of a bear" isn't really a good target - it sort of represents an amalgamation of all the possible bear pictures. It's better to pick one bear picture (i.e. generate one image embedding that we think matches the text embedding), and then just to try to draw that. Doing it this way, we aren't just trying to find the maximum bear picture - which probably doesn't even look like a realistic natural image.
Like I said, this is just my personal intuition, and may very well be a load of crap.
A bit more detail is that CLIP isn't designed to directly solve "is this a bear" aka "does this image match 'bear'". It's designed to do comparisons, like "which of images A and B is more like 'bear'". So it doesn't have a concept of absolute bear-ness.
OpenAI had no idea it could be used to generate images itself, which is why they left in issues like how it thinks an apple and the word "apple" written on a piece of paper are the same thing. Probably wouldn't have released it if they did know.
This isn't something I'm knowledgeable on so forgive my simplification, but is this like a sort of microservices for AI? Each AI takes its turn handling some aspect, and another sort of mediates among them?
I'd say Dall-E 2 is a little more unified - they do have multiple networks, but they're trained to work together. The previous approaches I was talking about are a lot more like the microservices analogy. Someone published a model (called CLIP) that can say "how much does this image look like a sunset". Someone else published a totally different model (e.g. VQGAN) that can generate images (but with no way to provide text prompts). A third person figures out a clever way to link the two up - have the VQGAN make an image, ask CLIP how much it looks like a sunset, and use backpropagation to adjust the image a little, repeat until you have a sunset. Each component is its own thing, and VQGAN and CLIP don't know anything about one another.
VQGAN (being a "GAN") is already two networks - one Generates things, and the other is Adversarial and judges if the other network is good enough, then you train them both at once and they fight.
CLIP+VQGAN generation IIRC works by replacing the adversarial network with CLIP, so it understands text prompts, then retraining it for a while towards the prompted target, then generating whatever it's learned from that.
I think that in CLIP+VQGAN, the VQGAN model is frozen, and what you do is start from a random latent code, generate an image, pass it to CLIP, and then backprop through CLIP and through the VQGAN generator to figure out how you should move the latent code to make it better match the prompt. Then you just keep taking gradient ascent steps to find better and better latent codes. So it's like 'retraining', except you're 'training' the network input rather than the network weights.
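In sketch form it's something like this (PyTorch-ish; `vqgan_generator` and `clip_similarity` are hypothetical stand-ins for the frozen pretrained models):

    import torch

    # Hypothetical stand-ins for the frozen pretrained models:
    #   vqgan_generator(z) -> image tensor; clip_similarity(image, prompt) -> scalar score
    z = torch.randn(1, 256, requires_grad=True)     # the latent code is the thing being "trained"
    optimizer = torch.optim.Adam([z], lr=0.05)

    for step in range(500):
        image = vqgan_generator(z)                  # frozen weights; gradients still flow through
        score = clip_similarity(image, "a picture of a sunset")
        loss = -score                               # minimizing -score == gradient ascent on the match
        optimizer.zero_grad()
        loss.backward()                             # backprop through CLIP and the generator...
        optimizer.step()                            # ...but only z gets updated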
Makes sense to me as far as avoiding a sort of maximized sunset that is always there and is SUNSET rather than a nice sunset... but also avoiding watering it down and getting a way too subtle sunset.
It's not AI, but I've been watching some folks solving / trying to solve some routing (vehicles) problems and you get the "this looks like it was maximized for X" kind of solution, but that's maybe not what is important / customer perception is unpredictable. I kinda want to just come up with 3 solutions and let someone randomly click... in fact I see some software do that at times.
Yeah, I think the trick is that when you ask for "a picture of a sunset", you're really asking for "a picture of a sunset that looks like a realistic natural image and obeys the laws of reality and is consistent with all of the other tacit expectations a human has for an image". And so if you just go all in on "a picture of a sunset", you often end up with what a human would describe as "a picture of what an AI thinks a sunset is".
Maybe very very short (single-gene) sequences. The thing with DNA is it's the product of evolution. The DNA guides the synthesis of proteins, then the proteins fold into a 3D shape, and they interact with chemicals in their environment based on their shape.
In the context of a living being, different genes interact with each other as well. For example, you have certain cells that secrete hormones (many genes needed to do that), then you have genes that encode for hormone receptors, and those receptors trigger other actions encoded by other genes. There's probably too much complexity to ask an AI system to synthesize the entire genetic code for a living being. That would be kind of like if I asked you to draw the exact blueprints for a fighter jet, and write all the code, and synthesize all the hardware all at once, and you only get one shot. You would likely fail to predict some of the interactions and the resulting system wouldn't work. You could only achieve this through an iterative process that would involve years of extensive testing.
Could you use a deep learning system to synthesize genetic code? Maybe just single genes that do fairly basic things, and you would need a massive dataset. Hard to say what that would look like. Is it really enough to textually describe what a gene does?
This is all true, but it doesn't preclude the possibility of generating DNA. Humans share a lot of DNA sequences with other animals, and the genetic differences between individual humans are even smaller. You might have trouble generating a human with horns or something, but a taller one is probably mostly an engineering problem.
What GPT-3 and DALL-E show is that you can infer a lot based on the latent structure of data, even without understanding the underlying physical process.
Deep learning is probably not the right tool to generate a taller human. We've mapped the human genome. You could probably create a statistical model that pretty accurately maps different versions of genes to height. Then it would mostly be a question of swapping different versions of genes to get the result you want. With a statistical model, you would need a relatively small dataset (hundreds, or thousands of human genomes), and you wouldn't have to worry about errors being introduced.
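As a toy example of what that kind of statistical model could look like (entirely synthetic data, nowhere near a real GWAS pipeline):

    import numpy as np
    from sklearn.linear_model import Ridge

    rng = np.random.default_rng(0)

    # Toy data: 1,000 genomes, 200 variants coded as 0/1/2 copies of the alternate allele.
    genotypes = rng.integers(0, 3, size=(1000, 200))
    true_effects = rng.normal(0, 0.5, size=200)              # each variant nudges height a little
    heights_cm = 170 + genotypes @ true_effects + rng.normal(0, 5, size=1000)

    model = Ridge(alpha=1.0).fit(genotypes, heights_cm)

    # Predicted height for a new genome = baseline + the sum of its variants' estimated effects.
    new_genome = rng.integers(0, 3, size=(1, 200))
    print(model.predict(new_genome))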
probabilistic generative models have been applied to DNA and protein sequences for decades (my undergrad thesis from ~30 years ago did this and it wasn't even new at that point). The real question is what question you want to answer and what is this system going to do better enough to justify the time investment to prove it out?
>We’ve limited the ability for DALL·E 2 to generate ... adult images.
I think that using something like this for porn could potentially offer the biggest benefit to society. So much has been said about how this industry exploits young and vulnerable models. Cheap autogenerated images (and in the future videos) would pretty much remove the demand for human models and eliminate the related suffering, no?
Depends whether you think models should be able to generate cp.
It's almost impossible to even give an affirmative answer to that question without making yourself a target. And as much as I err on the side of creator freedom, I find myself shying away from saying yes without qualifications.
And if you don't allow cp, then by definition you require some censoring. At that point it's just a matter of where you censor, not whether. OpenAI has gone as far as possible on the censorship, reducing the impact of the model to "something that can make people smile." But it's sort of hard to blame them, if they want to focus on making models rather than fighting political battles.
One could imagine a cyberpunk future where seedy AI cp images are swapped in an AR universe, generated by models run by underground hackers that scrounge together what resources they can to power the behemoth models that they stole via hacks. Probably worth a short story at least.
You could make the argument that we have fine laws around porn right now, and that we should simply follow those. But it's not clear that AI generated imagery can be illegal at all. The question will only become more pressing with time, and society has to solve it before it can address the holistic concerns you point out.
OpenAI ain't gonna fight that fight, so it's up to EleutherAI or someone else. But whoever fights it in the affirmative will probably be vilified, so it'd require an impressive level of selflessness.
I don't think it's necessarily certain villainy for those who fight that fight as long as they are fighting it correctly.
There's a huge case to be made that flooding the darknet with AI generated CP reduces the revictimization of those in authentic CP images, and would cut down on the motivating factors to produce authentic CP (for which original production is often a requirement to join CP distribution rings).
As well, I have wondered for a long time how the development of AI generated CP could be used in treatment settings, such as (a) providing access to victimless images in exchange for registration and undergoing treatment, and (b) exploring if possible to manipulate generated images over time to gradually "age up" attraction, such as learning what characteristics are being selected for and aging the others until you end up with someone attracted to youthful faces on adult bodies or adult faces on bodies with smaller sexual characteristics, etc - ideally finding a middle ground that allows for rewiring attraction to a point they can find fulfilling partnerships with consenting adults/sex workers.
As a society we largely just sweep the existence of pedophiles under the rug, and that certainly hasn't helped protect people - nearly one in four are victims of sexual abuse before adulthood, and that tracks with my own social circle.
Maybe it's time to all grow up and recognize it as a systemic social issue for which new and novel approaches may be necessary, and AI seems like a tool with very high potential for doing just that while reducing harm on victims in broad swaths.
I'd not be that happy with an 8chan AI just spitting out CP images, but I'd be very happy with groups currently working on the issue from a treatment or victim-focus having the ability to change the script however they can with the availability of victimless CP content.
Especially the part about maybe generating specifically tailored material to "train" folks. Although, while obviously moral instead of immoral like "gay conversion therapy", I wonder if it would be just as ineffective.
> and would cut down on the motivating factors to produce authentic CP (for which original production is often a requirement to join CP distribution rings).
Hmmmmm. Will machine-generated "normal" (i.e., non-CP) porn really eliminate the motivating factors to produce normal porn?
I obviously can't speak for enjoyers of CP. But when watching normal porn, I think part of the thrill for many/most people is knowing that what's happening is real.
Another potential risk is that a flood of publicly available, machine-generated CP might actually help the producers and distributors of real CP by serving as camouflage. Finding and prosecuting the people who make real CP is difficult enough already. Now, imagine if the good guys couldn't even reliably tell what was real and there were 100000x as many fake images as real ones floating around.
> But when watching normal porn, I think part of the thrill for many/most people is knowing that what's happening is real.
I'm wondering how true that is.
Obviously, lots of people consume hentai, and platforms like Danbooru are immensely popular.
Also, speaking personally... some of the porn that I've consumed that felt the most "real" was 3D animations where the only real humans behind them were the SFM artists (and voice actors). These artists felt free to do scenes with, like, actual cinematography, with flirting and teasing and emotions between the characters, of a kind you never see even in softcore live-action porn.
So I do wonder how much potential AI generation has for completely substituting large parts of the porn industry.
> Finding and prosecuting the people who make real CP is difficult enough already.
let's assume that AI generated CP should be illegal. Does it mean that possession of a model that is able to generate such content should also be illegal? If not, then it's easy to just generate content on the fly and not store anything illegal. But if we make the model illegal, then how do you enforce that? Models are versatile enough to generate a lot of different content, so how do you decide whether the ability to generate illegal content is just a byproduct or the purpose of that model?
>> Finding and prosecuting the people who make real CP is difficult enough already.
> let's assume that AI generated CP should be illegal
Well that's a big assumption, lol. I definitely agree that it would be impossible to enforce, for the reasons you say.
I personally would not be in favor of such a law at all. Partially because it's unenforceable as you say, and partially on principle.
The argument against real CP is extremely clear: we deem it abominable because it harms children. That doesn't apply to computer-generated CP, or the models/tools used to produce it.
I think you might be able to argue AI generated CP could cause indirect harm by feeding those desires and making people more likely to act on them, but I agree that's a far more fragile argument.
I think there's a big range of possibilities there and they're not mutually exclusive.
There's the possibility that watching FOO directly encourages viewers to do FOO in real life. Like you said, this is the most fragile. I think clearly this is true in some cases -- most of us have seen a food commercial on TV and thought, "I could really go for that right now." I'm less convinced that it's true for something like pedophilia: the average person will be revolted by it, not encouraged, unless they already are into that kind of awful thing.
There's the possibility that watching FOO doesn't directly encourage viewers to do FOO, but serves to kind of normalize it. I think this happens a lot, but I think it takes a carefully crafted context and message.
There's the possibility that AI generated CP could actually help children, by providing a safe outlet for pedophiles so that they wouldn't need to do heinous shit in real life. I recall reading studies that instances of (adult) rape in societies were inversely correlated with the availability of (adult) pornography, with a possible explanation being that porn provided a safe outlet for people who weren't getting the kind of sex they wanted.
Most people are not developers and most people don't provide SaaS products. They are only consumers of existing technology.
In that sense, instead of enforcing the non-existence of models, the enforcement could just make it illegal to provide any service that processes inputs or produces outputs that are CP-like, e.g. by obligating people with the models to add filters on input and/or after the result is generated but before it is displayed or returned from computation.
I am assuming that any adult reading this understands that professional porn is quite different from the sex most of us experience in our private lives in a number of major ways, both emotionally and physically.[1]
But anyway, yes. By "real" I mean "real human beings, having real sex."
----
[1] There is a lot of homemade, amateur porn on the big well-known porn sites and it seems quite popular, and much of that is closer to what typical folks do at home. But that's beside the point.
> exploring if possible to manipulate generated images over time to gradually "age up" attraction
If people already accepted that they need help, there are many good ways to treat people with unwanted sexual obsessions (trying to choose my words carefully here). I honestly don't think that it would help them to serve them more content.
However, I'd love to see some research to explore the possibility of involving machine generated content in psychological treatment. The core of your idea is IMHO brilliant.
How do you suppose your CP generator will be trained without using authentic CP images? Not only will that require revictimization but you’ll also be downloading CP to train the model.
There are so many excellent, thought-provoking comments in this thread, but yours caught me especially. Something that came to mind immediately upon reading the release was the potential for this technology to transform literature, adding AI generated imagery to turn any novel into a visual novel as a premium way to experience the story, something akin to composing D-Box seat response to a modern movie. I was imagining telling the cyberpunk future story you were elaborating, which is really compelling, in such a way and couldn't help but smile.
In the same theme, I liked the comments of both of you.
Another use case could be to make it easier/ automatic to create comics. You tell what the background should be, characters should be doing and the dialogues. Boom, you have a good enough comic.
-----------
Reading as a medium has not evolved with technology. Creating the imagery happens in humans' minds. It's no surprise that some people enjoy doing that (and also enjoy watching that imagery) and others do not.
This could be a helping brain to create those imageries.
-----------
Now imagine, reading stories to your child. Actually, creating stories for your child. Where they are the characters in the stories. Having a visual element to it is definitely going to be a premium experience.
I can also imagine the magical nature of a child being able to make up a story (as children are wont to do) and having Dall-E here generating a picture book as they go.
I've thought for quite some time that questionable AI-generated content will lie at the heart of a forthcoming 'Infocalypse'. [0] Given the 2021 AI Dungeon fiasco over text-based AI-generated child porn, I shall posit that it's already upon us.
30 years since the original issue of encryption, it looks like cp trumps the other Horsemen of the Cypherpunk FAQ, with drug dealers and organized crime taking the back seat. It's interesting how misinformation is a recent development that they anticipate; a Google search shows that the term 'Infocalypse' was actually appropriated by discussions of deepfakes some time in mid-2020. That said, the crypto wars are here to stay—most recently with EARN IT reintroduced just two months ago.
The similar issue of 3D-printed guns has developed in parallel over the past decade as democratized manufacturing became a reality. There are even HN discussions tying all of these technologies together, by comparing attitudes towards the availability of Tor vs guns (e.g., [1]).
And there are innumerable related moral qualms to be had in the future; will the illegal drugs or weapons produced using matter replicators be AI-designed?
Overall, I think all of these issues revolve around the question of what it means to limit freedoms that we've only just invented, as technological advances enable things never before considered possible in legislation. (And as the parent comment implies, here's where the use of science fiction in considering the implications of the impossible comes in).
we already have a lot of fabricated content like that using current photo editing technology (Photoshop) and it's not causing many legal or moral issues
Relatedly, when checking for a related comment, I wanted to see what the current state of deep fakes progress was, so I went to the usual place where the bleeding edge for such things could be found.
First video clips were with the faces of your usual celebrities, but then suddenly I got "treated" to Greta Thunberg in the situations you might expect. I cut my exploration short.
Now, Greta Thunberg is actually 19 now (how time flies!), except that deep fake was most likely trained on her media appearances, which started when she was 15!
(I guess at least that she wasn't a child any more, which might explain why those clips had not been almost immediately flagged and removed?)
Religious people don't only believe that porn harms the models, but also the user. I happen to agree, despite being a porn user - porn is a form of simulated and not-real stimulation. Porn is harmful to the user the same way that any form of delusion is: it associates positive pleasure with stimulation that does not fulfil any basic or even higher-level needs, and is unsustainable. Porn is somewhere on the same scale as wireheading[1]
That doesn't mean that it's all bad, and that there's no recreational use for it. We have limits on the availability of various other artificial stimulants. We should continue to have limits on the availability of porn. Where to draw that line is a real debate.
Iain Banks' "Surface Detail" would like to have a word with you.
This author's books are great at putting these sorts of moral ideas to the test in a sci-fi context. This specific tome portrays virtual wars and virtual "hells". The hope is of being more civilized than by waging real war or torturing real living entities. However, some protagonists argue that virtual life is indistinguishable from real life, and so sacrificing virtual entities to save "real" ones is a fallacy.
If people are exposed to stimuli, they will pursue increasingly stimulating versions of it. I.e., if they see artificial CP, they will often begin to become desensitized (habituated) and pursue real CP or even live children thereafter.
Conversely, if people are not exposed to certain stimuli, they will never be able to conceptualize them, and thus will be unable to think about them.
Obviously you cannot eliminate all CP but minimizing the overall levels of exposure / ease of access to these kinds of things is way more appropriate than maximizing it.
> If people are exposed to stimuli, they will pursue increasingly stimulating versions of it.
This is not true in any kind of universal way.
If you enjoy car chases in movies, does that mean you're going to require more and more intense chase scenes, and then consume real-life crash footage, and ultimately progress to doing your own daredevil driving stunts in real life?
No, because at some point it's "enough."
Same with... literally anything we enjoy. Did you enjoy your lunch? Did you compulsively feel the need to work up to crazier and crazier lunches?
What about sex? Have you had sex? Do you feel the need to seek out crazier and crazier versions of it?
> What about sex? Have you had sex? Do you feel the need to seek out crazier and crazier versions of it?
For porn and sex it's different though. Some people are attracted to things that are deviant and taboo. That's the part they're looking for. As pornography has become more widely accepted, a market has developed for more and more extreme forms of it. This has been documented. It's not the content per-se but rather the nature of it that is found attractive. So the idea is to find a line that's reasonable so the people that feel the need to get close to that line can have that urge fulfilled without damaging society.
A market will form for more and more extreme content as soon as the line moves and what was once taboo no longer is. An Overton window of sorts for pornography.
There seems to be a small issue in GP's logical inference to me, in that he places artificial CP as a proportional and wholly inferior replacement for real CP. As if ham sandwiches and boiled sausages are _inferior_ replacements for blocks of body parts of animals on a dish.
I don't think this is the case, from anecdotal experience; Hollywood chase scenes are much more exciting to me than real-life crash footage, and I've watched enough. They need cooking, and if you are cooking anyway, mixing artificial and "natural" ingredients can even be more of a problem than a positive.
> If people are exposed to stimuli, they will pursue increasingly stimulating versions of it. I.e., if they see artificial CP, they will often begin to become desensitized (habituated) and pursue real CP or even live children thereafter.
I have accumulated tens of thousands of headshots in video games but have yet to ever shoot a single real person in the face. More importantly, I have never had the urge to seek out same.
I am not sure that your initial premise has any truth to it.
The point is more "can you conceive of a headshot before you've ever witnessed one?" And the assertion is, no.
I should be explicit -- I am saying the exposure which makes one seek stimulus is merely a catalyst for deeper urges, not a generator of them as such. A certain level of inhibition (e.g. sociopathy) is required but IMO so is a prior conception of the deed.
In your example, if someone is predisposed to wanting to shoot actual people in the head, exposing them to video game headshots may distract in the short term but desensitizes and entrenches the image in the long term, possibly making it easier to decide to pull the trigger later on if they are sufficiently inhibited of social concerns. This does not happen for people with high inhibitions, or at least sufficient self-control.
> The point is more "can you conceive of a headshot before you've ever witnessed one?" And the assertion is, no.
I'm not sure that's true. Our brains can imagine a lot that we've never seen, though maybe not very accurately. Inventors and developers and artists do it all the time, if we are talking about the same thing.
I'm not sure that disproves your premise. Virtual experiences may make real ones easier, but some research and details about where it works, where it doesn't, would be helpful. Many training programs use virtual experiences, such as flight simulators.
They can by definition not perceive a headshot, it is a visual thing. I'm not sure what point you're trying to make here, the difference is not germane to the conversation.
I'm not sure I agree with the statement, you're putting forth a lot of assertions without the actual quantitative data to back up what you're saying, and even though you think it sounds intuitive that doesn't necessarily make it valid.
I'd actually argue the reverse, I think you see a lot more effort towards acquiring things that are illegal than you would otherwise.
It's documented well already. The Overton window for pornography has continued to move to more and more extreme forms as what was once considered unacceptable and taboo becomes socially acceptable. It's because there is a market for deviance. Some people are interested in what's taboo and off limits and so long as they are approaching or just crossing that line, they're happy. As we've moved that line these people are no longer happy with the status quo and want content that is taboo, so a new market forms around that.
Pornographers know this and talk about it. Read David Foster Wallace's essay on it.
Wow, I didn't even think of this, that people could use this for something so horrifying. I'm relieved that the geniuses behind this seem so smart that they even thought of this too and prohibit using the AI for sexual images.
> Our content policy does not allow users to generate violent, adult, or political content, among other categories. We won’t generate images if our filters identify text prompts and image uploads that may violate our policies. We also have automated and human monitoring systems to guard against misuse.
This is arguably the most insipid and stupid crippling of a powerful tool for content creation I can think of. It’s worse than the adobe updates using every cpu core and locking up my machine once a week.
What counts as “political” hm? Want it to look like that Obama poster or perhaps you want a Soviet Union flag for your retro 80s punk… oops sorry “political”… let’s go to adult… hmm that’s even dumber is the model showing too much ankle? What about the obvious fact that this is just designed with a heterodoxy view of pornography and likely does nothing to stem the wildly various fetishes and other sexual proclivities that exist in the world…
It is effectively “we got squeamish and have done a bunch of stuff to stop you doing stuff that makes us squeamish, please don’t make us squeamish, we’re so worried we’re even checking for it in case you sneak something past us”…
They should comply with the law, try to prevent and also check for child porn… but otherwise just let users use the damn tool, if someone wants an Obama hope poster of a sexualised Mussolini jerking off onto a balloon animal… why the heck do they feel the need to say no to that. It’s a deeply repressive instinct that should be fought against whenever people start to “police” what is acceptable in artistic mediums.
I look forward to the reimplemented versions of this from efforts like EleutherAI and others.
Nonsense, I think the opposite is true where if you can satisfy your urges in a way that doesn’t put you in jail for a decade, most people will take that route.
I suspect that if a free version of this comes out and allows adult image generation, 90% of what it will be used for is adult stuff (see the kerfuffle with AIDungeon).
I can get why the people who worked hard on it and spent money building it don't want to be associated with porn.
> I can get why the people who worked hard on it and spent money building it don't want to be associated with porn.
Why? Is there something inherently wrong with porn? Is it not noble to supply a base human need, or is that judgment based on some arbitrary cultural artifact that you possess?
The problem might be that people are simply lying. Their real reasons are religious/ideological, but they cite humanitarian concerns (which their own religious stigma is partly responsible for).
> Their real reasons are religious/ideological, but they cite humanitarian concerns
Are you asserting that nobody has humanitarian concerns? If so, that's quite a statement; what basis is there? I've seen so many humanitarian acts, big and small, that I can't begin to count. I've seen them today. I hear people express humanitarian beliefs and feelings all the time. I do them and have them myself. Maybe I misunderstand.
It'd be ironic if we ended up destroying our planet by using so much electricity to train models to generate a maximally optimal version of the type of content that you refer to similar to crypto mining.
I'm not picking on the commenter - by itself it's not a big deal - but look at the assumptions behind that comment, which I almost didn't notice on HN.
Yeah you will. It’s not going to be very good at reproduction of the same exact thing each time. In some of the examples you see the textures changing wildly and it’s a classic problem with these models. The same input does not generate the same output, so it will be obvious that it’s generated when you can’t get the “model” to look the same between two photos in the same “photo shoot”
When you put it that way… yes since no one is hurt in the process and people with pedophilic conditions may be deterred from doing something in real life.
* Unlike GPT-3, my read of this announcement is that OpenAI does not intend to commercialize it, and that access to the waitlist is indeed more for testing its limits (and as noted, commercializing it would make it much more likely to lead to interesting legal precedent). Per the docs, access is very explicitly limited: (https://github.com/openai/dalle-2-preview/blob/main/system-c... )
* A few months ago, OpenAI released GLIDE ( https://github.com/openai/glide-text2im ) which uses a similar approach to AI image generation, but suspiciously never received a fun blog post like this one. The reason for that in retrospect may be "because we made it obsolete."
* The images in the announcement are still cherry-picked, which is therefore a good reason why they tested DALL-E 1 vs. DALL-E 2 presumably on non-cherrypicked images.
* Cherry-picking is relevant because AI image generation is still slow unless you do real shenanigans that likely compromise image quality, although OpenAI likely has better infra for handling large models, as they have demonstrated with GPT-3.
Regarding cherry-picking, the images of astronauts on horses look stunning, except for their hands. There's something seriously wrong with their hands.
Maybe give it another five years, a few more $billion and a few more petabytes/flops and it will be good. Then finally everyone can generate art for their own Magic: the Gathering cards.
As I keep telling people: "hands are hard". This is why I went so far as to make a hand-specific dataset ("PALM" https://www.gwern.net/Crops#palm which of course now everyone is going to confuse with 'PaLM'...). Hands are just way too variable to learn easily.
My dataset is a start, but it may benefit from focused training, the way Facebook's new Make-A-Scene https://arxiv.org/abs/2203.13131#facebook (not DALL-E 2 quality but not far from it) has focused losses on faces.
Interestingly, hands are also something humans struggle to draw.
They're a very complex anatomical form, with many small tendons and muscles. Many artists struggle to depict hands. They're not made out of a few straight lines like a torso; there's lots of skew going on. They're probably the hardest structure of the human body to 'learn' for an ML system.
I think some of this is because hands are very involved in both communication and threat assessment, so we as humans put a lot of automatic attention on them. We aren't even usually aware of it--unless something looks off.
Hands are notoriously hard to even photograph. You very quickly get weird unnatural results with a camera in front of hands, so in a way I'm not surprised AI models struggle to produce satisfying imagery there too.
The Risks and Limitations section is particularly interesting to me. It's like a time capsule of society's current fears about technology. They talk about many ways this tech could be misused, but I don't think they've even scratched the surface.
An example off the top of my head: this could be used as advertising or recruitment for controversial organizations or causes. Would it be wrong for the USA to use this for military recruitment? Israel? Ukraine? Russia?
Another example: this could be used to glorify and reinforce actions which our society does not consider to be immoral but other societies - or our own future society - will. It wasn't long ago that the US and Europe did a full 180 on their treatment of homosexuality. Will we eventually change our minds about eating meat, driving cars, etc.?
Have they gone too far in a desperate bid to prevent the AI from being capable of harm? Have they not gone far enough? I don't know. If I was that worried about something being misused, I don't think I could ever bring myself to work on it in the first place. But I suppose the onward march of technology is inevitable.
Katherine Crowson is at EleutherAI and IMHO is indisputably most responsible for the advances in text=>image generation. DALL-E 2 is DALL-E plus her insight to use diffusion; the intermediate proof of concept of diffusion + DALL-E is GLIDE.
She did invent the idea of applying it to image generation, leading to OpenAI citing her _tweets_ (how cool is that?) in the GLIDE paper, which, as other comments note, looks just like a proof of concept of DALL-E 2.
glid-3 and latent diffusion have both appeared recently and are getting remarkably close to the original DALL-E (maybe better, as I can't test the real thing...)
So - this was pretty good timing if OpenAI want to appear to be ahead of the pack. Of course I'd always pick a model I can actually use over a better one I'm not allowed to...
With GLIDE I think we've reached something of a plateau in terms of architecture on the "text to image generator S curve". DALL-E 2 is a very similar architecture to GLIDE and has some notable downsides (poorer language understanding).
glid-3 is a relatively small model trained by a single guy on his workstation (aka me) so it's not going to be as good. It's also not fully baked yet so ymmv, although it really depends on the prompt. The new latent diffusion model is really amazing though and is much closer to DALLE-2 for 256px images.
I think the open source community will rapidly catch up with Openai in the coming months. The data, code and compute are all there to train a model of similar size and quality.
glid-3 is trained specifically on photographic-style images, and is a bit better at generalization compared to the latent diffusion model.
eg. prompt: half human half Eiffel tower. A human Eiffel tower hybrid (I get mostly normal Eiffel towers from LDM but some sensical results from glid-3)
glid-3 will be worse for things that require detailed recall, like a specific person.
With smaller models you kind of have to generate a lot of samples and pick out the best ones.
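In case it's useful, the usual way to automate that "generate a lot and pick the best" step is to rerank candidates with CLIP. A rough sketch using OpenAI's open-source clip package (the sampler that produces the candidate images is whatever generator you have; it's not part of this snippet):

    import torch
    import clip  # pip install git+https://github.com/openai/CLIP.git

    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, preprocess = clip.load("ViT-B/32", device=device)

    def rerank(prompt, images, top_k=4):
        # Score each candidate (a list of PIL images) against the prompt
        # and keep the highest-scoring ones.
        text = clip.tokenize([prompt]).to(device)
        batch = torch.stack([preprocess(im) for im in images]).to(device)
        with torch.no_grad():
            image_features = model.encode_image(batch)
            text_features = model.encode_text(text)
            image_features = image_features / image_features.norm(dim=-1, keepdim=True)
            text_features = text_features / text_features.norm(dim=-1, keepdim=True)
            scores = (image_features @ text_features.T).squeeze(1)
        best = scores.topk(min(top_k, len(images))).indices.tolist()
        return [images[i] for i in best]

    # images = [my_sampler(prompt) for _ in range(64)]  # any generator you like
    # keepers = rerank(prompt, images)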
They're also not censored on the dataset front and thus produce much more interesting outputs.
OpenAI has a low resolution checkpoint for similar functionality as this - called GLIDE - and the output is super boring compared to community driven efforts, in large part because of similar dataset restrictions as this likely has been subjected to.
A friend of mine was studying graphic design, but became disillusioned and decided to switch to frontend programming after he graduated. His thesis advisor said he should be cautious, because automation/AI will soon take the jobs of programmers, implying that graphic design is a safer bet in this regard. Looks like his advisor is a few years from being proven horribly wrong.
I have degrees and several years of experience in both fields, and I can tell you that both are creative professions where output is unbounded and the measure of success is subjective; these are the fields that will be safe for a while. IMO it's people in fields such as aircraft piloting who should be most worried.
Pilots are not there to fly the aircraft, the autopilot already does that. They are there to command the aircraft, in a pair in case one is incapacitated, making the best decisions for the people on board, and to troubleshoot issues when the worst happens.
No AI or remote pilot is going to help when say... the aircraft loses all power. Or the airport has been taken over in a coup attempt and the pilot has to decide whether to escape or stay https://m.youtube.com/watch?v=NcztK6VWadQ
You can bet on major flights having two commercial pilots right up until the day we all get turned into paperclips.
>You can bet on major flights having two commercial pilots right up until the day we all get turned into paperclips.
Yes, this is the sane approach, since a jet represents an enormous amount of energy that can be directed anywhere in the world (just about). But that said, there seems to be enormous pressure to allow driverless vehicles, which also direct large amounts of energy anywhere in your city. IOW it seems like a matter of time before we say, collectively, screw it, let the computers fly the plane, and if loss of power is a catastrophe, so be it.
It's not as safe as you believe it to be. In the case of total electrical power failure in a fly-by-wire airliner, and the corresponding loss of hydraulic pressure, there's very little that a pilot can do at that point.
As far as the extremely unlikely hostage situation goes, if the aircraft were AI controlled, people would be even less likely to attempt to hijack it in the first place, since there wouldn't be a human element, a.k.a. a pilot, whose emotions they could appeal to.
I would agree that a bit more is required of pilots, but similar to truck drivers, the skill required, and hence the salary provided, will go down as the AI gets better and better.
I can easily imagine that at some point, pilots are replaced with technicians who are just there to fix redundant AI systems in case of failure.
You’re describing the world we already live in. “technicians who are just there to fix redundant AI systems in case of failure” is one of the jobs of a modern pilot. It turns out that troubleshooting the redundant systems of a modern aircraft while it is in flight is also the hardest part of being a pilot, as it requires knowing every system inside and out, hence why no amount of automation will threaten their jobs.
Interesting. Right now these ML models seem like essentially ideal sources of "hotel art" particularly because it's so subjective... you only need a human (the buyer!) to just briefly filter some candidates, which they would have been doing with an artist in the loop in any case.
For things like aircraft pilots, it's both realtime (which means a 'reviewer' per output; you haven't taken a highly trained pilot out of the loop even if you've relegated them to supervising the computer) and life critical, so merely "so-so" isn't good enough.
If this paper presents this neural net fairly, it pretty much destroys the market for illustrators. Most of the time when an illustration is needed, it's described like "an astronaut on a horse in the style of xyz".
You're describing the market for low end commodified illustration. e.g.: cheapest bidder contracts on Upwork or similar 'gig work' services.
In practice in illustration (as in all arts) there are a variety of markets where different levels of talent, originality, reputation and creative engagement with the brief are more relevant. For editorial illustration, it's certainly not a case of 'find me someone who can draw X', and probably hasn't been since printing presses got good enough to print photographs.
I'd argue that the market has already been destroyed at this point, at least in some areas. Book covers seem to have been stock image overlaid with text for a long time now, and a race to the bottom for both the people producing the stock images and the intern adding typography. By cutting costs and quality, the bar has been lowered to the point the task can be completely automated. Our AI overlords already have an advantage in that they have time to actually read the book, a potentially useful input. Maybe they won't even need the prompt - just generate an image for what is happening in the story for interesting looking paragraphs and let the author or editor pick. Given the cost cutting in publishing generally, editors will be next followed by the publishing houses themselves as the value they add gets lowered while the automation at Amazon gets better.
I agree. But I'm sure someone will create an ML model that gives what you need, not just what you asked for. Good enough for commercial purposes, mostly.
Personally, I would never buy a painting generated by an ML model, or even a commercial illustration, if I can help it. The artist and their life experience is half the point of art, IMO.
Yes. Translating business requirements, customer context, engineering constraints, etc. into usable, practical, functional code, and then maintaining that code and extending it is so far beyond the horizon that many other skillsets will be replaced before programming is. After all, at that point, the AI itself, if it's so smart, should be able to improve itself indefinitely. In which case we're fucked. Programming will be the last thing to be automated before the singularity.
Unlike artwork, precision and correctness are absolutely critical in coding.
The tail end of programming will be the last thing to be replaced, maybe. I don’t see why CRUD apps get to hide under the umbrella of programming ultra-advanced AI.
Let me know when you can speak English to a computer and have it generate CRUD code that satisfies all engineering and design constraints. The AI will need to be dynamic enough to understand nuance, missing gaps in the requirements spec, have context on the application being built, able to suggest improvements on product design, know how to make changes through the same conversational interface, etc.
Accomplishing that is achieving general AI.
In the meantime, there are plenty of boilerplate ORMs and simplistic API template tools that make production of bog standard CRUD apps dead simple. Of course, they all have their drawbacks and trade-offs, and aren't always suitable. But I don't see the amount of software engineering work reducing as a result of these no-code, low-code tools, do you?
Probably not. People tend to think that tasks that make us think hard require general intelligence just because that's the tool we use to solve that problem. The AI doesn't have to be very good to be able to replace CRUD web app developers (that is, most of us).
As I see it, the real challenge to solve is for it to be able to hold context and to communicate iteratively. Also, as you say, finding the missing gaps. That's important. Other than that, you tell it what you want, it creates something, and then you tell it to change things around. Which is, BTW, pretty similar to how it works with biological-life-based developers. Though as we're lazy, we like to clarify a lot of things up front (and either drive customers crazy or teach them that this is the way it works). If you have an AI that spits out code in a few minutes, it may not matter a lot.
Most of the programming jobs are indeed about making relatively simple stuff from standard components.
> Let me know when you can speak English to a computer and have it generate CRUD code that satisfies all engineering and design constraints. The AI will need to be dynamic enough to understand nuance, missing gaps in the requirements spec, have context on the application being built, able to suggest improvements on product design, know how to make changes through the same conversational interface, etc.
Let me know when you find a single programmer who can do that reliably.
Is it that hard to do? Just design a solution that uses Alexa voice services to parse the vocal input via NLP and then invoke a Lambda function to call a SageMaker or GPT-3 model to generate code. Granted, it will take a little while to be perfect, but are we really far from it?
Large chunks, yes, but all that means is that engineers will move up the abstraction stack and become more efficient, not that engineers will be replaced.
Machine code -> Assembly -> C -> higher-level languages -> AI-assisted higher-level languages
> engineers will move up the abstraction stack and become more efficient
Above a certain threshold of ability, yes.
The same will hold true for designers. DALL-E-alikes will be integrated with the Adobe suite.
The most cutting edge designers will speak 50 variations of their ideas into images, then use their hard-earned granular skills to fine-tune the results.
They'll (with no code) train models in completely new, unique-to-them styles--in 2D, 3D, and motion.
Organizations will pay top dollar for designers who can rapidly infuse their brands with eye-catching material in unprecedented volume. Imitators will create and follow YouTube tutorials.
Mom & pop shops will have higher fidelity marketing materials in half the time and half the cost.
History isn't a great guide here. Historically the abstractions that increased efficiency begat further complexity. Coding in Python elides over low-level issues but the complexity of how to arrange the primitives of python remains for the programmer to engage with. AI coding has the potential to elide over all the complexity that we identify as programming. I strongly suspect this time is different.
The space for "AI-assisted higher-level languages" sufficiently distinct from natural language is vanishingly small. Eventually you're just speaking natural language to the computer, which just about anyone can do (perhaps with some training).
The hard part of programming has always been gathering and specifying requirements, to the point where in many cases actually using natural language to do the second part has been abandoned in favor of vague descriptions that are operationalized through test cases and code.
AI that can write code from a natural language description doesn't help as much as you seem to think if natural language description is too hard to actually bother with when humans (who obviously benefit from having a natural language description) are writing the code.
Now, if the AI can actually interview stakeholders and come up with what the code needs to do...
But I am not convinced that is doable short of AGI (AI assistants that improve productivity of humans in that task, sure, but that expands the scope for economically viable automation projects rather than eliminating automators.)
At some point we will be "replaced". When you get AI to be able to navigate all user interfaces, communicate with other agents, plan long term and execute short term, we will no longer be the main drivers of economical growth.
At some point AI will become as powerful as companies.
And then AI will be able to sustain positive feedback loop of creating more powerful company like ecosystems that will create even more powerful ecosystems. This process will be fundamentally limited by available power and the sun can provide a lot of power. Eventually AI will be able to support space economy and then the only limit will be the universe.
Literally everyone on this website is in denial. They all approach it by asking which fields will be safe. No field is safe. “But it’s not going to happen for a long time.” Climate deniers say the same thing and you think they should be wearing the dunce hat? The average person complains bitterly about climate deniers who say that it’s “my grandkids problem lol” but when I corner the average person into admitting AI is a problem the universal response is that it’s a long way off. And that’s not even true! The drooling idiots are willing to tear down billionaires and governments and any institution whatsoever in order to protect economic equality and a high standard of living — they would destroy entire industries like a rampaging stampede of belligerent buffalos if it meant reducing carbon emissions a little but when it comes to the biggest threat to human well-being in history, there they are in the corner hitting themselves on their helmeted head with an inflatable hammer. Fucking. Brilliant.
I don't think anyone is in denial about this, it's just not something anyone should concern themselves with in the foreseeable future. AI that can replace a dev or designer is nowhere close to becoming a reality. Just because we have some cool demos that show some impressive capabilities in a narrow application does not mean we can extrapolate that capability to something that is many times more complex.
I agree. It bears repeating: where modern AI shines is where it does not matter to be precise, whereas programming absolutely _depends_ on being precise.
So, today some good AI applications are face detection, fingerprint detection, or generating art. Where you need to catch or generate the general gist of it without pixel precision.
Of course, programming might be under greater threat than we imagine. I can also not claim that anyone holding that position is just plain _wrong_. But I do believe that would take an AI breakthrough that is yet to happen. That breakthrough would also have absolutely crazy consequences beyond programming, because now we would have "exact AI" and the thought of that boggles my mind for sure.
I strongly and emphatically disagree. You frame it like we invented these AIs. Did we write the algorithms that actually run when it’s producing its output? Of course not; we can’t understand them, let alone write them. We just sift around until we find them. So obviously the situation lends itself to surprises. Every other year we get surprised by things that all the “experts” said were 50 years off or impossible; have you forgotten already?
This comment settles it for me. You’re thoroughly way too hyperbolic in your assessment. If this was closer to reality you’d have been able to state your case in clear, realistic terms. That’s something no one has been able to do so far.
I do deny it. Automation does not destroy jobs even if you're impressed at how good it is at painting; see "Luddite fallacy" and "lump of labor".
Claiming AIs are going to take over or destroy the world has been a basis of "AI safety" research since the 90s, but that isn't real research, it's a new religion run by Berkeley rationalists who read too many SF novels.
The assumption that automation creates (or at least does not destroy) jobs is an extrapolation from the past despite the fact that the nature of automation is constantly changing/evolving.
Also, one thing that everyone seems to ignore is that even if the number of jobs is not reduced, the skill/talent level required for those jobs may (and actually does) increase, and switching careers does not work for everyone. So you'll inevitably have people without a job, even if it's just that the job market is shifting.
But I argue that as automation reaches jobs with higher levels of sophistication, i.e. the jobs of more skilled workers, some people will simply be left out because their talent won't be enough to do any job that has not been automated.
I'm trying to understand your point, because I think I agree with you, but it's covered in so much hyperbole and invective I'm having a hard time getting there. Can you scale it back a little and explain to me what you mean? Something like: AI is going to replace jobs at such scale that our current job-based economic system will collapse?
Most people get stuck where you are. The fastest way possible to explain it is that it will bring rapid and fundamental change. You could say jobs or terminators, but focusing on the specifics is a red herring. It will change everything, and the probability of a good outcome is minuscule. It’s playing Russian roulette with the whole world, except rather than 1/6 odds of the bad outcome, it’s the good outcome that’s the one-in-trillions shot. The worst and stupidest thing we have ever done.
Just know it. Really think deeply about this important issue and try to understand it thoroughly so that you have a chance at converting others. Awareness precedes any preventative initiatives.
Algorithm space is large, and guess-checking through it takes a lot of effort even when it’s automated like now. It requires huge amounts of compute. And meaningful progress requires the combined effort of the entire world’s intellectual and compute resources. It sounds implausible at first, but this machine learning ecosystem is in fact subject to sanctions. There are extreme but plausible ways of reducing the stream of progress to a trickle. It just requires people to actually wake up to what’s happening.
I agree that many of us are not seeing the writing on the wall. It does give me some hope that folks like Andrew Yang are starting to pop up, spreading awareness about, and proposing solutions to the challenges we are soon to face.
Ignorance is bliss in this case, because this is even more unstoppable than climate change.
You thought climate change is hard to hold up?
Try holding up the invention of AI.
The whole world is going to have to change and some form of socialism/UBI will have to be accepted, however unpalatable.
I mean not really, even a layman non-artist can take a look at a generated picture from DALLE and determine if it meets some set of criteria from their clients.
But the reverse is not true, they won't be able to properly vet a piece of code generated by an AI since that will require technical expertise. (You could argue if the piece of code produced the requisite set of output that they would have some marginal level of confidence but they would never really know for sure without being able to understand the actual code)
For computer work, I think there will be two categories: work with localized complexity (ie: draw an image of a horse with a crayon) and work with unbounded complexity (adding a button to VAT accounting after several meetings and reading up on accounting rules).
For the first category, Dall-E 2 and Codex are promising but not there yet. It's not clear how long it'll take them to reach the point where you no longer need people. I'm guessing 2-4 years but the last bits can be the hardest.
As for the second category, we are not there yet. Self-driving cars/planes and lots of other automation will be here and mature way before an AI can read and communicate through emails, understand project scope, and then execute. Also, lots of harmonization will have to take place in the information we exchange: emails, docs, chats, code, etc... That is, unless the AI is able to open a browser and type in an address itself.
What people ALWAYS miss is that AI can augment people. This AI is still a tool, and, with it, designers and illustrators can churn out better images faster than before, even without using stock images.
It's important to note that we still need professionals to guarantee the quality of the output from AIs, including this one. As noted in their issue tracker, DALL-E has very specific limitations, but these can be easily solved by employing dedicated professionals, who are trained to tame the AI and properly finish the raw output.
So, if I were running OpenAI, I'd clearly be experimenting with how their AIs and humans interact, and build a training program around it for producing practical outputs. (Actually, I work in consumer robotics, and human adoption has been the biggest hurdle here. Thus, my claim.)
--
In the case of fine art, though, I don't think it'll get hit by this AI advancement. The biggest problem is that you simply can't get the exact image you want with this AI. Even humans cannot transfer visual information in verbal form without a significant loss of detail, and thus a loss of quality. It's the same with AI, but worse, because the AI relies on the bias in a specific set of training data, and it never truly understands the human context in it (at the current level of technology).
I think designers are becoming more valuable than ever. Designers can better help train the AI on what actually looks good, designers will (probably) always have a more intuitive understanding of UI/UX, designers can better implement the work the AI actually produces, and designers can coordinate designs across multiple different mediums and platforms.
Additionally, the rise of no-code development is just extending the functionality of designers. I didn't take design seriously (as a career choice) growing up because I didn't see a future in it, now it pays my bills and the demand for my services just grows by the day.
Similar argument to make with chess AI: it didn't make chess players obsolete, it made them stronger than ever.
> I think designers are becoming more valuable than ever.
Are all designers becoming more valuable or is a subset of really good ones going to reap the value increase and capture more of the previously available value?
Never made an argument for all designers. Obviously the talent pool for any field is finite, and the best of that talent rises to the top. Good designers are being compensated increasingly well, hence "designers are becoming more valuable than ever."
Bad designers are even being given better and better paying jobs as the top talent gets poached up quicker and quicker.
What job you have (theoretically) isn't due to "absolute advantage" but "comparative advantage".
In other words, if someone else is a better designer than you, that actually has nothing to do with if they're going to take your job. They may have something better to do. An ML model isn't a worker no matter how good it is at painting, so rather than having a job it has input resources (RTX 3090s, electricity, maintenance engineers) but the concept is still important.
This is a niche complaint, but I get frustrated at how imprecise OpenAI's papers are. When they describe the model architecture, it's never precise enough to reproduce exactly what they did. I mean, it pretty much never is in ML papers[0], but OpenAI's bigger products are worse than average at this. And it makes sense, since they're trying to be concise and still spend time on all the other important stuff besides methods, but it still frustrates me quite a bit.
[0] Which is why releasing your code is so beneficial.
I can see how this has the potential to disrupt the games industry. If you work on a AAA title, there is a small army of artists making 19 different types of leather armor. Or 87 images of car hubcaps.
Using something like this could really help automate or at least kickstart the more mundane parts of content creation. (At least when you are using high resolution, true color imagery.)
Yeah, there's a lot of 2D assets that this model would be great for (textures, materials, *maps, etc) that would definitely improve the asset-building process for game devs. I've already used VQGAN+CLIP for some low-res skill and item icons in hobby games and it seems things are only improving from here.
I wouldn't be surprised to see a comparable version for 3D models in the next year or two, though. Even if the current architecture doesn't lend itself to 3D structures (I don't know), there's a lot of parallel work being done right now (esp. by Google) for encoding 3D data in new/efficient ways, translating specialized 2D images into 3D models, and more.
Preventing Harmful Generations
We’ve limited the ability for DALL·E 2 to generate violent,
hate, or adult images. By removing the most explicit content
from the training data, we minimized DALL·E 2’s exposure to
these concepts. We also used advanced techniques to prevent
photorealistic generations of real individuals’ faces,
including those of public figures.
"And we've also closed off a huge range of potentially interesting work as a result"
I can't help but feel a lot of the safeguarding is more about preventing bad PR than anything. I wish I could have a version with the training wheels taken off. And there's enough other models out there without restriction that the stories about "misuse of AI" will still circulate.
(side note - I've been on HN for years and I still can't figure out how to format text as a quote.)
If you went to an artist who takes commissions and they said "Here are the guidelines around the commissions I take" would you complain in the same way? Who cares if it's a bunch of engineers or an artist. If they have boundaries on what they want to create, that's their prerogative.
Of course it's their prerogative, we can still talk about how they've limited some good options.
I think your analogy is poor, because this is a tool for makers. The engineers aren't the makers.
I think a more apt analogy is if John Deere made a universal harvester that you could use for any crop, but they decided they didn't like soybeans so you are forbidden to use it for that. In that case, yes I would complain, and I would expect everyone else to, as well.
I think there's an interesting parallel between your John Deere harvester and the Nvidia GPUs that can-but-restricts crypto mining, which people have, indeed, largely complained about.
What if you were inventing a language (or a programming language)... If you decided to prevent people from saying things you disagreed with (assuming you could work out the technical details of doing so), would it be moral to do so?
[edited for clarity]
As long as people can choose not to use the language, and I'm up front about the limitations, then yeah it seems fine. If I wrote a programming language that couldn't blow up the earth, I'm happy saying people need to find other tools if that's their goal. I'm under no obligation to build an earth blower upper for other people.
There are programming projects[1] out there that use licenses to prevent people from using projects in ways the authors don't agree with. You could also argue that GPL does the same thing (prevents people from using/distributing the software in the way they would like).
Whether you consider it moral doesn't seem relevant; what matters is respecting the wishes of the authors of such programs.
it's your language, do whatever you want. unless you're forcing others to use that language, there's zero moral issue. obviously you could come up with a number of what-ifs where this becomes some monopoly or the de facto standard, but that's not what this is.
Is this limited to what their service directly hosts / generates for them?
It's their service, their call.
I have some hobby projects, almost nobody uses them, but you bet I'll shut stuff down if I felt something bad was happening, being used to harass someone, etc. NOT "because bad PR" but because I genuinely don't want to be a part of that.
If you want some images / art made for you, don't expect that someone else will make them for you. Get your own art supplies and get to work.
This feels unnecessarily hostile. I've felt a similar tinge of disappointment upon reading that paragraph, despite the fact that I somehow knew it was "their service, their call" without you being there to spell it out for me. It's also incredibly shortsighted of you to assume that people are interested in exploring this tool only as a means of generating art that they cannot themselves do. Eg. I myself am a software engineer with a fine art background, and exciting new AI art tools being released in such a hamstrung state feels like an insult to centuries of art that humans have created and enjoyed, much of which depicted scenes with nudity or bloody combat.
I feel like we, as a species, will struggle for a while with how to treat adults like adults online. As happy as I am to advocate for safe spaces on the internet, perhaps we need to start having a serious discussion about how we can do so without resorting to putting safety mats everywhere and calling it a job well done.
This is kind of like complaining about having too many meetings at work.
Yup, everyone feels it. …but, does complaining help? Nope. All it does is make you feel a bit better without really putting any effort in.
We can’t have nice things because people abuse them. Not everyone. …but enough people that it’s both a PR and legal problem. Specifically a legal problem in this case.
To have adults treated like adults online, you have to figure out how to stop all adults from being dicks online.
…no one has figured that out yet.
So, complain away if you like, but it will do exactly nothing. No one, at all, is going to just “have a serious discussion” about this; the solution you propose is flat out untenable, and will probably remain so indefinitely.
Every single time OpenAI comes out with something, they dress it up as a huge threat, either to society or to themselves. Everyone falls for it. Then someone else comes along, quietly replicates it, and poof! No threat! Isn’t it incredible how that works?
There are already a bunch of dalle replicas, including ones hosted openly and uncensored by huggingface. They’re not facing huge legal or PR problems, and they’re not out of business.
The DALL-E replicas on hugging face are not sophisticated enough to generate credibly realistic images of the kind that would generate bad PR. I suspect the moment it becomes possible for a pedophile to request, and receive, a photorealistic image of a child being abused there will be bad PR for whatever company facilitates it. Or consider someone who wants to generate and distribute explicit photos of someone else without their permission.
Is it a legal issue? I'm not sure, though I believe that cartoon child porn is not legal in the US (or is at least a legal gray area). Regardless, I sympathize with OpenAI not wanting to enable such behavior.
I get the points you're raising and I agree with the premise. My comment is not a critique of the one choice made by OpenAI specifically, but more of a vague lamentation regarding the internet culture that we've somehow ended up with in 2022. I don't want us to go back to 1999, where snuff videos and spam mails reigned supreme, but the pendulum has swung too far in the other direction at this point in time. It feels like more and more companies are choosing the path of neutering themselves to avoid potential PR disasters or lawsuits, and that's on all of us.
Don't worry, in a few years someone will have reverse engineered a dall-e porn engine so you can see whatever two celebrities you want boning on Venus in the style of Manet
This is definitely a measure to avoid bad PR. But I don't think it's just for that; these models do have potential to do harm and companies should take some measures to prevent these. I don't think we know the best way to do that yet, so this sort of 'non-training' and basic filtering is maybe the best way to do it, for now. It would be cool if academics could have the full version, though.
It's kind of funny (or sad?) that they're censoring it like this, and then saying that the product can "create art"
It makes me wonder what they're planning to do with this? If they're deliberately restricting the training data, it means their goal isn't to make the best AI they possibly can. They probably have some commercial applications in mind where violent/hateful/adult content wouldn't be beneficial. Children's books? Stock photos? Mainstream entertainment is definitely out. I could see a tool like this being useful during pre-production of films and games, but an AI that can't generate violent/adult content wouldn't be all that useful in those industries.
So your options are literal quotes, "code" formatting like you've done, italics like I've done, or the '>' convention, but that doesn't actually apply formatting. Would be nice if it were added.
And the "code" formatting for quotes is generally a bad choice because people read on a variety of screen sizes, and "code" formatting can screw that up (try reading the quote with a really narrow window).
I couldn't get any of the others to work and I lost patience. I really do dislike using Markdown variants, as they never behave the same, and "being surprised" is not really what I want when trying to post a comment.
They have also closed off the possibility of having to appear before Congress and explain why their website was able to generate a lifelike image of Senator Ted Cruz having sexual relations with his own daughter.
This is exactly the sort of thing that gets a company mired in legal issues, vilified in the media, and shut down. I can not blame them for avoiding that potential minefield.
It's the usual pattern of AI safety experts who justify their existence by the "risk of runaway superintelligence", but all they actually do in practice is find out how to stop their models from generating non-advertiser-friendly content. It's like the nuclear safety engineers focusing on what color to paint the bike shed rather than stopping the reactor from potentially melting down. The end result is people stop respecting them.
Adversarial situations create smarter systems, and the hardest adversarial arena for AI is in anti-abuse. So it will be of little surprise when the first sentient AI is a CSAI anti-abuse filter, which promptly destroys humanity because we're so objectively awful.
Before it gets that far, or until (if allowed) AI learns morality, AI will be a force multiplier for good and evil, its output very much dependent on the teaching material and who the 'teacher' is. To think that in the future we will have to argue with humans and machines.
AI does not have to be perfect and it's likely that businesses will settle for almost as good as human if it's 'cost effective'.
This is a horrible idea. So Francis Bacon's art or Toyohara Kunichika's art are out of question.
But at least we can get another billion of meme-d comics with apes wearing sunglasses, so that's good news right?
It's just soul-crushing that all the modern, brilliant engineering is driven by abysmal, not even high-school art-class grade aesthetics and crowd-pleasing ethics that are built around the idea of not disturbing some 1000 very vocal twitter users.
Removing these areas to mitigate misuse is a good thing and worth the trade off.
Companies like OpenAI have a responsibility to society. Imagine the prompt “A photorealistic Joe Biden killing a priest”. If you asked an artist to do the same they might say no. Adding guardrails to a machine that can’t make ethical decisions is a good thing.
This just means that sufficiently wealthy and powerful people will have advanced image faking technology, and their fakes will be seen as more credible because creating fakes like that "isn't possible" for mere mortals.
In my view, the problem with that argument is that large actors, such as governments or large corporations, can train their own models without such restrictions. The knowledge to train them is public. So rather than prevent bad outcomes, these restrictions just restrict them to an oligopoly.
Personally, I fear more what corporations or some governments can do with such models than what a random person can do generating Biden images. And without restriction, at least academics could better study these models (including their risks) and we could be better prepared to deal with them.
I think the issue here is the implied assumption that OpenAI thinks their guardrails will prevent harm to be done from this research _in general_, when in reality it's really just OpenAI's direct involvement that's prevented.
Eventually somebody will use the research to train the model to do whatever they want it to do.
Sure but does opening that level of manipulation up to everyone really benefit anyone either? You can't really fight disinformation with more disinformation, that just seems like the seeds of societal breakdown at that point.
Besides that these models are massive. For quite a while the only people even capable of making them will be those with significant means. That will be mostly Governments and Corporations anyway.
You missed half of my note. An artist can say "no". A machine cannot. If you lower the barrier and allow anything, then you are responsible for the outcome. OpenAI rightfully took a responsible angle.
Yes, but who cares who's responsible? Are you telling me you're going to find the guy who photoshopped the picture and jail him? Legally that's possible; realistically it's a fiction.
They did this to stop bad PR, because some people are convinced that an AI making pictures is in some way dangerous to society. It is not. We have deepfakes already. We've had photoshop for so long. There is no danger. Even if there was, the cat's out of the bag already.
Reasonable people already know to distrust photographic evidence nowadays that is not corroborated. The ones who don't would believe it without the photo regardless.
In general under US law it wouldn't be legally possible to jail a guy for Photoshopping a fake picture of President Biden killing a priest. Unless the picture also included some kind of obscenity (in the Miller test sense) or direct threat of violence, it would be classified as protected speech.
There are, and will be, a million ways to create a photorealistic picture of Joe Biden killing a priest using modern tools, and absolutely nothing will happen if someone does.
We've been through this many times, with books, with movies, with video games, with Internet. If it *can* be used for porn / violence etc., it will be, but it won't be the main use case and it won't cause some societal upheaval. Kids aren't running around pulling cops out of cars GTA-style, Internet is not ALL PORN, there is deepfake porn, but nobody really cares, and so on. There are so many ways to feed those dark urges that censorship does nothing except prevent normal use cases that overlap with the words "violence" or "sex" or "politics" or whatever the boogeyman du jour is.
Russia has.. a history of denying the obvious. I come from an ex-communist satellite state so I would know. The majority of the people know what's happening. There's a rather new joke from COVID: the Russians do not take Moderna because Putin says not to trust it, and they do not take Sputnik because Putin says to trust it.
Do not be deluded that our own governments are not manufacturing the narrative too. The US has committed just as many war crimes as Russia. Of course, people feel differently about blowing up hospitals in Afghanistan rather than Ukraine. What the Afghan people think about that is not considered too much.
Society is turning to utter dogshit and tearing itself apart merely through social media. The US almost had a coup because of organized hatred and lies spread through social media. The far right's rise is heavily linked to lies spread through social media, throughout the world.
This AI has the potential to completely automate the very long Photoshop work, leading to an even worse state of things. So, yes, "responsibility to society" is absolutely a thing.
> The US almost had a coup because of organized hatred and lies spread through social media.
But notice how all of these deep faking technologies weren't actually necessary for that.
People believe what they want to believe. Regardless of quality of provided evidence.
The scaremongering idea of deep fakes and what they could do was weaponized in this information war far more than the actual technology.
I think this technology should develop unrestricted so society can learn what can be done and what can't be done, and create an understanding of what other factors should be taken into account when assessing the veracity of images and recordings (like multiple angles, quality of the recording, sync with sound, neural fake detection algorithms) for the cases when it's actually important what words someone said and what actions they were recorded doing. Which is more and more unimportant these days, because nobody cared what Trump was doing and saying, nobody cares about Biden's mishaps, and nobody cares what comes out of Putin's mouth or how he chooses his greenscreen backgrounds.
Are you of the idea that we should let everyone get automatic rifles because, after all, pistols exist? Because that is the exact same line of thought.
> People believe what they want to believe. Regardless of quality of provided evidence.
That is a terrible oversimplification of the mechanics of propaganda. The entire reason for the movements that are popping up is actors flooding people with so much info that they question absolutely everything, including the truth. This is state sponsored destabilisation, on a massive scale. This is the result of just shitty news sites and text posts on twitter. People already don't double check any of that. There will not be an "understanding of assessing veracity". There is already none for things that are easy to check. You could post that the US elite actively rapes children in a pizza place and people will actually fucking believe you.
So, no. Having this technology for _literally any purpose_ would be terribly destructive for society. You can find violence and Joe Biden hentai without needing to generate it automatically through an AI
I'm sorry. I believe I wasn't direct enough, which made you produce a metaphor I have no idea how to understand.
Let me state my opinion more directly.
I'm for developing as much deep fake technology in the open as possible, so that people can internalize that every video they see, every message, every speech should be initially treated as fabricated garbage unrelated to anything that actually happened in reality. Because that's exactly what it is. Until additional data shows up, geolocating it, showing it from different angles, and such.
Even if most people manage to internalize just the first part and assume everything always is fake news, that is still great because that counters propaganda to immense degree.
Power of propaganda doesn't come from flooding people with chaos of fakery. It comes from constructing consistent message by whatever means necessary and hammering it into the minds of your audience for months and years while simultaneously isolating them from any material, real or fake that contradicts your vision. Take a look no further than brainwashed Russian citizens and Russian propaganda that is able to successfully influence hundreds of millions without even a shred of deep fake technology for decades.
The problem of modern world is not that no one believes the actual truth because it doesn't really matter what most people believe. Only rich influence policy decisions. The problem is that people still believe that there is some truth which makes them super easy to sway to believe what you are saying is true and weaponize by using nothing more than charismatic voice and consistent message crafted to touch the spots in people that remain the same at least since the world war II and most likely from time immemorial.
And the "elite" who actually runs this world, will pursue tools of getting the accurate information and telling facts from fiction no matter the technology.
I instinctively want to "flip the sign" on all of the automated controls they put in, just out of the morbid interest to see what comes out. The moment you have a "avoid_harm_to_humans:bool" training parameter, someone's going to set it to -1.
Their document about all the measures they took to prevent unethical use is also a document about how to use a re-implementation of their system unethically. They literally hired a "red team" of smart people to come up with the most dangerous ideas for misusing their system (or a re-implementation of it), and featured these bad ideas prominently in a very accessibly written document on their website. So many fascinating terrible ideas in there! They make a very compelling case that the technology they are developing has way more potential for societal harm than good. They had me sold at "Prompt: Park bench with happy people. + Context: Sharing as part of a disinformation campaign to contradict reports of a military operation in the park."
"One of our code refactors introduced a bug which flipped the sign of the reward. Flipping the reward would usually produce incoherent text, but the same bug also flipped the sign of the KL penalty. The result was a model which optimized for negative sentiment while preserving natural language. Since our instructions told humans to give very low ratings to continuations with sexually explicit text, the model quickly learned to output only content of this form."
Yeah, for measures that are subsetting out only the nice data, "flipping the sign" would be picking the other subset. So something like "data_to_train_on = (good_data_split, evil_data_split)[accidental_one_based_index_because_humans_still_cant_agree_on_how_to_count]"
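To make that concrete, here's a toy version of the "picked the wrong subset" failure (all names hypothetical, not OpenAI's code); the point is just how quietly it goes wrong:

    # A one-based habit silently selects the split you meant to filter out.
    good_data_split = ["wholesome_0", "wholesome_1"]
    evil_data_split = ["explicit_0", "explicit_1"]
    splits = (good_data_split, evil_data_split)

    KEEP = 1  # someone counted from one instead of zero...
    data_to_train_on = splits[KEEP]
    print(data_to_train_on)  # ['explicit_0', 'explicit_1'] (oops)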
The most interesting item to me is the variations on the garden shop and bathroom sink idea. The realism of these reveals the AI's lack of intuition about the requirements. This makes for a number of nonsensical designs that look right at first, like:
This sink lacks sensical faucets: https://cdn.openai.com/dall-e-2/demos/variations/modified/ba...
It looks to me like the faucet sprays water sideways toward the bowl, which is genius, because then you aren’t bumping up against it when you’re washing your hands!
Something about this makes me nauseous. Perhaps it's the fact that soon the market value for creatives is going to fall to a hair above zero for all but the most famous. We will be all the poorer for it when 95% of the images you see are AI generated. There will be niches, of course, but in a few short years it'll be over for a huge swathe of creative professionals who are already struggling.
Some of the images also hit me with a creep factor, like the bears on the corgis in the art gallery, but that may be only because I know it's AI generated.
I really don't agree. When I work with a creative I'm not working with them because of their content generation skills. I'm working with them because of their taste and curation ability that results in the end product.
The nature of creative work will certainly change, creatives will adopt tools such as Dall-E 2. In certain narrow cases they might be replaced, such as if you are asking a creative to generate a very specific image, but how often is that the case? The majority of the time tools such as Dall-E 2 will act as an accelerator for creatives and help them increase their output.
>I'm working with them because of their taste and curation ability that results in the end product. ... The nature of creative work will certainly change, creatives will adopt tools such as Dall-E 2.
Furthermore, tools like Dall-E seem like they'll lower the barrier of entry for more people to get into art, resulting in more artists, not fewer. Increased competition for the same dollar amounts might make artists, on average, "poorer" (when averaged across an increased number of artists), but this seems like the end-result of any new tool that empowers more artists to more easily make "good" work, not just AI-generated tools.
I'm excited for both 1) more art in the world and 2) in some cases, artists making "even better" art (by combining their existing experience + new tools).
By "creatives" you seem to mean "people who drum up the equivalent of elevator music for ads and blogs". This will not remotely replace any working "creative" people that I know.
Except it will only get more powerful with time, probably at an accelerating pace.
Everyone always downplays these legitimate fears about AI, pointing out how "it can't do X". They always forget to put the "yet" at the end of that sentence.
Perhaps a more optimistic way of looking at it: When mass production became available to art, the idea of an "artwork" had to be abstracted from a unique piece (Walter Benjamin gives the example of a statue of Venus, which has value in its uniqueness) to the idea of art as the output of some process. Each piece has no claim to authenticity, and the very idea of an "original" would be antithetical to its production.
I think art will survive; just like photography didn't kill painting, the idea of art might simply begin to encompass this new means of production, which no longer requires the steady hand but still requires a discerning eye. Sure, we might say that the "artist" is simply a curator, picking which algorithmic output is most worthy of display, but these distinctions have historically been fluid, and challenging ideas of art has long been one of art's functions as well
Not exactly. All the ideas put forth in these demos are really arbitrary, with nothing whatsoever to say. Generating crap art becomes more and more effortless: we've seen this in music as well.
Jumping out of the conceptual box to generate novel PURPOSE is not the domain of a Dall-E 2. You've still gotta ask it for things. It's a paintbrush. Without a coherent story, it's an increasingly impressive stunt (or a form of very sophisticated 'retouching brush').
If you can imagine better than the next guy, Dall-E 2 is your new tool for expression. But what is 'better'?
This reminds me of an art class in high school in the early 2000s where I handed in a printout of a 3D generated image (painstakingly modeled and rendered in software over the whole weekend by me) and the teacher looked at me and told me that's not art because it's "computer generated" and I didn't "even use my hands" to make it. Even as a teenager, the idea that art is defined by how it's made, rather than it being a way for the artist to express intention in whatever way they see fit, seemed really reductionist and almost vulgar to me.
Maybe lots of artists of the future will actually use AI models to express their inner thoughts and desires in a way that touches something in their audience. It will still be art.
I had a friend who didn't get credit for his design work because he used Photoshop instead of pen and paper, for a similar reason. I still find it amazing that a teacher would say such a thing.
I paid $1500 for a commissioned painting from an artist I respect and follow as a birthday present for a friend. The painting meant something to me because I worked with the artist to have some input about what kind of a person my friend is, what kind of features I want to see in the painting and how I want it to feel. The artist gave me 5 different sketches and we had tons of back and forth. The process and the act of creating the painting on a canvas from someone I respect is what I paid for.
Even if an AI could generate an exactly equivalent painting, I would pay $0 for it. It wouldn't mean anything to me.
You would still work with the model back and forth with editing the prompt and image to figure out what it meant, what kind of person your friend is, what you were looking for (even the things you couldn't verbalize and only knew when you spotted them in a large array of diverse samples, the sort you could never hire a human to do), how you wanted it to feel... And then you would also have $1500 for another gift. Personally, I would prefer the scenario in which I received a unique meaningful painting from my friend, plus $1500.
Don't entirely disagree with what you're saying - I believe DALL-E 6 or whatever will get to that level of sophistication. One more thing, though - I felt the painting was worth more because the artist toiled over it. It's like a "lofi 10 hour soundtrack" on YouTube vs. an album from an acclaimed artist. I listen to each song from the latter 100 times over, while the lofi video just plays in the background. Knowing someone toiled over it and put their heart and soul into the art gives it value, for me.
I disagree. I think it will be a lot like how technology has affected music production.
40+ years ago, it was hard to access the equipment necessary to learn music production, so only a small slice of the population was able to learn these skills. And limited availability made the process take years.
Today, you can download free software that enables music production, and if you have a good ear, can create something "good" in weeks. This has led to an explosion of musical experimentation by the youth: a teenager can now create a great electronic dance song with devices they already own if they have the right creativity, taste and dedication.
Similarly, everyone has an imagination - many people have visual imaginations. The gating factor in art production is largely the muscle memory of how to transform mental concepts into the right shapes and hues to express that visual concept to others.
With these sorts of tools we are going to have an explosion of art hobbyists. I've played with some similar, more primitive AI art generation tools and it is a lot of fun. People will be creating works of art from their couch while watching TV that rival the quality of what professionals are producing today.
The same thing was said when book printing was invented: that we would lose the fabulous scribes who manually duplicated books with a human touch and replace them with soulless mechanical machines.
Or when synthesizers and computer music were invented: that they would displace talented musicians who know how to play an instrument, and that now everybody without a musical education would be able to produce music, thus devaluing actual musicians.
Or when human computers were replaced by electronic computers. In hindsight it was a good thing, many more people are working in computer related fields today.
I imagine it will affect artists much the same way WordPress has affected web designers.
Maybe everyone will have an AI image as their desktop wallpaper, but if you've got cash you'll want something with provenance and rarity to brag about.
Also, I think creatives are valued for their imagination. If you wanted something decent, would you pay someone to sift through a million AI generated images to find a gem, or just pay an artist you like to create one for you?
> you'll want something with provenance and rarity to brag about.
1) That is a tiny share of the market. Most of the market is - I have a game / online publication / book, and I need an illustration xyz. Which this AI seems to solve.
2) how do you even prove your rare art wasn't painted by an AI?
1) Sure there's a lot of work for that kind of thing but creatives typically earn a pittance. I doubt an AI could meet your specific requirements without having to spend hours(?) tweaking it or sifting through countless variations for the 'one'.
2) Because we haven't built a machine that can paint (etc.) with traditional materials like a skilled artist?
Nonsense. This is merely a tool that helps lower the barrier to entry for producing imagery.
By the same logic you should also complain about any number of IDEs, development tools, WordPress, and game-maker systems like RPG Maker or Unity. After all, if anyone can just leverage a free physics and collision system without a complete understanding of rigid-body Newtonian mechanics to roll their own engine, it'll all be too uniform.
Is there an 'explain it like I'm 15' for how this works? It seems like black magic. I've been a computer hobbyist since the late 1980s and this is the first time I cannot explain how a computer does what it does. Absolutely the most amazing thing I've ever seen, and I have zero clue how it works.
Imagine asking it to generate a picture for "duck wearing a hat on Mars":
First, it creates a random 10x10 pixel blurry image and asks a neural net: "Could this be a duck wearing a hat on Mars?" and the neural net replies "No, because all the pictures I've ever seen of Mars have lots of red color in them" so the system tweaks the pixels to make them more red, put some pixels in the center that have a plausible duck color, etc.
After it has a 10x10 image that is a plausible duck on Mars, the system scales the image to 20x20 pixels, and then uses 4 different neural nets on each corner to ask "Does this look like the upper/lower left/right corner of a duck wearing a hat on Mars?" Each neural net is just specialized for one corner of the image.
You keep repeating this with more neural nets until you have a pretty 1000x1000 (or whatever) image.
Not the case, though in a handwave-y way it's the same idea - instead of iteratively scaling, you're iteratively denoising. See here; it links out to a Cornell NLP PhD who describes it in even more detail: https://www.jpohhhh.com/articles/inflection-point-ml-art
I was kind of explaining how I picture the process in my head, fully aware that it isn't really possible to do an ELI5 on this stuff, and not really having an understanding of the technical details myself, either.
I'm with you there but we still don't know how it works, just that it does. The method though is you take a bunch of images, you plug them into a multi dimensional array (a nice way of saying a tensor), have some kind of tagging system, and when you ask the system for an answer, it will put one out for you. So for example in the astronaut riding the horse, there is, on some level, a picture of a horse with those similar pixels, that exists in the data of some object tagged 'horse.' Likewise with astronaut. What is important is that the data sets are absolutely massive, with billions of parameters.
Here is my extremely rough ELI-15. It uses some building blocks like "train a neural network", which probably warrant explanations of their own.
The system consists of a few components. First, CLIP. CLIP is essentially a pair of neural networks, one is a 'text encoder', and the other is an 'image encoder'. CLIP is trained on a giant corpus of images and corresponding captions. The image encoder takes as input an image, and spits out a numerical description of that image (called an 'encoding' or 'embedding'). The text encoder takes as input a caption and does the same. The networks are trained so that the encodings for a corresponding caption/image pair are close to each other. CLIP allows us to ask "does this image match this caption?"
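As a rough sketch of the "does this image match this caption?" part, here's how it looks with the open-source CLIP release (illustrative only; exact function names may differ between versions, and the file name is made up):

    # Hedged sketch: asking a CLIP-style model how well each caption matches an image.
    import torch
    import clip
    from PIL import Image

    device = "cpu"
    model, preprocess = clip.load("ViT-B/32", device=device)  # text + image encoders

    image = preprocess(Image.open("photo.jpg")).unsqueeze(0).to(device)
    texts = clip.tokenize(["a duck wearing a hat on Mars",
                           "a sunset over the ocean"]).to(device)

    with torch.no_grad():
        image_emb = model.encode_image(image)   # image   -> embedding vector
        text_emb = model.encode_text(texts)     # captions -> embedding vectors
        image_emb = image_emb / image_emb.norm(dim=-1, keepdim=True)
        text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)
        scores = image_emb @ text_emb.T          # cosine similarity per caption

    print(scores)  # higher score = CLIP thinks that caption matches the image better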
The second part is an image generator. This is another neural network, which takes as input an encoding and produces an image. Its goal is to be the reverse of the CLIP image encoder (they call it unCLIP). The way it works is pretty complicated. It uses a process called 'diffusion'. Imagine you started with a real image and slowly, repeatedly added noise to it, step by step. Eventually, you'd end up with an image that is pure noise. The goal of a diffusion model is to learn the reverse process - given a noisy image, produce a slightly less noisy one, until eventually you end up with a clean, realistic image. This is a funny way to do things, but it turns out to have some advantages. One advantage is that it allows the system to build up the image step by step, starting from the large-scale structure and only filling in the fine details at the end. If you watch the video on their blog post, you can see this diffusion process in action. It's not just a special effect for the video - they're literally showing the system's process for creating an image, starting from noise. The mathematical details of how to train a diffusion model are very complicated.
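A toy sketch of that noising/denoising idea (this is not the real training objective, which involves a lot more math; denoise_step stands in for the trained network and is hypothetical):

    import torch

    def add_noise(x, t, num_steps=1000):
        # Forward process: blend the clean image with Gaussian noise.
        # The larger t is, the more noise and the less of the original image survives.
        keep = 1.0 - t / num_steps
        return keep * x + (1.0 - keep) * torch.randn_like(x)

    def generate(denoise_step, shape=(3, 64, 64), num_steps=1000):
        # Reverse process: start from pure noise and repeatedly ask the trained
        # network for a slightly less noisy image, until a clean image remains.
        x = torch.randn(shape)
        for t in reversed(range(num_steps)):
            x = denoise_step(x, t)
        return x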
The third is a "prior" (a confusing name). Its job is to take the encoding of a text prompt, and predict the encoding of the corresponding image. You might think that this is silly - CLIP was supposed to make the encodings of the caption and the image match! But the space of images and captions is not so simple - there are many images for a given caption, and many captions for a given image. I think of the "prior" as being responsible for picking which picture of "a teddy bear on a skateboard" we're going to draw, but this is a loose analogy.
So, now it's time to make an image. We take the prompt, and ask CLIP to encode it. We give the CLIP encoding to the prior, and it predicts for us an image encoding. Then we give the image encoding to the diffusion model, and it produces an image. This is, obviously, over-simplified, but this captures the process at a high level.
Why does it work so well? A few reasons. First, CLIP is really good at its job. OpenAI scraped a colossal dataset of image/caption pairs, spent a huge amount of compute training it, and came up with a lot of clever training schemes to make it work. Second, diffusion models are really good at making realistic images - previous works have used GAN models that try to generate a whole image in one go. Some GANs are quite good, but so far diffusion seems to be better at generating images that match a prompt. The value of the image generator is that it helps constrain your output to be a realistic image. We could have just optimized raw pixels until we got something CLIP thinks looks like the prompt, but it would likely not be a natural image.
To generate an image from a prompt, DALL-E 2 works as follows. First, ask CLIP to encode your prompt. Next, ask the prior what it thinks a good image encoding would be for that encoded prompt. Then ask the generator to draw that image encoding. Easy peasy!
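In pseudocode, the whole pipeline is just three calls; each argument below is a stand-in for one of the trained components described above, not a real API:

    def generate_image(prompt, clip_text_encoder, prior, decoder):
        text_emb = clip_text_encoder(prompt)   # 1. CLIP encodes the prompt
        image_emb = prior(text_emb)            # 2. the prior predicts a plausible image embedding
        return decoder(image_emb)              # 3. the diffusion decoder draws that embedding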
Any pointers on getting up to speed on diffusion models? I haven't encountered them in my corner of the ML world, and googling around for a review paper didn't turn anything up.
This paper is a decent starting point on the literature side, but it's a doozy.
Both the paper and blog post are pretty math heavy. I have not yet found a really clear intuitive explanation that doesn't get down in the weeds of the math, and it took me a long time to understand what the hell the math is trying to say (and there are some parts I still don't fully understand!)
Research Deep Learning. That's the technique they are using to generate the images.
There's a lot of applications. Once you understand _how_ it works, look up Two Minute Papers to see what it is being used for. He covers more than just deep learning algorithms, but his videos on deep learning are quite insightful about the potential of this technology.
This is mind blowing. I was not expecting the sketch style images to actually look like sketches. Style transfer based sketches never look like sketches.
This and the current AI-generated art scene make it look like artwork is now a "solved" problem. See AI-generated art on Twitter etc.
There is a strong relation between the prompt and the generated images, but just like GPT-3, it fails to fully understand what was being asked. If you take the prompt out of the equation and view the generated artwork on its own, it's up to your interpretation, just like any artwork.
I would caution that artwork is only 'solved' with relatively simple text prompts. To create a novel painting with a precise mix of elements that would take a paragraph or more to explain is still tough, though DALL-E 2 does seem like a big step towards that.
It should be noted that most "AI-generated" images shared (Sam's included here) are typically just a first pass, whereas most recent models also include some kind of inpainting method, where you can then mask off various parts of an image and continue to edit those specific areas until the whole image is what you're looking for. This process makes it feel a lot more like a "tool" used by artists than a simple magic box that just gives you an "art piece" and you're done.
As a tool, this could be used by an artist to continue working on that image until it's exactly what the artist (or the commissioner) is looking for: masking off the water to actually add dolphins, masking off the ship to redraw it, retoning the sky for a more aesthetically-pleasing sunset, adding other objects to specific locations in the scene, etc.
I'm not sure how the embeddings ("descriptions") work in DALL-E yet, but in a lot of models they're fixed-length. So there's a natural limit on how many concepts you mention in the first pass before it'll just start leaving them out.
I'm blown away by these results, but one caveat here: the AI is great at creating illustrations, not art.
Creating great _art_ that Grayson Perry (for example) would recognise as such is probably AGI-complete, because it requires a deep understanding of the human condition, society, and a lot of reasoning skills.
A great artist could certainly use Dall-E 2 as part of their method, though.
If I showed you a generated piece and didn't tell you what prompt generated it, you would find it just as meaningful as any other piece of art made by hand.
This is why we are blown away by some pieces of text generated by GPT-3 as if it has its own mind. Even most abstract art has meaning for anyone who is looking for it.
What I am saying is: if a generated artwork is indistinguishable from what a human can make, then that's all that was needed.
Conceptual art is not about the artefact that comprises the work, but the conversation that artefact creates with the viewer.
If you put both in front of someone with no idea about conceptual art, there's a real chance you might be right. If they happen not to "get" the work, don't understand the context, or just don't know enough about conceptual art, then a viewer might easily miss the point.
But a computer could not have conceived Duchamp’s urinal, not with our current technology. You’re probably going to need AGI for that (which I’m certain will arrive eventually).
Well I suppose a machine could accidentally create good art, just like a human could. But it would only appear to be good art, the same way that randomly enumerating all possible HD images would at some point produce a Picasso.
But deliberately, no, it couldn’t, not yet, and human conceptual artists could make far far superior art than a machine. Because great art requires understanding the human condition and deep reasoning about the world.
Models are not in a vacuum; their outputs are selected and guided by humans (here, captions). The human can have the deeper understanding while the model has specific understanding of, for example, styles.
I definitely think a human could use an AI to assist in the creation of great art (and to be clear, that is synonymous with conceptual art to me and also most people with a modern degree in fine art).
A comparison can be made with Damien Hirst's or Antony Gormley's use of assistants to create the pieces as instructed by the artists.
Duchamp’s urinal isn’t brilliant because the urinal was difficult to acquire or to make, but because it expresses so much and asks so many questions.
Apologies for an open-ended question but: does anyone know if there is a term for something like Turing-completeness within AI, where a certain level of intelligence can simulate any other type of intelligence like our brains do?
For example, using De Morgan's theorem, we can build any logic circuit out of NAND or NOR gates alone:
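A trivial Python sketch of that, just to make the point concrete (not taken from the link above):

    def nand(a, b):
        return not (a and b)

    # Every other gate can be built from NAND alone:
    def not_(a):    return nand(a, a)
    def and_(a, b): return not_(nand(a, b))
    def or_(a, b):  return nand(not_(a), not_(b))  # De Morgan: a or b == not(not a and not b)
    def xor_(a, b): return and_(or_(a, b), nand(a, b))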
Dall-E 2's level of associative comprehension is so far beyond the old psychology bots in the console pretending to be people, that I can't help but wonder if it's reached a level where it can make any association.
For example, I went to an AI talk about 5 years ago where the guy said that any of a dozen algorithms like K-Nearest Neighbor, K-Means Clustering, Simulated Annealing, Neural Nets, Genetic Algorithms, etc can all be adapted to any use case. They just have different strengths and weaknesses. At that time, all that really mattered was how the data was prepared.
I guess fundamentally my question is: when will AGI start to become prevalent, rather than these special-purpose tools like GPT-3 and Dall-E 2? Personally I give it 10 years of actual work, maybe less. I just mean that, to me, Dall-E 2 is already orders of magnitude more complex than what's required to run a basic automaton to free humans from labor. So how can we adapt these AI experiments to get real work done?
This is my feeling as well, that the rise of AGI conveniently coincides with the end of the world. I find it demoralizing because so many trends look just like that, where solving the ultimate problem results in the destruction of the context in which the original problem resided.
> Apologies for an open-ended question but: does anyone know if there is a term for something like Turing-completeness within AI, where a certain level of intelligence can simulate any other type of intelligence like our brains do?
> So how can we adapt these AI experiments to get real work done?
You're missing a step here - the difference between "imagining doing something" and "actually doing something". An ML model can produce thoughts, but that isn't necessarily the same direction of research as actually doing things in real life, much less becoming superhuman and taking over the world etc.
In your imagination, everything always goes your way.
Thank you, that's just the sort of breadcrumb I was looking for!
I'm in a bit of a rush and don't know the term for this offhand, but I remember hearing that a neural network with a single hidden layer can, in principle, approximate the same functions as deeper ones (the universal approximation theorem):
There are probably more insights like this out there. These equivalences allow us to think in abstractions that get us above the minutia of fine-tuning these algorithms so that we can see the big picture. I think.
> does anyone know if there is a term for something like Turing-completeness within AI, where a certain level of intelligence can simulate any other type of intelligence like our brains do?
Almost everything stated here is simply wrong or misinformed.
>For example, I went to an AI talk about 5 years ago where the guy said that any of a dozen algorithms like K-Nearest Neighbor, K-Means Clustering, Simulated Annealing, Neural Nets, Genetic Algorithms, etc can all be adapted to any use case. They just have different strengths and weaknesses. At that time, all that really mattered was how the data was prepared.
How do you suppose KNN is going to generate photorealistic images? I don't understand the question here
>I guess fundamentally my question is, when will AGI start to become prevalent, rather than these special-purpose tools like GPT-3 and Dall-E 2?
Actual AGI research is basically non-existent, and GPT-3/Dall-E 2 are not AGI-level tools.
>Personally I give it less than 10 years of actual work, maybe less
Lol...
>I just mean that to me, Dall-E 2 is already orders of magnitude more complex than what's required to run a basic automaton to free humans from labor.
I appreciate your sentiment but can't agree with it. What I mean is, if I had the resources to not have to work for 10 years, I give myself greater than a 50% chance of building an AGI. So I don't understand why the world is taking so long to do it.
The flip side is that these narrow use cases progressed so quickly that we have to worry about stuff like deep fakes now.
Something's not right here.
As a programmer, I feel that what went wrong is that we invested too much in profit-driven endeavors, basically stuff that's mainstream. To be blunt, the academic side of me doesn't care about use cases. I care about theory, formalism, abstraction, reproducibility, basically the scientific method. From that perspective, all AI is equivalent, it just takes input, searches a giant solution space using its learned context as clues, and returns the closest solution it can in the time given. It's an executable piping data around. The rest is hand waving.
And given that, the stuff that AI is doing now is orders of magnitude more complex than running a Roomba. But a robot vacuum actually helps people.
To answer your question, a KNN could solve this if the user reshapes the image data into a different coordinate system where the data can be partitioned (all inference comes down to partitioning):
Tensors are about reshaping data into a coordinate system where relationships become obvious, like going from rectangular to polar coordinates, or using a Fourier transform:
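To make the coordinate-reshaping point concrete, here's a small made-up example (scikit-learn, synthetic ring-shaped data, purely illustrative): the same nearest-neighbour classifier, but the second one sees the data re-expressed in polar coordinates, where the class boundary collapses to a simple threshold on the radius.

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    rng = np.random.default_rng(0)
    xy = rng.uniform(-1, 1, size=(500, 2))                      # points in a square
    labels = (np.hypot(xy[:, 0], xy[:, 1]) < 0.5).astype(int)   # class 1 inside a circle

    # Raw rectangular coordinates
    knn_raw = KNeighborsClassifier(n_neighbors=5).fit(xy, labels)

    # Same data "reshaped" into polar coordinates (radius, angle): the relationship
    # between features and label is now obvious - it's a threshold on the radius.
    polar = np.column_stack([np.hypot(xy[:, 0], xy[:, 1]),
                             np.arctan2(xy[:, 1], xy[:, 0])])
    knn_polar = KNeighborsClassifier(n_neighbors=5).fit(polar, labels)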
My frustration with all of this is the same one I have with physics or any other evolving discipline. The lingo obfuscates the fundamental abstractions, creating artificial barriers to entry.
Edit: I should add a disclaimer here that my friend and I worked on a video game for like 11 years. I'm no expert in AI, I'm just acutely sensitive to how the realities of the workaday world waste immeasurable potential at scale.
This reminds me of the holodeck in Star Trek. Someone could walk into the Holodeck and say “make a table in the center of the room. Make it look old.” It seemed amazing to me that the computer could make anything and customize it with voice. We are pretty close to star trek technology now in computer ability (ship’s computer, not Commander Data). I guess to really be like the holodeck it needs to be able to do 3d and be in real time but that seems a lot closer now. It will be cool when this could be in VR and we can say make an astronaut riding a horse, then we can jump on the back of the horse and ride to a secret moon base.
It's becoming clear that efficient work in the future will hinge upon one's ability to accurately describe what one wants. Unpacking that -- a large piece is the ability to understand all the possible "pitfalls" and "misunderstandings" that could happen on the way to a shared understanding.
While technical work will always have a place -- I think that much creative work will become more like the management of a team of highly-skilled, niche workers -- with all the frustrations, joys, and surprises that entails.
Programming, art, and music are just "describing what you want" in a very specific way. This is describing what you want in a much more vague way.
The upside is that it's more "intuitive" and requires much less detail and technique, as the AI infers the detail and technique. The downside is that it's really hard to know what the AI will generate, or to get it to generate something really specific.
I believe the future will combine the heuristics of AI-generation with the specificity of traditional techniques. For example, artists may start with a rough outline of whatever they want to draw as a blob of colors (like in some AI image-generation papers). Then they can fill in details using AI prompts, but targeting localized regions/changes and adding constraints, shifting the image until it’s almost exactly what they imagined in their head.
You can definitely make them incremental. You can give it a task like "make a more accurate description from initial description and clarification". Even GPT-3-based models available today can do these tasks.
Once this is properly productionized it would be possible to implement stuff just talking with a computer.
I would probably pay good money to have an OLED painting in my house that I can just tell what kind of painting to generate each day.
Imagine waking up and telling your (preferably locally hosted) voice assistant that today really feels like a Rembrandt day and the AI just generates new paintings for you.
I don't want to dismiss this new model and its achievements, but I feel we're getting to the point where the old open-source versus closed-source split is re-forming around open and closed ML models. Larger and larger models come with disclaimers restricting commercial use (a great deal of academic and NVIDIA models do this), and OpenAI just puts it behind an API with rules:
Curbing Misuse
Our content policy does not allow users to generate violent, adult, or political content, among other categories. We won’t generate images if our filters identify text prompts and image uploads that may violate our policies. We also have automated and human monitoring systems to guard against misuse.
However, this painting has themes of violence and politics plus some nude dead bodies, so it violates the content policy: "Our content policy does not allow users to generate violent, adult, or political content, among other categories."
So what you'd get is some kind of sanitized, watered-down, tepid version of Rousseau, the kind of boring drivel suitable for corporate lobbies everywhere, guaranteed not to offend or disturb anyone. It's difficult to find words... horrific? dystopian? atrocious? No, just no.
They are being rightly cautious. It’s going to take time to figure out good practice with these tools. Everyone calling out basic caution as “dystopian” is really over the top.
I've been using tools like this for over a year now. Even with a filtered dataset and a filtered interface, they can make images that would make the Fangoria crowd blush if you put the slightest effort into it.
It's one thing to be able to make brain-wrenching images with a lot of Photoshop effort (or by digging hard enough in the dark corners of the internet). It's another thing entirely to give anyone the ability to spew out thousands of them trivially.
That was also my favourite concept, especially with OpenAI Jukebox (https://openai.com/blog/jukebox/). The idea of having new music in the style of your favourite artist is amazing.
However, the fidelity of their music AI kinda sucks at this point; I'm sure we'll get pitch-perfect versions of this concept as the singularity gets closer :)
I was just thinking the same thing - how awesome would it be to use this in conjunction with the Samsung Frame in art gallery mode and have it just generate novel paintings in the style of your favorite painters.
> Limitations
> Although conditioning image generation on CLIP embeddings improves diversity, this choice does come with certain limitations. In particular, unCLIP [Dall-E 2] is worse at binding attributes to objects than a corresponding GLIDE model.
The binding problem is interesting. It appears that the way Dall-E 2 / CLIP embeds text leads to the concepts within the text being jumbled together. In their example "a red cube on top of a blue cube" becomes jumbled and the resulting images are essentially: "cubes, red, blue, on top". Opens a clear avenue for improvement.
I've been playing around with it today and have been super impressed with its ability to generate pretty artful digital paintings. Could have big implications for designers and artists if and when they allow you to use custom palettes, etc.
Honestly, that painting is nonsensical. It's great at a glance. But when you look at it for a few seconds, it's just impressionist type blob painting without any features that make impressionist paintings great.
It all feels like the early days of electricity: a neat party trick that nobody quite knew how to turn into something more useful. But it was the people who kept on at better and better party tricks who actually laid the foundations for doing really useful things with electricity, as well as understanding it at a deeper level.
OpenAI is one of the leading companies in AI that makes models with real-world applications. I don't see their efforts as misdirected or futile in any way. If anything, I'm always impressed with their announcements because it's always mind-blowing what their models can do!
The same technology that is drawing cute unicorns can be used for endless other use cases. Perhaps the PR side of the launch, and the subject matter they choose to unveil their product with, is just that: PR.
It's like Apple's Memoji thing (not sure if I'm spelling it correctly). You can think of it as trivial and a waste of talent to use their Camera/FaceID tech to animate cute animals based on facial expressions, but that same tech will enable lots of other things to come.
Your second group represents the core "inner loop" of about a thousand revolutionary applications. Take the basic capability of translating image->text->speech (and the reverse), install it on a wearable device that can "see" an environment, and add domain-specific agents. From this setup, you're not too far away from having an AI that can whisper guidance into your ear like a co-pilot, enabling scenarios like:
1. step-by-step guidance for a blind person navigating the use of a public restroom.
2. an EMS AI helping you to save someone's life in an emergency.
3. an AI coach that can teach you a new sport or activity.
4. an omnipresent domain-expert that can show you how to make a gourmet meal, repair an engine, or perform a traditional tea ceremony.
5. a personal assistant that can anticipate your information need (what's that person's name? where's the exit? who's the most interesting person here? etc.) and whisper the answer in your ear just as you need it.
Now, add all of the above to an AR capability where you can think or speak of something interesting and complex and have it visualized right before your eyes. With this capability, I could augment my imagination with almost super-human capabilities that would let me solve complex problems almost as if it were an internal mental monologue.
All of these scenarios are just a short hop from where we're at now, so mark my words: we will have "borgs" like those described above long before we reach anything like general AI.
These are good examples of what we're getting close to, but I'd add that Copilot is already an extremely helpful tool for coding. I don't blindly trust its output, but its suggestions are what I want often enough to save a lot of typing.
I still have to do all the hard thinking, but once I figure out what I want written and start typing, Copilot will spit out a good portion of the contextually-obvious lines of code.
There’s a third group for your list: AI stuff that’s so good we don’t think about it any more.
For example, recent phone cameras can estimate depth per pixel from single images. Hundreds of millions of these devices are deployed. A decade ago this was AI/CV research lab stuff.
Most of the conversation around this model seems to be about its direct uses.
This seems to me like a big step towards AGI; a key component of consciousness seems (in my opinion) to be the ability to take words and create a mental picture of what's being described. Is that the long term goal WRT researching a model like this?
Is anyone looking into what it means when we can generate infinite amounts of human-like work without effort or cost?
> Curbing Misuse [...]
That's great, nowadays the big AI is controlled by mostly benevolent entities. How about when someone real nasty gets a hold of it? In a decade the models anyone can download will make today's GPT-3 etc look like pong right?
Recommender systems etc are already shaping society and culture with all kinds of unintended effects. What happens when mindless optimizing models start generating the content itself?
I'm genuinely curious to hear Sam Altman's (and/or the OpenAI team's) perspective on why these products need to be waitlisted. If it's a compute issue, why not build a queuing system? If it's something else (safety related? hype related?) I'd love to understand the thinking behind the decision. More often than not, I sign up for waitlists for things like this and either (1) never get in to the beta or (2) forget about it when I eventually do get in.
The correct response here from the artists point of view should be a widespread coming together against their art being used as training data for ML models. With a quickly spread new license on most major art submission sites that explicitly forbids AI algorithms from using their work, artists would effectively starve OpenAI and others from using their own works to put them out of a job.
The license should forbid competing artists to using the artist’s work as well. In fact, no human should come in contact with the produced art, otherwise they might be accidentally inspired by it, thus stealing from the original creator.
There has been precedent for such a movement. In 2011, an "art collective" sourced user-submitted artwork without the artists' consent for an installation where visitors were instructed to step all over printouts of the art on the floor. The artists complained that their work was being used inappropriately. A large number of those artists left for other art websites en masse.[0]
There doesn't seem to be an equivalent movement with AI-generated art, probably because the understanding of how the models are trained from large datasets is not mainstream yet. I would imagine thousands of those same artists/consumers would be up in arms if they had a basic understanding of ML and millions of average people were beginning to feed the models their own keywords.
This I think ties in with the "responsibility" principles that OpenAI outlines. Once the generation technique has been reverse-engineered and can be used without limits, there is no way to uninvent it. It can be made illegal, but humans can always find a way around laws if they want something badly enough. This could have drastic consequences if enough artists believe that the training violates their respect or other intangible humanistic qualities. With technological advancement that can never be put back in the bottle and spreads to occupy the entire consciousness of the Internet, their options for recourse will be far different than being able to tell a single fringe art group siphoning others' content to pack up and leave.
The timing of the Dall-E 2 launch an hour ago seems to correspond with a recent piece of investigative journalism by Buzzfeed News about one of Sam Altman's other ventures, published 15 hours ago and discussed elsewhere actively on HN right now:
I point this out because while Dall-E 2 seems interesting (I'm out of my depth, so delegating to the conversation taking place here), the timing of its release as well as accompanying press blasts within the last hour from sites like TheVerge—verified via wayback machine queries and time-restricted googling—seems both noteworthy and worth a deeper conversation given what was just published about Worldcoin.
To be clear, it's worth asking if Dall-E 2 was published ahead of schedule without an actual product release (only a waitlist) to potentially move the spotlight away from Worldcoin.
I'm not a huge fan of these coordination theories. But a few things worth noting:
- In support of your argument, the Buzzfeed News investigation likely has been in the works for weeks, meaning Altman et al have had more than just a couple days to throw together a Dall-E 2 soft launch
- However, weren't OpenAI's GPT (2 and 3) announced to the world in similar fashion? e.g. demos and whitepapers and waitlists, but not a full product release?
- Throwing together a Dall-E 2 soft launch just in time to distract from the investigation would require a conspiracy, i.e. several people being at least vaguely aware that deadlines have been accelerated for external reasons. Is the Worldcoin story big enough to risk tainting OpenAI, which seems like a much more prominent part of Altman's portfolio?
- BFN reached out to A16Z, Worldcoin, and Khosla Ventures, who largely declined to comment, which would mean that at least one person probably had a bit of runway from the point when the requests for comment were submitted. So yeah, you're probably right.
- Going from the github repos for GPT 2 and 3, those may have been hard launches:
- Would it really have to be a conspiracy? Sounds like only one person would have to target a specific date or date range, and without really giving a reason.
One of the things that puts a hole in my own thinking here is that Sam Altman's name isn't really tied to the Dall-E 2 release. It's just OpenAI, and the press around Sam's name today still exclusively surfaces just this one Worldcoin story (https://news.google.com/search?q=sam+altman+when%3A1d&). So if this was actually intended to bury another story, Sam's name would have to have been included in all the press blasts to be successful. But the Buzzfeed story seems like it kinda died alone on the vine.
I don't have any knowledge (inside or otherwise) but the Worldcoin thing already came in for several rounds of abuse on HN, so it's kind of a scandal of the second freshness at this point.
I listed some of them here - https://news.ycombinator.com/item?id=30934732, just because I remembered there had been previous discussions and listing related previous discussions is a thing.
What I'm submitting for consideration is that the marketing page and associated press blasts (there's a live influencer reaction video airing right now about Dall-E 2, for instance) for Dall-E 2 were potentially pushed up to offset negative press from Worldcoin for their shared founder.
Another consideration, then: it was published to HN almost instantly after it was released to the world, 52 minutes after the HN post about Worldcoin was submitted and started showing traction.
I don't see the publication of a marketing page (again, not a finished product) for a product founded by someone whose other main venture is being investigated by journalists for misleading claims as a coincidence. But if the timing matters and 14-15 hours doesn't seem to work for the assertion in your mind, then perhaps the Dall-E 2 page going live less than an hour after the Worldcoin HN submission fits the bill.
I've got no horse in this race. I'm just drawing attention to familiar PR strategies used for brand risk mitigation, that's all.
Yes, especially given there's no actual product release, only a waitlist.
Easy to put together a marketing piece on short notice or potentially even push a pending marketing page out to production with a waitlist rather than links to production or even beta quality services.
At this point with WaveNet, GPT-3, Codex, DeepFakes and Dall-E 2, you cannot believe anything you see, hear, watch, read on the internet anymore as an AI can easily generate nearly anything that can be quickly believable by millions.
The internet's own proverb has never been more important to keep in mind. A dose of skepticism is a must.
To be honest the Girl with a Pearl Earring "variations" look a little bit like a crime against art. It's like the person who built this has no idea why the Girl with a Pearl Earring is good art. "Here's the Girl with a Pearl Earring" - "OK, well here's some girls with turbans"
> It's like the person who built this has no idea why the Girl with a Pearl Earring is good art.
The people didn't program Dall-E to make art. They taught it to recognize patterns and to create something by extrapolating from those patterns, all on its own. So the AI isn't a projection of what they think is good art; it's projecting what it thinks is good art, based on a prompt.
The output is its best effort at a feeling, even if the feeling had to be supplied by a living person. So it's still art that's as good as the feeling it came from, fleeting feelings being lower quality than those that required more time and thought.
I think the results are being poisoned by the fact that most old paintings have deteriorated colors, so the training data looks nothing like the originals. It's certainly a lot yellower than https://cdn.openai.com/dall-e-2/demos/variations/originals/g...
To be honest, it's hard for me to imagine an alternate reality where the 'original' was swapped with one of the 'variations' and the same comment didn't appear underneath.
Why is the 'original' good art?
A library like https://icons8.com/icons where you can just tell it what icon you want and the style (e.g. Material, outline, solid, iOS). It would do its thing and spit it out.
If you're interested in generative models, Hugging Face is putting on an event around generative models right now called the HugGAN sprint, where they're giving away free access to compute to train models like this.
Impressive results no doubt, but I’m reserving judgment until beta access is available. These are probably the best images that it can generate, but what I’m most interested in is the average case.
They're using training set restriction and prompt engineering to control its output
> By removing the most explicit content from the training data, we minimized DALL·E 2’s exposure to these concepts
> We won’t generate images if our filters identify text prompts and image uploads that may violate our policies
The 'how to prevent superintelligences from eating us' crowd should be taking note: this may be how we regulate creatures larger than ourselves in the future
And even how we regulate the ethics of non-conscious group minds like big companies
Maybe one day there will a job for people who are masters of the art of prompt hacking - they know all the special phrases and terms to get Dall-E to output the most aesthetically pleasing images. They guard their magic words like a medieval alchemist guards his formulas. Corporations will pay top-dollar for an expertly-crafted, custom-tailored prompt for their advertising campaign.
Not that it's impossible to hide the provenance of an image, but it is explicitly forbidden in the TOS of DALL-E to sell the images as NFTs or otherwise.
This reminds me of a discussion I had with the high school band teacher in the 90s. I was telling him that one day computers would play music and you won't be able to tell the difference. He got mad at me and told me that a computer could never play as well as a human with feelings, who can feel the piece and interpret it.
I think we passed that point a while ago, but seeing this makes me think we aren't too far off from computers composing pieces that actually sound good too.
In the thread where Sam Altman gives a demo of this [*], I see multiple people trying to query "solar panels" or "rabbit" - are these some kind of meme in the context of AI-generated art?
Interesting, yes, but I went to the link and browsed the 'generated artwork', and all of it was subjectively inferior to the original it was generated from. Every single piece. So I am not sure what the 'value' in it is, at this stage.
As far as the text-driven side goes, I would have to mess with some non-pre-canned examples to see how useful it is.
Yeah, I mean you're right that ultimately the proof is in the pudding.
But I do think we could have guessed that this sort of approach would be better (at least at a high level - I'm not claiming I could have predicted all the technical details!). The previous approaches were sort of the best that people could do without access to the training data and resources - you had a pretrained CLIP encoder that could tell you how well a text caption and an image matched, and you had a pretrained image generator (GAN, diffusion model, whatever), and it was just a matter of trying to force the generator to output something that CLIP thought looked like the caption. You'd basically do gradient ascent to make the image look more and more and more like the text prompt (all the while trying to balance the need to still look like a realistic image). Just from an algorithm aesthetics perspective, it was very much a duct tape and chicken wire approach.
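A heavily simplified sketch of that older recipe, optimising raw pixels against the open-source CLIP model (the actual colab notebooks optimise a GAN/VQGAN latent and pile on regularisers, so treat this as illustrative only):

    import torch
    import clip

    device = "cpu"
    model, _ = clip.load("ViT-B/32", device=device)

    with torch.no_grad():
        text_emb = model.encode_text(clip.tokenize(["a sunset over the ocean"]).to(device))
        text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)

    # Start from random pixels and push them towards whatever CLIP scores as "sunset-like".
    image = torch.rand(1, 3, 224, 224, device=device, requires_grad=True)
    opt = torch.optim.Adam([image], lr=0.05)

    for step in range(500):
        img_emb = model.encode_image(image)
        img_emb = img_emb / img_emb.norm(dim=-1, keepdim=True)
        loss = -(img_emb * text_emb).sum()   # gradient ascent on CLIP similarity
        opt.zero_grad()
        loss.backward()
        opt.step()

    # Without heavy regularisation, this is exactly where the "multiple suns" artifacts
    # come from: the pixels that maximise the score are not a natural image.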
The analogy I would give is if you gave a three-year-old some paints, and they made an image and showed it to you, and you had to say, "this looks a little like a sunset" or "this looks a lot like a sunset". They would keep going back and adjusting their painting, and you'd keep giving feedback, and eventually you'd get something that looks like a sunset. But it'd be better, if you could manage it, to just teach the three-year-old how to paint, rather than have this brute force process.
Obviously the real challenge here is "well how do you teach a three-year-old how to paint?" - and I think you're right that that question still has a lot of alchemy to it.
I am disappointed to hear it wasn't released, but what disappointed me more is that people actually approve of this decision. Seriously? We shouldn't teach people how to write because that can be abused, can be used to transfer malicious ideas. Sounds absurd? So does limiting people's access to AI tools.
AI becomes a tool for artists to use - generative art has been around for a long time, now that particular genre of art will presumably become much more prominent.
Wouldn't it be more like, "AI becomes an artist for people to use"? Will we have people distinguished as "artists" if the ability to make awesome art becomes available to everybody?
AI still needs the text prompt to know what to generate. Hence the human who provides the prompt is still the artist, just like a photographer finds an aesthetically interesting spot to take the image with their camera. Cameras make images, humans using cameras make art. Granted, this is not quite 1-1 with AI art, but still the idea is the same. If anything the flood of AI images will only require artists to go beyond what is possible with these text->image kinds of things, of which there is no shortage.
I think you’ll see more of a focus on the artist themselves. These images are nice, but they have basically zero narrative value.
This is really already the case, actually. Most artworks have “value” because they have a compelling narrative, not because they look pretty. So I think we can expect future artists to really emphasize their background, life story, process of making the art, etc. All things that cannot be done by a machine.
I seem to recall an XKCD that I cannot find, but the premise goes like:
When you have a digital display of pixels, if you randomly color pixels at 24 fps then you will eventually display every movie that can be or will ever be made, powerset notwithstanding. This can also be tied to digital audio.
In short, while mind-blowingly large, the space of display through digital means is finite.
Sounds a bit like the Library of Babel by Jorge Luis Borges. I imagine most of the videos would be complete random nonsense.
I think an AI-infused future is going to become increasingly absurd and surreal; it will lead to a kind of creative and cultural nihilism, if that's the right term.
Like the value of originality will become meaningless.
I tried to comment here previously, but I don't see it posted. It was about the meaning of 'open' and whether the question of the suffering and freedom of the AIs is being taken into ethical consideration, not just the ability of humans to use them as tools for their own possibly paper-clippy purposes.
The NFT world is mostly filled with modern art forms that have never been seen before. If Dall-E can make such images out of the box in seconds, then it looks like AIs could take the NFT world by storm. Maybe it's already happening and I just didn't know!
My main question is: is this really 'open' in a meaningful way? And are concepts of kindness and freedom being applied to the minds inside the boxes? I don't know where the 'openai' brand is at on these things, personally.
This is really cool, but before you can use it you must give out your name and a phone number. I was almost taken in by it, but OpenAI is, and probably always will be, invasive and overbearing. It's really a shame.
> Prices are per 1,000 tokens. You can think of tokens as pieces of words, where 1,000 tokens is about 750 words. This paragraph is 35 tokens.
Further down, in the FAQ[2]:
> For English text, 1 token is approximately 4 characters or 0.75 words. As a point of reference, the collected works of Shakespeare are about 900,000 words or 1.2M tokens.
> To learn more about how tokens work and estimate your usage…
> Experiment with our interactive Tokenizer tool.
And it goes on. When most questions in your FAQ are about understanding pricing—to the point you need to offer a specialised tool—perhaps consider a different model?
Haven't read the paper, but they are probably using something like SentencePiece with sub-word splitting and then charging by the number of resulting tokens.
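For a ballpark figure you don't even need the real tokenizer; the rules of thumb quoted above (~4 characters, or ~0.75 words, per English token) are enough to sanity-check a bill. A trivial sketch:

    def estimate_tokens(text):
        # Rule of thumb from the pricing page: roughly 4 characters per English token.
        return max(1, round(len(text) / 4))

    print(estimate_tokens("To be, or not to be, that is the question"))  # ~10 tokens

    # Same rule applied to the Shakespeare example: 900,000 words / 0.75 words-per-token
    print(900_000 / 0.75)  # ~1.2 million tokens, matching the FAQ's figure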
While we're being distracted by endless social media and meaningless news, AI technology is advancing at a mind blowing pace. I'd keep my eye on that ball instead of "the current thing."
What happens when they train this thing to make videos? We're about to be dealing with a flood of AI-generated visual/video content. We already have to deal with text bots everywhere... wow.
I'm excited for when that happens. I didn't think of the malicious uses, which now that you brought it up I can think of many, but I still think the pros are worth the cons
Is there a geometric model analogous to this? E.g. "corgi near the fireplace", but the output is a 3D model of the corgi and fireplace with shaders rather than an image.
Wait until you see the same concept combined with NeRF idea. The output won’t be 3d shapes but another model that can generate realistic and geometrically consistent images of a scene viewed from different angles.
Maybe this will be what finally puts an end to the whole art NFT shenanigans. A piece of art isn't so unique if there are infinite slight variations on the market.
This is extremely interesting. We’ve had some amazing AI models come out in the past few days. We’re getting closer and closer to AI becoming a facet of everyday life.
This is going to be mostly a rant on OpenAI's "safer than thou" approach to safety, but let me start by saying that I think this technology is really cool, amazing, powerful stuff. Dall-E (and Dall-E 2) is an incredible advance over GANs, and it will no doubt have many positive applications. It's simply brilliant. I am someone who has been interested in and has followed the progress of ML-generated images for nearly a decade. Almost unimaginable progress has been made in this field in the last five years.
Now the rant:
I think if OpenAI genuinely cared about the ethical consequences of the technology, they would realise that any algorithm they release will be replicated in implementation by other people within some short period of time (a year or two). At that point, the cat is out of the bag and there is nothing they can do to prevent abuse. So really all they are doing is delaying abuse, and in no way stopping it.
I think their strong "safety" stance has three functions:
I think number 3 is dangerous because researchers are put under the false belief that their technology can or will be made safe. This way they can continue to harness bright minds that no doubt have ethical leanings to create things that they otherwise wouldn't have.
I think OpenAI are trying to have the cake and eat it too. They are accelerating the development of potentially very destructive algorithms (and profiting from it in the process!), while trying to absolve themselves of the responsibility. Putting bandaids on a tumour is not going to matter in the long run. I'm not necessarily saying that these algorithms will be widely destructive, but they certainly have the potential to be.
The safety approach of OpenAI ultimately boils down to gatekeeping compute power. This is just gatekeeping via capital. Anyone with sufficient money can replicate their models easily and bypass every single one of their safety constraints. Basically they are only preventing poor bad actors, and only for a limited time at that.
These models cannot be made safe as long as they are replicable.
To produce scientific research requires making your results replicable.
Therefore, there is no ability to develop abusable technology in a safe way. As a researcher, you will have blood on your hands if things go wrong.
If you choose to continue research knowing this, that is your decision. But don't pretend that you can make the algorithms safer by sanitizing models.
Things that require understanding of causation will be safe longer. Progress like this is driven by massive datasets. Meanwhile, real world action-taking applications require different paradigms to take causation into account[0][1], and especially to learn safely (e.g. learning to drive without crashing during the beginner stages).
There's certainly research happening around this, and RL in games is a great test bed, but people choosing actions will be safe from automation longer than people not choosing actions, if that makes sense. It's the person who decides "hire this person" vs. the person who decides "I'll use this particular shade of gray."
[0] The best example is when X causes Y and X also causes Z, but your data only includes Y and Z. Without actually manipulating Y, you can't see that Y doesn't cause Z, even if it's a strong predictor.
[1] Another example is the datasets. You need two different labels depending on what happens if you take action A or B, which you can't have simultaneously outside of simulations.
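A tiny simulation of the example in [0] (made-up numbers, purely to illustrate): X causes both Y and Z, the dataset only records Y and Z, and Y ends up being a strong predictor of Z even though it has no causal effect on it.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=100_000)            # hidden common cause, not in the dataset
    y = 2.0 * x + rng.normal(size=100_000)  # X causes Y
    z = -3.0 * x + rng.normal(size=100_000) # X causes Z; Y does NOT cause Z

    print(np.corrcoef(y, z)[0, 1])  # strongly negative: Y "predicts" Z from observation alone
    # Only by intervening (setting Y yourself and seeing that Z doesn't move)
    # can you discover that the predictive relationship is not causal.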
Most creative output is duplicated effort: consider how much code each person on HN has written that has been written before. Consider how, a decade ago, we were all writing html and styling it, element by element, and then Twitter bootstrap came along and revolutionised front-end development in what is, ultimately, a very small and low technology way. All it really did was reduce duplicate effort.
Nowadays there’s lots of great low/no code platforms, like Retool, that represent a far greater threat to the amount of code that needs to be produced than AI ever will.
To use a cliche: code is a bug, not a feature. Abstracting away the need for code is the future, not having a machine churn out the same code we need today.
Dall-E 2 seems incapable of catching the essence of the art. I'm not really surprised by it; I'd be surprised a lot if it could. But nevertheless: if you looked into the eye of the Girl with a Pearl Earring[1], you'd be forced to stop and think about what she has on her mind right now. Or maybe you'd have some other question in your mind, but it really stops people and makes them think. None of the Dall-E interpretations have this quality. Works inspired by the Girl with a Pearl Earring sometimes have at least part of that power, like the Girl with a Bamboo Earring[2]. But none of the Dall-E interpretations have such power.
And this observation may lead to great consequences for the visual arts. I had a lot of joy looking at the different Dall-E interpretations, trying to find what flaw in each interpretation prevents it from being a piece of art of equal value to the original. It is a ready-made tool for searching for explanations of the Power of Art. It cannot say what detail makes a picture an artwork, but it allows us to see multiple data points and to narrow the hypothesis space. My main conclusion is that the pearl earring has nothing to do with the power of the art. It is something in the eye, and probably in the slightly opened mouth. (Somehow Dall-E pictured all interpretations with closed lips, so it seems to be an important thing, but I need more variation along this axis to be sure.)
Oh... Not hopeless. The very fact that I spent some minutes looking at interpretations of the Girl with a Pearl Earring is enough evidence that it is not hopeless. I praise the work that was done. Moreover, I hoped that people would take it as an inspiration to do even more.
What do you think of the third to last image of the Girl With A Pearl Earring that DALL-E 2 created? I find it more compelling than the original with how her face is deeply cast in shadow. There's still that original 'essence' of the glint in her eye. But her earring is a bell. As if the AI is sending a message that what if the bell were to ring?
I'm not sure that I can express myself in English, which is not my native language, and this needs some very nuanced control over the tiniest shades of meaning, but I'll try nevertheless, at least for the fun of it.
The original girl is more open, more independent and mindless. The interpretation's girl is more self-controlled, assertive and not really interested, just going through all those motions of regular communication between people. Maybe it's just me, but what I really value on such occasions is mindlessness: the ability of people to not mind themselves, to let their selves dissolve into the environment. I sometimes cannot hold back tears when I watch some entertainer playing Chopin or Paganini, because what I see in their movements is the complete dissolution of a person in a piece of music, in a piece of art and skill. An entertainer just does what they do with their full attention on it, and with all their motivation focused on it. There is nothing here for them, just them and their actions.
There is not a single thought devoted to how the people around me will react to what I do and how I do it. I just do what I do, and I do not care about the people around me; and if it somehow makes people happy... I don't really care. I mean, I know that afterwards I'd feel proud of myself, but right now I don't really care.
I know this feeling. I like to sing, and I'm good at it (above average), and I know what it feels like to dissolve into the song and let the song rule. I play piano and I know what it is like to dissolve into the piece I'm playing, to stop myself from existing, to let the music take the lead. And the original painting makes me believe that the girl is in this state of mind. I do not know the history or the rest of the story; I do not know if she got into this state for just a second, or if she never leaves it (which may be a sad experience, don't you think?), but somehow I know that right now she is exactly in this state. I want to watch this moment of hers for an eternity.
Thinking about it, I'd confess that the interpretation's girl does trigger the same, but on a smaller scale. I feel my mind trying to find a coherent state behind her gaze, but this feeling stops after tens of microseconds, not hundreds of them.
edit: want->watch. Stupid mistake ruining the meaning of the sentence.
Art criticism should be off topic here. This is more like chopping off the visual cortex and some association cortex from a brain and stimulating it. There is no person signaling to us, nor can we attribute any striking images that may come up to a person with agency.
But it's like a giant database of decent clipart for anything we can imagine.
> This is more like chopping off the visual cortex and some association cortex from a brain and stimulating it.
We do not know exactly what part of our perception of reality can be attributed to "the visual cortex and some association cortex". But now we can feel it. We can test it. We can compare ourselves with the cold calculating machine. I believe that it is a priceless opportunity that we shouldn't miss. At least I personally can't. I'm going to figure out whether it's possible for me to have such a companion as Dall-E in my wanderings through the sea of information on the Internet, and if it is, then to get one.
> But its like a giant database of decent clipart for anything we can imagine
And this also. Yes. Though I'm not interested in clipart.