Stable Diffusion 2.0 (stability.ai)
1368 points by hardmaru on Nov 24, 2022 | 493 comments



Is there a good explanation of how to train this from scratch with a custom dataset[0]?

I've been looking around the documentation on Huggingface, but all I could find was either how to train unconditional U-Nets[1], or how to use the pretrained Stable Diffusion model to process image prompts (which I already know how to do). Writing a training loop for CLIP by hand had me banging against all sorts of strange roadblocks and missing bits of documentation, and I still don't have it working (a sketch of what I'm attempting is below, after the footnotes). I'm pretty sure I'll need some other trainables at some point, too.

[0] Specifically, Wikimedia Commons images in the PD-Art-100 category, because the images will be public domain in the US and the labels CC-BY-SA. This would rule out a lot of the complaints people have about living artists' work getting scraped into the machine, and it would probably satisfy Debian's ML guidelines.

[1] Which actually does work
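
For concreteness, the core of what I've been trying to write is the symmetric contrastive objective from the CLIP paper. A minimal PyTorch sketch (the encoders themselves, which are the part I keep getting stuck on, are left out):

    import torch
    import torch.nn.functional as F

    def clip_loss(image_features, text_features, logit_scale):
        # Normalize both embeddings onto the unit hypersphere
        image_features = F.normalize(image_features, dim=-1)
        text_features = F.normalize(text_features, dim=-1)

        # Pairwise cosine similarities, scaled by a learned temperature
        logits = logit_scale.exp() * image_features @ text_features.t()

        # The i-th image matches the i-th caption; every other pairing
        # in the batch is treated as a negative
        labels = torch.arange(logits.shape[0], device=logits.device)
        return (F.cross_entropy(logits, labels) +
                F.cross_entropy(logits.t(), labels)) / 2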


Ah I am glad to see someone else talking about using public domain images!

Honestly it baffles me that in all this discussion, I rarely see people discussing how to do this with appropriately licensed images. There are some pretty large datasets of openly licensed images out there, and using them might even encourage more people to contribute to open datasets.

Also if the big ML companies HAD to use open images, they would be forced to figure out sample efficiency for these models. Which is good for the ML community! They would also be motivated to encourage the creation of larger openly licensed datasets, which would be great. I still think if we got twitter and other social media sites to add image license options, then people who want to contribute to open datasets could do so in an easy and socially contagious way. Maybe this would be a good project for mastodon contributors, since that is something we actually have control over. I'd be happy to license my photography with an open license!

It is really a wonderful idea to try to do this with open data. Maybe it won't work very well with current techniques, but that just becomes an engineering problem worth looking at (sample efficiency).


Human artists derive their inspiration and styles from a large set of copyrighted works, but they are free to produce new art despite that. Art would have developed much more slowly and be much poorer if, for example, Impressionism or Cubism had been entangled in long ownership confrontations in the courts.

Then there's the fact that humanity has been able to develop and share art and literary works for thousands of years without the modern copyright system.

It would be interesting to see if this technology can erode the copyright concept a bit. Maybe not remove it completely, but perhaps influence people to create wider definitions for "fair use", and undo the extensions that Disney lobbyists have created.


That is a very apropos reference. If you're familiar with Cubism, you know that there's Picasso, and then there's Braque. The one is an art celebrity beyond almost any other, and the other isn't.

But they developed Cubism in parallel. There were periods where their work was almost indistinguishable. "Houses at l'Estaque", the trope namer for Cubism thanks to the remarks of a critic, was in fact by Braque.

You can generate infinite recognizable Basquiat from an AI, but is it Basquiat? No, of course not, because Basquiat's style operates within the context of a specific individual human making a point about expectations and the interface between his race and his artistic boldness and audacity as experienced by his wealthy audience. Making an AI 'ape' (!) his art style is itself quite the artistic statement, but it's not the same thing in the slightest.

You can generate infinite Rothko as 512x512 squares, but if you don't understand how the gallery hangings work and their ability to fill your entire visual field, first with carefully chosen color, and then with a great deal of detail at the threshold of perception, distinctions between color shades meant to further drive home the reaction to the basic color's moods, what you generate is basically arbitrary and nothing. Rothko isn't 'just a random color'; Rothko is about giving you a feeling through means that aren't normal or representational, and the unusualness of this (reasonably successful) effort is what gave the work its valuation.

Ownership of the experience by a particular artist isn't the point. Rothko isn't solely celebrity worship and speculation. Picasso isn't all of Cubism. Art is things other than property of particular artists.

What makes it awkward is the great ease with which AI can blindly and unhelpfully wear the mask of an artist, such as Basquiat, to the detriment of art. It's HOW you use the tools, and it's possible to abuse such tools.


> You can generate infinite recognizable Basquiat from an AI, but is it Basquiat? No, of course not, because Basquiat's style operates within the context of a specific individual human making a point about expectations and the interface between his race and his artistic boldness and audacity as experienced by his wealthy audience.

I'm not sure how I feel about this - I agree with the conclusion, but not the reasoning. For me, AI-generated Basquiat is not Basquiat simply because he had no ownership or agency in the process of its creation.

It feels like an overly romantic notion that art requires specific historical/cultural context at the moment of its creation to be valid.

If I could hypothetically pay Basquiat $100 to put his own work into a stable diffusion model that created a Basquiat-esque work, that's still a Basquiat. If I could pay him to draw a circle with a pencil, that's his work - and if I used it in an AI model, then it's not.

It's about who held the paintbrush, or who delegated holding the paintbrush, not a retrospectively applied critical theory.


On reflection, I'm going to say 'nope'. Because it's Basquiat, I'm pretty sure you couldn't get him to make a model of himself (maybe he would, and call it 'samo'?). I don't think you could pay him to draw a circle with a pencil: I think he'd have been offended and angry. And so that is not 'his work'. It trips over what makes him Basquiat, so doing these things is not Basquiat (though it's very, very Warhol).

Even more than that, you couldn't do Rothko that way: the man would be beyond offended and would not deal with you at all. But by contrast, you ABSOLUTELY are doing a Warhol if you train an AI on him and have it generate infinite works, and furthermore I think he'd be absolutely delighted at the notion, and would love exploring the unexplored conceptual space inside the neural net.

In a sense, an AI Warhol is megaWarhol, an unexplored level of Warholiness that wasn't attainable within his lifetime.

Context and intent matter. All of modern art ended up exploring these questions AS the artform itself, so boiling it down to 'did a specific person make a mark on a thing' won't work here.


This seems to me to confuse agency with interpretation - romanticising the life and character of the artist after their heyday and death, talking about what they would have done.

Any drawing Basquiat did is a piece of art by Basquiat, whether or not it fits into the narrative of a book/thesis/lecture/exhibition. The circle metaphor isn't important - replace it with anything else. Artists regularly throw their own work away. Some of this is saved and celebrated posthumously, some never sees the light of day in accordance with their wishes. Scraps that fell on Picasso's floor sell for huge amounts of money.

Does everything he did fit the "brand" that some art historians have labelled him with, or the "brand" that auction houses promote to increase value, or the "brand" which a fashion label licenses for t-shirts? No, but I suspect this is probably what you are talking about, i.e. a "classic" Basquiat™ with certificate of authenticity?

Is it by Basquiat? vs Is it a Basquiat?


Human artists cannot produce thousands of works in a few hours.

This argument comes up in every thread, and I'm baffled that people don't think the scale matters.

You may also be observed in public areas by police, but it would be an Orwellian dystopia to have millions of cameras in public spaces analyzing everyone's behavior.

Scale matters.

(But I'm indeed in favor of weaker copyright laws! Though preferably in ways that take power away from the copyright monopolies rather than from the individual artists who barely get by on their profits.)


> it would be an Orwellian dystopia to have millions of cameras in public spaces analyzing everyone's behavior.

Aren't there already 80M+ surveillance cameras in the US?

Outside of the US, London seems to have a lot of CCTV cameras.

Do privacy laws restrict how they can be used and whether they can be monitored by AI systems?


> It would be interesting to see if this technology can erode the copyright concept a bit

Copyright law (especially in US) only ever changes in the direction that suits corporations. So - no.

What I expect instead is artists being sued by a big tech company for copyright violations, because that big tech company used the artist's public domain image to train their copyrighted AI, and as a result it created a copyrighted copy of the original artist's image.


My bet is that big corporations won't risk suing anyone over a supposed copyright on generated images, as there is a good chance that a court ends up stating that all AI-generated images are in fact public domain (no author, not originating from the intent and idea of a human).

You can already see the quite strange and toned-down language they use on their sites. (And, for some, the revealing reversal from "we license to you" to "you license to us".)

Some smaller AI companies might believe they own a clear-cut copyright and sue, but it would make sense that they would either be thrown out or lose.


So, the US Copyright Office will already refuse to issue a copyright for text-prompt-generated AI art, at least if you try a stunt like naming the artist to be the AI program itself.

However, even if an image is not copyrightable, it can still infringe copyright. For example, mechanical reproductions of images are not copyrightable in the US[0] - which is why you can even have public domain imagery on the web. However, if I scan a copyrighted image into my computer, that doesn't launder the copyright away, and I can still be sued for having that image on my website.

Likewise, if I ask an AI to give me someone else's copyrighted work[1], it will happily regurgitate its training set and do that, and that's infringement. This is separate from the question of training the AI itself; even if that is fair use[2], that does nothing for the people using the AI because fair use is not transitive. If I, say, take every YouTube video essay and review on a particular movie and just clip out and re-edit all the movie clips in those reviews, that doesn't make my re-edit fair use. You cannot "reach through" a fair use to infringe copyright.

[0] In Europe there's a concept of neighboring rights, where instead of a full copyright you get 20 years of ownership. This is intended for things like databases and the like. It also applies to images; copyright over there distinguishes between artistic photography (full copyright) and other kinds of photography (a 20-year neighboring right only). This is also why Wikimedia Commons has a hilarious number of Italian photos from the 80s in a special PD-Italy category.

[1] Which is not too difficult to do

[2] My current guess is that it is fair use, because the AI can generate novel works if you give it novel input.


> So, the US Copyright Office will already refuse to issue a copyright for text-prompt-generated AI art, at least if you try a stunt like naming the artist to be the AI program itself.

That’s because only humans can own copyrights. People can and have registered copyrights for Midjourney outputs.


> Copyright law (especially in US) only ever changes in the direction that suits corporations. So - no.

There are certainly arguments to be made in this direction - for example, corporations tend to have the most money to spend on lobbying to get their way - but the attitude of "it hasn't been good up 'til now so it definitely can't ever be good" is pretty defeatist and would imply that positive change is impossible in any area.


In this situation, it would seem like the suit would come down to comparing the timestamps at which the public domain and copyrighted versions were published, wouldn't it?

There is nothing the generative AI can do in this process that's legally different from copy-pasting the image, editing it a bit by hand, and somehow claiming intellectual property of the _initial_ image, no?


In theory yes; in practice you have to pay your legal expenses in the US even if you win the case. Which means you can go bankrupt because a big company thought you infringed on their rights, even if you didn't - simply because you can't afford the costs.

It's absurd.


>Copyright law (especially in US) only ever changes in the direction that suits corporations. So - no.

Just objectively false.


Counterexample?


AI tools aren't people. We don't have to treat them the same.


Doesn't your argument in the first paragraph assume that the methods by which humans derive new works from past experiences are equivalent to the way statistical models iteratively remove noise from images based on a set of abstract features derived from an input prompt?

That seems to be the core of the issue, and a much more interesting conversation to have. So why do I keep seeing a version of your first paragraph everywhere and not an explanation on why the assumption can be made?


The problem is not that people aren't owning ideas hard enough - ideas shouldn't be ownable in this way. The problem is that we've created a system that's obsessed with scarcity and collecting rents. Being able to own and trade ideas a la copyright/patents helps people who can buy copyrights and patents stifle creativity more than it helps artists gather reward for their creations (though it does both).

Human endeavor is inherently collaborative. The idea that my art is my virgin creation is an illusion perpetuated by capitalists. My art is the work of thousands who came before me with my slight additions and tweaks.

Your (and in general, our) suggestion that we should be concerned with respecting or even expanding these protections is incorrect if you want human creativity to flourish.


You misunderstand me. I am strongly in favor of abolishing all intellectual property restrictions. Here is me arguing just that two days ago: https://news.ycombinator.com/item?id=33697341

But I am absolutely not in favor of keeping IP restrictions in place and then letting big corporations scoop up the works of small independent artists for their ML models.

Think of it in terms of software licenses. The people who write GPL-protected software are leveraging existing copyright laws to enforce distribution of their code. They would probably be in favor of abolishing the entire IP rights system. But if a big corporation copied a GPL-licensed project from an independent creator, they'd sure as hell want to sue.

I believe strongly that IP restrictions are harmful. But keeping them in place while letting big corporations benefit from the work of independent artists who don’t want their work used in this way seems wrong to me. As long as artists wouldn’t expect anyone else to be able to copy their works, I’d like them to be able to consent to their work being used in these systems.


Ahh, I don't think that stance is evident from the GP but fair enough. I may even have a less fervent hate for IP protections than you do.

> But keeping them in place while letting big corporations benefit from the work of independent artists who don’t want their work used in this way seems wrong to me.

I see what you're saying here. My concern is that should copyright style protection be extended to the "vibe" or "style" of a painting it is going to be twisted in a way that ends up being used to silence/abuse artists in the same way that copyright strikes are already.

I think the idea that art is mostly individually creative vs mostly drawing upon the work of all the artists and art-appreciators around you and before you is already really problematic. The corrupting power of the idea is what I worry about. Similarly to crypto/NFTs, the idea that scarcity should exist in the digital world is the most dangerous thing, most of the other bad stems from that.

IMO the most important thing to work on is getting people to reject the idea itself as harmful.

I worry that any short term fix to try to prop up artists' rights in response to this changing landscape will become a long term anchor on our society's equity and cultural progress in the exact same way copyright is.


When I was younger, I also thought that way. I also felt that being an artist has nothing to do with money: a true artist will always create out of their internal need, not for money.

Then came the brutal reality: creating high-quality artwork needs time. Some can be created after work, but not that much. Some forms of art require expensive instruments. Some, like filmmaking, require collaboration and coordination of many people. So yes, I could do some forms of art part-time using the money from my day job, but I knew it was a far cry from what I could do when working on it full time. It's not capitalism, it's just reality.


Yeah, if you want artists to be able to devote their lives to their craft and reach the highest possible levels, they have to get paid enough to do that.

If all artists are "weekend warriors", they will still produce a lot of art, and some of it will be the best in that world. But the quality will be far from what we enjoy today.

That said, there are of course other ways to pay artists than the capitalist way of having customers pay for what they like. But I think the track record firmly favors a capitalist system.


It's almost like "capitalism" isn't something that needs to be created and forced upon people; it's just the way a world works where energy isn't free and cannot be created from thin air. Capitalism is just that: the realization that there are no free lunches, and no UBIs are possible without some serious unintended consequences. I pirate everything I consume, but I would never be such a hypocrite as to say that all copyright must be abolished.


What? No. Capitalism is a more specific system for organizing goods and services, wherein the means of production and distribution of those goods and services (buildings, land, machines and other tools, vehicles etc) are privately owned and operated by workers (who are paid a wage) for the profit of the owners. That's only been the norm for a few hundred years, and only in certain places. Also, capitalism is separate from copyright and other IP, though IP as currently implemented is pretty obviously a capitalist concept.


> That's only been the norm for a few hundred years, and only in certain places.

Can you point to a system that worked well before that you'd like to go back to?


Your question assumes the only alternatives involve going back, not forwards. There are still many untried sociopolitical systems.


At the moment I'd rather not get involved in an online argument about which economic systems are better than which other ones... especially not on a forum run by a startup accelerator, with a constraint that my preferred system has to be more than 300 years old.

I just wanted to point out that capitalism is in fact a specific economic system. It's not a law of nature, or another word for "markets" or "freedom", or a realization that some other system doesn't work.


That's one of the great victories of capitalism: somehow it has convinced people that a 300 year-old economic system originating in north-western Europe is as natural as the air we breathe, and as inevitable as gravity or any natural law.


You have to threaten to shoot people to get them to practice any other -ism.

So, yes, capitalism in the sense of the freedom to trade one's labor does appear to be naturally and universally emergent in advanced human societies, in the absence of violent interference.


Capitalism has violent coercion at its core, in order to enforce its property rights. You simply think that that violence is legitimate and unproblematic because you believe the system it upholds is "natural" and legitimate, but at this point you're arguing in circles. But to say that capitalism is not violent is laughable.


Capitalism is certainly not characterized by the absence of violent interference.


Yes, it is. The violence comes in when you interfere with capitalism. It's not imposed upon you forcefully, you just aren't allowed to get in the way.

To the extent that certain aspects of capitalism lead to violence, those are elements that other parties -- generally corporations or governments rather than writers or philosophers -- added to the ideology.

People die trying to break out of non-capitalist countries, while they die trying to break in to capitalist ones. That's one possible way to tell the good guys from the bad guys.


> Yes, it is. The violence comes in when you interfere with capitalism.

Ahahah, I absolutely love this sentence. You might have said the quiet part out loud though.

“You gots to understand”, said Fat Tony, “I'm not a violent man. The violence simply comes in when you interfere with my business.”


(Shrug) Taking peoples' rights away, including their economic rights, is likely to get the hurt put on you. Ric Romero has more on this late-breaking story at 11.


It sounds funny but he may have a point. It's not a quality of capitalism per se; had it been communism that prevailed instead, then communism would have been the best system for the present moment.

But capitalism prevails and may be the best system there is for now because I cannot fathom a change in system overnight that would not result in mass suffering for (almost) everyone.


Paying people to make art is older than “capitalism”. Capitalism is when you can own and trade capital, not when you pay people to do things.


The restrictions on creating art are the product of the society you live in, which means they are the product of capitalism if you live in a capitalist society. The way society is organised determines the cost of people's time, the cost of the tools, and the cost of the materials.


Yea I find when people say "ideas shouldn't be ownable" it's really the more general "deriving profit from private ownership was a mistake". Like you kinda point out, most of the reason I can think of that a person would want control of their intellectual property is to derive profit from it.

That reason has nothing to do with intellectual property or how it's created, it's a consequence of living in a capitalist society.


Perhaps no one wants "your art"? 99% of artists who produce something worthwhile very much care about money/copyright.

Then there's still the question of attribution, which 100% of real artists care about.


So anybody who just wants a thing to exist, and doesn't care who gets the credit, isn't a "real artist"? You must not work on any large art projects that involve other people.


99%? You might have it in reverse, because most art is not produced by "fulltime" artists. I would even go as far as to say 99% of art is not produced to earn money.


Yup, it really is a good thing they haven't been forced to use open images.


No one is ever going to stop using all the available images until there is a law against it. Why would they?


I've seen many arguments about getting laws on the books around ML training. I would suggest people create a project that makes movies using ML, trained on existing Hollywood movies. I realize this isn't easy, but the issue needs to be pushed onto people who have the means to force change.


There are already laws against it, but enforcement is lacking, as always.


If you can't process/digest copyrighted content with algorithms/machine learning then Google Search (the whole thing, not just Image Search) is dead.

So no, it's not at all clear where the legal lines are drawn. There have been no court cases yet, regarding the training of ML models. People are trying to draw analogies from other types of cases, but this has not been tried in court yet. And then the answer will likely differ based on country.


> If you can't process/digest copyrighted content with algorithms/machine learning then Google Search (the whole thing, not just Image Search) is dead.

Not if Google honors the robots.txt like they say they do. Hosting content with a robots.txt saying "index me please" is essentially an implicit contract with Google for full access to your content in return for showing up in their search results.
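
For reference, the opt-out half of that implicit contract is just a plain-text file served at the site root. A hypothetical site that wants Googlebot out entirely would publish something like:

    User-agent: Googlebot
    Disallow: /

while serving no restrictions (or no robots.txt at all) is what gets read as the "index me please" side of the bargain.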

Hosting an image/code repository with a very specific license attached, and then having that license ignored by someone who repackages and redistributes the content, is not the same as sites explicitly telling Google to index their content.

A much closer comparison IMO would be someone compressing a massive library of copyrighted content and then redistributing it and arguing it's legal because "the content has been processed and can't be recovered without a specific setup". I don't think we'd need prior court cases to argue that would most likely be illegal, so I don't see how machine learning models differ.


LAION/StableDiffusion is already legal under the same exemptions as Google Image Search and does respect robots.txt. It was also created in Germany so US court cases wouldn’t apply to it.


Google also indexes sites without a robots.txt. Also, it's not mere classic indexing but ML processing too.


No, it has not yet been demonstrated that the current copyright laws forbid the use of copyrighted images to train neural networks.


The moment you make money from it the law is pretty clear.


No, it isn’t. Why are you lying?


Can't wait for law enforcement to start arresting people for looking at images.


As long as they aren't repackaging and redistributing them, why would looking at them be illegal?


You're training your model. Maybe it's producing art that becomes illegal.


Well, you can learn about generative models from MOOCs like the ones taught at UMich, Universität Tübingen, or New York University (taught by Yann LeCun), and gain knowledge there.

You can also watch the fast.ai MOOC titled Deep Learning from Scratch to Stable Diffusion [0].

You can also look at open source implementations of text2image models like Dall-E Mini or the works of lucidrains.

I worked on the Dall-E Mini project, and the technical knowhow that you need is rarely taught in MOOCs. You need to know, on top of Deep Learning theory, many tricks, gotchas, workarounds, etc.

You could follow the works of EleutherAI, and follow Boris Dayma (project leader of Dall-E Mini) and Horace He on Twitter - and any such people who have significant experience in practical AI and regularly share their tricks. The PyTorch forums are also a good place.

Learn PyTorch and/or JAX/Flax really well.

[0]: https://www.fast.ai/posts/part2-2022.html


> train this from scratch

If you're talking about training from scratch and not fine tuning, that won't be cheap or easy to do. You need thousands upon thousands of dollars of GPU compute [1] and a gigantic data set.

I trained something nowhere near the scale of Stable Diffusion on Lambda Labs, and my bill was $14,000.

[1] Assuming you rent GPUs hourly, because buying the hardware outright will be prohibitively expensive.


I have... ~11TB of free disk space and a 1080 Ti. Obviously nowhere close to being able to crunch all of Wikimedia Commons, but I'm also not trying to beat Stability AI at their own game. I just want to move the arguments people have about art generators beyond "this is unethical copyright laundering" and "the model is taking reference just like a real human".


To put things in perspective, the dataset it's trained on is ~240TB, and Stability has over 4,000 Nvidia A100s (each much faster than a 1080 Ti). Without those ingredients, you're highly unlikely to get a model that's worth using (it'll produce mostly useless outputs).

That argument also makes little sense when you consider that the model is a couple gigabytes itself, it can't memorize 240TB of data, so it "learned".

But if you want to create custom versions of SD, you can always try out dreambooth: https://github.com/XavierXiao/Dreambooth-Stable-Diffusion, that one is actually feasible without spending millions of dollars on GPUs.


As pointed out in [1], it seems machine learning is taking the same path physics already did. In the mid-20th century there was a "break" in physics: before it, individuals made groundbreaking discoveries in their private/personal labs (think Newton, Maxwell, Curie, Roentgen, Planck, Einstein, and many others); after it, huge collaborations (LHC/CERN, IceCube, EHT, et al.) were required, since the machinery, simulations, and models are so complex that groups of people are needed to create, comprehend, and use them.

1. https://www.youtube.com/watch?v=cdiD-9MMpb0 Lex Fridman podcast with Andrej Karpathy

P.S. To counteract that (unintentionally, actually - likely because of a simple optimization of the instruments' duty cycle), astronomy came up with the concept of an "observatory" (like Hubble, JWST) instead of an "experiment" (like LHC, HESS telescopes), where outside people can submit proposals and, if selected, get observational time. Along with the raw data, authors of the proposals get the required expertise from the collaboration to process and analyze it.


> when you consider that the model is a couple gigabytes itself, it can't memorize 240TB of data, so it "learned".

This is just lossy compression with a large and well-tuned (to the expected problem domain) dictionary.

Video compression codecs can achieve a 500x compression ratio, and they are general-purpose.


The dataset, LAION-5B, is 240TB of already compressed data. (5 billion pairs of text to 512x512 image.)

Uncompressed, LAION-5B would be 4PB, for a compression ratio into SD of ~780kx, or one byte per picture.
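
Spelling out that arithmetic (my own rough numbers, assuming uncompressed 512x512 RGB at 3 bytes per pixel and ~5GB of weights):

    images = 5_000_000_000                   # LAION-5B text-image pairs
    bytes_per_image = 512 * 512 * 3          # uncompressed RGB
    uncompressed = images * bytes_per_image  # ~3.9e15 bytes, i.e. ~4 PB

    model_size = 5_000_000_000               # SD weights, ~5 GB
    print(uncompressed / model_size)         # ~786,000x, i.e. "~780kx"
    print(model_size / images)               # ~1 byte per training image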


The point is that there is no practical limit on compression. You don't need "AI" or anything besides very basic statistics to get astronomical compression ratios. (See: "zip bomb".)

The only practical limit is the amount of information entropy in the source material, and if you're going to claim that internet pictures are particularly information-dense I'd need some evidence, because I don't believe you.


Correct; however, "compression is equivalent to general intelligence" (http://prize.hutter1.net/hfaq.htm#compai), and so in a sense all learning is compression. In this case, SD applies a level of compression so high that the only way it can retain information from its inputs is by capturing their underlying structure. This is a fundamentally deeper level of understanding than image codecs, which merely capture short-range visual features.


I fail to see the difference between "underlying structure" and "short-range visual features".

Both are just simple statistical relationships between parameters and random variables.


Sure, but why would that not apply to humans? And we don't consider it copyright violation if a human learns painting by looking at art.


Depends on what you mean by "humans".

Most human behavior is easy to describe with only a few underlying parameters, but there are outlier behaviors where the number of parameters grows unboundedly.

("AI" hasn't even come close to modeling these outliers.)

Internet pictures fall squarely into the "few underlying parameters" bucket.


Because we made the algorithms and can confirm these theories apply to them.

We can speculate they apply to certain models of slices of human behaviour based on our vague understanding of how we work, but not nearly to the same degree.


Hang on - plagiarism is a copyright violation, and that passes through the human brain.

When a human looks at a picture and then creates a duplicate, even from memory, we consider that a copyright violation. But when a human looks at a picture and then paints something in the style of that picture, we don't consider that a copyright violation. However we don't know how the brain does it in either case.

How is this different to Stable Diffusion imitating artists?


human memory is lossy compression


Well, that would be ~4,000 people each with an Nvidia A100 equivalent - or more people with lesser hardware; this would be an open effort, after all. Something similar to Folding@home could be used. Obviously the software for that would need to be written, but I don't think the idea is unlikely. The power of the commons shouldn't be underestimated.


It's not super clear whether the training task can be scaled in a manner similar to protein folding. It's a bit trickier to optimise ML workflows across computation nodes, because you need more real-time aggregation and decision making (for the algorithms).


An A100 costs $10-12k (40GB/80GB VRAM) and isn't even targeted at the individual gamer (it's not effective for gaming) -- they don't even give these things to big YouTube reviewers (LTT). So 4,000 people will be hard to find. A 3090 you can find; that's a 24GB VRAM card. But that's expensive too, and it's a power guzzler compared to the A100 series.


AFAIK this is not possible at the moment and would need some breakthrough in training algorithms; the required bandwidth between the GPUs is much higher than internet speeds.


Unlike Folding@home, the problem isn't very distributable, because weights need to be shared between GPUs via a very high speed link.
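
A back-of-the-envelope illustration of the gap (the numbers here are mine, purely for scale):

    params = 1_000_000_000            # a ~1B-parameter, SD-scale model
    grad_bytes = params * 4           # fp32 gradients: ~4 GB per sync step

    home_uplink = 20e6 / 8            # a 20 Mbit/s home uplink, in bytes/s
    print(grad_bytes / home_uplink)   # ~1,600 s (~27 min) per step,
                                      # vs. milliseconds over NVLink in a cluster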


Quite right, but…

> That argument also makes little sense when you consider that the model is a couple gigabytes itself, it can't memorize 240TB of data, so it "learned".

The matter is really very nuanced and trivialising it that way is unhelpful.

If I recompress 240TB as super low quality jpgs and manage to zip them up as single file that is significantly smaller than 240TB (because you can), does the fact they are not pixel perfect matches for the original images mean you’re not violating copyright?

If an AI model can generate statistically significantly similar images from the training data, with a trivial guessable prompt (“a picture by xxx” or whatever) then it’s entirely arguable that the model is similarly infringing.

The exact compression algorithm, be it model or jpg or zip is irrelevant to that point.

It's entirely reasonable to say: if this is so good at learning, why don't you train it without the ArtStation dataset?

…because if it’s just learning techniques, generic public domain art should be fine right? Can’t you just engineer the prompting better so that it generates “by Greg Rutkowski“ images without being trained on actual images by Greg?

If not, then it’s not just learning technique, it’s copying.

So; tldr: there’s plenty of scope for trying to train a model on an ethically sourced dataset, and investigation of techniques vs copying in generative models.

It is 100% not something we can just brush off.


> If I recompress 240TB as super low quality jpgs and manage to zip them up as single file that is significantly smaller than 240TB (because you can), does the fact they are not pixel perfect matches for the original images mean you're not violating copyright?

If you compress them down to two or three bytes each, which is what the process effectively does, then yes, I would argue that we stand to lose a LOT as a technological society by enforcing existing copyright laws on IP that has undergone such an extreme transformation.


Maybe?

Does that mean it’s worthless to try to train an ethical art model?

Is it not helpful to show that you can train a model that can generate art without training it on copyrighted material?

Maybe it’s good. Maybe not. Who cares if people waste their money doing it? Why do you care?

It certainly feels awfully convenient that there are no ethically trained models, because it means no one can say "you should be using these; you have a choice to do the right thing, if you want to".

I'm not judging; but what I will say is that there's only one benefit in trying to dismiss and discourage people from training ethical models:

…and that is the benefit of people currently making and using unethically trained models.


We don't agree on what "ethical" means here, so I don't see a lot of room for discussion until that happens. Why do you care if people waste computing time programming their hardware to study art and create new art based on what it learns? Who is being harmed? More art in the world is a good thing.


> Can’t you just engineer the prompting better so that it generates “by Greg Rutkowski“ images without being trained on actual images by Greg?

You couldn't teach a human to do that without them having seen Greg's art. There are elements of stroke, palette, lighting and composition that can't be fully captured by natural language (short of encoding an ML model, which defeats the point).


Copyright says you cannot reproduce, distribute, etc. a work without consent from the author, whatever the means. The copy doesn't need to be exact, only sufficiently close.

However, copyright doesn't prevent someone from looking at the work and studying it - even learning it by heart. Infringement comes only if that someone makes a reproduction of that work. Also, there are provisions for fair use, etc.


> …because if it’s just learning techniques, generic public domain art should be fine right? Can’t you just engineer the prompting better so that it generates “by Greg Rutkowski“ images without being trained on actual images by Greg?

Is it fair to hold it to a higher standard than humans though? To some degree it's the whole "xxx..... on a computer!" thing all over again if we go that way


> Can’t you just engineer the prompting better so that it generates “by Greg Rutkowski“ images without being trained on actual images by Greg?

Can you please rewrite this in the writing style of Socrates?


> The matter is really very nuanced and trivialising it that way is unhelpful.

Harping on copyright in the Age of Diffusion Models is as unhelpful (for artists) as protesting against a tsunami. It's time to move up the ladder.

ML engineers have a similar predicament - GPT-3-like models can solve, at first try and without specialised training, tasks that took a whole team a few years of work. Who dares still use LSTMs now like it's 2017? Moving up the ladder - learning to prompt and fine-tune ready-made models - is the only solution for ML engineers.

The reckoning is coming for programmers and for writers as well. Even scientific papers can be generated by LLMs now - see the Galactica scandal, where some detractors said it would empower people to write fake papers. It also has the best ability to generate appropriate citations.

The conclusion is that we need to give up some of the human-only tasks and hop on the new train.


It's "keeping" 1 byte worth of information from each input example. The SD models are 5GB together, and the dataset 2.3B images.


Stable Diffusion 1 was trained with 256 A100s running for a little over three weeks. These days that would cost less than a Tesla…


I think it's a great idea regardless of practicality/implementation, which I think is generally understood to be largely a matter of time, money and hardware. I feel like you should write it up so the idea gets out there, or so you can pitch it to someone if the opportunity arises.

Oh, and I also second the fast.ai suggestion - part 2 is 100% focused on implementing stable diffusion from scratch in Python, and it's amazing all around. The course is still actively coming out, but the first few lessons are freely available already, and the rest sounds like it will be made freely available soon.


Depends on the dataset. You can probably get decent results by restricting the modality of the images (faces, cars, bedrooms etc)

I trained from scratch with 4x3090 and while it’s not as good as SD it’s surprisingly better with hands.


Can you go into a bit more detail? What architecture did you use? Is the month of training time really just training with mini-batches at a constant learning rate? Or were there many failed attempts until you trained a successful model for a few days at the end?

I'm particularly interested in the image generation part (the DDPM/SGM).


Yeah, I did have a few false starts. Total time is more like 3 months vs 1 month for the final model. For small-scale training I found it's necessary to use a long lr warmup period, followed by a constant lr (sketched below).
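
A minimal sketch of that schedule with PyTorch's LambdaLR (the warmup length is a made-up number, and the actual glid3 code may do it differently):

    import torch

    model = torch.nn.Linear(8, 8)   # stand-in for the real U-Net
    warmup_steps = 10_000           # hypothetical warmup length

    def warmup_then_constant(step):
        # Ramp linearly up to the base lr, then hold it flat
        return min(1.0, (step + 1) / warmup_steps)

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, warmup_then_constant)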

There’s code on my GitHub (glid3)

edit: The architecture is identical to SD, except I trained on 256px images with a cosine noise schedule instead of a linear one. Using the cosine schedule makes the unet converge faster, but it can overfit if overtrained.
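
(Assuming the standard improved-DDPM formulation of the cosine schedule, it looks roughly like this:)

    import math
    import torch

    def cosine_betas(T, s=0.008, max_beta=0.999):
        # alpha_bar(t) follows a squared-cosine curve from ~1 down to ~0,
        # spending more steps at low noise than the linear schedule does
        def alpha_bar(t):
            return math.cos((t / T + s) / (1 + s) * math.pi / 2) ** 2
        # beta_t is the per-step noise increment implied by alpha_bar
        return torch.tensor([
            min(1 - alpha_bar(t + 1) / alpha_bar(t), max_beta)
            for t in range(T)
        ])

    betas = cosine_betas(1000)  # one beta per diffusion timestep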

edit 2: Just tried it again and my model is also pretty bad at hands actually. It does get lucky once in a while though.


I keep wondering if using not only statistical noise but also deformations would help with the generation of deformable things - say human hands.


F222 does a little more coherent anatomy... not surprising, given its background.


What kind of form factor do you use for 4x3090? Don't people usually use the datacenter product line when they're trying to get more than one into a box?


The datacenter cards are 3-4x the price for the same speed + double the vram. Gaming cards are a lot more cost effective if your model fits in under 24gb.

I use an open-air rig like the ones used for crypto mining. 4x3090 would normally trip the breakers without mods, but if you undervolt the cards, the power draw is just under the limit for a home AC outlet.


How long did the training take on 4 3090s?


About 1 month of actual training time. It's a smaller (650M) model and probably still undertrained. Glid3 on GitHub.


Hi, do you have a writeup of that anywhere? Would love to hear (read) more about it


[flagged]


"nowhere near"


Yep, missed the word "nowhere". My mistake.


> Specifically, Wikimedia Commons images in the PD-Art-100 category, because the images will be public domain in the US and the labels CC-BY-SA.

Doesn't the "BY" part of the license mean you have to provide attribution along with your models' output[0]? I feel you'll have the equivalent of Github Copilot problem: it might be prohibitive to correctly attribute each output, and listing the entire dataset in attribution section won't fly either. And if you don't attribute, your model is no different than Stable Diffusion, Copilot and other hot models/tools: it's still a massive copyright violation and copyright laundering tool.

----

[0] - https://creativecommons.org/licenses/by-sa/4.0/


I feel quite strongly that there is a large difference between Stable Diffusion and Copilot: given the size of the training set vs the number of parameters, it should be very difficult if not impossible for Stable Diffusion to memorize and, by extension, copy-paste to produce its outputs. Copilot is trained on text and outputs text. Coding is also inherently more difficult for an AI model to do. I expect it memorizes large portions of its input and copy-pastes in many cases to produce output. I therefore believe Copilot is doing "copyright laundering" but Stable Diffusion is not. Furthermore, I do not believe, for example, that artists should be able to copyright a "style" - but I would like to see them not be negatively impacted by this. It's complicated.


Let me guess that you write more code than visual art?

Isn't it a bit anthropomorphic to compare the two algorithms by "how a human believes they work" instead of "what they're actually doing differently to the inputs to create the outputs"?

These are algorithms and we can look at how they work, so it feels like a cop-out to not do that.


If I was generating image labels I absolutely would need to worry about that. However, since we're only generating images alone, we don't need to worry about bits of the labels getting into the output images.

The attribution requirement would absolutely apply to the model weights themselves, and if I ever get this thing to train at all I plan to have a script that extracts attribution data from the Wikimedia Commons dataset and puts it in the model file. This is cumbersome, but possible. A copyright maximalist might also argue that the prompts you put into the model - or at least ones you've specifically engineered for the particular language the labels use - are derivative works of the original label set and need to be attributed, too. However, that's only a problem for people who want to share text prompts, and the labels themselves probably only have thin copyright[0].

Also, there's a particular feature of art generators that makes the attribution problem potentially tractable: CLIP itself was originally designed to do image classification. Guiding an image diffuser is just a cool hack. This means that we actually have a content ID system baked into our image generator! If you have a list of what images were fed into the CLIP trainer and their image-side outputs[1], then you can feed a generated image back into CLIP, compare the distance in the output space to the original training set, and list out the closest examples (a sketch of this is below, after the footnotes).

[0] A US copyright doctrine in which courts have argued that collections of uncopyrightable elements can become copyrightable, but the resulting protection is said to be "thin".

[1] CLIP uses a "dual headed" model architecture, in which both an image and text classifier are co-trained to output data into the same output parameter space. This is what makes art generators work, and it can even do things like "zero-shot classification" where you ask it to classify things it was never trained on.
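
A minimal sketch of that nearest-neighbor lookup, assuming the open_clip package (the model tag and file paths here are placeholders):

    import torch
    import open_clip
    from PIL import Image

    model, _, preprocess = open_clip.create_model_and_transforms(
        "ViT-B-32", pretrained="laion2b_s34b_b79k")

    def embed(path):
        # Map an image into CLIP's shared output space, L2-normalized
        image = preprocess(Image.open(path)).unsqueeze(0)
        with torch.no_grad():
            feat = model.encode_image(image)
        return feat / feat.norm(dim=-1, keepdim=True)

    # Embed the training set once, then score any generated image against
    # it; unusually high cosine similarity flags likely "sources".
    train_feats = torch.cat([embed(p) for p in ["train1.png", "train2.png"]])
    query = embed("generated.png")
    scores = (query @ train_feats.t()).squeeze(0)  # cosine similarities
    print(scores.topk(k=2))                        # closest training examples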


>If I was generating image labels I absolutely would need to worry about that. However, since we're only generating images alone, we don't need to worry about bits of the labels getting into the output images.

Just to be correct: SD sometimes generates labels on images, so we do need to worry ;)


> This is cumbersome, but possible.

This is not possible, because the model is smaller than its input data. Just as any new image it generates is something it made up, any attributions it generated would also be made up.

CLIP can provide “similarity” scores but those are based on an arbitrary definition of “similarity”. Diffusion models don’t make collages.


The SA part (ShareAlike) is even more restrictive, as it imposes a license on the derivative work.

"— If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original"


How is that restrictive? Doesn't it just mean that any outputs of the model also fall under the same license so they can be used in public datasets?


> Writing a training loop for CLIP manually wound up with me banging against all sorts of strange roadblocks and missing bits of documentation, and I still don't have it working.

There is working training code for openCLIP https://github.com/mlfoundations/open_clip

But training multi-modal text-to-image models is still a _very_ new thing in the software world. Given that, my experience has been that it's never been easier to get to work on this stuff from the software POV. The hardware is the tricky bit (and preventing bandwidth issues on distributed systems).

That isn't to say that there isn't code out there for training. Just that you're going to run into issues and learning how to solve those issues as you encounter them is going to be a highly valuable skill soon.

edit:

I'm seeing in a sibling comment that you're hoping to train your own model from scratch on a single GPU. Currently, at least, scaling laws for transformers [0] mean that the only models that perform much of anything at all need a lot of parameters. The bigger the better - as far as we can tell.

Very simply - researchers start by making a model big enough to fill a single GPU. Then, they replicate the model across hundreds/thousands of GPUs, but feed each one a different shard of the data. Model updates are then synchronized, hopefully taking advantage of some sort of pipelining to avoid bottlenecks. This is referred to as data-parallel (a skeleton of it is sketched below, after the footnote).

[0] https://www.lesswrong.com/tag/scaling-laws
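
A minimal data-parallel skeleton with PyTorch DDP, to make that concrete (illustrative only; real multi-modal training setups add much more, e.g. mixed precision and gradient checkpointing):

    import torch
    import torch.distributed as dist
    import torch.nn.functional as F
    from torch.nn.parallel import DistributedDataParallel as DDP
    from torch.utils.data import DataLoader
    from torch.utils.data.distributed import DistributedSampler

    def train(model, dataset, epochs=1):
        dist.init_process_group("nccl")   # launched as one process per GPU
        rank = dist.get_rank()
        model = DDP(model.to(rank), device_ids=[rank])

        # Each rank draws a disjoint shard of the dataset
        sampler = DistributedSampler(dataset)
        loader = DataLoader(dataset, batch_size=64, sampler=sampler)

        opt = torch.optim.AdamW(model.parameters(), lr=1e-4)
        for epoch in range(epochs):
            sampler.set_epoch(epoch)      # reshuffle the shards each epoch
            for x, y in loader:
                loss = F.mse_loss(model(x.to(rank)), y.to(rank))
                opt.zero_grad()
                loss.backward()           # DDP all-reduces gradients here
                opt.step()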


All this horsepower deployed to image generation is interesting, but somebody wake me up when there is a stable diffusion for SQL, or when on-demand generative user interfaces are spun up on the fly to suit the purpose.


Will do!


Here's a tutorial on how to fine-tune stable diffusion from the guy who made text-to-pokemon:

https://lambdalabs.com/blog/how-to-fine-tune-stable-diffusio...


A paper by Nvidia that does this was already presented at a COLING 2022 workshop:

https://arxiv.org/abs/2209.14697


It would be worthwhile to use images from Commons. I have found that my photography is used in the stable diffusion data set. What was funny is that they took the images from URLs other than my Flickr account.


I am a solo dev working on a creative content creation app to leverage the latest developments in AI.

Demoing even the v1 of stable diffusion to the non-technical general users blows them away completely.

Now that v2 is here, it’s clear we’re not able to keep pace in developing products to take advantage of it.

The general public still is blown away by autosuggest in mobile OS keyboards. Very few really know how far AI tech has evolved.

Huge market opportunity for folks wanting to ride the wave here.

This is exciting for me personally, since I can keep plugging in newer and better versions of these models into my app and it becomes better.

Even some of the tech folks I demo my app to, are simply amazed how I can manage to do this solo.


I don’t know anybody that is blown away by keyboard auto suggest. It’s wrong as often as it is right. Not saying it isn’t useful, but let’s not oversell it.


Lol. Especially the AI version of keyboard auto suggest.

Let's take a deterministic algorithm that predictably corrects your typos and build it on AI. It will offer you no benefits, but it will completely destroy the utility since it will never work predictably or accurately.


Auto correct and auto suggest are related but different things.

Suggest puts up options for the next word.


My comment would remain exactly the same for auto-correct. They are essentially the same thing, just pre- and post-typing.

They both serve the same purpose of helping the user quickly and accurately communicate on a cell phone. Like auto-suggest, I rely on auto-correct to fix things that I know I commonly mistype. When it doesn't work predictably, it's useless.


Strongly believe there's selection bias among the folks you're getting this impression from. The average user is not in our circle.


This is exactly it.

Honestly I was quite surprised at how regular people are impressed by this tech. I was also surprised by how little regular people are aware of this tech even existing.

We, on hackernews, on a thread about Stable Diffusion, are of course harder to impress.

But that’s not the vast majority of people.


Your insistence on not understanding the point made makes me think you, my friend, are also an AI.


> Impressive isn't it!

>> Yeah! Don't they make a trillion dollars a year? How is it so crappy?


This is universally the result that I see from my nontechnical friends; Apple has literally all the money and Siri has the listening comprehension of a drunk beagle.


> It's wrong as often as it is right.

And for some damn reason they refuse to stop changing "ok" to "OK" like we're all octogenarians on Facebook.


Maybe blown away at how terrible it is... like, how many times do we need to correct it before it stops showing us the same shitty suggestions? I'm not sure I'd even notice if it were turned off.


I am blown away that auto suggest is so wrong. I simply have this feature turned off.


I'm in kind of the same boat. I think indie games are the way to show the true potential of SD.

Hence, I'm working on http://diffudle.com/ which is a mix of Wheel of Fortune + Stable Diffusion + Wordle. I can't figure it out, but it feels to me like it's lacking something.


> Hence, I'm working on http://diffudle.com/ which is a mix of Wheel of Fortune + Stable Diffusion + Wordle. I can't figure it out, but it feels to me like it's lacking something.

That's awesome, I love it!


Thanks :)


Very creative and a fun way to interact with SD. I would encourage you to explore this idea further, as interest in SD might grow and people will want to engage with the topic in an accessible way. I like the idea of hard-limiting play (1 quiz per day), but a small backlog of previous pictures could be nice to explore a little.


Thanks!! I'll add a "Past Games" section from which you can play more if you want.


Wow, that's really cool actually, have to bookmark it! One feature I would add would be going back through the previously shown images, that would make it easier to guess what they have in common. Also, larger images would look nicer, but I guess that would drive the costs up?


Thanks!! I'm debating whether to show a history of images, as it would reduce the difficulty by a lot. Larger images are a great suggestion; I'll add them ASAP.


It confused me that the letter boxes were divided 7+3; I thought it would be two words, while the correct answer was a single 10-letter word. Maybe try to avoid wrapping words.


Nice observation!! I'm thinking a start and end mark would improve the UX. I can't avoid wrapping, as the prompt might be very long.


If the word is always going to be a single word, I would clarify that with some examples in the tutorial


Put a hyphen at the end!


Bookmarked! I love it, would be good to play past games


Thanks :)


I really want to try this. Please add mobile iOS support!


I'm able to use this in my iPhone browser. Can you elaborate if you're facing any difficulties?


It worked eventually for me, but the scrolling seems stuck in the beginning - or maybe only certain areas are scrollable?


I don't see anything below the Hint button. And scrolling seems disabled.


Sorry I broke it between updates. Should be working now!!


It's working now. Didn't notice it was broken before!!


There just aren't a lot of market opportunities where being right 99% of the time is good enough. If you are operating at scale and 1/100 decisions are wrong, the outcome is poor and often highly off-putting to users.

It’s possible this time is different, but people at my company were entertained by DALLE for all of 5 minutes before no one ever mentioned it again. The value proposition is simply low.


Are you kidding? Many times corporate decisions are effectively made at random. Thinking that the average company operates with a .999 batting average is a total fantasy.


When our C-suite decides on an ad campaign and tells our artists to draw normal humans, those people have 3 legs or upside-down teeth exactly 0% of the time. Humans have many, many limitations, but with every model I've tested there's a set of errors that would virtually never be made by any human.


> with every model I’ve tested there’s a set of errors that would virtually never be made by any human

I guess you've never seen my drawings...


I think it's interesting that drawing too many fingers is a mistake kids make too, although with less photorealism otherwise. I guess there's a reason all those famous artists drew hundreds of hands as practice.


No one is suggesting anyone take the first image out of a model without any human-based filtering.


Exactly. I have a catchphrase for this "AI generate, human mediate".

This revolution is allowing us to conduct the orchestra instead of playing each instrument.


In fairness, ad campaigns have people who review the creative. Though I could see a band using an image like that, or something edgy.


As an outsider, this rings true to me. I still don’t see any reduction of hours involved in producing professional level works. Generating YouTube thumbnails, sure.


I agree. Cars break down and crash, they'll never replace horses.


I think this analogy doesn't hold water - horses aren't exactly a beacon of reliability (having owned one).

I've already seen tools that support workflows where you compose art by iteratively generating a piece of it, performing some correction, and repeating. So I think there's room in the art world for less-than-perfectly generated art. That said, let's not kid ourselves: the typical failure mode of ML today (99% correct enough, 1% disastrously incorrect) either makes it entirely useless in many applications or ends up wreaking havoc on end users in others.


It's only an analogy, but it serves to underscore the last point you make. Initial versions of the technology can make some genuine horrors but you're blinding yourself to progress if you can't see the potential in it.


The cars we're talking about here have a random number of wheels and sometimes morph into cosmic horrors mid-ride.


>Demoing even the v1 of stable diffusion to the non-technical general users blows them away completely.

What do the results have to do with "non-technical" people? I am blown away by the images I get out of Stable Diffusion every time I run it.


I'm not. I find them funny and amusing but that's pretty much it.


>Huge market opportunity for folks wanting to ride the wave here.

what precisely is the market here?


What are you building?


It started as an AI-powered MS paint for my son. But after demoing it to a few coworkers, it morphed into a bit more than that. Now it’s more of a storybook creator that young kids can use to generate their own stories.

Not looking to monetize at all. But inference is expensive, so I might add something to cover costs.

Some backstory:

When I was growing up in the early 90s, my dad took me into his office over the weekends when he was doing some overtime paperwork. I would be on his IBM Windows 3.1 workstation. He didn’t have any games on his work computer, so I would spend the entire day “playing” with MS Paint. I couldn’t read yet (3-4 years old), but I was able to figure it out.

We didn’t have a computer at home. But seeing how I was so good at it, my parents bought one. I eventually got into coding etc. All of this defined who I am today.

So I wanted to recreate some of this magic, for my own son. He’s 3 months old, so not quite the right age. But I have some free time on parental leave. So why not. Might be useful for parents with 3-5 year olds.


As a father of a 1.5 year old girl this sounds incredibly awesome and I'm hoping you will release it somehow, looking forward to your Show HN post!


I have some young kids, would love to try this out with them if you’re sharing.


Same here! (And I also remember playing with MS paint on my dad's work computer)


I'm getting Scribblenauts vibes in a good way.


[flagged]


This comment pretty clearly breaks the commenting guidelines https://news.ycombinator.com/newsguidelines.html

>Be kind. Don't be snarky. Have curious conversation; don't cross-examine. Please don't fulminate. Please don't sneer, including at the rest of the community. Edit out swipes.

Comments should get more thoughtful and substantive, not less, as a topic gets more divisive.


no


This is unnecessarily presumptive and targeted, offering no practical value


AI mobile keyboard suggestions aren't really so good that I'd be impressed by them, especially because they get things wrong so many times. Although, to be fair, I write in multiple languages, which I'm quite sure doesn't help the AI in the slightest.


Last time, we had people in the field of AI writing on HN expecting another AI winter. I think all the improvements we have now, and those we might get in the future (like a Stable Diffusion 3.0 and possibly many others), while not an L5 self-driving car or general AI, have yet to be fully used outside the tech community, or even within tech itself. The long tail of distribution and the positive feedback loop will sustain its development for another 10 years.


What kind of stuff does your app do that blows people away?


It wraps an ML model that blows people away in an opinionated UX.


That’s a little vague, so forgive me if I’m assuming too much.

You are making changes to a product's UX based on graphical inference?

I could see a decent business supporting the logic problems a UX designed from AI graphics would introduce ;)


And we are not doing a good job of educating people and preparing them for what's coming. People are so used to Big Tech making decisions for them.


I agree that this is a big wave, but I'm still struggling to find commercial (read: large organization) applications.


I guess you need to look at the current TAM for visual production in general. That's the baseline (so it includes visualization studios, agencies, game studios, etc.). Generally, this can potentially help in many labour-intensive parts of a creative visual process.

Whether that is a "large" organization market or not depends on your metric and on what the market positioning of the offering is. I would see applications in both specialist content-creation tools and in "stock photos and merch".

In terms of finding stock photos: if you add a better text API that is easier to control, this can probably compete with static stock photos, in the sense that people can tune their images as much as they like. For example, with their corporate merch (imagine producing a slide deck at Acme Co.: "Please give me an elephant and a walrus wearing Acme caps").

Ad agencies already love that they can train a model to quickly iterate product shot ideas extremely rapidly.

Then we have the usual effect automation has on market demand: automation increases the productivity of a task requiring labour, which reduces the cost of a unit of production, which generally increases demand. I.e., creative work will be cheaper to do. You won't replace artists, but suddenly the dude or dudette who spent hours just tweaking stuff has their own art studio at their fingertips to command. They can get so much more done, much faster.

The tech is not 100% bulletproof yet, but at this pace it will be good enough soon (and probably already is for several applications, if there were just UX sugaring targeting a specific domain workflow).


Build a plugin for PowerPoint and sell it corporate-wide.


Does using generated images in slides actually make anything in the corporate world better? When coworkers use stock photos, which were presumably made by humans operating actual cameras, I don't think it's clear that their presentation is actually more valuable as a result.


Why does it have to be large organizations? Why can't it be many small businesses?


I suspect those applications will come from specializing the model. For example, there are people building avatar generators or automated ad creatives. A cool application I've been toying with is generating icons.


Train the model with https://lucide.dev/ and ask it to generate a few more?


While I agree it is exciting, the media industry will remain the same size. Does this have applications outside media/entertainment?


> The general public still is blown away by autosuggest in mobile OS keyboards.

And it's still not available in my language on iOS... :( (Norwegian)


The dangerous thing is that people also don’t understand the limitations of that technology.


GitHub Repo: https://github.com/Stability-AI/stablediffusion

HuggingFace Space (currently overloaded unsurprisingly): https://huggingface.co/spaces/stabilityai/stable-diffusion

Doing a 2.0 release on a (US) 2-day holiday weekend is an interesting move.

It seems a tad more difficult to set up the model than the previous version.


Seems like a potentially good time to launch it, lots of young people with free time.


Less likely journalist whingers will pick it up too.


Definitely something to talk about or share with the fam.


I think it's looking fairly similar; the first one was a bit tricky too. Later improvements by the community made it clearer.

The docs aren't good, though: they tell you to download two things when I think you actually only need one. And if you do need two, they don't tell you where to put the second.

You really need xformers if you're doing it at home; I've got a 3090 and it blew through the VRAM without it. However, the compile instructions didn't work for me, and there's an incompatibility if you try to install from conda. You can make it work, but you need to upgrade Python from 3.8.5 to 3.9 in the yaml file first, then you can install it (xformers needs 3.9+, and something else in SD breaks on 3.10+, so 3.9 works).

This needs the classic "sit next to a new person installing it by following the docs and see what problems they hit, fix the docs and start from scratch again" process.

Looks good, though so far the images I've made don't look as nice as with 1.4, but I guess that's largely down to finding the right tweaks for the model and right magic wording for the prompts.


> Doing a 2.0 release on a (US) 2-day holiday weekend is an interesting move.

Their HQ seems to be located in London.


True, but it's going to dampen the launch a bit.


Is it? For me, I'm in tech but nowhere near anything for which playing with a new SD release would be relevant to my day job. Having a couple of extra days off to play with a new tech toy probably means I'll use it more.


I don't see why? The world is a whole lot bigger than the USA!


I wonder why these AI repos' documentation is so bad compared to what we're used to in general. Where is the intro / getting started / example commands / config docs, etc.?


I believe it's a combination of the fact that most of these models are basically 'research dumps' primarily targeting other researchers, and that, given this, they assume a level of familiarity with related tools/libraries. So it's up to interested people in the community to take it the last mile/block/whatever to make it easy to use, address specific use cases, etc., for a less academic/technical audience.


Yeah, generally speaking, it's not optimized for open collaboration (not even close .. pun intended).


In addition to removing NSFW images from the training set, this 2.0 release apparently also removed commercial artist styles and celebrities [1]. While it should be possible to fine tune this model to create them anyway using DreamBooth or a similar approach, they clearly went for the safe route after taking some heat.

1. https://twitter.com/emostaque/status/1595731407095140352?s=4...


I predicted back when they started backpedaling that there's a chance SD 1.4 or 1.5 will be the best model available to the general public for a very long time, because the backlash will force them to castrate their own models.

You can see that nobody likes this new model in any of the Stable Diffusion communities. It's a big flop, and for good reason. The reason it was so successful in the first place was that you could combine artist names to steer the model to the outcome you wanted.

I'll again remind anyone who thinks they might want to use this to download a working version of SD now. They might break their own libraries in the future, and getting SD 1.4 could be a real hassle in a year or so. Getting the right .ckpt file, which can contain pickled Python malware, is not trivial, and this will get worse over time.

It's going to diverge into a castrated official model that intentionally breaks the older models, and older models from unofficial, shady sources that might contain malware.
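
For anyone hoarding checkpoints now, you can at least inspect a .ckpt for suspicious pickle imports before loading it. A minimal sketch (the file name is hypothetical, and this only surfaces odd imports; it's no guarantee of safety):

    # A .ckpt from torch.save() is a zip containing a pickle of the weights.
    # List GLOBAL imports and flag anything outside the expected modules.
    import pickletools, zipfile

    ALLOWED = ("torch", "collections", "numpy", "_codecs")

    with zipfile.ZipFile("sd-v1-4.ckpt") as zf:  # hypothetical local path
        pkl = next(n for n in zf.namelist() if n.endswith(".pkl"))
        for opcode, arg, _ in pickletools.genops(zf.read(pkl)):
            if opcode.name in ("GLOBAL", "STACK_GLOBAL") and arg:
                if not str(arg).startswith(ALLOWED):
                    print("suspicious import:", arg)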


These models will always work best with open datasets and open platforms for this reason.

Social media / "AI ethics" pressure will eventually come for these organizations (see Meta's recent debacle with Galactica). Being an unknown org without these pressures was a big reason Stable Diffusion got so popular in the first place.


That's like saying that obtaining On the Origin of Species or the Linux kernel will be harder in the future. If anything, the SD weights will be increasingly ubiquitous as they start embedding them into consumer electronics.


>If anything, the SD weights will be increasingly ubiquitous as they start embedding them into consumer electronics.

I suspect that, for liability issues similar to SD 2.0's, they will not start embedding sub-2.0 weights into consumer electronics.


As someone completely unfamiliar with SD but interested in playing around with it in the future, what exactly should I download, to have a fully local instance of 1.4 or 1.5?



I'm not comparing with the others because I don't have experience with them, but https://invoke-ai.github.io/InvokeAI/ is great, with an easy install and active development.


For Macs there's Diffusionbee with a no-brainer setup.


Mixing artist names was by far the most effective way to create aesthetically pleasing images, this is a huge change. DreamBooth can only fine-tune on a couple dozen images, and you can't train multiple new concepts in one model, but maybe someone will do a regular fine-tune or train a new model.


That really depends on whether you equate 'like artist X' with 'aesthetically pleasing'. I was fooling around with furry diffusion and got to try a few different models: Yiffy understood artist names, and Furry did not; it had further training, but stripped of artist tags.

All these models are pretty good, as that community is strong on art, styles, art skill, and tagging, making the models a serious test case for what's possible. The model with artist names was indeed capable of invoking their styles (for instance, one artist's exceptional anatomy rendering translated into the AI version). The more-trained model without the artist names was much more intelligent: it was simply more capable of quality output, so long as your intention wasn't 'remind me of this artist'.

I think that's likely to be true in the general case, too. This tech is destined for artist/writer/creator enhancement, so it needs to get smarter at divining INTENT, not just blindly generating 'knock-offs' with little guidance.

What you want is better tagging in the dataset, and more personalized. If I have a particular notion of an 'angry sky', this tech should be able to deliver that unfailingly, in any context I like. Greg Rutkowski not required or invoked :)


I'd be curious how well the model still performs given such prompts. Disparate concepts, interpolation, n' all that. Surely it performs worse - but I bet it gets closer than you might think.


Here’s a comparison study on Reddit: https://www.reddit.com/r/StableDiffusion/comments/z3ferx/xy_...

It does look like artist names have a significantly reduced effect.


Oh very cool! Indeed filtering the data results in outputs closer to DALLE-2.


This is extremely misleading and you seem to have confused all the other replies.

Stable Diffusion 1.0 used the CLIP released by OpenAI; 2.0 uses a CLIP retrained from scratch by Stability.

We don't know OpenAI's dataset, so we don't know what was in it or how to recreate it. Nothing was "removed".


Removing NSFW content is fine, people who care about that can work around it easily. Removing celebrities and commercial artists was a mistake though and I expect this will need to be really impressive in other ways or people aren't going to bother using it.


It's remarkable, this sense of entitlement people have. You literally have a computer program here that can make photorealistic imagery of almost ANYTHING you ask it to, which was impossible even half a year ago, and here you are complaining that people won't use it unless it incorporates all of the protected imagery of famous artists and celebrities. Amazing.


If version 2 is worse than version 1, then obviously a lot of people will use version 1. That doesn't make them entitled.


I don't think you're applying the principle of charity here ("make the best interpretation of a post"). The person isn't complaining; they're just saying that people will probably return to v1 unless v2 has something impressive to compensate.


Is it entitled to think that a 2.0 will not have regressions on useful functionality?


yes


Loss aversion is strong in humans.


Does this mean that stuff like artstation and deviantart don't work anymore as prompts? That would be a huge change.


Ran the model locally. Neither "trending on artstation" nor "Greg Rutkowski" makes any difference to the image anymore.[0]

I suspect that people will find keywords that would improve the aesthetics further again, or that fine-tuning will also take place.

[0] https://imgsli.com/MTM1ODQ5


“Prompting” and “keywords” are not an essential part of this technology. If you like tokens, make your own tokens with textual-inversion or image inputs.


Based on sample images I’ve seen, “Greg Rutkowski” doesn’t work anymore for example.


this is really disappointing


It seems the structure of the UNet hasn't changed other than the text-encoder input (768 to 1024). The biggest change is the text encoder, switched from ViT-L/14 to ViT-H/14 and fine-tuned based on https://arxiv.org/pdf/2109.01903.pdf.

It seems the 768-v model, if used properly, can substantially speed up generation, but I'm not exactly sure yet. It seems straightforward to switch to the 512-base model for my app next week.
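
If anyone wants to sanity-check that text-encoder change, here's a hedged sketch with the open_clip library (the pretrained tag is my assumption, not from the release notes):

    # OpenCLIP's ViT-H/14 text tower is 1024-wide, vs. 768 for the
    # OpenAI ViT-L/14 used by v1.x - hence the UNet input change.
    import open_clip

    model, _, _ = open_clip.create_model_and_transforms(
        "ViT-H-14", pretrained="laion2b_s32b_b79k")
    tokens = open_clip.tokenize(["an astronaut riding a horse"])
    print(model.encode_text(tokens).shape)  # expect torch.Size([1, 1024])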


I'm disappointed they didn't push the parameter count higher, but I suppose they want to maintain the ability to run on older/lower-end consumer GPUs. Unfortunately, that severely limits how high-quality the output can be.


They're motivating that choice with this paper: https://arxiv.org/pdf/2203.15556.pdf. The paper shows that you can get better performance than GPT-3 with a much smaller model if you scale up the training time and training data by something like 4x.
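
As a rough worked example of that paper's rule of thumb (the constant is the paper's approximation, so treat it as ballpark):

    # Chinchilla-style scaling: compute-optimal training wants roughly
    # 20 tokens per parameter, so smaller-but-longer-trained can win.
    def optimal_tokens(params: float) -> float:
        return 20 * params

    print(optimal_tokens(70e9) / 1e12)   # ~1.4T tokens for a 70B model
    print(optimal_tokens(175e9) / 1e12)  # ~3.5T for a GPT-3-sized model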


Larger models are still much better. Google's Parti model can render text perfectly and follows prompts far more accurately than Stable Diffusion. It's 20B parameters, and with the latest int8 optimizations it should in theory be possible to get it running on a consumer 24GB card.

I think they're looking into larger models later though
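
Back-of-the-envelope for that 24GB claim above (my own arithmetic, not from any announcement):

    # int8 means one byte per weight, so the weights of a 20B model:
    params = 20e9
    print(params / 2**30)  # ~18.6 GiB - fits a 24 GiB card, with headroom
    # (activations and attention buffers eat into the rest)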


Can’t forget the time it takes to run inference, even on the latest A100/H100. Generating in under, e.g., ten seconds enables more use cases (and so on, until high-fps video is possible).


Oh shit, I think that means it's CPU only for me now.


Highlights:

768x768 native models (v1.x maxed out at 512x512)

a built-in 4x upscaler: "Combined with our text-to-image models, Stable Diffusion 2.0 can now generate images with resolutions of 2048x2048–or even higher."

Depth-to-Image Diffusion Model: "infers the depth of an input image, and then generates new images using both the text and depth information." Depth-to-Image can offer all sorts of new creative applications, delivering transformations that look radically different from the original but which still preserve the coherence and depth of that image (see the demo gif if you haven't looked)

Better inpainting model

Trained with a stronger NSFW filter on training data.

For me, the depth-to-image model is a huge highlight and something I wasn't expecting. The NSFW filter is a non-issue (it's trivially easy to fine-tune the model on porn if you want, and porn collections are surprisingly easy to come by...).

The higher resolution features are interesting. HuggingFace has got the 1.x models working for inference in under 1G of VRAM, and if those optimizations can be preserved it opens up a bunch of interesting possibilities.
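
For the upscaler, something along these lines should work once diffusers support lands; a hedged sketch, since the exact pipeline and model id may differ:

    # 4x upscaling via Hugging Face diffusers (names are my best guesses).
    import torch
    from diffusers import StableDiffusionUpscalePipeline
    from PIL import Image

    pipe = StableDiffusionUpscalePipeline.from_pretrained(
        "stabilityai/stable-diffusion-x4-upscaler", torch_dtype=torch.float16
    ).to("cuda")

    low_res = Image.open("gen_512.png").convert("RGB")  # hypothetical input
    out = pipe(prompt="a white cat", image=low_res).images[0]
    out.save("gen_2048.png")  # 512 x 4 = 2048 in each dimension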


> it's trivially easy to fine-tune the model on porn if you want, and porn collections are surprisingly easy to come by

Not really surprised they did this, but you can be sure some communities will have it fine-tuned on porn soon. So they probably did it for legal reasons, in case illegal material is generated, since they are a real company with real people's names on the release?


I looked into it (though didn't download the models - too dodgy). One of the NSFW models that's gained traction, and gained attention because it seems to be better at generating even non-porn faces and bodies, is called "Hassan's blend". Hassan mentioned that he'd taken down an earlier checkpoint because it generated undesirable images.

Reading between the lines, it likely generated CSAM-like images even without explicit prompting for it.


To put things in perspective, the dataset it's trained on is ~240TB, and Stability has over 4,000 Nvidia A100s (each of which is much faster than a 1080 Ti). Without those ingredients, you're highly unlikely to get a model that's worth using (it'll produce mostly useless output).

That argument also makes little sense when you consider that the model itself is a couple of gigabytes: it can't memorize 240TB of data, so it "learned".

But if you want to create custom versions of SD, you can always try out dreambooth: https://github.com/XavierXiao/Dreambooth-Stable-Diffusion, that one is actually feasible without spending millions of dollars on GPUs.


> it can't memorize 240TB of data, so it "learned"

Learning is a form of memorization, but yeah.


It compresses a whole image down to 1 byte, a 60,000:1 ratio. That's how much it's allowed to "memorise" from each input on average: less than a pixel from a whole image.
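
The arithmetic behind that, roughly (sizes are loose assumptions from this thread, not official numbers):

    # ~240TB of images, assume ~4B of them, against ~4GB of weights:
    dataset_bytes, n_images, weight_bytes = 240e12, 4e9, 4e9
    per_image = weight_bytes / n_images          # ~1 byte per image
    print(per_image, (dataset_bytes / n_images) / per_image)  # 1.0, 60000.0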


hey..... first time i want to dip my toes into this, what graphics card do you suggest?


Depends on your wallet, but the RTX 3080 and RTX 3060 are good graphics cards for creating these images. If you just want to dip your toes in without spending much, you can use Google Colab and rent Google's graphics cards either for free or for $10. Here's a link to a Colab notebook that you can run for free and that is used a lot: https://colab.research.google.com/github/TheLastBen/fast-sta...

P.S. If you want to buy a graphics card, make sure it has at least 12GB of VRAM.


Hopefully related: If I'm a photographer wanting to improve resolution of my content for printing, what's my current best bet for upscaling?

Is it realistic to make use of this on the command line, feeding it my own images? Or has someone wrapped it in an app or online service?


You're probably better off using Real-ESRGAN: https://github.com/xinntao/Real-ESRGAN. It's pretty solid and fast, and even has portable executables you can use as-is. The upscaler that comes with Stable Diffusion might work for you, but I suspect it'll do a better job upscaling Stable Diffusion output than natural images (might be wrong though).


I tried this on one of my own photos. I have a nice Canon Pro 100 printer that can print 13”x19” pictures, and my camera is a 20-megapixel Panasonic GH5. The printer can print at a much higher resolution than my camera shoots. So I took one of my photos and processed it with Real-ESRGAN to double the resolution (in each direction, so 4x the pixels). The photo is of a red barn with redwood trees behind it. It did well increasing the resolution of the barn, making it look more crisp and bright. But there is an area with some trees in shadow behind the barn, and it lost detail there.

Anyway, I think it would be fun to play with; it just depends on the content of the image and the artist's preferences. I still haven't printed a full page of the upscaled photo, but I do want to try that and see how it looks in comparison!


Not sure about your last claim. The example given in the blog post looks very very close to a natural image.


As a counter-recommendation, Topaz’s much-advertised Gigapixel AI is rarely useful. Their Denoise and Sharpen apps are good though.


I dunno - I've found it useful on a bunch of images[1] but I tend to try Pixelmator Pro first because that's a simple key combination to enlarge an image and 90% of the time it's Good Enough for my purposes.

[1] The new Photo AI, on the other hand, is slow, clunky, and not infrequently glitches out wildly. But on the plus side it does combine sharpening and denoising into one workflow.


I was super unimpressed with the 1.0 release of Photo AI; in particular, the sharpening was a LOT slower than standalone. But that's fixed now, and unless Topaz starts backporting the improved models to the standalone tools -- so far, they have not -- Photo AI will get you better results.


I spent a lot of time last month using Gigapixel (actually the improved version in their new Photo AI product) on dozens of images for my dad's memoir. There were a couple of failures where the input image was just so blurry or low-res that it couldn't be saved, but Topaz significantly improved image quality while upscaling in 90+% of cases.


If you have a Nvidia GPU, I've been using Upscayl locally and for free with decent results: https://github.com/upscayl/upscayl

Note that on some image types it tends to make things look digitally painted rather than detailed. I recommend you try a few different tools and see what works best for the type of photography you do.


This thread has a useful app. https://news.ycombinator.com/item?id=32628761


Photoshop and Lightroom have had AI upscaling for a while.


Ah - had forgotten. I'll try them first. Thanks.


So does Pixelmator. You can try the free trial which comes with this feature.


Pixelmator's Super ML Resolution does a great job with upscaling images, can highly recommend it.


Yeah, I use it quite a lot on stuff generated by Midjourney and the results are always great.


They apparently tried to combat NSFW generation by filtering the training dataset not to include any.


They know they are going to be the next target in the war on general purpose computing. They're trying to stave it off for as long as possible by signalling to the authorities that they are the good guys.

A confrontation is inevitable, though. Right now it costs moderate sums of money to do this level of training. Not always will this be so. If I were an AI-centric organization, I would be racing to position myself as a trustworthy actor in my particular corner of the AI space so that when legislators start asking questions about the explosion of bad actors, I can engage in a little bit of regulatory capture, and have the legislators legislate whatever regulations I've already implemented, to the disadvantage of my competitors.

For people who say "people can make whatever images they like in photoshop," I will remind you of this: https://i.imgur.com/5DJrd.jpg


Appeasement never works. Those who wage that war should be directly confronted.

And they will lose it, just like they've lost the war on encryption.


This is business, not ethics, though. They just don't want the negative attention; that's it. And because this is a business, time matters, same as with DRM. Almost all DRM gets defeated, yet it works, because it hinders the crackers, even if only for some time. Same here. While Stability is not under attack yet, they can establish themselves as the household name for AI in a safer context.


I doubt they care if people make porn with diffusion models. They just don’t want to be the ones providing the model to do it.


Banknote printing is primarily protected against at the hardware level of printers, no? With the nigh-invisible unique watermark left by every printer, there's virtually no way you'd get away with it. My guess is that the Photoshop filter exists mostly as a barrier against crimes of convenience.


My point is that there is precedent for governments requiring companies to implement restrictions on what images can be handled by their software.

As I explained: This kind of mandated restriction is looming over AI. Companies are trying to get out in front of these restrictions so they can implement them on their own terms.


>My point is that there is precedent for governments requiring companies to implement restrictions on what images can be handled by their software

But images of boobs are still legal, so this NSFW filter goes well beyond what the law asks. Is the issue that even if you don't train on CP, the model might output something that some random person gets offended by and labels as CP? I assume other companies can focus on NSFW and have their lawyers figure this out. IMO it would be cool if someone sued governments and made them reveal the facts behind their concern that CP of fake or cartoon people is dangerous; I think they could focus on saving real children rather than cartoon ones.


It’s possible that the end game is hardware in GPUs, to detect whatever they want to prevent, before it’s displayed.


You kill way too many birds with such a stone. Of course you could never do any kind of photorealistic game in real time if you had to pre-screen everything with an actually effective censor.

Indeed, what they're doing is already hobbling the models.

Emad is right that we learn new things from the creativity unleashed by accessible models that can be run (and even fine-tuned) on consumer hardware.

But judging from what people post, one thing we learn is that it seems models fine tuned on porn (such as the notorious f222 and its derivative Hassan's blend) can be quite a bit better at non-porn generation of diverse, photorealistic faces and hands too.


> Of course you could never do any kind of photorealistic game in real time if you had to pre-screen everything with an actually effective censor.

I'm not sure I understand this. A possible implementation could be a neural net that blanked the screen with a frown face any time it detected something it thinks was "bad". What purpose/need would pre-screening serve?


What you describe IS pre-screening. And it's not workable, because it would take a ton of dedicated resources to make it work in real time, and even then it would be disastrous for latency, unworkable for most games and even desktop applications.


> What you describe IS pre-screening.

I think this is making the assumption that all frames are blocked.

> then it would be disastrous for latency

We're talking about the future here. I'm not sure it makes sense to use current tech to say it's not going to happen, or come up with latency numbers. But, "real time" inference is definitely a possibility, and is in active use for video moderation (Youtube, etc) and object detection (Tesla, etc). Nobody will notice a system running at 2000fps.


You can typically work around that by modding the printer firmware, if needed. It's not baked into the hardware.


Some AI startups backed by the biggest players are already hitting legal/regulatory issues.


Very niche example here, that can be easily circumvented

Also seems problematic to approach this from a purely capitalistic and consumerist angle. There is a lot of opportunity here besides just launching the next AI unicorn.


I am not clicking that link because no one should take the risk of you proving your point of what horrors could pop out of one of these models.

I will say that while the government backlash is inevitable just like it was with encryption, these image generation models are so easy to train on consumer hardware that the cat is hopelessly out of the bag. It might as well be thoughtcrime.


The link doesn't show any model output; it's a screenshot of Photoshop refusing to edit a banknote.


Or it's an output of "blank adobe photoshop with dialog refusing to edit bank note, full screen, windows vista, 4k, artstation, greg rutkowski, dramatic lighting".


I agree, that was the riskiest click I've made in a while.


In practice, it's unclear how well avoiding training on NSFW images will work: the original LAION-400M dataset used for both SD versions did filter out some of the NSFW stuff, and it appears SD 2.0 filters out a bit more. The use of OpenCLIP in SD 2.0 may also prevent some leakage of NSFW textual concepts compared to OpenAI's CLIP.

It will, however, definitely not affect the more common use case of anime women with very large breasts. And people will be able to fine-tune SD 2.0 on NSFW images anyway.
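
For what it's worth, the dataset-level filtering is reportedly score-based rather than binary. A hedged sketch of what that looks like against LAION-style metadata (the column name and threshold are assumptions from public discussion, not confirmed):

    # Keep only rows whose predicted "unsafe" probability is low.
    import pandas as pd

    meta = pd.read_parquet("laion_shard.parquet")  # hypothetical shard
    sfw = meta[meta["punsafe"] < 0.1]              # assumed threshold
    sfw.to_parquet("laion_shard_sfw.parquet")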


The main reason why Stable Diffusion is worried about NSFW is that people will use it to generate disgusting amounts of CSAM. If LAION-5B or OpenAI's CLIP have ever seen CSAM - and given how these datasets are literally just scraped off the Internet, they have - then they're technically distributing it. Imagine the "AI is just copying bits of other people's art" argument, except instead of statutory damages of up to $150,000 per infringement, we're talking about time in pound-me-in-the-ass prison.

At least if people have to finetune the model on that shit, then you can argue that it's not your fault because someone had to do extra steps to put stuff in there.


> If LAION-5B or OpenAI's CLIP have ever seen CSAM

Diffusion models don't need any CSAM in the training dataset to generate CSAM. All they need is random NSFW content alongside safe content that includes children.


So I definitely see an issue with Stable Diffusion synthesizing CP in response to innocuous queries (in terms of optics; the actual harm this would cause is unclear).

That said, part of the problem with the general ignorance about machine learning and how it works is that there will be totally unreasonable demands for technical solutions to social problems. “Just make it impossible to generate CP” I’m sure will succeed just as effectively as “just make it impossible to Google for CP.”


It sometimes generates such content accidentally, yes. Seems to happen more often whenever beaches are involved in the prompt. I just delete them along with thousands of other images that aren't what I wanted. Does that cause anyone harm? I don't think so...


> I’m sure will succeed just as effectively as “just make it impossible to Google for CP.”

So... very, very well? I obviously don't have numbers, but I imagine CSAM would be a lot more popular if Google did nothing to try to hide it in search results.


Is artificially generated CSAM that doesn't actually involve children in its production not an improvement over the status quo?


I remember Louis CK made a joke about this, regarding pedophiles (who are also rapists): what are we doing to prevent this? Is anyone making very realistic sex dolls that look like children? "Ew, no, that's creepy." Well, I guess you would rather they fuck your children instead. It's one of those issues where you have to be careful not to get too close, because you get accused by proximity: if you suggest something like what I said above, people might think you're a pedophile. So, in that way, nobody wants to do anything about it.


No, it's not.

The underlying idea you have is that the artificial CSAM is a viable substitute good - i.e. that pedophiles will use that instead of actually offending and hurting children. This isn't borne out by the scientific evidence; instead of dissuading pedophiles from offending it just trains them to offend more.

This is opposite of what we thought we learned from the debate about violent video games, where we said stuff like "video games don't turn people violent because people can tell fiction from reality". This was the wrong lesson. People confuse the two all the time; it's actually a huge problem in criminal justice. CSI taught juries to expect infallible forensic sci-fi tech, Perry Mason taught juries to expect dramatic confessions, etc. In fact, they literally call it the Perry Mason effect.

The reason why video games don't turn people violent is because video game violence maps poorly onto the real thing. When I break someone's spine in Mortal Kombat, I input a button combination and get a dramatic, slow-motion X-ray view of every god damned bone in my opponent's back breaking. When I shoot someone in Call of Duty, I pull my controller's trigger and get a satisfyingly bassy gun sound and a well-choreographed death animation out of my opponent. In real life, you can't do any of that by just pressing a few buttons, and violence isn't nearly that sexy.

You know what is that sexy in real life? Sex. Specifically, the whole point of porn is to, well, simulate sex. You absolutely do feel the same feelings consuming porn as you do actually engaging in sex. This is why therapists who work with actual pedophiles tell them to avoid fantasizing about offending, rather than to find CSAM as a substitute.


>The reason why video games don't turn people violent is because video game violence maps poorly onto the real thing

I don't believe this is the reason. By practicing martial arts, which maps well to real-life violence, I do not see an increase in violent behaviour. Similarly, playing FPS games in VR, which maps much more closely than flat-screen games, does not make me want to go shoot people in real life. I don't think people playing paintball or airsoft will turn violent from partaking in those activities. The majority of people are just normal people, not bad people who would ever shoot someone or rape someone.

>You know what is that sexy in real life? Sex.

Why is any porn legal, then? If porn turned everyone into sexual abusers, I would believe your argument, but that just isn't true. And if it were true that a small percentage of people who see porn turn into sexual abusers, I don't think that would make banning porn altogether worth it. I feel there should be a better way that doesn't restrict people's freedom of speech.


> You absolutely do feel the same feelings consuming porn as you do actually engaging in sex

I can't believe someone says this. It's so not true in my experience. These feelings have a lot in common, but they are definitely not the same.


"Artificially-generated CSAM" is a misnomer, since it involves no actual sexual abuse. It's "simulated child pornography", a category that would include for example paintings.


Very much this. If someone goes out and trains a model on actual photographs of abuse, then holy shit, call in the cops.

If someone is generating sketchy cartoons from a training set of sketchy cartoons... well, gross, but there's no victims there.


Not exactly, since the abuse needed to actually happen for the derivative images to be possible to generate.


Is Stable Diffusion only able to generate images of things that have actually happened?


Hmm, that’s a good point. It seems to be able to “transfer knowledge” for lack of a better term, so maybe it wouldn’t need to be in the dataset at all…


I have no answer to this but I have seen people mention that artificial CSAM is illegal in the USA, so the question of whether it is better or not is somewhat overshadowed by the very large market where it is illegal.


Reminds me of flooding a market with fake rhino horn. Idk whether it worked though.


I believe the status quo is non-realistic drawings (think Lisa Simpson) can be illegal.

I don't think the fact that it's artificially generated has any bearing for some important purposes.


lol there's a piping hot take


>then they're technically distributing it.

The model does not contain the images themselves though. I think it would not be classified as that.


They reportedly did so to stop people from generating CSAM [0].

[0] https://old.reddit.com/r/StableDiffusion/comments/y9ga5s/sta...


They’ve ensured the only way to create CSAM is through old-fashioned child exploitation, meanwhile all perfectly humane art and photography is at risk of AI replacement.

This is a huge missed opportunity to actually help society.


I don't think, and Stability's CEO also doesn't seem to think, that society would receive it as a benefit. Therefore, it's undesirable, right now.


CSAM is a canary for general AI safety. If we can’t prevent SD from creating CP, will we be able to stop robots from killing people?


LMFAO

What do you propose? The FBI releases a CSAM data set for devs to use for “training”?

Would you be the one to create the model? Would you run a business that sells synthetic CSAM?


Stable diffusion is able to draw images of bears wearing spacesuits and penguins playing golf. I don't think it actually needs that kind of input to generate it. It's clearly able to generalize outside of the training set. So... Seems it should be possible to generate that kind of data without people being harmed.

That being said, this is a question for sociologists/psychologists IMO. Would giving people with these kinds of tendencies that kind of material make them more or less likely to cause harm? Is there a way to answer that question without harming anybody?

In the mean time, stay away from 4chan.


Without the changes they made to Stable Diffusion, it was already able to generate CP. That's why they restricted it from doing so. It did not have child pornography in the training set, but it did have plenty of normal adult nudity, adult pornography, and plenty of fully clothed children, and was able to extrapolate.

Anyway, one obvious application: FBI could run a darknet honeypot site selling AI-generated child porn. Eliminate the actual problem without endangering children.


> FBI could run a darknet honeypot site selling AI-generated child porn. Eliminate the actual problem without endangering children.

It's very unlikely AI-generated child porn would even be illegal. Drawn or photoshopped images aren't, so I don't think AI-generated ones would be.


This isn't the case in law in many countries. Whether an image is illegal or not does not solely depend on the means of production; if the images are realistic, then they are often illegal.

https://en.m.wikipedia.org/wiki/Legal_status_of_fictional_po...

Don't forget that pornographic images and videos featuring children may be used for grooming purposes, socializing children into the idea of sexual abuse. There's a legitimate social purpose in limiting their production.


Well, Microsoft and others have this model for recognizing CSAM, trained on those CSAM images.


Apple and Meta have as well.

Apparently, Facebook has a huge problem with distribution through Messenger.


Once I read an article about a guy who got arrested because he’d put child porn on his Dropbox. I had assumed he’d been caught by some more sophisticated means and that was just the public story. I’m amazed that anyone would be stupid enough to distribute CSAM through an account linked to their own name.


I imagine the problem with messenger is teenagers sexting each other.


You will find very few teenagers on Messenger; most use Snapchat instead.


Yes to the first and no to the second seem the obvious answers here.


[flagged]


So your hypothesis is that if the FBI gives the database to a company it will inevitably leak to the pedophile underworld?

I can't judge how likely that is.

I guess I also don't care much, as I only really care about stopping production using real children; simulated CSAM gets a shrug, and even use of old CSAM only gets a frown.


What company? How is it that people are advocating for the release of this database yet nobody says to whom?

My (lol now flagged) opinion is that it’s kind of weird to advocate for the CSAM archive to move into [literally any private company?] to turn it into some sort of public good based on… frowns?


I regularly skimmed 4Chan’s /b/ to get a frame of reference for fringe internet culture. But I’ve had to stop because the CSAM they generate by the hundreds per hour is just freakishly and horrifyingly high fidelity.

There are a lot of important social questions to ask about the future of pornography, but I'm sure not going to be the one to touch that with a thousand-foot pole.


I've spent too many hours there myself, but I haven't seen any AI CSAM, and it's been many years since I witnessed trolls posting the real thing. Moderation (or maybe automated systems) got a lot better at catching that.

Now, if you meant gross cartoons, yes, those get posted daily. But there are no children being abused by the creation or sharing of those images, and conflating the two types of image is dishonest.


This comment is so far off it might as well be an outright lie. There hasn't been CSAM on /b/ for years. The 4chan you speak of hasn't existed in a decade.


There's more to 4chan than /b/. /diy/, /o/, /k/, etc.


What is the point of making it "as hard as possible" for people?

This is not a game release. It doesn't matter if it's cracked tomorrow or in a year, and it's open source, no less; it's going to happen sooner rather than later.

As disgusting as it is, somebody is going to feed CP to an AI model, and that's just the reality of it. It's going to happen one way or another, and it's not any of these AI companies' fault.


Plausible deniability for governments. It's like DRM for Netflix-like streaming platforms: if they didn't add DRM and their content owners' content got pirated, it could be argued in court that Netflix didn't do everything in its power to stop such piracy. So too here for Stability AI; they've said this is their reasoning before.


Do pixels have human rights now?


They don't. The training dataset, though, may have been obtained through human rights violations. The problem is when the novelty starts to wear out. Then they will start to look for fresh training data, which may again incur more human rights violations. If you can ensure that no new training data is obtained that way, then I guess it's okay? (Personally, I don't condone it.)


> The problem is when the novelty starts to wear out.

Isn't the main feature of Stable Diffusion that it doesn't?


Once again, this does pose an interesting problem, though. The AI people claim there are no copyright issues with the generated images because AI is different and the training data is not simply recreated. This would also imply that a model released by a paedophile, generated from illegal material, would itself not be illegal, as the illegal data is not represented within the model.

I very much doubt the police will look at AI this way when such models do eventually hit the web (assuming they haven't already), but at some point someone will get caught through this stuff, and the arrest itself may have damning consequences throughout the AI space.


No, but people and enterprises have reputation.


Now that's a can of worms I don't think anyone wants to open.


Some do, that's the problem.


Artist have been drawing people of all ages having sex for literally thousands of years. Why should I care about that?


That's the excuse they all use.


Nixon: (muttering) Jesus Christ

I swear every time I find myself thinking “Hey, stop being so cynical and jaded all the time”, I stumble across something like this.


Bummer. AI porn is fun.


The future is probably models trained almost exclusively on porn.


They’re already out there, although they’re hard to find via Google - people are doing wild things like “merging” hentai models with models trained on real life porn to get realistic poses and lighting with impossible anatomy.

The scary thing is that you can then train it further with things like DreamBooth to start producing porn of celebrities… or, even more worrying, people you know.

Seriously folks, we are within a year or less of this being trivial. It’s already possible with a lot of work today.


I have no idea how it works, but I have seen people talking about models trained to draw furry art. And I assume no one spent millions on AWS to train a full model from scratch.


I believe what they do is take the released version of Stable Diffusion and then continue training from there with their own image sets. I came across their attempts when looking into how to train the model on some images of my own; their dataset so far is somewhere between tens of thousands and hundreds of thousands of images.

All the difficult parts (poses, backgrounds, art styles) have already been done by the SD researchers; the porn network only needs reference material for the NSFW descriptions/tags/details. This is significantly cheaper.

A similar project, training SD to output images in the style of Arcane, is incredibly successful in replicating the animation style with what seems to be very little actual training data.

I don't think you need to start from scratch at all if you use the SD model as a base; all you need to do is train it on specific concepts, styles, and keywords that the original doesn't have.
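
Mechanically, the "continue training from the released weights" approach looks roughly like this with the diffusers library; a heavily simplified sketch, with the data loading and noise scheduling elided:

    # Resume training SD's UNet on a new image set (sketch only).
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
    unet = pipe.unet.train()                    # only the UNet gets tuned here
    opt = torch.optim.AdamW(unet.parameters(), lr=1e-5)

    # for latents, text_emb, noise, t in new_data:  # hypothetical loader
    #     pred = unet(latents, t, encoder_hidden_states=text_emb).sample
    #     loss = torch.nn.functional.mse_loss(pred, noise)
    #     loss.backward(); opt.step(); opt.zero_grad()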


Porn has driven many tech advances. I predict that models trained on specific porn genres will appear as soon as training a good model is doable for under $5000. They’ll get here much quicker if we get video to that mark first.


You could probably already get people to pay for a subscription to generate images. Wouldn't be surprised if someone is already working on it.


The entire NovelAI drama had already demonstrated this.


What tech advances would those be?


print magazine, cinema, VHS, Internet


How much did porn actually contribute to innovation in any of those though?


Let me ask you this in reply:

Have you ever seen a non-porn DVD that had multiple camera angles (a feature defined in the standard DVD spec)?


And all it took was the ritual degradation and abuse of women.


> Porn has driven many tech advances.

This is an urban myth.


Yes and no. Yeah, the VHS vs. Beta situation was exaggerated, but you'd be surprised how many of the UI tricks on Netflix and YouTube were stolen from innovations made by adult sites.

I'd even say that the push for high bandwidth among the public was highly related to that. Even with HTML5 video players, adult websites were faster to implement them than the big streaming websites, which were still using Flash or similar tech.


Which happens to be true.


Even if porn is what you want, it's not clear that a porn-only model is what you want. It can probably generate better porn if it has a little context about what, say, a bedroom is.

What's more interesting, is that there's evidence (from public posts, I haven't tried these models myself) that models trained on some porn get better at non-porn images too.


No. The whole point of these models is that they combine information across domains to be able to create new images. If you trained something just on, say baseball, you could only generate the normal things that happen in baseball. If you wanted to generate a picture of a bear surfing around the bases after hitting a home run, you'd need a model that also had bears and surfing in the training data, and enough other stuff to understand the relationships involved in positioning everything and changing poses.


Did they exclude celebrities, politicians, and religious and political symbols?

Deceitful extremists and vengeful criminals fabricating lies seem to be a far more serious problem than fantasy porno.


That's a really interesting point, and it makes me realize that the Nancy Reagan 'what constitutes porn' question is obviously super old and problematic.

Also lexica.art is swarming with celebrity fantasy porn that just has a thin stylistic filter of paintings from the 19th century. And a plethora of furry daddies that you can't not love.

I get why these models should be curated but I also like that the sketchy porn possibilities keep them feeling un-padded / interesting / dangerous.

Then again this all is probably really dangerous so maybe that's silly.


> Nancy Reagan 'what constitutes porn' question

I thought that was Justice Stewart? And then he answered it "I know it when I see it."


(Edit: it may have removed that wording now: https://github.com/Stability-AI/stablediffusion/commit/ca86d... )

They can force model upgrades too:

> The New AI Model Licenses Have a Legal Loophole (OpenRAIL-M of Stable Diffusion)

https://www.youtube.com/watch?v=W5M-dvzpzSQ


Someone who seems to be emad says below that the license was changed (the post got flagged for some reason):

https://news.ycombinator.com/item?id=33727177

https://github.com/Stability-AI/stablediffusion/commit/ca86d...


I don't understand why so many people call Stable Diffusion open source.


Why do you think it is not open source? The model weights, model architecture, and dataset are all available.


Read the license of the model: https://github.com/Stability-AI/stablediffusion/blob/main/LI...

Sections 5 and 7 make it not open source.


I don't see how it contradicts the open source definition at https://opensource.org/osd, could you point it out for me?


For the most blatant violation, look at point 6 of the OSD and attachment A of the license.


Seems like it is open source, just not free software.


It's not free software or open source. Check the Open Source Definition: https://opensource.org/osd


Open source is more than just everything being available. It also depends on the license, and the one Stable Diffusion uses doesn't qualify, for multiple reasons, including the one mentioned upthread.


You can download the model weights and run them offline. At least, you could in v1.4. I assume this is still possible in v2.0?
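
For v1.4 it was a one-liner with huggingface_hub, and presumably the same idea holds for 2.0 (the repo id and filename below are my guesses from the release):

    # Download the raw weights to the local cache.
    from huggingface_hub import hf_hub_download

    path = hf_hub_download(
        repo_id="stabilityai/stable-diffusion-2",  # assumed repo id
        filename="768-v-ema.ckpt",                 # assumed filename
    )
    print(path)  # copy this somewhere safe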


Right, but the model weights are arguably not the "source code", and the license gives the users fewer rights than open source licenses do.

https://en.wikipedia.org/wiki/The_Open_Source_Definition


I think the controls in this space are such a shit show right now that being an "open model" is practically equivalent to the WTFPL.

If you're trying to build an app based on SD, then not being open source matters. But it seems like the majority of use cases are just "I want to run the model locally", and at that point HF can't stop me from just ripping the Wi-Fi card out of my computer.


Actually, the license is not the same as the prior one; it was changed to remove this.


The easiest way to combat this is to put your model behind an API and filter queries (Midjourney, OpenAI), or just not make it available at all (Google). The tradeoff is that you're paying for everyone's compute.

I guess SD is betting that saving money on compute matters more in this space than the ability to gatekeep certain queries. The tradeoff is that you need to do NSFW filtering in your released model.

It will be interesting to see who's right in 2 years.
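
As a toy illustration of what the API approach buys you (everything here is made up), and of why naive filtering in a released model is leaky:

    # A hosted API can enforce this; a released model cannot.
    BLOCKED = {"nsfw", "nude"}  # made-up blocklist

    def handle(prompt: str):
        if any(w in prompt.lower().split() for w in BLOCKED):
            return {"error": "prompt rejected"}
        return {"image": "..."}  # call the hosted model here

    print(handle("a nude figure"))             # rejected
    print(handle("a figure wearing nothing"))  # sails right through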


Making a bulletproof filter is incredibly difficult, even more so in a domain where image descriptions are written by a culture that often has to circumvent text filters. Both Midjourney's and OpenAI's filters work mostly because of the threat of bans if you try to circumvent them. I'm not sure I would describe that as "the easy solution".


That would suck immensely, for various reasons.


You can generate all the bloody violent gore you like, but god forbid anybody see a human body in its natural state



There is worry about generating illegal content. If the model understands multiple concepts, it can combine them.


I can't see any progress on AMD/Intel GPU support :( Would love to see Vulkan or at least ROCm support. With SD1 you could follow some guides online to make it work, since PyTorch itself supports ROCm, but the state of non-Nvidia GPU support in the DL space is quite sad.


Try SHARK on your AMD GPUs for SD. Follow the setup here: https://github.com/nod-ai/SHARK/tree/main/shark/examples/sha....

It works via PyTorch -> torch-mlir -> MLIR/IREE -> Vulkan, on both Windows and Linux. It also has a simple Gradio web UI (https://github.com/nod-ai/SHARK/tree/main/web), but we plan to enable better UI integrations very soon.

Join us on discord https://discord.gg/RUqY2h2s9u if you have any trouble. Appreciate any / all feedback.


I dislike how they call their model open source even though there are restrictions on how you can use it. The ability to use code however you want, without having to worry about whether all the code you are using is compatible with your use case, is a key part of open source.


I don't know why you're being downvoted. The model's license is unambiguously noncompliant with the Open Source Definition, yet they falsely claim it to be open source anyway. That's just as misleading as calling a product full of HFCS "sugar free" and saying it's okay because by "sugar", you just mean cane sugar.


The code is open source; the model is a data file that the open source code operates on. It's similar to engine recreations for old games (OpenRCT, OpenTTD) that use the original, proprietary assets to play the games with their open source engines.

Similar to those games, anyone is also able to distribute their own open data files if they so wish. It's unlikely anyone will actually start training an open source AI model from scratch, because doing so costs insane amounts of money, but the same can be said about the many hours of work that recreating game assets takes for open source game engines.


I don't know what your point is. They use the terms "open source AI models" and "open source Generative AI models"

Yes, someone else could spend the millions of dollars to create a model that actually is open source, but shouldn't the people advertising their models as open source be the ones to do that?


Should they distribute the data files according to open source standards? Maybe. "Open" does not mean "open source", though; "open data" does not necessarily mean unlimited access to and use of such data, as it's usually behind some kind of ToS document nobody reads and an API key. Applying open source expectations to anything with "open" in the name will often leave you disappointed outside the FOSS world.

Does not openly distributing their data files make their code any less open source? I don't think so. The code is open and licensed with a FOSS license. They spend time and money on creating a model and give the world the ability to replicate their model if it can collect the necessary funds. There are plenty of other open source projects that require vast arrays of server racks and compute power to be useful, that doesn't change anything about the openness of the code.


Awesome. I'm installing on Ubuntu 22.04 right now.

Ran into a few errors with the default instructions related to CUDA version mismatches with my Nvidia driver. Now I'm trying without conda at all. Made a venv, upgraded to the latest driver that Ubuntu provides, then downloaded and installed the matching CUDA from [1].

That got me farther. Then ran into the fact that the xformers binaries from my earlier attempts are now incompatible with my current driver and CUDA, so rebuilding that one. I'm in the 30-minute compile, but did the `pip install ninja` recommended by [2] and it's running on a few of my 32 threads now. Ope! Done in 5 mins. Test info from `python -m xformers.info` looks good.

Damn, still hitting CUDA out-of-memory issues. I knew I should have bought a bigger GPU back in 2017. Everyone says I have to downgrade PyTorch to 1.12.1 for this to not happen. But oh dang, that was compiled with a different CUDA, oh groan. Maybe I should get conda to work after all.

`torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 30.00 MiB (GPU 0; 5.93 GiB total capacity; 5.62 GiB already allocated; 15.44 MiB free; 5.67 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF`

Guess I better go read those docs... to be continued.

[1] https://developer.nvidia.com/cuda-downloads?target_os=Linux&...

[2] https://github.com/facebookresearch/xformers
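Update: for anyone following along, the docs mostly boil down to setting the allocator config before anything touches CUDA. A minimal sketch; the 64 MiB split size is just a starting value to tune for your card:

    import os
    # Must be set before torch initializes CUDA; smaller splits reduce
    # fragmentation at the cost of some allocator overhead.
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:64"

    import torch  # import only after setting the env var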


Also got this far on my 3080 Ti with the same error message. Oh well, let's wait for the "optimized" forks to pop up.


Thanks for reminding me why I shouldn't go to my computer right now and try getting this working with my 2070!


Which GPU are you using? Used RTX 3090s were relatively cheap in the last couple of weeks...


GeForce GTX 1060 6GB, purchased literally 5 years ago. It worked with an optimized stable diffusion 1.0 so I was hopeful here. If I want to run these models going forward I guess I need something slightly more serious, eh?


It kind of annoys me that they removed NSFW images from the training set. Not because I want to generate porn (though some people do), but because I feel that they're foisting a puritan ethic on me. I don't consider the naked body inherently bad, and I don't like seeing new technology carry this (wrong, in my opinion) stigma.

Then again, it's their model, they can do whatever they want with it, but it still leaves me with a weird feeling.


Agreed. I could see this being driven by EleutherAI in the background who are very, say, strict when it comes to "alignment".


Hmm, can you elaborate? I don't know much about EleutherAI.


It’s annoying when you get NSFW results when you didn’t ask for them, so it may be better to segregate them.


But they aren't segregating them. They didn't release two models, one SFW and one NSFW. They segregated them before, with the filter you could disable, but now it's all SFW-only.


Eh, fine-tuning seems to work well enough that it can be added back in after.

Though previous fine-tunings/textual inversions won't work, since the CLIP encoder has been replaced too. I'd be interested to know whether it needs retraining for this case as well.


I've seen references to merging models together to generate new kinds of imagery or styles; how does that work? I think you use Dreambooth to make specialized models, and I have a rough idea of how that works: it basically assigns a name to a vector in the latent space representing the thing you want to generate new imagery of. But can you create multiple models and blend them together?

Edit: Looks like AUTOMATIC1111 can merge three checkpoints. I still don't know how it works technically, but I guess that's how it's done?

https://github.com/AUTOMATIC1111/stable-diffusion-webui


It’s my understanding that, amazingly enough, blending the models is done by literally performing a trivial linear blend of the raw numbers in the model files.

Someone even figured out they could get great compression of specialized model files by first subtracting the base model from the specialized model (using plain arithmetic) before zipping it. Of course, you need the same base file handy when you go to reverse the process.
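For the curious, a minimal sketch of that kind of merge (file names hypothetical; assumes both checkpoints share the same architecture and keys):

    import torch

    alpha = 0.5  # blend weight; merge UIs expose this as a slider
    a = torch.load("model_a.ckpt", map_location="cpu")["state_dict"]
    b = torch.load("model_b.ckpt", map_location="cpu")["state_dict"]

    # Per-tensor linear interpolation over the shared keys
    merged = {k: alpha * a[k] + (1.0 - alpha) * b[k] for k in a.keys() & b.keys()}
    torch.save({"state_dict": merged}, "merged.ckpt")

The delta-compression trick is the same arithmetic in reverse: store b[k] - base[k], which is mostly near-zero and zips well, then add the base back to reconstruct.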


It is not typically possible to blend independently trained models like that: the training process is insensitive to the (lateral) ordering of neurons, so two models trained from scratch generally won't have their weights aligned, and a naive average lands nowhere useful.


I thought so too, until I found that there is quite a bit of literature nowadays about "merging" weights, for example this one: https://arxiv.org/pdf/1811.10515.pdf and also the OpenCLIP paper.


Is that still the case when all models have a common ancestor (i.e. finetuned) and haven’t yet overfit on new data?


Awesome! I've put Stable Diffusion behind an API so anyone can train a model for free. I'm adding 2.0 to it as we speak! https://88stacks.com


Interesting project but terrible naming.


To be clear to the original poster, the naming is terrible because of the nazi associations of the number 88, correct?


I didn't even think of that - to me it looked like some gambling website domain.


I didn't know that. 88 is associated with good luck, fortune, and money in Chinese, so you see 88 on everything in China.


Indeed.


From the Netherlands, never heard of it. Living in Germany four years now; now that you remind me, I had to dig deep in my memory (at first I thought it might mean SS somehow), but yeah, someone once mentioned that H is the 8th letter of the alphabet, so 88 reads as HH, and if you associate HH with the Second World War you can read something into it. Most definitely not among the first associations for me, and believe me we had enough WW2 material in school (including kids who use Hitler stuff to be funny). Perhaps 88 is specifically edgy in German high schools or so? I bet if you look at other cultures, it'll mean donkey balls or some such somewhere. I've also heard a German laugh about a 1312 license plate, which I'd never think to associate with alphabet offsets in my life. Would be "ieiz" or "lelz" for me, if anything.

TL;DR: very far-fetched and a bit pointless to go looking for these non-obvious alternative meanings, in my opinion.


You're entirely correct but also missing the 70-80 years of continued 'neo-nazi' history that has followed in the US. 1488, odd number associations with 88, and patterns like this are all verboten in the US. I've also seen this tattooed on people in Berlin.


> to sue for free

Typo or Freudian slip?

Just kidding of course, nice project!


How is this free? Is it possible to download the checkpoints?

I'm asking because I'm running SD locally, but my GPU is not good enough to train new checkpoints, and until I find the time to work on improving that, I wanted to use this API to generate some models for an illustration book I am working on.


It's free because it's on my research cluster, not in the cloud, and I want to share it. Faster training will be paid, along with other features, but training a basic model will always be free. I'm adding checkpoint downloads now.


Check out fast-dreambooth colab. I use that to freely train checkpoints that can be downloaded.


You really need a privacy policy!

Ideally one that states that the uploaded images are deleted after generating the model and not used for anything else in any fashion whatsoever.

Also, let people download the models and delete them afterwards with the same handling. Then it gets very interesting indeed!


Interesting concept!



What's the potential of using this for image restoration? I've been looking into this recently as I've found a ton of old family photos, that I'd like to digitize and repair some of the damage on them

There are a lot of tools available, but I haven't found anything where the result isn't just another kind of bad, so if the upscaling and inference in this model is good, it should in theory be possible to restore images by using the old photos as the seed, right?


Current Stable Diffusion's (1.4 and 1.5) img2img can accomplish a lot of this.
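Roughly, with the diffusers library a restoration pass looks like this. A sketch only: the model id, prompt, and strength are all things to tweak, and depending on your diffusers version the image argument may be called init_image instead:

    import torch
    from PIL import Image
    from diffusers import StableDiffusionImg2ImgPipeline

    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")

    init = Image.open("old_family_photo.jpg").convert("RGB").resize((512, 512))
    # Low strength keeps the result close to the source photo;
    # higher strength lets the model reinvent more of it.
    result = pipe(
        prompt="restored photograph, clean, sharp focus, no damage or scratches",
        image=init, strength=0.3, guidance_scale=7.5,
    ).images[0]
    result.save("restored.jpg")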


I just thought about this, so bear in mind that I don't know much about the technical implications, but:

Couldn't we train a very good model by distributing the dataset along with the computing power using something similar to folding@home?


The limit for this sort of exercise is "holding everything in memory", because training neural networks requires updating the weights frequently. An NVIDIA A100 has a memory bandwidth of roughly 2 TB/s; your home ADSL is something in the order of 10 Mbit/s, a gap of about six orders of magnitude. And then there's latency.

Mind you, theoretically that is a limitation of our current network architectures. If we could conceive a learning approach that was localised, to the point of being "embarrassingly parallel", perhaps. It would probably be less efficient, but if it is sufficiently parallel to compensate for Amdahl's law, who knows?

Less theoretically, one could imagine that we use the same approach that we use in systems engineering in general: functional decomposition. Instead of having one Huge Model To Rule Them All, train separate models that each perform a specific, modular function, and then integrate them.

In a sense this is what is currently happening already. Stable Diffusion has one model for img2depth, estimating which parts of a picture are far away from the lens. They have another model to upscale low-res images to high-res images, etc. This is also how the brain works.

But it is difficult to see how this sort of approach could be applied at very small scale, to low-context tasks like folding@home's.


The network communication overhead would be way too high to make this useful. At least for current methods of training large models.


You would likely be limited by the communication latency between nodes, unless you come up with some unique model architecture or training method. Most of these large scale models are trained on GPUs using very high speed interconnects.


The term for this is federated learning. Usually it’s used to preserve privacy since a user’s data can stay on their device. I think it ends up not being efficient for the model sizes used here.
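For flavor, the aggregation step at the heart of federated averaging is conceptually tiny; the expensive part is shipping weights back and forth every round. A toy sketch (real FedAvg also weights each client by its local dataset size):

    import torch

    def fedavg(client_state_dicts):
        # Per-parameter mean across all clients' weights
        keys = client_state_dicts[0].keys()
        return {
            k: torch.stack([sd[k].float() for sd in client_state_dicts]).mean(dim=0)
            for k in keys
        }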


I think eventually someone will do it.


stop trying to build skynet


Can't stop something that's already finished.

- Skynet


Are the GPU memory requirements different for this release?

Is it now possible to generate higher resolution images with less memory?


Don't think so; ran on my MacBook Pro, same footprint. Idk about the CUDA side.


How do you check the footprint?


Is there any place where we can learn more about all these AI tools that keep popping up, that is not marketing speak? Also, I see the words 'open' and 'open source' and yet they all require me to sign up to some service, join some beta program, buy credits etc. Are they open source?


Did you miss the first part of the article?

> It is our pleasure to announce the open-source release of Stable Diffusion Version 2.[0]

> The original Stable Diffusion V1 led by CompVis changed the nature of open source AI models and spawned hundreds of other models and innovations all over the world. It had one of the fastest climbs to 10K Github stars of any software, rocketing through 33K stars in less than two months.

[0] https://github.com/Stability-AI/stablediffusion


I did. Thank you!


You can run it locally, you don't need to use their service


Yes, it is open source. Visit the repo linked in the article and follow the setup instructions - no signup required


To put things in perspective, the dataset it's trained on is ~240TB, and Stability has over ~4000 Nvidia A100s (each much faster than a 1080 Ti). Without those ingredients, you're highly unlikely to get a model that's worth using (it'll produce mostly useless outputs).

That argument also makes little sense when you consider that the model itself is a couple of gigabytes; it can't have memorized 240TB of data, so it "learned".

But if you want to create custom versions of SD, you can always try out dreambooth: https://github.com/XavierXiao/Dreambooth-Stable-Diffusion, that one is actually feasible without spending millions of dollars on GPUs.


I love the graph of GH stars over time. Gives you somewhat of a sense of how fast this space is moving.


Looks good. I've gotten bored with AI image generation these days however, after using a lot of SD the past few months. I suppose that's the hedonic treadmill in action.


Another thing to take for granted eh!


Wow, just wow!

Newbie question, why can’t someone just take a pre-trained model/network with all the settings/weights/whatever and run it on a different configuration (at a heavily reduced speed)?

Isn't it like a Blender/3D Studio/AutoCAD file, where you can take the original 3D model and then render it using your own hardware? With my single GPU it will take days to raytrace a big scene, whereas someone with multiple higher-specced GPUs will need a few minutes.


It’s not totally clear what you are asking. The models are trained on something like an NVIDIA A100 which is a super high end machine learning processor, but inference can be run on a home GPU. So this is a “different configuration”.

But I think maybe you mean, can they make a model which normally needs a lot of RAM run more slowly on a machine that only has a little RAM?

It sounds like there are some tricks to allow the use of smaller amounts of ram by making specific algorithmic tweaks, so if a model normally needs 12GB of VRAM then, depending on the model, it may be possible to modify the algorithm to use 1/2 the RAM for example. But I don’t think it’s the same as other rendering tasks where you can use arbitrarily less compute and just run it longer.

Maybe I’m wrong though.
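(One concrete example of those tweaks: diffusers exposes attention slicing, which computes attention in chunks to trade some speed for a smaller peak VRAM footprint. A sketch, assuming the 1.5 weights:)

    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
    ).to("cuda")
    # Compute attention in slices instead of one big matmul:
    # lower peak memory, somewhat slower generation.
    pipe.enable_attention_slicing()
    image = pipe("a watercolor fox").images[0]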


The main limitation for running these AIs is that you need tons of VRAM available for your GPU to get any good performance out of them. I don't have a video card with 12GiB of VRAM and I don't know anyone who does.

If you're willing to wait more (30 seconds per image, assuming limited image sizes) there are repositories that will run the model on the CPU instead, leveraging your much cheaper RAM.

In theory you could swap VRAM in and out in the middle of the rendering process, but this would make the entire process incredibly slow. I think you'll have more success just running the CPU version if you're willing to accept slowdowns.
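For reference, running on CPU with diffusers is mostly just not moving the pipeline to the GPU; it defaults to float32, which is what you want anyway since half precision is poorly supported on CPU. A sketch (expect minutes per image):

    from diffusers import StableDiffusionPipeline

    # No .to("cuda"): the pipeline stays on CPU, using ordinary RAM.
    pipe = StableDiffusionPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5"
    )
    image = pipe("a lighthouse at dusk", num_inference_steps=25).images[0]
    image.save("out.png")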


12GiB VRAM cards are commonplace nowadays. An RTX 3060 is around $450 and available to everyone.


Eyeing the price graph of that 3060, it might be "commonplace" among the population that built a gaming PC in the last couple months, or went all-out in the past ~1.5 years (availability not taken into account).

Most people I know don't have a desktop in the first place, and on average I wouldn't guess that desktop users build a new one more often than once every ~4 years. And that's among people who build their own; if you buy pre-built, you have to spend a lot extra to get those top of the line specs.

It's possible to now go out and buy this on a whim if you have a tech job or equivalent salary, though.


Unfortunately the 3060 Ti, 3070 and 3070 Ti are limited to 8GiB, so it is certainly not common.

In that price range the 3060 is the only Nvidia card with 12GiB, and the 3080 starts at 10GiB.

So you can certainly get a 12GiB card without spending 3080+ money, but if you want any more power than a 3060 and keep the 12GiB then you would need to spring for a 3080 12GiB which is a big jump in price.


If you use the provided pytorch code, have a modern CPU and enough physical RAM, you can do this currently. As you suggest, inference/generation will take anywhere from hours to days using a CPU instead of a GPU or other ML-accelerator-chip.


The upscaler is the "enhance" zoom and room extrapolation from Blade Runner. Now that's cool.


Does anyone have a good source for all sorts of prompts for image generation?



Have you seen lexica.art? There are a few others, but I don’t remember the URLs



InventAI (https://inventai.xyz) is being developed; it should help a lot with prompt engineering.


“Adoption” is a generous term to use for a description of Github stars (referring to the first graph). There’s no denying stable diffusion has been gaining popularity, but I think it’s hard to say it’s really being adopted at the same rate it’s getting starred on Github.


FWIW, they only call it "Github stars". The graph that says "adoption" is from a16z [0].

0: https://a16z.com/2022/11/16/creativity-as-an-app/



Speaking of business models for AI, and the fact that Stable Diffusion is anti-trained for porn: somebody with an old terabyte porn image collection right now is going "Hold my beer, my time has come!"


This Thanksgiving, I would like to extend a very warm Thank You to the Stable Diffusion team!

Side note: The 4x upscaler model is showing as unavailable if you follow the hugging face link to it.


What's the bare minimum hardware required to generate images with this model? Can I do something with a 8GB 980? Probably not..? What about CPU only?


1.4 runs on iPhones. I’m sure we will rapidly see similar for 2.0

https://apps.apple.com/app/id6444050820


Not on that GPU; on CPU yes, you just need to wait longer.


One thing I'm wondering is what kinds of applications it can be used for. Maybe there will be new experiences in the fashion industry, like people training on their clothing designs to see how they look on people. Maybe they won't need to hire models to do the modelling?


I've also built a similar service[1] that does this with inpainting instead of textual inversion, so it preserves the face exactly and returns in seconds, not hours.

[1] - https://app.gooey.ai/FaceInpainting/


Doesn't show any controls after the loading banner in my Firefox.


Fixed. Thanks for reporting.

Here's the bug that caused this too - https://bugzilla.mozilla.org/show_bug.cgi?id=1689099


Hmm, looks like we didn't test on Firefox!


This is great. I built https://phantasmagoria.me because I was excited about this and wanted to make image generation more accessible. I can't wait to see what kinds of images v2 will enable.


Well darn. This is an awesome leap, but I've spent the last few months making a card game using Stable Diffusion art and I guess now I need to go back and go over everything again. Congratulations to the SD team on another wonderful step forward!


This one's publicly downloadable? I think I must've missed 1.5. It had been postponed for a while (for good reasons discussed throughout threads here) and I didn't notice whether it had been released.


1.5 was released, although it was not dramatically better than 1.4 (outside of better inpainting) so it didn't get much buzz.

https://huggingface.co/runwayml/stable-diffusion-v1-5


depth2img looks really interesting. I was thinking that someone should train an art model like SD on 3d models+textures. This isn't quite that but it seems like it gets some of that effect.
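If you want to poke at it: with a recent diffusers, something like this should work. A sketch only; the model id is the one Stability published for the 2.0 depth model, everything else is illustrative:

    import torch
    from PIL import Image
    from diffusers import StableDiffusionDepth2ImgPipeline

    pipe = StableDiffusionDepth2ImgPipeline.from_pretrained(
        "stabilityai/stable-diffusion-2-depth", torch_dtype=torch.float16
    ).to("cuda")

    init = Image.open("render.png").convert("RGB")
    # A depth map is inferred from the input image and used to keep
    # the output's geometry while restyling its surfaces.
    out = pipe(prompt="hand-painted fantasy texture, detailed",
               image=init, strength=0.7).images[0]
    out.save("restyled.png")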


Any word on AMD support?


The previous version works fine and has performance on par with NVIDIA. I'm on Linux using the ROCm platform.

There are some specialized third party performance optimizations you might miss out on though, but nothing major IMO.


just use rocm


Which has crap support itself.


sure.


There's nothing in the release notes that says whether 2.0 can do hands without a 99% chance of producing deformed results.


This reminds me a bit of the rash of Thomas Kinkade storefronts during the 90s.

Nothing personal against the work, I think it's brilliant, and cheap. Just like a Kinkade.


Highly recommend downloading Draw Anything on iOS; SD 1.4 is available and it is quite fun!


Here are some which you'll encounter if you want to rabbit hole this trend

- replicate.com

- banana.dev

- huggingface.co

- lambdalabs.com

- astriaAI

- lexica.art for prompt inspiration


> greatly improves the quality of the generated images compared to earlier V1 releases

That’s great!


The repo provides a chart with a quantitative measure (FID/CLIP score) where the new 2.0 models do indeed have much better results than the earlier 1.5 model.


Nice, will be interesting to see.


Love the developer adoption graph - so steep I thought it was the y-axis! ;-)


How do you install this version? Do you merge it with latent-diffusion?


Can this transform an image into a vector illustration?


img2img can change the style of an image so you can make it look like a vector illustration but it will still be a bitmap.


It sure can: https://github.com/GeorgLegato/Txt2Vectorgraphics

It doesn't output color though, only B/W.


I suspect that, if many of the people whining about copyrights in the context of generative AI got their way and made this usage a violation, they wouldn't be happy with the knock-on effects.


Is there a tool that uses this to create UX/UI designs? It'd be really cool to get inspiration...


I would love that…I’ve heard of this use case for AI described many times but I’ve yet to find anyone doing it! Copilot is great, but for more creative/frontend work it seems like there should be something right?

Stable Diffusion is amazing at generating art. Something similar but specialized in UI could be too. Maybe one could make a custom model, but with my lack of design knowledge I’m not even sure where to start…

It would surely save my monkey brain from pouring many more hours into looking at existing websites/UI libraries/Dribbble and drawing inspiration (copying) from them.


Microsoft Designer is supposedly this, although I’m still waiting on admission to the beta.



Interesting this is being marketed as a 2.0 release so quickly after the first version was launched.

These new updates are quite great, but are they so game-changing that this deserves to be called 2.0?


It doesn't need to be game-changing to be considered V2. It just needs to be "the next version".


The crimes against the creative people are getting better and better. What a time to live, when your entire career burns to dust just because.

I hope AI gets these programmers' jobs soon.

Then we all can go to the woods and have a good life, finally.


You hope that AI gets the jobs of AI programmers soon? I urge you to reconsider the implications of that.


Thanks for your premature concern, but we'll be fine. Despite how it may appear to a layperson such as yourself, the value of human creativity is in no way diminished by the release of this tool or others like it.


Ok. The irony. Actually, after 20+ years in the tech industry, I will say this:

Your beloved corporations don't have a metric called "creativity"; they have a bottom line, and it has all the power.

I am an artist by education and can confirm that creativity is overrated; the processes an artist follows, and repetition towards a given goal, deliver the results.

Whatever feelings or ideas you have, the actual craft is the medium in which you will deliver.

Reducing *The Path* to text input is not an artistic or craftsmanship process.

There is no creativity involved. Maybe someone with more knowledge of the real process and broader visual culture will make more aesthetically right choices. But this can be automated too.

This is not a “tool”, like Photoshop. This is something else. And all of you know this.

More than 50 percent of frontend code is boilerplate. CRUD apps follow similar logic. Why not automate these repetitive processes first?

No. Corporations are starting the automation with the lowest-risk crowd: digital artists, who have low representation, no coherent community, and are always ready to sell themselves for pennies.

Now they will compete with the machines. And your time in this battle will come. Soon.





