Generating Fashion Using AI (twitter.com/karenxcheng)
226 points by BIackSwan on Sept 3, 2022 | 71 comments



Part of me thinks "this is really cool, it's all moving really fast", but the other part of me thinks "this is moving too fast, aren't people missing something?".

It feels like everyone's so desperate to create or predict the next AI unicorn, that no one's paying attention to the fundamentals. Gives me weird dotcom bubble vibes.

I still cannot see how this can fundamentally change the fashion industry. Fashion is not about design, it's about generating desire to buy. You hire models and influencers to showcase your items, so people think they can be just like them if they just buy the same piece of clothing. AI doesn't change that. You could totally have a niche for AI-generated clothing, but that's it.

Besides, clothes are a physical item. AI can automate the generation of a "blueprint", but it is still heavily constrained by the physical result. It would be mostly hit and miss. It feels more efficient to have designers use their experience to create sketches that would actually look good in real life.

All in all, I am amazed as much as I am skeptical of the entire synthetic art revolution going on. It's been what? A month? It's too soon. Aren't we jumping the gun?

Whatever happened to boring opinions? "Hmm that's amazing, but we'll see what happens, it's too soon". I'm yet to be convinced of how, beyond being an overall better tool, synthetic art will fundamentally change businesses.


"You hire models and influencers"

The influencers can also be generated by AI (or even just 3D modelling), since so much of the marketing is digital. And it's already happening (mostly in Asian countries):

https://www.dexerto.com/entertainment/ai-created-influencer-...

These videos are just demonstrating the concept design part, created by something not even designed for this specific use case. It's true that making a product from a concept design is another problem. But it's not hard to imagine that AI could be trained specifically for this on actual design plans, using physically accurate cloth simulation in the training loop to generate something that could actually be built. After all, if AI can improve protein folding predictions, why not cloth folding :). (OK, it's not completely the same, but both are about 3D structure.)

We already have some pretty good sub-millimeter level cloth simulations (not using AI but could help AI): https://www.youtube.com/watch?v=Mrdkyv0yXxY

And of course people are excited when they have a new tool they can use; they try to find more use cases for it, and some will work out, some won't. In this case they even have the open-source blueprint for the tool, so they can fine-tune it for their own ideas.


There are issues with this, though.

One big issue is that big companies already rip off boutique designs with some regularity and then undercut them. Big data and harnessing AI will mostly make that situation worse.

Examples from Shein, a Chinese fast fashion retailer that already uses strategies like this: https://www.dazeddigital.com/fashion/article/55146/1/shein-f...


It feels like everyone's so desperate to create or predict the next AI unicorn, that no one's paying attention to the fundamentals.

What exactly are the fundamentals?

The transformer revolution is pretty foundational and the inability to see the societal impact of this technology can only be explained by a lack of imagination.

The majority of people's time is spent in digital worlds, and the ability to create synthetic images, text, etc. is going to radically alter the way we spend our time online. Both malicious and marvelous use cases will be discovered, but to hand-wave it all away as "dotcom bubble vibes" is tech cynicism at its worst.


>The majority of people's time is spent in digital worlds

That's just the bias of people in your bubble; the majority of the population of this planet spends its time overwhelmingly in the real world. Calling it "digital worlds" is also a big indication of tech bro bias.


There are billions of people who use FB/Instagram/TikTok, so yes, that's a majority of people spending time in digital worlds.


I think you're reading too much into it. It's just a cool application of changing clothes in videos.

Nobody presented it as changing the fashion industry. It's generating new outfits in a video, and that's super cool.

I don't think anybody's moving too fast or missing fundamentals here, at least not yet. This is all brand-new, so people are just having a fun time exploring.


Yup, I don't think this even applies to fashion; after all, it's just reconstituted fashion.

The most obvious application of this is to replace, augment, or lower the cost of live-action computer graphics techniques.

That could actually be pretty interesting though, liberating even, because it would open up a previously incredibly expensive field... giving more people access to compete with CG previously only available to high-budget Hollywood. Imagine if manipulating a live-action scene was as easy as dropping some descriptions into a program and paying for a little compute.


The other application that comes to mind is some online shopping virtual outfit try on.


It won't shake up the fundamentals, and knowing the fundamentals is probably more valuable than ever in a sea of spam images.

AI does point us in an exciting new direction though. This stuff is baby steps and currently at day 1 of germinating a whole new kind of fruit 20yrs down the line. And that's all it needs to do, to be revolutionary. We need new tools and directions.


I love that the author shows the whole workflow. It highlights how much tweaking and work it takes to achieve a polished output. Still amazing how easy it was to chain the networks together.


The creativity happening around AI right now is incredible. I came across this recently and was blown away: https://twitter.com/xsteenbrugge/status/1558508866463219712?...


That is spectacular! Keep in mind when watching this that the technology to make this has only been available for mere weeks, and here's a video that would have taken months-to-years to produce using traditional techniques.

True automation is coming to art.

Mindblowing.


https://twitter.com/TomLikesRobots/status/156567899598691123...

https://twitter.com/remi_molettee/status/1564632028959629319

https://twitter.com/matthen2/status/1564192185909739521

https://twitter.com/Infinite__Vibes/status/15650454342293340...

https://twitter.com/thibaudz/status/1564892979789045760

https://twitter.com/chav_ez/status/1565806042344087552

https://twitter.com/zippy731/status/1564616100477820938

https://twitter.com/RonnyKhalil/status/1565024524181159941

https://twitter.com/remi_molettee/status/1565356181190807553

https://twitter.com/remi_molettee/status/1563187170734927872

https://twitter.com/replicatehq/status/1564354673108127744

https://twitter.com/genekogan/status/1564626995979370505

https://twitter.com/ala_art_lab/status/1565984951178346496

https://twitter.com/DrewMedina20/status/1565746320953966592

https://twitter.com/Aiartitune/status/1565795049102786560

https://twitter.com/Aiartitune/status/1563651144806645760

https://twitter.com/ala_art_lab/status/1565603328003870720

https://twitter.com/pharmapsychotic/status/15642809223625973...

https://twitter.com/Aiartitune/status/1563832168119517186

https://twitter.com/socalpathy/status/1565899540451966977

https://twitter.com/zippy731/status/1565342075196870656

https://twitter.com/makeitrad1/status/1563335226524282882

https://twitter.com/mrflosunday/status/1565885053753761792

https://twitter.com/TomLikesRobots/status/156488734249359769...

https://twitter.com/EuclideanPlane/status/156421740831482675...

https://twitter.com/erocdrahs/status/1565320455162044417

https://twitter.com/Carl_Ingram_art/status/15630745562893967...

https://twitter.com/benscottpye/status/1565352548608778242

https://twitter.com/MichaelCarychao/status/15645904797940613...

https://twitter.com/AiJoe_eth/status/1564221320916779011

https://twitter.com/Carl_Ingram_art/status/15646973666696273...

https://twitter.com/ChekhovEugene/status/1565880769477738497

https://twitter.com/originalmaderix/status/15656282243520552...

https://twitter.com/Aiartitune/status/1564177213888643072

https://twitter.com/Infinite__Vibes/status/15646387276616581...


Still needs temporal stability to transfer a prompt-generated style onto a video. I wonder if it can be combined with something like this

https://isl-org.github.io/PhotorealismEnhancement/


Wild! Thanks for sharing.


These are cool, but I feel like too many of these are just kinda coming up with a simple prompt and letting SD do its thing. I'm looking forward to more human direction with these videos rather than just "what happens if AI zooms in a bunch?" (the Infinite__Vibes ones are getting there).


Thanks for sharing. They somehow remind me of the VGA computer art animations of the 90s. Fond memories.

Couldn’t find any links on YT. Please share if anyone has them.


Old enough to get the reference!



Really impressive, thanks for the links.

Does anyone have a good intro to Stable Diffusion for someone who doesn't know a lot about AI?


+1


Nice collection!


woah, thanks for the links!


Which means with goggles we can all wear simple cotton outfits and appear fancy digitally. #ecofriendly


Virtualizing consumption may be the aspect of mixed reality I’m most excited about. There are plenty of goods that have to be physical, but many don’t.


I worked at an AI startup that was trying to generate products to sell. The AI engineers didn't understand that the image isn't useful on its own; it's only good for judging whether the design looks good. It needs to spit out JSON schematics for the product in a way a specific manufacturer can understand.

There’s also the problem that AI can’t be specific. I can’t design merch with a specific video game logo or band name. The output always has that “AI dream residue”.

This is a useful tool only for creative inspiration.


Perhaps AI text-to-STL file, ready for 3D printing prototypes?


AI images are rapidly becoming just like clip art and stock photos: i.e. you can, with some training, recognize the signs. And, just like clip art and stock photos, you should recognize when an image is used without much effort, since it is probably not additive to the context of whatever media it is embedded in, and can be safely ignored.


It might sound unrelated, but many years ago I moved to Europe, coming from a South American country. Having spent many years "at home", I never thought much about what we looked like, the physical features you could call stereotypical. But then I moved to Europe, and suddenly I could spot people from my country from miles away. After years and many places travelled, I still cannot describe in words what we look like, but I know it when I see it. It's a gut feeling.

At least with the photorealistic prompts, there is always this feeling of "this doesn't look right". Right now it's easy to pinpoint what it is; usually it's a completely distorted face. They might get better at this, but there will probably always be that "uneasy feeling".


There does seem to be a lot of editing work required to get this working.

I would like to understand if there's a more automated way of doing this.


There isn’t yet but my company’s goal is precisely to make this kind of thing accessible to everyone. I don’t want to spam and we haven’t launched yet anyway, but there’s a link in my profile if interested.

I suspect there will be a lot of companies that build businesses out of chaining models together for specific kinds of users. The more you can focus on a niche, the more you can paper over some of the limitations in the models.


The most important work is done by Dalle, which has no open API right now. Once they open it up, there could be a way to run a program on all three. You would still have to cherry-pick which frames from Dalle look the nicest.


I believe (but not 100% sure) you can achieve this with Stable Diffusion also, which is something you can run locally.
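
If so, a rough sketch of what that could look like locally with the Hugging Face diffusers library (the checkpoint name, file paths, and prompt are just illustrative, and I haven't verified this against the exact workflow in the video):

    # Mask the clothing region of a frame and repaint it with Stable Diffusion inpainting.
    import torch
    from PIL import Image
    from diffusers import StableDiffusionInpaintPipeline

    # Load an inpainting-capable checkpoint (placeholder model id) onto the GPU.
    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        "runwayml/stable-diffusion-inpainting",
        torch_dtype=torch.float16,
    ).to("cuda")

    frame = Image.open("frame.png").convert("RGB").resize((512, 512))
    mask = Image.open("clothing_mask.png").convert("RGB").resize((512, 512))  # white = area to repaint

    result = pipe(
        prompt="a flowing red silk evening gown, studio lighting, photorealistic",
        image=frame,
        mask_image=mask,
        num_inference_steps=50,
        guidance_scale=7.5,
    ).images[0]
    result.save("frame_inpainted.png")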


For sure, but Stable Diffusion still cannot produce results as realistic as Dalle 2's. Stable Diffusion also has fairly weak inpainting, which is what is used for the first step. I am confident it will surpass Dalle 2 at some point though; it has only been open-sourced for ~2 weeks after all.


> For sure, but Stable Diffusion still cannot produce results as realistic as Dalle 2's.

I don't think that's true. Just like Gimp can be used to make better results than Photoshop (or even mspaint), the quality of the results is up to the artist/user, not the tool itself.

Some SD outputs are truly amazing, but so are some DALL-E 2 outputs. Maybe DALL-E 2 is easier to use for beginners, but people who are good with prompts will get output as good with SD as with DALL-E.


I’ve tried about 500+ prompts on each of Midjourney, Stable Diffusion, and Dalle 2. It’s just not there yet, though it’s really good for creative results.

I agree it might do the job decently for a dress in this instance.


> Just like Gimp can be used to make better results than Photoshop (or even mspaint), the quality of the results is up to the artist/user, not the tool itself.

There's a huge difference between Gimp/Photoshop and an image generation model.

If a particular model can't generate faces properly then the "artist/user" can't get around that unless they develop a new model or find one that can fix the output of the first.


I agree. People who will carry out this activity at a high professional level in creative agencies will have to go deeper into the observation and study of language.

In addition to linguistic precision, the parameters involved in prompt composition, for a perfectly controlled artistic result, require technical knowledge, a sense of style and historical knowledge. The more related keywords involved in the composition, the greater the artist's control over the final result. Example: the prompt

_A distant futuristic city full of tall buildings inside a huge transparent glass dome, In the middle of an arid desert full of big dunes, Sunbeams, Artstation, Dark sky full of stars with a bright sun, Massive scale, Fog, Very Detailed, Cinematic, Colorful_

is more sophisticated than just

_A city full of tall buildings inside a huge transparent glass dome_

Note that the conceptual density, hence the quality, of the prompt depends heavily on the cultural and linguistic background of the person composing. In fact, a quality prompt is very similar to a movie scene described in a script/storyboard [by the way, there go the Production Designers, along with the concept artists, graphic designers, set designers, costume designers, lighting designers… ].

In an attempt to monetize the fruits of the new technology, Internet entrepreneurs will be forced by the invisible hand of the job market to delve deeper into language skills. It will be a benign side effect, I think, considering the current state of the Internet. Perhaps this will lead to a better articulation of ideas in the network environment.

Just as YouTube influencers have a knack for dealing with the visual aspects of human interactions, aspiring prompt engineers will have to excel at sniffing out the nuances of human expression. They have great potential to be the new cool professionals of the digital economy, as were web designers, and later influencers — who, with the end of social networks, now tend to lose relevance.

To differentiate themselves, prompt engineers will have to be avid readers and practitioners of semiotics/semiology.

Umberto Eco and the structuralists may return to fashion.

(*) I used a prompt by Simon Willison

https://simonwillison.net/2022/Aug/29/stable-diffusion/


> In addition to linguistic precision, the parameters involved in prompt composition, for a perfectly controlled artistic result, require technical knowledge, a sense of style and historical knowledge.

You are assuming that the models themselves respond accurately to linguistic clues. Actually they embody the cloud of random noise, prejudices, inaccuracies and misconceptions in the training data, and then pile on a big layer of extra noise by virtue of their stochastic nature.

So this isn't a case of the learned academic with extensive domain knowledge steering a precision machine. It's more like someone poking a huge chaotic furnace with a stick and seeing what comes out.


> I agree. People who will carry out this activity at a high professional level in creative agencies will have to go deeper into the observation and study of language.

So they have to be a character out of a William Gibson novel? Do they rip the labels off their clothes too?

PS: This is awesome.


That defeats the point. If you need specialized artists and know-how to make this work, it's useless. Might as well just use Photoshop then. The whole point is your grandma should be able to do this herself.


???

Imagine someone saying that about Photoshop when it was first introduced...

"Why would you even use Photoshop if so you still have learn a tool in order to use it? Might as well just use a canvas with colors... The whole point of Photoshop is that your grandma should be able to use it!"

No, every tool in the world is not meant to make it zero effort to do something. Some tools are meant to incrementally make it easier, or even just make it less effort for people who already are good at something. And this is OK.


It’s not okay. The whole appeal of AI generated art is you don’t need someone with skills. Photoshop is not even close to an analog.


Inpainting comparison between Stable Diffusion and Dalle 2: https://twitter.com/nicolaymausz/status/1565290282907848704

Have to confess I haven't bothered to log into D2 since the SD beta started. Which is crazy, because my mind was blown when it first launched back in April, but for now it seems like closed AI simply can't keep up with the open-source community swarm.


For my usage, Dalle 2 is better at understanding the prompt and creating pictures that are a good start. But Stable Diffusion provides better resolution and more settings. I sometimes use them together.


Does the inpainting code actually work, or is it just a vestigial placeholder from the last version of Stable Diffusion's release?


They will never open it up. They're ClosedAI, because they're a for profit company that makes money off their SaaS which prohibits reverse engineering.


They will open up an API, yes. That is different from being open source. Actually it seems that anyone who currently has access can try this: https://github.com/ezzcodeezzlife/dalle2-in-python


Too many products/businesses use "Open" to mean "You can use it if you pay" rather than "you can understand how it works" or "you can look under the hood".

OpenAI is just another example of a typical SaaS business misusing the word to make themselves seem "nicer" than they are. Ultimately, it's a for-profit business and it will operate as such, shouldn't surprise anybody.


I wish OpenAI actually had a SaaS model.

If you want a SaaS model, you should have all the things a SaaS model supports, including premium pricing and enterprise tiers. OpenAI needs to get their act together on DALL-E 2, as no serious business use case will rely on consumer pricing and hacked-together unofficial APIs.


Maybe we differ in how we understand what "SaaS" really is (a debate which is as old as the concept of "SaaS").

For me, if you offer something software-y behind payment, and the software can only be accessed online, it's a SaaS.

No need to offer premium pricing, enterprise tiers, support or anything else. Putting up an online API/UI that is locked behind payment is a SaaS (in my eyes).


NoMoreBro here on HN is also generating fashion pieces: https://unshush.com/


Thank you for mentioning it, yreg! :)

There is Twitter too https://twitter.com/UnshushProject

I started with a lot of enthusiasm but discovered you need to be A LOT more social (or lucky) to make people see things!

Next time maybe with some code and a Show HN it will be more fun.


Jesus Christ, fast fashion is just going to get worse, isn't it?

Standing in front of your augmented reality mirror in the evening, swiping through AI-generated outfits for the next day that are made automatically during the night and shipped to you by drone the next morning, to wear for the day and then dispose of.


"Standing infront of your augmented reality mirror in the evening swiping through AI generated outfits for the next day that are made automatically during the night and shipped to you by drone the next morning to wear for the day and then the next time to drone comes, it also brings with it yesterdays fabric to recycle it by destroying it to small particles, washing them, then create new clothes for a different person, and the cycle starts a new"

Adding a bit more imagination to your initial idea actually makes it a net positive. People can start recycling clothes (good for the planet), with a new design every day (good for people who like that), and it provides a business (good for the economy).


The only part that seems implausible here is the recycling.


One can imagine a quite opposite scenario:

You wear AR glasses and can choose whatever virtual fashion you want. Other people wearing glasses can also see your outfit. So you turn something physical into something virtual, thereby not having fast fashion at all.


I think you have too much faith in humanity…more like glasses to remove the virtual clothes.


That's a far, far future. The current AI simply can't reason about the design. It's all just 2D image manipulation, and the image you see might be physically impossible, never mind generating a fitting sewing pattern from a simple image. Perhaps one could limit the scope of AI generation to minor details, like color, pattern, printing, pockets, etc.


Far far future as in 1 year?


I'd be shocked if it takes as long as a year for the first neural 3D engine to be released.


There's considerably more conceptual distance between "dalle but 3d" and "dalle but emits practical product designs rather than concept art" than there is between dalle and 3d dalle.


Conversely, there's considerably more difference in concept and capacity between Dalle and what's in development but not yet released. Safe to say that OAI and the rest of big tech are at least a few years ahead internally, and the open-source community is catching up faster than most think, even branching out and exceeding capabilities in some directions as the tech begins to snowball.


How much money are you willing to bet that there will be an AI producing working clothing designs from prompts in 1 year? If it's less than your yearly salary, even you don't believe this crap.


Old man yells at the sky.


> shipped to you by drone the next morning to wear for the day and then dispose of

An environmental catastrophe with today's technology.

But if we can fix that, say make the clothes with a 3d printer out of yesterday's recycled clothes, with energy from renewable sources, then that sounds pretty cool actually.


Has anyone run existing stop-motion videos through DAIN?



I wonder when the transition will happen and when DALL-E 2 and similar technologies will actually replace jobs. Because practically, they already can.


Fashion is mostly a scam. The idea that people - especially, but not just - women are supposed to replace their wardrobe (*) frequently, and that the items in their wardrobe need not last, induces artificial consumer demand, diverting social resources - especially those distributed among the masses - from more useful endeavors (social or private), and is environmentally wasteful.

So, my non-artificial intelligence has generated the following new fashion: Last year's.

(*) - Note I'm not saying change clothes on themselves, but the set of clothes they own/use.



