Nvidia releases Paint me Picture – A web app for GauGAN2 (nvidia.com)
167 points by nitinreddy88 on Nov 25, 2021 | 90 comments



I really don't have anything constructive to say. I think in general we're getting too soft on shitty things, so I'm going to be harsh.

I clicked through to the demo site ( http://gaugan.org/gaugan2/ ) and it was horrible.

The interface is clunky, slow, and confusing. I actually had to zoom out in my browser to see the whole thing. Had to click through a non-HTTPS warning. The onboarding tutorial is pretty bad.

I got a generic picture of the milky way for any prompt I tried ("rocks", "trees"). If you press Enter in the prompt field it refreshes the page.

This feels like a hackathon front-end hooked up to an intro to PyTorch webservice. It's only neat because, unlike the other 20 copies of this same project I've seen, it was the only one that didn't immediately throw its hands up and say "server overloaded, please wait."

If I'm meant to be impressed or fearful of "big data deepfake AI," this isn't it.

[0] https://imgur.com/BNLDt6A


Your comment is a really good example of how startups can fail.

Here we have a web app that does something very clever, built by great devs, that fails because it doesn't work the way users expect it to.

So many highly intelligent, super technical founders have a belief that their amazing tech will sell itself, so they don't put the time and effort necessary into design, UX, or marketing, and they fail because their UI didn't make it clear what to do. It probably works brilliantly when they demo to people, with the authors driving it or helping users get the most from it. But when users have to use it without that help... It fails.

The lesson for founders here is simple - test your UX, because you won't get a second chance with most customers.


The app I used worked nothing like the slick demo in the video. In fact, the UX and UI are some of the worst I have used in recent memory.

No matter what some backend folks believe, there will always need to be highly skilled front end engineers who can put together web apps in a way where the interface just 'gets out of the way' so you can focus on the actual utility.


"The app I used worked nothing like the slick demo in the video."

This point is very important, and I hope not to make the same mistake.

I also watched the video, saw what was happening there, and it looked nice - but when I tried it for myself it absolutely did not work as expected.

In other words, if there is a simple video with features shown - then trying it out needs to be as simple as the video, or it causes lots of frustration.

First-time users do not want to deal with setup, configs, etc. first. This is something you want later.


After a sibling commenter very patiently pointed out that I was holding it wrong - I would encourage taking another look at this project if only to try out the Painting mode.

It does produce some very cool results: https://imgur.com/LQuo4UM

Had the press release/tutorial emphasized this angle instead of the wonky text-to-image thing, my initial impression would have been a lot better. This is genuinely a really neat feature. All my UI and discoverability criticism stands though!


The text-to-image thing is cool as well, as long as you can a) figure out how it works and b) only enter landscape terms.

This comment explains how to use it: https://news.ycombinator.com/item?id=29338213


I managed to produce a landscape with the prompt "canada fall colors red yellow green and bright blue sunny sky".

No matter what variations I tried, the sky was cloudy and dark. The trees, however, were majestic. The UI does suck.


Your query worked for me on first try: https://imgur.com/a/iTKwZ3v


I tested it again and it worked. Not sure what I was doing wrong the first few tries.


Type in "cat" into the text box and see a wonderful variety of landscapes that look like they are straight from the surface of some sort of plane of hell. Or furry flowers. Or fruit with eyes. I was expecting "lion in the Savannah" type pics, not "a visualisation of my DND group's "baleful polymorph" spell"...


Yes, whatever I type I get some abomination in the general settings I asked for (hills, mountains, ocean.) And I agree that the UX is also horrible. I want the webapp they are showing in the video, not the one they have online.


>I got a generic picture of the milky way for any prompt I tried ("rocks", "trees").

The picture is just really zoomed out. That's how awesome it is - it shows you ALL rocks and trees.

/s


What if this is done on purpose? If they made the UX too easy, then, since this tech is so impressive, it would be shared across the general population and the service would quickly get overloaded. This way only those who truly have the patience to fiddle with the controls are able to get it working.


It looks like you have selected for it to use a segmentation mask, and not to use text.


I have no idea what that means, but the fact that it both told me to enter a text prompt and actually let me do it while not being in whatever magical mode it should have been in in order to actually use the text prompt is another point that can be added to my above rant.

Alright, I've uttered the incantation for it to do the thing. I still don't get it. [0] https://imgur.com/4zbaiH0

I also tried another example prompt, which bore a striking similarity to the previous result. I don't know if it's persisting the result (it shouldn't be - I didn't click the re-use image button), but the strange life-raft looking artifact is very persistent. [1] https://imgur.com/KTCM4xH


Yeah, the text-to-image seems to be highly dependent on whether the generator knows how to generate the specific objects the text model thinks should be in the image. I got much more consistent results using the semantic segmentation drawing as input:

https://imgur.com/QC13zml

(and for what it's worth, you're totally right that the UI is just an absolute disaster)
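
If anyone is wondering why the segmentation input works so much better than text: GauGAN-style generators condition on the painted label map through spatially-adaptive normalization (SPADE, from the original GauGAN paper). Roughly something like this minimal PyTorch sketch (layer sizes and names are illustrative, not NVIDIA's actual code):

  import torch
  import torch.nn as nn
  import torch.nn.functional as F

  class SPADE(nn.Module):
      # Spatially-adaptive normalization: the painted label map
      # modulates the normalized activations at every pixel.
      def __init__(self, channels, num_labels, hidden=128):
          super().__init__()
          self.norm = nn.BatchNorm2d(channels, affine=False)
          self.shared = nn.Sequential(
              nn.Conv2d(num_labels, hidden, 3, padding=1), nn.ReLU())
          self.gamma = nn.Conv2d(hidden, channels, 3, padding=1)
          self.beta = nn.Conv2d(hidden, channels, 3, padding=1)

      def forward(self, x, segmap):
          # resize the one-hot label map to the feature resolution
          segmap = F.interpolate(segmap, size=x.shape[2:], mode='nearest')
          h = self.shared(segmap)
          return self.norm(x) * (1 + self.gamma(h)) + self.beta(h)

Each label you paint ("sky", "rock", "water") is one channel of that one-hot segmap, which is presumably why painted regions give the model much more to work with than a free-text prompt.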


The picture you drew and had it turn into rocks is actually really cool!

I think I would have been more generous to the project had I known it could do that. Maybe I X'd out of the frustrating tutorial too early? :)


Literally the first step after accepting the TOS:

https://i.imgur.com/oH4P1Xc.png


Yes, but the selection is lost if you press enter out of habit. I had the same frustration until I realized what had happened, which turned that frustration into anger :)

Maybe I'm just too old for tech demos.


Nah, the UI is not very intuitive, so there's plenty of blame to go around :-)


Wow, you weren't kidding. What an awful interface.


Hackernews has never gone soft on anything. Quite the opposite.


Yeah, Hackernews is almost universally critical. Even his comment is needlessly critical. Nvidia isn't positioning this as a polished consumer app. It's a demo! The URL is https://www.nvidia.com/en-us/research/ai-demos/

For a research demo the UI is extremely polished. It even has an interactive tutorial!! Some people...


I worked in human-computer interaction research; this is a very poor UI even in a research setting. The technology is impressive though.


Your parent comment didn't mean that this was a good UI for human-computer interaction research, he meant that it was a good UI considering it was probably just a quick demo built by the scientists who did the research to showcase their work.

I work in a research group and most of my colleagues would have a hard time making an interface like this one, no matter how confusing. They are ML/DL scientists and usually have had absolutely zero exposure to frontend.


Yes, not every researcher is good at frontend development. But my research group wouldn't promote such a broken UI until someone fixed it at least a little bit.


> If I'm meant to be impressed or fearful of "big data deepfake AI," this isn't it.

Let me put my tinfoil on. Maybe the idea is to normalize it and make it less scary. Look at the press releases. The general response was not a horrified 'omg, this tech has gone berserk', but 'oh, it's a benign lil biddy'.

Tinfoil off.

The interface is absolutely atrocious.


I bailed on it also, with the same experience.

A barrage of obtrusive prompts. When it wanted me to enter a phrase, I typed "sunset over a field" and also got a picture of the Milky Way galaxy, followed by more prompts.

I then closed the tab and came here to read the comments.


It just did absolutely nothing for me: I typed in words and hit enter, and got just a green screen.


I'll say something constructive. While the text to image functionality is terrible, the segmentation based on arbitrarily drawn images is kind of fun. They should have just gone with the latter.


This is a demo piece not a paid app. Nvidia has zero interest in selling web apps. This is just a marketing stunt to show off their capabilities so they can sell hardware.


> I got a generic picture of the milky way for any prompt I tried

You need to uncheck "segmentation" and check "text".


I don’t know how to make sense of the ToS for projects like this. They all have clauses that let them modify their terms later, for example.


> If you press Enter in the prompt field it refreshes the page.

That's just default browser behaviour for forms.


I'm still clicking on the next button.


I am just getting very weird results that don't look at all like the ones in the demo video. Here, for example, is the image it gave me for "car in front of house": https://i.imgur.com/QdtrtCR.png

Or how about this one for "dog playing with ball" https://i.imgur.com/ldGLdwF.png

I have tried about a dozen different input phrases and every time I get these very strange results.


Yeah, it can draw anything you want, as long as what you want is mountains, trees and lakes.


Cars and houses don’t sound like landscapes. Maybe they should add some filters for non-landscape input.


I missed the part where it said it was trained only on landscapes. So I retried it with just those and got this:

"river flowing through desert": https://i.imgur.com/QSjH5hk.png

"sunset waterfall": https://i.imgur.com/wS3EEci.png

Or how about a lovely "green shoreline": https://i.imgur.com/RkbTV99.png


The 2nd one is pretty dope. The 3rd one is rather uncanny to me though.


Well, you did better than I did. I only get pictures of stars and galaxies regardless of what I input, even restricting it to landscapey words.


Those are at least weird and interesting. I tried 'dog running by a river' and it just kept giving me astronomical images ¯\(°_o)/¯


Make sure "Input utilization" is set to "Text" if you're entering a text prompt.


what? are you saying they showcased the best results? that's unheard of.


It's not just a matter of showcasing the best results. It's a matter of night and day difference between normal output and the one showcased. They are not even remotely close. I actually wonder why they decided to release this to the public in its current form. I was very impressed by the noise suppression app they released so I expected something that delivers decent results.


The first image would make an excellent /r/writingprompt


You broke it!!


I wonder if, in a decade or so, large tech companies will unseat Disney, Warner Brothers, etc. as creators of animated movies.

While the results on Nvidia's website aren't too impressive, if you look at the history of animated movies [1], you can see how trivial and simplistic the art and animation once were.

Having had some experience doing research on GANs at university, I know them to be very powerful. What's very important to note is that the images generated by the model are truly "novel", i.e. completely fictitious. The generated images may be biased towards some of the training data, such as the color and texture of the water and rocks, but every image is a fantasy of the model. The only way the model can generate such realistic images is that it has a very good abstract internal representation of what oceans, waves, and rocks are.
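
For anyone who hasn't played with GANs, the "novel" part is easiest to see in code: every image is decoded from a freshly sampled latent vector, not retrieved from the training set. A toy sketch with a made-up generator, just to show the sampling step (not GauGAN's architecture):

  import torch
  import torch.nn as nn

  # Toy generator: 128-dim noise -> 64x64 RGB image (illustrative only).
  generator = nn.Sequential(
      nn.Linear(128, 1024), nn.ReLU(),
      nn.Linear(1024, 3 * 64 * 64), nn.Tanh())

  z = torch.randn(1, 128)                  # a fresh random point in latent space
  image = generator(z).view(1, 3, 64, 64)  # an image that exists nowhere in the data
  # A different z gives a different image; training only shaped the mapping.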

Back at university, I pitched to my professor the idea of using GANs to generate "novel" images in real time while parents read bedtime stories to children. I didn't get very far. Glad to see some real progress in that direction.

[1] https://www.filmsite.org/animatedfilms.html


Close, but it will actually turn out to be a very small company (possibly in less than a decade).

Hollywood has little awareness of just how much danger the legacy version of their industry is in.

ML-generated assets are slowly creeping towards reality, and at the same time doing 3D dev is 100-1000x easier than it was just a few years ago. In many cases it's now possible to do for free as well.


Hardly, Disney now owns Pixar after the early 3D competition days, they can as easily buy another concurrent.


That’s “competitor”.


Right, thanks.


This seems to be the link: http://gaugan.org/gaugan2/


A tip for people who get lost in the interface:

  1. Just close the tutorial
  2. Scroll down to the ToS checkbox and check that
  3. On top in the "input utilization" row make sure only text is checked
  4. Enter your text (use only landscape terms)
  5. Press the button with the right arrow (the rectangle below the text input).
  6. Maybe zoom out a bit, because the result will be the image on the right, which for me was out of view by default. I had to zoom to 50% to see the whole UI.


Also, definitely don't hit enter like you do in every other form, because that seems to clear your input, swap the "input utilisation" back to only segmentation, and sometimes also uncheck the ToS checkbox.


I only said lakes, mountains etc, and only got galaxy/star pictures...

https://imgur.com/a/SiQwBA2


You have to

1. uncheck "segmentation"

2. check "text"

https://i.imgur.com/n9P8N3c.png


Thanks! This UI is something else...


I finally got through to the demo through three links and it’s so busted in so many ways for me that I give up. Maybe it’s stupid to try with my old netbook, but I don’t get any indication of whether I need a fancy graphics card for it to work or whether it’s running on my end. Anyway:

- The screen zooms around disorientingly during the tutorial, and when I get to "congratulations, you made your first image" there's nothing there.

- Exiting the tutorial, checking 'text' instead of 'segmentation' just immediately switches back after entry.

- The whole site is a fixed width that is wider than my screen.

- A red alert checkbox at the bottom confuses me about whether that's why it's not working, etc.


> Exiting the tutorial, checking 'text' instead of 'segmentation' just immediately switches back after entry

That tripped me up too. After typing in your text, don't hit enter or anything similar, just click or tap the button with the right arrow (or anything to the right of that button, effects vary).


More or less the same, and I have a nearly-new M1 iMac, so it's not your old netbook.

Chrome appeared to work somewhat better than Brave, but it was still pretty frustrating.

I did manage to get it to work in a half-assed manner eventually, but the UI definitely needs a great deal of work.


The UI is horrible... does Nvidia not have a single UX person with two hours to spare who could help out?


I entered "kitten" and got typical surreal GAN output with disconnected topology, dozens of eyes, etc.

Edit: Looks like it was only trained on landscape images.


I like what it did with colorless green ideas sleeping furiously. It fits the mood of the sentence.

https://i.imgur.com/jPd0QLE.png


I entered "mountains" and "mountains and lake" and only got pictures of what looked like blurred galaxies and stars. Clicking any of the style buttons got me colored/tinted pictures of stars. Is it broken?

https://imgur.com/a/SiQwBA2


Your segmentation map contains "sky" by default. Try drawing "mountain" color in the lower part of the image first.


MtG dual-color lands serve as a good source of ideas on what to put in the textbox:

https://www.mtglands.com/coloridentity-dualcolor.html

As for the UI: layout via tables?


Feeding all magic cards into a GAN would be a great way to generate new ones.


I am impressed. Obviously still a tech demo, but I like the future. Text-to-image isn't awesome, but segmentation worked beautifully.

https://imgur.com/a/VSv9ZbA


The comments are overwhelmingly critical of the user interface, which is undoubtedly the weak part of this release, but I was still able to get some very impressive results.

An AI generated house on a lake: https://imgur.com/a/0wtVKum

I have found the best results come from uploading an image, then using the demo tools to get a segmentation map and sketch lines, then editing those as you desire. Changing the styling at the end also makes a big difference!


I input "a horse", it gives me this: https://imgur.com/1coGQix


Behold, AI. I think our dev jobs are safe for the next 100 years at least.


I sit here quite impressed with my pseudo-'50s sci-fi book cover

https://imgur.com/a/NZuGTXU

However... This gallery on imgur gives a better idea of capability https://imgur.com/gallery/coWN44P


The UI for the demo is atrocious, but that's probably because the text-to-image generation was glued to their existing AI painting tool.

I'd love for just the generation tool itself to be available for download. The web UI is clunky and just doesn't seem to work right.


After trying for 30 minutes I kind of understood some stuff. But the UI is a legit anxiety inducer. I hope they can fix the UI to make it fun. Currently it felt like using '80s DOS graphics software, with so much manual input.


Trying to do everything that the tutorial and other commenters have suggested, but when I click the arrow button it just waits for a few moments and nothing is generated. Am I missing something?


Chose building->house, segmentation and text, wrote "skyscraper" as text, and drew some lines of a silhouette of a skyscraper.

Returned an image of stars in outer space.


Are there any open-source models that can do a similar type of landscape generation? I would really like to look at the code and try to understand how these things are built...


This might be a good start: https://thisbeachdoesnotexist.com

There is a whole series of "this<blank>doesnotexist" sites, e.g. landscapes, faces, animals, etc.
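
If you want to read actual code, the transposed-convolution generator from the original DCGAN paper is probably the gentlest entry point before StyleGAN/GauGAN-sized codebases. A bare-bones PyTorch version (the usual tutorial layer sizes, not taken from any particular repo):

  import torch
  import torch.nn as nn

  class Generator(nn.Module):
      # DCGAN-style generator: 100-dim noise -> 64x64 RGB image.
      def __init__(self, z_dim=100, feat=64):
          super().__init__()
          def up(c_in, c_out, stride, pad):
              return nn.Sequential(
                  nn.ConvTranspose2d(c_in, c_out, 4, stride, pad),
                  nn.BatchNorm2d(c_out), nn.ReLU(True))
          self.net = nn.Sequential(
              up(z_dim, feat * 8, 1, 0),      # 1x1   -> 4x4
              up(feat * 8, feat * 4, 2, 1),   # 4x4   -> 8x8
              up(feat * 4, feat * 2, 2, 1),   # 8x8   -> 16x16
              up(feat * 2, feat, 2, 1),       # 16x16 -> 32x32
              nn.ConvTranspose2d(feat, 3, 4, 2, 1),  # 32x32 -> 64x64
              nn.Tanh())

      def forward(self, z):
          return self.net(z.view(z.size(0), -1, 1, 1))

  fake = Generator()(torch.randn(8, 100))  # 8 random 64x64 images

Train it on landscape photos with a matching discriminator and you get (a much cruder version of) what these sites are doing.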


Nvidia should visualize a periodic 24-hour 3D landscape sweep of the chatter on social media platforms, for metaverse dive-throughs and interactive engagement.


Wow. I have a good degree of respect for Nvidia, but this should never have been released in the state it's in. Who's the product manager for this?


Link to actual demo: http://gaugan.org/gaugan2/


It seems that the web UI was generated by AI too, because it's really hard to make sense of it.


What license are the produced images under? I could see this being used for cheap stock photos


Now it's not just imagination if you can create visual art using text.


To whoever came up with this name, good job


This sounds quite ironic to me, since "Super-GAU" in German stands for a disaster beyond all expectations (usually meltdown of a nuclear reactor).

https://de.wikipedia.org/wiki/Super-GAU

Another unfortunate name: "Gauleiter" was a regional leader of the Nazi Party.

https://en.wikipedia.org/wiki/Gauleiter


"gaugan" sounds like the word for "rag" in Polish.


why


I hear they have a miraculous new AI tool that magically determines your sexual desires then uses lasers to induce those feelings through your eyeballs with no contact necessary! Coincidentally demonstrated at this very same URL!

Even then, I don't think I'd care enough to fight through the layers of bullshit here.



