I clicked through to the demo site ( http://gaugan.org/gaugan2/ ) and it was horrible.
The interface is clunky, slow, and confusing. I actually had to zoom out in my browser to see the whole thing, and had to click through a non-HTTPS warning. The onboarding tutorial is pretty bad.
I got a generic picture of the Milky Way for any prompt I tried ("rocks", "trees"). If you press Enter in the prompt field, it refreshes the page.
This feels like a hackathon front-end hooked up to an intro-to-PyTorch webservice. It's only neat because, unlike the other 20 copies of this same project I've seen, it was the only one that didn't immediately throw its hands up and say "server overloaded, please wait."
If I'm meant to be impressed or fearful of "big data deepfake AI," this isn't it.
[0] https://imgur.com/BNLDt6A
Your comment is a really good example of how startups can fail.
Here we have a web app that does something very clever, built by great devs, that fails because it doesn't work the way users expect it to.
So many highly intelligent, super technical founders believe that their amazing tech will sell itself, so they don't put the necessary time and effort into design, UX, or marketing, and they fail because their UI didn't make it clear what to do. It probably works brilliantly when they demo it to people, with the authors driving it or helping users get the most from it. But when users have to use it without that help... it fails.
The lesson for founders here is simple - test your UX, because you won't get a second chance with most customers.
The app I used worked nothing like the slick demo in the video. In fact, the UX and UI are some of the worst I have used in recent memory.
No matter what some backend folks believe, there will always need to be highly skilled front end engineers who can put together web apps in a way where the interface just 'gets out of the way' so you can focus on the actual utility.
"The app I used worked nothing like the slick demo in the video."
This point is very important and I hope not to make the same mistake.
I also watched the video, and what was happening there looked nice, but trying it for myself absolutely did not work as expected.
In other words, if there is a simple video with features shown - then trying it out needs to be as simple as the video, or it causes lots of frustration.
First-time users do not want to deal with setup, configs, etc. first. That is something you want later.
After a sibling commenter very patiently pointed out that I was holding it wrong, I would encourage taking another look at this project, if only to try out the Painting mode.
Had the press release/tutorial emphasized this angle instead of the wonky text-to-image thing, my initial impression would have been a lot better. This is genuinely a really neat feature. All my UI and discoverability criticism stands though!
Type in "cat" into the text box and see a wonderful variety of landscapes that look like they are straight from the surface of some sort of plane of hell. Or furry flowers. Or fruit with eyes. I was expecting "lion in the Savannah" type pics, not "a visualisation of my DND group's "baleful polymorph" spell"...
Yes, whatever I type I get some abomination in the general setting I asked for (hills, mountains, ocean). And I agree that the UX is also horrible. I want the web app they are showing in the video, not the one they have online.
What if this is done on purpose? If they make the UX too easy, then, since this tech is so impressive, it will be shared across the general population and the service will quickly get overloaded. This way only those who truly have the patience to fiddle with the controls can get it working.
I have no idea what that means, but the fact that it both told me to enter a text prompt and actually let me do it, while not being in whatever magical mode it needed to be in to actually use the text prompt, is another point to add to my rant above.
Alright, I've uttered the incantation for it to do the thing. I still don't get it. [0] https://imgur.com/4zbaiH0
I also tried another example prompt, which bore a striking resemblance to the previous result. I don't know if it's persisting the result (it shouldn't be; I didn't click the re-use image button), but the strange life-raft-looking artifact is very persistent. [1] https://imgur.com/KTCM4xH
Yeah, the text-to-image seems to be highly dependent on whether the generator knows how to generate the specific objects the text model thinks should be in the image. I got much more consistent results using the semantic segmentation drawing as input.
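For the curious: "using the segmentation drawing as input" roughly means the generator is conditioned on a per-pixel label map rather than on free-form text, so it only has to render classes it was actually trained on. Below is a toy, self-contained Python sketch of that interface. It is not NVIDIA's code; NUM_CLASSES, the class IDs, and ToyConditionalGenerator are all invented for illustration.

    # Toy sketch (not NVIDIA's code): a conditional generator that takes a
    # per-pixel label map plus a latent "style" vector and returns an image.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    NUM_CLASSES = 182               # COCO-Stuff-sized label set (assumption)
    SKY, SEA, ROCK = 105, 123, 140  # made-up class IDs for illustration


    class ToyConditionalGenerator(nn.Module):
        """Stand-in for a SPADE-style generator: label map in, RGB image out."""

        def __init__(self, num_classes: int, z_dim: int = 256):
            super().__init__()
            self.fc = nn.Linear(z_dim, 64 * 8 * 8)               # noise -> coarse features
            self.mod = nn.Conv2d(num_classes, 64, 3, padding=1)  # label-map features
            self.to_rgb = nn.Conv2d(64, 3, 3, padding=1)

        def forward(self, one_hot_labels, z):
            h = self.fc(z).view(-1, 64, 8, 8)
            h = F.interpolate(h, size=one_hot_labels.shape[-2:], mode="nearest")
            h = torch.relu(h + self.mod(one_hot_labels))         # inject the spatial layout
            return torch.tanh(self.to_rgb(h))


    # Draw a crude "segmentation drawing": sky on top, sea below, a rock blob.
    label_map = torch.full((1, 256, 256), SEA, dtype=torch.long)
    label_map[:, :128, :] = SKY
    label_map[:, 180:230, 60:120] = ROCK

    one_hot = F.one_hot(label_map, NUM_CLASSES).permute(0, 3, 1, 2).float()
    z = torch.randn(1, 256)                      # latent code controlling the "style"

    gen = ToyConditionalGenerator(NUM_CLASSES)   # untrained toy weights
    with torch.no_grad():
        image = gen(one_hot, z)                  # -> (1, 3, 256, 256) RGB tensor

With random weights this only produces noise, of course; the point is the shape of the interface: the layout comes from the label map you draw, the style from the latent vector.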
Yes, but the selection is lost if you press Enter out of habit. I had the same frustration until I realized what had happened, which turned the frustration into anger :)
Yeah, Hacker News is almost universally critical. Even his comment is needlessly critical. Nvidia isn't positioning this as a polished consumer app. It's a demo! The URL is https://www.nvidia.com/en-us/research/ai-demos/
For a research demo the UI is extremely polished. It even has an interactive tutorial!! Some people...
Your parent comment didn't mean that this was a good UI for human-computer interaction research, he meant that it was a good UI considering it was probably just a quick demo built by the scientists who did the research to showcase their work.
I work in a research group and most of my colleagues would have a hard time making an interface like this one, no matter how confusing it is. They are ML/DL scientists and usually have had absolutely zero exposure to frontend development.
Yes, not every researcher is good at frontend development. But my research group wouldn't promote such a broken UI until someone fixed it at least a little bit.
> If I'm meant to be impressed or fearful of "big data deepfake AI," this isn't it.
Let me put my tinfoil hat on. Maybe the idea is to normalize it and make it less scary. Look at the press releases. The general response was not a horrified 'omg, this tech has gone berserk', but 'oh, it's a benign lil biddy'.
A barrage of obtrusive prompts. When it wanted me to enter a phrase I typed "sunset over a field" and I, too, got a picture of the Milky Way galaxy, followed by more prompts.
I then closed the tab and came here to read the comments.
I'll say something constructive. While the text-to-image functionality is terrible, the generation from arbitrarily drawn segmentation maps is kind of fun. They should have just gone with the latter.
This is a demo piece not a paid app. Nvidia has zero interest in selling web apps. This is just a marketing stunt to show off their capabilities so they can sell hardware.
I am just getting very weird results that don't look at all like the ones in the demo video. Here, for example, is the image it gave me for "car in front of house": https://i.imgur.com/QdtrtCR.png
It's not just a matter of showcasing the best results. It's a matter of a night-and-day difference between the normal output and the one showcased. They are not even remotely close. I actually wonder why they decided to release this to the public in its current form. I was very impressed by the noise suppression app they released, so I expected something that delivers decent results.
I wonder if in a decade or so, large tech companies will unseat Disney, Warner Brothers, etc. as creators of animated movies.
While the results on Nvidia's website aren't too impressive, if you look at the history of animated movies [1], you can see how trivial and simplistic the early art and animation was.
Having had some experience doing research on GANs at university, I know them to be very powerful. What's very important to note is that the images generated by the model are truly "novel", i.e. completely fictitious. The generated images may be biased toward some of the training data, such as the color and texture of the water and rocks, but every image is a fantasy of the model. The only way the model can generate such realistic images is because it has a very good abstract internal representation of what oceans, waves, and rocks are.
Back at university, I pitched to my professor the idea of using GANs to generate "novel" images in real time while parents read bedtime stories to children. I didn't get very far. Glad to see some real progress in that direction.
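To make the "truly novel" point concrete, here is a minimal, self-contained GAN training step in PyTorch. It is a toy illustration of the general technique, not GauGAN2's actual architecture; the layer sizes, optimizers, and the random real_batch stand-in are arbitrary assumptions.

    # Toy GAN: the generator G maps random noise to images and never copies
    # training data; the discriminator D learns to tell real from generated.
    import torch
    import torch.nn as nn

    Z_DIM, IMG_DIM = 64, 28 * 28   # tiny sizes, purely for illustration

    G = nn.Sequential(nn.Linear(Z_DIM, 256), nn.ReLU(),
                      nn.Linear(256, IMG_DIM), nn.Tanh())
    D = nn.Sequential(nn.Linear(IMG_DIM, 256), nn.LeakyReLU(0.2),
                      nn.Linear(256, 1))

    opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
    opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
    bce = nn.BCEWithLogitsLoss()

    real_batch = torch.rand(32, IMG_DIM) * 2 - 1   # stand-in for real images

    # 1) Discriminator step: push real images toward 1, generated toward 0.
    fake = G(torch.randn(32, Z_DIM)).detach()
    d_loss = bce(D(real_batch), torch.ones(32, 1)) + bce(D(fake), torch.zeros(32, 1))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # 2) Generator step: fool the discriminator into scoring fakes as real.
    fake = G(torch.randn(32, Z_DIM))
    g_loss = bce(D(fake), torch.ones(32, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()

    # Sampling after training: every fresh noise vector yields a new image;
    # nothing is retrieved from the training set.
    with torch.no_grad():
        novel_images = G(torch.randn(4, Z_DIM)).view(4, 28, 28)

The training data only shapes G's internal representation (the biases in color and texture the parent mentions); the output images are sampled from that representation, not looked up.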
Close, but it will actually turn out to be a very small company (possibly in less than a decade).
Hollywood has little awareness of just how much danger the legacy version of their industry is in.
ML generated assets are slowly creeping towards reality and at the same time doing 3D dev is 100-1000x easier than it was just a few years ago. It's now possible to do for free in many cases as well.
1. Just close the tutorial
2. Scroll down to the ToS checkbox and check that
3. At the top, in the "input utilization" row, make sure only "text" is checked
4. Enter your text (use only landscape terms)
5. Press the right-arrow button (the arrow inside a rectangle) located below the text input.
6. Maybe zoom out a bit, because the result will be the image on the right, which for me was out of view by default. I had to zoom to 50% to see the whole UI.
Also, definitely don't hit Enter like you do in every other form, because that seems to clear your input, swap the "input utilisation" back to segmentation only, and sometimes also uncheck the ToS checkbox.
I finally got through to the demo via three links, and it's so busted in so many ways for me that I give up. Maybe it's stupid to try with my old netbook, but I get no indication of whether I need a fancy graphics card for it to work or whether it's running on my end. Anyway:
- The screen zooms around disorientingly during the tutorial, and when I get to "congratulations, you made your first image" there's nothing there.
- Exiting the tutorial, checking 'text' instead of 'segmentation' just immediately switches back after entry.
- The whole site is a fixed width which is wider than my screen.
- A red alert checkbox at the bottom confuses me about whether that's why it's not working.
Etc.
> Exiting the tutorial, checking 'text' instead of 'segmentation' just immediately switches back after entry
That tripped me up too. After typing in your text, don't hit enter or anything similar, just click or tap the button with the right arrow (or anything to the right of that button, effects vary).
I entered "mountains" and "mountains and lake" and only got pictures of what looked like blurred galaxies and stars. Clicking any of the style buttons got me colored/tinted pictures of stars. Is it broken?
The comments are overwhelmingly critical of the user interface, which is undoubtedly the weak part of this release, but I was still able to get some very impressive results.
I have found the best results come from uploading an image, then using the demo tools to get a segmentation map and sketch lines, then editing those as you desire. Changing the styling at the end also makes a big difference!
After trying for 30 minutes I kind of understood some stuff. But the UI is a legit anxiety inducer. I hope they can fix the UI to make it fun. Currently it felt like using '80s DOS graphics software, with so much manual input.
I'm trying to do everything that the tutorial and other commenters have suggested, but when I click the arrow button it just waits for a few moments and nothing is generated. Am I missing something?
Are there any open-source models that can do a similar type of landscape generation? I would really like to look at the code and try to understand how these things are built...
Nvidia should visualize a periodic 24-hour 3D landscape sweep of the chatter on social media platforms, for metaverse dive-throughs and interactive engagement.
I hear they have a miraculous new AI tool that magically determines your sexual desires then uses lasers to induce those feelings through your eyeballs with no contact necessary! Coincidentally demonstrated at this very same URL!
Even then, I don't think I'd care enough to fight through the layers of bullshit here.