Using Stable Diffusion's img2img on some old Sierra titles (sciprogramming.com)
226 points by Karuma on Sept 5, 2022 | 68 comments



Going to have to be the naysayer here. First, I'll say that the simple fact that Stable Diffusion produces anything coherent is incredible. I'm blown away by the tech. However, my honest opinion of the results showcased in this article is not positive. Many of the result images contain bizarre distortions or dreamlike artifacts that severely disrupt the flow of the image, especially the first one. It's clear that there's something like a person with long hair standing in front of a peak shrouded in clouds, but it only takes a moment or two to see the flaws in the output. I wonder if hyperparameter tuning would help.

Again, I think the results are impressive in their own right. But they seem impractical on account of the flaws in the details.


Weird, until I read your comment I was blown away. Then I had another proper look at the first image, and in many ways I had to turn off my brain's amazing ‘upscaling’ ability.

My brain had upscaled that human-like blob into a woman spinning around with a sword so that her hair covered her face.

Looking closely. None of that is there really, just a suggestion of it. And that is enough.

The more I learn about vision and sight the less I’m sure that we see reality.


The key element missing from the generated images is an understanding of form. In recognizing objects we generally rely on shape first (strong outlines and silhouettes, blobs of color, and so forth); only afterwards does our brain start to see forms in perspective.

When learning to draw, I gradually got a sense that what is really going on is that I'm gaining a more conscious command of different shapes, just like when I learned to write letters; but instead of abstract marks, I'm learning the shape of hands, arms, etc., and from various perspectives. And so if I study a lot of the same shapes in a topic like anatomy or wildlife, I can replicate them from memory with fairly accurate proportions.

The difference between me and the AI, in its current form, is that the AI continues along the path of being an extremely smart shape recognizer and reproducer (as it should be, given that some of the first applications of the tech were text recognition). So it can output a lot of details I can't (without lots of reference) and blend in stylistic ideas I'm unaware of. But I, while having a much more limited visual library, can mix in more details of the perspective, how anatomy and clothing work, and other kinds of logic. I can push the shapes to convey specific action and expression, design lighting situations, and so on.

AI's ability to do it all in one step gives it a result that is very "savant", because it doesn't know what is and isn't a coherent image, but it has total mastery at making the shapes and applying rendering. Some of the things I've seen it do to prompts are wildly creative in interpretation as a result. It's a good tool.


Those art ML models indeed operate on the wrong premise that the input and output images are entirely raster fields; most of them should actually be considered curve fields, with the curves internally extrapolated into complete colour- or texture-filled 3D shapes by what's known as gestalt principles*, volume estimation from shading, etc. Only the fill textures should be raster.

The current approach creates a huge limitation: input/output images are small (on the order of 512x512), and there is a whole load of texture-turning-into-shape (and vice versa) artifacts.

It could possibly be overcome with a paradigm shift, though.

* https://encyclopedia.pub/entry/history/show/399

* http://www.scholarpedia.org/article/Gestalt_principles


Artists have for the longest time used our brain's ability to upscale. Many paintings, even ones that seem super detailed like those by James Gurney in his Dinotopia series, will have blobs in the background. Based on silhouette and shape, our brain will recognize an extraordinary amount of detail that isn't actually there: detail such as the type of clothing and the action of a person. But if you look closer, it's a rectangular blob with a triangular blob within it to indicate clothes.

The difference between AI art and actual human art is the level of intention one can detect in it. When I look at human art I absolutely marvel at the cleverness of the artist to convey something that still looks like what I was imagining even when I look closer at it.

With AI art, the blob presents more confusion the closer I look at it.

I’ve been telling anyone who will listen that AI art isn’t stealing much lunch when it comes to professional art. But it may very well be a powerful tool for artists to speed up their workflows, and artists who refuse to use the tool risk being left behind the same way some artists got left behind in the illustration industry once digital tools showed up.


> I’ve been telling anyone who will listen that AI art isn’t stealing much lunch when it comes to professional art.

Yet. These models have been out for only a matter of months. Just last year the state of the art was DALL-E 1, which is a toy in comparison [0] to DALL-E 2 / Imagen / SD.

Making predictions is perilous but it would be surprising to me if computers did not have fully super-human artistic ability in the next 5 years.

[0] https://openai.com/blog/dall-e/


I mean, for sure. I’m not going to make any predictions about the future when the field is so young; my comment only relates to the current generation. Honestly though, art isn’t the field to look at for seeing when the jump is coming. It’s in AI actually being able to recognize relationships between things: basic stuff like looking at enough pictures of horses and recognizing what a leg is and how many legs a normal horse will have. Even Stable Diffusion, which has some of the best generation, will still give me five legs or two legs coming out of the same side. These kinds of images are a boon to artists, who will need to do all the final corrections.

Relationships between things are complicated. Someone resting their face on a fence is going to have a huge number of effects on the deformations of the face, especially the eyes and hair, depending on how they are resting. It’s not enough for AI to have seen enough pictures of faces on fences to be able to apply that in an image. It needs to understand what pressure and gravity are doing to the underlying structures. That’s how human artists study, at least. It’s why they can take that lesson and apply it to what they’ve learned about how skin behaves depending on the age of a person. They aren’t copying. They are solving problems by thinking about the muscles underneath and how they change depending on any number of factors.

None of this even touches on lighting and colours.

If there’s one prediction of the future I’m willing to make, it’s that until research progresses on teaching computers to apply actual knowledge, AI within the creative space will remain assistive instead of replacing.


We see reality in some sense - light of various wavelengths reflects off physical objects and enters our eyes. However, our processing and interpretation of that input is what can be more subjective, because every person's brain is going to process it differently based upon experience (especially early life experience, when our brains are the most plastic). Also, some people have sensory differences (such as color blindness) that can influence the processing of the light that enters our eye.

Objective reality exists, for some definition of "exists" - there is physical matter present, with properties enabling some or all wavelengths of light to reflect (and similar for other senses like hearing). However, if we viewed reality devoid of the subjective processing, we'd "see" everything, but key existential concepts such as object permanence would not be possible, as that requires our brain be able to process and recognize an object in order to identify what the object is in the first place, to even be able to remember what it is. Not entirely unlike the iterative process of modern machine learning.


>Looking closely. None of that is there really, just a suggestion of it.

In contrast to the photorealistic pixel art?


Exactly, these complaints are insane given that the source is 8 colour pixel art.


What people are doing right now is taking these wonky outputs and putting them back into the system after making some crude edits to them.

As you repeat that cycle things are going to get better, but it does require human labor at that point.

It's not a complete magic tool, but once you put a little bit of fixing effort in you can get a long way with very little.
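
For anyone curious what that loop looks like in code: a rough, hypothetical sketch using the Hugging Face diffusers img2img pipeline (not the workflow from the article; the file names and prompt are made up, and the image argument was called init_image in early diffusers releases):

  import torch
  from PIL import Image
  from diffusers import StableDiffusionImg2ImgPipeline

  # Load SD v1.4 in half precision to keep VRAM use down.
  pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
      "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
  ).to("cuda")

  image = Image.open("rough_draft.png").convert("RGB").resize((512, 512))
  for i in range(3):
      # In the workflow described above, a human would hand-edit the
      # intermediate result between passes; here we simply feed the previous
      # output back in at low strength so the composition is preserved
      # while detail gets refined.
      image = pipe(prompt="detailed fantasy landscape, matte painting",
                   image=image, strength=0.4, guidance_scale=7.5).images[0]
      image.save(f"pass_{i}.png")

The point is just that each pass starts from the previous result rather than from pure noise, so manual fixes made in between aren't thrown away.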


Are you naysaying Stable Diffusion or the idea of generative models for art in general?

It's hard to look at what's happened in the last few months and not think of it as akin to the invention of the steam engine, but for art.

It's not perfect, just as early machines had many flaws, were wildly inefficient, and produced irregular output. But the innovation that followed created the industrial revolution.


> or the idea of generative models for art in general?

I'm not sure how you could read my comment and possibly come to this conclusion.


> but for art.

computer generated art *


It's interesting to see how much this has improved in the space of weeks; imagine what this will look like in a year, in 10 years, in 30 years.

This is the beginning of something pretty big, I think.


What's missing is the charm. In general, AI-generated visual content tends to have a very sterile look to it. The only image in said thread that looks even remotely decent to me is the one of a canyon/bridge. The rest are what I'd classify as voids.


>AI-generated visual content tends to have a very sterile look to it

I think it's because the prompts are often sterile as well. You have to add stuff like "matte painting" and "dream", just like in the first example in DreamStudio. "Very detailed landscape with xy" also works fine. Avoid prompts like "digital art" or "render".


I think the weirdness is most obvious in the cliff overlooking a beach and in the woman's impossible object hand.

All of the output here is cool and impressive for sure, but good art it is not.


The inaccuracy or weirdness of the resulting images has no bearing on how good or bad it is as art. Art has nothing to do with that. I would argue this is a shitty tech demo more than anything else.

I do not mean to discount the creator as it’s cool regardless, it just doesn’t really have anything to do with art. They’re literally just running some old computer images through a technology. That’s it.

There will probably be good art conceived by good artists that uses this style and these techniques at some point, though.


Very similar stuff was said about the camera. Vermeer even essentially used one to win the realism competition in painting (of course later becoming Ayn Rand's favorite painter since he was the most "objective").


This is the opening act—of course the tech has all sorts of issues.

What I’ve found more amazing than the tech is how rapidly and intensively a community has formed around Stable Diffusion and all the stuff they’re doing with it. I’m more confident than not that these issues will get worked out.

This whole thing has been a breath of fresh air. We can run this stuff on high-end workstations and we’re not beholden to tech giants for interesting creative ML applications.

We’re about to see a thousand flowers bloom.


The inconsistent and often incoherent shadows and lighting effects are immediately annoying in all generated images if you've spent any time working with visual media.


Compared to low res Sierra graphics, what's been shown here is nothing short of astounding, if you ask me. Look at the skull! It's a masterpiece.


> Look at the skull! It's a masterpiece.

There was also a time when I (unironically) classified McDonald's food as a delicacy (at some point when I was younger than ten).


It's still good and tens of millions enjoy it every day


I didn't ever classify that hot garbage as a delicacy.


> I didn't ever classify that hot garbage as a delicacy.

Never said you did, but there are, ahem, parallels.


Your palate is delayed here, like it was when you thought McDonald's was a delicacy. If you develop it, let's revisit?

:D


Here's more, characters from old DOS games:

https://old.reddit.com/r/StableDiffusion/comments/x2wwxx/usi...

and

https://old.reddit.com/r/StableDiffusion/comments/x5qrje/usi...

I love what the AI did with the hot tub girl from Leisure Suit Larry!


I'm mostly impressed by the guy from Dune 2, he looks like a real person!


This is really timely - I just started to replay some of my favorite Sierra titles, the "Hero's Quest" games.

1 and 2 (unofficially) have VGA ports that updated the graphics. So, I may have to run some screenshots through to get an even newer backdrop.


Awesome, but it’s totally missing the atmosphere in the originals imo


I'd be interested to know the parameters used, especially prompt_strength.

The correspondence to the original image is not especially high: the Leisure Suit Larry image, for example, enhances the original colours of the sea in a nicely realistic way, but all the foreground detail is essentially reinvented from scratch, including some very obvious omissions. In some of them the changes to perspective and more lifelike skull/canyons etc might improve on the original image, but it also flips even pretty basic stuff like which shoulder the woman's hand is placed on (and yes, once you look at that hand closely, the fingers SD has had to add in are all wrong...)

Ideally for this sort of use case you'd want high fidelity to the geometry of the original image but less fidelity to the palette (use more than 256 colours and naturalistic or artistic textures rather than lines and pixel dithering), but I'm not sure SD can manage that at the moment.
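
For what it's worth, here's a hypothetical illustration of that trade-off with the Hugging Face diffusers port, where the knob is just called strength (hosted APIs expose it under names like prompt_strength); lower values keep the original geometry, higher values let SD repaint most of the detail. File names and prompt are invented:

  import torch
  from PIL import Image
  from diffusers import StableDiffusionImg2ImgPipeline

  pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
      "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
  ).to("cuda")

  # A scaled-up game screenshot as the starting point.
  init = Image.open("lsl_beach_screenshot.png").convert("RGB").resize((512, 512))
  for strength in (0.3, 0.6, 0.9):
      # ~0.3 keeps the layout almost intact; ~0.9 is close to generating
      # from scratch, with the screenshot as a loose hint.
      out = pipe(prompt="photorealistic tropical beach at sunset, detailed",
                 image=init, strength=strength, guidance_scale=7.5).images[0]
      out.save(f"beach_strength_{strength}.png")

As noted above, this single scalar blends geometric and palette fidelity together, so "keep the geometry, replace the palette" isn't something the knob alone can express.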


The before and after are great.

There is an art to conveying a feeling with limited resources. That's what makes early computer game images (the good ones... because there were plenty of bad ones) so special.

The same could be said even more strongly for words. I'm not a writer and I don't remember everything, but someone famous once said something about eliminating everything non-essential from writing to make it better. That's what makes a writer really great.

Even so, the AI-upscaled versions of the original art are impressive.

I said it before on this topic, and I'll repeat it. We will someday (soonish) have games where the art is generated in real time, unique for each player, based on good inputs. And it will be awesome. Every play and every experience will be relatively unique, but most or all of the plays will be excellent. That's an exciting prospect.


Why stop at the art? We'll have unique storylines and characters as well. Possibly even unique gameplay, although at that point we would pretty much be close to some sort of AGI anyway.


Awesome!

I thought about this in a slightly different way: D2 remastered has a great switch back feature.

You could easily train a network with tons and tons of D2 old vs. new pairs and just use it to auto-upscale/reimagine old D2.

Imagine rendering an old D2 video in 4K D2 remastered style.


I found the intro of one of the King's Quest games run through SD a few days ago, but can't seem to find it now. It wasn't that impressive, since I guess there wasn't much fine-tuning going on, but I liked the general idea. It had a few funny hiccups, but I'd have expected much more erratic behavior, because there is no information shared between frames (I guess). But maybe this concept can be improved upon?


Was mucking around with SD and one of my first thoughts was making a "Where in Stable Diffusion is Carmen Sandiego".


Wonder if one day we can do realtime video conversion with this tool on a phone using the camera. It would be mind-blowing to see it live in action, changing your backyard into any style you like. Maybe even with VR. Ultra trippy experience. I guess we have to wait 5 to 10 years.


Something like this, but with an automated flow?: https://twitter.com/karenxcheng/status/1564635828436885504


This is cool, but img2img likely couldn't be easily used to make images in such games, because every scene would look very different. The problem is that you can't maintain the exact look between images. It could be used for artistic ideas and raw material though.


Is there any service offering user-friendly access to open source DL models? Paid is fine.


If you have over 4GB of VRAM you can run it locally. I've been experimenting recently and find that even with 10GB of VRAM I can only get 256x256 resolution images. I have a Dockerfile I can share that packages up the install process and removes censorship, if anyone is interested. I find the censoring is extremely conservative.


Huh, I've been using the docker container by cmd2 and been doing 512x512 just fine with 8GB. Are you on Windows by any chance?


Yes, Windows with an RTX 2070 Super. The logs say the app is trying to allocate just a few hundred MB more than what I have. I'm reasonably happy with 256x256 for now, just messing around.
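
A hypothetical workaround for the VRAM ceiling (separate from the CompVis scripts and Dockerfile discussed here): with the Hugging Face diffusers library you can load the weights in half precision and enable attention slicing, which should fit 512x512 into well under 8GB:

  import torch
  from diffusers import StableDiffusionPipeline

  # Half-precision weights roughly halve VRAM use; attention slicing trades
  # a little speed for a much smaller peak memory footprint.
  # (Downloading the weights may require a Hugging Face account and
  # accepting the model license.)
  pipe = StableDiffusionPipeline.from_pretrained(
      "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
  ).to("cuda")
  pipe.enable_attention_slicing()

  image = pipe("a horse wearing a top hat", height=512, width=512).images[0]
  image.save("horse_512.png")

Just a sketch; mileage will vary with driver and library versions.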


Please do share. I did the same and have it working, but I'm having trouble deploying it to a prod GPU server; still figuring it out.


Here you go: https://gist.github.com/gvbl/9231406c54e7c9fd37abdfa6c697fe5...

You can run it with a command like this (I'm on Windows), assuming you build the image and tag it "knightley":

  docker run -it \
    -v <model file path>:/stable-diffusion/models/ldm/stable-diffusion-v1/model.ckpt \
    -v <outputs folder>:/stable-diffusion/outputs \
    -v <inputs folder>:/stable-diffusion/inputs \
    -v <cache folder>:/root/.cache \
    --gpus all knightley \
    python /stable-diffusion/scripts/txt2img.py --W 256 --H 256 --prompt "a horse wearing a top hat"


http://beta.dreamstudio.ai is the official paid Stable Diffusion webUI.


I liked it until I noticed that it was costing me $10 a day. I'm also not sure if they support img2img. I have used Hugging Face and then Replicate for it (the NSFW filter is very annoying).


Unless you’re looking at something else, it’s not per day, it’s a credit system that amounts to about 1 cent per image. How many images do you plan to generate a day?


Yes, it's per image, but for some reason I easily rushed through the 1000 generations in one day.


Well you’ve gotta either run it locally or pay for it. The GPUs this stuff runs on are too expensive to offer unlimited use for free.


I'm not against paying, but it should be somewhat reasonable. To put things into perspective: Colab Pro only costs about $10/month and you will probably be able to generate at about the same speed.


For uses like that Colab costs Google more than they earn. It would make no sense to base a business around that model.


I did too, but I bought another 1000 generations and haven't run out yet. Many days I don't use it at all. I like it better than a monthly subscription like MidJourney.


Just use Google Colab, it's free indefinitely because Google is paying for it.


Midjourney uses Stable Diffusion now too in their beta version.


The stable diffusion repo itself is free, and you can get the model from hugging face, also for free. Runs on my 1070 easily.


Can someone please do this for some Lucasarts classics?


They already did for one :)


Yeah, a little peeved that they called Maniac Mansion a Sierra game


Yeah, it’s amazing that things like Stable Diffusion can make coherent looking images. But why? Why convert iconic EGA and VGA *art* into algorithmic representations? To me, it doesn’t improve on the originals in any way. I would never play a game with the graphics replaced by these. To me, it is the constraints of older technologies that led artists to create imaginative works that transcended the materials available (big pixels with a limited palette) and helped create worlds where your own imagination could run wild. Replacing these images with photorealistic Thomas Kinkade-esque treacle is an abomination.


Why do people always ask “why?” on these things? Not everything has to be “useful” or have commercial value. The original games aren’t going anywhere. People just enjoy doing this kind of stuff, pushing the technology to make interesting things.


I’m not criticizing it because it’s not useful. I’m criticizing it because it is not art, and it is anti-humanist.


"""At the 1983 Academy Awards, Oscar voters declined to nominate [Tron’s] pioneering special effects. Lisberger said it was because they felt using computers as an animation tool was cheating.""" - https://www.moviefone.com/2017/07/08/19-things-you-never-kne...


Who said it's art? It's new and cool and fascinating. That's all.

Contrary to what you wrote, these AI-generated ones aren't replacing the originals. The originals are still there.

Sorry, but I think you're reading way too much into this... it's a fun toy, it's not replacing game art and it's not an "abomination". It's cool and new. That's all.


Of course it is art. The thing that makes it art is the human thought process and intention, regardless of how the final result turns out.


Oh, and the pictures look like shit.



