Search over 5M+ Stable Diffusion images and prompts (lexica.art)
189 points by headalgorithm on Aug 26, 2022 | 57 comments



Stable Diffusion was finally enough to push me to install drivers for the NVIDIA 3060 in my laptop, which has sat completely unused (never powered on once I figured out how to not power it on!) since I got it (I’d have preferred no dGPU at the time, but wanted other features of the laptop that are just about never sold without a fancy dGPU for some reason). The requirements are pretty hefty for a casual layman like me, even though I know this is smaller and more accessible than just about everything in the past. I think I ended up at around 9GB downloaded (which will cost me almost $2 in concrete terms) and 23GB of disk space used (including things like nvidia-dkms, nvidia-utils, cuda and python-pytorch-opt-cuda; all the relevant Arch packages came to about 14GB).

I’m using https://github.com/basujindal/stable-diffusion to run it since I only have 6GB of VRAM.

I’m having fun. But I haven’t had much luck getting it to draw the quick brown fox jumping over the lazy dog; a few steps in there are often the shapes of two animals, but it is consistently reduced to just a fox after a bit more. Extensions to the prompt (like reminding it that there are two animals, and trying to separate the two concepts) can improve it a bit, but it still tends to forget there are two animals, or if it gets two, to draw two foxes, or a dog–fox hybrid and a lazy fox. I imagine I could vastly improve my results with img2img and giving it a basic sketch with placeholders for two distinct animals.

It also has a surprisingly poor idea of what an echidna is.
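For anyone else on a small GPU, here's a minimal sketch of doing the same thing via Hugging Face's diffusers library instead of the fork above (my assumption, not what the parent is running); the main low-VRAM tricks are fp16 weights and attention slicing:

    import torch
    from diffusers import StableDiffusionPipeline

    # fp16 weights roughly halve VRAM use; attention slicing trades a little
    # speed for a much smaller peak memory footprint (helps fit in ~6GB).
    pipe = StableDiffusionPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4",
        torch_dtype=torch.float16,
    ).to("cuda")
    pipe.enable_attention_slicing()

    prompt = "the quick brown fox jumping over the lazy dog"
    image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
    image.save("fox_and_dog.png")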


Is it okay to ask what your situation is that 9GB cost $2?


I live in Australia in a rural area where the best internet connection I can get is on the Optus cellular network (I have clear line of sight to a tower 400m away used by fewer than 200 people; it’s actually the best non-commercial supply I’ve ever had for both speed and reliability, typically around 45/15Mbps five years ago when I moved to the area, with less than one observed outage per annum and each under an hour, though NBN fibre, where available, should generally be better these days). Actually, this is cheaper than it would often be, because it depends on what supplier I’m with at the time, which often depends on available introductory offers. My current arrangement amounts to 20¢/GB, the cheapest I’ve ever had (it’s interesting looking back even four years, when the best available was $0.90–$1.10/GB). When I finish the current one in a couple of months, it looks like I’ll switch again and be back to the ballpark I’ve had before, around 30¢/GB. Skip introductory offers, and you’re mostly at $0.60–$1.00/GB, or a bit lower with circles.life, but I refuse to use them again because of bad service and shameless illegal conduct that they refuse to acknowledge or do anything about (like sending third-party advertising text messages from CirclesLife, which has been illegal in the absence of explicit consent since the Spam Act 2003).


Being Australian?


<rant> i.e. Being a Murdoch cash cow milked by sycophantic weasels known as the Liberal Party.

Onion chump Abbott and his frenemy Turncoat were the main beneficiaries of forcing Murdoch's and Telstra's decrepit copper/coax quagmire into what was originally designed as a full FttP rollout, which was already in progress and 5% complete when they came into power and promptly halted everything to please their puppetmaster.

Next they promised to halve the costs by delivering a slow, copper-throttled NBN. Except they blew out the budget by a factor of four, and it's still climbing... already over double the cost of the originally planned FttP rollout. So Murdoch got richer, we get only 5% of the speed we should have gotten... and we now also pay double the monthly fees we would have had if those weasels had just kept their greasy pork-barrelling mitts off our nation-building tax dollars. </rant>


Mobile data usage costs?


I've been using this recently; it works great if you don't know what you want to make.

Also look into CLIP Interrogator; it basically does image-to-text, turning an image you like into what its prompt could be. It won't give you everything, though, just the main description of the content.
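If you'd rather script it than use the notebook, a rough sketch with the pip-installable clip-interrogator package (an assumption on my part; the original tool is distributed as a Colab notebook) looks like:

    from PIL import Image
    from clip_interrogator import Config, Interrogator

    # Uses BLIP for a base caption plus CLIP to rank style/artist modifiers.
    ci = Interrogator(Config(clip_model_name="ViT-L-14/openai"))

    image = Image.open("something_you_like.png").convert("RGB")
    print(ci.interrogate(image))  # prints a candidate prompt for the image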


Thanks for the interrogator tip. It's so cool that it works in reverse.


Feels like shiny concept art for games is facing a moment similar to the one portrait painting faced in the late 19th century. Why pay someone to paint in this generic commercial style when you can get a meaningful automatic result at the push of a button?

Most other styles of illustration seem safer for the moment because they rely more on the illustrator's personality. (I'm not talking about the kind of stuff you buy on Fiverr, but professional designers who mainly get work through their networks.)


What styles of illustration would you say are safe?


The Jack Chick Tract somehow makes me want to use regex to parse html:

https://lexica.art/prompt/64a384bd-d1b2-4f79-8921-f71737c70d...


The way it mangles faces is actually super creepy.

These shots from an exit-less, claustrophobic NYC subway with mangled faceless things are the stuff of nightmares:

https://lexica.art/?q=new+york+subway


NYC in the ‘80s was a hard time. This is pretty much in line with my recollections…


I feel like there needs to be a model that fixes faces to clean this up. Humans are so attuned to faces that I can imagine it would take a specialized model to render convincing faces. Maybe there could be a layer to identify and occlude existing pseudo-faces generated by Stable Diffusion and another model to populate the occlusion.


Many are doing exactly this with Stable Diffusion + GFPGAN (https://github.com/TencentARC/GFPGAN) as a post-processing model.
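Roughly, that post-processing step looks like this (a sketch based on GFPGAN's own inference script; GFPGANv1.3.pth is the pretrained weights file from the repo's releases page):

    import cv2
    from gfpgan import GFPGANer

    # Restore only the faces; leave the rest of the Stable Diffusion output alone.
    restorer = GFPGANer(
        model_path="GFPGANv1.3.pth",  # pretrained weights from the GFPGAN releases
        upscale=1,                    # keep the original resolution
        arch="clean",
        channel_multiplier=2,
        bg_upsampler=None,            # no background upscaling
    )

    img = cv2.imread("sd_output.png", cv2.IMREAD_COLOR)
    _, _, restored = restorer.enhance(
        img, has_aligned=False, only_center_face=False, paste_back=True
    )
    cv2.imwrite("sd_output_restored.png", restored)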


This is already done, e.g. in Majesty Diffusion: you run a further batch of iterations using a face-restoration GAN.


And hands. Try searching for ‘hand holding apple’, for instance.


I find it hilarious how many of these prompts are using "unreal engine 5" to get a good image.

There's a lot (or honestly maybe a small amount) of work to be done to improve these prompt interfaces. Raw projection of your queries into the embedding space is honestly pretty dumb. Like, it'd be nice if we could start by settling the embeddings into images that are "good".


There's a rating feature on the website that lets you rate the results. It's greyed out for me, so I'm not sure if it's a timed feature or a premium thing, but it's there.


Not this website. The prompt interface to the embedding model.

There's no reason one should have to spam "good"-sounding phrases like "high quality" into the prompt to get a good image. Direct embeddings of the prompt are stupid.


What people call "prompt engineering" is just knowing to do that.


Yeah, and what I'm saying is that this rapidly rising "skill" is just nonsense. This is not a reasonable way to interact with the embedding space. Hopefully, in the near future, we won't be doing prompt engineering at all.

This is like copy-pasting by highlighting text, clicking 'Edit', and scrolling down the menu to copy and paste, all with the mouse.


No, it is not. Making these models generate interesting images requires you to learn something that is almost like a new kind of language.

As soon as you start to automate this process, for example by adding some default attributes like "4k, 8k, hd" to every prompt, you introduce a huge amount of bias to the output and lose the freedom to get anything outside of those specifiers.

Sure, future iterations will have a better understanding of language input. But knowing exactly how to phrase your prompts will remain a skill, and eloquent writing is what gets you to the more interesting and appropriate results.

In part that's because using more esoteric language will automatically connect you to the specific subselection of the source material that was described with those less common words when the model was trained. Having an extensive vocabulary and knowing how to wield it is actually a huge boon in this particular field.

"Unreal Engine 5" is just a quick shortcut to output that is detailed, clean, often futuristic and usually looks impressive. But you can go a lot further, for example by manually subtracting weights. Teasing MidJourney with this prompt was entertaining:

clear view of a dense forest::5 plants::-.5 tree::-.5 trees::-.5 foliage::-.5 leaves::-.5 shrubs::-.5 bushes::-.5 blur::-.5 mist::-.5 winter::-.5

Btw, is anybody working on a "language florifier" model yet? I imagine writers would be interested. "Rewrite this story with more emotion and in the style of Kurt Vonnegut, cyberpunk".


Yes, it is stupid. Adding "4k" to every prompt introduces bias, yes. That doesn't mean learning the ins and outs of each phrase's bias is a reasonable idea. It's also not guaranteed to be a constant effect. It's great that you can become more skilled at prompts, but that doesn't make it a good interaction model. The interface is a tool and tools are important. That there are people who are great at typewriters doesn't mean typewriters are all that reasonable in the age of computers and word processors.

> But you can go a lot further, for example by manually subtracting weights. Teasing MidJourney with this prompt was entertaining:

This is an example of an improvement from basic prompts. It's still far from a good model. "Guess and check" is basically the worst UX one can create for a design process.

One should be able to specify content separately from style, and layer in stylistic choices in a clear hierarchy. Text is a good model for specifying content. It's a pretty shitty way to specify style. Style is something we could likely convey visually and with palette reference points.
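As a purely hypothetical sketch of what I mean (nothing like this exists in any current UI), the interface could keep content and style as separate, ordered fields and only flatten them into a prompt string at the very last step:

    from dataclasses import dataclass, field

    @dataclass
    class PromptSpec:
        content: str                               # what is in the scene
        styles: list = field(default_factory=list) # ordered style layers

        def to_prompt(self) -> str:
            # Flatten into a plain prompt only at the final step.
            return ", ".join([self.content] + self.styles)

    spec = PromptSpec(
        content="a brown fox jumping over a sleeping dog in a meadow",
        styles=["loose watercolor", "soft morning light"],
    )
    print(spec.to_prompt())
    # a brown fox jumping over a sleeping dog in a meadow, loose watercolor, soft morning light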


Do DALL-E 2 and Stable Diffusion models regularly get retrained? If so, as they get retrained using the outputs of the models scraped from sites like this, will we see some sort of mode collapse?

Is it reasonable for hobbyists to retrain these models with reduced or custom image sets or would that require a lot of money in compute?


For anyone else who has no idea what they're looking at, from https://duckduckgo.com/?q=stable+diffusion :

> Stable Diffusion is a text-to-image model [...] It is a breakthrough in speed and quality meaning that it can run on consumer GPUs.



What is the license for those pictures? Neither Lexica nor OpenArt talks about it.


Same license treatment as the training data. Edit: Nobody cares about licenses since they don't want to be asked about how they licensed the training data.



I’m thinking that if Greg Rutkowski can get himself removed from the AI prompts, half of this art will disappear overnight.


Here is his Twitter:

https://twitter.com/GrzegorzRutko14/with_replies

But he doesn't seem to comment on AI art.


He has given multiple live presentations on the Midjourney Discord server. He's quite happy that his work is helping lots of people make great new art.


Page with his work is here:

https://www.artstation.com/Rutkowski


He seems very popular in prompts. But who is he? He doesn't even have a Wikipedia page.


Isn't the data set lifted from Artstation for a lot of this stuff?

https://www.artstation.com/Rutkowski He's a fantasy concept artist, not the kind of artist who has a Wikipedia page.


I think it's just a meme from users seeing his name in other prompts. In many cases it's used in combination with other artists whose styles aren't at all similar; it doesn't make sense other than as users spamming it to get "good" results.


I was just wondering this: does including his name have the same effect as the 'trending on ArtStation' stuff?


Where is this coming from? Everything people submit is public?


> Hyperrealistic mixed media image of matt damon bald head resembles !!uncircumcised penis!!, stunning 3d render inspired art by istván sándorfi and greg rutkowski, perfect facial symmetry, realistic, highly detailed attributes and atmosphere, dim volumetric cinematic lighting, 8k octane extremely hyper-detailed render, post-processing, masterpiece

Why the penis prompt?



Because the title doesn't give the warning: the website is NSFW!


For the curious: it's not intentionally NSFW, but the filters aren't very good (if there are any), so you do occasionally see inappropriate images.


Try "tatters" for some truly horrifying imagery. That nsfw filter needs some work.


Is it possible to economically/efficiently run this on a 14" MBP? Or do you need an nvidia graphics card to actually run this thing?


Takes about 10-20 minutes to create 5 images on my 16" M1 Pro 32 GiB MacBook Pro. It takes around 1 minute on my desktop system using a 6 GiB VRAM RTX 3070 Ti.
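For reference, a minimal sketch of the Hugging Face diffusers route on Apple Silicon (an assumption about the setup, not necessarily what the poster runs); the main points are targeting the "mps" device and enabling attention slicing:

    import torch
    from diffusers import StableDiffusionPipeline

    # Default float32 weights; fp16 support on the MPS backend has been patchy.
    pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
    pipe = pipe.to("mps" if torch.backends.mps.is_available() else "cpu")
    pipe.enable_attention_slicing()  # keeps peak memory manageable on Macs

    image = pipe("a lighthouse at dusk, oil painting",
                 num_inference_steps=30).images[0]
    image.save("lighthouse.png")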


meta: this site is causing all sorts of graphical glitches when scrolling on a Pixel 6 running Android 13 with Firefox. There are flickering boxes filled with pixel junk between each item and in the header.


Could you take a screenshot or better yet — a screen recording? I’m not really sure what might be causing that.


Here's a screenshot https://imgur.com/a/gpStPID


Ah, that might be because of the backdrop-blur on the navigation bar. Weird that it’s causing graphical issues on your phone.


And yesterday we had the post on OpenArt: https://news.ycombinator.com/item?id=32586439


How do you submit new images and prompts?




Are there diffusion models trained on face datasets only?


Web demo for Stable Diffusion: https://huggingface.co/spaces/stabilityai/stable-diffusion

You can also run it in Colab (includes img2img): https://colab.research.google.com/drive/1NfgqublyT_MWtR5Csmr...

And there's a web UI for Stable Diffusion that runs locally (includes GFPGAN/Real-ESRGAN and a lot of other features): https://github.com/hlky/stable-diffusion-webui


Dupe:

Discover stable diffusion prompts with Lexica (lexica.art) https://news.ycombinator.com/item?id=32594107

Seems like the same idea as OpenArt: OpenArt: “Pinterest” for Dalle-2 images and prompts (openart.ai) https://news.ycombinator.com/item?id=32586439



