Stable Diffusion was finally enough to push me to install drivers for my laptop’s NVIDIA 3060, which has sat completely unused (never powered on, once I figured out how to keep it powered off!) since I got the machine (I’d have preferred no dGPU at the time, but wanted other features of the laptop that are just about never sold without a fancy dGPU for some reason). Pretty hefty requirements for a casual layman, even though I know this is smaller and more accessible than just about everything that came before. I think I ended up at around 9GB downloaded (which will cost me almost $2 in concrete terms) and 23GB of disk space used (including things like nvidia-dkms, nvidia-utils, cuda and python-pytorch-opt-cuda; all the relevant Arch packages came to about 14GB).
I’m having fun. But I haven’t had much luck getting it to draw the quick brown fox jumping over the lazy dog; a few steps in there are often the shapes of two animals, but it is consistently reduced to just a fox after a bit more. Extensions to the prompt (like reminding it that there are two animals, and trying to separate the two concepts) can improve it a bit, but it still tends to forget there are two animals, or if it gets two, to draw two foxes, or a dog–fox hybrid and a lazy fox. I imagine I could vastly improve my results with img2img and giving it a basic sketch with placeholders for two distinct animals.
It also has a surprisingly poor idea of what an echidna is.
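On the img2img idea: something along these lines (a minimal sketch using the Hugging Face diffusers API rather than the exact scripts I’m running; the model id, file name, and strength value are illustrative assumptions) is what I have in mind:

    # Minimal img2img sketch: start from a rough drawing with two clearly
    # separated animal blobs so the model keeps both of them.
    import torch
    from PIL import Image
    from diffusers import StableDiffusionImg2ImgPipeline

    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "CompVis/stable-diffusion-v1-4", torch_dtype=torch.float16
    ).to("cuda")

    # Hypothetical sketch file: an orange blob mid-leap over a grey-brown blob.
    init = Image.open("fox_over_dog_sketch.png").convert("RGB").resize((512, 512))

    out = pipe(
        prompt="a quick brown fox jumping over a lazy dog, two distinct animals",
        image=init,            # older diffusers versions call this init_image
        strength=0.6,          # lower = stay closer to the sketch's layout
        guidance_scale=7.5,
    ).images[0]
    out.save("fox_over_dog.png")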
I live in a rural area of Australia where the best internet connection I can get is on the Optus cellular network (I have clear line of sight to a tower 400m away used by fewer than 200 people; it’s actually the best non-commercial supply I’ve ever had for both speed and reliability, typically around 45/15Mbps when I moved to the area five years ago, with less than one observed outage per year, each under an hour, though NBN fibre, where available, should generally be able to beat that these days). Actually, this is cheaper than it would often be, because it depends on what supplier I’m with at the time, which often depends on available introductory offers. My current arrangement amounts to 20¢/GB, the cheapest I’ve ever had (it’s interesting looking back even four years, when the best available was $0.90–$1.10/GB). When I finish the current one in a couple of months, it looks like I’ll switch again and be back in the ballpark I’ve had before, around 30¢/GB. Skip introductory offers and you’re mostly at $0.60–$1.00/GB, or a bit lower with circles.life, but I refuse to use them again because of bad service and shameless illegal conduct that they refuse to acknowledge or do anything about (like sending third-party advertising text messages from CirclesLife, which has been illegal in the absence of explicit consent since the Spam Act 2003).
<rant>
i.e. being a Murdoch cash cow milked by the sycophantic weasels known as the Liberal Party.
Onion chump Abbott and his frenemy Turncoat were the main beneficiaries of forcing Murdoch's and Telstra's decrepit copper/coax quagmire into what was originally designed as a full FttP rollout, already in progress and about 5% complete when they came to power and promptly halted everything to please their puppetmaster.
Next they promised to halve the costs by delivering a slow, copper-throttled NBN. Except they blew the budget out by a factor of four and it's still climbing... already over double the cost of the originally planned FttP rollout. So Murdoch got richer, we get only 5% of the speed we should have gotten, and now we also pay double the monthly fees we would have had if those weasels had just kept their greasy pork-barrelling mitts off our nation-building tax dollars. </rant>
I've been using this recently; it works great if you don't know what you want to make.
Also look into CLIP Interrogator; it basically does image-to-text, turning an image you like into what its prompt could have been. It won't provide everything for you, though, just the main description of the content.
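If you want to run it locally, something like this should be close (assuming the pip-installable clip-interrogator package; the clip_model_name argument has changed between versions, so treat it as an assumption):

    # Reverse-engineer a plausible prompt from an existing image.
    from PIL import Image
    from clip_interrogator import Config, Interrogator

    ci = Interrogator(Config(clip_model_name="ViT-L-14/openai"))
    image = Image.open("reference.jpg").convert("RGB")
    print(ci.interrogate(image))  # main content description plus style modifiers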
Feels like shiny concept art for games is facing a moment similar to what portrait painting faced in the late 19th century. Why pay someone to paint in this generic commercial style when you can get a meaningful automatic result at the push of a button?
Most other styles of illustration seem safer for the moment because they rely more on the illustrator's personality. (I'm not talking about the kind of stuff you buy on Fiverr, but professional designers who mainly get work through their networks.)
I feel like there needs to be a model that fixes faces to clean this up. Humans are so attuned to faces that I can imagine it would take a specialized model to render convincing faces. Maybe there could be a layer to identify and occlude existing pseudo-faces generated by Stable Diffusion and another model to populate the occlusion.
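A rough sketch of that occlude-and-repaint idea (the Haar cascade here is just a stand-in for a proper face detector, and the inpainting model id is an assumption on my part):

    # Detect face regions, mask them out, and let an inpainting model
    # redraw only those pixels.
    import cv2
    import numpy as np
    from PIL import Image
    from diffusers import StableDiffusionInpaintPipeline

    img = cv2.imread("generated.png")
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

    # White pixels mark the regions the inpainting model is allowed to repaint.
    mask = np.zeros(img.shape[:2], dtype=np.uint8)
    for (x, y, w, h) in faces:
        mask[y:y + h, x:x + w] = 255

    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        "runwayml/stable-diffusion-inpainting").to("cuda")
    fixed = pipe(
        prompt="detailed, realistic human face",
        image=Image.open("generated.png").convert("RGB").resize((512, 512)),
        mask_image=Image.fromarray(mask).resize((512, 512)),
    ).images[0]
    fixed.save("face_fixed.png")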
I find it hilarious how many of these prompts are using "unreal engine 5" to get a good image.
There's a lot (or honestly maybe a small amount) of work to be done to improve these prompt interfaces. Raw projection of your queries into the embedding space is honestly pretty dumb. Like, it'd be nice if we could start by settling the embeddings into images that are "good".
There's a rating feature on the website that lets you rate the results. It's greyed out for me, so I'm not sure if it's a timed feature or a premium thing, but it's there.
Not this website. The prompt interface to the embedding model.
There's no reason one should have to spam "good" sounding phrases like "high quality" into the prompt to get a good image. Direct embeddings of the prompt are stupid.
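To make concrete what "direct embeddings" means here (a minimal sketch using the CLIP text encoder that Stable Diffusion v1 conditions on; the prompts are arbitrary examples): the whole prompt, quality spam included, just becomes one conditioning tensor, so tacking on "high quality, 4k" is just another nudge to that tensor.

    # The prompt is tokenised and pushed through CLIP's text encoder;
    # the resulting (1, 77, 768) tensor is all the image model ever sees.
    import torch
    from transformers import CLIPTokenizer, CLIPTextModel

    tok = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
    enc = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

    def embed(prompt):
        ids = tok(prompt, padding="max_length", max_length=77,
                  truncation=True, return_tensors="pt").input_ids
        with torch.no_grad():
            return enc(ids).last_hidden_state

    plain = embed("a portrait of an astronaut")
    boosted = embed("a portrait of an astronaut, high quality, 4k, 8k, hd")
    # How much does the quality spam actually shift the conditioning?
    print(torch.nn.functional.cosine_similarity(plain, boosted, dim=-1).mean())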
Yeah, and what I'm saying is that this rapidly rising "skill" is just nonsense. This is not a reasonable way to interact with the embedding space. Hopefully we won't be doing prompt engineering at all in the "near" future.
This is like copy-pasting by highlighting, clicking 'Edit', and scrolling down to Copy/Paste, all with the mouse.
No, it is not. Making these models generate interesting images requires you to learn something that is almost like a new kind of language.
As soon as you start to automate this process, for example by adding some default attributes like "4k, 8k, hd" to every prompt, you introduce a huge amount of bias to the output and lose the freedom to get anything outside of those specifiers.
Sure, future iterations will have a better understanding of language input. But knowing exactly how to phrase your prompts will always be a skill that requires eloquent writing to get to the more interesting and appropriate results.
In part that's because using more esoteric language will automatically connect you to a specific subselection of the source material that was described using those more uncommon words when the model was trained. Having an extensive vocabulary and knowing how to wield it is actually a huge boon in this particular field.
"Unreal Engine 5" is just a quick shortcut to output that is detailed, clean, often futuristic and usually looks impressive. But you can go a lot further, for example by manually subtracting weights. Teasing MidJourney with this prompt was entertaining:
clear view of a dense forest::5 plants::-.5 tree::-.5 trees::-.5 foliage::-.5 leaves::-.5 shrubs::-.5 bushes::-.5 blur::-.5 mist::-.5 winter::-.5
Btw, is anybody working on a "language florifier" model yet? I imagine writers would be interested. "Rewrite this story with more emotion and in the style of Kurt Vonnegut, cyberpunk".
Yes, it is stupid. Adding 4k to every prompt introduces bias, yes. That doesn't mean learning the ins and outs of each phrase's bias is a reasonable idea. It's also not guaranteed to be a constant effect. It's great that you can become more skilled at prompts; that doesn't make it a good interaction model. The interface is a tool, and tools are important. That there are people who are great at typewriters doesn't mean typewriters are all that reasonable in the age of computers and word processors.
> But you can go a lot further, for example by manually subtracting weights. Teasing MidJourney with this prompt was entertaining:
This is an example of an improvement from basic prompts. It's still far from a good model. "Guess and check" is basically the worst UX one can create for a design process.
One should be able to specify content separately from style, and layer in stylistic choices in a clear hierarchy. Text is a good model for specifying content. It's a pretty shitty way to specify style. Style is something we could likely convey visually and with palette reference points.
Do DALL-E 2 and Stable Diffusion models regularly get retrained? If so, as they get retrained on model outputs scraped from sites like this, will we see some sort of mode collapse?
Is it reasonable for hobbyists to retrain these models with reduced or custom image sets or would that require a lot of money in compute?
Same license treatment as the training data. Edit: Nobody cares about licenses since they don't want to be asked about how they licensed the training data.
He has given multiple live presentations on the Midjourney Discord server. He's quite happy that his work is helping lots of people make great new art.
I think it's just a meme from users seeing his name in other prompts. In many cases it's used in combination with other artists whose styles aren't at all similar; it doesn't make sense other than as users spamming it to get "good" results.
> Hyperrealistic mixed media image of matt damon bald head resembles !!uncircumcised penis!!, stunning 3d render inspired art by istván sándorfi and greg rutkowski, perfect facial symmetry, realistic, highly detailed attributes and atmosphere, dim volumetric cinematic lighting, 8k octane extremely hyper-detailed render, post-processing, masterpiece
Takes about 10–20 minutes to create 5 images on my 16" M1 Pro 32 GiB MacBook Pro. It takes around 1 minute on my desktop system using a 6 GiB VRAM RTX 3070 Ti.
meta: this site is causing all sorts of graphical glitches when scrolling on a Pixel 6 (Android 13, Firefox). There are flickering boxes filled with pixel junk between each item and in the header.
I’m using https://github.com/basujindal/stable-diffusion to run it since I only have 6GB of VRAM.