- This uses a new type of diffusion transformer (similar to Sora) combined with flow matching and other improvements.
- This takes advantage of transformer improvements & can not only scale further but also accept multimodal inputs.
- Will be released open, the preview is to improve its quality & safety just like og stable diffusion
- It will launch with full ecosystem of tools
- It's a new base taking advantage of latest hardware & comes in all sizes
- Enables video, 3D & more...
- Need moar GPUs..
- More technical details soon
>Can we create videos similar to Sora?
Given enough GPUs and good data yes.
>How does it perform on 3090, 4090 or less? Are us mere mortals gonna be able to have fun with it ?
It's in sizes from 800M to 8B parameters now; there will be all sizes for all sorts of deployment, from edge devices to giant GPUs.
(adding some later replies)
>awesome. I assume these aren't heavily cherry picked seeds?
No, this is all one generation. With DPO, refinement, and further improvement it should get better.
>Do you have any solves coming for driving coherency and consistency across image generations? For example, putting the same dog in another scene?
Yeah, see @Scenario_gg's great work with IP adapters, for example. Our team builds ComfyUI, so you can expect some really great stuff around this...
>Dall-e often doesn’t even understand negation, let alone complex spatial relations in combination with color assignments to objects.
I imagine the new version will. DALLE and MJ are also pipelines, and you can pretty much do anything accurately with pipelines now.
>Nice. Is it an open-source / open-parameters / open-data model?
Like prior SD models it will be open source/parameters after the feedback and improvement phase. We are open data for our LMs but not other modalities.
>Cool!!! What do you mean by good data? Can it directly output videos?
If we trained it on video, yes; it is very much like the architecture of Sora.
Stability has to make money somehow. By releasing an 8B parameter model, they’re encouraging people to use their paid API for inference. It’s not a terrible business decision. And hobbyists can play with the smaller models, which with some refining will probably be just fine for most non-professional use cases.
Oh they’ll never let you pay for porn generation. But they will happily entertain having you pay for quality commercial images that are basically a replacement for the entire graphic design industry.
Don't people quantize SD down to 8 bits? I understand plenty of people don't have 8GB of VRAM (and I suppose you need some extra for supplemental data, so maybe 10GB?). But that's still well within the realm of consumer hardware capabilities.
I am going to look at quantization for 8B. But also, these are transformers, so a variety of merging / Frankenstein-tuning is possible. For example, you can use the 8B model to populate the KV cache (which computes once, so it can be loaded from slower devices, such as RAM / SSD) and use the 800M model for diffusion by replicating weights to match layers of the 8B model.
Do you know how the memory demands compare to LLMs at the same number of parameters? For example, Mistral 7B quantized to 4 bits works very well on an 8GB card, though there isn’t room for long context.
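As a rough sanity check of the 4-bit 7B point (back-of-envelope only: weights alone, ignoring activations, KV cache, VAE/text encoders, and framework overhead; the model names are just illustrative):

```python
# Rough weights-only memory estimate at different quantization levels.
def weights_gb(params: float, bits: int) -> float:
    return params * bits / 8 / 1e9

for name, params in [("800M model", 0.8e9), ("Mistral 7B", 7e9), ("8B model", 8e9)]:
    print(name, {bits: round(weights_gb(params, bits), 2) for bits in (16, 8, 4)})
# e.g. 7B at 4 bits is ~3.5 GB of weights, which is why it fits on an 8 GB card
# with a little room left over for a modest context.
```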
Nvidia is making way too much money keeping cards with lots of memory exclusive to server GPUs they sell with insanely high margins.
AMD still suffers from limited resources and doesn't seem willing to spend much chasing a market that might just be temporary hype; Google's TPUs are a pain to use and seem to have stalled out; and Intel lacks commitment, and even its products that went roughly in that direction aren't a great match for neural networks because of its philosophy of having fewer, more complex cores.
MPS is promising and the memory bandwidth is definitely there, but stable diffusion performance on Apple Silicon remains terribly poor compared with consumer Nvidia cards (in my humble opinion). Perhaps this is partly because so many bits of the SD ecosystem are tied to Nvidia primitives.
Image diffusion models tend to have relatively low memory requirements compared to LLMs (and don’t benefit from batching), so having access to 128 GB of unified memory is kinda pointless.
Last I saw they performed really poorly, like lower single digits t/s. Don't get me wrong, they're probably a decent value for experimenting with it, but it's flat-out pathetic compared to an A100 or H100. And I think useless for training?
You can run a 180B model like Falcon Q4 at around 4-5 tk/s, a 120B model like Goliath Q4 at around 6-10 tk/s, and a 70B Q4 at around 8-12 tk/s, and smaller models much quicker, but it really depends on the context size, model architecture, and other settings. An A100 or H100 is obviously going to be a lot faster, but it costs significantly more once you take its supporting requirements into account, and it can't be run on a light, battery-powered laptop, etc.
I kind of wonder if gaming will start incorporating AI stuff. What if, instead of generating a stable diffusion image, you could generate levels and monsters?
GPU memory is all about bandwidth, not latency. DDR5 can do 4-8 GT/s over a 64-bit bus per DIMM, maxing out around 128 GB/s with a dual memory controller or 512 GB/s with 8 memory controllers on server chips. GDDR6X runs at roughly twice the frequency and has a memory bus ~5x as wide in the 4090, so you get an order-of-magnitude bump in throughput: nearly 1 TB/s on a consumer product. Datacenter GPUs (e.g. the A100) with HBM2e double that to 2 TB/s.
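To put quick numbers on those claims (nominal spec values only, not measured throughput; the transfer rates and bus widths below are the commonly quoted ones and can differ by SKU):

```python
# Back-of-envelope peak bandwidth = transfer rate (GT/s) * bus width (bytes).
def peak_gb_s(gt_per_s: float, bus_bits: int) -> float:
    return gt_per_s * bus_bits / 8

print("DDR5-8000, dual channel (128-bit) :", peak_gb_s(8.0, 128))   # 128 GB/s
print("DDR5-8000, 8 channels (512-bit)   :", peak_gb_s(8.0, 512))   # 512 GB/s
print("RTX 4090 GDDR6X (21 GT/s, 384-bit):", peak_gb_s(21.0, 384))  # ~1008 GB/s
print("A100 HBM2e (3.2 GT/s, 5120-bit)   :", peak_gb_s(3.2, 5120))  # ~2048 GB/s
```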
I've never tried it, but in Windows you can have CUDA apps fall back to system RAM when GPU VRAM is exhausted. You could slap 128 GB in your rig with a 4070. I'm sure performance falls off a cliff, but if it's the difference between possible and impossible, that might be acceptable.
Please give me some DIMM slots on the GPU so that I can choose my own memory like I'm used to from the CPU-world and which I can re-use when I upgrade my GPU.
An M1 Mac Studio with that much RAM can be had for around $3K if you look for good deals, and will give you ~8 tok/s on a 70B model, or ~5 tok/s for a 120B one.
Unfortunately production capacity for that is limited, and with sufficient demand, all pricing is an auction. Therefore, we aren't going to be seeing that card for years.
We have highly efficient models for inference and a quantization team.
Need moar GPUs to do a video version of this model, similar to Sora, now that they have proved that Diffusion Transformers can scale with latent patches (see stablevideo.com and our work on that model, currently the best open video model).
We have 1/100th of the resources of OpenAI and 1/1000th of Google etc.
Google has cheap TPU chips, which means they circumvent the extremely expensive Nvidia corporate licenses. I can easily see them having 10x the resources of OpenAI for this.
Yes, they have deep pockets and could increase investment if needed. But the actual resources devoted today are public, and in line with what the parent said.
Can someone explain why Nvidia doesn't just run their own AI and literally devote 50% of their production to their own compute center? In an age where even ancient companies like Cisco are getting into the AI race, why wouldn't the people with the keys to the kingdom get involved?
They've been very happy selling shovels at a steep margin to literally endless customers.
The reason is that they instantly get a risk-free, guaranteed, VERY healthy margin on every card they sell, and there are endless customers lined up for them.
If they kept the cards, they would give up the opportunity to make those margins, and instead take on the risk of developing a money-generating service (one that makes more money than selling the cards).
This way there's no risk of a competitor out-competing them, of not successfully developing a profitable product, of "the AI bubble popping", of stagnating development, etc.
There's also the advantage that this capital has allowed them to buy up most of TSMC's production capacity, which limits the competitors like Google's TPUs.
Because history has shown that the money is in selling the picks and shovels, not operating the mine. (At least for now. There very well may come a point later on when operating the mine makes more sense, but not until it's clear where the most profitable spot will be)
Don’t stretch that analogy too far. It was applicable to gold rushes, which were low hanging fruit where any idiot could dig a hole and find gold.
Historically, once the easy to find gold was all gone it was the people who owned the deep gold mines and had the capital to exploit them who became wealthy.
1. the real keys to the kingdom are held by TSMC whose fab capacity rules the advanced chips we all get, from NVIDIA to Apple to AMD to even Intel these days.
2. the old advice is to sell shovels during a gold rush
> Why is there not a greater focus on quantization to optimize model performance, given the evident need for more GPU resources?
There is an inherent trade off between model size and quality. Quantization reduces model size at the expense of quality. Sometimes it's a better way to do that than reducing the number of parameters, but it's still fundamentally the same trade off. You can't make the highest quality model use the smallest amount of memory. It's information theory, not sorcery.
Yes. Quantization compresses float32 values to int8 by mapping the large range of floats to a smaller integer range using a scale factor. This scale factor is key for converting back to floats (dequantization), aiming to preserve as much information as possible within the int8 limits. While quantization reduces model size and speeds up computation, it trades off some accuracy due to the compression. It's a balance between efficiency and model quality, not a magic solution to shrink models without losing some performance.
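A minimal NumPy sketch of that symmetric int8 scheme, just to make the scale-factor idea concrete (this is the simplest possible variant, not what any particular quantization library actually does):

```python
import numpy as np

# Map the float range onto [-127, 127] with a single scale factor,
# then dequantize back and measure the error introduced.
def quantize_int8(x: np.ndarray):
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

weights = np.random.randn(1024).astype(np.float32)
q, scale = quantize_int8(weights)
recovered = dequantize(q, scale)
print("max abs error:", np.abs(weights - recovered).max())  # small but nonzero
```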
Quantization is essential for me since a 7B model won't fit on my RTX 2060 with only 6GB of VRAM. It allows me to compress the model so it can run on my hardware.
I understand that Sora is very popular, so it makes sense to refer to it, but when saying it is similar to Sora, I guess it actually makes more sense to say that it uses a Diffusion Transformer (DiT) (https://arxiv.org/abs/2212.09748) like Sora. We don't really know more details on Sora, while the original DiT has all the details.
Is anyone else struck by the similarities in textures between the images in the appendix of the above "Scalable Diffusion Models with Transformers" paper?
If you size the browser window right, paging with the arrow keys (so the document doesn't scroll) you'll see (eg, pages 20-21) the textures of the parrot's feathers are almost identical to the textures of bark on the tree behind the panda bear, or the forest behind the red panda is very similar to the undersea environment.
Even if I'm misunderstanding something fundamental here about this technique, I still find this interesting!
So is this "SDXL safe" or "SD2.1 safe"? SDXL safe we can deal with; if it's 2.1 safe, it's gonna end up DOA for a large part of the open-source community again.
Don't know about 3.0, but Cascade has different levels of safety between the full model and the light model. The full model is far more prudish, but both completely fail with some prompts.
>>>How does it perform on 3090, 4090 or less? Are us mere mortals gonna be able to have fun with it ?
>>>Its in sizes from 800m to 8b parameters now, will be all sizes for all sorts of edge to giant GPU deployment.
--
Can you fragment responses such that if an edge device (a mobile app) is prompted for [thing], it can pass tokens upstream on the prompt, effectively torrenting responses? Then you could push actual GPU edge devices in certain places, like dense cities that are expected to consume a ton of GPU cycles around the edge.
So you would have tiered processing (speed is handled locally, quality level 1 can take some edge GPU, and corporate stuff can be handled in the cloud)...
----
Can you fragment and torrent a response?
If so, how is that request torn up and routed to appropriate resources?
BOFH me if this is a stupid question (but it's valid for how quickly we are evolving toward AI being intrinsic to our society).
Soon the GPU and its associated memory will be on different cards, as once happened with CPUs. The day of the GPU with RAM slots is fast approaching. We will soon plug terabytes of RAM into our 4090s, then plug a half-dozen 4090s into a Raspberry Pi to create a Cronenberg rendering monster. Can it generate movies faster than Pixar can write them? Sure. Can it play Factorio? Heck no.
Any separation of a GPU from its VRAM is going to come at the expense of (a lot of) bandwidth. VRAM is only as fast as it is because the memory chips are as close as possible to the GPU, either on separate packages immediately next to the GPU package or integrated onto the same package as the GPU itself in the fanciest stuff.
If you don't care about bandwidth you can already have a GPU access terabytes of memory across the PCIe bus, but it's too slow to be useful for basically anything. Best case you're getting 64GB/sec over PCIe 5.0 x16, when VRAM is reaching 3.3TB/sec on the highest end hardware and even mid-range consumer cards are doing >500GB/sec.
Things are headed the other way if anything, Apple and Intel are integrating RAM onto the CPU package for better performance than is possible with socketed RAM.
That depends on whether performance or capacity is the goal. Smaller amounts of RAM closer to the processing unit make for faster computation, but AI also presents a capacity issue. If the workload needs the space, having a boatload of less-fast RAM is still preferable to offloading data to something more stable like flash. That is where bulk memory modules connected through slots may one day appear on GPUs.
Is there a way to partition the data so that a given GPU had access to all the data it needs but the job itself was parallelized over multiple GPUs?
Thinking of the classic neural network, for example: each column of nodes only needs to talk to the next column. You could group several columns per GPU and then each would process its own set of nodes. While an individual job would be slower, you could run multiple tasks in parallel, processing new inputs after each set of nodes is finished.
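A minimal PyTorch sketch of that layer-wise split, assuming two CUDA devices are available (layer sizes are made up; real pipeline parallelism also micro-batches inputs so both GPUs stay busy instead of idling):

```python
import torch
import torch.nn as nn

# Split a plain MLP's layers across two devices: each "column" of layers only
# talks to the next one, so only activations cross the GPU boundary.
stage1 = nn.Sequential(nn.Linear(512, 2048), nn.ReLU(),
                       nn.Linear(2048, 2048), nn.ReLU()).to("cuda:0")
stage2 = nn.Sequential(nn.Linear(2048, 2048), nn.ReLU(),
                       nn.Linear(2048, 10)).to("cuda:1")

x = torch.randn(32, 512, device="cuda:0")
h = stage1(x)                  # runs on GPU 0
out = stage2(h.to("cuda:1"))   # activations are the only cross-GPU traffic
print(out.shape)
```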
No, it won't. GPUs are good at ML partly because of the huge memory bandwidth: thousands of bits wide. You won't find connectors that have that many terminals and maintain signal quality. Even putting a second bank soldered onto the same signals can be enough to mess things up.
I doubt it. The latest GPUs utilize HBM which is necessarily part of the same package as the main die. If you had a RAM slot for a GPU you might as well just go out to system RAM, way too much latency to be useful.
It isn't the latency which is the problem, it's the bandwidth. A memory socket with that much bandwidth would need a lot of pins. In principle you could just have more memory slots where each slot has its own channel. 16 channels of DDR5-8000 would have more bandwidth than the RTX 4090. But an ordinary desktop board with 16 memory channels is probably not happening. You could plausibly see that on servers however.
What's more likely is hybrid systems. Your basic desktop CPU gets e.g. 8GB of HBM, but then also has 16GB of DRAM in slots. Another CPU/APU model that fits into the same socket has 32GB of HBM (and so costs more), which you could then combine with 128GB of DRAM. Or none, by leaving the slots empty, if you want entirely HBM. A server or HEDT CPU might have 256GB of HBM and support 4TB of DRAM.
I don’t think you really understand the current trends in computer architecture. Even CPUs are being moved to on-package RAM for higher bandwidth. Everything is the opposite of what you said.
Higher bandwidth but lower capacity. The real trend is different physical architectures for different compute loads. There is a place in AI for bulk, albeit slower, memory, such as extremely large data sets that want to run entirely on a discrete card without involving PCIe lanes.
This is also not true. You can transfer from main memory to cards plenty fast enough that it is not a bottleneck. Consumer GPUs don't even use PCIe 5 yet, which doubles the bandwidth of PCIe 4. Professional datacenter cards don't use PCIe at all, but they do put a huge amount of RAM on the package with the GPUs.
I imagine this doesn't look impressive to anyone unfamiliar with the scene, but this was absolutely impossible with any of the older models. Though, I still want to know if it reliably does this--so many other things are left to chance, and if I need to also hit a one-in-ten chance of the composition being right, it still might not be very useful.
It’s the transformer making the difference. Original stable diffusion uses convolutions, which are bad at capturing long range spatial dependencies. The diffusion transformer chops the image into patches, mixes them with a positional embedding, and then just passes that through multiple transformer layers as in an LLM. At the end, the model unpatchify’s (yes, that term is in the source code) the patched tokens to generate output as a 2D image again.
The transformer layers perform self-attention between all pairs of patches, allowing the model to build a rich understanding of the relationships between areas of an image. These relationships extend into the dimensions of the conditioning prompts, which is why you can say “put a red cube over there” and it actually is able to do that.
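A stripped-down sketch of that patchify -> attention -> unpatchify flow in PyTorch (toy shapes, no conditioning or adaptive norms, and operating on raw pixels rather than the VAE latents a real DiT works in):

```python
import torch
import torch.nn as nn

# Toy DiT-style flow: image -> patch tokens -> transformer -> patches -> image.
B, C, H, W, P, D = 2, 4, 32, 32, 4, 256          # batch, channels, height, width, patch, dim
n_patches = (H // P) * (W // P)

patchify   = nn.Conv2d(C, D, kernel_size=P, stride=P)      # image -> patch tokens
pos_embed  = nn.Parameter(torch.zeros(1, n_patches, D))    # positional embedding
blocks     = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=D, nhead=8, batch_first=True), num_layers=4)
unpatchify = nn.Linear(D, P * P * C)                        # tokens -> pixel patches

x = torch.randn(B, C, H, W)
tokens = patchify(x).flatten(2).transpose(1, 2) + pos_embed  # (B, n_patches, D)
tokens = blocks(tokens)                                      # self-attention over all patch pairs
out = unpatchify(tokens)                                     # (B, n_patches, P*P*C)

h = w = H // P                                               # fold patches back into a 2D "image"
out = out.reshape(B, h, w, P, P, C).permute(0, 5, 1, 3, 2, 4).reshape(B, C, H, W)
print(out.shape)
```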
I suspect that the smaller model versions will do a great job of generating imagery, but may not follow the prompt as closely, but that’s just a hunch.
Convolution filters attend to a region around each pixel; not to every other pixel (or patch in the case of DiT). In that way, they are not good at establishing long range dependencies. The U-Net in Stable Diffusion does add self-attention layers but these operate only in the lower resolution parts of the model. The DiT model does away with convolutions altogether, going instead with a linear sequence of blocks containing self-attention layers. The dimensionality is constant throughout this sequence of blocks (i.e. there is no downscaling), so each block gets a chance to attend to all of the patch tokens in the image.
One of the neat things they do with the diffusion transformer is to enable creating smaller or larger models simply by changing the patch size. Smaller patches require more Gflops, but the attention is finer grained, so you would expect better output.
Another neat thing is how they apply conditioning and the time step embedding. Instead of adding these in a special way, they simply inject them as tokens, no different from the image patch tokens. The transformer model builds its own notion of what these things mean.
This implies that you could inject tokens representing anything you want. With the U-Net architecture in stable diffusion, for instance, we have to hook onto the side of the model to control it in various sort of hacky ways. With DiT, you would just add your control tokens and fine tune the model. That’s extremely powerful and flexible and I look forward to a whole lot more innovation happening simply because training in new concepts will be so straightforward.
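A sketch of that in-context conditioning idea: the timestep and a conditioning embedding are simply projected to the model dimension and concatenated as extra tokens (the sizes and projection layers here are illustrative, not the real SD3/DiT configuration):

```python
import torch
import torch.nn as nn

# Timestep and conditioning become tokens no different from image patch tokens.
D, n_patches, B = 256, 64, 2

timestep_mlp = nn.Sequential(nn.Linear(1, D), nn.SiLU(), nn.Linear(D, D))
cond_proj    = nn.Linear(768, D)   # e.g. project a text-encoder embedding to model dim

patch_tokens = torch.randn(B, n_patches, D)
t_token      = timestep_mlp(torch.full((B, 1, 1), 0.3))        # (B, 1, D)
cond_token   = cond_proj(torch.randn(B, 1, 768))               # (B, 1, D)

tokens = torch.cat([t_token, cond_token, patch_tokens], dim=1)  # (B, n_patches + 2, D)
# The transformer blocks then attend over all of these jointly; any new control
# signal could be appended the same way and learned during fine-tuning.
print(tokens.shape)
```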
My understanding of this tech is pretty minimal, so please bear with me, but is the basic idea something like this?
Before: Evaluate the image in a little region around each pixel against the prompt as a whole -- e.g. how well does a little 10x10 chunk of pixels map to a prompt about a "red sphere and blue cube". This is problematic because maybe all the pixels are red but you can't "see" whether it's the sphere or the cube.
After: Evaluate the image as a whole against chunks of the prompt. So now we're looking at a room, and then we patch in (layer?) a "red sphere" and then do it again with a "blue cube".
It kinda makes sense, doesn't it? What are the largest convolutions you've heard of -- 11 x 11 pixels? Not much more than that, surely? So how much can one part of the image influence another part 1000 pixels away? But I am not an expert in any of this, so an expert's opinion would be welcome.
Yes, it makes sense a bit. Many popular convnets operate on 3x3 kernels, but the number of channels increases per layer. This, coupled with the fact that the receptive field increases per layer and allows convnets to essentially see the whole image relatively early in the model's depth (especially coupled with pooling operations, which increase the receptive field rapidly), makes this intuition questionable. Transformers, on the other hand, operate on attention, which allows them to weight each patch dynamically, but it's clear to me that this allows them to attend to all parts of the image in a way different from convnets.
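To put numbers on the receptive-field point (simple arithmetic for stride-1 convolutions only; pooling and striding multiply the growth, as noted above):

```python
# Receptive field of a stack of 3x3 convolutions: it grows by (kernel - 1) * jump
# per layer, where the jump grows with striding/pooling.
def receptive_field(n_layers: int, kernel: int = 3, stride: int = 1) -> int:
    rf, jump = 1, 1
    for _ in range(n_layers):
        rf += (kernel - 1) * jump
        jump *= stride
    return rf

for n in (1, 5, 10, 50):
    rf = receptive_field(n)
    print(f"{n:>2} layers of 3x3 conv, stride 1: receptive field {rf}x{rf}")
# 50 stride-1 layers still only cover 101x101 pixels; self-attention covers
# the whole image in a single layer.
```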
This is just stylistic, and I think it’s because chatgpt knows a bit “better” that there aren’t very many literal photos of abstract floating shapes. Adding “studio photography, award winner” produced results quite similar to SD imo, but this does negatively impact the accuracy. On the other side of the coin, “minimalist textbook illustration” definitely seems to help the accuracy, which I think is soft confirmation of the thought above.
EDIT: I think the best approach is simply to separate out the terms in separate phrases, as that gets more-or-less 100% accuracy https://imgur.com/a/JGjkicQ
That said, we should acknowledge the point of all this: SD3 is just incredibly incredibly impressive.
From my experience, the thing that makes using AI image gen hard to use is nailing specificity. I often find myself having to resort to generating all of the elements I want out of an image separately and then comp them together with photoshop. This isn't a bad workflow, but it is tedious (I often equate it to putting coins in a slot machine, hoping it 'hits').
Generating good images is easy but generating good images with very specific instructions is not. For example, try getting midjourney to generate a shot of a road from the side (ie standing on the shoulder of a road taking a photo of the shoulder on the other side with the road crossing frame from left to right)...you'll find midjourney only wants to generate images of roads coming at the "camera" from the vanishing point. I even tried feeding an example image with the correct framing for midjourney to analyze to help inform what prompts to use, but this still did not result in the expected output. This is obviously not the only framing + subject combination that model(s) struggle with.
For people who use image generation as a tool within a larger project's workflow, this hurdle makes the tool swing back and forth from "game changing technology" to "major time sink".
If this example prompt/output is an honest demonstration of SD3's attention to specificity, especially as it pertains to framing and composition of objects + subjects, then I think its definitely impressive.
For context, I've used SD (via comfyUI), midjourney, and Dalle. All of these models + UIs have shared this issue in varying degrees.
It's very difficult to improve text-to-image generation to do better than this because you need extremely detailed text training data, but I think a better approach would be to give up on it.
> I often find myself having to resort to generating all of the elements I want out of an image separately and then comp them together with photoshop. This isn't a bad workflow, but it is tedious
The models should be developed to accelerate this then.
ie you should be able to say layer one is this text prompt plus this camera angle, layer two is some mountains you cheaply modeled in Blender, layer three is a sketch you drew of today's anime girl.
Totally agree. I am blown away by that image. Midjourney is so bad at anything specific.
On the other hand, SD has just not been on the level of the quality of images I get from Midjourney. The people who counter this I don't think know what they are talking about.
Previous systems could not compose objects within the scene correctly, not to this degree. What changed to allow for this? Could this be a heavily cherry-picked example? Guess we will have to wait for the paper and model to find out.
We introduce Diffusion Transformers (DiTs), a simple transformer-based backbone for diffusion models that outperforms prior U-Net models and inherits the excellent scaling properties of the transformer model class. Given the promising scaling results in this paper, future work should continue to scale DiTs to larger models and token counts. DiT could also be explored as a drop-in backbone for text-to-image models like DALL E 2 and Stable Diffusion.
Afaict the answer is that combining transformers with diffusers in this way means that the models can (feasibly) operate in a much larger, more linguistically-complex space. So it’s better at spatial relationships simply because it has more computational “time” or “energy” or “attention” to focus on them.
One thing that jumps out to me is that the white fur on the animals has a strong green tint due to the reflected light from the green surfaces. I wonder if the model learned this effect from behind the scenes photos of green screen film sets.
The models do a pretty good job at rendering plausible global illumination, radiosity, reflections, caustics, etc. in a whole bunch of scenarios. It's not necessarily physically accurate (usually not in fact), but usually good enough to trick the human brain unless you start paying very close attention to details, angles, etc.
This fascinated me when SD was first released, so I tested a whole bunch of scenarios. While it's quite easy to find situations that don't provide accurate results and produce all manner of glitches (some of which you can use to detect some SD-produced images), the results are nearly always convincing at a quick glance.
As well as light and shadows, yes. It can be fixed explicitly during training like the paper you linked suggests by offering a classifier, but it will probably also keep getting better in new models on its own, just as a result of better training sets, lower compression ratios, and better understanding of the real world by models.
I think you have to conceptualize how diffusion models work, which is that once the green triangle has been put into the image in the early steps, the later generations will be influenced by the presence of it, and fill in fine details like reflection as it goes along.
The reason it knows this is that this is how any light in a real photograph works, not just CGI.
Or if your prompt was “A green triangle looking at itself in the mirror” then early generation steps would have two green triangle like shapes. It doesn’t need to know about the concept of light reflection. It does know about composition of an image based on the word mirror though.
It's just diffuse irradiance, visible in most real (and CGI) pictures although not as obvious as that example. Seems like a typical demo scene for a 3D renderer, so I bet that's why it's so prominent.
It does make sense though. Accurate global illumination is very strongly represented in nearly all training data (except illustrations) so it makes sense that the model learned an approximation of it.
What if you can | a scene to a model and just have it calc all the ray-paths and then | any color/image... if you pre-calc various ray angles, you can then just map your POV and allow for the volume as it pertains to your POV be mapped with whatever overlay you want.
Here is the crazy cyberpunk part:
IT (whatever 'IT' is) keeps a lidar of everything EVERYONE senses in that space and can overlap/time/sequence anything about each experience and layer (barometer/news/blah tied to that temporal marker)
Micro resolution of advanced lidar is used in signature creation to ensure/verify/detect fake places vs IRL.
Secret nodes are used to anti-lidar the sensors... so a place can be hidden from drones attempting to map it.
These anomalies are detectable though, and GIS experts with terraforming skills are the new secOps.
Fn dorks.
-- so, you already have an asset, let's say it's a CUBOID room - with walls and such of wood texture_05.png
I think you've read too far into this. Ray tracing is not a useful real-world primitive for extracting information from most scenes. Sure, "everything is shiny", but most surfaces are diffuse and don't contain useful visual information besides the object they illuminate. Many supposedly "pure" reflections like mirrors and glass are actually subtle caustics that introduce too much nuance to account for.
Also, "pipe" isn't considered harmful terminology (yet) just FYI. I was confused seeing the "|" mononym in it's place.
But I realize you are correct in the mirroring - I immediately thought it was ray tracing the green hue from the reflection onto a surface that could see it...
Inference is far more efficient; however, it would be really interesting to know HOW an AI 'thinks' about such reflections.
What's the current status of AIs documenting themselves?
This is actually the approach of one paper to estimate lighting conditions. Their strategy is to paint a mirrored sphere onto an existing image: https://diffusionlight.github.io/
How do you know which way the red sphere is facing? A fun experiment would be to write two prompts for "a person in the middle, a dog to their left, and a cat to their right", and have the person either facing towards or away from the viewer.
The obsession with safety in this announcement feels like a missed marketing opportunity, considering the recent Gemini debacle. Isn’t SD’s primary use case the fact that you can install it on your own computer and make what you want to make?
And safe doesn't mean "lower than 1/10^6 chance of ending humanity", safe means shoddily implemented curtailing to idpol + fundamentalist level moral aversion towards human sexuality
It's not really their feelings, it's about controversy, bad publicity, etc. It's too delicate right now to risk people using their models for sex stuff.
I don't believe corporations implementing liberal politics to prevent backlash and it being legislated onto them qualifies as them being on the "left political spectrum".
Some backlash is perfectly ignorable. The twitter mob will move onto the next thing in a few days. And the proliferation of turned-to-the-max DEI employee policies, inclusion committees and self-censored newspeak does come from the californic technobubble cesspit.
There is such a great liberal fear of being perceived as any of the negative -ists and -isms that the pendulum swings to the other extreme where the left horseshoe toe meets its rightmost brother, which is why SD and Google's new toy rewrite ancient European history to include POC's and queer people.
At some point they have to actually make money, and I don't see how continuously releasing the fruits of their expensive training for people to run locally on their own computer (or a competing cloud service) for free is going to get them there. They're not running a charity, the walls will have to go up eventually.
Likewise with Mistral, you don't get half a billion in funding and a two billion valuation on the assumption that you'll keep giving the product away for free forever.
Ironically, their oversensitive NSFW image detector in their API caused me to stop using it and run it locally instead. I was using it to render animations of hundreds of frames, but when every 20th to 30th image comes out blurry it ruins the whole animation, and it would double the cost or more to re-render it with a different seed hoping not to trigger the overzealous blurring.
I don't mind that they don't want to let you generate NSFW images, but their detector is hopelessly broken; it once censored a cube, yes, a cube...
Unfortunately I don't want to pay for hundreds if not thousands of images I have to throw away because it decided some random innocent element is offensive and blurs the entire image.
What they are achieving with the overzealous safety issues is driving developers to on-demand GPU hosts that will let them host their own models, which also opens up a lot more freedom. I wanted to use the Stability AI API as my main source for Stable Diffusion, but they make it really, really hard, especially if you want to use it as part of your business.
I agree that given the status quo, it's a no-brainer to host your own model rather than use their SaaS – and likely one of the main reasons SAI doesn't seem to be on a very stable (heh) footing financially. To put it mildly.
But there are plenty of other business models available for open source projects.
I use Midjourney a lot and (based on the images in the article) it’s leaps and bounds beyond SD. Not sure why I would switch if they are both locked down.
SD would probably be a lot better if they didn't have to make sure it worked on consumer GPUs. Maybe this announcement is a step towards that where the best model will only be able to be accessed by most using a paid service.
Stable Diffusion has a much steeper learning curve but can generate far more accurate images fitting your perhaps special use case.
Although I don't understand the criticism of the images in question. Without a prompt comparison, it is impossible to compare image synthesis. What are examples of images that are beyond these?
I haven’t used SD so maybe the images on their home page here aren’t representative. But they look very generic and boring to me. They seem to lack “style” in a general aesthetic sense.
I am using Midjourney to basically create images in particular artistic styles (e.g., “painting of coffee cup in ukiyo-e style”) and that works very well. I am interested in SD for creating images based on artwork that isn’t indexed by Midjourney, though, as some of the more obscure artists aren’t available.
Usually there are models adapted to a specific theme since generic models at some point hit barriers. To get an idea, you could look up examples on sites like civitai.com.
Of course such sites are heavily biased towards content that is popular, but you will also find quite specific models if you search for certain styles.
Could you list the concrete "safety checks" that you think prevents real-world harm? What particular image that you think a random human will ask the AI to generate, which then leads to concrete harm in the real world?
This question narrows the scope of "safety" to something less than what the people at SD, or even probably what OP, cares about. _Non-random_ CSAM requests targeting potentially real people are the obvious answer here, but even non-CSAM sexual content is also probably a threat. I can understand frustration with it currently going overboard on blurring, but removing safety checks altogether would result in SD mainly being associated with porn pretty quickly, which I'm sure Stability AI wants to avoid for the safety of their company.
Add to that, parents who want to avoid having their kids generate sexual content would now need to prevent their kids from using this tool because it can create it randomly, limiting SD usage to users 18+ (which is probably something else Stability AI does not want to deal with).
It's definitely a balance between going overboard and having restrictions though. I haven't used SD in several months now so I'm not sure where that balance is right now.
> non-CSAM sexual content is also probably a threat
To whom? SD's reputation, perhaps - but that ship has already sailed with 1.x. That aside, why is generated porn threatening? If anything, anti-porn crusaders ought to rejoice, given that it doesn't involve actual humans performing all those acts.
As I said, it means parents who don't want their young children seeing porn (whether you agree with them or not) would no longer be able to let their children use SD. I'm not making a statement on what our society should or shouldn't allow, I'm pointing out what _is currently_ the standard in the United States and many other, more socially conservative, countries. SD would become more heavily regulated, an 18+ tool in the US, and potentially banned in other countries.
You can have your own opinion on it, but surely you can see the issue here?
I can definitely see an argument for a "safe" model being available for this scenario. I don't see why all models SD releases should be so neutered, however.
How many of those parents would have the technical know-how to stop their kids from playing with SD? Give the model some “I am over 18” checkbox fig leaf and let them have their fun.
The harm is that any use of the model becomes illegal in most countries (or offends credit card processors) if it easily generates porn. Especially if it does it when you didn't ask for it.
If 1 in 1,000 generations will randomly produce memorized CSAM that slipped into the training set then yeah, it's pretty damn unsafe to use. Producing memorized images has precedent[0].
Do you have an example? I've never heard of anyone accidentally generating CSAM, with any model. "1 in 1,000" is just an obviously bogus probability, there must have been billions of images generated using hundreds of different models.
Besides, and this is a serious question, what's the harm of a model accidentally generating CSAM? If you weren't intending to generate these images then you would just discard the output, no harm done.
Nobody is forcing you to use a model that might accidentally offend you with its output. You can try "aligning" it, but you'll just end up with Google Gemini style "Sorry I can't generate pictures of white people".
Earlier datasets used by SD were likely contaminated with CSAM[0]. It was unlikely to have been significant enough to result in memorized images, but checking the safety of models increases that confidence.
And yeah I think we should care, for a lot of reasons, but a big one is just trying to stay well within the law.
Then you know almost nothing about the SD 1.5 ecosystem, apparently. I've fine-tuned multiple models myself and it's nearly impossible to get rid of the child-bias in anime-derived models (which applies to 90% of character-focused models), including NSFW ones. Took me like 30 attempts to get somewhere reasonable and it's still noticeable.
If we're being honest, anime and anything "anime-derived" is uncomfortably close to CSAM as a source material, before you even get SD involved, so I'm not surprised.
What I had in mind were regular general purpose models which I've played around with quite extensively.
They try to, but it is difficult to comb through billions of images, and at least some of SD's earlier datasets were later found to have been contaminated with CSAM[0].
Okay, by "safety checks" you meant the already unlawful things like CSAM, but not politically-overloaded beliefs like "diversity"? The latter is what the comment[1] you were replying to was referring to (viz. "considering the recent Gemini debacle"[2]).
Right, by "rather have this [nothing]" I meant Stable Diffusion doing some basic safety checking, not Google's obviously flawed ideas of safety. I should have made that clear.
I posed the worst-case scenario of generating actual CSAM in response to your question, "What particular image that you think a random human will ask the AI to generate, which then leads to concrete harm in the real world?"
I've noticed that SDXL does something a little odd. For a given prompt it essentially decides what race the subject should be without the prompt having specified one. You generate 20 images with 20 different seeds but the same prompt and they're typically all the same race. In some cases they even appear to be the same "person" even though I doubt it's a real person (at least not anyone I could recognize as a known public figure any of the times it did this). I'm kind of curious what they changed from SD 1.5, which didn't do this.
I notice they are avoiding images of people in the announcement.
I wonder if they are afraid of the same debacle as google AI and what they mean by "safety" is actually heavy bias against white people and their culture like what happened with Gemini.
I wouldn't look for hidden reasons. Recent image generators are already too good with face generation (thanks to CelebA-like datasets and early researchers).
And now the emphasis is on the multimodality of the model within a domain. There, almost every picture demonstrates some aspect of it. Somewhere there is text on the picture (old AI used to output bullshit instead of letters), somewhere there are humorous references to old images (for example, a cosmonaut on a pig).
From the examples I see on Twitter, they are usually referring to the different cultures of Irish, European, and American white people. Gemini, in an effort to reverse the bias that the models would naturally have, ends up replacing these people with those from other cultures.
Since the definition of "white" is inherently cultural, it varies from place to place and from time to time. Today, in US and Europe, pretty much everyone who cares about racial categorization would consider Irish "white". Historically, it was different, but that is only relevant when discussing history.
Isn't it even more racist to replace them in a picture? Being told that your skin colour is too offensive to show sounds a lot worse to me than calling them "white" considering their skin is very white
US American white people. Anything else would be a ridiculous overgeneralization, like "Asian culture"; even if you set some arbitrary benchmark for complexion and only look at those European countries, it's still too much diversity to pool together.
As if you can generalize the culture of different European countries, or even different regions in the same country just by skin color. Now this, in my opinion, is a form of cultural erasure where all the intricacies and interesting aspects of culture are put aside and overshadowed by skin color.
IMO the "safety" in Stable Diffusion is becoming more overzealous where most of my images are coming back blurred, where I no longer want to waste my time writing a prompt only for it to return mostly blurred images. Prompts that worked in previous versions like portraits are coming back mostly blurred in SDXL.
If this next version is just as bad, I'm going to stop using Stability APIs. Are there any other text-to-image services that offer similar value and quality to Stable Diffusion without the overzealous blurring?
Edit:
Example prompts like "Matte portrait of Yennefer" return 8/9 blurred images [1]
The nice thing about Stable Diffusion is that you can very easily set it up on a machine you control without any 'safety' and with a user-finetuned checkpoint.
That isn't the topic. Porn is an example, but safety is synonymous with puritanical requirements arbitrarily summed up as the lowest common denominator. I want a powerful AI, not a replacement for a priest.
Gemini demonstrated a product I do not want to use and I am aware about the requirements of corporate contexts, although I think the safety mechanisms should be in the hand of users.
Google optimized for advertisers, but I am not interested in such content as it provides little value.
OK, but it seems very stupid to say you want the powerful AI to specifically come from a specific API when the very same tech is open-sourced for anyone to do whatever they want with.
No large scale model maker is going to put out public models for B2B with dubious use cases.
What the problem is: OpenAI, Facebook, and Google are not curating the datasets. You're arguing they shouldn't put controls in after the fact, but what you actually want is for them to use quality datasets.
Taking the actual example you provided, I can understand the issue. Since it amounts to blurring images of a virtual character, that are not actually "naughty." Equivalent images in bulk quantity are available on every search engine with "yennefer witcher 3 game" [1][2][3][4][5][6] Returns almost the exact generated images, just blurry.
I've never seen blurring in my images. Is that something that they add when you do API access? I'm running SD 1.5 and SDXL 1.0 models locally. Maybe I'm just not prompting for things they deem naughty. Can you share an example prompt where the result gets blurred?
If you run locally with the basic stack, it’s literally a bool flag to hide NSFW content. It’s trivial to turn off, and it's off by default in most open-source setups.
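For example, with the Hugging Face diffusers library the checker is an optional pipeline component you can simply drop when loading locally (the model ID below is just the stock SD 1.5 checkpoint as an illustration; the library prints a warning instead of blacking out flagged images):

```python
import torch
from diffusers import StableDiffusionPipeline

# Load the pipeline without the post-hoc NSFW checker that blacks out flagged images.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
    safety_checker=None,          # disable the NSFW image filter
)
pipe = pipe.to("cuda")

image = pipe("matte portrait of a sorceress").images[0]
image.save("portrait.png")
```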
Wait, blurring (black) means that it objected to the content? I tried it a few times on one of the online/free sites (Huggingspace, I think) and I just assumed I'd gotten a parameter wrong.
Given the optimizations applied to SDXL (comparing to SD 1.5), it is understandable why it outputs blurry backgrounds. It is not for safety, it is just a cheap way to hide imperfections of technology. Imagine 2 neural networks: one occasionally outputs Lovecraftian hallucinated chimeras on backgrounds, another one outputs sterile studio-quality images. Researches selected the second approach.
It appears that they are trying to prevent generating accurate images of a real person, because they are worried about deepfakes, and this produces the blurring. While Yennefer is a fictional character she's played by a real actress on Netflix, so maybe that's what is triggering the filter.
I haven't tried SD3, but my local SD2 regularly has this pattern where while the image is developing it looks like it's coming along fine and then suddenly in the last few rounds it introduces weird artifacts to mask faces. Running locally doesn't get around censorship that's baked into the model.
I tend to lean towards SD1.5 for this reason—I'd rather put in the effort to get a good result out of the lesser model than fight with a black box censorship algorithm.
EDIT: See the replies below. I might just have been holding it wrong.
Be sure to turn off the refiner. This sounds like you’re making models that aren’t aligned with their base models and the refiner runs in the last steps. If it’s a prompt out of alignment with the default base model it’ll heavily distort. Personally with SDXL I never use the refiner I just use more steps.
Well yeah, because SD2 literally had purposeful censorship of the base model and the CLIP, which basically made it DOA to the entire open-source community that was dedicated to 1.5. SDXL wasn't so bad, so it gained traction, but 1.5 is still the king because it's from before the damn models were gimped at the knees and relied on workarounds and insane finetunes just to get basic anatomy correct.
Probably not, since I have no idea what you're talking about. I've just been using the models that InvokeAI (2.3, I only just now saw there's a 3.0) downloads for me [0]. The SD1.5 one is as good as ever, but the SD2 model introduces artifacts on (many, but not all) faces and copyrighted characters.
EDIT: based on the other reply, I think I understand what you're suggesting, and I'll definitely take a look next time I run it.
SDXL should be used together with a refiner. You can usually see the refiner kicking in if you have a UI that shows you the preview of intermediate steps. And it can sometimes look like the situation you describe (straining further away from your desired result).
That person would rather pay for an API than set up locally (which is as simple as unzipping and adding a model); setting up in the cloud can be painful if you aren't familiar with it.
It’ll be interesting to see what “safety” means in this case, given the censorship in diffusion models nowadays. Look what’s happening with Gemini; it’s quite scary really how different companies have different censorship values.
I’ve had my fair share of frustration with DALL-E as well when trying to generate weapon images for game assets. Had to tweak my prompts a lot.
> We believe in safe, responsible AI practices. This means we have taken and continue to take reasonable steps to prevent the misuse of Stable Diffusion 3 by bad actors. Safety starts when we begin training our model and continues throughout the testing, evaluation, and deployment. In preparation for this early preview, we’ve introduced numerous safeguards. By continually collaborating with researchers, experts, and our community, we expect to innovate further with integrity as we approach the model’s public release.
What exactly does this mean? Will we be able to see all of the "safeguards" and access all of the technology's power without someone else's restrictions on them?
For SDXL this meant that there were almost no NSFW (porn and similar) images included in the dataset, so the community had to fine-tune the model themselves to make it generate those.
I guess this statement is a cheap protection against cheap journalists. Otherwise by now all the tabloids would be full of scary stories about deepfake politicians, deep-porn and all types of blackmailers (by the way, there is so much competition in AI now that some company may well pay for a wave of such articles to destroy the competitor). And in response to this, hearty old men would clobber the Congress with petitions to immediately ban all AI. Who wants that?
I'd want a model that can draw website designs and other UIs well. So I give it a list of things in the UI, and I get back a bunch of UI design examples with those elements.
I'm gonna hazard a guess and say well within the capabilities of a fine tuned model, but that no such fine tuned model exists and the labeled data required to generate it is not really there.
That's not safety, the safety RLHF is because it tries to generate porn and people with three legs if you don't stop it.
It has the weird art style because that's what looks the most "aesthetic". And because it doesn't actually have nearly as good enough data as you'd think it does.
That's why we need open AI which scoops up all the data with its specific contexts and history and transforms it into a vast incomprehensible machine for us peons to gawk at while we starve and boil to death
Photographs, digital illustrations, comic or cartoon style images, whatever graphical style you can imagine are all easy to achieve with current models (though no single model is a master of all trades). Things that look like technical drawings are as well, but don't expect them to make any sense engineering-wise unless maybe if you train a finetune specifically for that purpose.
Rewriting the "safety" part, but replacing the AI tool with an imaginary knife called Big Knife:
"We believe in safe, responsible knife practices. This means we have taken and continue to take reasonable steps to prevent the misuse of Big Knife by bad actors."
Does anyone know which AI could be used to generate UI design elements (such as "generate a real estate app widget list"), as well as the kind of prompts one would use to obtain good results?
I'm only now investigating using AI to increase velocity in my projects, and the field is moving so fast that I'm a bit outdated.
If by design elements you include vector images, you could try https://www.recraft.ai/ or Adobe Firefly 2 - there's not a lot of vector work right now, so your choices are either the handful of vector generators, or just bite the bullet and use eg DALL-E 3 to generate raster images you convert to SVG/recreate by hand.
(The second is what we did for https://gwern.net/dropcap because the PNG->SVG filesizes & quality were just barely acceptable for our web pages.)
From the FAQ: "v0 is a generative user interface system by Vercel powered by AI. It generates copy-and-paste friendly React code based on shadcn/ui and Tailwind CSS that people can use in their projects"
At this point, perfect text would be a game changer if it can be solved.
Midjourney 6 can be completely photorealistic and include valid text, but it also sometimes adds bad text. It's not much, but having to use an image editor for that is still annoying. For creating marketing material, getting perfect text every time and never getting bad text would be amazing.
I wonder if we could get it to generate a layered output, to make it easy to change just the text layer. It already creates the textual part in a separate pass, right?
Current open-source tools include pretty decent off-the-shelf Segment Anything-based detectors. They leave a lot to be desired, but you can do layer-like operations by automatically detecting certain concepts and applying changes to them, or, less commonly, exporting the cropped areas. But not the content "beneath" the layers, since it doesn't exist.
I would bet that Adobe is definitely salivating at that. It might not be for a long time, but it seems like a no-brainer once the technology can handle it. The last few years have been fast; I interacted with the JS landscape for a few years and it moves faster than Sonic, and this tech iterates just as quickly.
A blogger I follow had an article explaining that the NSFW models for SDXL are just now SORT OF coming up to the quality of SD1.5 “pre-safety” models.
It’s been 6 months and it still isn’t there. SD3 is going to take quite a while if they’re baking “safety” in even harder.
1.5 is still more popular than xl and 2 for reasons unrelated to safety. The size and generation speed matter a lot. This is just a matter of practical usability, not some idea of the model being locked down. Feed it enough porn and you'll get porn out of it. If people have incentive to do that (better results than 1.5), it really will happen within days.
I wish I had something more clever to comment on it. I know what they’re doing, which is cool, and why, which is, IDK, live and let live and enjoy your own kink. It’s just a little funny that some of the most work put into fine-tuning models is from the pony community.
It's not just them. For example, 4chan is a surprisingly good way to get the most recent scoop on good models (both text and images), setup guides etc - if you can tolerate the inevitable, well, 4chan-ness of it. And the reason is exactly the same: a lot of people there really, really, really, want to generate porn and to chat with sexbots, and they're putting a lot of effort into getting the best (and least censored) results out of the resources that they have.
Apparently most stuff that relies on the weights being reasonably similar to SDXL doesn't work - control nets, LoRAs, commonly-used inpainting patches, the lot. It seems to go well beyond fine-tuning and be substantially retrained, to the point that it's a good chunk of the way to being a different model entirely, and the amount of training time is on the order of what went into SDXL originally too, from what I can tell.
It’s not obvious that fine-tuning can remove all latent compulsions from these models. Consider that the creators know that fine-tuning exists and have vastly more resources to explore the feasibility of removing deep bias using this method.
I mean, SDXL is great. Until you’ve had a chance to actually use this model, calling it out for some imagined offence that may or may not exist seems like drinking Kool-Aid rather than responding to something based in concrete, actual reality.
You get access to it… and it does the google thing and puts people of colour in every frame? Sure, complain away.
You get access to it, you can’t even generate pictures of girls? Sure. Burn the house down.
…you haven’t even seen it and you’re already bitching about it?
Come on… give them a chance. Judge what it is when you see it not what you imagine it is before you’ve even had a chance to try it out…
Lots of models, free, multiple sizes, hot damn. This is cool stuff. Be a bit grateful for the work they’re doing.
…and even if sucks, it’s open. If it’s not what you want, you can retune it.
1. A mass-distributed LLM (hosted by Google or OpenAI or whoever) that's been neutered and twisted into political correctness in a haphazard series of kneejerk meetings of small groups of people who are terrified big investors will walk away or that some powerful political group will denounce them. Effectively they create an enormous bias of falsity and incorrectness, for billions of people to use and embed the results throughout all their intellectual output.
2. Some wacko with an expensive Nvidia GPU makes deepfake porn of a popular politician. Or goodness forbid, of some weird kink where if this was actually a scene filmed with real people, there would be serious ethical issues.
Which scenario do you think is more dangerous, long term, and in terms of broad impact on society in general?
I think you are seeing things from a bubble. Most people in the country are in favor of efforts to correct for historical injustices and are worried about AIs repeating biases in their training that could have a material impact on the world.
Case in point: large advertisers and entertainment companies "pander" to these sorts of views, because it is broadly popular
Most people in the country don't give a shit about historical injustices and are worried about how they are going to pay their rent or put food on the table today.
If you go by online, popularity, yes, a lot of people do want to erase history in favor of feelings. That can be your opinion too.
> Broadly popular
Are they? Or are the loudest voices asymmetrically affecting discourse?
> This is not the work of a shadowy cabal
And how would you know if it was? What would the clues be? If you were in a bubble that was designed to impart an informal religious view (and it is a religion sometimes, just one with a screen instead of a book) that encompass politics and morality, how would you know?
I just cannot comprehend how some people cannot see the incredibly obvious moral responsibility in releasing something that could be used for a lot of good but also to do bad things. There's no reasonable moral theory in which you could just shrug and say, "well, somebody is going to do it anyway, so why should we even try to keep our conscience clean and avoid making it easy for them?" It's fundamentally amoral and antisocial.
If someone invents a lockpick capable of opening any door, they have a moral responsibility to prevent it from falling into the wrong hands, whether they want it or not. And it's absurd to complain when someone who could create a universal lockpick refuses to do so, never mind release the technology into the wild, and only agrees to sell simpler picks capable of picking simpler locks. Do these people also complain about work against nuclear proliferation? After all, North Korea got nukes anyway, so what's the point?
Your lockpick example only works if there's e.g. only one and it's feasible to keep it hidden from the world.
Software doesn't work that way. I agree that you should be responsible about dangerous tech, but you also have to be realistic about what the best way to do that is, which is pretty much never "keep it hidden."
(and, of course, this is not even considering the question that should probably go here which is -- how dangerous is this exactly? Given the moral panic we saw a while ago about e.g. Photoshop, I'm not entirely convinced that this is much to worry about.)
On the contrary, the argument that "someone will do it anyway, so you should just let it happen and take no moral responsibility because I WANT MY SHINY TOY" is… not merely silly, but incredibly absurd, selfish, entitled, and amoral.
One could argue that if $BIGCORP doesn't want their thing to be used by bad actors, they should just refrain from developing the technology at all, and while it's a somewhat defensible position, that would also result in techbros not getting their toy, so it doesn't really apply here.
The willful ignorance in these threads is maddening. They know why models have these restrictions, they just think the rules shouldn't apply to them and want to play the victim. It's the same libertarian attitude as people who whine about driving speed laws.
If there's one lesson from the 21st century, it's that's you shouldn't release massively impacting technology without strong ethics controls around it.
What's the best way to use SD (3 or 2) online? I can't run it on my PC and I want to do some experiments to generate assets for a POC videogame I'm working on. I pay for Midjourney and I wouldn't mind paying something like 5 or 10 dollars per month to experiment with SD, but I can't find anything.
I used Rundiffusion for a while before I bought a 4090, and I thought their service was pretty nice. You pay for time on a system of whatever size you choose, with whatever tool/interface you select. I think it's worth tossing a few bucks into it to try it out.
Eh, you can get the same software up and running in less than 15-20 minutes on an EC2 GPU instance for about half of Rundiffusion's hourly pricing. And keeping an instance in the Stopped state for the entire month costs less in storage than their 'premium' monthly fee.
I used rundiffusion to play around with a bunch of different open source software quickly and easily with pre-downloaded models after getting annoyed at my laptop GPU. But once I settled on one particular implementation and started spending a lot of time in it, it no longer made sense to repeatedly pay every hour for an initial ease-of-setup.
The only real ongoing benefit was rundiffusion came with a bunch of models pre-downloaded so swapping between them was quick. But you can use UI addons like the CivitAI browser to download models automatically through automatic1111, and you'll likely want to go beyond what they predownload to the instance for you anyway.
The downside to running on the cloud directly is having to manage the running/stopped state of the instance yourself. I haven't ever left it running when I was done with an instance, but I could see that as a risk. CLI commands and scripting can make that faster than logging into a website which does it for you automatically, but it's extra effort.
I thought about building an AMI and putting it up on the AWS marketplace, but it looks like there are a few options for that already. I don't know how good they are out of the box, as I haven't used them. But if spending 20 minutes once to get software running on a Linux instance is truly the only barrier to reducing cost, those prebuilt AMIs are a decent intermediary step. They're about $0.10/hour on top of server costs. I skipped straight to installing the software myself, but even an extra $0.10/hour overhead would be better than paying double.
Would you recommend that to someone who has never used AWS before? Is it possible to screw up and rack up a huge bill? I might consider using that for big tasks that I can't do with my local setup.
It's a _little_ possible to generate a huge bill, but the biggest risks here are:
1. Leaving instances running when they're not being used
and
2. Deviation from default behavior that results in accumulation of storage volumes you don't want or need (low likelihood but something to watch for initially).
For 1:
If you leave the instance running you'll keep getting charged the hourly rate. Not really unexpected, but you have to notice it yourself or set an alarm.
There are a few tricks to reduce likelihood of this happening and to limit charges if it does happen anyway:
a. Prevention: Make your own little auto-stop script for the instance like Rundiffusion has. Maybe make it part of the launch sequence too, so you run a script, it launches the instance, then starts a timer. If the timer counts down all the way without you jiggling it, it stops the instance.
b. Mitigation: Create an alarm on the instance with the action to 'Stop' the instance when the alarm is triggered. Set the trigger for the alarm to be something like 'Max CPU usage has been less than 4% for a consecutive hour' (a minimal sketch follows this list).
c. Mitigation: Use AWS' Instance scheduler to automatically stop the instance
d. Mitigation: Billing budgets with associated action to stop instances -- kind of like the alarms but triggered based on costs
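For reference, here's a minimal boto3 sketch of mitigation (b), assuming a hypothetical instance ID and region. It uses CloudWatch's built-in EC2 stop action, so no extra Lambda is needed; exact thresholds are yours to tune.

```python
import boto3

INSTANCE_ID = "i-0123456789abcdef0"   # hypothetical instance ID
REGION = "us-east-1"                  # hypothetical region

cloudwatch = boto3.client("cloudwatch", region_name=REGION)

# Stop the instance when max CPU stays under 4% for a full hour.
cloudwatch.put_metric_alarm(
    AlarmName="auto-stop-idle-sd-box",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": INSTANCE_ID}],
    Statistic="Maximum",
    Period=3600,                      # one-hour evaluation window
    EvaluationPeriods=1,
    Threshold=4.0,
    ComparisonOperator="LessThanThreshold",
    # Built-in EC2 stop action for this region.
    AlarmActions=[f"arn:aws:automate:{REGION}:ec2:stop"],
)
```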
For 2:
It's probably a non-issue. You'll likely not have a problem, because you'll start and stop the same instance most of the time instead of creating and deleting new instances. In that case, gp3 SSD storage is $0.08/GB per month, so a 200 GB storage volume you keep around all the time and use only runs about $16 for the month. There are benefits, so it's likely worthwhile.
BUT, be careful if you create and terminate lots of instances instead of stopping and starting the same instance. There's a small possibility of accumulating extra storage volumes you don't need, without realizing it.
By DEFAULT that's not a problem. AWS will delete the attached storage volumes for an instance after you Terminate (not stop) the instance. The problem comes from changing the default behavior.
That change can be configured in the AMI you use to launch an instance (by whoever created the AMI), or by you when launching an instance. If the storage volumes are not deleted automatically, you have to delete them manually to stop them from continuing to generate charges, which you might not notice right away (a quick sketch for spotting these follows the list below).
Keep that in mind, but that scenario requires a fairly unlikely chain of requirements to get to the point of bill bloat:
- If you use a pre-built AMI from marketplace (such as one with stablediffusion preinstalled) which is configured to KEEP storage volumes upon termination instead of using AWS default of deleting them,
- and if you do a 'Create' and 'Terminate' instead of 'Start/Stop' so you're using a LOT of instances instead of a few,
- and if you don't notice the setting during launch,
- and if you don't see all the extra volumes sitting around...
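If you do end up creating and terminating lots of instances, something like this boto3 sketch (hypothetical region) can list any volumes left in the 'available' (unattached) state so you can decide whether to delete them:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # hypothetical region

# Find EBS volumes that aren't attached to any instance.
resp = ec2.describe_volumes(Filters=[{"Name": "status", "Values": ["available"]}])
for vol in resp["Volumes"]:
    print(f'{vol["VolumeId"]}: {vol["Size"]} GiB, created {vol["CreateTime"]:%Y-%m-%d}')
    # Uncomment only after confirming you really don't need the volume:
    # ec2.delete_volume(VolumeId=vol["VolumeId"])
```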
People in this discussion seem to be hand-wringing about Stability's "safety" comments, but every model they've released has been fine-tuned for porn within about 24 hours.
SD 2 definitely seems like an anomaly they've learned from, though; it was hard for everyone to use for various reasons. SDXL and even Cascade (the new side-project model) seem to be embraced by horny people.
Horrible website, hijacks scrolling. I have my scrolling speed up with Chromium Wheel Smooth Scroller. This website's scrolling is extremely slow, so the extension is not working because they are "doing it wrong" TM and somehow hijack native scrolling and do something with it.
I wonder if this will actually be adopted by the community, unlike SD 2.0. Many are still developing around SD 1.5 due to its uncensored nature. SDXL has done better than 2.0, but has greater hardware requirements, so it still can't be used by everyone running 1.5.
Wouldn't this v3 supersede the StableCascade work?
Did they announce it because a team had been working on it and they wanted to push it out rather than just lose it as an internal project, or are there architectural differences that make both worthwhile?
I think of the SD3 as a further evolution of SD1.5/2/XL and StableCascade as a branching path. It is unclear which will be better in the long term, so why not cover both directions if they have the resources to do so?
I suspect Stable Cascade may incorporate a DiT at some point. The UNet is easily swapped out. SC’s main innovation is the training of a semantic compressor model and a VQGAN that translates the latent output from the diffusion model back to image space - rather than relying on a VAE.
It’s a really smart architecture and I think is fertile ground for stacking on new things like DiT.
There are architectural differences, although I found Stable Cascade a bit underwhelming. While it can actually manage text, the text it does manage often just looks like someone wrote text over the image; it doesn't feel integrated a lot of the time.
SD3 seems closer to SOTA. Not sure why Cascade took so long to get out; it seemed to be up and running months ago.
If you renoise the output of the first diffusion stage to halfway and then denoise forward again, you can eliminate the bad output. This approach is called “replay” or “iterative mixing” and there are a few open source nodes for ComfyUI you can refer to.
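For anyone who wants to try the idea outside ComfyUI, here's a rough sketch of the same renoise-to-halfway-then-denoise trick using the diffusers img2img pipeline. This is only an approximation of the "replay" / "iterative mixing" nodes mentioned above, not their implementation; the checkpoint name, prompt, and strength are assumptions.

```python
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

# Any SD checkpoint works; this one is just an example.
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

first_stage_output = Image.open("cascade_output.png").convert("RGB")

# strength=0.5 re-noises the image roughly halfway along the schedule,
# then denoises forward again from that point.
fixed = pipe(
    prompt="a storefront sign reading 'OPEN', photorealistic",
    image=first_stage_output,
    strength=0.5,
    guidance_scale=7.0,
).images[0]
fixed.save("replayed.png")
```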
It's not a restrictive license. They made the model, they trained the base. They're releasing it for consumer use, but not for businesses to effectively re-sell. Makes perfect sense to me.
That’s a restrictive license. It’s certainly a reasonable license given the investment they have put into training the model (and Stability’s membership pricing for small companies is, if anything, unreasonably cheap).
Nevertheless, it’s frustrating that the industry is fragmenting to a variety of licenses where you have to read the fine print and often licensing information isn’t announced until final release.
Is it just me or is the Stable Diffusion bus image broken in the background? The bus back there does not look logical w.r.t. placement and size relative to the sidewalk.
XL was basically an experiment on the 2.1 architecture with some tweaks but at a larger image size... hence the XL. But it wasn't really an evolution of the underlying architecture, which is why it wasn't 3.0 or even 2.5; it was just "bigger", lol.
This reinforces my impression that Google is at least one year behind. Stunning images, 3D, video while Gemini had to be partially halted this morning.
I don't think that's a fair comparison because they're fulfilling substantially different niches. Gemini is a conversational model that can generate images, but is mainly designed for text. Stable Diffusion is only for images. If you compare a model that can do many things and a model that can only do images by how well they generate images, of course the image generation model looks better.
Stability does have an LLM, but it's not provided in a unified framework like Gemini is.
You think that technology is first. You think that mathematicians and computer engineers or mechanical engineers or doctors are first. They’re very important, but they’re not first. They’re second. Now I’ll prove it to you.
There was a country that had the best mathematicians, the best physicists, the best metallurgists in the world. But that country was very poor. It’s called the Soviet Union. But when you took one of these mathematicians or physicists, who was smuggled out or escaped, put him on a plane and brought him to Palo Alto. Within two weeks, they were producing added value that could produce great wealth.
What comes first is markets. If you have great technology without markets, without a market-friendly economy, you’ll get nowhere. But if you have a market-friendly economy, sooner or later the market forces will give you the technology you want.
And that my friend, simply won't come from an office paralyzed by internal politics of fear and conformity. Don't get it twisted.
> This reinforces my impression that Google is at least one year behind. Stunning images, 3D, video while Gemini had to be partially halted this morning.
People always say Google is "behind". I don't believe they're behind in a capabilities sense, which IMO is what the parent is implying. They've decided to make a PC product, which I wouldn't say is inferior to anything else if you're the kind of person who is into PC culture.
There might be some difficult internal politics to work through, but there is no way Google is hamstrung forever by this.
> They've decided to make a PC product, which I wouldn't say is inferior to anything else if you're the kind of person who is into PC culture.
I'm not.
> There might be some difficult internal politics to work through, but there is no way Google is hamstrung forever by this.
The technical prowess is irrelevant. Whichever of the companies in the AI race excises their PC demons will actually ship useful things and break ahead. The talent will follow the market.
Google might very well be hamstrung forever by this and other internal politics. This dynamic has played out over and over again.
I mean, it's kind of both? Making Nazis look diverse isn't just a political error, it's also a technical one. By default, showing Nazis should show them as they actually were.
This is a good question - not only for the actual ethics of the training, but for the future of AI use for art. It's gonna damage the livelihood of many artists (me included, probably) but also make art accessible to many more people. As long as the training dataset is ethical, I think fighting it is hard and pointless.
I really wonder what harm would come to the company if they didn't talk about safety?
Would investors stop giving them money? Would users sue that they now had PTSD after looking at all the 'unsafe' outputs? Would regulators step in and make laws banning this 'unsafe' AI?
What is it specifically that company management is worried about?
All of the above! Additionally... I think AI companies are trying to steer the conversation about safety so that when regulations do come in (and they will) that the legal culpability is with the user of the model, not the trainer of it. The business model doesn't work if you're liable for harm caused by your training process - especially if the harm is already covered by existing laws.
One example of that would be if your model was being used to spot criminals in video footage and it turns out that the bias of the model picks one socioeconomic group over another. Most western nations have laws protecting the public against that kind of abuse (albeit they're not applied fairly) and the fines are pretty steep.
They're attempting to guard themselves against incoming regulation. The big players, such as Microsoft, want to squash Stable Diffusion while protecting themselves, and they're going to do it by wielding the "safety is important and only we have the resources to implement it" hammer.
Safety is a very real concern, always has been in ML research. I'm tired of this trite "they want a moat" narrative.
I'm glad tech orgs are for once thinking about what they're building before putting out society-warping democracy-corroding technology instead of move fast break things.
It doesn't strike you as hypocritical that they all talk about safety while continuing to push out tech that's upending multiple industries as we speak? It's tough for me to see it as anything other than lip service.
I'd be on your side if any of them actually chose to keep their technology in the lab instead of tossing it out into the world and gobbling up investment dollars as fast as they could.
How are these two things related at all? When AI companies speak of safety, it's almost always about the "only including data a religious pastor would find safe, and filtering outputs" angle. How's the market and other industries relevant at all? Should AI companies be obligated to care about what happens to other companies? With that point of view, we should've criticized the iPhone for upending the PDA market, or Wacom for "upending" the traditional art market.
That would make sense if it was in the slightest about avoiding "society-warping democracy-corroding technology". Rather than making sure no one ever sees a naked person which would cause governments to come down on them like a ton of bricks.
This isn't a valid concern in my opinion. Photo manipulation has been around for decades. People have been drawing other people for centuries.
Also, where do we draw the line? Should Photoshop stop you from manipulating human body because it could be used for porn? Why stop there, should text editors stop you from writing about sex or describing human body because it could be used for "abuse". Should your comment be removed because it make me imagine Taylor Swift without clothes for a brief moment?
No, but AI requires zero learning curve and can be automated. I can't spit out 10 images of Tay per second in Photoshop. If I want to, and the API delivers, I can easily do that with AI. (Granted, if one were coding this there's a learning curve, but in principle, with the right interface - and they exist - I can churn out hundreds of images without actively putting work in.)
I've never understood the argument about image generators being (relatively) fast. Does that mean that if you could Photoshop 10 images per second, we should've started clamping down on Photoshop? What exact speed is the cutoff mark here? Given that Photoshop is updated every year and includes more and more tools that can accelerate your workflow (incl. AI-assisted ones), is there going be a point when it gets too fast?
I don't know much about the initial scandal, but I was under the impression that there was only a small number of those images, yet that didn't change the situation. I just fail to see how quantity factors into anything here.
Yes, if you could Photoshop 10/sec it would be a problem.
Think of it this way: if one out of every ten phone calls you get is spam, you still have a pretty usable phone. Make that three orders of magnitude more spam, so only 1 out of every 100 calls is real, and the system totally breaks down.
Generative AI makes generating realistic-looking fakes ~1000x easier; it's the one thing it's best at.
>I just fail to see how quantity factors into anything here.
Because you can overload any online discussion / sphere with that. There were so many that X effectively banned searching for her at all, because if you did, you were overwhelmed by very extreme fake porn. Everybody can do it with a very low entry barrier, it looks very believable, and it can be generated in high quantities.
We shouldn't have clamped down on Photoshop, but realistically two things would be nice in your theoretical case: usage restrictions and public information building. There was no clear-cut point where Photoshop was so mighty you couldn't trust any picture online. There were skills to be learned, people could identify the trickery, and it was on a very small scale and gradual. And photo trickery has been around for ages; even Stalin did it.
But creating photorealistic fakes in an automated fashion is completely new.
But when we talk about specifically harming one person, does it really matter if it's a thousand different generations of the same thing or 10 generations that were copied thousands of times? It is a technology that lowers the bar for generating believable-looking things, but I don't know if it's the speed that is the main culprit here.
And in fairness to generative AI, even nowadays it feels like getting to a point of true photorealism takes some effort, especially if the goal is letting it just run nonstop with no further curation. And getting a local image generator to run at all on your computer (and having the hardware for it) is also a bar that plenty of people can't clear yet. Photoshop is kind of different in that making more believable things requires a lot more time, effort and knowledge - but the idea that any image online can be faked has already been ingrained in the public consciousness for a very long time.
but that's not dangerous. It's definitely worthy of unlocking the cages of the attack lawyers, but it's not dangerous. The word "safety" is being used by big tech to trigger and gaslight society.
To the extent these models don't blindly regurgitate hate speech, I appreciate that. But what I do not appreciate is when they won't render a human nipple or other human anatomy. That's not safety, and calling it such is gaslighting.
As the leader in open image models it is incumbent upon us, as the models get to this level of quality, to take seriously how we can release open and safe models from legal, societal and other considerations.
Not engaging in this will indeed lead to bad laws, sanctions and more, as well as not fulfilling our societal obligations to ensure this amazing technology is used for outcomes as positive as possible.
Stability AI was set up to build benchmark open models of all types in a proper way, this is why for example we are one of the only companies to offer opt out of datasets (stable cascade and SD3 are opted out), have given millions of supercompute hours in grants to safety related research and more.
Smaller players with less uptake and scrutiny don't need to worry so much about some of these complex issues, it is quite a lot to keep on top of, doing our best.
>it is incumbent upon us as the models get to this level of quality to take seriously how we can release open and safe models from legal, societal and other considerations.
Can you define what you mean by "societal and other considerations"? If not, why not?
Likely public condemnation followed by unreasonable regulations when populists see their campaign opportunities. We've historically seen this when new types of media (e.g. TV, computer games) debut and there are real, early signals of such actions.
I don't think those companies being cautious is necessarily a bad thing even for AI enthusiasts. Open source models will quickly catch up without any censorship while most of those public attacks are concentrated into those high profile companies, which have established some defenses. That would be a much cheaper price than living with some unreasonable degree of regulations over decades, driven by populist politicians.
They risk reputational harm and, since there are so many alternatives, outright "brand cancellation". For example, vocal groups can lobby payment processors to deny service to any AI provider deemed unworthy. Ironic that tech enabled all of that behavior to begin with, and now they're worried about it turning on them.
What viable alternatives are there to Stable Diffusion? As far as I know, it's the only way to run good image generation locally, and that's probably a big consideration for any business dabbling in it.
Yeah, the word "good" is doing the heavy lifting here - while it's not the only one that can do it, it has a very comfortable lead over all alternatives.
> What is it specifically that company management is worried about?
As with all hype techs, even the most talented management are barely literate in the product. When talking about their new trillion $ product they must take their talking points from the established literature and "fake it till they make it".
If the other big players say "billions of parameters", you chuck in as many as you can. If the buzzword is "tokens", you say we have lots of tokens. If the buzzword is "safety", you say we are super safe. You say them all and hope against hope that nobody asks a simple question you are not equipped to answer that will show you don't actually know what you are talking about.
It's a bit rich when HN itself is chock-full of camp followers who pick the most mainstream opinion. Previously it was AI danger, then it became hallucinations, now it's that safety is too much.
The rest of the world is also like that. You can make a thing that hurts your existing business. Spinning off the brand is probably Google's best bet.
Can it generate an image of people without injecting insufferable diversity quotas into each image? If so then it’s the most advanced model on the internet right now!
It is a challenge for these models to generate images of counterintuitive or unusual situations that aren't depicted in the training set. For example, if you ask for a small cube sitting on top of a large cube, you'll likely get the correct result on the first attempt. Ask for a large cube on a small cube and you'll probably get an image of them side-by-side or with the small cube on top instead. The models can generalize in impressive ways, but it's still limited.
A while ago my daughter wanted an image of Santa pulling a sleigh with a reindeer in the driver's seat holding the reins. We tried dozens of different prompts and Dall-e 3 could not do it.
It's likely a result of the interplay between the image generation and caption/description generation aspects of the model. The earliest diffusion-based image generators used a 'bag of words' model for the caption (see musing regarding this and DALL-E 3: https://old.reddit.com/r/slatestarcodex/comments/16y14co/sco...), whereby 'a woman chasing a bear' would turn into `['a', 'a', 'chasing', 'bear', 'woman']`.
That's good enough to describe compositions well-represented in the training set, but it is likely to lock in to those common representations at the expense of rarer but still possible ones (the 'woman chasing a bear' above).
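As a toy illustration (not the actual text-encoding pipeline, just a sketch of why a bag-of-words caption loses relational information):

```python
def bag_of_words(caption: str) -> list[str]:
    # Word order - and therefore who is chasing whom - is discarded.
    return sorted(caption.lower().split())

print(bag_of_words("a woman chasing a bear"))
# ['a', 'a', 'bear', 'chasing', 'woman']
print(bag_of_words("a bear chasing a woman"))
# ['a', 'a', 'bear', 'chasing', 'woman']  <- identical: the two scenes collapse to one caption
```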
Being able to generate content w/ minimal presence in the training set is arguably an emergent, desirable behavior that could be seen as a form of intelligence.
From a technical perspective they are impressive. The depth of field in the classroom photo and the macro shot. The detail in the chameleon. The perfect writing in very different styles and fonts. The dust kicked up by the donut.
The artistic value is something you have to add with a good prompt with artistic vision. These images are probably the AI equivalent of "programmer art". It fulfills its function, but lacks aesthetic considerations. I wouldn't attribute that to the model just yet.
At this point, the next thing that will blow me away is AGI at human expert level or a Gaussian Splat diffusion model that can build any arbitrary 3D scene from text or a single image. High bar, but the technology world is already full of dark magic.
I guess we should count our blessings and be grateful that literacy, the printing press, computers and the internet became normalised before this notion of "harm" and harm prevention was. Going forward, it's hard to imagine how any new technology that is unconditionally intellectually empowering to the individual will be tolerated; after all, just think of the harms someone thus empowered could be enabled to perpetrate.
Perhaps eventually, once every forum has been assigned a trust-and-safety team and every word processor has been aligned and most normal people have no need for communication outside the Metaverse (TM) in their daily lives, we will also come around to reviewing the necessity of teaching kids to write, considering the epidemic of hateful graffiti and children being caught with handwritten sexualised depictions of their classmates.
"grateful that literacy, the printing press, computers and the internet became normalised before this notion of "harm" and harm prevention was"
Printing Press -> Reformation -> Thirty Years' War -> Millions Dead
I'm sure that there were lots of different opinions at the time about what kind of harm was introduced by the printing press and what to do about it, and attempts to control information by the Catholic church etc.
The current fad for 'safe' 'AI' is corporate and naive. But there's no simple way to navigate a revolutionary change in the way information is accessed / communicated.
Safetyism has been the standard civic religion since 9/11 and I doubt it will go quietly into the night. Much like the bishops and the king had a symbiotic relationship to maintain control and limit change (e.g., King James of KJV Bible fame), the government and corporations have a similarly tense, but aligned, relationship. Boogeymen from the left or the right can always be conjured to provide the fear necessary to control.
Would millions have died if the old religion gave way to the new one without a fight? The problem for the Vatican was that their rhetoric wasn't at top form after mentally stagnating for a few centuries since arguing with Roman pagans, so war was the only possibility to win.
"The Coddling of the American Mind" by Jonathan Haidt and Greg Lukianoff is a very good (and troubling) book that talks a lot about "safetyism". I can't recommend it enough.
It's strange that people think Stability is making decisions based on American politics when it isn't an American company and other countries generally have stricter laws in this area.
"Think of the Children" has been the norm since long before it was re-popularized in the 80s for song lyrics, in the 90s encryption, and now everything else.
I almost think it's the eras between that are more notable.
I agree. There should have been guardrails in place to prevent people who espouse extremist viewpoints like Martin Luther from spreading their dangerous and hateful rhetoric. I rest easy knowing that only people with the correct intentions will be able to use AI.
The current focus on "safety" (I would prefer a less gracious term) is based as much on fear as on morality: fear of government intervention and woke morality. The progress in technology is astounding; the focus on sabotaging the publicly available versions of the technology to promote (and deny) narratives is despicable.
> Way to blame the printing press for the actions of religious extremists.
I don't see GP blaming the printing press for that, they're merely pointing out that one enabled the other, which is absolutely true. I'm damn near a free speech absolutist, and I think the heavy "safety" push by AI is well-meaning but will have unintended consequences that cause more harm than they are meant to prevent, but it seems obvious to me that they can be used much the same as printing presses were by the extremists.
> The lesson isn't printing press bad, it's extremist irrational belief in any entity is bad (whether it's religion, Trump, etc.).
Yes, and fortunately that banning was the end of hateful printed content. Since that ban, the only way to print objectionable material has been to do it by hand with pen and ink.
(For clarity, I'm joking, and I know you're also not implying any such thing. I appreciate your comment/link)
What makes you think those who’ve worked hard over a lifetime to provide (with no compensation) the vast amounts of data required for these — inferior by every metric other than quantity — stochastic approximations of human thought should feel empowered?
I think the genAI / printing press analogy is wearing rather thin now.
This is a strange question since augmentation can be objectively measured even as its utility is contextual. With MidJourney I do not feel augmented because while it makes pretty images, it does not make precisely the pretty images I want. I find this useless, but for the odd person who is satisfied only with looking at pretty pictures, it might be enough. Their ability to produce pretty pictures to satisfaction is thus augmented.
With GPT4 and Copilot, I am augmented in a speed instead of capabilities sense. The set of problems I can solve is not meaningfully enhanced, but my ability to close knowledge gaps is. While LLMs are limited in their global ability to help design, architect or structure the approach to a novel problem or its breakdown, they can tell local tricks and implementation approaches I do not know but can verify as correct. And even when wrong, I can often work out how to fix their approach (this is still a speed up since I likely would not have arrived at this solution concept on my own). This is a significant augmentation even if not to the level I'd like.
The reason capabilities are not much enhanced is to get the most out of LLMs, you need to be able to verify solutions due to their unreliability. If a solution contains concepts you do not know, the effort to gain the knowledge required to verify the approach (which the LLM itself can help with) needs to be manageable in reasonable time.
I am not a programmer, so none of this applies to me. I can only speak for myself, and I’m not claiming that no one can feel empowered by these tools - in fact it seems obvious that they can.
I think programmers tend to assume that all other technical jobs can be attacked in the same way, which is not necessarily true. Writing code seems to be an ideal use case for LLMs, especially given the volume of data available on the open web.
Which is why I say it is contextual and depends on the task. I'll note that it's not only programming ability that is empowered but learning math, electronics, history, physics and so on up to the university level. As long as you take small enough steps such that you are able to verify with external sources, you will move faster with than without.
Writing it as "feel empowered" made it come across as if you meant the empowerment was illusory. My argument was that it is not merely a feeling but a real measurable difference.
And the metric of "beating most of our existing metrics so we had to rewrite the metrics to keep feeling special, but don't worry we can justify this rewriting by pointing at Goodhart's law".
The only reason the question of compensating people for their input into these models even matters is specifically because the models are, in actual fact, good. The bad models don't replace anyone.
> beating most of our existing metrics so we had to rewrite the metrics to keep feeling special
This is needlessly provocative, and also wrong. My metrics have been the same from the very beginning (i.e. ‘can it even come close to doing my work for me?’). This question may yet come to evaluate to ‘yes’, but I think you seriously underestimate the real power of these models.
> The only reason the question of compensating people for their input into these models even matters is specifically because the models are, in actual fact, good.
No. They don’t need to be good, they simply need to fool people into thinking they’re good.
And before you reflexively rebut with ‘what’s the difference?’, let me ask you this: is the quality of a piece of work or the importance of a job and all of its indirect effects always immediately apparent? Is it possible for managers to short term cost-cut at the expense of the long term? Is it conceivable that we could at some point slip into a world in which there is no funding for genuinely interesting media anymore because 90% of the population can’t distinguish it? The real danger of genAI is that it convinces non-experts that the experts are replaceable when the reality is utterly different. In some cases this will lead to serious blowups and the real experts will be called back in, but in more ambiguous cases we’ll just quietly lose something of real value.
Perhaps; this is something I find annoying enough that my responses may be unnecessarily sharp…
> and also wrong. My metrics have been the same from the very beginning (i.e. ‘can it even come close to doing my work for me?’). This question may yet come to evaluate to ‘yes’, but I think you seriously underestimate the real power of these models.
Okay then. (1) your definition is equivalent to "permanent mass unemployment" because if it can do your work for you, it can also do your work for someone else, (2) you mean either "over-estimate" or "real limits of these models", and the only reason I even bring up what's obviously a minor editing issue that I fall foul of myself on many comments is that this is the kind of mistake that people pick up on as evidence of the limits of AI — treating small inversions like this as evidence of uselessness.
> Is it conceivable that we could at some point slip into a world in which there is no funding for genuinely interesting media anymore because 90% of the population can’t distinguish it?
As written, what you describe is tautologically impossible. However, assuming you mean something more like "genuinely novel" rather than "interesting", absolutely! 100% yes. There's also loads of ways this could permanently end all human flourishing (even when used as a mere tool e.g. by dictators for propaganda), and some plausible ways it can permanently end all human existence (it's a safe bet someone will ask it to and try to empower it to this end, the question is how far they get with this).
> The real danger of genAI is that it convinces non-experts that the experts are replaceable when the reality is utterly different.
Despite the fact that the best models ace tests in medicine and law, the international mathematical olympiad, leetcode, etc., the fact there are no real tests for how good someone is after a few years of employment means both your point and mine can be true simultaneously. I'm thinking the real threat current LLMs pose to newspapers is that they fully automate the Gell-Mann Amnesia effect, even though they beat humans on every measure I had of intelligence when I was growing up, and depending on which measure exactly either all of humanity together by many orders of magnitude, or at worst putting them somewhere near the level of "rather good student taking the same test".
> In some cases this will lead to serious blowups and the real experts will be called back in, but in more ambiguous cases we’ll just quietly lose something of real value.
Hard disagree about "quiet loss". To the extent that value can be quantified, even if only by surveying humans, models can learn it. Indeed, this is already baked into the way ChatGPT asks you for feedback about the quality of the answers it generates. To the extent we lose things, it will be a very loud and noisy loss, possibly literally in the form of a nuke going off.
> (1) your definition is equivalent to "permanent mass unemployment" because if it can do your work for you, it can also do your work for someone else
This wouldn't happen because employment effects are mainly determined by comparative advantage, i.e. the resources that could be used to "do your job" will instead be used to do something they're more suited to.
(Not "that they're better at". it's "more suited to". You do not have your job because you're the best at it.)
I don't claim to be an expert in economics, so if you feel like answering please treat me as a noob, but doesn't comparative advantage have the implicit assumption that demand isn't ever going to be fully met for all buyers? The "single most economically important task" that a machine which can operate at a human (or superhuman) level, is "make a better version of itself" until that process hits a limit, followed by "maximise how many of you exist" until it runs out of resources. With assumptions that currently seem plausible such as "such a robot[0] might mass 100kg and take 5 months to turn plain metal ore into a working copy of itself", it takes about 30 years to convert the planet Mercury into 4.12e11 such robots per currently living human[1], which I assert is more than anyone can actually use even if they decided their next game of Civilization was going to be a 1:1 scale WestWorld-style LARP.
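For what it's worth, the back-of-the-envelope figures do check out under those (very speculative) assumptions; a quick sanity check:

```python
import math

MERCURY_MASS_KG = 3.3e23       # rough figure for Mercury's mass
ROBOT_MASS_KG = 100            # assumed per-robot mass from the comment
HUMANS = 8e9                   # roughly the current world population
DOUBLING_TIME_MONTHS = 5       # assumed self-replication time

total_robots = MERCURY_MASS_KG / ROBOT_MASS_KG   # ~3.3e21 robots
per_human = total_robots / HUMANS                # ~4.1e11 per person
doublings = math.log2(total_robots)              # ~71.5 doublings from one robot
years = doublings * DOUBLING_TIME_MONTHS / 12    # ~30 years

print(f"{per_human:.2e} robots per human after roughly {years:.0f} years")
```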
If I imagine a world where every task that any human can perform can also be done at world expert level — let alone at a superhuman level — by a computer/robot (with my implicit assumption "cheaply"), I can't imagine why I would ever choose the human option. If the comparative advantage argument is "the computer/robot combination will always be priced at exactly the level where it's cost-competitive with a human, in order that it can extract maximum profit", I ask why there won't be many AI/robots competing with each other for ever-smaller profit margins?
[0] AI and robotics are not the same things, one is body the other mind, but there's a lot of overlap with AI being used to drive robots, LLMs making it easier to define rewards and for the robots to plan; and AI also get better by having embodiment (even if virtual) giving them real world feedback.
> The "single most economically important task" that a machine which can operate at a human (or superhuman) level, is "make a better version of itself" until that process hits a limit, followed by "maximise how many of you exist" until it runs out of resources.
Lot of hidden assumptions here. How does "operating at human level" (an assumption itself) imply the ability to do this? Humans can't do this.
We very specifically can't do this, we have sexual reproduction for a good reason.
(Also, since your scenario also has the robots working for free, they would instantly run out of resources to reproduce because they don't have any money. Similarly, an AGI will be unable to grow exponentially and take over the world because it would have to pay its AWS bill.)
> If I imagine a world where every task that any human can perform can also be done at world expert level — let alone at a superhuman level — by a computer/robot (with my implicit assumption "cheaply"), I can't imagine why I would ever choose the human option.
If the robot performs at human level, and it knows you'll always hire it over a human, why would it work for cheaper?
If you can program it to work for free, then it's subhuman.
If you're imagining something that's superhuman in only ways that are bad for you and subhuman in ways that would be good for you, just stop imagining it and you're good.
> Lot of hidden assumptions here. How does "operating at human level" (an assumption itself) imply the ability to do this?
Operating at human level is directly equivalent to "can it even come close to doing my work for me" when the latter is generalised over all humans, which is the statement I was criticising on the grounds of the impact it has.
> Humans can't do this.
> We very specifically can't do this, we have sexual reproduction for a good reason.
Tautologically, humans operate at human level.
If you were responding to «"make a better version of itself" until that process hits a limit» — we've been doing, and continue to do, that with things like "education" and "medicine" and "sanitation". We've not hit our limits yet, as we definitely don't fully understand how DNA influences intelligence, nor how to safely modify it (plenty of unsafe ways to do so, though).
If you were responding to «followed by "maximise how many of you exist" until it runs out of resources», that's something all living things do by default. Despite the reduced fertility rates, our population is still rising.
And I have no idea what your point is about sexual reproduction, because it's trivial to implement a genetic algorithm in software, and we already do as a form of AI.
> (Also, since your scenario also has the robots working for free, they would instantly run out of resources to reproduce because they don't have any money. Similarly, an AGI will be unable to grow exponentially and take over the world because it would have to pay its AWS bill.)
First, I didn't say "for free", I was saying "competing with each other such that the profit margin tends towards zero", which is different.
Second, money is an abstraction to enable cooperation; it is not the resource itself. Money doesn't grow on trees, but apples do: just as plants don't use money but instead take minerals out of the soil, carbon out of the air, and water out of both, so too a robot which mines and processes some trace elements, silicon, and iron ore into PV and steel has those products as resources, even if it doesn't then go on to sell them to anyone. Inventing the first VN machine involves money, but only because the humans used to invent all the parts of that tech themselves want money while working on the process.
AI may still use money to coordinate, because it's a really good abstraction, but I wouldn't want to bet against superior coordination mechanisms replacing it at any arbitrary point in the future, neither for AI nor for humans.
> If the robot performs at human level, and it knows you'll always hire it over a human, why would it work for cheaper?
(1) competition with all the other robots who are trying to bid lower to get the business, i.e. Nash equilibrium of a free market
(2) I dispute the claim that "If you can program it to work for free, then it's subhuman." because all you have to do is give it a reward function that makes it want to make humans happy, and there are humans who value the idea of service as a reward all in its own right. Further, I think you are mixing categories by calling it "subhuman", as it sounds like an argument based on the value of its inner experience, where the economic result only requires the productive outputs — so for example, I would be surprised if it turned out Stable Diffusion models experienced qualia (making them "subhuman" in the moral value sense), but they're still capable of far better artistic output than most humans, to the extent that many artists are giving up on their profession (making them superhuman in the economic sense).
(3) One thing humans can do is program robots, which we're already doing, so if an AI were good enough to reach the standard I was objecting to, "can it even come close to doing my work for me" fully generalised over all humans, then the AI can program "subhuman" labour bots just as easily as we can, regardless of whether or not there turns out to be some requirement for qualia to enable performance in specific areas.
> If you were responding to «"make a better version of itself" until that process hits a limit» — we've been doing, and continue to do, that with things like "education" and "medicine" and "sanitation".
I think you have a conceptual confusion here. "Medicine" doesn't exist as an entity, and if it does, it doesn't do anything. People discover new things in the field of medicine. Those people are not medicine. (If they're claiming to be, they aren't, because of the principal-agent problem.)
> And I have no idea what your point is about sexual reproduction, because it's trivial to implement a genetic algorithm in software, and we already do as a form of AI.
Conceptual confusion again. Just because you call different things AI doesn't mean those things have anything in common or their properties can be combined with each other.
And the point is that sexual reproduction does not "make a better version of you". It forces you to cooperate with another person who has different interests than you.
Similarly, your ideas about robots building other little smaller robots who'll cooperate with each other… why are they going to cooperate with each other against you again? They don't have the same interests as each other because they're different beings.
> AI may still use money to coordinate, because it's a really good abstraction, but I wouldn't want to bet against superior coordination mechanisms replacing it at any arbitrary point in the future, neither for AI nor for humans.
Highly doubtful there could be one that wouldn't fall under the definition of money. The reason it exists is called the economic calculation problem (or the socialist calculation problem if you like); no amount of AI can be smart enough to make central planning work.
> (2) I dispute the claim that "If you can program it to work for free, then it's subhuman." because all you have to do is give it a reward function that makes it want to make humans happy
If it has a reward function it's subhuman. Humans don't have reward functions, which makes us infinitely adaptable, which means we always have comparative advantage over a robot.
> and there are humans who value the idea of service as a reward all in its own right.
It's recommended to still pay those people. That's because if you deliberately undercharge for your work, you'll run out of money eventually and die. (This is the actual meaning of efficient markets hypothesis / "people are rational" theory. It's not that people are magically rational. The irrational ones just go broke.)
Actually, it's also the reason economics is called "the dismal science". Slaveholders called it that because economists said it's inefficient to own slaves. It'd be inefficient to employ AI slaves too.
Computers and drafters had their work taken by machines. IBM did not pay off the computers and drafters. In this case you could make a steady decent wage. My grandfather was trained in a classic drawing style (yes it was his main job).
He did not get into the profession to make money. He did it out of passion and died poor. Artists are not being tricked by the promise of wealth. You will get a cloned style if you can't afford the real artist making it, and if the commission goes to a computer, how is that not the same as plagiarism by a human? Artists were not being paid well before. The anime industry has proven the endpoint of what happens to artists as a profession despite their skills. Chess still exists despite better play by machines. Art as a commercial medium has always been tainted by outside influences such as government, religion and pedophilia.
In the end, drawing wasn't going to survive in the age of vector art and computers. They are mainly forgettable jpgs you scroll past in a vast array like DeviantArt.
Sorry, but every one of your talking points — ‘computers were replaced’ , ‘chess is still being played’, etc. — and good counterarguments to them have been covered ad nauseam (and practically verbatim) by now.
Anyway, my point isn’t that ‘AI is evil and must be stopped’; it’s that it doesn’t feel ‘intellectually empowering’. I (in my personal work) can’t get anything done with ChatGPT that I can’t on my own, and with less frustration. We’ve created machines that can superficially mimic real work, and the world is going bonkers over it. The only magic power these systems have is sheer speed: they can output reams and reams of twaddle in the time it takes me to make a cup of tea. And no doubt those in bullshit jobs are soon going to find out.
My argument might not be what you expect from someone who is sad to see the way artists’ lives are going: if your work is truly capable of being replaced by a large language model or a diffusion model, maybe it wasn’t very original to begin with.
The sad thing is, artists who create genuinely superior work will still lose out because those financially enabling them will think (wrongly) that they can be replaced. And we’ll all be worse off.
I definitely feel more empowered, and making imperfect art and generating code that doesn't work and proofreading it is definitely changing people's lives. Which specific artist are you talking about who will suffer? Many of the ones I talk to are excited about using it.
You keep going back to value and finances. The less money is in it the better. Art isn't good because it's valuable, unless you were only interested in it commercially.
> Art isn't good because it's valuable, unless you were only interested in it commercially.
Of course not; I’m certainly not suggesting so. But I do think money is important because it is what has enabled artists to do what they do. Without any prospect of monetising one’s art, most of us (and I’m not an artist) would be out working in the potato fields, with very little time to develop skills.
I disagree. It will be better because it's driven purely by passion. Art runs in my family even today; I am fully aware of its value as well as its cost. It is not a career, and artists knew that then and now, indulging their decadence of expression through film purchases, luxurious pigments, toxic but beautiful chemicals, or instruments that were sure to never make back their purchase price. Someone (not my family) made Stonehenge in his backyard, and while it had no commercial value, it is still a very impressive feat and I admire the ingenuity. Art without monetary value is always the best, and previous problems such as film costs and paint prices are solved digitally, so the lack of commercial interest shouldn't hurt art at all.
Commercial movies have lots of CG, big budgets and famous actors, while small-budget indie movies have been exploding despite their weaker technical polish. Noah's ark was made by amateurs while the Titanic was made by experts.
Empowering to their users. A lot of things that empower their users necessarily disempower others, especially if we define power in a way that is zero-sum - the printing press disempowered monasteries and monks that spent a lifetime perfecting their book-copying craft (and copied books that no doubt were used in the training of would-be printing press operators in the process, too).
It seems to me that the standard use of "empowering" implies in particular that you get more power for less effort - which in many cases tends to be democratizing, as hard-earned power tends to be accrued by a handful of people who dedicate most of their lives to pursuit of power in one form or another. With public schooling and printing, a lot of average people were empowered at the expense of nobles and clerics, who put in a lifetime of effort for the power literacy conveys in a world without widespread literacy. With AI, likewise, average people will be empowered at the expense of those who dedicated their life to learn to (draw, write good copy, program) - this looks bad because we hold those people in high esteem in a world where their talents are rare, but consider that following that appearance is analogously fallacious to loathing democratization of writing because of how noble the nobles and monks looked relative to the illiterate masses.
I get why you might describe these tools as ‘democratising’, but it also seems rather strange when you consider that the future of creativity is now going to be dependent on huge datasets and amounts of computation only billion-dollar companies can afford. Isn’t that anything but democratic? Sure, you can ignore the zeitgeist and carry on with traditional dumb tools if you like, but you’ll be utterly left behind.
Datasets can still be curated by crowds of volunteers just fine. I would likewise expect a crowdsourceable solution to compute to emerge eventually - unless the safetyists move to prevent this by way of legislation.
When writing and printing emerged, they too depended on supply chains (for paper, iron, machining) and in the case of printing capital that were far out of the reach of the individual. Their utility and overlap with other mass markets resulted in those being commoditized in short order.
Harm prevention is definitely not new; books have been subject to censorship for centuries. Just look at the U.S., where we had the Hays code and the Comic Code Authority. The only difference is that now, Harm is defined by California tech companies rather than the Church or the Monarchy.
I feel like this analogy is not very appropriate. The main problem with AI-generated images and videos is that, with every improvement, it becomes more and more difficult to distinguish what's real from what's not. That's not something that happened with literacy, the printing press, or computers.
Think about it: the saturation of content on the Internet has become so bad that people have a hard time knowing what's true, to the point that we're again having outbreaks of preventable diseases such as measles because people can't tell real scientific information from fake. Imagine what will happen when anyone can create an image of whatever they want that looks just like any other picture, or worse, a video. We are not at all equipped to deal with that. We are risking a lot just for the ability to spend massive amounts of compute power on generating images. It's not curing cancer, not solving world hunger, not making space travel free, no: it's generating images.
It definitely is easier without AI. Before, if you saw a photo you could be fairly confident that most of it was real (yes, photo manipulation exists, but you can't really create a photo out of nothing). Videos were far more trustworthy still (and yes, I know there are some amazing 3D renders out there, but they're not really accessible). With these technologies and the rate at which they're improving, I feel like that's going out the window. Not to mention that the more content is generated, the easier it is for something fake to slip by.
The core problem is centralization of control. If everyone uses their own desktop computer, then everyone is responsible for their own behavior.
If everyone uses Hosting Service F, then at some point people will blur the lines and expect "Hosting Service F" to remove vulgar or offensive content. The lines themselves will be a zeitgeist of sorts with inevitable decisions that are acceptable to some but not all.
Can you even blame them? There are lots of ways for this to go wrong and no one wants to be on the wrong side of a PR blast.
I don't think your golden age ever truly existed — the Overton Window for acceptable discourse has always been narrow, we've just changed who the in-group and out-groups are.
The out group used to be atheists, or gays, or witches, or republicans (in the British sense of the word), or people who want to drink. And each of Catholics and Protestants made the other unwelcome across Europe for a century or two. When I was a kid, it was anyone who wanted to smoke weed, or (because UK) any normalised depiction of gay male relationships as being at all equivalent to heterosexual ones[0]. I met someone who was embarrassed to admit they named their son "Hussein"[1], and absolutely any attempt to suggest that ecstasy was anything other than evil. I know at least one trans person who started out of the closet, but was very eager to go into the closet.
[0] "promote the teaching in any maintained school of the acceptability of homosexuality as a pretended family relationship" - https://en.wikipedia.org/wiki/Section_28
Safety also protects people trying to make use of the technology at scale for perfectly benign use cases.
Want to install a plugin into Wordpress to autogenerate fun illustrations to go at the top of the help articles in your intranet? You probably don’t want the model to have a 1 in 100 chance of outputting porn or extreme violence.
I wrote a random password generator once. I was a naive young developer, and I thought it was helpful to generate memorable passwords, so I threw a dictionary of words into it without really checking the content, beyond the obvious swearwords. First day in production, it generated an inappropriate password and suggested it to a user.
When I replaced it with a different non-word based alphanumeric algorithm that couldn't issue someone a password of 'fat cow 392' ever again, I considered that a 'safe' implementation.
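For the curious, here is a minimal sketch of that second, non-word approach: draw characters uniformly from a fixed alphanumeric alphabet with a CSPRNG, so no dictionary word (and no accidental insult) can ever appear. The function name and default length are my own illustrative choices, not the commenter's actual code.

```python
import secrets
import string

# Fixed alphabet: letters and digits only, so the generator can never
# assemble a real word, let alone an offensive one.
ALPHABET = string.ascii_letters + string.digits

def generate_password(length: int = 16) -> str:
    """Return a random alphanumeric password using a cryptographic RNG."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))

if __name__ == "__main__":
    print(generate_password())  # random on every run
```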
Photoshop and the likes (modern day's pens) should have an automatic check that you are not drawing porn, censor the image and report you to the authorities if it thinks it involves minors.
edit: yes it is sarcasm, though I fear somebody will think it is in fact the right way to go.
That's ridiculous. What about real pens and paintbrushes? Should they be mandated to have a camera that analyses everything you draw/write just to be "safe"?
Maybe we should make it illegal to draw or write anything without submitting it to the state for "safety" analysis.
Text editors and the likes (modern day's typewriters) should have an automatic check that you are not criticizing the government, censor the text and report you to the authorities if it thinks it promotes an alternative political party.
Hopefully you are going to be absolutely shocked by the prospect of the above sentence. But as you can see, surveillance is a slippery slope. "Safety" is a very dangerous word because everybody wants to be "safe" but no one is really ready to define what "safe" actually means. The moment we start baking cultural / political / environmental preferences and biases into the tools we use to produce content, we allow other groups of people with different views to use those "safeguards" to harm us or influence us in ways we might not necessarily like.
The safest notebook I can find is indeed simple pen and paper, because it does not know or care what is being written; it just does its best regardless of how amazing or horrible the content is.
What's equally interesting is that while they spend a lot of words on safety, they don't actually say anything. The only hint what they even mean by safety is that they took "reasonable steps" to "prevent misuse by bad actors". But it's hard to be more vague than that. I still have no idea what they did and why they did it, or what the threat model is.
Maybe that will be part of future papers or the teased technical report. But I find it strange to put so much emphasis on safety and then leave it all up to the reader's imagination.
I truly wonder what "unsafe" scenarios an image generator could be used for? Don't we already have software that can do pretty much anything if a professional human is using it?
I would say the barrier to entry is stopping a lot of ‘candid’ unsafe behaviour. I think you allude to it yourself in implying currently it requires a professional to achieve the same results.
But giving that ability to _everyone_ will lead to a huge increase in undesirable and targeted/local behaviour.
Presumably it enables any creep to generate what they want by virtue of being able to imagine it and type it, rather than learn a niche skill set or employ someone to do it (who is then also complicit in the act)
IANAL but that sounds like harassment. I assume the legality of that depends on the context (did the artist previously date the subject? lots of states have laws against harassment and revenge porn that seem applicable here [1]. are you coworkers? etc), but I don't see why such laws wouldn't apply to AI-generated art as well. It's the distribution that's really the issue in most cases. If you paint secret nudes and keep them in your bedroom and never show them to anyone, it's creepy, but I imagine not illegal.
I'd guess that stability is concerned with their legal liability, also perhaps they are decent humans who don't want to make a product that is primarily used for harassment (whether they are decent humans or not, I imagine it would affect the bottom line eventually if they develop a really bad rep, or a bunch of politicians and rich people are targeted by deepfake harassment).
^ a lot of, but not all of those laws seem pretty specific to photographs/videos that were shared with the expectation of privacy and I'm not sure how they would apply to a painting/drawing, and I certainly don't know how the courts would handle deepfakes that are indistinguishable from genuine photographs. I imagine juries might tend to side with the harassed rather than a bully who says "it's not illegal cause it's actually a deepfake but yeah i obviously intended to harass the victim"
Because AI lowers the barrier to entry; using your example, few people have the drawing skills (or the patience to learn them) or take the effort to make a picture like that, but the barrier is much lower when it takes five seconds of typing out a prompt.
Second, the tool will become available to anyone, anywhere, not just a localised school. If generating naughty nudes is frowned upon in one place, another will have no qualms about it. And that's just things that are about decency, then there's the discussion about legality.
Finally, when person A draws a picture, they are responsible for it - they produced it. Not the party that made the pencil or the paper. But when AI is used to generate it, is all of the responsibility still with the person that entered the prompt? I'm sure the T's and C's say so, but there may still be lawsuits.
Right, these are the same arguments against uncontrolled empowerment that I imagine mass literacy and the printing press faced. I would prefer to live in a society where individual freedom, at least in the cognitive domain, is protected by a more robust principle than "we have reviewed the pros and cons of giving you the freedom to do this, and determined the former to outweigh the latter for the time being".
You seem to be very confused about civil versus criminal penalties....
Feel free to make an AI model that does almost anything, though I'd probably suggest that it doesn't make porn of minors as that is criminal in most jurisdiction, short of that it's probably not a criminal offense.
Most companies are only very slightly worried about criminal offenses, they are far more concerned about civil trials. There is a far lower requirement for evidence. AI creator in email "Hmm, this could be dangerous". That's all you need to lose a civil trial.
Why do you figure I would be confused? Whether any liability for drawing porn of classmates is civil or criminal is orthogonal to the AI comparison. The question is if we would hold manufacturers of drawing tools or software, or purveyors of drawing knowledge (such as learn-to-draw books), liable, because they are playing the same role as the generative AI does here.
Because you seem to be very confused about civil liability for most products. Manufacturers are commonly held liable for users' use of their products; for example, look at any number of products that have caused injury.
Surely those are typically cases where the manufacturer was taken to have made an implicit promise of safety to the user and their surroundings, and the user got injured. If your fridge topples onto you and you get injured, the manufacturer might be liable; if you set up a trap where you topple your fridge onto a hapless passer-by, the manufacturer will probably not be liable towards them. Likewise with the classic McDonald's coffee spill liability story - I've yet to hear of a case of a coffee vendor being held liable over a deliberate attack where someone splashed someone else with hot coffee.
Photoshop also lowers that barrier of entry compared to pen and pencil. Paper also lowers the barrier compared to oil canvas.
Affordable drawing classes and YouTube drawing tutorials lower the barrier of entry as well.
Why on earth would manufacturers of pencils, papers, drawing classes, and drawing software feel responsible for censoring the result of combining their tool with the brain of their customer?
A sharp kitchen knife significantly lowers the barrier of entry to murder someone. Many murders are committed everyday using a kitchen knife. Should kitchen knife manufacturers blog about this every week?
I agree with your point, but I would be willing to bet that if knives were invented today rather than having been around awhile, they would absolutely be regulated and restricted to law enforcement if not military use. Hell, even printers, maybe not if invented today but perhaps in a couple years if we stay on the same trajectory, would probably require some sort of ML to refuse to print or "reproduce" unsafe content.
I guess my point is that I don't think we're as inconsistent as a society as it seems when considering things like knives. It's not even strictly limited to thought crimes/information crimes. If alcohol were discovered today, I have no doubt that it would be banned and made Schedule I.
> Hell, even printers, maybe not if invented today but perhaps in a couple years if we stay on the same trajectory, would probably require some sort of ML to refuse to print or "reproduce" unsafe content.
Fun fact: Many scanners and photocopiers will detect that you're trying to scan/copy a banknote and will refuse to complete the scan. One of the ways is detecting the EURion Constellation.
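For illustration only (the real firmware logic is not public): the EURion constellation is a specific arrangement of five small circles, so a toy detector might simply look for tight clusters of tiny circles in a scan. The file name, radii, and distance thresholds below are assumptions made up for this sketch, not the actual algorithm.

```python
import cv2
import numpy as np

# Load a scanned page in grayscale; "scan.png" is a placeholder path.
img = cv2.imread("scan.png", cv2.IMREAD_GRAYSCALE)

# Look for small circles of roughly symbol size (thresholds are guesses).
circles = cv2.HoughCircles(
    img, cv2.HOUGH_GRADIENT, dp=1, minDist=5,
    param1=100, param2=20, minRadius=2, maxRadius=8,
)

suspicious = False
if circles is not None:
    centers = circles[0, :, :2]  # (x, y) of each detected circle
    for c in centers:
        # Five or more tiny circles packed closely together is EURion-like.
        nearby = int(np.sum(np.linalg.norm(centers - c, axis=1) < 60))
        if nearby >= 5:
            suspicious = True
            break

print("refusing to copy" if suspicious else "ok to copy")
```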
That's not even necessarily a bad thing (as a whole - individually it can be). Now, any leaked nudes can be claimed to be AI. That'll probably save far more grief than it causes.
You’re very welcome to ask for clarification - I kept it abstract because there is a lot of grey area and it’s something we need to understand and discuss as technology and society evolves.
To spell out one such instance: I would like to live in a world where it is not trivial to depict and misrepresent me (or anyone) in a way that is photorealistic to the point that it can be used to mislead others.
Whether that means we need to outright prevent it, or have some kind of authenticity mechanism, or some other yet-to-be-discovered solution? I do not know, but you now have my goalposts.
The case I literally just referenced allows you to paint nude children engaged in sex acts.
> The Ninth Circuit reversed, reasoning that the government could not prohibit speech merely because of its tendency to persuade its viewers to engage in illegal activity.[6] It ruled that the CPPA was substantially overbroad because it prohibited material that was neither obscene nor produced by exploiting real children, as Ferber prohibited.[6] The court declined to reconsider the case en banc.[7] The government asked the Supreme Court to review the case, and it agreed, noting that the Ninth Circuit's decision conflicted with the decisions of four other circuit courts of appeals. Ultimately, the Supreme Court agreed with the Ninth Circuit.
Where are the Americans asking about Snapchat? If I were a developer at Snapchat I could prolly open a few Blob Storage accounts and feed a darknet account big enough to live off of. You people are so manipulable.
In a large number of countries if you create an image that represents a minor in a sexual situation you will find yourself on the receiving side of the long arm of the law.
If you are the maker of an AI model that allows this, you will find yourself on the receiving side of the long arm of the law.
Moreso, many of these companies operate in countries where thought crime is illegal. Now, you can argue that said companies should not operate in those countries, but companies will follow money every time.
I think it's pretty important to specify that you have to willingly seek and share all of these illegal items. That's why this is so sketch. These things are being baked with moral codes that'll _share_ the information, incriminating everyone. Like why? Why not just let it work and leave it up to the criminal to share their crimes? People are such authoritarian shit-stains, and acting like their existence is enough to justify their stance is disgusting.
Similar to why Google's latest image generator refuses to produce a correct image of a 'Realistic, historically accurate, Medieval English King'. They have guardrails and system prompts set up to force the generator's output to align with the company's values, or else someone would produce Nazi propaganda or worse. It (for some reason) would be attributed to Google and their AI, rather than the user who found the magic prompt words.
Eh, a professional human could easily lockpick the majority of front doors out there. Nevertheless I don't think we're going to give up on locking our doors any time soon.
For some scenarios, it's not the image itself but the associations that the model might possibly make from being fed a diet of 4chan and Stormfront's unofficial YouTube channel. The worry is over horrible racist shit, like if you ask it for a picture of a black person, and it outputs a picture of a gorilla. Or if you ask it for a picture of a bad driver, and it only manages to output pictures of Asian women. I'm sure you can think up other horrible stereotypes that would result in a PR disaster.
This is the world we live in. CYA is necessary. Politicians, media organizations, activists and the parochial masses will not brook a laissez faire attitude towards the generation of graphic violence and illegal porn.
looking at the manual censorship of the big channels on youtube, you don't even need to display anything, just suggesting it is enough to get a strike.
(of course unless you are into yoga, then everything is permitted)
Great talk about slavery and religious-persecution, Jim! Wait, what were we talking about? Fucking American fascists trying to control our thoughts and actions, right right.
Any large publicly available model has no choice but to do this. Otherwise, they're petrified of a PR nightmare.
Models with a large user base will have an inverse relationship with usability. That's why it's important to have options to train your own with open source.
I think this AI safety thing is great. These models will be used by people to make boring art. The exciting art will be left for people to make.
This idea of AI doing the boring stuff is good. Nothing prevents you from making exciting, dangerous, or 'unsafe' art on your own.
My feeling is that most people who are upset about AI safety really just mean they want it to generate porn. And because it doesn't, they are upset. But they hide it under the umbrella of user freedom. You want to create porn in your bedroom? Then go ahead and make some yourself. Nothing stopping you, the person, from doing that.
There is some truth in what you say, just like saying you're a "free speech absolutist" sounds good at first blush. But the real world is more complicated, and the provider adds safety features because they have to operate in the real world and not just make superficial arguments about how things should work.
Yes, they are protecting themselves from lawsuits, but they are also protecting other people. Preventing people asking for specific celebrities (or children) having sex is for their benefit too.
AI is not an engineered system; it's emergent behavior from a system we can vaguely direct but do not fundamentally understand. So it's natural that the boundaries of system behavior would be a topic of conversation pretty much all the time.
EDIT: Boring and shallow are, unfortunately, the Internet's fault. Don't know what to do about those.
At least in some latest controversies (e.g. Gemini generation of people) all of the criticized behavior was not emergent from ML training, but explicitly intentionally engineered manually.
But that's the thing, prompt formulation is not engineering in the sense I'm talking about. We know why a plane flies, we know why an engine turns, we know how a CPU works - mostly. We don't know how GenAI gets from the prompt to the result with any specificity at all. Almost all the informational entropy of the output is hidden from us.
It's also "safety" in the sense that you can deploy it as part of your own application without human review and not have to worry that it's gonna generate anything that will get you in hot water.
I agree with you, but when companies don't implement these things, they get absolutely trashed in the press & social media, which I'm sure affects their business.
What would you have them do? Commit corporate suicide?
This is a good question. I think it would be best for them to give some sort of signal, which would mean "We're doing this because we have to. We are willing to change if you offer us an alternative." If enough companies/people did this, at some point change would become possible.
I get a slightly uncomfortable feeling with this talk about AI safety. Not in the sense that there is anything wrong with it (maybe there is, maybe not), but in the sense that I don't understand what people are talking about when they talk about safety in this context. Could someone explain like I have Asperger (ELIA?) what this is about? What are the "bad actors" possibly going to do? Generate (child) porn / images with violence etc. and sell them? Pollute the training data so that racist images pop up when someone wants an image of a white pussycat? Or produce images that contain vulnerabilities so that when you open them in your browser you get compromised? Or what?
I'm not part of Stability AI but I can take a stab at this:
> explain like I have ~~Asperger (ELIA?)~~ limited understanding of how the world really works.
The AI is being limited so that it cannot produce any "offensive" content which could end up on the news or go viral and bring negative publicity to Stability AI.
Viral posts containing generated content that brings negative publicity to Stability AI are fine as long as they're not "offensive". For example, wrong number of fingers is fine.
There is not a comprehensive, definitive list of things that are "offensive". Many of them we are aware of - e.g. nudity, child porn, depictions of Muhammad. But for many things it cannot be known a priori whether the current zeitgeist will find it offensive or not (e.g. certain depictions of current political figures, like Trump).
Perhaps they will use AI to help decide what might be offensive if it does not explicitly appear on the blocklist. They will definitely keep updating the "AI Safety" to cover additional offensive edge cases.
It's important to note that "AI Safety", as defined above (cannot produce any "offensive" content which could end up on the news or go viral and bring negative publicity to Stability AI) is not just about facially offensive content, but also about offensive uses for milquetoast content. Stability AI won't want news articles detailing how they're used by fraudsters, for example. So there will be some guards on generating things that look like scans of official documents, etc.
Yes*. At least for the purposes of understanding what the implementations of "AI safety" are most likely to entail. I think that's a very good cognitive model which will lead to high fidelity predictions.
*But to be slightly more charitable, I genuinely think Stability AI / OpenAI / Meta / Google / MidJourney believe that there is significant overlap in the set of protections which are safe for the company, safe for users, and safe for society in a broad sense. But I don't think any released/deployed AI product focuses on the latter two, just the first one.
Examples include:
Society + Company: Depictions of Muhammad could result in small but historically significant moments of civil strife/discord.
Individual + Company: Accidentally generating NSFW content at work could be harmful to a user. Sometimes your prompt won't seem like it would generate NSFW content, but could be adjacent enough: e.g. "I need some art in the style of a 2000's R&B album cover" (See: Sade - Love Deluxe, Monica - Makings of Me, Rihanna - Unapologetic, Janet Jackson - Damita Jo)
Society + Company: Preventing the product from being used for fraud. e.g. CAPTCHA solving, fraudulent documentation, etc.
Individual + Company: Preventing generation of child porn. In the USA, this would likely be illegal both for the user and for the company.
You sound offended. My apologies. I had no intention whatsoever to offend anyone. Even if I am not diagnosed, I think I am at least borderline somewhere in the spectrum, and thought that would be a good way to ask people explain without assuming I can read between the lines.
I think ELI5 means that you simplify a complex issue so that even a small kid understands it. In this case there is no need to simplify anything, just explain what a term actually means without assuming reader understanding nuances of terms used. And I still do not quite get how ELIA can be considered hostile, but given the feedback, maybe I avoid it in the future.
Saying "explain like I have <specific disability>" is blatantly inappropriate. As a gauge: Would you say this to your coworkers? Giving a presentation? Would you say this in front of (a caretaker for) someone with Autism? Especially since Asperger's hasn't even been used in practice for, what, over a decade?
> In this case there is no need to simplify anything
I don't see how this is a response to anything I've said. They're speaking to other humans and the original use of their modified idiom isn't framed as if one were talking about their own, personal disability.
As far as Stable Diffusion goes - when they released SD 2.1/XL/Stable Cascade, you couldn't even make a (woman's) nipple.
I don't use them for porn like a lot of people seem to, but it seems weird to me that something that's kind of made to generate art can't generate one of the most common subjects in all of art history - nude humans.
For some reason its training thinks they are decorative, I guess it’s a pretty funny elucidation of how it works.
I have seen a lot of “pasties” that look like Sorry! game pieces, coat buttons, and especially hell-forged cybernetic plumbuses. Did they train it at an alien strip club?
The LoRAs and VAEs work (see civit.ai), but do you really want something named NSFWonly in your pipeline just for nipples? Haha
I have in fact gotten a nude out of Stable Cascade. And that's just with text prompting, the proper way to use these is with multimodal prompting. I'm sure it can do it with an example image.
I seem to have the opposite problem a lot of the time. I tried using Meta's image gen tool, and had such a time trying to get it to make art that was not "kind of" sexual. It felt like Facebook's entire learning chain must have been built on people's sexy images of their girlfriend that's all now hidden in the art.
These were examples that were not super blatant, like a tree landscape that just happens to have a human figure and cave in their crotch. Examples:
Not meant in a rude way, but please consider that your brain is making these up and you might need to see a therapist. I can see absolutely nothing "kind of sexual" in those two pictures.
Not taken as rude. If its not an issue, then that's actually a positive for you. It means less time taken reloading trying to get it to not look like a human that happens to be made out of mountains.
Well, for starters, ChatGPT shouldn't balk at creating something "in Tim Burton's style" just because Tim Burton complained about AI. I guess it's fair use unless a select rich person who owns the data complains. Seems like it isn't fair use at all then, just theft from those who cannot legally defend themselves.
Fair use is an exception to copyright. The issue here is that it's not fair use, because copyright simply does not apply. Copyright explicitly does not, has never, and will never protect style.
That makes it even more ridiculous, as that means they are giving rights to rich complaining people that no one has.
Examples:
Can you create an image of a cat in Tim Burton's style?
Oops! Try another prompt
Looks like there are some words that may be automatically blocked at this time. Sometimes even safe content can be blocked by mistake. Check our content policy to see how you can improve your prompt.
Can you create an image of a cat in Wes Anderson's style?
Certainly! Wes Anderson’s distinctive style is characterized by meticulous attention to detail, symmetrical compositions, pastel color palettes, and whimsical storytelling. Let’s imagine a feline friend in the world of Wes Anderson...
Didn't Tom Waits successfully sue Frito Lay when the company found an artist that could closely replicate his style and signature voice, who sang a song for a commercial that sounded very Tom Waits-y?
Yes, though explicitly not for copyright infringement. Quoting the court's opinion, "A voice is not copyrightable. The sounds are not 'fixed'." The case was won under the theory of "voice misappropriation", which California case law (Midler v Ford Motor Co) establishes as a violation of the common law right of publicity.
Not specifically SD, but DallE: I wanted to get an image of a pure white British shorthair cat on the arm of a brunette middle-aged woman by the balcony door, both looking outside.
It wasn‘t important, just something I saw in the moment and wanted to see what DallE makes of it.
Generation denied. No explanation given, I can only imagine that it triggered some detector of sexual request?
(It wasn‘t the phrase "pure white", as far as I can tell, because I have lots of generated pics of my cat in other contexts)
You are using someone else's proprietary technology; you have to deal with their limitations. If you don't like it, there are endless alternatives.
"Wrongly denied" in this case depends on your point of view; clearly DALL-E didn't want this combination of words created, but you have no right to have these prompts fulfilled.
I'm the last one defending large monolithic corps, but if you go to one and want to be free to do whatever you want you are already starting from a very warped expectation.
I don’t feel like it truly matters since they’ll release it and people will happily fine-tune/train all that safety right back out.
It sounds like a reputation/ethics thing to me. You probably don’t want to be known as the company that freely released a model that gleefully provides images of dismembered bodies (or worse).
Oh the big one would be models weights being released for anyone to use or fine tune themselves.
Sure, the safety people lost that battle for Stable diffusion and LLama. And because they lost, entire industries were created by startups that could now use models themselves, without it being locked behind someone else's AI.
But it wasn't guaranteed to go that way. Maybe the safetyists could have won.
I don't think we'd be having our current AI revolution if Facebook or SD weren't the first to release models for anyone to use.
While that thoughtpiece makes me not like him as a person, your point is correct; the comment you're replying to is a low effort ad hominem, attacking the person and something they said about a different subject instead of addressing the actual remark about models.
Come on, his point is not about AI but about politics: "they" versus "you".
And he is completely, totally incompetent on this - and by the way, he's also completely incompetent on AI, and on most of tech. See his stint at twitter...
If AMD and Nvidia managed to understand what you are asking for just looking at a bunch of vectors and manipulate the outputs to a goal, that would be a serious breakthrough in the field.
Even if it did that by looking at the process memory (which would be seriously wrong in many ways), just the manipulation part would be mighty impressive. And if it was the case I guess we would see many papers and heated discussions about them.
Bro I have generated literally thousands of monster girls in stable diffusion and all of my outputs have been spectacular. Driver changes have not affected my generation at all
Because floating-point math gives different results depending on the order of operations. A major reason you'd update GPU drivers is to reorder ops for better cache locality so you can light up more transistors.
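A tiny, self-contained demonstration of that order dependence (the values are chosen only to make the rounding visible; this is not tied to any particular driver):

```python
# Floating-point addition is not associative: regrouping a reduction
# changes which low-order bits survive rounding.
a, b, c = 1e16, -1e16, 1.0

left = (a + b) + c    # cancel first, then add 1.0 -> 1.0
right = a + (b + c)   # 1.0 is absorbed into -1e16, then cancelled -> 0.0

print(left, right, left == right)  # 1.0 0.0 False
```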
"we have taken and continue to take reasonable steps to prevent the misuse of Stable Diffusion 3 by bad actors"
It's kind of a testament to our times that the person who chooses to look at synthetic porn instead of supporting a real-life human trafficking industry is the bad actor.
I don't think the problem is watching synthetic images. The problem is generating them based off actual people and sharing them on the internet in a way that the people watching can't tell the difference anymore. This was already somewhat of a problem with Photoshop and once everyone with zero skills can do it in seconds and with far better quality, it will become a nightmare.
> once everyone with zero skills can do it in seconds and with far better quality, it will become a nightmare.
Will it be a nightmare? If it becomes so easy and common that anyone can do it, then surely trust in the veracity of damaging images will drop to about 0. That loss of trust presents problems, but not ones that "safe" AI can solve.
It kind of has? People believe written words when they come from a source that they consider, erroneously or not, to be trustworthy (newspaper, printed book, Wikipedia, etc.). They trust the source, not the words themselves just due to being written somewhere.
This has so far not been true of videos (e.g. a video of a celebrity from a random source has typically been trusted by laypeople) and should change.
It is always the others that believe in false information. "The stupid people", I guess. This is a completely fictional perspective that there are masses convinced and led astray by misinformation on the internet.
Misinformation only works if it confirms what people want to believe already. That there exists or not exists such material is secondary at best. But well, that is off topic I guess.
Let me give you a specific counterexample: it's easy and common to generate phishing emails. Trust in email has not dropped to the degree that phishing is not a problem.
Phishing emails mostly work because they apparently come from a trusted source, though. The key is that they fake the source, not that people will just trust random written words just because they are written, as they do with videos.
A better analogy would be Nigerian prince emails, but only a tiny minority of people believe those... or at least that's what I want to think!
That's the point. They do, but they no longer should. Our technical capabilities for lying have begun to overwhelm the old heuristics, and the sooner people realise the better.
> if it becomes so easy and common that anyone can do it, then surely trust in the veracity of damaging images will drop to about 0.
Spend more time on Facebook and you'll lose your faith in humanity.
I've seen obviously AI generated pictures of a 5 year old holding a chainsaw right next to a beautiful wooden sculpture, and the comments are filled with boomers amazed at that child's talent.
There are still people that think the IRS will call them and make them pay their taxes over the phone with Apple gift cards.
If we follow the idea of safety, should we restrict the internet so either such users can safely use the internet (and phones, gift cards, technology in general) without being scammed, or otherwise restrict it so that at risk individuals can't use the technology at all?
Otherwise, why is AI specifically being targeted, other than the fear of new things that looks similar to the moral panics of video games.
In concept this is maybe desirable; boot anyone off the internet that isn't able to use it safely.
In reality this is a disaster. The elderly and homeless people are already being left behind massively by a society that believes internet access is something everybody everywhere has. This is somewhat fine when the thing they want to access is twitter (and even then, even with the current state of twitter, who are you to judge who should and should not be on it?), but it becomes a Major Problem™ when the thing they want to access is their bank. Any technological solutions you just thought about for this problem are not sufficient when we're talking about "Can everybody continue to live their lives considering we've kinda thrust the internet on them without them asking"
>surely trust in the veracity of damaging images will drop to about 0
Maybe, eventually. But we don't know how long it will take (or if it will happen at all). And the time until then will be a nightmare for every single woman out there who has any sort of profile picture on any website. Just look at how celebrity deepfakes got reddit into trouble even though their generation was vastly more complex and you could still clearly tell that the videos were fake. Now imagine everyone can suddenly post undetectable nude selfies of your girlfriend on nsfw subreddits. Even if people eventually catch on, that first shock will be unavoidable.
Your anxiety dream relies on there currently being some technical bottleneck limiting the creation or spread of embarrassing fake nudes as a way of cyberbullying.
I don't see any evidence of that. What I see is that people who want to embarass and bully others are already fully enabled to do so, and do so.
It seems more likely to me and many of us that the bottleneck that stops it from being worse is simply that only so many people think it's reasonable or satisfying to distribute embarrassing fake nudes of someone. Society already shuns it and it's not that effective as a way of bullying and embarrassing people, so only so many people are moved to bother.
Assuming that the hyped-up new product is about to swoop in and disrupt the cyberbullying "industry" is just a classic technologist's fantasy.
It ignores all the boring realities of actual human behavior, social norms, and secure equilibriums, etc; skips any evidence building or research effort; and just presumes that some new technology is just sooooo powerful that none of that prior ground truth stuff matters.
I get why people who think that way might be on HN or in some Silicon Valley circles, but it can be one of the eyeroll-inducing vices of these communities as much as one of their motivational virtues.
The tide is rolling in and we have two options... yell at the tide really loud that we were here first and we shouldn't have to move... or get out of the way. I'm a lot more sympathetic to the latter option myself.
This: it won't happen immediately, and I'd go even further and say that even if trust in images drops to zero, it's still going to generate a lot of hell.
I've always been able to say all sorts of lies. People have known for millennia that lies exist. Yet lies still hurt people a ton. If I say something like, "idle_zealot embezzled from his last company," people know that could be a lie (and I'm not saying you did, I have no idea who you are). But that kind of stuff can certainly hurt people. We all know that text can be lies and therefore we should have zero trust in any text that we read - yet that isn't how things play out in the real world.
Images are compelling even if we don't trust that they're authentic. Hell, paintings were used for thousands of years to convey "truth", but a painting can be a lie just as much as text or speech.
We created tons of religious art in part because it makes the stories people want others to believe more concrete for them. Everyone knows that "Christ in the Storm on the Sea of Galilee" isn't an authentic representation of anything. It was painted in 1633, more than a century and a half after the event was purported to have happened. But it's still the kind of thing that's powerful.
An AI generated image of you writing racist graffiti is way more believable to be authentic. I have no reason to think you'd do such a thing, but it's within the realm of possibility. There's zero possibility (disregarding supernatural possibilities) that Rembrandt could accurately represent his scene in "Christ in the Storm on the Sea of Galilee". What happens when all the search engine results for your name start calling you a racist - even when you aren't?
The fact is that even when we know things can be faked, we still put a decent amount of trust in them. People spread rumors all the time. Did your high school not have a rumor mill that just kinda destroyed some kids?
Heck, we have right-wing talking heads making up outlandish nonsense that's easily verifiable as false that a third of the country believes without questioning. I'm not talking about stuff like taxes or gun control or whatever - they're claiming things like schools having to have litter boxes for students that identify as cats (https://en.wikipedia.org/wiki/Litter_boxes_in_schools_hoax). We know that people lie. There should be zero trust in a statement like "schools are installing litter boxes for students that identify as cats." Yet it spread like crazy, many people still believe it despite it being proven false, and it has been used to harm a lot of LGBT students. That's a way less believable story than an AI image of you with a racist tattoo.
Finally, no one likes their name and image appropriated for things that aren't them. We don't like lies being spread about us even if 99% of people won't believe the lies. Heck, we see Donald Trump go on rants about truthful images of him that portray his body in ways he doesn't like (and they're just things like him golfing, but an unflattering pose). I don't want fake naked images of me even if they're literally labeled as fake. It still feels like an invasion of privacy and in a lot of ways it would end up that way - people would debate things like "nah, her breasts probably aren't that big." Words can hurt. Images can hurt even more - even if it's all lies. There's a reason why we created paintings even when we knew that paintings weren't authentic: images have power and that power is going to hurt people even more than the words we've always been able to use for lies.
tl;dr: 1) It will take a long time before people's trust in images "drops to zero"; 2) Even when people know an image isn't real, it's still compelling - it's why paintings have existed and were important politically for millennia; 3) We've always known speech and text can be lies, but we regularly see lies believed and hugely damage people's lives - and images will always be more compelling than speech/text; 4) Even if no one believes something is true, there's something psychologically damaging about someone spreading lies about you - and it's a lot worse when they can do it with imagery.
Perhaps I'm being overly contrarian, but from my point of view, I feel that could be a blessing in disguise. For example, in a world where deepfake pornography is ubiquitous, it becomes much harder to tarnish someone's reputation through revenge porn, real or fake. I'm reminded of Syndrome from The Incredibles: "When everyone is super no one will be."
The censoring of porn content exists for PR reasons. They just want to have a way to say "we tried to prevent it". If anyone wants to generate porn, it only takes 30 minutes of research to find the huge number of Stable Diffusion-based models with NSFW content.
If you can generate synthetic images and have a channel to broadcast them, then you can generate far bigger problems than fake celebrity porn.
Not saying that it is not a problem, but rather that it is a problem inherent to the whole tool, not to some specific subjects.
But just like privacy issues, this'll be possible.
It's only bad because society still hasn't normalised sex, from a gay perspective y'all are prude af.
It's a shortcut, for us to just accept that these social ideals and expectations will have to change so we may as well do it now.
In 100 years, people will be able to make a personal AI that looks, sounds and behaves like any person they want and does anything they want. We'll have thinking dust; you can already buy cameras roughly a square millimetre in size, and in the future I imagine they'll be even smaller.
At some point it's going to get increasingly unproductive trying to safeguard technology without people's social expectations changing.
Same thing with Google Glass, shunned pretttty much exclusively bc it has a camera on it (even tho phones at the time did too), but now we got Ray Bans camera glasses and 50 years from now all glasses will have cameras, if we even still wear them.
Yes this. This is what I've been trying to explain to my friends.
When Tron came out in 1982, it was disliked because back then using CGI effects was considered "cheating". Then awhile later Pixar did movies entirely with CGI and they were hits. Now almost every big studio movie uses CGI. Shunned to embraced in like, 13 years.
I think over time the general consensus's views about AI models will soften. Although it might take longer in some communities. (Username checks out lol, furry here also. I think the furs may take longer to embrace it.)
(Also, people will still continue to use older tools like Photoshop to accomplish similar things.)
Yes many furs I know are very anti AI art etc, including overreacting to how "bad" it looks, though if I hadn't told em it was AI generated I don't think they'd have the same reaction.
Ironic, since so many furs are in tech, but we all have artist friends I suppose. People just forget that portrait painters were put out of business by photographers, and traditional artists were put out of business by digital artists. And so the cycle and the luddism repeats.
I'll challenge this idea and say that once it becomes ubiquitous, it actually does more good than harm. Things like revenge porn become pointless if there's no way to prove it's even real, and I have yet to ever see deep fakes of porn amount to anything.
I watched an old Tom Scott video of him predicting what the distant year 2030 would look like. In his talk, he mentioned privacy becoming something quaint that your grandparents used to believe in.
I’ve wondered for a while if we just adapt to the point that we’re unfazed by fake nude photos of people. The recent Bobbi Althoff “leaks” reminded me of this. That’s a little different since she’s a public figure, but I really wonder if we just go into the future assuming all photos like that have been faked, and if someone’s iCloud gets leaked now it’ll actually be less stressful because 1. They can claim it’s AI images, or 2. There’s already lewd AI images of them, so the real ones leaking don’t really make much of a difference.
There's an argument that privacy (more accurately anonymity) is a temporary phenomenon, a consequence of the scale that comes with industrialization. We didn't really have it in small villages, and we won't really have it in the global village.
(I'm not a fan of the direction, but then I'm a product of stage 2).
If that ever becomes an actual problem, our entire society will be at a filter point.
This is the problem with these kind of incremental mitigations philosophically -- as soon as the actual problem were to manifest it would instantly become a civilization-level threat that would only be resolved with drastic restructuring of society.
Same logic for an AI that replaces a programmer. As soon as AI is that advanced the problem requires vast changes.
Serious question: is it really that hard to remove personal information from the training data so the model does not know what specific public figures look like?
I believe this worked with nudity: when asked, the model generated "smooth" intimate regions (like some kind of doll).
So you could ask for, e.g., a generic president but not any specific one, which would make it very hard to generate anyone specific.
Proprietary, inaccessible models can somewhat do that. Locally hosted models can simply be trained on what a specific person looks like by the user, you just need a couple dozen photos. Keyword: LoRA.
We are already there, you can no longer trust any image or video you see, so what is the point?
Bad actors will still be able to create fake images and videos as they already do.
Limiting it for the average user is stupid.
We are not actually there yet. First, you still need some technical understanding and a somewhat decent setup to run these models yourself without the guardrails. So the average greasy dude who wants to share HD porn based on your daughter's LinkedIn profile pic on NSFW subreddits still has too many hoops to jump through. Right now you can also still spot AI images pretty easily, if you know what to look for. Especially for previous Stable Diffusion models. But all of this could change very soon.
Generating porn is easier and cheaper. You don't have to spend the time learning to draw naked bodies, which can be substantial. (The joke being that serious artists go through a lot of nude-model drawing sessions, but that isn't porn.)
The models art schools get for naked drawing sessions usually aren’t that attractive, definitely not at a porn ideal. The objective is to learn the body, not become aroused.
There is a lot of (mostly non realistic) porn that comes out of art school students via the skills they gain.
This is why I think generative AI tech should either be banned or be completely open sourced. Mega tech corporations are plenty of things already, they don't need to be the morality police for our society too.
Even if it is all open sourced, we still have the structural problem of training models large enough to do interesting stuff.
Until we can train incrementally and distribute the workload scalably, it doesn't matter how open the models / methods for training are if you still need a bajillion A100 hours to train the damn things.
But what if you flip things the other way around: deepfake porn is problematic not because porn is per se problematic, but because deepfake porn, or deepfake revenge porn, is made without consent. What if you give consent to some AI company or porn company to make porn content of you? I see this as an evolution of OnlyFans, where you could make AI-generated deepfake porn of yourself.
Another use case would be that retired porn actors could license their porn persona (face/body) to some AI porn company to make new porn.
I see big business opportunity in the generative AI porn.
Which is why this should be a much more decentralized effort. Hard to take someone to court when it's not one single person or company doing something.
I don't think there are any (even far) left wanting to ban non-diverse representation.
I think it's impossible to ban 'conservative thoughts' because that's such a poorly defined phrase.
However there are people who want to ban religion.
One difference is that a much larger proportion of far right (almost all of them) want to ban lgbtq depiction and existence compared to the number of far left who want to ban religion or non-diverse representation.
It says on the wikipedia article itself
'The horseshoe theory does not enjoy wide support within academic circles; peer-reviewed research by political scientists on the subject is scarce, and existing studies and comprehensive reviews have often contradicted its central premises, or found only limited support for the theory under certain conditions.'
> I don't think there are any (even far) leftwanting to ban non-diverse representation.
Look at the rules to win an Oscar now.
To cite a direct and personal case, I was involved in writing code for one of the US government's COVID bailout programs, the Restaurant Revitalization Fund. Billions of dollars of relief, but targeted to non-white, non-male restaurant owners. There was a lawsuit after the fact to stop the unfair filtering, but it was too late and the funds were completely dispensed. That felt really gross (though many of my colleagues cheered and even jeered at the complainers).
> I think it's impossible to ban 'conservative thoughts' because that's such a poorly defined phrase.
I commented in /r/conservative (which I was banned from) a few times, and I was summarily banned from five or six other subreddits by some heinous automation. Guilt by association. Except it wasn't even -- I was adding commentary in /r/conservative to ask folks to sympathize more with trans folks. Both sides here ideologically ban with impunity and can be intolerant of ideas they don't like.
I got banned from my city's subreddit for posting a concern about crime. Or maybe they used these same automated, high-blast radius tools. I'm effectively cut out of communication with like-minded people in my city. I think that's pretty fucked.
Mastodon instances are set up to ban on ideology...
This is all wrong and a horrible direction to go in.
It doesn't matter what your views are, I think we all need to be more tolerant and empathetic of others. Even those we disagree with.
I mean, a lot of moderates would like to avoid seeing any extreme content, regardless of whether it is too much left, right, or just in a non-political uncanny valley.
While the Horseshoe Theory has some merits (e.g., both left and right extremes may favor justified coercion, share an us-vs-them mentality, etc.), it is grossly oversimplified. Even the very simple (but at least two-dimensional) Political Compass model is much better.
That wasn't my point. My point is that equating bigotry with efforts to counteract bigotry is stupid and disingenuous. I don't really think many people actually want to ban non diverse representation anyway, so the premise already doesn't work, but even still, targeted malicious discrimination is not the same as misguided but well meaning policies. And meeting in the middle between "i hate gay people" and "i don't hate gay people" is morally bankrupt.
It is not only about morals but about the incentives of the parties. The demand for sexually explicit content is bigger than, say, the demand for niche artistic experiments of geometrical living cupboards owned by a cybernetic dragon.
Stability AI, very understandably, does not want to be associated with "the porn-generation tool". And if it, even occasionally, generated criminal content, the backlash would be enormous. Censoring the data requires effort but is (for companies) worth it.
>It's kind of a testament to our times that the person who chooses to look at synthetic porn instead of supporting a real-life human trafficking industry is the bad actor.
"Bad actor" is a pretty vague term, I think they are using it as a catch all without diving into the specifics. we are all projecting what that may mean based on our own awareness of this topic as a result.
I totally agree with your assessment and honestly would love to see this tech create less of a demand for the product human-traffickers produce.
Celebrity deepfakes and racist images made by internet trolls are a few of the overt things they are willing to acknowledge as problems and are fighting against (Google Gemini's overcorrection on this has been the talk this week). Does it put pressure on the companies to change for PR reasons? Yes. It also creates a bit of a Streisand effect, so it may be a zero-sum game.
We aren't talking about the big issue surrounding this tech, the issue that would cause far more damage to their brand than celebrity deep fakes:
I was referring to 'safety'. How the hell can an image generation model be dangerous? We have had software for editing text, images, videos and audio for half a century now.
It's really unfortunate that Silicon Valley ended up in an area that's so far left - and to be clear, it'd be just as bad if it was in a far right area too. Purple would have been nice, to keep people in check. 'Safety' seems to be actively making AI advances worse.
Put in any historical or political context, SV is in no way left. They're hardcore libertarian. Just look at their poster boys: Elon Musk, Peter Thiel, and a plethora of others are very oriented towards totalitarianism from the right. Just because they blow their brains out on LSD and ketamine and go on two-week spiritual retreats doesn't make them leftists. They're billionaires that only care about wealth and power, living in communities segregated from the common folk of the area - nothing lefty about that.
Musk's main residence is a $50k house he rents in Boca Chica. Grimes wanted a bigger, nicer residence for her and their kids, and that was one of the reasons she left him.
Elon Musk and Peter Thiel are two of the most hated people in tech, so this doesn't seem like a compelling example. Also I don't think Elon Musk and Peter Thiel qualify as "hardcore libertarian." Thiel was a Trump supporter (hardly libertarian at all, let alone hardcore) and Elon has supported Democrats and much government his entire life until the last few years. He's mainly only waded into "culture war" type stuff that I can think of. What sort of policies has Elon argued for that you think are "hardcore libertarian?"
He wanted to replace public transport with a system where you don't have to ride the public transport with the plebs, he wants to colonize Mars with the best minds (equal most money for him), he built a tank for urban areas. He promotes free speech even if it incites hate, he likes Ayn Rand, he implies government programs calling for united solutions are either communism, Orwell or basically Hitler. He actively promotes the opinion of those that pay above others on X.
Thank you, truly, I appreciate the effort you put in to list those. It helps me understand more where you're coming from.
> He wanted to replace public transport with a system where you don't have to ride the public transport with the plebs
I don't think this is any more libertarian than kings and aristocrats of days past were. I know a bunch of people who ride public transit in New York and San Francisco who would readily agree with this, and they are definitely not libertarian. If anything it seems a lot more democratic since he wants it to be available to everyone
> he wants to colonize Mars with the best minds (which for him equals the most money)
This doesn't seem particularly "libertarian" either, excepting maybe the aspect of it that is highly capitalistic. That point I would grant. But you could easily be socialist and still support the idea of colonizing something with the best minds.
> he built a tank for urban areas.
I admit I don't know anything about this one
> He promotes free speech even if it incites hate
This is a social libertarian position, although it's completely disconnected from economic libertarianism. I have a good friend who is a socialist (as in, wants to outgrow capitalism as Marx advocated) who supports using the state to suppress capitalist activity/"exploitation", and he is also a free speech absolutist.
> he likes Ayn Rand
That's a reasonable point, although I think it's worth noting that there are plenty of hardcore libertarians who hate Ayn Rand.
> he implies that government programs calling for united solutions are either communism, Orwell, or basically Hitler.
Eh, lots of Republicans, including Trump, do the same thing, and they're not libertarian. Certainly not "hardcore libertarian."
> He actively promotes the opinions of those who pay above others on X.
This could be a good one, although Google, Meta, Reddit, YouTube, and any other company that runs ads or has "sponsored content" are doing the same thing, so we would have to define all the big tech companies as "hardcore libertarian" to stay consistent.
Overall I definitely think this is a hard debate to have, because "hardcore libertarian" can mean different things to different people, and there's a perpetual risk of the "no true Scotsman" fallacy. I've responded above with how I think most people would imagine libertarianism, but the label depends on when in history you use it: many anarcho-socialists used it for themselves, yet today "libertarian" names a party that supports free-market economics and social liberty. But regardless of the challenges inherent in that, I appreciate the exchange.
>I don't think this is any more libertarian than kings and aristocrats of days past were.
So very libertarian.
>If anything it seems a lot more democratic since he wants it to be available to everyone
No, he wants a solution that minimizes contact with other people and lets you live in your bubble. This minimizes exposure to others from the same city and is a commercial system, not a publicly created one. Democratization would be cheap public transport where you don't get mugged, which has been proven to work in every European and most Asian cities.
> I admit I don't know anything about this one
The Cybertruck. Again, a vehicle to isolate you from everyday life, being supposedly bulletproof and all.
> lots of Republicans, including Trump, do the same thing, and they're not libertarian
They are all "small government, individual choice" - of course they feed their masters, but the Kochs and co. want exactly this.
Appreciate the exchange too, thanks for the fact-based formulation of opinions.
Silicon Valley is not "far left" by any stretch, which implies socialism, redistribution of wealth, etc. This is obvious by inspection.
I assume by far left, you mean progressive on social issues, which is not really a leftist thing but the groups are related enough that I'll give you a pass.
Silicon Valley techies are also not socially progressive. Read this thread or anything published by Paul Graham or any of the AI leaders for proof of that.
However, most normal city people are - a large enough percentage of the country that big companies that want to make money feel the need to appeal to them.
Funnily enough, what is a uniquely Silicon Valley political opinion is valuing the progress of AI over everything else.
when i think of "far left" i think of an authoritative regime disguised as serving the common good and ready to punish and excommunicate any thought or action deemed contrary to the common good. However, the regime defines "common good" themselves and remains in power indefinitely. In that regard, SV is very "far left". At the extremes far-left and far-right are very similar when you empathize as a regular person on the street.
Techies are socially progressive as a whole. Yes, there are some outliers, and tech leaders probably aren't as far left socially as the ground-level workers.
I find them in general not to be Republican, with all the baggage that entails, but the typical techie I meet is less concerned with social issues than the typical city Democrat.
If I can speculate wildly, I think it is because tech has this veneer of being an alternative solution to the world's problems, so a lot of techies believe that advancing tech is both the most important goal and politically neutral. Also, now that tech is a uniquely profitable career, the types of people who would have been business majors are now CS majors, i.e. those who are mainly interested in getting as much money as possible for themselves.
I mean, if you compare them to people who live in bigger cities, and only the people who belong to the more left-leaning party, then yeah, maybe. It's like saying a group isn't socially conservative because they're not as socially conservative as rural Republicans.
I think we're just talking about two different things.
The implication I got was that techies are more left than normal, and it's the opposite. If I meet someone in my city and they're in tech, they tend to be less progressive than the people I meet who are not. The people in this thread are fairly representative of what I see in real life, and it's not particularly inspiring.
I disagree that techies are socially progressive as a whole; there is very minimal, almost no, push for labor rights or labor protections, even though our group is disproportionately hit by employee abuse under the visa program.
Labor protections are generally seen as a fiscal issue, rather than a social one. E.g. libertarians would usually be fine with gay rights but against greater labor regulation.
They are a business and operate in the objective reality that a product that can generate imagery like child porn draws intense scrutiny from investors and lawmakers. Not to mention they have their own moral compass and don't feel comfortable giving away a tool that they feel can do harm.
The talk of "safety" and harm in every image or language model release is getting quite boring and repetitive. The reasons why it's there is obvious and there are known workarounds yet the majority of conversations seems to be dominated by it. There's very little discussion regarding the actual technology and I'm aware of the irony of mentioning this. Really wish I could filter out these sorts of posts.
Hopefully it dies down soon, but I doubt it. At least we don't have to hear garbage like
"WHy doEs opEn ai hAve oPEn iN thE namE iF ThEY aReN'T oPEN SoURCe"
I hope the safety conversation doesn't die. The societal effects of these technologies are quite large, and we should be okay with creating the space to acknowledge and talk about the good and the bad, and what we're doing to mitigate the negative effects.
In any case, even though it's repetitive, there exists someone out there on the Interwebs who will discover that information for the first time today (or whenever the release is), and such disclosures are valuable. My favorite relevant XKCD comic: https://xkcd.com/1053/
I get that, but it just overshadows the technical stuff in nearly every post, and it's just low-hanging fruit to have a discussion over. But you're probably right with that comic; I spend so much time reading about AI stuff.