I think that probably is to be expected, unfortunately. The model is 2GB; per the article iOS will kill apps that use 2GB on 4GB RAM devices (and the 13 Mini is 4GB).
It works just fine on my iPhone 12 Mini, which has (basically) the same RAM, so it seems like it's something else. They're essentially the same phone in general, though, so I would be surprised if it were a hardware issue.
I did have trouble closing the "adjustments" dialog (upper-right button) due to its close button being underneath the status bar, but found that I could just drag the dialog down to the bottom and it closed.
This is absolutely incredible. It takes about 45 seconds to generate a whole image on my iPhone SE3, which is about as fast as my M1 Pro MacBook was doing it with the original version!
SE 3rd Gen has 4GiB RAM, therefore the app defaults to 384x384 size. This is about 1/2 the computation of your normal run (512x512), and the original version uses the PLMS sampler, which defaults to 50 steps, while this one uses the newer DPM++ 2M Karras sampler, which defaults to 30 steps. All in all, your M1 Pro MBP is still about 4x your SE 3rd Gen in raw performance (although my implementation should be faster than PyTorch, about 2x on M1 chips).
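Back-of-the-envelope, assuming per-image compute scales with pixels × steps: (384² × 30) / (512² × 50) = 4,423,680 / 13,107,200 ≈ 0.34, so the SE's default run is roughly a third of the work of a 50-step 512x512 generation.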
For what it's worth, you can decrease the resolution and use the aforementioned sampler on the PyTorch versions too. The AUTOMATIC1111 web UI supports this, for instance.
I would also welcome the additional optimizations, however.
Extreme respect to the developer for not including the "industry standard" clientside tracking/analytics/phone-home in this app. The fact that this runs locally on-device and doesn't send any information to anyone anywhere about what you're doing on your local device is wonderful.
All apps used to be like this, and now the ones that actually respect user privacy are a rare and glorious exception. Thank you!
It would be awesome if it just quickly took a picture from the front camera for that particular request and then filtered it to finish the "in the style of Andy Warhol".
I have generated some images, and I think it takes less than 1% of the battery per image. That's already much better than most games (for having fun).
It took my battery from 80% to 77% for one generation on the default settings (384², 30 iterations). Less than a minute of compute time to complete a generation.
iPhone battery health reports the battery at 100% health. This is an iPhone SE3.
Amazing how huge the difference in energy consumption is for the system in standby vs going full throttle.
EDIT: I generated 3 more images; every subsequent generation reduced the battery level by another 2%. My phone doesn't seem to heat up at all, interestingly.
Late to the party, but depending on your charger voltage, it may be. E.g. I can charge my MBP on a cheap USB-to-USB-C charger, but as soon as I use it _too_ much, the charge will stall or, worse, drop.
I reached 3Gbps over Verizon 5G in San Antonio last year, and this year I get about 4Mbps over Verizon 5G in Ohio. It's so bad I disabled it. I did read an article that the iPhone 12 (which is what I have) has some kind of radio issue with 5G. Can anyone in here confirm?
Verizon has been real wonky all over Austin. If there's more than a handful of people in the area, bandwidth just goes to the crapper. I'll get a couple hundred Mbps on a good day with no clouds/wind/holding my phone just right in the right spot, but usually get less than 1Mbps on their 5G UW.
> It took a minute to summon the picture on the latest and greatest iPhone 14 Pro, uses about 2GiB in-app memory, and requires you to download about 2GiB data to get started. Even though the app itself is rock solid, given these requirements, I would probably call it barely usable.
also
> Even if it took a minute to paint one image, now my Camera Roll is filled with drawings from this app. It is an addictive endeavor. More than that, I am getting better at it. If the face is cropped, now I know how to use the inpainting model to fill it in. If the inpainting model doesn’t do its job, you can always use a paint brush to paint it over and do an image-to-image generation again focused in that area.
Seems very worth a try. I'm downloading the model right now, it's going a bit slow, ~2MB/s.
This is super cool. I just tried the default prompt on my iPhone 13 with the image size set to 768x512 and using the 3D Model (Redshift v1), and it just crashed the whole phone and restarted it. Just like when I get BSODs at work on my Windows GPU desktop :)
Porting FlashAttention to Metal will be quite hard, because for performance reasons it does a lot of shenanigans to respect the memory hierarchy.
Thankfully, you can probably do something slower but more adapted to your memory constraints.
If you relax this need for performance and allow some re-computation, you can write a qkvatt function which takes q, k, v and a buffer to store the resulting attention, and computes it without needing any extra memory.
The algorithm is still quadratic in time with respect to the attention horizon (although with a bigger constant (2x or 3x) due to the re-computation), but it doesn't need any extra memory allocation, which makes it easy to parallelize.
Alternatively, you can use an extra memory buffer of O(attention horizon × number of threads in parallel) (like FlashAttention) to avoid the re-computation.
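To make the re-computation variant concrete, here is a minimal CPU-side sketch in Swift (assuming square self-attention with Q, K, V as arrays of rows; the real thing would be a Metal kernel, but the structure is the same). Each query row makes three passes over the keys, so the constant is roughly 3x, and the only scratch is O(1) per row:

```swift
import Foundation

func qkvAttention(q: [[Float]], k: [[Float]], v: [[Float]], out: inout [[Float]]) {
    let n = q.count, d = q[0].count
    let scale = 1 / Float(d).squareRoot()
    // Dot product of query row i with key row j, recomputed on demand
    // instead of being stored in an n x n attention matrix.
    func score(_ i: Int, _ j: Int) -> Float {
        var s: Float = 0
        for t in 0..<d { s += q[i][t] * k[j][t] }
        return s * scale
    }
    for i in 0..<n {  // rows are independent, hence trivially parallel
        // Pass 1: row max, for a numerically stable softmax.
        var rowMax = -Float.infinity
        for j in 0..<n { rowMax = max(rowMax, score(i, j)) }
        // Pass 2: softmax denominator.
        var denom: Float = 0
        for j in 0..<n { denom += exp(score(i, j) - rowMax) }
        // Pass 3: recompute each weight and accumulate into the output buffer.
        for t in 0..<d { out[i][t] = 0 }
        for j in 0..<n {
            let w = exp(score(i, j) - rowMax) / denom
            for t in 0..<d { out[i][t] += w * v[j][t] }
        }
    }
}
```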
Concerning the backward pass, it's the same story: you don't need extra memory if you are willing to do some re-computation, or you can use memory linear in the attention horizon to avoid the re-computation.
One interesting thing to notice about the backward pass is that it doesn't use the attn of the forward pass, so that doesn't need to be preserved (only Q, K, V do).
One little caveat of the backward pass (which you only need for training) is that it needs atomic_add to be easy to parallelize. This means it will be hard on Metal (AFAIK it doesn't have atomics for floats, though it does have atomics for integers, so you can probably use fixed-point numbers).
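A hedged sketch of that fixed-point trick, written in Swift with the swift-atomics package for illustration (in a Metal shader you would do the same thing with atomic_int and atomic_fetch_add_explicit); the 16-fractional-bit scale is an arbitrary choice:

```swift
import Atomics  // assumption: the swift-atomics package is available

// Accumulate float contributions via integer atomics: scale to fixed point,
// add atomically, divide back when reading. Overflow handling omitted.
let fixedPointScale: Float = 65536  // 16 fractional bits
let accumulator = ManagedAtomic<Int32>(0)

func atomicAdd(_ x: Float) {
    let fixed = Int32((x * fixedPointScale).rounded())
    accumulator.wrappingIncrement(by: fixed, ordering: .relaxed)
}

func accumulatedValue() -> Float {
    Float(accumulator.load(ordering: .relaxed)) / fixedPointScale
}
```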
CPU offloading doesn't help because Apple already has a shared-memory architecture. The head slicing is similar to https://machinelearning.apple.com/research/neural-engine-tra... I think it would be quite practical only if MPSGraph were less mysterious about its allocation strategy. It is not the ideal way, though. Ideally, FlashAttention / xFormers is the way to go.
It works extremely fast on an iPad Pro M1 (kind of expected, but it's _impressive_), although the app is built as "iPhone only", and strangely enough the iPad crops the upscaled iPhone app so the lower bar of image controls doesn't show at all, which is a pity.
Yup, done. I thought the author would see it better here (it also makes the issue visible to other people stumbling on it), but I have contacted them separately explaining the issue.
Memory compression is a generalization of swap, which is only for dynamic memory; files on disk don't need it because you can just read them off the disk.
The problem is that GPUs don't support virtual memory paging, so they can't page in files, decompress, or swap anything unless you write it yourself, which is a lot slower.
Also, ML models (probably) can't be compressed because they already are compressed; learning and compression are the same thing!
Wait. This comment just blew my mind. Does that imply that you might be able to measure the efficiency of a model by its compressibility? Note, I'm trying to recognize that efficient and accurate are not the same. One could imagine evaluating a model on a 2D performance-and-compression map somehow.
> Also, ML models (probably) can't be compressed because they already are compressed; learning and compression are the same thing!
I feel like they're kind of two sides of the same coin: learning is about putting more information in the same data, while compression is about putting the same information in less data.
I'm wondering if some lossy floating-point compressor (such as zfp) would work.
> I'm wondering if some lossy floating-point compressor (such as zfp) would work.
Well, apparently this can work; Stable Diffusion comes in 32-bit and 16-bit float versions. I'm kind of surprised they both work, but that's lossy compression.
Sure, but 16-bit float is pretty primitive compression, as it does not exploit any redundancy in the input. zfp groups numbers together in chunks, which means that correlated numbers can be represented more precisely. Its algorithm is described here: https://zfp.readthedocs.io/en/release1.0.0/algorithm.html#lo...
I would like to see whether zfp can be applied to something like Stable Diffusion (or other ML models) and give better results than plain floats at the same size.
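For intuition about the grouping idea (this is not zfp's actual transform, just a toy block-floating-point scheme): store one shared exponent per small block plus an 8-bit mantissa per value, so a group of similarly-sized weights costs about 9 bits each instead of 32. A hedged Swift sketch:

```swift
import Foundation

// Toy block floating point: one shared exponent per block, 8-bit mantissas.
func compress(block: [Float]) -> (exponent: Int, mantissas: [Int8]) {
    let maxMag = block.map { abs($0) }.max() ?? 0
    let e = maxMag > 0 ? Int(ceil(log2(Double(maxMag)))) : 0
    let scale = Float(pow(2.0, Double(7 - e)))  // 7 mantissa bits + sign bit
    return (e, block.map { Int8(clamping: Int(($0 * scale).rounded())) })
}

func decompress(exponent e: Int, mantissas: [Int8]) -> [Float] {
    let scale = Float(pow(2.0, Double(7 - e)))
    return mantissas.map { Float($0) / scale }
}
```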
Memory compression? I can't find any good resources to read about it, any hints? I'm having trouble imagining how it could possibly work without totally destroying performance.
It doesn't destroy performance for the simple reason that nowadays memory access is slower than pure compute. If you need to use compute to produce some data to be stored in memory, your overall throughput could very well be faster than without compression.
There has been a large amount of innovation in fast compression and fast decompression in recent years. Traditional compression tools like gzip or xz are geared towards higher compression ratios, but memory compression tends to favor speed; LZ4 is a well-known example of that speed-first family.
It is not compressed swap, the compressed data is still in RAM. The OS just compresses inactive memory, with a couple of criteria to define “inactive”.
iOS uses memory compression but not swap. iOS devices actually have special CPU instructions to speed up compression in up-to-page-size increments, specifically to aid this model [1].
It is not as useful for this case (inference) because the long-held activations (the UNet holds the downsampling passes' activations and uses them for upsampling) are not that much memory (in the range of a few megabytes). For training, it would probably be more useful.
In-memory compression means the memory is inherently dirty memory
On Apple platforms if you mmap a read-only file into the process address space, then it is "clean" memory. It is clean because the kernel can drop it at any time because it already exists on disk. You essentially can offload the memory management to the kernel page cache.
The downside is that if you run up to the limit and the "working set" can't fit entirely in memory, then you run into page faults which incur an I/O cost.
The advantage is that the kernel will drop the page cache before it considers killing your process to reclaim memory.
That said, I don't know the typical access patterns for neural network inference, so I don't know how the page faults would affect performance.
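For reference, on Apple platforms the mapping itself is one line in Swift (the path below is a placeholder):

```swift
import Foundation

// .mappedIfSafe mmaps the file instead of copying it into dirty memory:
// pages fault in lazily and can be dropped and re-read under pressure.
func mapWeights(at url: URL) throws -> Data {
    try Data(contentsOf: url, options: .mappedIfSafe)
}

// Usage: let weights = try mapWeights(at: URL(fileURLWithPath: "/path/to/model.bin"))
```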
This isn't recommended; the decoding takes as much time as processing the next step. I learned that the hard way when I tried displaying the intermediate steps for debugging.
Yeah, running the full decoder takes a while. Though, since the "latent" is just 4 channels and pretty close to representing RGB, you can use a linear combination of latent channels and get a basic (grainy, low-res) preview image like this [0] without much trouble. I expect you could go further and train a shallow conv-only decoder to get nicer preview results, but I'm not sure if anyone's bothered yet.
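Roughly what that looks like, as a hedged Swift sketch: a fixed 4-to-3 linear map applied per latent pixel. The weights here are placeholders, not the community-fitted coefficients; those are obtained by regressing the VAE decoder's RGB output against latents.

```swift
// Map one latent pixel's 4 channels to RGB with a fixed 4x3 matrix `w`
// (placeholder values), then squash roughly into [0, 1] for display.
func previewPixel(_ z: [Float], w: [[Float]]) -> [Float] {
    precondition(z.count == 4 && w.count == 4)
    var rgb: [Float] = [0, 0, 0]
    for c in 0..<3 {
        for i in 0..<4 { rgb[c] += w[i][c] * z[i] }
        rgb[c] = min(max((rgb[c] + 1) / 2, 0), 1)
    }
    return rgb
}
```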
I gave this and other available applications a try, and I don't understand what people see in AI image generation.
A simple prompt generated a person with 3 nostrils, 7 squished fingers, and deformities everywhere I look; it just mashes a bunch of photographs together and generates abominations.
Pay close attention to generated models and you will find details which are simply wrong.
Early cars were terrible too, but here we are. The promise is that future versions of the technology will be able to draw anatomically correct people and images: a computer program that can do in mere minutes what takes a person hours. If you've never wanted a picture of something you can describe but aren't able to draw, then there is no use case for you. For anyone else who has interacted with the world of art and graphic designers or used stock photos, this goes an order of magnitude faster, and is basically free, compared to hiring a skilled professional for hours. It's a game changer for an industry that it sounds like you've just never interacted with.
They were. They were loud and stinky and were unsuitable for dirt roads, spooking horses, causing the UK to basically ban them. Some were powered by steam or coal but those that were powered by gas had a different problem - there were no gas stations. You had to hand crank them to start. Moving goods and people around was already a solved problem with horses and trains and boats.
Cars then: take enormous energy to move very little, and slowly. Main use case then was as a rich person's toy (entertainment). They'll never replace work horses with them.
It's easy, in hindsight, to see cars as inevitable. But you had to see past the shortcomings of the earliest cars to "get it", much like you have to see past the 3 armed monstrosities that current image generation techniques produce and see the promise of the technology. There were undoubtedly those who saw cars as hype, much like image generation is seen today; I'm sure buggy whip manufacturers saw cars as hype and refused to get on what looked like a hype train to them.
I can't speak for others, but I've personally been quite impressed by the DALL-E output. It creates things that would take me hours (if not days) to create, which no other tech I've tried has been able to generate. It feels like it can absolutely replace at least the stock photo industry. It's also terrific for things like blog photos if you don't have the time or talent to create something yourself but want some creative control.
Extensions like DreamBooth, which let you fine-tune the system with your own submitted images, are also quite amazing. Being able to give it just a few photos and say things like "show me surfing in the ocean" and get a reasonable image back.
_Much_ more broadly, this space in AI/ML with GPT-3/DALL-E is exciting because it feels kind of like what the internet was made for. There's too much data on the internet for any one person to ever meaningfully process. But a machine can. And you can ask that machine questions. And instead of getting just a list of references back, you get an "answer". Image generation is the "image answer" part of this system. It's an exciting space because it feels like these systems will affect large chunks of how we use computers.
The 3 nostrils, 7 squished fingers are not that big of a problem, you can run other image enhancing AIs on top of the generated images to fix that, or just use inpainting and give it a few more tries to get it right. The models are also slowly getting better at it.
> What is the use case that I’m missing?
It's generating images from nothing more than a text description; a year ago that was something you'd only see on Star Trek. Now it's real, and we have barely scratched the surface of what is possible.
The images still need some manual work, but try to generate images of that quality and complexity by hand and you might have more appreciation how mindblowing it is that AI can not only do it, but do it in seconds.
On some of the homegrown models (https://rentry.co/sdmodels) these things are already fixed. For the Stable Diffusion "enthusiasts", the tools and models have improved at least 100% since the original release.
It's more of a cool technology that is rapidly advancing. A couple years ago, it couldn't do this much. A couple years from now, it will be much better. It does much more than mash images together, which you would know if you dug into it a bit. That's it, that's the whole thing.
There needs to be some sort of piece or filter that understands body geometries and inverse kinematics to prevent things like generating people with 3 limbs or joints in positions that would not normally be feasible without injury =). It'll come.
Nothing. It's IMHO just a hype of the younger nerdy generation. The real-world applications of NN-based (there is no I in AI) image generation are limited.
One hype comes, another hype goes. IMHO it did not come to stay ;-)
I haven't been able to get any good results with Stable Diffusion (via DiffusionBee on my M1 MacBook Air), but I've seen really good images of other AI generators like Midjourney.
Apple's mobile processors (especially the M1) are waaay faster than a Steam Deck. Even with the optimizations, I bet it would take like half an hour to run on the Deck.
Any tips on how you do "If the face is cropped, now I know how to use the inpainting model to fill it in. If the inpainting model doesn't do its job, you can always use a paint brush to paint it over and do an image-to-image generation again focused in that area." using the app?
Were you focused on just making it work on the iPhone, or do you think you will keep adding functionalities to the app? Do you think it will ever be possible to train one's own model on an iPhone?
I think that fine-tuning the whole model (a.k.a. DreamBooth) on an iPhone would require more RAM / processing power than it currently has. A more viable path is to implement Hypernetwork + Textual Inversion, which is within the capabilities of today's hardware.
Even though I understood very little of that it was still wild fun reading it. I'm glad such wizards exist, because I and most people I know certainly don't qualify.
On a related note, has anyone been able to utilize Apple silicon GPUs? Running CPU-only is incredibly slow (and sad, since I've got these Apple accelerators sitting idle!).
Is there an option to automatically save every image, not to the camera roll but to the local app folder (the same folder that contains this app's data)?
the developer is about to have a MASSIVE hosting bill
The download restarts from 0% if the app is sent to the background, as there does not seem to be a download manager. This is especially problematic for the large 1.5GB file.
I am using Cloudflare R2, which doesn't have egress fee and I am getting about 5k Class B operations right now. Unless Cloudflare changes their end of the deal, I think it is OK.
Great to hear! Please consider introducing a mechanism to suspend the download instead of restarting it; this may be especially valuable for those with slower connections. With the traffic growth you'll be seeing, chances are Cloudflare's enterprise team will soon be in touch ;)
Is P2P torrent type of sharing the load possible under AppStore guidelines (be they iOS or Android)? I've honestly never even thought about this being a thing, but with large shared data that doesn't change, why not?
P2P, as in hosted from other people's phones? I think the issue is that people generally wouldn't be happy with P2P data uploads from their phones (compared to P2P on desktop, where internet is cheaper/faster, and battery isn't an issue).
On my iPad mini 5th generation with A12 the download is fast and fine. But with standard settings it first warns “Device capability warning” and then indeed crashes every time.
Is there a way to solve this?
A12 chip should work, no?
Give it a couple of weeks. Still playing around to see what the optimal UI looks like for such a large screen. 16GiB should be able to generate several images to select from at once.
I have an iPad, the regular non-Air/Pro/M1 one, and when I installed and ran the app, it said "could be device incompatibility" and subsequently crashed.
Does it crash upon downloading models, or when generating? I haven't tested on all the devices, but the 11 Pro seems to have 4GiB RAM and should run at 384x384 resolution (check whether that is the selected resolution at top right).
There are reports that iOS is not happy with how I compute SHA256 for the downloaded model file by loading it all into memory, on the XR (3GiB RAM). If this is happening on other devices, I may need to do streaming hash computation and put up a bugfix.
Yeah, I thought Data(contentsOf:) already did that, but it appears not (tested; it indeed allocated all the memory to load the data). Adding `.mappedIfSafe` to the reading options solved this.
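For anyone curious, streaming the hash is straightforward with CryptoKit as well; a sketch (the 1MiB chunk size is arbitrary):

```swift
import CryptoKit
import Foundation

// Hash the file in chunks so peak memory stays at one chunk, not the
// whole multi-GiB model file.
func streamingSHA256(of url: URL, chunkSize: Int = 1 << 20) throws -> String {
    let handle = try FileHandle(forReadingFrom: url)
    defer { try? handle.close() }
    var hasher = SHA256()
    while let chunk = try handle.read(upToCount: chunkSize), !chunk.isEmpty {
        hasher.update(data: chunk)
    }
    return hasher.finalize().map { String(format: "%02x", $0) }.joined()
}
```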
I frequently see such comments about sites not loading fine on an iPhone 13/14, while they continue to load just fine for me on a 4-year-old Android device (not Pixel / Samsung).
I wonder if it's the hardware or just the blockers that I use. Might be worth trying blockers to see if they make the general browsing experience better on Apple devices.
This is amazing. I'm kind of surprised that it doesn't have an NSFW image blocker. I want to be able to generate NSFW images but it probably should have one enabled by default.
Update: Draw Things uses “One-time photo selection” which according to Settings > Privacy & Security > Photos “Even if your photos were recently shown to you to select from, the app did not have access to your photo library.” Still, I didn’t realize apps could save to Photos without explicitly asking permission.
I don’t recall giving “Draw Things” permissions to access my photo library, yet the app is able to save to my photo library without prompting and able to read existing images.
I may have misunderstood what permissions apps should ask for when saving to the photo library.
I use PHPickerViewController: https://developer.apple.com/documentation/photokit/phpickerv... which runs out-of-process, such that when you select a photo into the app, I have no access to any information about your other photos, and the location info is stripped from what PHPickerViewController passes to me.
When saving the photo, I only use UIImageWriteToSavedPhotosAlbum (https://developer.apple.com/documentation/uikit/1619125-uiim...), which asks for permission to write to the album, not read permission (they are separate). There are more things I could do if I had read permissions (like create a "Draw Things" collection and save to that, rather than saving to the generic Camera Roll). Ultimately I decided not to do that because I don't want more permissions than I minimally, absolutely need.
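For anyone unfamiliar with these APIs, the minimal-permission shape looks roughly like this (a sketch with my own helper names, not the app's actual code):

```swift
import PhotosUI
import UIKit

// Out-of-process picker: the app never gets library read access; iOS hands
// back only the photo the user explicitly picked.
func presentPicker(from vc: UIViewController, delegate: PHPickerViewControllerDelegate) {
    var config = PHPickerConfiguration()
    config.filter = .images
    config.selectionLimit = 1
    let picker = PHPickerViewController(configuration: config)
    picker.delegate = delegate
    vc.present(picker, animated: true)
}

// Add-only save: triggers the "Add Photos" write prompt, no read permission.
func save(_ image: UIImage) {
    UIImageWriteToSavedPhotosAlbum(image, nil, nil, nil)
}
```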
congratulations to liuliu on the launch!