Not my field so I don't have anything detailed to add, other than to say how fascinating it is to watch how quickly all these incremental improvements are occurring after the initial breakthrough.
Also, props for not cherry picking results too much, both for honesty and because it is fun to see how these fail. Like the tree that grew limbs to fill in a watermark, or the sunset sky that turned into a lava field.
Interesting how the lava-field version imprinted itself on my mind.
Going back to the original image, I now see it as ambiguous - in the same way as a duck/rabbit multistable illusion - and can see a mountain range and sand instead of clouds.
I think this uses instruct-pix2pix, which is different!
My understanding is that instruct-pix2pix requires a specially fine-tuned model, whereas this new pix2pix-zero is an image-to-image technique that works without any additional training.
The intended audience of other ML developers and data scientists will likely be more familiar with Anaconda[1] than Docker. I recommend starting with miniconda[2], which will install just the packages needed to run this project (as declared in that environment.yml file), whereas Anaconda is a giant batteries-included install.
I respectfully disagree. I find docker to be more-or-less unavoidable in ML work, but thankfully conda is still avoidable.
My recommendation is not to touch conda, unless you're on Windows and don't know about CUDA drivers or WSL. Otherwise IMHO it's far less trouble to stick with pip (and virtualenv/pipenv/poetry), and it's not difficult to translate the environment.
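For instance, here's a rough sketch of pulling the dependency list out of an environment.yml as a starting point for a requirements file. It assumes PyYAML is installed, and note that conda package names don't always match PyPI names (e.g. conda's "pytorch" is "torch" on PyPI), so the output needs a hand pass:

    # Sketch: list the dependencies declared in a conda environment.yml
    # so they can be translated to a pip requirements file by hand.
    import yaml

    with open("environment.yml") as f:
        env = yaml.safe_load(f)

    for dep in env.get("dependencies", []):
        if isinstance(dep, dict):              # the nested "pip:" section
            for pip_dep in dep.get("pip", []):
                print(pip_dep)                 # already pip-compatible
        else:
            print(dep)                         # conda spec, translate by hand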
While your mileage (clearly) varies from mine, Anaconda is in practice the standard environment for deep learning (and, generally, in most of the Python data science ecosystem).
For example, when you go to the front page of PyTorch (https://pytorch.org/), the default way to install is with Anaconda. It exists precisely to make it easy to install things regardless of the system, and with matching versions. For example, out of the box it gives you GPU support on Apple Silicon - no extra installation steps needed.
Pip can't manage non-Python dependencies. Of course, you can install things manually any way you like (including inside Docker), but then it is up to you to make sure that all dependencies are compatible. And that is a non-trivial task, given the frequent updates of everything involved (CUDA kernels, Python versions, PyTorch/TF versions, and all the libraries related to them one way or another).
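As a concrete illustration of the version-matching problem, here's a minimal check (assuming PyTorch is installed) of which accelerator your install can actually use:

    import torch

    # Report which accelerator this PyTorch build actually supports.
    if torch.cuda.is_available():
        print("CUDA GPU:", torch.cuda.get_device_name(0))
    elif hasattr(torch.backends, "mps") and torch.backends.mps.is_available():
        print("Apple Silicon GPU (MPS backend) available")
    else:
        print("CPU only - the install may not match your hardware")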
I remember the times before Anaconda matured. Docker was often necessary just to make code reusable - and I am grateful I don't need to go back to that.
Of course, now you CAN use Docker. Quite a lot of decent installation scripts place conda (usually miniconda) instructions in a Dockerfile.
To add to that, Anaconda is to the data science ecosystem what NPM is to TypeScript. For that reason, it is often assumed that people already know it.
(Also, automatic tests with GitHub Actions, and similar tools, do wonders to make dependencies explicit AND tested.)
However, personally, I always link to such dependencies. "Obviousness" is a subjective criterion. For someone coming from a different language (or ecosystem), it takes quite a lot of guesswork to figure out what to install. And if things don't work, there is no clue whether the problem is missing dependencies, incompatible versions of software, or neither of those - something wrong with the code itself.
It's not just a question of whether the GPU is big enough. Sometimes you need to tweak this stuff so it runs on the type of GPU that you have. I would suggest trying it out locally first, and then looking at a VM later. Yes, you can rent a VM; you just need one with enough memory and a GPU attached. This is how I do it. Just remember to turn the VM off: at something like $0.45 an hour, a GPU VM comes to over $300 per month if left running.
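The back-of-the-envelope arithmetic, assuming that $0.45/hour rate:

    # Monthly cost of a GPU VM that is never stopped, at an assumed rate.
    hourly_rate = 0.45                     # $/hour, varies by region and GPU type
    hours_per_month = 24 * 30
    print(hourly_rate * hours_per_month)   # 324.0 -> roughly $324/month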
I use GCP. There are some weird steps you have to go through, like your account might start with a quota of 0 GPU instances, so you have to request an increase. I also added the storage read/write scope to the instance and made a bucket that the VM's service account could write to, just to make it easier to get the output out. The Nvidia driver has randomly stopped working, and I've often run out of system or GPU RAM.
I created the VM through the cloud dashboard and then ssh'd to it to set it up. When I'm done, I stop the instance. While it's stopped, I only accrue charges for the disk volume.
> Why is running software still such a tricky thing in 2023, where you have to read tutorials and tinker with manually setting up an environment?
I think this is unfair criticism. Conda is extremely common in data science, as common as Docker is in software development.
While I agree Docker is easier to use as a consumer, conda automatically sets up the environment, and a quick Google search would show you how to install it.
This isn't software. Software is something like MS Office.
This is research code. It's raw. If you want to work with it, you need to know a little about the environments the code is built and run in. In the case of ML code using Python, nearly everyone is using Conda.
Install Anaconda on your system. Create a base environment and make sure your terminal is now using Anaconda's Python and not your system Python.
Then follow their one-liner to set up the environment with their YAML file. Conda will literally download and install all the Python packages they need. It is really, really simple.
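A quick way to confirm the right interpreter is active after activating the environment:

    import sys

    # Should print a path inside the conda env
    # (e.g. .../envs/<env-name>/bin/python), not your system Python.
    print(sys.executable)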
The conda file is for conda to read; it's not a set of instructions for you.
> How far down this rabbit hole is someone who just wants to run a software supposed to go?
Depends how much you want to run it, really. It's a free thing, and making easily installable software across wildly different systems is hard. Providing a conda setup is actually pretty nice.
These things are often hard to set up, though; the alternative is waiting for someone else to do it for you.
Having set up various diffusion models I'm gonna say that the "read tutorials and tinker with manually setting up an environment" is EXTREMELY fair criticism. Much of the ML ecosystem is kinda held together with duct tape and string.
As I get older, I feel more and more that the initial reactions of people completely new to a software system are incredibly valuable, and the best response is to try to capture those reactions so you can find out what underlying problems (if any) caused them. When you hire a junior engineer straight out of college, maybe they'll react with horror when they see what real production code looks like. It's EASY to just say that they'll get used to it. However, their reactions are sometimes correct and sometimes miss the mark. It takes work to sort it out, but the first step is capturing the initial reactions of people new to the system, and knee-jerk responses just get in the way.
Most of the senior engineers can't see the problems. They've been surrounded by those problems for too long and the problems just turn into background noise.
It's obvious you aren't the intended audience for the code; I mean this sincerely when I say their repo is absolutely standard, simple, and easily implemented by any undergraduate student in a first-year ML course.
They literally give a conda environment YAML. Anaconda is the most widely used distribution manager for Python and R; it takes care of package installs and virtual environment management. If you don't know what that is, I would argue that isn't the fault of the researchers.
The main idea is to specify an edit direction, which is a broad statement of what you want to do (e.g. cat to dog). The method then uses GPT-3 to generate banks of sentences for the source and target concepts, calculates CLIP embeddings for each set, and takes the mean difference between them. That difference vector is used to guide the noise to make the final change.
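A minimal sketch of the embedding step, using Hugging Face's CLIP; the sentence banks here are hand-written stand-ins for what GPT-3 would generate, and the actual noise-guidance step is omitted:

    import torch
    from transformers import CLIPModel, CLIPTokenizer

    model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14")
    tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

    # Stand-ins for GPT-3-generated sentence banks describing each concept.
    cat_sentences = ["a photo of a cat", "a cat sitting on a sofa",
                     "a close-up portrait of a cat"]
    dog_sentences = ["a photo of a dog", "a dog sitting on a sofa",
                     "a close-up portrait of a dog"]

    def mean_text_embedding(sentences):
        inputs = tokenizer(sentences, padding=True, return_tensors="pt")
        with torch.no_grad():
            features = model.get_text_features(**inputs)
        return features.mean(dim=0)

    # The edit direction is the mean difference between the two embedding
    # sets; it then steers the denoising toward the target concept.
    edit_direction = (mean_text_embedding(dog_sentences)
                      - mean_text_embedding(cat_sentences))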
One pitfall seems to be that you can't specify very customized changes, but it's a very cool approach.