pix2pix-zero: Zero-Shot Image-to-Image Translation (pix2pixzero.github.io)
115 points by macawfish on Feb 13, 2023 | 28 comments



Not my field so I don't have anything detailed to add, other than to say how fascinating it is to watch how quickly all these incremental improvements are occurring after the initial breakthrough.

Also, props for not cherry-picking results too much, both for honesty and because it is fun to see how these fail. Like the tree that grew limbs to fill in a watermark, or the sunset sky that turned into a lava field.


Interesting how the lava field version imprinted itself on my mind.

When going back to the original image I now see it as ambiguous - in the same way as a duck/rabbit multistable illusion - and can see a mountain range and sand instead of clouds.


Try this Colab if you want to try pix2pix video: https://github.com/camenduru/pix2pix-video-colab


I think this uses instruct-pix2pix, which is different!

My understanding is that instruct-pix2pix requires a specially fine-tuned model, whereas this new pix2pix-zero is an image-to-image technique that works without any additional training.

All that to say here's a pix2pix-zero demo: https://huggingface.co/spaces/ysharma/pix2pix-zero-01


Trying to understand it ...

It only lets me choose between "cat2dog" and "dog2cat"? I thought I could add any prompt, like "add glasses" or "change background to beach sunset"?

What is the .pt file it creates?

When I use the "Generate & Translate the SD image" button, I just get "Error" after a while of processing.


Does something like this exist for the single image version?

How hard is it to set up such a thing on HuggingFace?


The "code" link takes you to a GitHub repo. But I don't know how to set up the needed environment.

I wish authors of such tools would provide a Dockerfile to set up the environment needed to run their creations.

Their setup section starts with

    conda env create -f environment.yml
But that gives me

    bash: conda: command not found
So I tried

    apt install conda
and

    apt install python3-conda
But neither worked.

Why is running software still such a tricky thing in 2023, where you have to read tutorials and tinker with manually setting up an environment?

Guess I will wait until some SaaS company provides an API to use this on their machines then. Any idea which company might do this first?


The intended audience of other ML developers and data scientists will likely be more familiar with Anaconda[1] than Docker. I recommend starting with Miniconda[2], which installs just conda itself, so that the env create step pulls in only the packages needed to run this project (as declared in that environment.yml file), whereas Anaconda is a giant batteries-included install.

[1] https://www.anaconda.com/

[2] https://docs.conda.io/en/latest/miniconda.html
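
If it helps, the whole Miniconda route on Linux is roughly this (an untested sketch; the installer URL is Anaconda's standard one, and the env name at the end depends on what the repo's environment.yml declares):

    wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
    bash Miniconda3-latest-Linux-x86_64.sh -b -p "$HOME/miniconda3"
    source "$HOME/miniconda3/bin/activate"
    # from inside the cloned repo:
    conda env create -f environment.yml
    conda activate <name-from-environment.yml>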


I respectfully disagree. I find docker to be more-or-less unavoidable in ML work, but thankfully conda is still avoidable.

My recommendation is not to touch conda unless you're on Windows and don't know about CUDA drivers or WSL. Otherwise, IMHO it's far less trouble to stick with pip (and virtualenv/pipenv/poetry), and it's not difficult to translate the environment. A sketch of the translation is below.
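
For this repo it would look something like this (the package list is my guess at what a typical diffusion repo needs; the authoritative list is in their environment.yml, which I haven't translated here):

    python3 -m venv .venv
    source .venv/bin/activate
    # packages guessed from a typical Stable Diffusion project;
    # read environment.yml for the real names and version pins
    pip install torch torchvision diffusers transformers accelerate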


While your mileage (clearly) varies from mine, Anaconda is in practice the standard environment for deep learning (and, generally, for most of the Python data science ecosystem).

For example, when you go to the front page of PyTorch (https://pytorch.org/), the default way to go is with Anaconda. It makes it easy to install things regardless of the system, with matching versions. For example, out of the box it gives GPU support for Apple Silicon - no extra installation steps.

Pip installers don't work with non-Python dependencies. Of course, you can manually install things any way you like (including inside Docker), but it is up to you to make sure that all dependencies are compatible. And that is a non-trivial task, given frequent updates of all the things involved (including CUDA kernels, Python versions, PyTorch/TF versions, and all libraries related to them one way or another).

I remember the times before Anaconda matured. Docker was often necessary to make code reusable - and I am grateful I don't need to go back to those times.

Of course, now you CAN use Docker. Quite a lot of decent installation scripts place conda (usually miniconda) instructions in a Dockerfile.
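
A rough sketch of what that usually looks like (the base image is the common continuumio/miniconda3 one; the env name is an assumption, not taken from this repo):

    FROM continuumio/miniconda3
    WORKDIR /app
    COPY environment.yml .
    # build the conda env at image build time
    RUN conda env create -f environment.yml
    # run subsequent commands inside that env;
    # "pix2pix-zero" is a guessed env name
    SHELL ["conda", "run", "-n", "pix2pix-zero", "/bin/bash", "-c"]
    COPY . .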


Pip certainly does install plenty of non-Python dependencies, but it also leaves to the system package manager many of the things managed by conda.

I don't know if there's a good definition for "in practice the standard" but I think conda is a heap of shit and I wish people would use it less.


To add to that, Anaconda is to the data science ecosystem as NPM is to TypeScript. For that reason, it is often assumed that people know it.

(Also, automatic tests with GitHub Actions, and similar tools, do wonders to make dependencies explicit AND tested.)

However, personally, I always link to such dependencies. "Obviousness" is a subjective criterion. For someone coming from a different language (or ecosystem), it takes quite a lot of guesswork to figure out what to install. And if things do not work, there is no clue whether dependencies are missing, versions are incompatible, or neither - maybe there is something wrong with the code itself.


Thanks, I might try it later.

But I will probably not have a big enough GPU?

Can I rent a VM to run it, or do I need to rent a dedicated server?


It's not just a question of whether the GPU is big enough. Sometimes you need to tweak this stuff so it runs on the type of GPU that you have. I would suggest trying it out first, and then looking at a VM later. Yes, you can rent a VM, you just need to get a VM with enough memory, and with a GPU attached. This is how I do it. Just remember to turn the VM off. A VM with a GPU may cost something like $0.45 an hour, which is over $300 per month.


I'm totally fine with $0.45 per hour. Where do you rent your VM?


I use GCS. There are some weird steps you have to go through - like your account might have a limit of 0 GPU instances, so you have to raise it. I also added storage read/write scope to the instance, and made a bucket that the VM's service account could write to, just to make it easier to get the output out. The Nvidia driver has randomly stopped working. I've often run out of system or GPU RAM.

I created the VM through the cloud dashboard and then ssh'd to it to set it up. When I'm done, I stop the instance. While it's stopped, I only accrue charges for the disk volume.
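
For reference, creating and stopping such an instance from the CLI looks roughly like this (zone, machine type, GPU type, and image are just examples; check your quotas and current pricing first):

    gcloud compute instances create my-ml-box \
        --zone=us-central1-a \
        --machine-type=n1-standard-8 \
        --accelerator=type=nvidia-tesla-t4,count=1 \
        --maintenance-policy=TERMINATE \
        --image-family=pytorch-latest-gpu \
        --image-project=deeplearning-platform-release \
        --boot-disk-size=100GB
    # stop it when you're done; a stopped instance only bills for the disk
    gcloud compute instances stop my-ml-box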


What is GCS?


Google


> Why is running software still such a tricky thing in 2023, where you have to read tutorials and tinker with manually setting up an environment?

I think this is unfair criticism. Conda is extremely common in data science, as common as Docker is in software development.

While I agree Docker is easier to use as a consumer, conda automatically sets up the environment, and a quick Google search would show you how to install conda.


I just said Dockerfile because when I say "shell script", people think that is old-school.

A Dockerfile shows me every step of what has to be done in an unmistakable way. I can easily turn it into a shell script.

The conda file (environment.yml, right?) does not tell me how to set up the environment. It contains a list of... I don't know what. It starts with:

    channels:
        - pytorch
        - nvidia
        - defaults
I don't know what to do with that. It does not look like computer code.

> a quick Google would show you how to install conda

No. It leads me to yet another tutorial with a bunch of manual steps:

https://docs.conda.io/projects/conda/en/latest/user-guide/in...

It even starts with decisions I am supposed to make, about which I know nothing:

    Download the installer:
    - Miniconda installer for Linux.
    - Anaconda installer for Linux.
How do I know which one the software in question needs? How far down this rabbit hole is someone who just wants to run a piece of software supposed to go?


This isn't software. Software is something like MS Office.

This is research code. It's raw. If you want to work with it, you need to know a little about how the environments used to build and run such code work. In the case of ML code using Python, nearly everyone is using conda.

Install Anaconda on your system. Create a base environment and make sure your terminal is now using Anaconda's Python and not your system Python.

Then follow their one-liner to set up the environment with their YAML file. Conda will literally download and install all the Python packages they need. It is really, really simple.
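
Concretely (the activate name is whatever the "name:" field at the top of their environment.yml says; I'm guessing here):

    conda env create -f environment.yml
    conda activate pix2pix-zero   # guessed name; check environment.yml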


The conda file is for conda to read; it's not a set of instructions for you.

> How far down this rabbit hole is someone who just wants to run a software supposed to go?

Depends how much you want to run it, really. It's a free thing, and making easily installable software across wildly different systems is hard. Providing conda setup files is actually pretty nice.

These things are often hard to set up, though; the alternative is waiting for someone else to do it for you.


[flagged]


Conda is a special type of hell, though. I have read the manual. Multiple tutorials. Yet not even once have I gotten conda to work for me.


Having set up various diffusion models I'm gonna say that the "read tutorials and tinker with manually setting up an environment" is EXTREMELY fair criticism. Much of the ML ecosystem is kinda held together with duct tape and string.

As I get older, I feel more and more that the initial reactions of people completely new to a software system are incredibly valuable, and the best response is to try and capture those reactions so you can find out what underlying problems (if any) that caused them. When you hire a junior engineer straight out of college, maybe they'll react with horror when they see what real production code looks like. It's EASY to just say that they'll get used to it. However, their reactions are sometimes correct and sometimes miss the mark. It takes work to sort it out, but the first step is capturing the initial reactions of people new to the system, and knee-jerk responses just get in the way.

Most of the senior engineers can't see the problems. They've been surrounded by those problems for too long and the problems just turn into background noise.


It's obvious you aren't the intended audience for the code; I mean this sincerely when I say their repo is absolutely standard, simple, and easily set up by any undergraduate student in a first-year ML course.

They literally give a conda environment YAML. Anaconda is the most widely used distribution manager for Python and R; it takes care of package installs and virtual environment management. If you don't know what that is, I would argue that isn't the fault of the researchers.


I guess this involves training textual inversions, which takes quite a lot of computation vs. other methods like instruct-pix2pix, which don't need it.


Actually, this is wrong - I read the paper: https://arxiv.org/pdf/2302.03027.pdf

The main idea is to specify an edit direction - a broad statement of what you want to change (e.g., cat to dog). The method uses GPT-3 to generate a bank of sentences for the source and target concepts, computes their embeddings with CLIP, and takes the mean difference. That difference is then used to guide the denoising to make the final change.

One pitfall seems to be that you can't specify very customized changes, but it's a very cool approach.
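
If it helps, here is my own rough reconstruction of that direction computation in Python (not their code; the model name and the hard-coded sentence banks, which the paper gets from GPT-3, are assumptions):

    # embed sentences about the source and target concepts with CLIP's
    # text encoder, then take the difference of the mean embeddings
    import torch
    from transformers import CLIPTokenizer, CLIPTextModel

    tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
    encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

    def mean_embedding(sentences):
        # pool each sentence to one vector, then average across sentences
        tokens = tokenizer(sentences, padding=True, return_tensors="pt")
        with torch.no_grad():
            out = encoder(**tokens)
        return out.pooler_output.mean(dim=0)

    # the paper generates these sentence banks with GPT-3; hard-coded here
    cat_sentences = ["a photo of a cat", "a cat sitting on a sofa"]
    dog_sentences = ["a photo of a dog", "a dog sitting on a sofa"]

    # edit direction pointing from "cat" toward "dog", used to guide denoising
    edit_direction = mean_embedding(dog_sentences) - mean_embedding(cat_sentences)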


Is that difference the "inversion" mentioned?



