I like this idea. It could be handy to be able to focus on individual descriptio...

jbhuang0604 · on Oct 7, 2023

The UI part is basically a way to organize the user's intention. In the backend, we develop a method for extracting "token maps" (i.e., which spatial regions correspond to specific words) and use region-based diffusion to achieve these localized editing results.

The second half of the video provides an overview of the method. https://www.youtube.com/watch?v=ihDbAUh0LXk
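
For readers who want a rough picture of the token-map idea, here is a toy NumPy sketch. It is not the actual implementation: the function names `token_masks` and `blend_regions`, the 0.5 threshold, and the hard-mask compositing are all assumptions made for illustration. It only shows the general shape of the approach: turn a per-word spatial score map into a mask, then apply each word's own prediction only inside its region.

```python
import numpy as np

def token_masks(attn, threshold=0.5):
    """Turn per-token score maps into boolean region masks.

    attn: (num_tokens, H, W) non-negative scores (hypothetical input).
    Returns a boolean array of the same shape.
    """
    # Normalize each token's map by its own peak so one threshold fits all tokens.
    peak = attn.max(axis=(1, 2), keepdims=True)
    norm = attn / np.maximum(peak, 1e-8)
    return norm >= threshold

def blend_regions(base, region_preds, masks):
    """Composite per-region predictions over a base prediction.

    base: (H, W, C); region_preds: (num_tokens, H, W, C); masks: (num_tokens, H, W).
    Where masks overlap, the later token overrides the earlier one.
    """
    out = base.copy()
    for pred, mask in zip(region_preds, masks):
        out[mask] = pred[mask]
    return out

# Two tokens on a 2x2 grid: token 0 owns the top-left pixel, token 1 the bottom-right.
attn = np.array([[[1.0, 0.0], [0.0, 0.0]],
                 [[0.0, 0.0], [0.0, 2.0]]])
masks = token_masks(attn)
base = np.zeros((2, 2, 1))
region_preds = np.stack([np.full((2, 2, 1), 1.0), np.full((2, 2, 1), 2.0)])
result = blend_regions(base, region_preds, masks)
```

In a real diffusion pipeline this kind of blend would presumably run at each denoising step on noise predictions rather than once on final pixels, and the maps would come from the model rather than being hand-written.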

90-00-09 · on Oct 7, 2023

TIL about region-based diffusion. Thanks for the context!