
I like this idea. It could be handy to be able to focus on individual descriptions in complex prompts. Is this then mostly a "UI" feature that is being translated to a traditional prompt?

(As a side note: using decorative typefaces was an unconvincing example.)




The UI part is basically a way to organize the user's intention. In the backend, we developed a method for extracting "token maps" (i.e., which spatial regions correspond to specific words) and use region-based diffusion to achieve these localized editing results.

The second half of the video provides an overview of the method. https://www.youtube.com/watch?v=ihDbAUh0LXk
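
To make the "token maps" plus region-based idea a bit more concrete, here is a rough NumPy sketch. It is not the actual implementation: the function names, the thresholding of per-token attention scores, and the simple masked compositing are all assumptions for illustration, and a real pipeline would do this inside the diffusion denoising loop.

```python
import numpy as np

def token_maps_from_attention(attn, threshold=0.5):
    """Turn per-token attention scores into boolean region masks.

    attn: array of shape (num_tokens, H, W) with nonnegative scores.
    The input and the threshold are hypothetical, for illustration only.
    """
    attn = np.asarray(attn, dtype=float)
    # Normalize each token's map by its own peak so one threshold fits all tokens.
    peak = attn.reshape(attn.shape[0], -1).max(axis=1)[:, None, None]
    normalized = attn / np.maximum(peak, 1e-8)
    return normalized >= threshold

def blend_regions(base, region_preds, masks):
    """Composite per-region predictions into one image.

    base: (H, W, C) image used where no mask applies.
    region_preds: (R, H, W, C), one prediction per region.
    masks: (R, H, W) boolean; later regions override earlier ones on overlap.
    """
    out = np.array(base, dtype=float, copy=True)
    for pred, mask in zip(region_preds, masks):
        out[mask] = pred[mask]
    return out

# Toy example: two tokens on a 2x2 image with one channel.
attn = np.array([[[1.0, 0.2], [0.0, 0.1]],
                 [[0.0, 0.1], [0.3, 2.0]]])
masks = token_maps_from_attention(attn)
base = np.zeros((2, 2, 1))
preds = np.stack([np.full((2, 2, 1), 1.0), np.full((2, 2, 1), 2.0)])
result = blend_regions(base, preds, masks)
```

In the toy example each token claims one pixel, and the untouched pixels keep the base value, which is the "localized editing" behavior in miniature.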


TIL about region-based diffusion. Thanks for the context!



