Hacker News
Image Editing with Gaussian Splatting (unite.ai)
233 points by Hard_Space 3 months ago | 60 comments



Isn't it quite a leap to go from a single image to a usable 3DGS model? The editing part seems like a relatively minor step afterwards. I thought that 3DGS typically required multiple viewpoints, like photogrammetry.


It's not "real" 3D -- the model doesn't infer anything about unseen portions of the image. They get 3D-embedded splats out of their pipeline, and then can do cool things with them. But those splats represent a 2D image, without inferring (literally or figuratively) anything about hidden parts of the image.


This is what I initially thought; however, I have already seen working demos of 3DGS that use a single viewpoint, armed with additional auxiliary data that is contextually relevant to the subject.


Yeah exactly, this page doesn't explain what's going on at all.

It says it uses a mirror image to do a Gaussian splat. How does that infer any kind of 3D geometry? An image and its mirror are explainable by a simple plane and that's probably what the splat will converge to if given only those 2 images.



I’ve been exploring some creative applications of Gaussian splats for photography/photogrammetry, which I think have an interesting aesthetic. The stills of flowers on my Instagram if anyone is interested: https://www.instagram.com/bayardrandel


These are great! What software do you use, and what does your pipeline look like?

If you wanted to capture a full 3D scene, my experience with photogrammetry and NeRFs has been that it requires a tremendously large dataset that is meticulously captured. Are Gaussian splat tools more data efficient? How little data can you get away with using?

What are the best open source Gaussian Splat tools for both building and presenting? Are there any that do web visualization particularly well?

I might have to get back into this.


Thanks very much! I use Polycam on iOS for photogrammetry and generating Gaussian splats from stills. It seems to work remarkably well, but has a subscription fee (given there's processing on their servers this seems reasonable). Typically, building a splat model takes about 30-50 stills for good results, depending on the subject.

The only open source tool I use in my workflow is CloudCompare (https://www.danielgm.net/cc/), for editing/cleaning point cloud data.

For animation I primarily use TouchDesigner, which is a node-based visual programming environment, exporting splats as point clouds, and Ableton/misc instruments for sound.

No idea about web visualisation, but interesting idea!


Have you tried Kiri vs. Polycam?

I was using Kiri's dev mode and then running that through nerfstudio to make NeRFs, and I'm wondering if Polycam might give higher quality, but I can't seem to find anyone else who's been doing this. I guess I might have to do some tests to compare.

+1 to this workflow. TouchDesigner's Point Transform TOP is great for aligning too.


No, afraid not, sorry; I've only used Polycam. TouchDesigner is such a pleasure - I can't remember when I last found creative software as fun and interesting to explore.


I learned about 3D Gaussian Splatting from the research team at work just 2 weeks ago, and they demoed some incredible use cases. This tech will definitely become mainstream in camera technologies.


Fast camera position and orientation estimation with COLMAP + initial point prediction + Gaussian splatting for 5 minutes + CloudCompare normal estimation and 3D reconstruction yields some incredible results.

Much better than NeRF in my experience. There is, however, a need to clean the point cloud yourself and stuff like that.


The examples don't look like anything beyond what can be done with the puppet warp effect in Photoshop/After Effects


This is honestly genius. If I understand it correctly, instead of manipulating pixels, you turn any 2D image into a 3D model and then manipulate that model.


Yes! This really feels next-gen. After all, you're not actually interested in editing the 2D image, that's just an array of pixels, you want to edit what it represents. And this approach allows exactly that. Will be very interesting to see where this leads!


Or it's analogous to how you convert audio waveform data into frequencies with the fast Fourier transform, modify it in the frequency spectrum and convert it back into a waveform again.
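To make the analogy concrete, here is a minimal sketch of that round trip in plain Python. It uses a naive DFT rather than a real FFT (same result, just slower), and the helper names (`dft`, `idft`, `remove_frequency`) and the toy signal are made up for illustration; none of this comes from the article.

```python
import cmath
import math

def dft(samples):
    """Naive O(n^2) discrete Fourier transform (an FFT gives the same result faster)."""
    n = len(samples)
    return [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]

def idft(spectrum):
    """Inverse transform: frequency bins back to time-domain samples."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]

def remove_frequency(samples, k):
    """Edit in the frequency domain: zero bin k and its mirror bin, then convert back."""
    n = len(samples)
    spectrum = dft(samples)
    spectrum[k] = 0
    spectrum[(n - k) % n] = 0
    return [value.real for value in idft(spectrum)]

# Toy signal (hypothetical): a low tone at bin 2 plus an unwanted tone at bin 7.
n = 32
low = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]
high = [0.5 * math.sin(2 * math.pi * 7 * t / n) for t in range(n)]
mixed = [a + b for a, b in zip(low, high)]

cleaned = remove_frequency(mixed, 7)
max_error = max(abs(c - l) for c, l in zip(cleaned, low))
```

Removing the tone is a one-line edit once the data is in the right representation, which is roughly the point of going from pixels to splats and back.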

Their examples do, however, only look a bit like distorted pixel data. The hands of the children seem to warp with the cloth, something they could have easily prevented.

The cloth also looks very static despite it being animated, mainly because the shading of it never changes. If they had more information about the scene from multiple cameras (or perhaps inferred from the color data), the Gaussian splat would be more accurate and could even incorporate the altered angle/surface-normal after modification to cleverly simulate the changed specular highlights as it animates.


The type of 3D model, Gaussian splatting, is also pretty neat and has been getting a lot of attention lately.

There's been some good previous discussion on it here, like this one:

Gaussian splatting is pretty cool https://news.ycombinator.com/item?id=37415478


Gaussian splatting is clearly going to change a lot of things in 3D assets; surprising to see it doing the same for 2D here.


When a foreground object is moved, how are the newly visible contents of the background filled?


The demos show either totally internal modifications (bouncing blanket changing shape / statue cheeks changing) or isolated with white background images that have been clipped out. Based on the description of how they generate the splats, I think you’d auto select the item out of the background, do this with it, then paste it back.

The splatting process uses a pretty interesting idea, which is to imagine two cameras, one the current “view” of the image, the other one 180 degrees opposite looking back, but at a “flat” mirror image of the front. This is going to constrain the splats away from having weird rando shapes. You will emphatically not get the ability to rotate something along a vertical axis here (e.g. “let me just see a little more of that statue’s other side”). You will instead get a nice method to deform / rearrange.


It probably isn't.

The most logical use of this is to replace mesh-transform tools in Photoshop or Adobe Illustrator. In this case, you probably work with a transparent map anyway.


Why do gaussian splats benefit you for mesh transform applications? Name one, and think deeply about what is going on. The applications are generally non-physical transformations, so having a physical representation is worse, not better; and then, the weaknesses are almost always interacting with foreground versus background separation.

Another POV is, well generative AI solves the issue I am describing, which should question why these guys are so emphatic about their thing not interacting with the generative AI thing. If they are not interested in the best technical solutions, what do they bring to the table besides vibes, and how would they compete against even vibesier vibes?


Mesh transform is extensively used to create animations and warping perspectives. The most useful kind of warping is emulating perspective and rotation. Gaussian splats allow more intelligent warping in perspective without manually moving every vertex by eye.

Foreground-background separation is entirely uninteresting. Masking manually is relatively easy, and there are good semi-intelligent tools that make it painless. Sure, it's a major topic discussed within AI papers for some reason, but from an artist's perspective, it doesn't matter much. Masking out from the backing is generally step one in any image manipulation process, so why is that a weakness?


Now THIS is the kind of shit I signed up for when AI started to become able to understand images properly: no shitty prompt-based generators that puke the most generalised version of every motif while draining the whole illustration industry from life.

It's just good-ass tooling for making cool-ass art. Hell yes! Finally, there is some useful AI tooling that empowers artistic creativity rather than drains it.

Pardon the French; I just think this is too awesome for normal words.


Yep, there's a similar refrain amongst 3D artists who are begging for AI tools which can effectively speed up the tedious parts of their current process like retopo and UV unwrapping, but all AI researchers keep giving them are tools which take a text prompt or image and try to automate their entire process from start to finish, with very little control and invariably low quality results.


There have been some really nice AI tools to generate bump and diffuse maps from photos. So you could photograph a wall and get a detailed meshing texture with good light scatter and depth.

That's the kind of awesome tech that got me into AI in the first place. But then prompt generators took over everything.


Denoising is another good practical application of AI in 3D, you can save a lot of time without giving up any control by rendering an almost noise-free image and then letting a neural network clean it up. Intel did some good work there with their open source OIDN library, but then genAI took over and now all the research focus is on trying to completely replace precise 3D rendering workflows with diffusion slot machines, rather than continuing to develop smarter AI denoisers.


Because the investors funding development of those AI tools don't want to try to empower artists and give them more freedom, they want to try to replace them.


The investors want to make money, and if they make a tool that is usable by more people than just experienced 3D artists who are tired of retopologizing their models, that both empowers many more people and potentially makes them more money.

Aside from that, it's impossible for tools to replace artists. Did cameras replace painting? I'm sure they reduced the demand for paintings, but if you want to create art and paint is your chosen medium it has never been easier. If you want to create art and 3D models are your chosen medium, the existence of AI tools for 3D model generation from a prompt doesn't stop you. However, if you want to create a game and you need a 3D model of a rock or something, you're not trying to make "art" with that rock, you're trying to make a game and a 3D model is just something you need to do that.


There's a ton of room for using today's ML techniques to greatly simplify photo editing. The problem is, these are not billion dollar ideas. You're not gonna raise a lot of money at crazy valuations by proposing to build a tool for relighting scenes or removing unwanted objects from a photo. Especially since there is a good chance that Google, Apple, or Adobe are going to just borrow your idea if it pans out.

On the other hand, you can raise a lot of money if you promise to render an entire industry or an entire class of human labor obsolete.

The end result is that far fewer people are working on ML-based dust or noise removal than on tools that are generating made-up images or videos from scratch.


I share your excitement for this tool that assists artists. However, I don't share the same disdain for prompt generators.

I find it enlightening to view it in the context of coding.

GitHub Copilot assists programmers, while ChatGPT replaces the entire process. There are pros and cons though:

GitHub Copilot is hard to use for non-programmers, but can be used to assist in the creation of complex programs.

ChatGPT is easy to use for non-programmers, but is usually restricted to making simple scripts.

However, this doesn't mean that ChatGPT is useless for professional programmers either, if you just need to make something simple.

I think a similar dynamic happens in art. Both types of tools are awesome, they're just for different demographics and have different limitations.

For example, using the coding analogy: MidJourney is like ChatGPT. Easy to use, but hard to control. Good for random people. InvokeAI, Generative Fill and this new tool are like Copilot. Hard to use for non-artists, but easier to control and customise. Good for artists.

However, I do find it frustrating how most of the funding in AI art tools goes towards the easy-to-use side, instead of the easy-to-control side (this doesn't seem to be shared by coding, where Copilot is more well-developed than ChatGPT coding). More funding and development to the easy-to-control type would be very welcome indeed!

(Note, ControlNet is probably a good example as easy-to-control. There's a very high skill ceiling in using Stable Diffusion right now.)


Good analogy. Yes, controllability is severely lacking, which is what makes diffusion models a very bad tool for artists. The current tools, even Photoshop's best attempt to implement them as a tool (smart infill), are situational at best. Artists need controllable specialized tools that simplify annoying operations, not prompt generators.

As a programmer, I find Copilot a pretty decent tool, thanks to its good controllability. ChatGPT is less so, but it is decent for finding the right keywords or libraries I can look up later.


Except this is explicitly not AI, nor is it even tangentially related to AI. This is a normal graphics algorithm, the kind you get from really smart people working on render-pipeline maths.


> nor is it even tangentially related to AI

It's not a deep neural network, but it's a machine learning model. In very simple terms, it minimizes a loss by iteratively refining an estimated set of Gaussians, which is about as much machine learning as old-school KNN or SVM.
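As a rough illustration of "minimizes a loss by refining an estimate", here is a toy sketch in plain Python that fits the amplitude and center of a single 1D Gaussian to target samples by gradient descent on a mean squared error. This is not the actual splatting pipeline (real 3DGS optimizes many 3D Gaussians against rendered images); the function names, learning rate, step count and data are all assumptions chosen for the example.

```python
import math

def gaussian(x, amplitude, center, sigma=1.0):
    """Value of a 1D Gaussian bump at x."""
    return amplitude * math.exp(-((x - center) ** 2) / (2 * sigma ** 2))

def fit_gaussian(xs, targets, amplitude=1.0, center=0.0, lr=0.5, steps=1000):
    """Gradient descent on mean squared error; sigma is fixed at 1.0 for simplicity."""
    n = len(xs)
    for _ in range(steps):
        grad_amplitude = 0.0
        grad_center = 0.0
        for x, target in zip(xs, targets):
            shape = gaussian(x, 1.0, center)      # unit-amplitude bump
            error = amplitude * shape - target    # prediction minus target
            grad_amplitude += 2 * error * shape / n
            grad_center += 2 * error * amplitude * shape * (x - center) / n
        amplitude -= lr * grad_amplitude
        center -= lr * grad_center
    return amplitude, center

# Hypothetical "observed" data: a bump with amplitude 2.0 centered at 1.0.
xs = [-4 + 0.25 * i for i in range(41)]
targets = [gaussian(x, 2.0, 1.0) for x in xs]

# Start from a wrong guess (amplitude 1.0, center 0.0) and refine it.
amplitude, center = fit_gaussian(xs, targets)
```

There is no neural network anywhere in this loop, just parameters, a loss and its gradient, which is the sense in which it sits closer to classic ML than to deep learning.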

AI means nothing as a word; it is basically as descriptive as "smart" or "complicated". But yes, it's a very clever algorithm invented by clever people that is finding some nice applications.


Whether you agree with what it means or not, the word AI most definitely has a meaning today, more so than ever, and that meaning is not what we (myself included, I have a masters in AI from the before-times) used to use it for. Today, AI exclusively refers to (extremely) large neural networks.


If that is the definition, then I agree; calling this AI would downplay how clever this algorithm really is.

But most marketing firms disagree. AI has now absorbed the terms "big data" and "algorithm" in many places. The new Ryzen AI processor, Apple Intelligence, NVIDIA AI upscaling, and HP AI printer all refer to much smaller models or algorithms.


Generative art hasn't been what you're describing for a long time.


Hasn’t been, or has been more than?


Has been more than.


[flagged]


I know people within creative fields who use it.


Your fault for poor prompting. If you don't provide distinctive prompts you can expect generalised answers


Let's say you want to rotate a cat's head in an existing picture by 5 degrees, as in the most basic example suggested here. No prompt will reliably do that.

A mesh-transform tool and some brush touchups could. Or this tool could. Diffusion models are too uncontrollable, even in the most basic examples, to be meaningfully useful for artists.
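For comparison, the deterministic version of that edit is a few lines of trigonometry. The sketch below rotates a set of 2D control points (think mesh-warp vertices) by exactly 5 degrees around a pivot; `rotate_points` and the sample coordinates are hypothetical and not taken from any particular tool.

```python
import math

def rotate_points(points, pivot, degrees):
    """Rotate 2D points counter-clockwise around pivot by the given angle in degrees."""
    angle = math.radians(degrees)
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    px, py = pivot
    rotated = []
    for x, y in points:
        dx, dy = x - px, y - py
        rotated.append((px + dx * cos_a - dy * sin_a,
                        py + dx * sin_a + dy * cos_a))
    return rotated

# Hypothetical control points outlining a head, rotated about the neck.
head = [(10.0, 20.0), (14.0, 20.0), (14.0, 25.0), (10.0, 25.0)]
neck = (12.0, 18.0)
turned = rotate_points(head, neck, 5)
```

The result is the same every time you run it, which is the kind of control a prompt does not give you.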


No, but you could rotate the head with traditional tools and then inpaint the background and touch up the neckline. It's not useless, just different.


These underlines make reading the text pretty difficult, it might be worth making the links a little less prominent to aid legibility.


this seems like an absolutely terrible idea! I thought this was going to be about editing gaussian splats, which is sorely needed. instead, it's about turning a few pixels into a gaussian splat in order to edit them?! My god, talk about using a nuclear bomb to kill a fly!


[flagged]


So for making an edit you're proposing to extract every (possibly partial) objects/subjects out of a picture, create 3d models out of them and then animate? And if I don't know how, first learn how to do it?

A sleep-deprived artist somewhere (not me): No, thank you, I need to get this for an ad tomorrow morning.


The kind of edits and animations this can do are currently not possible with 3D modelling and animation, with or without this tech.

This kind of warping of 2D images is currently used extensively, but a lot more manually (See Live2d, the Photoshop mesh transform tool, or Adobe Illustrator). So, this does not replace 3D modelling; it replaces some 2D editing techniques where 3D modelling isn't even on the table as an applicable tool.

This kind of 2d image warping is useful in advertisements, game art, photo touchups, concept art, photo bashing and digital illustration.


What's the line there though? To "learn" 3d modeling, should I learn to program and write my own modeler/cad system? To learn to program, should I start shoveling sand into a kiln and make my own hardware?


Sometimes new techniques can augment or replace old ones.

Imagine telling the Toy Story team “or you can just draw each frame like normal”


Just draw the rest of the owl.


Is that all?


"Or you can learn to read and write in Latin and be a practicing member of the clergy."

Everyone should have the means of being able to express the ideas in their heads visually in front of others. It shouldn't require arcane and difficult to use tools. Nor should it cost millions of dollars and require a studio apparatus to tell stories to an audience.

Pretty soon creating worlds, filming movies, and sculpting images and three dimensional forms will be accessible to everyone.


What do you mean by “should”? Everyone does and always has had the right to express themselves visually with the best tools available. But why shouldn’t it require difficult to use tools or cost a lot? That depends entirely on what ideas you’re trying to express and what tools exist, does it not?

Blockbuster movies will never get significantly cheaper, no matter what tools we invent. They will spend as much or more money on using the new tools to make more grandiose movies that individual people have no hope of doing themselves. There’s already a hundred year history of this happening, despite the tools having always been accessible to everyone.

I think this Gaussian splatting tool is great, and I’m in favor of forward progress in imaging tools, and I work in this field. But where is the line between having accessible tools, and putting time and effort into your craft to make something that’s high quality, or telling a story with images that are personal and new? And if you’re really talking about AI (even though this Gaussian splat tool isn’t really AI), where is the line between being able to express your ideas effortlessly and stealing the means to do that from artists who had to put in effort to produce the content that trained the AI?


> Everyone does and always has had the right to express themselves visually with the best tools available.

The fact is that not everyone can do so.

What about an elderly grandparent that wants to paint their hometown as they remember growing up? Do we expect them to spend years learning illustration before they can show their grandchildren?

> But why shouldn’t it require difficult to use tools or cost a lot?

Would "640K" [1] be enough for you? What about if computers were still only housed in DoD facilities and in universities? Too giant, arcane, and expensive for you to use?

> Blockbuster movies will never get significantly cheaper, no matter what tools we invent.

The industry thinks otherwise [2].

I, for one, would like to tell an epic sci-fi adventure story without needing Disney's budget. I've spent tens of thousands of dollars on indie film production and it's incredibly expensive and time consuming.

While I don't have time to learn Blender, I do know how I want my explosions to look. Why should I have to outsource to someone else to do the VFX? Someone that might not deliver my work in time? (Read: waiting on post has been a serious hang-up for me in the past and caused me to miss several festival deadlines.)

> There’s already a hundred year history of this happening, despite the tools having always been accessible to everyone.

No, the reason this happens is because large budget films are predictably bankable and wind up eating up all of the screen distribution real estate. There's a reason you don't want to launch your movie opposite The Avengers or another tent pole feature.

The most ROI comes from breakout low budget successes, but those are harder to gamble on.

> I work in this field

Same.

> But where is the line between having accessible tools, and putting time and effort into your craft to make something that’s high quality, or telling a story with images that are personal and new?

Why not let the users and the artists show us what they'll make rather than predicting they can't do a good job? I know dozens of artists, myself included, using these tools incredibly effectively.

> being able to express your ideas effortlessly

This is probably the source of our disagreement. An effortless tool doesn't mean works of art don't take effort.

> stealing the means to do that from artists who had to put in effort to produce the content that trained the AI

This argument will be moot soon. Adobe has full rights to all of their training data. Soon we'll have enormous synthetic datasets from Unreal Engine and mechanized turn table photo rooms. Other organizations will follow.

[1] https://www.computerworld.com/article/1563853/the-640k-quote...

[2] https://www.hollywoodreporter.com/business/digital/jeffrey-k...


Hehehe, the industry does not think otherwise, Jeffrey Katzenberg is just trying to sell you AI in that article, he’s raising funds for his AI startup. He wants you to believe you can make your own movie so you buy his startup’s software. Of course he’s exaggerating, his quotes in that article are silly and he knows it, so don’t believe everything you read. I’ve worked for Jeffrey Katzenberg making movies, and you don’t have to believe me, but I’m telling you: they will NEVER get significantly cheaper, regardless of what AI can do. There are several very good reasons why, and one of them is because it doesn’t make sense to spend 1 million on production and 50 million on marketing, another is because the studio next door will make a better movie spending more money. Music licensing and celebrity salaries are yet more reasons. It might take some industry experience to understand this, but people made the exact same claims about CG 30 years ago that effects would cut production budgets, and the exact opposite has happened: they use more effects and higher quality effects, but movie budgets have only gone steadily up, not down.

It’s true not everyone has the skills to make a movie. Why “should” they? You didn’t answer the question. I don’t expect people without any skills and without the will to learn to do anything, and that includes not expecting them to make movies. I don’t follow your point about 640K and the DoD, nobody is talking about making tools artificially difficult. Modern tools still require years of learning to paint or make movies, AI hasn’t changed that yet, and even if it does it will only raise the bar such that people with skills continue to produce things much better than people without skills, low effort art is going to remain crappy, same as it ever was.


It is already available to everyone. You can make a movie on your phone, create a song, edit images.

A.I. art is for the lazy.


"You can make a movie on your phone, create a song, edit images."

These are 2010-era tools. We're modernizing.

You wouldn't ask musicians today to stick to only pre-synthesizer, pre-DAW era tools to make their music. You wouldn't ask illustrators to ditch their tablets or force them to mix their own cerulean unless that's what they wanted to do.

The tools that are coming are going to be fantastic and they're going to enable artists to do wild new things. That isn't subtractive, it's additive. There's more space to explore.

New tools don't even prevent folks from making music and media the old fashioned way: there are still people shooting on film to this day. And there's a vibrant community that embraces anachronistic tools to make new art.

I'm looking forward to a world where teenagers cut their teeth on making their own Marvel movies instead of just watching them. We'll have so many young Scorseses before they even graduate.


Current generative AI isn't additive. It's generative. That's about half of the problem. DAWs don't revert your changes back to the mean, but genAI always does, being a statistical model. Roughly the other half is that the output is inexplicably bad, not always noticeably so to everyone but often obviously so to artists and connoisseurs, so connoisseurs can't promote themselves into artists by use of AI.

The almost violent anti-AI sentiment seen among the art cohort is sometimes hard to understand for subgroups of the tech-literate without enough training epochs in human-generated image data (especially the kind prevalent on the Internet), and I would understand that, without a grasp of rather subjective quality issues, it could indeed look like an artificially incited luddite conspiracy.

Once someone makes an AI that is additive and produces entertainment-worthy output, then the "luddites" will change; they must. Until then, it's just a technically remarkable white noise generator.


I'll take "People who feel threatened by AI" for $500, Alex.


Or you could just learn to draw the images on a physical medium.



