Nice looking results, hopefully not too cherry-picked. Every 3D model generation paper posted on HN has people complaining that the meshes are bad, so this kind of research is welcome and necessary for generated 3D assets to be used in actual games.
Weird custom non-commercial license unfortunately. Notes from the GitHub readme:
> It takes about 7GB and 30s to generate a mesh on an A6000 GPU
> trained on meshes with fewer than 800 faces and cannot generate meshes with more than 800 faces
Certainly a lot of scope for this kind of thing. People who do lidar scans or photogrammetry of buildings tend to end up with very large meshes or very large point clouds, which means they need souped-up PCs and expensive software to wrangle them into some usable CAD format.
It's an area where things can be improved a lot imho. I did some work a while back fitting flat planes to point clouds, and ended up with mesh models anywhere from 40x to 100x smaller than the point cloud dataset. See quato.xyz for samples where you can compare the cloud with the mesh produced, and view the 3D model in recent browsers.
My approach had some similarity to Gaussian splats, but using only planar regions: great for buildings made of flat slabs, less so for smooth curves and foliage.
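The core of that kind of plane extraction is RANSAC-style fitting; a minimal sketch (numpy only, illustrative rather than what I actually shipped):

```python
import numpy as np

def ransac_plane(points, n_iters=200, tol=0.02, seed=None):
    """Fit one plane n.x + d = 0 to an (N, 3) point array by RANSAC,
    returning (unit_normal, d, inlier_mask) for the best consensus."""
    rng = np.random.default_rng(seed)
    best = (None, None, np.zeros(len(points), dtype=bool))
    for _ in range(n_iters):
        a, b, c = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(b - a, c - a)
        norm = np.linalg.norm(n)
        if norm < 1e-12:                      # nearly collinear sample, skip
            continue
        n /= norm
        d = -n @ a
        mask = np.abs(points @ n + d) < tol   # distance-to-plane test
        if mask.sum() > best[2].sum():
            best = (n, d, mask)
    return best
```

In practice you peel planes off one at a time: refit the winning inliers with least squares, remove them from the cloud, and repeat until the remaining consensus sets get too small.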
Applying their MeshAnything algo to fine meshes from photogrammetry scans of buildings would be of great benefit, probably getting those meshes down to a size where they can be shared as 3D WebGL/three.js pages.
Even deciding on triangle points to efficiently tessellate / cover a planar region with holes etc. is basically a knapsack problem, which heuristics, Monte Carlo methods, and ML can improve upon.
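For reference, the classic greedy heuristic here is ear clipping; a minimal sketch for a simple counter-clockwise polygon (holes are usually handled first by cutting bridge edges to the outer boundary, which this omits):

```python
def ear_clip(poly):
    """Triangulate a simple CCW polygon [(x, y), ...] by greedily
    cutting off 'ears' (convex corners whose triangle contains no
    other vertex). Returns a list of index triples."""
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def inside(p, a, b, c):
        return cross(a, b, p) >= 0 and cross(b, c, p) >= 0 and cross(c, a, p) >= 0

    idx = list(range(len(poly)))
    tris = []
    while len(idx) > 3:
        for k in range(len(idx)):
            i, j, l = idx[k - 1], idx[k], idx[(k + 1) % len(idx)]
            a, b, c = poly[i], poly[j], poly[l]
            if cross(a, b, c) <= 0:          # reflex corner: not an ear
                continue
            if any(inside(poly[m], a, b, c) for m in idx if m not in (i, j, l)):
                continue                     # another vertex inside: not an ear
            tris.append((i, j, l))
            idx.pop(k)
            break
        else:
            break                            # degenerate input, give up
    if len(idx) == 3:
        tris.append(tuple(idx))
    return tris
```

The greedy ear choice is exactly where smarter search (Monte Carlo over ear orderings, learned priorities) can improve triangle quality; for a simple polygon the count is always n - 2, but once holes and Steiner points enter, the optimization space opens up.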
If you want to show photogrammetric point clouds of buildings, the Potree data structure and algorithm are pretty good, and if you don't like the library for some reason it's pretty easy to reimplement (potree.org).
You just dump the point cloud into a hierarchical octree, and at the viewer end download the nodes in your frustum, and voila.
There are other approaches but this wins hands down on usability/simplicity.
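The viewer-side selection is only a few lines in spirit; a rough sketch (not Potree's actual code: it also picks the depth cutoff per node from projected screen-space size rather than a fixed level):

```python
from dataclasses import dataclass, field

@dataclass
class OctreeNode:
    aabb: tuple                    # (min_xyz, max_xyz) bounds
    level: int                     # depth in the hierarchy
    url: str                       # point chunk to fetch lazily
    children: list = field(default_factory=list)

def nodes_to_fetch(root, intersects_frustum, max_level):
    """Walk the octree, cull subtrees outside the view frustum, and
    return the nodes whose point chunks the viewer should download."""
    out, stack = [], [root]
    while stack:
        node = stack.pop()
        if not intersects_frustum(node.aabb):
            continue               # whole subtree is off-screen
        out.append(node)
        if node.level < max_level:
            stack.extend(node.children)
    return out
```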
I'm quite familiar with Potree and a big fan, having hacked some of the internals and added features to my own custom version, so people can share annotations and measurements, save to cloud, or export linework without writing code/custom HTML.
I also added code to import E57 cube panoramas.
Still, I think if one can use ML to simplify a point cloud or fine mesh, then the data is much smaller and cleaner, and easier to import into existing CAD tools etc.
Understandable! Thanks for what you've shared. I'm doing academic work on something that could leverage a digital twin, hence my interest.
There are many uses for this tech, particularly in less techy crowds that still make significant use of traditional photogrammetry.
Your solution, if local, could give a significant advantage over other products such as Polycam. Again, if local, you could allow for much bigger scans (wink for those doing architecture, particularly those in the restoration field). Anyhow, hope you get that funding!
3) UVs that are aligned with the natural flow of textures on those components.
4) Repeating textures (although sometimes not) that work with the UVs and combine to create PBR textures. (Getting closer all the time: https://gvecchio.com/stablematerials/)
After the above works, I think people should move on to inferring proper CAD models from an image. Basically infer all the constraints and the various construction steps.
It's much easier and cleaner to subdivide quads to refine shapes when modeling. For example, you can split the quads along an entire edge loop to get a new clean edge for manipulation (e.g. to bevel it). If you try to do the same with triangles, you get a jagged mess.
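A toy illustration on a regular quad patch (real meshes walk the loop through face adjacency, but the principle is the same):

```python
import numpy as np

def insert_loop(grid, row, t=0.5):
    """Insert an edge loop into a regular quad patch.

    grid is an (R, C, 3) array of vertex positions where each 2x2
    block of neighbors bounds one quad. The new loop is a full row
    interpolated between rows `row` and `row + 1`, so every quad it
    crosses splits cleanly into two quads -- no triangles appear.
    """
    new_row = (1.0 - t) * grid[row] + t * grid[row + 1]
    return np.insert(grid, row + 1, new_row, axis=0)
```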
This might be a naive/stupid question, but wouldn't it be relatively easy to automatically merge triangles that lie in the same plane into polygons? (I suppose few triangles from this process would be in the same plane, maybe?)
They are not. Polygons are a terrible representation since unlike triangles they do not cleanly represent a unique planar surface. With more than 3 points you will always have an ambiguity (or several) about which (numerical) plane corresponds to the actual face. For some graphics applications this may or may not matter much, but it is very important for anything using the mesh for physical computation.
Co-planar quads will always subdivide into two coplanar tris. That's the crux of why the modeling workflow works. The GPU is going to turn it into triangles anyway, as long as a few fundamental indexing rules are upheld, so you're mostly getting the best of both worlds here.
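To the coplanarity question above: the test itself is cheap. A sketch with numpy (a real merge would also require the triangles to share an edge and the merged polygon to stay convex):

```python
import numpy as np

def unit_normal(tri):
    """Unit normal of a (3, 3) array of triangle vertices."""
    n = np.cross(tri[1] - tri[0], tri[2] - tri[0])
    return n / np.linalg.norm(n)

def coplanar(tri1, tri2, tol=1e-6):
    """True if both triangles lie in the same plane: parallel normals
    plus every vertex of tri2 on tri1's plane."""
    n1, n2 = unit_normal(tri1), unit_normal(tri2)
    if abs(abs(n1 @ n2) - 1.0) > tol:
        return False
    d1 = n1 @ tri1[0]
    return bool(np.all(np.abs(tri2 @ n1 - d1) < tol))
```

On curved or generated surfaces almost no adjacent pair passes a tight tolerance, which is exactly the catch the parenthetical anticipates.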
I feel like maybe CAD would be easier? You only need to represent form/edges, rather than meet all the requirements that you have for using a model for games/rendering.
I am all in for any development in this domain. Just to give some sense of scale: we recently processed (manually) the point cloud scan of part (<1% of the whole complex) of a working oil refinery. The total volume of the point cloud was 450 GB. Our previous project, of slightly larger scope, was 2.1 TB.
So the scale shown in this paper feels like toys! Not undermining the effort at all. We need to start somewhere anyway.
For the same reason, I feel puzzled looking at industrial scenes in video games. They are like three orders of magnitude simplified compared to a real plant.
Real life castles were designed to withstand a siege, video game castles are designed to give off a castle vibe. Once you've achieved that you stop adding stuff, as anything beyond that just creates problems - you start killing performance, visibility starts to suffer, it's not clear what's interactive and what is decoration, gameplay starts to take a hit as the AI and player start getting stuck in the clutter, etc, etc...
Most people don't care as they don't have deep knowledge of how a castle or a power plant really functions, you only notice oversimplifications in media in the field you work in.
It's also very likely the designers and artists didn't have time to do much research, and the whole thing is based off a Pinterest reference board.
A personal pet-peeve of mine is "movies that feature an airplane flying away and turning left off into the sunset..."
ThEy NeVeR AniMaTe The FlAps!!
It's like you'd have an animated motorcycle scene and they don't turn the handlebars or make the bike lean when going around a corner. Like, the graphics are _soooo_ good but then they make the danged plane turn and immersion breaks (for me).
In the same vein, any time someone plays an instrument that they don't really play, and their hands aren't moving to match the music. Or when the sound for a vehicle doesn't match the actual vehicle type - there was a CGI short film with a motorbike that was clearly a Yamaha MT-01 with its massive V-twin, and it sounded like a 600cc 4-pot rather than a tractor.
> For the same reason, I feel puzzled looking at industrial scenes in video games. They are like three orders of magnitude simplified compared to a real plant.
Because they are games, not oil refinery simulators. They are typically intending to only convey a general sense of “industrial environment” and nothing more.
Do your models of oil refineries include the correct grass and other plant species growing in cracks in the pavement?
That's an excellent point. I do feel compelled to mention the exception of oil refinery simulator games. Maxis (of SimCity, The Sims fame) made SimRefinery way back when.
Yes, if a game is in fact a refinery simulator I would expect it to have an accurate representation of oil refineries. But whatever the latest Call of Duty game is? It's going to be a grey block environment designed for gameplay that then gets covered in industrial props and textures and called a refinery.
> I feel puzzled looking at industrial scenes in video games. They are like three orders of magnitude simplified compared to a real plant.
Really? You don't know why video games don't have 80 billion points and you don't know why a tool made to simplify meshes into video game objects isn't using your 80 billion point lidar scan?
For starters, these are meshes and you're talking about points. If anyone is meshing those points and they have any sense, they are working with "toy"-sized chunks too, so they avoid doing nearest-neighbor calculations on terabytes of data.
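A toy sketch of that chunking idea, using scipy's cKDTree (the cell size and padding are made-up numbers; a real pipeline also streams chunks from disk instead of holding the whole array in RAM):

```python
import numpy as np
from scipy.spatial import cKDTree

def chunked_knn_distance(points, cell=5.0, pad=0.5, k=8):
    """k-th nearest-neighbor distance per point, computed one spatial
    cell at a time so no KD-tree ever sees the whole cloud."""
    keys = np.floor(points[:, :2] / cell).astype(np.int64)
    dists = np.empty(len(points))
    for key in np.unique(keys, axis=0):
        in_cell = np.all(keys == key, axis=1)
        lo, hi = key * cell - pad, (key + 1) * cell + pad
        near = np.all((points[:, :2] >= lo) & (points[:, :2] < hi), axis=1)
        tree = cKDTree(points[near])                 # tree over one padded cell
        d, _ = tree.query(points[in_cell], k=k + 1)  # +1 skips the self-match
        dists[in_cell] = d[:, -1]
    return dists
```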
One group finds a way to automate a job, and then our whole society agrees that the people who previously did that job should be tossed out into the street. But for some reason we blame the first group rather than the second.
It's a funny euphemism, in a dark sort of way. But if there is a domain where AI is not getting humans out of a job anytime soon, I think it's this one. I've read dozens of papers about remeshing, but for all of the research, very few algorithms make it to production pipelines. And those that do, still crash and fail in spectacular ways, even after a decade or more of refining and bug-fixing.
"Our method points to a promising approach for the automatically generation of Artist-Created Meshes, which has the potential to significantly reduce labor costs in the 3D industry, thereby facilitating advancements in industries such as gaming, film, and the metaverse. However, the reduced cost of obtaining 3D artist-created meshes could also lead to potential criminal activities."
That last statement is worded in such a weird way, lol. Funny Chinese->English translation.
"The FBI has issued a warning for potential criminal activity resulting from the automatic generation of low-poly models. The public is advised to minimize outdoors exposure and report any suspicious activity."
MeshAnything generates meshes with hundreds of times fewer faces, significantly improving storage, rendering, and simulation efficiencies, while achieving precision comparable to previous methods.
When working with meshes, what you generally want is quads, not triangles. The reason is that quads form nice closed loops.
Furthermore, you would only allow quads to meet at 3, 4, or 5 edges per vertex. Four edges per vertex is the "normal" case that most of your mesh should have; it gives a regular grid of parabolic (Euclidean) geometry with neutral curvature. Patches of these then meet at vertices with 3 edges to make elliptic geometry with positive curvature, or 5 edges to make hyperbolic geometry with negative curvature.
You can ignore all of this and just randomly connect nearest neighbors to form triangles. But then you still have only geometry, no useful topology, so it's no better than a point cloud. A good topology is necessary for texturing, skinning, animation, etc.
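This is easy to audit programmatically; a minimal sketch, assuming faces are given as lists of vertex indices:

```python
from collections import Counter

def valence_histogram(faces):
    """Map valence -> vertex count for a mesh whose faces are lists of
    vertex indices. Clean quad topology is mostly valence 4, with a
    few 3- and 5-poles absorbing positive and negative curvature."""
    edges = set()
    for face in faces:
        for i in range(len(face)):
            a, b = face[i], face[(i + 1) % len(face)]
            edges.add((min(a, b), max(a, b)))   # undirected, deduplicated
    valence = Counter()
    for a, b in edges:
        valence[a] += 1
        valence[b] += 1
    return Counter(valence.values())
```

For a cube's six quads this returns {3: 8}: all eight corners are 3-poles, which is exactly the positive curvature concentrated there.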
Sure. I (mostly) knew all that. I was specifically asking why you said "they are full of n-gons": my understanding of the terminology seems to be different to yours, in that "n-gons" means "5 or more sides on a face", i.e. not a tri or a quad.
Mate, I really don't know how to help you, but even in the examples in the PDF there are clearly n-gons. In 5 of my 10 tests there were n-gons. There are always starfishes with 5 or more connected verts. If you want to nitpick the wording go ahead, but these meshes are shite.
I wasn't picking a fight or scoring points. This isn't Reddit and I'm a grown adult. I'm trying to understand what you're saying and maybe learn something in the process.
> In 5 of my 10 tests there were n-gons. There are always starfishes with 5 or more connected verts.
Ok. So you are basing your definition on the number of edges that meet at a vertex. My understanding was that the important metric was "number of edges on a given face".
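For what it's worth, the two senses measure different things, and with faces as index lists both are trivial to check in code; the face sense is the one-liner below, while the "starfish" vertex sense is exactly the valence != 4 buckets of the valence_histogram sketch upthread:

```python
def ngon_faces(faces):
    """The 'face' sense of n-gon: any face with more than 4 sides."""
    return [face for face in faces if len(face) > 4]
```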
The topology is decent, but no artist is creating meshes like this. The name feels mismatched. I've seen some better topology generation papers at SIGGRAPH last year which addressed quads better, though I'd need to dig through my archive to find them.
The triangle topologies in this paper don't follow the logical loops that an artist would work with. Generally it's rare for an artist to work directly in triangles rather than quads. But that aside, you'd place the loops in more logical places along the surface.
The face and toilet really stand out to me as examples of meshes that look really off.
Anyway, I think this is a good attempt at a reasonable topology generation, but the tag line is a miss.
Yep, hard to reason with industry people pushing slop on commercial production teams.
Low-poly re-mesh tools have been around for ages (some better than others), but there are good reasons pros still do this step manually all the time. Primarily, "good" is based on _where_ the quads, loops, and unavoidable n-gons end up in the model (or stuff ends up looking retro '90s).
There is also the complex legal side of algorithms not being able to create copyrightable works in some jurisdictions. Talk with your IP lawyer, this area gets messy fast when something famous or trademarked is involved.
That's fair, as someone pretty proficient in 3D modelling I understand your point. However, it also boils down to the scale of the project.
Imagine recreating part of a real-life city, creating a digital twin, for scientific purposes (testing human behaviour in fire hazards, or simply iterating on better park planning and road design for greater perceived safety). There's a lot to be done, and it's difficult to use procedural building methods if your aim is for people to recognize that area.
I'm making such a thing myself, purely academic, but god I wish I could speed things up.
Procedural generation of textures, biomes, and cities is not ML/AI generation... Also, physics simulation of erosion can make landscapes look natural to most people.
The problem is when groups start gleaning styles and artwork from 3rd parties to make something in the same style... they cross an ethical line, and a legal one in some situations (even if the original work is completely isolated from the output.)
Thus, while a stochastic parrot may be able to dodge outright plagiarism, it cannot sidestep copyright laws in some Markets.
I'd rather pay folks for royalty free content like Poly Haven offers to the community. =3
Oh, for sure. Wholeheartedly agree. We are little by little eroding the foundations of an economic system which allows individuals to get recognized and rewarded for their hard work.
I may not have worded things well; I was trying to speak of modern 3D reconstruction methods, such as NeRF or Neuralangelo. I can see good uses for them, as I need to fool the senses reliably (participants will be taken to a VR world... mimicking a real place). But as with many things in this field, the reality is that these methods aren't up to snuff. Still, it would be nice to be able to capture reality for non-commercial purposes.
As for Poly Haven, I haven't donated yet... but I hope to do so soon :)
I tried it on the provided sample "hat", with and without checking "Preprocess with marching cubes" and "Random Sample". Both outputs had holes in the mesh where the original did not.
Calling these meshes "Artist-Created Meshes" is disgusting. I know researchers in this field want the word "artist" to follow the same fate as "computer" thanks to their work, but it's too soon, to say the least. Can we get AI researchers instead? I bet RLHF could make their writing more humble than the current ones.