Show HN: I made a computer vision addon for Blender (github.com/cartucho)
268 points by morroida on Oct 3, 2020 | 35 comments



So many likes, I was not expecting that! I will be presenting this work tomorrow at MICCAI and then I will post my presentation link in the README of the repository!


Thank you for your work! Looking forward to the presentation.

I had to look up MICCAI. To others: the 23rd International Conference on Medical Image Computing & Computer Assisted Intervention (4-8 October 2020) https://www.miccai2020.org/en


done! added to the GitHub README

[presentation video](https://imperialcollegelondon.box.com/s/cg54pddsf2pkx4ngf4pg...)


Love it! Should maybe add a link in addition to the video "image". It was not intuitive whether it is an image from a video or a link to a video (I am not used to video "previews" on GitHub).


Thank you! Yeah, I found that "hack" of the "image-video" on StackOverflow. Will add the link too, as suggested.


Guys I won the best paper award!


Very, very cool work, I fully expect to use this exact project in the future!

To those of you who always thought this stuff looked neat but never tried it out, and to those who may have used Blender in the past and gave up, I would HIGHLY encourage you to try again with the latest version of the software. Although there is still a bit of a learning curve, there have been massive improvements to the software suite.

The ability to code audio/visuals in Blender is just incredible. I describe it like this: imagine yourself as someone trying to code an image that looks like a tree in machine code. Then imagine your partner comes over, sees what you are working on, and hands you Python and a fully set-up IDE; it's like being given literal magic.

I downloaded the latest version earlier this year to write some basic AI simulations (cube wars!) and to create models for my 3D printer. Like when I first learned to code, the fun of the machine totally sucked me in and I got completely off-task from my original goal. Lately I've been working on two things with the same lines of code: music videos and simulated walks through forests (think the movie Avatar). With only a few hundred lines of code I am able to generate infinite forest trails which you can walk (or fly a drone-style camera) through, synced to music that is generated by the AI mushrooms WITHIN the scene itself! I was literally able to go from 0 to highly visually engaging trippy music videos in the last year with minimal music production experience and no music-video production background. The ease with which you are able to generate things via code is stunning and the limits feel completely boundless.


Similarly, when my niece recently got married I was tasked with putting together the photo/video show for the event.

I chose Blender because it's easy to use and loaded with image and video capability.

Plus, it gave me a good reason to come up to speed on the latest version.


Very cool! I've done something similar for improving an OCR system on crinkled paper[0]. Blender is a powerful and totally underutilized tool for this kind of work.

0. https://www.arwmoffat.com/work/synthetic-training-data


I've thought about doing this myself! Did it end up improving the OCR system for real world images?


The startup ran out of money before we could find out :) It was sort of a skunkworks project.


Wow, this is awesome! And it's very nicely presented on the website. I'm wondering how you mapped from the UV to the 3D model. I would like to add that feature to the addon.


It's been a while since I've looked at the code, but take a look at the code around this https://github.com/amoffat/metabrite-receipt-tests/blob/mast... for mapping from UV space to image space

TL;DR: using a KD-tree, I find the face containing the UV coordinate. Then I transform the UV coordinate to barycentric coordinates within that containing face, and put that barycentric coordinate through the local -> world -> view -> perspective transform matrices.
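In Blender Python terms, the pipeline looks roughly like this (a sketch, not the actual repo code -- the KD-tree face lookup is skipped and `uv_to_image` is a made-up helper name):

```python
from mathutils import Vector
from mathutils.geometry import barycentric_transform
from bpy_extras.object_utils import world_to_camera_view

def uv_to_image(scene, obj, face, uv):
    """Map a 2D UV coordinate lying on the triangle `face` of `obj`
    to normalized image coordinates of the scene camera."""
    mesh = obj.data
    uv_layer = mesh.uv_layers.active.data

    # UV-space corners and local-space corners of the containing triangle
    uv_a, uv_b, uv_c = (uv_layer[i].uv.to_3d() for i in face.loop_indices)
    co_a, co_b, co_c = (mesh.vertices[v].co for v in face.vertices)

    # Barycentric step: rebuild the UV point on the 3D triangle (local space)
    local_pt = barycentric_transform(uv.to_3d(), uv_a, uv_b, uv_c,
                                     co_a, co_b, co_c)

    # local -> world -> camera view -> perspective (0..1 image coordinates)
    world_pt = obj.matrix_world @ local_pt
    ndc = world_to_camera_view(scene, scene.camera, world_pt)
    return ndc.x, ndc.y

# hypothetical usage:
#   x, y = uv_to_image(bpy.context.scene, obj, tri_face, Vector((0.3, 0.7)))
```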


A common approach in rendering engines to convert screen space coordinates to objects is to render a second image with light and shadow disabled, where the color uniquely maps to an id. You can then uniquely identify 24 bits worth of objects without needing to maintain a KD-tree.
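For concreteness, the packing is just bit-shifting a 24-bit object index into the three 8-bit color channels and reading it back (a generic sketch, not tied to any particular engine):

```python
def id_to_color(i):
    # pack a 24-bit id into three 8-bit channels, normalized to 0..1
    return ((i >> 16 & 0xFF) / 255.0,
            (i >> 8 & 0xFF) / 255.0,
            (i & 0xFF) / 255.0)

def color_to_id(r, g, b):
    # undo the packing from an 8-bit-per-channel pixel readback
    return (round(r * 255) << 16) | (round(g * 255) << 8) | round(b * 255)

assert color_to_id(*id_to_color(123456)) == 123456
```

Render every object with its flat id color (emission only, no lighting, no anti-aliasing), and a pixel lookup tells you which object sits under any screen coordinate.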


What the heck. This is beyond awesome, I totally want to try it out.


Does using synthetic training data introduce any problems? How do you ensure your synthetic data matches real data?


OK, this turned out to be far more interesting than the title here suggests. The little abstract at the top is far more informative:

> A Blender user-interface to generate synthetic ground truth data (benchmarks) for Computer Vision applications.

And it lets you make stereo images, depth maps, segmentation masks, surface normals, and optical flow data from the rendered animation, and export it all in .npz numpy format. Quite an interesting project.
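If you haven't worked with .npz before, consuming the exported data is plain numpy; the file name and key names below are made up for illustration, the repo defines the real ones:

```python
import numpy as np

data = np.load("frame_0001.npz")   # hypothetical file name
print(data.files)                  # lists the arrays stored in the archive
depth = data["depth"]              # e.g. HxW float depth map    (assumed key)
seg = data["segmentation"]         # e.g. HxW integer label mask (assumed key)
```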


This is awesome! I’ve been working with Blender scripts a lot lately for my side project where I generate jewelry for 3D printing (https://lulimjewelry.com)

It’s an incredibly powerful tool, IMO one of the best large open source applications. I’ve learned some good ideas by reading the plug-in here, thank you!


Really cool! I just sent you an email (I love your contact info on HN btw!)



The difference is that none of the above examples actually process real image input - the Cartucho app utilizes stereo cam images.


I don't think so. It generates stereo images from 3D (rather than taking them as input), so it is exactly related to blenderproc and other tools.


Yes... you are correct. My mistake.


Very cool! Just the other day I was trying to set Blender’s camera based on a standard 3x4 computer vision KRT matrix, and it is surprisingly a pain in the ass. I wish more of these graphics/CAD packages (Blender, Houdini, Maya) made it easier to deal with vision data.
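For what it's worth, the intrinsics half of that conversion (the K in K[R|t]) ends up looking roughly like the sketch below; `set_camera_from_K` is a made-up helper, it assumes square pixels and horizontal sensor fit, and the shift sign/normalization is the usual gotcha. The rotation/translation part goes into the camera object's matrix_world, with the extra flip between the vision convention (+z forward) and Blender's camera (-z forward).

```python
def set_camera_from_K(cam_data, K, width, height):
    """Apply pinhole intrinsics K (3x3, row-major nested lists) to a
    bpy.types.Camera data block, for a render of width x height pixels."""
    fx = K[0][0]
    cx, cy = K[0][2], K[1][2]

    cam_data.sensor_fit = 'HORIZONTAL'
    # focal length in mm from focal length in pixels
    cam_data.lens = fx * cam_data.sensor_width / width
    # principal point offset -> lens shift (fraction of the image width)
    cam_data.shift_x = (width / 2 - cx) / width
    cam_data.shift_y = (cy - height / 2) / width
```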


I agree, these tools should have an official computer vision module since so many people are using synthetic data these days.


Excellent! For a while my job entailed this very thing: creating synthetic data for computer vision, and I used Blender as well! You've done a great job.


Wouldn't a Show HN post make more sense here? I noticed that they tend to fare better (keep the interest active for longer)


One question: why does it need to run inside Blender instead of just using its graphical primitives as a library?


That's a good idea, it would make it faster for sure.


Is this face motion tracking using a camera to apply to 3D models?


"synthetic ground truth"... isn't that an oxymoron?


Ground truth in this case means images with labels of some sort on parts of the image or objects in the image, not that they're real images themselves. So a cat in a photo with a label or mask on the cat would be "ground truth" on what part of the image a cat is in, or that there is a cat.


But if it's a CG cat then it's not a real cat, hence not "ground truth".


Haha, just pointing out how the term is used in the industry. If you want to really play with semantics then I would argue that even a photo of a real cat isn’t a real cat and thus not ground truth. But while the consequences of that might have the awesome effect of clearing the shelters, it’s not practical, so we just call any labeled pixels representing a cat to a fidelity good enough for our purposes ground truth. ;)


Newb here. I can't say what this does but build a frame for AI comparison? Blender doesn't need so many eyes anyways? Buuuuut it doesn't have them either in this way?



