Hacker News

I have a few comments for you guys about our demo:

1. Image to Image is a bit misleading because our online demo is fairly strict about pose requirements, so many faces are flagged "non-suitable" because they aren't frontal.

2. It works much better on video (though that kind of online demo would be far too CPU-intensive), because we can organize each person into a continuous track, and a person only needs to turn 'frontal' once for us to do matching. The more often they are frontal, the more data points we have.

Here's an example of recognition applied to video: http://www.youtube.com/watch?v=jsjf3IDXef8
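The track-based matching described in point 2 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not PittPatt's implementation: the `Frame` record, the yaw-angle pose estimate, the 15-degree "frontal" threshold, the per-frame template vectors, and cosine similarity as the score are all hypothetical. Each track keeps templates only from its near-frontal frames, so a track becomes matchable as soon as one frame is frontal, and more frontal frames give more data points to compare.

```python
import math
from dataclasses import dataclass

# Hypothetical threshold: |yaw| at or below this counts as "frontal enough".
FRONTAL_YAW_DEG = 15.0


@dataclass
class Frame:
    yaw_deg: float         # hypothetical pose estimate, 0 = facing the camera
    template: list[float]  # hypothetical per-frame face template


def cosine(a, b):
    """Cosine similarity between two template vectors (assumed score)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def frontal_templates(track):
    """Keep templates only from frames that pass the frontal-pose test."""
    return [f.template for f in track if abs(f.yaw_deg) <= FRONTAL_YAW_DEG]


def best_track_score(track, gallery_template):
    """Best similarity over the track's frontal frames.

    Returns None if the person never turned frontal (track not matchable);
    a single frontal frame is enough to produce a score.
    """
    templates = frontal_templates(track)
    if not templates:
        return None
    return max(cosine(t, gallery_template) for t in templates)
```

For example, a track whose first frame is turned 40 degrees away still gets matched through its later frontal frames, while a track that never turns frontal returns `None`.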




Does this technology need both images at once? Or could lots of images be preprocessed individually into something that allows a cheaper comparison?

To explain the question better, here's an example. Say I have 10 images and want to find the most similar people across any pair of them. Do I need to run every pair as a full comparison (10 × 9 / 2 = 45 comparisons), or can I preprocess the 10 images into something that makes those 45 comparisons cheaper?


Recognition is generally broken down into two steps: processing faces into "templates", then comparing those templates. Generating templates includes all the preprocessing as well: detecting faces in an image, estimating their pose, and finding landmark points. Our site goes into these issues in some depth (with some examples). So yes, we do break the process down: templates can be generated individually for every image, which lets you store the result and reuse it for future comparisons.

Generating two templates is many times more expensive than comparing them. However, as your dataset grows, the number of templates you generate grows as N, while the number of comparisons you need grows as N^2 (N(N-1)/2 pairs). So eventually, comparisons dominate.
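The two-stage pipeline described above can be sketched as follows, under loud assumptions: `make_template` is a hypothetical placeholder (here it only normalizes a toy feature vector) standing in for the expensive detection, pose estimation, and landmarking work, and `compare` uses a dot product of unit vectors as an assumed similarity score. The thread doesn't describe PittPatt's actual template format or matcher. What the sketch shows is the cost structure: each image is templated once and stored, and only the cheap comparison runs over all pairs.

```python
import itertools
import math


def make_template(features):
    """Hypothetical stand-in for the expensive step (detection, pose,
    landmarks, feature extraction). Here it only L2-normalizes a toy
    feature vector so the example runs."""
    norm = math.sqrt(sum(v * v for v in features))
    return [v / norm for v in features]


def compare(t1, t2):
    """Cheap step: similarity of two unit-length templates (dot product)."""
    return sum(a * b for a, b in zip(t1, t2))


def most_similar_pair(images):
    """Return the index pair (i, j) whose templates are most similar."""
    # N template generations: each image is processed exactly once.
    templates = [make_template(img) for img in images]
    # N*(N-1)/2 cheap comparisons over the stored templates.
    pairs = itertools.combinations(range(len(templates)), 2)
    return max(pairs, key=lambda ij: compare(templates[ij[0]], templates[ij[1]]))


def work_counts(n):
    """(templates generated, comparisons needed) for n images."""
    return n, n * (n - 1) // 2
```

For the 10-image example, `work_counts(10)` gives 10 template generations and 45 comparisons; at 1,000 images it is 1,000 versus 499,500, which is why the comparison step eventually dominates even though each comparison is cheap.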


It worked quite well for my face but not my wife's (http://webdemo.pittpatt.com/recognition_demo/view.php?id=X3Y...), because her head is slightly tilted and not facing the camera in these pictures. I'm sure your video recognition is better, but I think you should look into relaxing the requirements for photos a bit, because very few photographs have full-frontal, upright faces. And once you get it right, talk to Facebook, Myspace, etc. :)


Impressive. What's the implementation language? I assume C.


It's a hodgepodge of low-level stuff. Some C. My favorite is the SSE-optimized assembly :)


Why do you assume that? It could just as well be MATLAB or something similar that makes fiddling with multimedia very easy.


How would you create a stand-alone executable file using MATLAB?


Performance


MATLAB performance can be quite good :-)


OK, I'm not familiar with MATLAB performance. :-) But image processing is very computationally expensive, and the demo here is unbelievably fast. It must be optimized for its specific tasks. You just can't get that from a general package.

In retrospect, it's not surprising that they optimize some parts in assembly, even using SSE (Streaming SIMD Extensions, where SIMD stands for Single Instruction, Multiple Data; http://en.wikipedia.org/wiki/Streaming_SIMD_Extensions). I hadn't heard of it, but it's the kind of vector parallelism that gives supercomputers their speed.
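To make the SIMD idea concrete: an SSE register holds four 32-bit floats, so a single packed instruction multiplies or adds four values at once. The sketch below is a pure-Python emulation of how a dot product is laid out for a 4-wide SIMD loop: four running partial sums, a final horizontal add, and a scalar tail for leftover elements. It does not itself execute SIMD instructions, and whether PittPatt's SSE code targets dot products at all is an assumption (the thread doesn't say which parts are SSE-optimized). In C, the same pattern would typically be written with intrinsics such as `_mm_mul_ps` and `_mm_add_ps`.

```python
LANES = 4  # an SSE (XMM) register holds four 32-bit floats


def dot_4lane(a, b):
    """Dot product organized the way a 4-wide SIMD loop would be.

    Pure-Python emulation: each outer-loop iteration stands in for one
    packed multiply plus one packed add across all four lanes.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    acc = [0.0] * LANES              # one running partial sum per lane
    n_vec = len(a) - len(a) % LANES  # elements covered by full 4-wide chunks
    for i in range(0, n_vec, LANES):
        for lane in range(LANES):    # in hardware: a single instruction
            acc[lane] += a[i + lane] * b[i + lane]
    total = sum(acc)                 # horizontal add of the four lanes
    for i in range(n_vec, len(a)):   # scalar tail for the leftover 0-3 elements
        total += a[i] * b[i]
    return total
```

One design consequence worth knowing: splitting the sum into four independent accumulators changes the order of floating-point additions, so results can differ from a plain sequential loop in the last bits. SIMD code generally accepts this in exchange for the speedup.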



