This Is a Photoshop and It Blew My Mind - Photosketch (gizmodo.com)
321 points by AjJi on Oct 6, 2009 | 67 comments



Sigh... third time today, and I only got one karma point for posting it first. http://news.ycombinator.com/item?id=862216 if you feel sympathetic :)

EDIT: gee guys, the smiley should tell you this ^ is not a serious complaint. I thought people might actually be interested in reading the actual paper v, since the project page link has been inaccessible. Sheesh.

The site is down, but this is the SIGGRAPH paper: http://www.ece.nus.edu.sg/stfpage/eletp/Papers/sigasia09_pho...

I have downloaded the binaries (they also require OpenCV 1.1; the recently updated OpenCV 2.0 doesn't work) and have made some progress, though it's very clunky and the instructions are, um... lacking. http://sourceforge.net/projects/opencvlibrary/files/ http://opencv.willowgarage.com/wiki/


In today's hypercompetitive world, branding is key. Compare:

"Photorealistic image composition from simple sketches"

             vs.
"This Is a Photoshop and It Blew My Mind - Photosketch"

Which one would you rather click on?


The former, obviously. This is why I can't have nice things :-D


And someone said HN doesn't accept sensational headlines.

PS: That was back in the days when I started here. Fair enough, it has been a while.


The one that isn't a PDF.


This guy got downvoted for giving the most informative comment in this thread?


Well, I think he got his karma back now. But yeah, the way some people downvote gives me a funny feeling.


I'm tired of meta-comments. I'd rather discuss the link. I downvote all meta-comments.


Don't you now have to downvote yourself?

It’s ironic :)


You can downvote our subsequent comments, but his original was about the paper behind the article :-)


> the smiley should tell you this ^ is not a serious complaint.

Then why put the link at all?


because it's useful?


The obligatory shark attacking a helicopter is a nice touch. This technology has epic potential in the LOLcat market.


I'm having a very hard time believing that this is real. If it is, they have just made breakthroughs in multiple domains in computer graphics, recognition, and composition at the same time. Here's hoping it's real.


My guess is that it's real, but that it doesn't usually work as well as the example images provided. If it really works for any kind of input sketch, then how come there are so many images of dogs catching frisbees and bears catching fish?


Yup. When I see a real-time demonstration showing a bear catching a frisbee (probably, but not definitely, while jumping out of a shark-threatened helicopter), then I'll believe it.

Edit: I'll also believe it if anyone on this site claims to have seen same.


The algorithm is optimized for dogs/Frisbees and bears/fish.

Reminds me of a company I worked for that was developing video compression; the only video source the developers had was a security camera.

The compression was optimized for bookshelves and hand waving.


The Gizmodo story says that they presented it at SIGGRAPH Asia 2009. According to http://www.siggraph.org/asia2009/ that doesn't take place until December.


Papers for SIGGRAPH & SIGGRAPH Asia are accepted a few months before the conference.

These are the best two conferences in computer graphics, and the bar to get in is extremely high -- reviewers are very, very tough, and submitted papers have to be not only technically accurate but also very polished in writing and presentation (and video).

That being said, a lot of these kinds of systems have been coming out in the past few years, and it's a little hard to judge how successful they are due to the enormous amount of work required to reproduce results (their releasing a binary is commendable).

My own personal feeling is that this system probably works quite well for "common" things in their database, but there might be many small artifacts in generated images. Also, if you start trying to include stuff that's not well represented in the database, then the artifacts probably become quite severe.


Well, it is listed here: http://www.siggraph.org/asia2009/for_attendees/technical_pap... "Friday, 18 December | 9:00 AM - 10:45 AM | Room 301/302" Not sure what to think. Maybe they're writing in past tense in the same way sports sites write "such and such team played on Tuesday" even though it was Tuesday at the time of writing.


The technical terms and researchers' names pass the Google sanity check, e.g. http://cg.cs.tsinghua.edu.cn/prof_hu.htm, http://en.wikipedia.org/wiki/Kadir_Brady_saliency_detector.


Surprisingly, not really. Computer vision has been moving along lately. This is good stuff, but incremental progress against the backdrop of other good stuff in the field.

A few times in the past few months, the question has come up of whether "all the good stuff" has already been discovered and there's no progress left to be made, and computer vision has been one of my go-to examples of a field that has just been booming lately. Interestingly, digging in shows that it's still not "AI", just as you don't see any "AI" here, but there's still been a qualitative sea change in the past few years. I think the fact that a $1000 machine is now unbelievably powerful has been a real boon for the field.


I have seen this argument made frequently here recently: that the "AI" we have produced is not "real" AI. This is understandable, and true in a way, since after all, AIs cannot do many of the things that natural intelligences can do. And so there is this perceived disconnect between the weak AI that we have (rules, heuristics, belief networks, Bayesian inference and friends) and the strong AI that we want (magic?).

But there is a similar disconnect that looms between the rudimentary intelligence of simple organisms and the more sophisticated intelligence of higher mammals. Many researchers have noted this and taken from it the insight that with a simple set of rules you can get complex behavior. This has led to something of a revolution in ML, robotics, and neuromorphic engineering. If this analogy holds, then it may be that the distance between weak and strong AI will be bridged in the same way that the distance between single-celled organisms and primates appears to be bridged: by iteratively building complex systems on top of simpler building blocks. There simply won't be a secret sauce to be found; rather, each layer of complexity will allow for increasingly sophisticated behavior.


When I took an AI class back in university, the professor noted that the term AI is usually only applied to the problems we can't solve. As soon as we determine an algorithm for something, we no longer call it AI.

So technically, it was a Data Mining class I guess.


This is one of those SIGGRAPH papers that are like magic tricks -- unbelievable until you find out how they're done and what their limitations are.

There are existing projects that pair an image composition function with a tagged image database (like this one: http://graphics.cs.cmu.edu/projects/photoclipart/). This project sounds like it adds a fitness function so that it can find an output image that looks good.
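For the curious, the fitness-function idea might look something like this toy Python sketch. Every scoring term and field name here is my own guess, not the paper's actual energy function:

  # Toy sketch of fitness-driven montage: for each sketched region, score
  # candidate cut-outs from a tagged image database, then pick the
  # combination that matches the sketch AND composites well together.
  from itertools import product

  def shape_score(candidate, region):
      """How well the candidate's silhouette matches the sketched outline (0..1)."""
      overlap = len(candidate["outline"] & region["outline"])  # sets of pixels
      return overlap / max(len(region["outline"]), 1)

  def blend_score(a, b):
      """How seamlessly two cut-outs composite, e.g. similar lighting (0..1)."""
      return 1.0 - abs(a["brightness"] - b["brightness"])

  def fitness(combo, sketch):
      match = sum(shape_score(c, r) for c, r in zip(combo, sketch["regions"]))
      agree = sum(blend_score(a, b) for a in combo for b in combo if a is not b)
      return match + 0.5 * agree  # weighting is arbitrary

  def best_montage(sketch, database):
      # database maps a tag ("bear", "fish") to a list of candidate cut-outs
      candidates = [database[r["tag"]] for r in sketch["regions"]]
      return max(product(*candidates), key=lambda combo: fitness(combo, sketch))

The hard part, presumably, is that the candidate space explodes combinatorially, so they'd need a real search strategy rather than the brute-force product above.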


Most of the parts have been done (a couple things look new, but just read through the list of SIGGRAPH papers for the last few years; the field is sprinting forward, as computers become fast enough to do this sort of computation), but putting it all together is pretty damn impressive.

Now, as hard as what's been done on the algorithms side is, someone still has to figure out a usable user interface for these tools, so that regular people can play with them.


I agree... But they have the source available if you can get to the site. Otherwise, here's the Google cache link:

http://74.125.95.132/search?q=cache:http://cg.cs.tsinghua.ed...


The download doesn't include source.

It does include 5 executables totaling ~140k. I am not quite brave enough to run them, but I did run them through strings, and a cursory inspection suggests that either they're attempting to do what they claim to do or they're the best-disguised malware in history.
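For anyone who wants to repeat the inspection without a Unix toolchain, a throwaway Python equivalent of strings(1) is enough (my own quick hack, nothing from their download):

  import re
  import sys

  def strings(path, min_len=4):
      """Print runs of printable ASCII at least min_len bytes long."""
      with open(path, "rb") as f:
          data = f.read()
      for match in re.finditer(rb"[\x20-\x7e]{%d,}" % min_len, data):
          print(match.group().decode("ascii"))

  if __name__ == "__main__":
      strings(sys.argv[1])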


So, gentlemen, our plan is simple: we study computer graphics for 10 years, get PhDs, develop a system that gets headlined at SIGGRAPH, become famous, and then we release a binary that HN readers will download.

World domination is in our sights <evil laugh>


Oh the software's legit - I ran them all and nothing bad happened :) It's just not very stable or easy to get going with, is all.


"Real" is relative. It might be "real" for a set of a hundred images but it might seriously bog-down or spit-out garbage if you tried to implement it for ten thousand, never mind ten million.

Image processing is full of things that sort of work: impressive, but not fully reliable.


We've been seeing some pretty fascinating developments in graphics and pattern recognition recently. A few days ago someone submitted that amazing Photoshop plugin that allows you to select a region and move it around seamlessly, and also fill in areas seamlessly. The demo showed the software filling in the broken areas of the Pantheon and the results were quite impressive. A combination of both technologies would be simply incredible.

Edit

The official site of the plugin is definitely down. It's probably overloaded by all the people who are interested in checking it out. ;)


Just FYI, that stuff to which you refer is a teaser for CS5, the next iteration of Photoshop...due around April, most likely.


The SIGGRAPH presentation video for this is here: http://www.youtube.com/watch?v=dgKjs8ZjQNg


SIGGRAPH is home to some crazy presentations. I'm always surprised by the more impressive videos and papers, but on the other hand, never that surprised, just because it's always only one step crazier than last year's crazy thing.


The automated object recognition and extraction is what impresses me. Everything else is a killer usage example of that core technology.


Ariel Shamir (4th author listed in the paper) also worked on Seam Carving (http://www.seamcarving.com/) and Improved Seam Carving (http://www.shaiavidan.org/papers/vidretLowRes.pdf), two widely publicized papers from the past two years.


Hey, this is the technology from Wag the Dog.

De Niro's character is a Hollywood producer, hired to produce a fake war in Albania to distract from a sex scandal before an election. At one point, he's directing news footage by shouting something like "there's a girl in front of a village, it's on fire... hmm, no, more smoke... her hair is too light. Can she have a cat? Show me cats" while having a technician type in the request and watching the result appear in real time.


Ha, true. FWIW, though, the character you're describing is played by Dustin Hoffman. Awesome movie, though.


Yeah, you're right. I think the quotes page on IMDB has the names backwards, or I'm really in need of re-watching that film.


I work in the VFX industry, and can confirm this is how we have been doing things for years.


A link to the original research paper that works: http://cg.cs.tsinghua.edu.cn/montage/main.htm


This is going to make the copyright brigade go into apoplexy if it starts to become popular!


The probability of mis-matched images seems like it would be incredibly high, but if they manage to pull this off, it would be amazing.

However, I wonder about this technology being used for evil (custom porn). =/


Wait, custom porn is evil? I guess I'm the only one who's tired of all this cookie-cutter porn.


So glad the downvote maxes at -4

  |-----------|
  |           |Bear
  |           |         |-------------|
  |           |         |-------------|Fish
  |-----------|                              |--------|
                                      Yo Mom|--------|


Box for 'Yo Mom' needs to be bigger! ;)


  probability of mis-matched images seems like it would be incredibly high
It would still be quite useful if the sketch narrowed the options to a list. Traditional point-n-click could refine/revise from there.

This innovation seems to be a shortcut mechanism based on pictographic gestures, where they also take position and relative size into account. This is very nifty.

I cannot see how this would be faster than traditional Illustrator/InDesign workflows of selecting elements from dropdowns, dragging them to the correct location, and dragging the handles. If you use their sketching method, you are probably going to have to fall back to 2D drag-and-size methods post-generation anyway.
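To make the position/relative-size matching concrete, it could be as simple as this toy Python sketch - the box format and weights are my own invention, not the paper's:

  from math import hypot

  def box_similarity(sketch_box, db_box):
      """Score a database item's bounding box against a sketched box.
      Boxes are (cx, cy, w, h), normalised to the canvas."""
      pos = hypot(sketch_box[0] - db_box[0], sketch_box[1] - db_box[1])
      size = abs(sketch_box[2] - db_box[2]) + abs(sketch_box[3] - db_box[3])
      return 1.0 / (1.0 + pos + size)

  def shortlist(sketch_box, tag, database, k=10):
      """Top-k candidates for one sketched element; a point-n-click UI
      could then let the user refine from this list."""
      return sorted(database[tag],
                    key=lambda c: -box_similarity(sketch_box, c["box"]))[:k]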


If this could be extended to movies, the possibilities would be awesome. Just write a script for a movie, sketch out the different scenes and voila, a ready-made rough of your new romantic comedy!

The program could be extended further to recycle old movies and to replace the actors' heads with new ones.

I have to start writing a business plan for this right away. Please send me a message if you want to join in on this.


In all seriousness, that's part of why I find it so interesting. The software works (I use the term loosely!) on downloaded images, i.e. it doesn't yet include a web client that does an image search for you.

So for movie purposes, you could save a lot of time using a (more polished) version of this to prepare your storyboards - shoot pictures of your selected or desirable actors, background plates that look like your desired locations, and major props. Draw stupid sketches and voila, you have rough photoboards. In conjunction with some other imaging technologies, it has massive possibilities. There's a saying that a film (especially a low budget one) lives or dies in pre-production; the more decisions you can make before you begin shooting, the less expensive the production process is and the more predictable and cheaper your post-production will be. Sure, quite a lot of Great Art happens on the spur of the moment, but serendipity is rare while dithering is common, and expensive. Adobe, for one, is pushing strongly to bring the use of their tools forward in the production process, so that the film is sketched out before shooting takes place and directors can spend more of their time 'filling in the blanks'.

What you mention (going from this to an actual movie) is obviously not practical now, but I'm happy to say that it's being reduced to an engineering problem - the fundamental technology to do most of what you describe already exists, and it's a matter of making it usable and timely. I will venture a guess that we'll be able to do this in clunky/very expensive form by 2015, and by 2020 it will be possible to make an entire feature film this way that looks about as good as a mid-90s low-budget sci-fi film - say, Escape from LA - at home.

tl;dr although this has been presented as an entertaining toy, with thematically-organized material there's a lot of near-term commercial potential.


  ------------------------------------
  | |+Obi-Wan|                       |
  |  --------                        |
  |     -------------                |
  |    |+Qui-Gon Jinn|               |
  |     -------------                |
  |  -----------------------------   |
  | |s/Jar-Jar Binks/Halle Berry/g|  |
  |  -----------------------------   |
  ------------------------------------
Yeah. I'm so in.


Did anyone get a chance to try it before their site went down, so we can hear some first-hand accounts?


Alright, I may have a tendency to be overly excited about new technologies, but this is seriously amazing. The possibilities (and problems) that arise from this application are vast. This could completely revolutionize the stock photography industry and web design in general.

Copyright and privacy issues are my greatest concern, though. For example, could you imagine seeing yourself as a digital model on some corporate website, doing something you never actually did? That is a scary thought, in my opinion. I wonder if you could input your own pictures into the system and have it perform the same procedure, though (I loathe the pen tool).


Sounds like hyperbole, but this technology has the potential to change the way people use language and speak and think.

Ever notice how many Americans use "like" as a preamble to a reenactment of a scene from a TV show, or an event, or even an abstracted, generalized occurrence? "It was like..." then on to the enactment. People wouldn't speak like this if it weren't for ubiquitous video entertainment.

In David Brin's Uplift trilogy, uplifted sentient dolphins sometimes "spoke" to each other by mimicking echolocation returns and beaming pictures and scenes directly into each other's heads. If software like this gets good enough to compose scenes for us on the fly, it will drastically alter the way we speak, just as television did.


People wouldn't speak like this if it weren't for California girls in the 80's.


That's not just a California thing, actually. Growing up in 1970s Ireland, the word was sprinkled liberally throughout our sentences, like. You may like this exploration of metaphor as a fundamental part of consciousness (if so, do follow up and read the mentioned book): http://www.therebel.org/opinion/health/the_thing_to_be_descr...


Wow, this is revolutionary. Anything that can use sophisticated technology to actually make things simpler for the end-user, instead of the other way around, is definitely a plus. I want to try this thing out right now.


If this is for real, then why can't the same thing be done for music? Drum a beat, whistle a tune, sing a few notes - and let the software extract best-matching samples from a library of millions of songs.
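The building blocks even seem simple - a crude version is just pitch-contour matching, the same trick Parsons coding uses for query-by-humming. A toy Python sketch (entirely hypothetical, not any existing product):

  def contour(pitches):
      """Reduce a note sequence to its up/down/repeat contour (Parsons code)."""
      return ["U" if b > a else "D" if b < a else "R"
              for a, b in zip(pitches, pitches[1:])]

  def contour_distance(a, b):
      """Mismatch count between two contours, penalising length difference."""
      return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))

  def best_matches(hummed, library, k=5):
      """library maps a song name to its melody's pitch list."""
      query = contour(hummed)
      ranked = sorted(library.items(),
                      key=lambda kv: contour_distance(query, contour(kv[1])))
      return [name for name, _ in ranked[:k]]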


I don't get how this is going to replace Photoshop.

The images will not look as good or as clean as a regular photoshopped image; I doubt that automated image editing is that advanced right now.

The examples they gave are probably the best that were available; I bet that on average the results don't look nearly as good or clean.

But hey, I could be wrong.


Good enough for display on TV news or in a newspaper?

Get me a picture of the president, put a girl in a red dress in his eye-line. Get me a picture of the evil dictator - add some WMD in the background.

At the moment you need an intern to do this; it'll be much better when you can do it yourself.


photoshop + 24hours of work = awesome image

photoshop + 1h of work = worthless image

this snake oil thing + 1h of work = so-so usable image


Gotta love the random deer in picture 5.


I really want to try this, but the actual site seems to be down now.


Boring. I have been able to do this in Emacs for over 10 years now.


That picture with the weeding, the sunset, the sailboat and the birds is so unnatural that it makes my soul twist.


ah, nothing better than a bit of weeding at sunset


Images aren't from Flickr; not enough cats.



