ImageMagick, four point perspective distortion in a video (silveiraneto.net)
145 points by silveira on Dec 7, 2014 | 60 comments



Well... you can do that with only ffmpeg.

ffmpeg -i zelda_720p.mp4 -vf perspective=60:90:589:147:50:415:582:418 output.mp4

Edit: BTW on your picture it's not (50,145) but (50,415) for the bottom left corner.

Edit: if quvi support still works, you can ffplay "https://www.youtube.com/watch?v=rMXD5DxbXog" -vf perspective=60:90:589:147:50:415:582:418 (otherwise something similar should be doable with youtube-dl)

Edit: you can also do better and enable the perspective only in the part where it matters. Typically with something like -vf "perspective=60:90:589:147:50:415:582:418:enable='not(between(t,0,5))'"


Wow, impressive, I think I know more about ffmpeg now. And I fixed the (50,415), thanks.


ffmpeg is the ImageMagick of video.

(GraphicsMagick is the ImageMagick of pictures, for me. IM has a few more features, but GM is more stable and usually much faster.)


GraphicsMagick also has fewer dependencies, I imagine because it does fewer things, but none of them is something I've ever needed. E.g. within MacPorts:

    $ port deps imagemagick
    Full Name: ImageMagick @6.8.9-8_0
    Extract Dependencies: xz
    Build Dependencies: autoconf, automake, libtool
    Library Dependencies: bzip2, djvulibre, xz, jbigkit, jpeg, lcms2, libpng, tiff, webp, zlib, fftw-3, freetype, fontconfig, ghostscript, libiconv, libtool, openjpeg, openexr, expat, libxml2, pkgconfig
    Runtime Dependencies: urw-fonts

    $ port deps graphicsmagick
    Full Name: GraphicsMagick @1.3.20_0+q8
    Extract Dependencies: xz
    Library Dependencies: libxml2, bzip2, xz, zlib, libpng, tiff, freetype, libiconv, libtool, lcms2, jasper, jpeg


Completely unrelated, but what do you think about vips (http://www.vips.ecs.soton.ac.uk/)? I recently discovered the difference in performance between im and gm, and that led me to investigate vips, which is presumably (a lot) faster than both.


In principle, it looks promising (in particular, using ORC to compile image manipulation kernels to SIMD), but it is a different thing from im or gm. It's a library and a GUI. The library is slightly too low-level for most things I'd use gm for, and the GUI is too laborious.

Perhaps it could be integrated into GM or something similar, and used as a back end for certain operations.


Yeah, I was going to say something similar, but with a simple OpenGL program...


If you're dumb like me and don't realize what's going on at first: the source clip was of some guys talking about a game being shown on a video monitor at an angle. But he just wanted to see the game, not the guys. So he extracted the video and made it fullscreen. He did it by writing a script to turn every frame into a PNG, run the frames through an image-processing tool, and then recompress them into a new video. Thankfully the monitor did not move much so some fixed distortion parameters worked.


> Thankfully the monitor did not move much so some fixed distortion parameters worked.

Even if the monitor moved alot (http://hyperboleandahalf.blogspot.com/2010/04/alot-is-better...) it'd be fairly easy to write an algorithm to detect the four corners of the video. You basically want to throw some edge detection at it, and then look for anything that seems like corners.
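
Something along these lines would probably do it (a rough sketch in Python with OpenCV; the filename and thresholds are made up, and it assumes the screen is the largest clean quadrilateral in the frame):

    # Rough sketch: find a quadrilateral (the TV screen) in one extracted frame.
    import cv2

    img = cv2.imread("frame.png")  # hypothetical extracted frame
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)

    # findContours returns different tuples across OpenCV versions; [-2] is always the contours.
    contours = cv2.findContours(edges, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
    contours = sorted(contours, key=cv2.contourArea, reverse=True)

    screen = None
    for c in contours:
        # A screen-shaped contour should collapse to four corners.
        peri = cv2.arcLength(c, True)
        approx = cv2.approxPolyDP(c, 0.02 * peri, True)
        if len(approx) == 4:
            screen = approx.reshape(4, 2)
            break

    print(screen)  # the four corner coordinates, if found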


> You basically want to throw some edge detection at it, and then look for anything that seems like corners.

Which, if you're a web dev like I am, seems scary at first, especially when you're (like me) lacking a CS education. However, as it turns out, understanding edge detection and Haar cascades (for feature detection, that was the problem I was solving) enough to be dangerous with OpenCV is surprisingly easy! I recently built some facial feature detection stuff in it that is in production right now, and it only took me a couple weeks :) So, have a play!
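
The hello-world version really is just a few lines. This isn't my production code, only the general shape of it (the cascade XML ships with OpenCV; the file paths here are placeholders):

    import cv2

    # Load one of the pre-trained cascades that come with OpenCV.
    face_cascade = cv2.CascadeClassifier("haarcascade_frontalface_default.xml")

    img = cv2.imread("photo.jpg")
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Detection runs on grayscale; tune scaleFactor/minNeighbors for your images.
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

    cv2.imwrite("faces.jpg", img)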



Right, but then you wouldn't use imagemagick for the task anymore...


This is a built-in filter with Cinelerra, with no intermediate files required.


Some guys talking about a game: Shigeru Miyamoto and Eiji Aonuma talking about the new Zelda game.


With some fiddling with matrices, this could also be done live in the browser as a CSS 3D transform using matrix3d, or in a WebGL context on the iframe or video element.

I have no time to do the rectifying math on the components today, but basically it would be some kind of inverse of this projective transformation:

http://franklinta.com/2014/09/08/computing-css-matrix3d-tran...

A quick non-inverse transformation adapted from http://math.stackexchange.com/a/339033/70086:

http://jsfiddle.net/w4bkmeaq/


The math for working out the transform is actually the same (not an inverse!) since you are still just trying to map 4 points to 4 points.

I had a demo for the 'inverse' transform also: http://codepen.io/fta/pen/LHonf

Here is the same thing except using the video from this post: http://codepen.io/fta/pen/JoGybG
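
For anyone who wants to play with this, one way to get the matrix3d values (not necessarily how the codepen does it): compute the 3x3 homography, e.g. with OpenCV's getPerspectiveTransform, then pad it into the 4x4 column-major matrix that matrix3d() expects. A quick Python sketch with made-up corner coordinates:

    import cv2
    import numpy as np

    src = np.float32([[0, 0], [640, 0], [0, 360], [640, 360]])       # element corners
    dst = np.float32([[60, 90], [589, 147], [50, 415], [582, 418]])  # where they should land

    h = cv2.getPerspectiveTransform(src, dst)

    # Embed the 2D projective transform in a 4x4 matrix; the z row/column stay identity.
    m = np.array([
        [h[0, 0], h[0, 1], 0, h[0, 2]],
        [h[1, 0], h[1, 1], 0, h[1, 2]],
        [0,       0,       1, 0      ],
        [h[2, 0], h[2, 1], 0, h[2, 2]],
    ])

    values = ", ".join(str(v) for v in m.flatten(order="F"))  # column-major
    print("transform: matrix3d(%s);" % values)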


If you want to do this much faster than realtime, calculate the transform once and apply it, e.g. using OpenCV's remap [1].

1: http://docs.opencv.org/doc/tutorials/imgproc/imgtrans/remap/...
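
Roughly like this, for the stationary-screen case (my own sketch; the output size, filenames, and the corner coordinates borrowed from the numbers upthread are just for illustration): build the lookup tables once, then remap() every frame.

    import cv2
    import numpy as np

    W, H = 1280, 720  # assumed output size

    # Screen corners in the source frame (TL, TR, BL, BR) and where they should land.
    quad = np.float32([[60, 90], [589, 147], [50, 415], [582, 418]])
    rect = np.float32([[0, 0], [W, 0], [0, H], [W, H]])

    # remap() wants output -> source coordinates, hence rect -> quad.
    M = cv2.getPerspectiveTransform(rect, quad)

    # Bake the transform into two float32 lookup tables, computed once.
    xs, ys = np.meshgrid(np.arange(W, dtype=np.float32),
                         np.arange(H, dtype=np.float32))
    grid = np.stack([xs.ravel(), ys.ravel()], axis=1).reshape(-1, 1, 2)
    src_pts = cv2.perspectiveTransform(grid, M).reshape(H, W, 2)
    map_x = np.ascontiguousarray(src_pts[..., 0])
    map_y = np.ascontiguousarray(src_pts[..., 1])

    cap = cv2.VideoCapture("zelda_720p.mp4")
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        corrected = cv2.remap(frame, map_x, map_y, cv2.INTER_LINEAR)
        # ... hand `corrected` off to a VideoWriter or encoder here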


I have played a little bit with OpenCV in the past (http://silveiraneto.net/tag/opencv/) but I wouldn't know how to do it with OpenCV, while with ImageMagick it seems simple. I'd love to see an example with OpenCV as I could use it one day in real time with the camera input.


Here is how it would work in OpenCV:

https://gist.github.com/stschake/445aea35a3c9846573ad

I'm getting 50fps with the imshow, 100 without on an ooold Q6600. That said, remap is basically memory-bandwidth limited.


I always assumed OpenCV was about computer vision, and I'm (pleasantly!) surprised it's this extensive. Does this mean it's good as a general-purpose image processing library? Not a fan of Imagemagick and have been looking for something better and faster.


With OpenCV, it doesn't get any simpler than this:

http://www.pyimagesearch.com/2014/05/05/building-pokedex-pyt...

It's even written in Python.


If the camera and TV are stationary, sure. The most interesting part is being able to calculate the transformation and then apply it faster than realtime, especially for AR/VR applications, since you can move around and it would still work.


What's the point of the PNG-to-JPEG step? Can't FFmpeg use PNG frames to make the final video?


I think it could. I'm encoding one to compare. I converted to JPEG because I was originally going to use an encoder that required JPEG input.


Aw, when I clicked this I expected it to be something much cooler: taking a video of one of the old, 2D-tile-based Zelda games, and then doing recognition on the video against the game's tile "alphabet" in order to correct for both perspective and noise. (Basically, a crude implementation of a live-video to machinima converter.)
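
Though to be fair, the naive first step of that isn't much code either: once a frame is perspective-corrected, matching one known tile against it is plain template matching. A toy sketch (the filenames and 16x16 tile size are assumptions):

    import cv2

    frame = cv2.imread("corrected_frame.png", cv2.IMREAD_GRAYSCALE)
    tile = cv2.imread("tile_grass.png", cv2.IMREAD_GRAYSCALE)  # one tile from the "alphabet"

    # Normalized cross-correlation is reasonably tolerant of compression noise.
    result = cv2.matchTemplate(frame, tile, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(result)
    print("best match at", max_loc, "score", max_val)

Repeat per tile (ideally snapping to the game's tile grid first) and you have the crude converter.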


If you're interested in ImageMagick take a look at Fred's ImageMagick Scripts (link below). There is some really interesting stuff there. I spent a good 2 hours on Friday evening just trying to digest some of it.

http://www.fmwconcepts.com/imagemagick/


With someone else mentioning Avisynth here in the comments (which is actually still used widely by video enthusiasts today), I became curious if there was a plugin for this sort of thing, and sure enough, I found Reformer[1]. With that, I figured I'd try reproducing OP's results. (By the way, while Avisynth is Windows software, as far as I know it works quite well under Wine).

Step 1: Download the video with youtube-dl. This gives us zelda.mp4 (original video) and zelda.m4a (extracted audio, requires ffmpeg)

    youtube-dl -x -k -o "zelda.%(ext)s" https://www.youtube.com/watch?v=rMXD5DxbXog
Step 2: Write the Avisynth script (in AvsPmod). Plugins used: LSMASHSource, Reformer, RemapFrames

    lwlibavvideosource("zelda.mp4")
    deskewed = q2r(last,"lanczos",60,90,51,412,588,148,581,417)
    normal = "[0 149] [1488 2063]" # frame ranges where to use original video
    ReplaceFramesSimple(deskewed,last,mappings=normal)

Step 3: Pipe it to x264 with avs2yuv (necessary under Wine and with 32-bit Avisynth & 64-bit x264) to encode the video.

    avs2yuv zelda.avs -o - | x264 --preset slower --crf 16 -o zelda.mkv - --demuxer y4m
Step 4: Merge encoded video and original audio back together with mkvmerge from mkvtoolnix. You could use ffmpeg here as well, but I find mkvmerge much nicer for simple muxing like this.

    mkvmerge -o zelda_muxed.mkv zelda.mkv zelda.m4a
And with that, we're done. The whole process took about 20 minutes (of which ~13 min was spent encoding the video) and a few hundred megabytes of space (since there's no need to have all the frames as individual image files several times over). Other benefits include having the video run at the original framerate of 29.970 (OP's video runs at 25.000 since he forgot to set the framerate when encoding the processed images), including the audio, and not having a distorted picture when the TV isn't visible (which was simple enough to do with the ReplaceFramesSimple function from RemapFrames). You can see the end result here:

https://www.youtube.com/watch?v=Jk_z4TiweHs

[1] http://www.avisynth.nl/users/vcmohan/Reformer/Reformer.html


Thanks for the frame ranges. So...

ffmpeg -i zelda.mp4 -vf "perspective=60:90:589:147:50:415:582:418:enable='not(between(n,0,149)+between(n,1488,2063))'" -c:a copy -c:v libx264 zelda.mkv


>-vf "perspective=60:90:589:147:50:415:582:418:enable='not(between(n,0,149)+between(n,1488,2063))'"

Incidentally, this is a pretty big reason why I'd pick Avisynth over ffmpeg for video filtering any day of the week.


Haters gonna hate. You can put that filtergraph in a file with line returns or whatever clarity you fancy.

Note that FFmpeg doesn't require Wine to work on Linux (or any other system), and here it also handles the audio, the encoding, and the remuxing.

(Ah, and it doesn't require downloading DLLs from random dubious sites or lost forum threads to be usable.)


As a fan of AviSynth, the scripts that I build often make that line look so, so simple. :) And you can use ffmpeg (and a couple of other tools) in VirtualDub 1.10+ to render directly to MP4 with no intermediate file (it'll handle all the pipelining for you). They're all useful tools.


>As a fan of AviSynth, the scripts that I build often make that line look so, so simple. :)

And now imagine what those scripts would look like as an ffmpeg -vf command! That was basically the point: the -vf line is already pretty messy with just one range-applied command, and would become even more so if you started doing something more complicated. Avisynth, on the other hand, has actual scripts for the video processing, which scales to much more complex processing while still remaining accessible.

ffmpeg -vf might be good for doing one or two simple things to the whole video, but for anything more complex than that you really should use an actual video processing solution instead.


That's why you have the -filter_script and -filter_complex_script options.


Since you're using ImageMagick, you could do without mplayer and use a simpler command for the first step:

    convert infile.mp4 %08d.png
Add -verbose in there if you want to see the progress as it goes.


Back in the day there was AviSynth to do this kind of thing. Unfortunately the version 3 rewrite which was supposed to use GStreamer and Ruby never went anywhere. Is there finally something similar for Linux?


VapourSynth[1] is what the AviSynth rewrite was supposed to be from what I have gathered. Hence the name ;). OTOH, it is still for the Windows platform, so not really what you were after but I thought I'd mention it anyway.

[1] http://www.vapoursynth.com/


DON'T FOLLOW THIS TUTORIAL!

Unless you like adware or whatever else comes bundled with it...

(JDownloader installs some crap Adware.)


Hi, I'm the author of the post. I've been using JDownloader on Linux for years and never noticed a problem with Adware. Can you please elaborate? I'd really like to know more.


Never heard of JDownloader, but you might also want to look into http://rg3.github.io/youtube-dl/


E.g. here:

http://board.jdownloader.org/showthread.php?t=44832&page=16

It seems to occur only with the web installer, which is the one the creators of JDownloader officially point you to.

Well, I removed JDownloader and manually fixed the start page and search engines of Firefox to remove the Sweet Page stuff, but somehow my Firefox is still screwed.

Questionable decision by the maker of JDownloader. I can't imagine that the money he gets for this is worth the damaged reputation.

Edit: I just ran an adware scanner. Besides the Sweet Page stuff, WindowsManagerProtect and Hold Page were installed. Several registry entries and Firefox settings were affected. Unbelievable...


Try http://clipconverter.cc

Online, free.


Since you didn't give alternatives, allow me: the program you should use is called quvi. It comes with a large number of scripts for extracting videos from sites, and you can easily write more or patch existing ones, since they're all written in Lua.


youtube-dl

Also, a while ago I was working on making JDownloader less terrible (it should work like wget, UNIX philosophy and all). It's very labor-intensive, but I may continue if enough people are interested (or help out): https://github.com/espes/jdget


Can you upload the result as a private YouTube video? Would love to see this video proper. :)



The embedded YouTube video at the top looks to be the result of this technique.


My favorite use of this technique was at Hack Princeton a while back. Some people made an app that lets you take pictures of blackboards, automatically cropping and fixing the perspective of the board for later use.


I don't know if it was based on that, but a co-worker had a point-and-shoot digital camera a few years back (2009~2010) that had a similar feature for taking photos of black-/white-boards.


I had a Canon P&S that did this in 2003 or so. It was nice - whiteboard mode was used after every meeting.


Some other apps that do this:

Office Lens: http://research.microsoft.com/en-us/news/features/officelens... - imports to OneNote and does OCR

Post-it: https://itunes.apple.com/gb/app/post-it-plus/id920127738?mt=...



For Windows users (and with a simple video like this one) you can do this in two minutes.

Drop the video into VirtualDub 1.10.4 and use its built-in "perspective" filter.

Using blend mode and the curves editor, you can apply the filter only to the parts of the video you want, so the part where the gamepad is shown is not affected. (Tutorial here: https://www.youtube.com/watch?v=2MWoVY9mYbk)


Is the merit of the article in not using closed-source applications? Because that can be done in 3 steps in Photoshop, After Effects, etc.


Or OpenCV even, which is open source. So no, I don't think the plus of the article is that it's avoiding the use of closed source. If you read some of the comments by the poster, you'll see that it's simply him solving his problem using the tools he knew and spliced together.


This seems a bit tortuous. You can do this sort of thing in Blender or After Effects (even very old versions that you can get cheap or free) very easily, and skip the PNG conversion stage altogether.


a) I totally knew what this was going to be since I am a Zelda fan and was surprised at the way the original video was shown

b) We live in an amazing time for software and computing

Fantastic job!


There was a similar story a couple weeks ago on HN where someone remapped the video from a time lapse of hand drawing. Can't find the link.


The story wasn't on HN:

http://uberhip.com/python/image-processing/opencv/2014/10/26...

youtube link, https://www.youtube.com/watch?v=BPijRAK2NHg

This second video better shows the effect with uncorrected and corrected video side by side. https://www.youtube.com/watch?v=7xQ0WDmTyVY


It's called homography... fairly straightforward to code...

http://en.wikipedia.org/wiki/Homography_(computer_vision)
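
Indeed: with four point correspondences and h33 fixed to 1, it's just an 8x8 linear system. A numpy sketch with made-up coordinates:

    import numpy as np

    src = [(0, 0), (640, 0), (0, 360), (640, 360)]
    dst = [(60, 90), (589, 147), (50, 415), (582, 418)]

    # Each correspondence (x, y) -> (u, v) contributes two linear equations.
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.extend([u, v])

    h = np.linalg.solve(np.array(A, dtype=float), np.array(b, dtype=float))
    H = np.append(h, 1.0).reshape(3, 3)
    print(H)  # apply as (u*w, v*w, w) = H @ (x, y, 1), then divide by w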


Lots of useful info about ImageMagick, ffmpeg, etc. in the comments.

Thanks everybody.



