Videogrep: Automatic Supercuts with Python (lav.io)
352 points by dvanduzer on June 19, 2014 | 37 comments



Very cool. What's the current best tool to help make the .srt file? e.g. the current best-of-breed text recognition / text alignment tool. Last time I looked at something like this there didn't seem to be a particularly robust solution, especially for text alignment.


Hello! Original author here. I'm not sure about the best tool for creating new .srt files, but for films, you can find pretty much anything on opensubtitles.org or subscene.com (although the quality varies). You can also download .srts for youtube videos (again, quality varies).


I will caution people that downloaded subs are very often misaligned (timing) with your video, due to various cuts of videos, intro logos, disc versions, framerates, etc., etc. Leave your download page open and check one-by-one against the source video.
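For the simple case where the whole subtitle track is off by a constant amount (an intro logo, say), the timestamps can be shifted in bulk. Below is a minimal Python sketch, assuming standard SRT timestamps of the form HH:MM:SS,mmm. The names shift_timestamp and shift_srt are made up for illustration, and this does not correct framerate drift or differing cuts, which still need checking by hand.

```python
import re

# Matches SRT timestamps such as 00:01:02,345
TIMESTAMP = re.compile(r"(\d{2}):(\d{2}):(\d{2}),(\d{3})")


def shift_timestamp(match, offset_ms):
    """Return the matched timestamp moved by offset_ms milliseconds."""
    h, m, s, ms = (int(g) for g in match.groups())
    total = ((h * 60 + m) * 60 + s) * 1000 + ms + offset_ms
    total = max(total, 0)  # clamp so a negative shift never goes below zero
    h, rem = divmod(total, 3600000)
    m, rem = divmod(rem, 60000)
    s, ms = divmod(rem, 1000)
    return "%02d:%02d:%02d,%03d" % (h, m, s, ms)


def shift_srt(text, offset_ms):
    """Shift every timestamp in an SRT document by a constant offset."""
    return TIMESTAMP.sub(lambda m: shift_timestamp(m, offset_ms), text)


if __name__ == "__main__":
    sample = "1\n00:00:01,000 --> 00:00:02,500\nHello\n"
    print(shift_srt(sample, 1500))
```

A positive offset delays the subtitles and a negative one brings them forward.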


For television, addic7ed.com


I think Aegisub (http://www.aegisub.org/) is the current gold standard in the anime community, which is one of the most active when it comes to subtitles.


Aegisub is indeed the subtitle editor, and the one I would recommend over everything else when it comes to making subtitles.


Last time I did this (~2011), "Subtitle Editor"[1] was a pretty decent solution for *nix-based systems. On Windows, VisualSubSync[2] was all the rage, iirc.

[1] http://home.gna.org/subtitleeditor/ [2] http://www.visualsubsync.org/


On the Mac, Subler can generate .srt files by OCR from both DVD and Blu-ray image subtitle tracks:

https://code.google.com/p/subler/

...although its OCR engine (Tesseract) can be a little sketchy.


Gnome subtitles is pretty easy to use and works well: http://gnome-subtitles.sourceforge.net/


I wanted to do exactly this, but at word-level granularity, so that you could input any text and get a video of random clips from many sources, each saying exactly the words in the input string.

I don't think the metadata is quite there yet.


Word (and phoneme) level granularity is usually used for lip-synching (CG, video games) and karaoke-type applications.

If you have an accurate text transcript, but not detailed enough timings, you can use speech recognition on the audio and it will be very accurate, since you know exactly what is being said (unlike speech recognition on arbitrary speech, this is more like command-and-control). You can get word-level or phoneme-level timing granularity by pairing an accurate transcript with the original audio.

The metadata isn't there in regular subtitles, but you can certainly get it there with some post-processing.
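As a rough illustration of the post-processing idea, here is a Python sketch that guesses word timings inside a single subtitle cue. Note that this swaps in a much cruder technique than the speech-recognition alignment described above: it simply divides the cue's duration among the words in proportion to their character counts, so the results are only approximate. The function name estimate_word_times and the use of seconds as floats are assumptions for the example.

```python
def estimate_word_times(text, start, end):
    """Guess (word, start, end) timings within one subtitle cue.

    This is linear interpolation by word length, not forced alignment:
    longer words are assumed to take proportionally longer to say.
    """
    words = text.split()
    if not words:
        return []
    total_chars = sum(len(w) for w in words)
    duration = end - start
    timings = []
    cursor = start
    for word in words:
        span = duration * len(word) / total_chars
        timings.append((word, cursor, cursor + span))
        cursor += span
    return timings


if __name__ == "__main__":
    for word, t0, t1 in estimate_word_times("hi there", 0.0, 7.0):
        print("%-6s %.2f -> %.2f" % (word, t0, t1))
```

Pauses, speaking rate, and overlapping speech are all ignored, so this is at best a starting point before a real aligner or manual cleanup.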


Is there a good easy-to-use software/service that does that kind of alignment of accurate text transcript to audio/video?

I used to have some links but the companies went out of business.


Probably not for free. I also haven't done this kind of work in over a decade. The current links I have are:

Annosoft's SDK: http://www.annosoft.com/prices

Annosoft made a command-line front-end to the Microsoft Speech API, which is what many of these other Windows-based systems may also use, and which I used in a project in 1999-2001: http://www.annosoft.com/sapi_lipsync/docs/ (There are other SAPI front-ends if you dig around online, too.)

Others, including open-source ones: http://en.wikipedia.org/wiki/List_of_speech_recognition_soft...

Magpie, used in animation and gaming: http://www.thirdwishsoftware.com/magpiepro.html

Crazytalk, used in animation, uses SAPI: http://www.reallusion.com/crazytalk/crazytalk.aspx

FaceFX, used in gaming: http://www.facefx.com/documentation/2013.2/W194

Source Filmmaker includes it, although I'd be surprised if it wasn't Sphinx or SAPI or some other existing library: https://developer.valvesoftware.com/wiki/SFM/Lip-sync_animat...


You could get a gigantic bunch of film & television subtitles. Almost every movie ever made and every TV show in the past 30 years has pretty good subtitles available from a variety of online sources.

Using common Python NLP techniques you could very easily search for every instance of a phrase across a massive corpus of subtitles.

If you got a large enough collection of subtitles and videos in a single directory this tool would do what you are asking.
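A minimal sketch of that kind of phrase search in plain Python, using only the standard library. It assumes well-formed .srt files encoded as UTF-8; the names find_phrase and search_directory are hypothetical, and this is not how videogrep itself is implemented.

```python
import re
from pathlib import Path

# One SRT cue: a timestamp line followed by text, up to a blank line or
# the end of the file.
CUE = re.compile(
    r"(\d{2}:\d{2}:\d{2},\d{3}) --> (\d{2}:\d{2}:\d{2},\d{3})[^\n]*\n"
    r"(.*?)(?:\n\s*\n|\Z)",
    re.DOTALL,
)


def find_phrase(srt_text, phrase):
    """Return (start, end, text) for every cue containing the phrase."""
    pattern = re.compile(re.escape(phrase), re.IGNORECASE)
    hits = []
    for start, end, body in CUE.findall(srt_text.replace("\r\n", "\n")):
        line = " ".join(body.split())  # join multi-line cues into one line
        if pattern.search(line):
            hits.append((start, end, line))
    return hits


def search_directory(directory, phrase):
    """Search every .srt file in a directory; map file name to its hits."""
    results = {}
    for path in sorted(Path(directory).glob("*.srt")):
        text = path.read_text(encoding="utf-8", errors="replace")
        hits = find_phrase(text, phrase)
        if hits:
            results[path.name] = hits
    return results


if __name__ == "__main__":
    for name, hits in search_directory(".", "hello").items():
        for start, end, line in hits:
            print(name, start, end, line)
```

The returned start and end timestamps are what a cutting tool would need to extract the matching clips from the corresponding video files.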


I've had the same idea on the back burner for some time. My primary concern was - and still is - copyright.


Surely a one word clip would fall under fair use.


How about a program that automatically finds, say, the smallest number of segments that covers all of the input words? At least then the program would do a lot of the heavy lifting for you, and you could do the last bit of cleanup manually.
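Finding the truly smallest set of segments is the classic set cover problem, which is hard to solve exactly, so a greedy approximation is the usual compromise. Here is a Python sketch of that greedy approach, assuming each segment is just a string of subtitle text; the function name greedy_cover is made up for the example, and punctuation is not stripped.

```python
def greedy_cover(target_words, segments):
    """Pick segments that together contain all target words.

    Greedy set cover: repeatedly take the segment that covers the most
    still-missing words. The result is not guaranteed to be the smallest
    possible set. Returns (chosen segment indices, words never found).
    """
    remaining = {w.lower() for w in target_words}
    seg_words = [set(text.lower().split()) for text in segments]
    chosen = []
    while remaining and seg_words:
        best = max(
            range(len(segments)),
            key=lambda i: len(seg_words[i] & remaining),
        )
        gained = seg_words[best] & remaining
        if not gained:
            break  # no segment contains any of the missing words
        chosen.append(best)
        remaining -= gained
    return chosen, remaining


if __name__ == "__main__":
    segments = ["the quick brown", "fox jumps", "quick fox", "lazy dog"]
    print(greedy_cover("the quick brown fox".split(), segments))
```

The words left over in the second return value are the ones that would need the manual cleanup mentioned above.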


You should check out the Op3nVoice API: http://www.op3nvoice.com/

Lets you search video/audio for spoken words.


Something similar to this (but automated, of course)?

http://video.bobdylan.com/desktop.html


Here's an example of "true" video search: http://www.baarzo.com/


Nice tool. I also had no idea about the moviepy library it uses. Looks like a really nice little library for making small video edits in Python. Cool!


Really impressed with the example of instances of specific grammatical structures. Really great application of something useful with this script.


I remember reading a while back that employees at big news networks (think Fox, CNBC, CNN, etc.) had access to some massive database of broadcast videos, and tooling built around the database to do exactly this. I can't find the source at the moment, but if anyone knows it, a link could be relevant.


This is also what "Google Video" originally did, before it became more like Youtube/Netflix.


I'm assuming you meant this article at ars http://arstechnica.com/gadgets/2013/09/with-30-tuners-and-30...


One thing that annoys me with subtitles is when they include all the sound effects: [SCREAMS LOUDLY], [OMINOUS MUSIC PLAYS], etc. So something like the Total Recall silence thing probably won't work to a great degree of accuracy in those cases.


Those are for people who are deaf.

Some places call this "closed captioning" (i.e. deaf target audience), versus "subtitles" (target audience is people who can hear, but not understand the language).


Solution: use foreign language subtitles (since the goal is translation and not helping hearing-impaired people, those effects aren't usually included).
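When only captioned subtitles are available, another option is to filter the bracketed annotations out before searching. Here is a small Python sketch, assuming sound effects always appear in square brackets as in the examples above; the name strip_effects is hypothetical, and captions that mark effects some other way (parentheses, all caps) would need extra rules.

```python
import re

# Anything inside square brackets, e.g. [SCREAMS LOUDLY]
SOUND_EFFECT = re.compile(r"\[[^\]]*\]")


def strip_effects(line):
    """Remove bracketed sound-effect annotations from one subtitle line."""
    cleaned = SOUND_EFFECT.sub("", line)
    return " ".join(cleaned.split())  # tidy up leftover whitespace


if __name__ == "__main__":
    print(strip_effects("[SCREAMS LOUDLY] Get out!"))
```

A cue that comes back as an empty string contained nothing but sound effects and can be treated as having no dialogue.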


http://www.youtube.com/watch?v=Wpd2VaFt5iY

please somebody make an automatic rap impersonation generator

can make use of karaoke youtube clips for the background music...


I can't find, anywhere in the documentation or a quick skim of the source code, any clue as to which version of Python this requires.


Based on the style of print statements (no parens), I'd say python2.


Fascinating!


very cool!


Absolutely irrelevant correction: Jay Carney is the current, not former, press secretary.

https://en.wikipedia.org/wiki/Jay_Carney


Well, yes, until tomorrow.


Ah, didn't realize that. Everything is clear now


The video produced was extremely entertaining and insightful. I can imagine this tool being very useful for big data analysis.



