Oh, someone has to manually time the movie. They only support about six movies. I expected that it would use closed captioning data, do the sync automatically, and support far more titles.
We had seamless, synchronized, open-standard multimedia (W3C SMIL) in 2000, but not in 2020. There are now attempts to bring back a subset via the EPUB standards. Meanwhile, most Internet traffic is now video, and there's no standard mechanism for providing contextual timed commentary and other annotations.
The syncing is done automatically, at least mostly.
TL;DR: ScreenplaySubs fetches the subtitles from Netflix, parses the PDF-formatted screenplays into JSON, and syncs by calculating the sentence similarities between subtitle and screenplay dialogue.
In particular, we use the Universal Sentence Encoder to decide whether a subtitle matches a screenplay dialogue line. If a screenplay line is similar enough to a subtitle, the former gets tagged with the timestamp provided by the latter.
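The matching step can be sketched roughly like this. Note this is a minimal stand-in: the extension actually uses Universal Sentence Encoder embeddings, whereas here plain cosine similarity over word counts plays that role, and the field names and threshold are illustrative, not the project's real schema.

```python
from collections import Counter
from math import sqrt

def similarity(a: str, b: str) -> float:
    """Cosine similarity over word counts -- a crude stand-in for
    comparing Universal Sentence Encoder embeddings."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = (sqrt(sum(c * c for c in va.values()))
            * sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0

def tag_dialogue(screenplay_lines, subtitles, threshold=0.5):
    """For each screenplay dialogue line, copy the timestamp of the most
    similar subtitle cue -- but only if the match clears the threshold."""
    tagged = []
    for line in screenplay_lines:
        best = max(subtitles, key=lambda cue: similarity(line, cue["text"]))
        score = similarity(line, best["text"])
        tagged.append({
            "text": line,
            "start": best["start"] if score >= threshold else None,
        })
    return tagged
```

Untagged lines (score below the threshold) are the ones that would need the interpolation or manual fix-ups mentioned elsewhere in the thread.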
Our screenplay-pdf-to-JSON converter captures all the dialogue, transitions, and actions within each screenplay scene. We treat scenes as atomic units, which lets us detect changes in scene ordering based on the tagged scene timestamps. It also means that if dialogue is reordered within a scene in the movie, there will be some syncing inconsistencies.
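To make the scene-as-atomic-unit idea concrete, here's a sketch. The dict shape is hypothetical (illustrative field names, not the converter's actual schema), and the reorder check is just one straightforward way to flag scenes whose earliest tagged timestamp runs against script order.

```python
# Hypothetical shape of one parsed scene -- field names are illustrative,
# not the actual screenplay-pdf-to-json output.
scene = {
    "scene": 12,
    "heading": "INT. BASEMENT TAVERN - NIGHT",
    "items": [
        {"type": "action", "text": "A crowded tavern."},
        {"type": "dialogue", "character": "HICOX", "text": "..."},
    ],
    # Timestamps copied from subtitles matched to this scene's dialogue:
    "timestamps": [4210.3, 4215.8],
}

def reordered_scenes(scenes):
    """Flag scenes whose earliest tagged timestamp is out of order
    relative to script order -- i.e. the film plays them in a
    different sequence than the screenplay."""
    timed = [(s["scene"], min(s["timestamps"]))
             for s in scenes if s["timestamps"]]
    flagged, latest = [], float("-inf")
    for number, start in timed:
        if start < latest:
            flagged.append(number)
        latest = max(latest, start)
    return flagged
```

Scenes with no tagged timestamps (e.g. dialogue-free openings) drop out of the check entirely, which is exactly where the best-effort behavior described below kicks in.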
Some scenes have little to no dialogue, so the extension works on a best-effort basis there. E.g. the opening scene of There Will Be Blood has almost no dialogue at all; that's where I need to jump in and sync up the screenplay manually. OTOH, the opening scene of Inglourious Basterds syncs very well, since it's full of dialogue. This is why I can't just add movies and instantly upload them to the site.
Would you be interested in more details? I was thinking of writing a series of technical blog posts if there's enough interest!
Interesting work. Glad you've been able to chart a path through some tedious problems.
Over the last several years I've imagined a lot of projects (both serious utilities, and the absurd/artistic) in roughly the territory you're exploring...
- For my MFA thesis (2012) I used plaintext (thankfully, though they had plenty of their own problems) transcripts of a TV show as a corpus for generating poems from, and at the time I thought it would be an interesting follow-up project to turn them back into video clips.
- Mapping film quotes/citations back to the script/film and accuracy-checking movie quotes. (I can imagine both of these being useful for film forums like the movies/sci-fi Stack Exchange sites.)
- Generating script-cuts of movies that re-order/drop scenes and just show the printed script on-screen where scenes were cut.
- A film-analysis/screenwriting-class sort of interface oriented around reading a segment and then playing it (could be particularly interesting when there happen to be multiple known script drafts?)
- Re-constructing a character monologue from lines spoken by an actor that turned down the role.
- Generating a super-cut of actor X saying Y.
- Generating focused cuts of a film that cover, say, every scene a given character does/doesn't appear in, or every scene that mentions X.
I'd definitely be interested to read more about the tech. I wonder if it could be used to time-sync audiobooks to their ebook counterparts.
This is my use-case:
Kindle has a feature called "Audible Narration." You buy a Kindle book and the Audible audiobook, which lets you play the audiobook while it highlights the words in the Kindle book as you listen. This effortless switching between audio and text enables some interesting reading behavior. Certain books become easier to read, and note-taking gets much easier (highlighting text is much easier than bookmarking timestamps in an audiobook).
The problem is, getting your annotations and highlights and other data out of Kindle is very difficult, because Kindle does not have a public API. Same with Audible.
So I'm thinking of emulating Audible narration with a hybrid ebook/audiobook reader app. The ebook would be a simple HTML page (converted from epub, formatting be damned) and a simple audio player. As the audio plays, the HTML page would scroll and words would be highlighted.
The challenge is to timestamp-tag the HTML against the audio track. I'd guess I could run speech-to-text on the audio and then somehow diff the generated transcript against the epub content. Given that some audiobooks are abridged, some read the footnotes on each mention, and some describe the visuals, I'd assume the diffing wouldn't be straightforward.
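One rough way to sketch that diff step with Python's stdlib `difflib`, assuming you already have a word-level transcript with per-word timestamps (some speech-to-text APIs provide these; the variable names here are my own):

```python
import difflib

def align_words(transcript_words, timestamps, book_words):
    """Map book word indices to audio timestamps via the matched blocks
    of a word-level diff. Unmatched book words (abridged passages,
    footnotes read aloud, described visuals) simply get no timestamp."""
    sm = difflib.SequenceMatcher(a=transcript_words, b=book_words,
                                 autojunk=False)
    mapping = {}
    for block in sm.get_matching_blocks():
        for k in range(block.size):
            mapping[block.b + k] = timestamps[block.a + k]
    return mapping
```

Words the diff can't match just stay untimestamped, so a reader UI could interpolate highlights between the nearest anchored words. This is only a crude approximation of what forced-alignment tools do properly.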
The task is called "forced alignment" -- take a look at aeneas and the other projects at https://www.readbeyond.it/ :) IIRC, aeneas has some features for handling extra text at the beginning/end of the book, while abridgement etc. isn't handled.