
Two questions for the compression gurus here.

Suppose you have a bunch of raw video. You take extracts of it and put them together to make a movie, M1. You make an H.264 encoded copy of that. Let's call it C1.

You then make a new cut of your movie, M2, which is mostly the same footage as M1 except that you've shortened a few scenes and lengthened others. You make an H.264 encoded copy of that. Call this C2.

When making C1 and C2 your H.264 encoders have to decide which frames to turn into I-frames and which to turn into P-frames.

If they just do something simple like making every Nth frame an I-frame, then after the first difference between M1 and M2 it is unlikely that C1 and C2 will have many I-frames in common, and therefore they won't have many P-frames in common either.

If they look for scene changes and make new I-frames on scene changes, then we might expect that at least for the scenes that start identically in M1 and M2 they will get identical I-frames and P-frames up to their first edit if any.

Scenes that are edited at the front would still end up encoded totally differently in C1 and C2.

Question: are there any encoders that, when encoding M2 to produce C2, can be given M1 and C1 as references and use them to adjust I-frame spacing so as to make as many C2 I-frames as possible match C1's I-frames?

That would allow C2 to be stored efficiently as a binary diff from C1. This could be handy if C1 and C2 needed to be checked into a version control system, or you needed to distribute C2 over a low bandwidth or expensive link to someone who already had C1.
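
(Concretely, what I'm imagining on the distribution side, as a rough sketch using the third-party bsdiff4 package and made-up file names -- any binary delta tool would do:)

    import bsdiff4  # third-party binary-delta package (pip install bsdiff4)

    # If most of C2's I-frames (and hence GOPs) byte-match C1's, the delta
    # should stay roughly proportional to the edited footage rather than to
    # the whole movie.
    bsdiff4.file_diff("C1.mp4", "C2.mp4", "C1_to_C2.patch")            # sender
    bsdiff4.file_patch("C1.mp4", "C2_restored.mp4", "C1_to_C2.patch")  # receiver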

The second question concerns recompressing after decompression. I actually thought of this question in terms of audio so will ask in those terms, but I guess it applies to video too.

Suppose someone has an uncompressed source S. They compress it with a lossy compressor producing C and distribute C to you. You decompress C producing S'.

You then compress S' with a lossy compressor (the same type that the original producer used--e.g., if C is an MP3 you use an MP3 compressor) producing C'. I don't know about video, but for audio (at least back in the days when MP3 was starting to get big) C' would be lower quality than C.

Are there any compressors that can figure out that they are dealing with something that has already undergone the "throw out imperceptible parts to make it more compressible" step, and just skip to the next stage, so that they produce a C' that is a lossless representation of S'?




> If they look for scene changes and make new I-frames on scene changes, then we might expect that at least for the scenes that start identically in M1 and M2 they will get identical I-frames and P-frames up to their first edit if any.

> Question: are there any encoders that, when encoding M2 to produce C2, can be given M1 and C1 as references and use them to adjust I-frame spacing so as to make as many C2 I-frames as possible match C1's I-frames?

I suspect if you were to encode both files with x264 with identical settings, in crf mode, with vbv disabled and keyint=0 (unlimited keyframe interval), its scene detection should place I-frames in the same places (that is, on scene cuts, not at fixed points on the timeline). Maybe some scenecut tuning would be necessary.
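
(Something along these lines -- a sketch assuming the x264 CLI and y4m inputs, with placeholder file names and CRF value:)

    import subprocess

    # Identical settings for both cuts: CRF mode, no VBV options, unlimited
    # keyframe interval, default scenecut threshold, so I-frames should only
    # be placed where x264's scene detection fires.
    common = ["x264", "--crf", "18", "--keyint", "infinite", "--scenecut", "40"]
    subprocess.run(common + ["-o", "C1.264", "M1.y4m"], check=True)
    subprocess.run(common + ["-o", "C2.264", "M2.y4m"], check=True)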

> That would allow C2 to be stored efficiently as a binary diff from C1. This could be handy if C1 and C2 needed to be checked into a version control system, or you needed to distribute C2 over a low bandwidth or expensive link to someone who already had C1.

I'm not aware of any automated way to do that, but you can do a similar thing manually using mkv's ordered chapters. You first compress the original cut, then you compress any additional scenes and insert them where you need. For example, you can make an mkv file for a theatrical cut of a movie, and then make a separate file for a director's cut that is only as big as the additional scenes, since it uses the theatrical-cut file for the common scenes.
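
(Roughly what that looks like -- a sketch with placeholder timestamps and SegmentUID; the element names follow the Matroska chapters XML that mkvmerge accepts, and mkvinfo will show you the theatrical file's real SegmentUID:)

    import subprocess, textwrap

    chapters = textwrap.dedent("""\
        <?xml version="1.0" encoding="UTF-8"?>
        <Chapters>
          <EditionEntry>
            <EditionFlagOrdered>1</EditionFlagOrdered>
            <!-- shared scene, played out of the theatrical-cut file -->
            <ChapterAtom>
              <ChapterTimeStart>00:00:00.000000000</ChapterTimeStart>
              <ChapterTimeEnd>00:42:00.000000000</ChapterTimeEnd>
              <ChapterSegmentUID format="hex">00112233445566778899aabbccddeeff</ChapterSegmentUID>
            </ChapterAtom>
            <!-- additional scene, stored in this file itself -->
            <ChapterAtom>
              <ChapterTimeStart>00:00:00.000000000</ChapterTimeStart>
              <ChapterTimeEnd>00:03:30.000000000</ChapterTimeEnd>
            </ChapterAtom>
          </EditionEntry>
        </Chapters>
        """)
    with open("chapters.xml", "w") as f:
        f.write(chapters)

    # Mux only the new footage; players resolve the SegmentUID to the
    # theatrical-cut file sitting in the same directory.
    subprocess.run(["mkvmerge", "-o", "directors_cut.mkv",
                    "--chapters", "chapters.xml", "new_scenes.mkv"], check=True)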


> Are there any compressors that can figure out that they are dealing with something that has already undergone the "throw out imperceptible parts to make it more compressible" step, and just skip to the next stage, so that they produce a C' that is a lossless representation of S'?

The compressor will try throwing away exactly the information that was thrown away the first time you compressed it. So basically it will leave the content as is, because there is nothing extra to throw away.

You can easily see this with an MP3 file at 128 kbps - the first time you compress it, most of the very high frequency content is thrown away; you can see this by comparing a spectrogram of the uncompressed file with one of the compressed file. But if you compress it again, the second-generation spectrogram will look very similar to the first-generation one, because there is nothing else left to throw away.
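
(If you want to reproduce this, a quick sketch with ffmpeg, assuming libmp3lame is available and "source.wav" stands in for your uncompressed file:)

    import subprocess

    def run(*args):
        subprocess.run(list(args), check=True)

    # First- and second-generation encodes at 128 kbps.
    run("ffmpeg", "-i", "source.wav", "-c:a", "libmp3lame", "-b:a", "128k", "gen1.mp3")
    run("ffmpeg", "-i", "gen1.mp3", "-c:a", "libmp3lame", "-b:a", "128k", "gen2.mp3")

    # Spectrogram images: source vs. gen1 shows the high-frequency cutoff,
    # while gen1 vs. gen2 should look nearly identical.
    for name in ("source.wav", "gen1.mp3", "gen2.mp3"):
        run("ffmpeg", "-i", name, "-lavfi", "showspectrumpic=s=1024x512", name + ".png")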

But there is a complication - the audio file is typically stored in the time domain (PCM), while the compressor operates in the frequency domain (for MP3, an MDCT-based filter bank rather than a plain FFT), and there is a conversion between these two domains that you can't avoid. This conversion unfortunately loses a bit of information and degrades quality slightly.


For the first question, I ended up using this trick for a video editing app on Android phones about 10 years ago in order to cut video together on low-end, out-of-date (Android 2.3) phones. They couldn't handle video compression in a reasonable time, and we didn't want to upload/download to process on a server.

The point of the app was to sync the cuts to music cues, so each clip had a defined length. I ended up doing it all through file manipulation. You can cut into a video file starting at any I-frame, then trim it to the desired length. I would cut the input videos down to size, then concatenate the files, replacing the audio with the new soundtrack at the end.

It worked great and only took a few seconds to create the final edit. Of course you couldn't overlay text or filter video, but I still think it was a valid solution.
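
(For reference, roughly the same trick can be done today with ffmpeg stream copies -- a sketch with made-up clip names and timings; with -c copy the start of each cut snaps to an I-frame at or before the requested time:)

    import subprocess

    def run(cmd):
        subprocess.run(cmd, check=True)

    # Keyframe-aligned trims, no re-encoding (-c copy), audio dropped (-an).
    cuts = [("in1.mp4", "00:00:04", "2.0", "clip0.mp4"),
            ("in2.mp4", "00:01:30", "1.5", "clip1.mp4")]
    for src, start, dur, out in cuts:
        run(["ffmpeg", "-ss", start, "-i", src, "-t", dur, "-c", "copy", "-an", out])

    # Concatenate the trimmed clips (same codec settings assumed), then lay
    # the new soundtrack over the joined video.
    with open("list.txt", "w") as f:
        f.writelines("file '%s'\n" % c[3] for c in cuts)
    run(["ffmpeg", "-f", "concat", "-safe", "0", "-i", "list.txt", "-c", "copy", "video.mp4"])
    run(["ffmpeg", "-i", "video.mp4", "-i", "soundtrack.mp3", "-map", "0:v", "-map", "1:a",
         "-c:v", "copy", "-c:a", "aac", "-shortest", "final.mp4"])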

With the requirement of starting each clip on an I-frame, there was some imprecision in where your cut would actually start. An auteur might have a problem with their masterpiece being butchered that way, but it would certainly work well for some special cases, like efficient distribution or being able to show via a diff that a video was unaltered apart from timing cuts.


> If they just do something simple like making every Nth frame an I-frame, then after the first difference between M1 and M2 it is unlikely that C1 and C2 will have many I-frames in common, and therefore they won't have many P-frames in common either.

Modern encoders won't use a fixed interval for I-frames unless you force them to. They will insert an I-frame wherever it's deemed optimal.

You're correct that when a compressed source goes through a video editor and then to another encoder, the information about which source frames were originally I-frames is unlikely to be passed through to the encoder. This is because video editors combine multiple sources together, so there is no "single" source.

But if you're not actually editing and you're just "trimming" and "joining", this can be done with perfectly matched I-frames and P-frames, but probably not B-frames?

Even when using the command-line x264, you can specify which frame numbers you'd like encoded as I-frames.
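
(With x264's --qpfile option -- a sketch with made-up frame numbers; each line of the file is "framenumber frametype", with the optional QP column left out here so the encoder still chooses quantizers:)

    import subprocess

    # Hypothetical frame numbers where C1's encoder placed its I-frames;
    # force the same placements when encoding C2.
    i_frames = [0, 240, 733, 1129]
    with open("c2.qpfile", "w") as f:
        f.writelines("%d I\n" % n for n in i_frames)

    subprocess.run(["x264", "--crf", "18", "--qpfile", "c2.qpfile",
                    "-o", "C2.264", "M2.y4m"], check=True)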


> You make an H.264 encoded copy of that. Let's call it C1.

> You then make a new cut of your movie, M2, which is mostly the same footage as M1 except that you've shortened a few scenes and lengthened others.

Your new cut gets encoded from whatever frame ranges and timestamps you've chosen in the various original raw YUV project files (or ProRes, etc., depending on what your cameras output).

The existence of the original cut should be irrelevant, because what you're not doing is taking the compressed output C1 and re-cutting it into C2, unless for some catastrophic reason you have lost or deleted all of the original raw/uncompressed camera files.

If you are generating a new C2 from your original uncompressed (or lightly compressed) files, of course the encoder will be making new decisions about where to insert I-frames and P-frames based on the duration of each camera cut, changes you've made to CGI, and other visual factors in the re-cut.

> Suppose someone has an uncompressed source S. They compress it with a lossy compressor producing C and distribute C to you. You decompress C producing S'.

> You then compress S' with a lossy compressor (the same type that the original producer used--e.g., if C is an MP3 you use an MP3 compressor) producing C'. I don't know about video, but for audio (at least back in the days when MP3 was starting to get big) C' would be lower quality than C.

All of this is generally a bad idea unless you have no other option but to work with received files that are already compressed. Say, for example, you're working on a documentary about something in a conflict zone and somebody sends you a highly compressed H.264 file recorded on a smartphone. In this case there is no uncompressed original available.

You're going to want to first extract it into an uncompressed raw YUV file on disk so that you can work with it in your standard editor, and then whatever scenes you choose from within it will get re-encoded into your final output.
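
(e.g. something like this, with placeholder file names; writing a .y4m keeps the resolution and frame-rate header alongside the raw 4:2:0 frames, and expect the output to be very large:)

    import subprocess

    # Decode the received H.264 once into an uncompressed file that any
    # editor or encoder can read.
    subprocess.run(["ffmpeg", "-i", "phone_clip.mp4",
                    "-pix_fmt", "yuv420p", "clip.y4m"], check=True)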




