Ha, the famous sandboxing remark. I wish it were that simple!
So, let me shed some light on sandboxing for multimedia (I work on VLC).
If you sandbox an application like VLC, in the current way of doing sandboxing, which we've done for macOS, WinRT/UWP, and snaps, you still need a lot of permissions.
Namely:
- you need to be able to open files without user interaction (no file picker), in order to open playlists, MXF or MKV files, which can reference other files;
- you need the same if you have a database of files (media center oriented);
- you need raw access to /dev/* to play DVDs, CDs and other optical discs (and the equivalent on Windows);
- you need ioctl on such devices, to pass MMC commands for DVD/Blu-ray;
- you need raw access to /dev/v4l* for your webcams and be able to control them;
- you need access to the GPU stack, which is running in kernel-mode, btw, to output video and get hw acceleration;
- you need access to the audio stack, also in low-level mode;
- you need access to the DSP acceleration (not always the GPU);
- on Linux, you need access to X11 for the 3 features above, which is almost root;
- you need access to /etc/ (or the registry on Windows) for proxy information, font configuration and accessibility;
- many OpenGL client libraries need access to /etc too;
- you need access to the network, as input and output (think remote control);
- you need access to the system settings to disable screensavers, and adjust brightness;
- you need access to mounts to be able to see the insertion of DVD/Bluray/USB/SD cards and such;
- you need to expose an IPC (think MPRIS on Linux);
- you need to unzip, untar, decrypt, decipher and so on;
- you need access to the fonts and the font configuration (see fontconfig).
And I probably forgot a case or two.
The point is, all those features have good reasons to exist and very good use cases; the issue is that a media player ends up requesting almost every permission except GPS and address book.
And quite a few of them are very close to kernel mode.
So, what is the solution?
Probably do a multi-process media player, like Chrome is doing, with parsers and demuxers in a different process, and different ones for decoders and renderers. Knowing that you probably need to IPC several Gb/s between them.
I've been working on such a prototype, but it's a lot of work... I accept donations :)
Thanks for that. This type of thing comes up all the time. I used to wonder how web sites could be so dangerous, but it becomes clear when you think about all the extra access developers wanted for good reasons - imagine a web browser that didn't have access to the file system, and so on. I still don't like this state of affairs, but I don't have an alternative solution. Wayland should be more secure than X, but they're starting to poke holes in there for various reasons (color picker, warp pointer for compat, etc...).
Not even multi-process. Threads on Linux can have their own seccomp profiles. You don't need to sandbox absolutely everything at the same time either. In this case, opening the file in the main, unrestricted app and spawning a new thread that reads from the existing FD and only sends you simple, time-sorted messages over a shared IPC channel or pipe is not that crazy.
Other points may be more tricky, and it's a good list of potential issues, but we can start chipping away some stuff right now. There's a lot we can fix without fixing everything at the same time.
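To make the "open in the main app, parse elsewhere" idea concrete, here is a minimal sketch. It makes several assumptions: it uses a forked child process rather than a thread, it does not actually install a seccomp filter (the Python standard library has no binding for that; the comment marks where one would go), and the one-cue-per-line format and the names `parse_cues` and `read_subtitles` are made up for illustration. Unix only.

```python
import json
import os
import re

# Hypothetical, simplified cue format: "<start seconds> <text>" per line.
CUE = re.compile(r"^(\d+(?:\.\d+)?)\s+(.*)$")

def parse_cues(in_fd, out_fd):
    """Worker side: read from an already-open FD, emit time-sorted JSON lines."""
    with os.fdopen(in_fd, "r", encoding="utf-8", errors="replace") as src:
        cues = [m.groups() for m in map(CUE.match, src) if m]
    cues.sort(key=lambda c: float(c[0]))
    with os.fdopen(out_fd, "w", encoding="utf-8") as dst:
        for start, text in cues:
            dst.write(json.dumps({"t": float(start), "text": text}) + "\n")

def read_subtitles(path):
    """Privileged side: open the file, then delegate parsing to a child."""
    in_fd = os.open(path, os.O_RDONLY)
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: this is where a restrictive seccomp filter would be
        # installed (assumption: only read/write/exit are needed from here).
        os.close(r)
        try:
            parse_cues(in_fd, w)
        finally:
            os._exit(0)
    # Parent: drop its copies of the child's FDs, then read simple messages.
    os.close(in_fd)
    os.close(w)
    with os.fdopen(r, "r", encoding="utf-8") as pipe:
        messages = [json.loads(line) for line in pipe]
    os.waitpid(pid, 0)
    return messages
```

The parent never hands the worker a path, only a read-only descriptor and the write end of a pipe, so even a compromised parser has nothing else to reach for once it is confined.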
> Threads on Linux can have their own seccomp profiles.
Not on Windows or on macOS.
> new thread that will read from the existing FD and only send you simple, time sorted messages over a shared IPC/pipe is not that crazy.
Of course, that does not solve anything, because your demuxers, decoders and outputs need access to the FS and to kernel-mode components, and those are the dangerous parts.
> > Threads on Linux can have their own seccomp profiles.
> Not on Windows or on macOS.
It's a shame, then, that Windows & macOS are holding back security improvements for software running on Linux. I understand (& even agree with!) your desire to have a sandboxing mechanism which runs acceptably on all supported systems; it's just sad that this security mechanism in the Linux kernel can't be taken advantage of in vlc.
I'm not sure what you're trying to say. Yes, I meant Linux. Yes, it can solve the issue of separate subtitle files, which this article is about. Read access to an existing FD is not the same as full FS access, and there's no demux involved here.
> Probably do a multi-process media player, like Chrome is doing, with parsers and demuxers in a different process, and different ones for decoders and renderers. Knowing that you probably need to IPC several Gb/s between them.
That's not actually how Chrome's renderer sandboxing works. Both Windows and OS X allow you to share a GPU-resident texture between processes (DXGI shared surfaces and IOSurface respectively), so there's no need to copy any video data.
But you need to pass data from the access to the stream_filter, from the stream_filter to the demuxer, from the demuxer to several decoders, from the decoders to potentially a few video-filters and chroma-converters, and then finally to the output. Each of them needs a different access policy, and several of them require FS access.
The last part is just one of the issues, very far from all of them.
Seriously, stop thinking that no one has given a thought to the question...
These shouldn't require IPC at GB/s speed either. Modern sandboxes, like the one in Chrome, have a broker process which can open filesystem objects, device objects and sockets (file descriptors or handles) and pass them to a sandboxed decoder/renderer process, so there would be no need to stream filesystem data to the sandbox when the sandbox could do the file I/O itself. Even for Matroska ordered chapters, where the demuxer would have to tell the broker which files to open, the broker could enforce certain rules, such as enforcing that local mkv files only reference other local files, the files are all in the same directory, and that the files are always opened in read-only mode.
As for isolating decoders from video filters and chroma conversion, I'm not sure why that would be necessary, since those shouldn't require any additional privileges. I understand that retrofitting an existing program to use a multi-process sandboxing model is far from easy, and I'm definitely not volunteering to do it, but I don't think there is anything specific about a video player that is harder to sandbox than a web browser.
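The broker rules described above for Matroska ordered chapters (local files only, same directory, read-only) could look roughly like this. It is a sketch under assumptions, not Chromium's broker: `broker_open` is a hypothetical name, the policy is the one from the comment, and there is still a small race between the check and the open that a real broker would have to close.

```python
import os

def broker_open(primary_path, requested_name):
    """Open a file referenced by `primary_path` on behalf of a sandboxed demuxer.

    Hypothetical policy: the referenced file must live in the same directory
    as the primary file, must be a regular file, and is opened read-only.
    Returns a file descriptor that the broker could pass to the sandbox.
    """
    base_dir = os.path.dirname(os.path.realpath(primary_path))
    # realpath() resolves symlinks and "..", so the check is on the real target.
    candidate = os.path.realpath(os.path.join(base_dir, requested_name))
    if os.path.dirname(candidate) != base_dir:
        raise PermissionError(f"{requested_name!r} is outside {base_dir!r}")
    if not os.path.isfile(candidate):
        raise FileNotFoundError(candidate)
    # O_NOFOLLOW (where available) refuses a symlink swapped in after the check.
    return os.open(candidate, os.O_RDONLY | getattr(os, "O_NOFOLLOW", 0))
```

On Unix the resulting descriptor would typically be handed to the sandboxed process over a Unix domain socket (SCM_RIGHTS), so the demuxer does its own reads without ever being able to open a path itself.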
> I understand that retrofitting an existing program to use a multi-process sandboxing model is far from easy, and I'm definitely not volunteering to do it,
I don't think nobody has thought about it, but since you were apparently unaware that there was an alternative to performing several GB/s of IPC for moving buffers around, there are obviously some options that haven't been considered. The Chromium sandbox has to deal with every issue you've listed (it's even calibrated to run ffmpeg inside the sandbox, since that's something Chromium needs to do).
It's not an attack. Each platform's methods of GPU IPC are pretty sparsely documented. Two months ago I wouldn't have known about them; I only learned by working on integrating Chromium's sandbox into an application that needed to work with the GPU within a sandboxed process.
That doesn't change the fact that none of the things you listed are unsupported by Chrome's sandbox model, and if you only need to establish a barrier around the video pipeline (and not e.g. VLC's ability to notice device status or interact with webcams) you don't even need 3/4 of what Chromium's sandbox has implemented. Like I said, I've actually walked the walk when it comes to using their sandbox for Windows and Linux with a process that needed to access certain user files, the GPU, and even each platform's font server equivalent, so this isn't me just spitballing about some theoretical solution.
A 10-year-old PC carries 2.1 GiB/s (≈ 18 Gbps) over bog-standard pipes without tuning or parallelism, as measured by "pv /dev/zero | cat > /dev/null". Uncompressed full HD is 1.5-3 Gbps. (Less, actually, since codec output is going to be 4:2:2 or similar.)
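For anyone who wants to check those numbers, the arithmetic is short. The frame rates and the 8-bit sample depth below are my assumptions; the 2.1 GiB/s figure is the one quoted above.

```python
def raw_video_gbps(width, height, fps, bits_per_pixel):
    """Uncompressed video bitrate in Gbit/s (10**9 bits per second)."""
    return width * height * fps * bits_per_pixel / 1e9

# Bits per pixel for 8-bit samples: RGB/4:4:4 = 24, 4:2:2 = 16, 4:2:0 = 12.
for label, fps, bpp in [("1080p30 4:4:4", 30, 24), ("1080p60 4:4:4", 60, 24),
                        ("1080p60 4:2:2", 60, 16), ("1080p60 4:2:0", 60, 12)]:
    print(f"{label}: {raw_video_gbps(1920, 1080, fps, bpp):.2f} Gbit/s")

# The pipe throughput quoted above, converted from GiB/s to Gbit/s.
pipe_gbps = 2.1 * 2**30 * 8 / 1e9
print(f"pipe: {pipe_gbps:.1f} Gbit/s")
```

8-bit 4:4:4 at 30 and 60 fps gives roughly 1.5 and 3 Gbit/s, which matches the range above; with chroma subsampling, 1080p60 drops to about 2 Gbit/s (4:2:2) or 1.5 Gbit/s (4:2:0), against about 18 Gbit/s for the pipe.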
Yeah, you can come up with high bandwidth scenarios like stereo VR 144 Hz 4k HDR running on barely capable hardware. But 99% of users don't require such tricks and never see any upside from the performance-over-security compromise.
Even if you decide basic IPC is not fast enough, a shared memory buffer for raw frame data is reasonably secure too.
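A rough sketch of the shared-buffer variant, assuming a Unix system, a forked child standing in for the sandboxed decoder, and one 8-bit 4:2:0 1080p frame; the function name and the four marker bytes are placeholders for illustration.

```python
import mmap
import os

WIDTH, HEIGHT = 1920, 1080
FRAME_BYTES = WIDTH * HEIGHT * 3 // 2  # one 8-bit 4:2:0 frame

def decode_into_shared_buffer():
    """Fork a 'decoder' child that writes a frame the parent reads in place."""
    # Anonymous mapping; on Unix mmap defaults to MAP_SHARED, so the pages
    # stay shared with the forked child.
    frame = mmap.mmap(-1, FRAME_BYTES)
    pid = os.fork()
    if pid == 0:
        # Child: stand-in for a sandboxed decoder writing pixel data.
        frame[:4] = b"\x10\x20\x30\x40"
        os._exit(0)
    # Crude synchronisation; a real player would signal per frame instead.
    os.waitpid(pid, 0)
    first_bytes = bytes(frame[:4])
    frame.close()
    return first_bytes
```

No frame data crosses a pipe here; only the "frame ready" notification would. The reading side should still treat the buffer contents as untrusted bytes.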
All this means Linux is misdesigned for user apps, forcing low-level code instead of proper APIs.
Maybe stuffing everything into the kernel isn't a good idea after all? All these things are exploit attack surfaces.
You actually don't NEED a lot of these things.
I'm perfectly fine with a default / embedded font.
I don't have an optical drive.
A database can be in the local app storage.
I'm fine opening a subtitle file myself.
Why would I need IPC?
Why would I need to unzip anything? If it's subtitle files, it can be done in-memory.
Are you sure we need low-level audio?
I don't have a remote, so I'd like it to be disabled by default. I don't need any access to the network.