Ha, the famous sandboxing remark. I wish it was that simple!

So, let me shed some light on sandboxing for multimedia (I work on VLC).

If you sandbox an application like VLC, in the current way of doing sandboxing, which we've done for macOS, WinRT/UWP, and snaps, you still need a lot of permissions.

Namely:

- you need to be able to open files without user interactions (no file picker), in order to open playlist, MXF or MKV files;

- you need the same if ever you have a database of files (media center oriented);

- you need raw access to /dev/* to play DVDs, CDs and other optical discs (and the equivalent on Windows);

- you need ioctl on such devices, to pass the MMC for DVD/Bluray;

- you need raw access to /dev/v4l* for your webcams and be able to control them;

- you need access to the GPU stack, which is running in kernel-mode, btw, to output video and get hw acceleration;

- you need access to the audio stack, also in low-level mode;

- you need access to the DSP acceleration (not always the GPU);

- on Linux, you have access to X11 for the 3 features above, which is almost root;

- you need access to /etc/ (registry) for proxy information, font configuration and accessibility;

- many OpenGL client libraries need access to the /etc too;

- you need access to the network, as input and output (think remote control);

- you need access to the system settings to disable screensavers, and adjust brightness;

- you need access to mounts to be able to see the insertion of DVD/Bluray/USB/SD cards and such;

- you need to expose an IPC (think MPRIS on Linux);

- you need to unzip, untar, decrypt, decipher and so on;

- you need access to the fonts and the fonts configuration (see fontconfig).

and I probably forgot one or another case.

The point is, all those features have good reasons to exist and very good use cases; but the issue is that for a media player, it will request almost all permissions except GPS and address book.

And quite a few of them are very close to kernel mode.

So, what is the solution?

Probably do a multi-process media player, like Chrome is doing, with parsers and demuxers in a different process, and different ones for decoders and renderers. Knowing that you probably need to IPC several Gb/s between them.

I've been working on such a prototype, but it's a lot of work... I accept donations :)




Thanks for that. This type of thing comes up all the time. I used to wonder how web sites could be so dangerous, but it becomes clear when you think about all the extra access developers wanted for good reasons - imagine a web browser that didn't have access to the file system, and so on. I still don't like this state of affairs, but I don't have an alternative solution. Wayland should be more secure than X, but they're starting to poke holes in there for various reasons (color picker, warp pointer for compat, etc...).


Not even multi-process. Threads on Linux can have their own seccomp profiles. You don't need to sandbox absolutely everything at the same time either. In this case opening the file in the main, unrestricted app and spawning a new thread that will read from the existing FD and only send you simple, time sorted messages over a shared IPC/pipe is not that crazy.

Other points may be more tricky, and it's a good list of potential issues, but we can start chipping away some stuff right now. There's a lot we can fix without fixing everything at the same time.
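A minimal sketch of the subtitle idea described above, under some loud assumptions: it uses a forked child process rather than a thread, it leaves out the seccomp filter entirely (a comment marks where one would be installed), and `worker`, `read_subtitles` and the JSON message format are hypothetical names chosen for illustration. The unrestricted side opens the file; the worker only reads from the inherited FD and writes time-sorted cues to a pipe.

```python
import json
import os
import re
import tempfile

# Matches the start time of an SRT cue line, e.g. "00:00:01,500 --> 00:00:03,000".
CUE_START = re.compile(r"(\d+):(\d+):(\d+),(\d+) -->")


def worker(sub_fd, out_fd):
    """Untrusted side: parse subtitle bytes from an already-open FD and
    emit one JSON message per cue, sorted by start time."""
    data = b""
    while chunk := os.read(sub_fd, 65536):
        data += chunk
    cues = []
    for block in data.decode("utf-8", "replace").split("\n\n"):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            m = CUE_START.match(line)
            if m:
                h, mnt, s, ms = map(int, m.groups())
                start_ms = ((h * 60 + mnt) * 60 + s) * 1000 + ms
                cues.append((start_ms, " ".join(lines[i + 1:])))
                break
    cues.sort()
    with os.fdopen(out_fd, "w") as out:
        for start_ms, text in cues:
            out.write(json.dumps({"start_ms": start_ms, "text": text}) + "\n")


def read_subtitles(path):
    """Trusted side: open the file, hand the FD to the worker, read messages."""
    sub_fd = os.open(path, os.O_RDONLY)
    pipe_r, pipe_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        os.close(pipe_r)
        # Assumption: a seccomp filter (or similar) would be installed here,
        # before any untrusted bytes are parsed. It is omitted in this sketch.
        try:
            worker(sub_fd, pipe_w)
        finally:
            os._exit(0)
    os.close(pipe_w)
    os.close(sub_fd)
    with os.fdopen(pipe_r) as inp:
        cues = [json.loads(line) for line in inp]
    os.waitpid(pid, 0)
    return cues


# Demo with two cues stored out of order.
with tempfile.NamedTemporaryFile("w", suffix=".srt", delete=False) as tmp:
    tmp.write("2\n00:00:05,000 --> 00:00:06,000\nsecond\n\n"
              "1\n00:00:01,500 --> 00:00:03,000\nfirst\n")
cues = read_subtitles(tmp.name)
os.unlink(tmp.name)
```

This only covers the subtitle-parsing case; it says nothing about demuxers, decoders or outputs, which need much more than a single FD.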


> Threads on Linux can have their own seccomp profiles.

Not on Windows or on macOS.

> new thread that will read from the existing FD and only send you simple, time sorted messages over a shared IPC/pipe is not that crazy.

Of course that does not solve anything, because your demuxers, decoders and outputs need access to the FS and to kernel mode, and those are the dangerous parts.


> > Threads on Linux can have their own seccomp profiles.

> Not on Windows or on macOS.

It's a shame, then, that Windows & macOS are holding back security improvements for software running on Linux. I understand (& even agree with!) your desire to have a sandboxing mechanism which runs acceptably on all supported systems; it's just sad that this security mechanism in the Linux kernel can't be taken advantage of in vlc.


Well, no. Because you can do it per-process. I don't see the reason for doing it per-thread here.


I'm not sure what you're trying to say. Yes, I meant Linux. Yes, it can solve the issue of separate subtitle files, which this article is about. Read access to an existing FD is not the same as full FS access, and there's no demux involved here.


> Yes, I meant Linux

The demo is on Windows. The goal is to do a sandbox that works on most OSes.

And, it will not solve the decoder issue, since it is on the decoding side, which still has access to the GPU/Aout and the kernel.

> Read access to an existing FD is not the same as full FS access, and there's no demux involved here.

You're totally missing the point here. The issue is demuxers/decoders/output, not really the access.

Reading from an FD or not would not solve the buffer overflow exploitation (if it was actually exploitable).


> Not even multi-process. Threads on Linux can have their own seccomp profiles.

Feels kinda pointless, since all threads in a process share the same memory protection.


They don't have to. clone() can do a lot of magic without full processes.


But then you need to copy the memory from the decoder to the video output, or you are back to the same problem.


No, you can use a shared memory segment for a buffer just for that.

It's more coding, certainly, but it's possible. Security is an option if we wanted it.
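As a rough illustration of the shared-segment idea, here is a sketch using Python's `multiprocessing.shared_memory`. The frame size, the 8-bit 4:2:0 layout and the function names are assumptions made for the example, and both sides run in one process here purely to keep it short; in a real design the decoder and the output would attach to the segment from separate processes.

```python
from multiprocessing import shared_memory

# Assumed toy frame: 64x36 pixels, 8-bit 4:2:0 (1.5 bytes per pixel).
WIDTH, HEIGHT = 64, 36
LUMA_BYTES = WIDTH * HEIGHT
FRAME_BYTES = LUMA_BYTES * 3 // 2


def decoder_write_frame(shm_name, luma_value):
    """Decoder side: attach to the segment by name and fill one frame."""
    shm = shared_memory.SharedMemory(name=shm_name)
    try:
        shm.buf[:LUMA_BYTES] = bytes([luma_value]) * LUMA_BYTES
        # Neutral chroma for the remaining planes.
        shm.buf[LUMA_BYTES:FRAME_BYTES] = bytes([128]) * (FRAME_BYTES - LUMA_BYTES)
    finally:
        shm.close()


def output_read_luma(shm_name, x, y):
    """Output side: attach to the same segment and read a pixel in place,
    without the frame ever travelling through a pipe."""
    shm = shared_memory.SharedMemory(name=shm_name)
    try:
        return shm.buf[y * WIDTH + x]
    finally:
        shm.close()


# The owner creates the segment; only its name has to cross the IPC channel.
segment = shared_memory.SharedMemory(create=True, size=FRAME_BYTES)
try:
    decoder_write_frame(segment.name, 200)
    pixel = output_read_luma(segment.name, 10, 5)
finally:
    segment.close()
    segment.unlink()
```

Whether this counts as "copying" depends on where the decoder writes its output; the sketch only shows that the frame data itself does not need to go through a message channel.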


That's exactly the point above. See my above comment.


I'm not sure, it sounds like you're saying we'd need to copy memory.

The shared memory segment can be a GPU image buffer, so I don't think that's true.


See comment above with "the solution".

Either you need to have multi-process and correct IPC, or you need to copy.


> Probably do a multi-process media player, like Chrome is doing, with parsers and demuxers in a different process, and different ones for decoders and renderers. Knowing that you probably need to IPC several Gb/s between them.

That's not actually how Chrome's renderer sandboxing works. Both Windows and OS X allow you to share a GPU-resident texture between processes (DXGI shared surfaces and IOSurface respectively), so there's no need to copy any video data.


But you need to pass data from the access to the stream_filter, from the stream_filter to the demuxer, from the demuxer to several decoders, from the decoders to potentially a few video-filters and chroma-converters, and then finally to the output. Each of them needs different access policies, and several of them require FS access.

The last part is just one of the issues, very far from all of them.

Seriously, stop thinking that no one has given a thought to the question...


These shouldn't require IPC at GB/s speed either. Modern sandboxes, like the one in Chrome, have a broker process which can open filesystem objects, device objects and sockets (file descriptors or handles) and pass them to a sandboxed decoder/renderer process, so there would be no need to stream filesystem data to the sandbox when the sandbox could do the file I/O itself. Even for Matroska ordered chapters, where the demuxer would have to tell the broker which files to open, the broker could enforce certain rules, such as enforcing that local mkv files only reference other local files, the files are all in the same directory, and that the files are always opened in read-only mode.

As for isolating decoders from video filters and chroma conversion, I'm not sure why that would be necessary, since those shouldn't require any additional privileges. I understand that retrofitting an existing program to use a multi-process sandboxing model is far from easy, and I'm definitely not volunteering to do it, but I don't think there is anything specific about a video player that is harder to sandbox than a web browser.
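The broker idea in this comment can be sketched roughly as follows. This is not Chromium's actual broker API; `broker_open`, `send_fd` and `receive_fd` are hypothetical helpers, and both ends of the socket pair live in one process only to keep the example self-contained. The policy mirrors the rules mentioned above: the requested file must sit in the same directory as the primary file, and it is opened read-only before the descriptor is passed over a Unix socket (`socket.send_fds` / `socket.recv_fds`, Python 3.9+ on Unix).

```python
import os
import socket
import tempfile


def broker_open(primary_path, requested_name):
    """Broker policy: only open files located in the same directory as the
    primary media file, and only in read-only mode."""
    base_dir = os.path.dirname(os.path.realpath(primary_path))
    target = os.path.realpath(os.path.join(base_dir, requested_name))
    if os.path.dirname(target) != base_dir:
        raise PermissionError(f"{requested_name!r} is outside {base_dir!r}")
    return os.open(target, os.O_RDONLY)


def send_fd(sock, fd):
    """Broker side: pass an open descriptor over a Unix socket."""
    socket.send_fds(sock, [b"F"], [fd])


def receive_fd(sock):
    """Sandbox side: receive the descriptor; no path is ever opened here."""
    _msg, fds, _flags, _addr = socket.recv_fds(sock, 1, 1)
    return fds[0]


with tempfile.TemporaryDirectory() as media_dir:
    primary = os.path.join(media_dir, "main.mkv")
    with open(primary, "wb") as f:
        f.write(b"primary")
    with open(os.path.join(media_dir, "part2.mkv"), "wb") as f:
        f.write(b"linked segment")

    broker_sock, sandbox_sock = socket.socketpair()

    # Allowed request: a sibling file, delivered as a read-only FD.
    fd = broker_open(primary, "part2.mkv")
    send_fd(broker_sock, fd)
    os.close(fd)
    received = receive_fd(sandbox_sock)
    payload = os.read(received, 100)
    try:
        os.write(received, b"x")
        write_allowed = True
    except OSError:
        write_allowed = False
    os.close(received)

    # Denied request: a path that escapes the media directory.
    try:
        broker_open(primary, "../outside.mkv")
        escape_denied = False
    except PermissionError:
        escape_denied = True

    broker_sock.close()
    sandbox_sock.close()
```

A real broker would also have to deal with races between the path check and the open, and with the network and device cases discussed in the thread.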


> I understand that retrofitting an existing program to use a multi-process sandboxing model is far from easy, and I'm definitely not volunteering to do it,

Yes, that's the core of the issue.


I don't think nobody has thought about it, but since you were apparently unaware that there was an alternative to performing several GB/s of IPC for moving buffers around, there are obviously some options that haven't been considered. The Chromium sandbox has to deal with every issue you've listed (it's even calibrated to run ffmpeg inside the sandbox, since that's something Chromium needs to do).


> but since you were apparently unaware that there was an alternative

I will refrain from answering such attacks. As you seem to know better, I'm waiting for your patches.


It's not an attack. Each platform's methods of GPU IPC are pretty sparsely documented. Two months ago I wouldn't have known about them; I only learned by working on integrating Chromium's sandbox into an application that needed to work with the GPU within a sandboxed process.

That doesn't change the fact that none of the things you listed are unsupported by Chrome's sandbox model, and if you only need to establish a barrier around the video pipeline (and not e.g. VLC's ability to notice device status or interact with webcams) you don't even need 3/4 of what Chromium's sandbox has implemented. Like I said, I've actually walked the walk when it comes to using their sandbox for Windows and Linux with a process that needed to access certain user files, the GPU, and even each platform's font server equivalent, so this isn't me just spitballing about some theoretical solution.


What features could the OS offer you (to help your program be "sandboxed") that it currently does not?


I think we can do everything now for the major OSes, but I'd guess this is 50-100 man-months of work for VLC.


You don't need special fast IPC. Even uncompressed video is fine over standard IPC.


Blurays are 60Mbps.

Then with 4K60 + HDR, displaying is quite a lot of bandwidth.


A 10 year old PC carries 2.1 GiB/s (= 17 Gbps) over bog standard pipes without tuning or parallelism, as measured by "pv /dev/zero | cat > /dev/null". Uncompressed full HD is 1.5-3 Gbps. (Less actually, since codec output is going to be 4:2:2 or similar)

Yeah, you can come up with high bandwidth scenarios like stereo VR 144 Hz 4k HDR running on barely capable hardware. But 99% of users don't require such tricks and never see any upside from the performance-over-security compromise.

Even if you decide basic IPC is not fast enough, a shared memory buffer for raw frame data is reasonably secure too.
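The figures in this comment can be sanity-checked with a quick back-of-the-envelope calculation. The pixel formats and the 60 fps frame rate below are assumptions picked for illustration (12 bits per pixel for 8-bit 4:2:0, 24 for 8-bit 4:4:4/RGB, 15 for 10-bit 4:2:0); the comment itself does not say which ones it had in mind.

```python
def raw_video_gbps(width, height, fps, bits_per_pixel):
    """Bandwidth of uncompressed video in gigabits per second (10^9 bits)."""
    return width * height * fps * bits_per_pixel / 1e9


# Full HD at 60 fps.
full_hd_420 = raw_video_gbps(1920, 1080, 60, 12)   # about 1.49 Gbps
full_hd_444 = raw_video_gbps(1920, 1080, 60, 24)   # about 2.99 Gbps

# 4K at 60 fps with 10-bit 4:2:0, a plausible HDR case.
uhd_hdr_420 = raw_video_gbps(3840, 2160, 60, 15)   # about 7.46 Gbps

# The quoted pipe throughput, read as GiB/s and as GB/s.
pipe_gbps_binary = 2.1 * 2**30 * 8 / 1e9           # about 18.0 Gbps
pipe_gbps_decimal = 2.1 * 1e9 * 8 / 1e9            # 16.8 Gbps

for name, value in [
    ("Full HD 4:2:0", full_hd_420),
    ("Full HD 4:4:4", full_hd_444),
    ("4K60 10-bit 4:2:0", uhd_hdr_420),
    ("pipe (GiB/s reading)", pipe_gbps_binary),
    ("pipe (GB/s reading)", pipe_gbps_decimal),
]:
    print(f"{name}: {value:.2f} Gbps")
```

Under these assumptions the 1.5-3 Gbps range for full HD holds at 60 fps, and the quoted pipe figure lands between roughly 17 and 18 Gbps depending on the unit. A single 4K60 HDR stream would still fit, though this ignores per-frame overhead and everything else the machine is doing.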


Knowing that today we still see bandwidth issues in VLC, even without IPC, I kind of doubt it.


All this means Linux is misdesigned for user apps, forcing low-level code instead of proper APIs. Maybe stuffing everything into the kernel isn't a good idea after all? All these things are exploit attack surfaces.


Interesting.

I am only interested in these features:

- you need access to the GPU stack, which is running in kernel-mode, btw, to output video and get hw acceleration;

- you need access to the audio stack, also in low-level mode;

- you need access to the DSP acceleration (not always the GPU);

- you need access to the system settings to disable screensavers, and adjust brightness;

- you need to unzip, untar, decrypt, decipher and so on;

- many OpenGL client libraries need access to the /etc too;

Is there a lighter version where these features are cut?

- you need to be able to open files without user interactions (no file picker), in order to open playlist, MXF or MKV files;

- you need the same if ever you have a database of files (media center oriented);

- you need raw access to /dev/* to play DVDs, CDs and other optical discs (and the equivalent on Windows);

- you need ioctl on such devices, to pass the MMC for DVD/Bluray;

- you need raw access to /dev/v4l* for your webcams and be able to control them;

- on Linux, you have access to X11 for the 3 features above, which is almost root;

- you need access to /etc/ (registry) for proxy information, font configuration and accessibility;

- you need access to the network, as input and output (think remote control);

- you need access to mounts to be able to see the insertion of DVD/Bluray/USB/SD cards and such;

- you need to expose an IPC (think MPRIS on Linux);

- you need access to the fonts and the fonts configuration (see fontconfig).


The first group is the dangerous part.


You actually don't NEED a lot of these things. I'm perfectly fine with a default / embedded font. I don't have an optical drive. A database can be in the local app storage. I'm fine opening a subtitle file myself. Why would I need IPC? Why would I need to unzip anything? If it's subtitle files, it can be done in-memory. Are you sure we need low-level audio?

I don't have a remote, so I'd like it to be disabled by default. I don't need any access to the network.

etc. etc. etc


Those restrictions work for you, but would make VLC borderline useless for me.

> I don't need any access to the network.

90+% of what I use it for comes from my NAS or the Internet.

> I don't have an optical drive

Most of the rest is from optical discs.

> I'm perfectly fine with a default / embedded font. [...] I'm fine opening a subtitle file myself.

It's _fine_ but far from ideal. Both are useful quality of life features.

> Why would I need to unzip anything?

Non-essential, but being able to play video from a ZIP is a useful feature.


Let's play, "why isn't my use case the only use case."


Congratulations, you don't need those things. What about the other 19.999.999* users? Are you sure they don't need any of those things? :)

* Arbitrary number.


That's you. Most users expect one or more of the other mentioned features.



