Openrsync imported into the tree

geofft · on Feb 11, 2019

> The actual work of porting, however, is matching the security features provided by OpenBSD's pledge(2) and unveil(2). These are critical elements to the functionality of the system. Without them, your system accepts arbitrary data from the public network. ... rsync has specific running modes for the super-user. It also pumps arbitrary data from the network onto your file-system. Do you want that running without specific mitigation in place?

This is a confusing claim. What exactly does "accepts arbitrary data from the public network" mean? (Most servers do that; they just choose not to process the data without additional validation.) And in what way is it critical to the functionality of the system?

Is the claim that, after calling pledge() and unveil(), the openrsync process is happy to satisfy arbitrary read/write requests from the other side of the connection, and so without them it is insecure?

Does openrsync view peer-induced memory corruption after pledge() or unveil() as a vulnerability? Or is the idea that the attacker can already "pump arbitrary data from the network onto your filesystem" and that the attacker gaining control flow is not a meaningful escalation of privileges?

My impression is that pledge() and unveil() are hardening tools, intended to limit the damage from a process that has already gotten out of control (in the same way that e.g. running Apache as non-root does not mean that you're actively fine letting attackers run code as www-data). Is that impression wrong? Is openrsync using them for the basic functionality of making sure that a file is only being rsynced to the filename given on the command line?

bentley · on Feb 12, 2019

I’m trying to understand your question.

Typically, a process once compromised can do all sorts of things: touch files, access the network, execute programs, and so on. Among other things, OpenBSD’s security culture focuses on mitigating the damage done by compromised code through development practices such as privilege separation.

Traditionally this was done by splitting functionality into multiple processes, each serving a specific purpose such as doing network communication or parsing configuration, and dropping privileges in any way possible such as chrooting and switching to a dedicated user. Thus the attack surface is reduced, and the potential damage done by a compromised (sub‐)process is reduced as well.

pledge() and unveil() are the latest evolution in OpenBSD’s technique. pledge() whitelists syscalls, and unveil() whitelists the files that can be accessed.

So your process reads this arbitrary data from the public network. You validate it through some function and pass the data on to the next stage of your program. But what if there’s a bug in your validator, and your process gets compromised?

If your process hasn’t had its capabilities reduced, the attacker can do practically anything, especially if the process has superuser privileges.

But if the program uses a multi‐process privilege‐separated architecture, your validation process can’t access the filesystem or the network and isn’t running as root. If it tries, the kernel will kill it for violating its pledge. All the compromised process can do is pass malicious data through whatever interface you’ve provided between your validator and filesystem processes, hopefully an interface that is simple, well‐defined, and well‐audited.

What if your filesystem process gets compromised? With pledge() it can’t access the network or execute external code. With unveil(), even its file accesses are limited to the files whitelisted earlier in the program. It can’t read your SSH keys or delete your photos.
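
As a rough illustration of the validator/filesystem split described above, here is a minimal Python sketch. It is not taken from openrsync or acme-client: the message format and the names `validate` and `run_validator` are hypothetical. Python's standard library has no pledge() binding, so the point where an OpenBSD program would drop privileges is only marked with a comment.

```python
import json
import os


def validate(raw: bytes) -> dict:
    """Hypothetical validator: accept only {"name": str, "size": int >= 0}."""
    msg = json.loads(raw)
    if not isinstance(msg, dict):
        raise ValueError("not an object")
    name, size = msg.get("name"), msg.get("size")
    if not isinstance(name, str) or name in ("", ".", "..") or "/" in name:
        raise ValueError("bad file name")
    if type(size) is not int or size < 0:
        raise ValueError("bad size")
    return {"name": name, "size": size}


def run_validator(raw: bytes):
    """Parse untrusted bytes in a child process; return the result or None."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: the only process that ever parses the untrusted input.
        # An OpenBSD program would drop privileges here (e.g. pledge("stdio")),
        # so a compromised child could do little besides write to the pipe.
        os.close(read_fd)
        status = 1
        try:
            os.write(write_fd, json.dumps(validate(raw)).encode())
            status = 0
        finally:
            # Always leave via _exit so the child never runs parent code.
            os._exit(status)

    # Parent: sees only what arrives over the narrow pipe interface.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as pipe:
        data = pipe.read()
    _, wait_status = os.waitpid(pid, 0)
    if not (os.WIFEXITED(wait_status) and os.WEXITSTATUS(wait_status) == 0):
        return None
    # Re-check the child's output: a compromised child could send anything.
    try:
        return validate(data)
    except ValueError:
        return None
```

The parent never touches the raw network bytes itself. It only re-checks the small message that comes back over the pipe, which is the "simple, well-defined" interface in this sketch.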

Certainly, if the process can be compromised that’s a bug that needs to be fixed. But we see new bugs constantly in the software we use every day. It’s a safe bet to say we will encounter more. By using a secure architecture, the damage these bugs can cause is drastically reduced.

There’s a really good description and demonstration of privilege separation in another project by Kristaps, acme-client (a Let’s Encrypt/certbot alternative): https://kristaps.bsd.lv/acme-client/

Another such project is Google Chrome, which uses pledge() and unveil() on OpenBSD.

geofft · on Feb 12, 2019

My question is that the README implies that pledge() and unveil() are required for functionality, to the point that porting to an OS without support for that is an inherently questionable idea. That certainly isn't true of Chrome (I run it on non-OpenBSD and there are no functionality / security issues in doing so). That isn't true of OpenSSH, which also supports pledge() on OpenBSD and still is pretty secure on other OSes.

I do expect this is structured as you describe - that it has a validator, and that it uses these kernel features as additional hardening if the validator has a bug. But I would not describe that as requiring pledge() / unveil() and certainly not requiring it for functionality. So I don't know what the author means.

And I am worried, in particular, that the author means that the validator is not very strong, and the bulk of the validation is that it unveils the filesystem to the files it's supposed to write to and then blindly trusts the input, on the grounds that the worst the remote side could do is corrupt files but it could just have sent different contents for the files in the first place. This seems unlikely to me, but I'm having trouble figuring out an alternate interpretation.

bentley · on Feb 12, 2019

> My question is that the README implies that pledge() and unveil() are required for functionality, to the point that porting to an OS without support for that is an inherently questionable idea.

Knowing Kristaps, he probably considers strong privsep and privdrop basic functionality. That is after all why he developed acme-client in the first place; he acknowledged at the time the plethora of “lightweight” certbot alternatives but was more concerned with security architecture.

> That certainly isn't true of Chrome (I run it on non-OpenBSD and there are no functionality / security issues in doing so). That isn't true of OpenSSH, which also supports pledge() on OpenBSD and still is pretty secure on other OSes.

Chrome uses different techniques depending on the platform. On OpenBSD it uses pledge() and unveil(), while on Linux it uses seccomp. Kristaps isn’t a fan of seccomp’s complexity, as he mentions in the readme: “Linux's security facilities are a mess, and will take an expert hand to properly secure.” He’s not suggesting it can’t be done, and the Google Chrome team in particular has the kind of expertise he’s talking about.

For projects of less‐than‐Chrome scale, though, Kristaps feels that seccomp is too difficult: https://github.com/kristapsdz/acme-client-portable/blob/mast...

> And I am worried, in particular, that the author means that the validator is not very strong, and the bulk of the validation is that it unveils the filesystem to the files it's supposed to write to and then blindly trusts the input, on the grounds that the remote side could just have sent different files.

I don’t understand this interpretation. It’s not what I got from the readme at all. What kind of validation do you expect Kristaps to be overlooking?

aidenn0 · on Feb 12, 2019

It is possible to read the readme[1] to imply that unveil is the only protection from escaping the root (e.g. with a ".." directory). The only way to know for sure is to dig through the code though.

[1] https://github.com/kristapsdz/openrsync
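
For reference, the kind of application-level check being discussed (rejecting a peer-supplied path that escapes the destination root) can be sketched in a few lines of Python. This only illustrates the general technique. Whether and how openrsync performs such a check is something only its source can answer, and the name `resolve_inside` is hypothetical.

```python
import os


def resolve_inside(root: str, untrusted: str) -> str:
    """Return the real path of `untrusted` under `root`, or raise ValueError.

    Resolves "..", absolute paths and symlinks before comparing, so none of
    them can be used to step outside the destination root.
    """
    root_real = os.path.realpath(root)
    candidate = os.path.realpath(os.path.join(root_real, untrusted))
    if os.path.commonpath([root_real, candidate]) != root_real:
        raise ValueError(f"path escapes destination root: {untrusted!r}")
    return candidate
```

With a check like this in place, unveil() would act as a second layer of defense rather than the only barrier.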

admax88q · on Feb 11, 2019

You're not wrong, but the point being made is: wouldn't you want a tool that writes data from the network to disk to have those mitigations enabled?

hawski · on Feb 11, 2019

From a comment on the site: "(...) its (original rsync's) compressed manual page is almost as big as the compressed openrsync sources (...)"

Its license (ISC ofc.) and size make it a great resource for studying rsync. I would like to have Dropbox on my phone as the legendary combination of rsync and cron. It may be nice to have a port to Java so it would work without JNI, but maybe that's only my fetish.

ComputerGuru · on Feb 11, 2019

I just want to point out that rsync is, in fact, no longer ISC licensed but rather GPL (v3, at that), which is likely a big part of the reason this new implementation even exists.

meruru · on Feb 12, 2019

rsync was never ISC licensed afaik. The parent is referring to openrsync's license.

tinus_hn · on Feb 12, 2019

Rsync was developed by the Samba people, it is under the same license (GPL).

accrual · on Feb 12, 2019

Very cool news. rsync(1) is one of the first things I install on a new OpenBSD instance.

Tangentially related, I've been using a Time Machine-like wrapper [0] around rsync(1) for a few years. It's very helpful for maintaining snapshots of my home directory.

[0] https://blog.interlinked.org/tutorials/rsync_time_machine.ht...

davewongillies · on Feb 12, 2019

I use rsnapshot [0] for the same thing.

[0] https://rsnapshot.org/

amaccuish · on Feb 11, 2019

For those wondering what this is, see https://github.com/kristapsdz/openrsync

benatkin · on Feb 11, 2019

I'll try explaining it. It's a new from-scratch (clean-room) implementation of rsync, which will become the new rsync in OpenBSD. The tree it's been imported into is the OpenBSD CVS tree, which contains OpenBSD, OpenSSH, OpenCVS, and other major projects.

CaliforniaKarl · on Feb 12, 2019

I would not be surprised if, in a few years, this becomes one of the CLI tools installed on macOS, either as part of the default install or as part of the Xcode CLI tools.

gpvos · on Feb 12, 2019

Does macOS have any security features similar to pledge/unveil or any of the Linux hardening packages?

Fnoord · on Feb 12, 2019

It has a port of PF.

gpvos · on Feb 12, 2019

I'm more interested to know about system call and filesystem access restrictors. I think pf is only a packet filter.

Fnoord · on Feb 12, 2019

There's SIP and Keychain, but it does not prevent say Safari from accessing Mail or user memory in general. If macOS becomes an iOS port (instead of iOS being the derivative work of the barely used UNIX system called macOS) perhaps we'd see some of the iOS specific hardening. AFAIK that kind of sandboxing does not exist in macOS. How difficult would it be to port something like pledge or unveil to macOS?

riffraff · on Feb 12, 2019

Why? MacOS has a bunch of GPL stuff, such as bash, IIRC.

perbu · on Feb 12, 2019

GPL2, I doubt you'll find any GPL3 code in there.

Which is why bash on MacOS is from 2007.

__david__ · on Feb 12, 2019

And it already has actual rsync.

avar · on Feb 12, 2019

It has 12 year old rsync due to Apple not wanting to ship anything that's GPLv3: https://bayton.org/2018/07/how-to-update-rsync-on-mac-os-hig...

AdmiralAsshat · on Feb 11, 2019

Interesting. This is the first project I can think of where a clean-room implementation was done so that a project could use a less free license ("free" as defined by the FSF).

Does anyone else know of instances where a company did a clean-room implementation of a previously FOSS tool so that they could make a paid/proprietary version? Usually it goes the other way.

joshklein · on Feb 11, 2019

ISC is a more free license for its users. GPL protects theoretical future users of theoretical derivative software by restricting freedom for its users.

It's important to remember that GNU is Not Unix, but OpenBSD userland is much more so. There isn't much reason to protect future forks if you expect that future software should start from first principles instead of extending software until it becomes a monolith that must be protected from its own developers.

m463 · on Feb 12, 2019

That is not precisely accurate.

The GPL does not place any restrictions on how software is used, so the (literal) users are not restricted.

It restricts how it is redistributed.

joshklein · on Feb 12, 2019

Apologies, I intended "user" in my comment to mean "a developer using the license". Thank you for clarifying.

e12e · on Feb 12, 2019

This is the core difference between gnu and bsd - guaranteeing freedom for all current and future users VS all current and future distributors (in particular, the bsd guarantees the right to fork and close - often seen as essential for commercial use in a new software or software+hardware appliance; while gnu attempts to guarantee that any downstream user will always have the four freedoms).

TeMPOraL · on Feb 12, 2019

It's so easy to forget that at the very end, there are people who are using software. Developers are middlemen for most code in the products they build (think dependencies). GPL cuts through that, and always has the end-user in mind.

demoray · on Feb 11, 2019

WSL. The implementation, lxcore.sys, is a clean room implementation of the Linux kernel ABI.

protomyth · on Feb 11, 2019

How is the ISC (version of BSD license used by OpenBSD) less free than the GPL3? This is very far from a "paid/proprietary" version.

meruru · on Feb 12, 2019

Using "less free" or "more free" in this context just leads to pointless semantic debates. What happened is that someone made a clean-room implementation of a copyleft program in order to have it available under a copyfree license. Both licenses are Free.

http://copyfree.org/policy/copyleft

TeMPOraL · on Feb 12, 2019

First time I see this, thanks. The website isn't explicit about this point, but from what I gather, "copyfree" isn't viral in the way GPL is. It seems to provide "Free as in Freedom", but unlike GPL, doesn't protect that freedom from being immediately taken away.

jimktrains2 · on Feb 11, 2019

I think gp means that this code is allowed to be used in products that choose to limit the end users' freedoms.

(I don't mean that as a plus or negative, but as just a statement on one of the largest philosophical differences between the bsd-style and gpl licenses: Whose freedoms are being protected? Those of the final end user or those of the developer?)

mouldysammich · on Feb 11, 2019

It's much closer to a proprietary version than a GPL version would or could be, however.

kevin_thibedeau · on Feb 11, 2019

Proprietary-friendly is not less-free.

stavros · on Feb 12, 2019

How not?

bentley · on Feb 12, 2019

I want my code to be usable by anyone developing free software of their own. I want them to be able to integrate it, modify it, redistribute their modified copies, and more.

The GPL, being long and complicated (over 5000 words, and that’s just the GPLv3!), and with the ideological restrictions built in, is incompatible with many widely used free licenses, not least previous versions of itself. In any situation where social or legal barriers prevent the target audience from switching to the specific version of the GPL in question, any code I release under it is unusable to them.

Releasing my software under a simple, understandable, and permissive free license prevents this from ever happening.

I dislike proprietary software. I don’t use it or create it, and advocate against it wherever I can.

But given the choice between letting some Chinese featurephone developer use my code without “giving back,” and preventing swaths of the free software community I care about from using and improving my code for themselves, I will favor permissiveness every time.

GalacticDomin8r · on Feb 12, 2019

Because you are actually free to do what you want with it, instead of free to do what someone else wants you to do with it.

stavros · on Feb 12, 2019

Yes, you are free to close it up and sell it, but everyone else then isn't free to use your changes. "More friendly to proprietary purposes" is "less free".

It's kind of like arguing that a country where anyone can steal from anyone else with impunity is more free. Not when you consider the rights of the person being stolen from.

derefr · on Feb 12, 2019

Just because some forks of a project aren’t free, doesn’t mean that the project itself isn’t free. Especially if the project, like much of the e.g. MIT-licensed code in the world, is “done” for all intents and purposes and there is no reason at all to fork, proprietary or otherwise. (This comes up in the context of pure algorithms code a lot.)

Also, even if a project is copylefted, people can still just do... exactly what they did here. Which, while different in the weak sense of “avoiding copyright” or maybe “avoiding patents”, in the context of systems code like this almost always results in the same code on both sides anyway. If the choice is between either giving the proprietary developers your code to use, or making them re-implement exactly what you wrote without your copy for reference—with no option for “they don’t implement it at all”—then exactly what is the point of choosing the latter over the former?

stavros · on Feb 12, 2019

> Just because some forks of a project aren’t free, doesn’t mean that the project itself isn’t free

We aren't talking about non-free, we're talking about less-free.

> Just because some forks of a project aren’t free, doesn’t mean that the project itself isn’t free

No. The fact that there can be forks of the project that aren't open is what means that the project itself is less free than a project where all forks must be open.

> Especially if the project, like much of the e.g. MIT-licensed code in the world, is “done”

I don't consider this relevant to the argument at hand.

> then exactly what is the point of choosing the latter over the former?

Are you asking me what the point is of making something you don't want people to do hard for them vs making it easy?

adamrt · on Feb 12, 2019

That's not a good analogy. In this case, people aren't being stolen from, they are freely giving it away for someone else to do as they wish.

Additionally, "theft" as you put it, in this case, doesn't affect the original property owner.

TeMPOraL · on Feb 12, 2019

Digital analogies to theft rarely are good, but this one is passable. The main point is that this license grants software freedoms, but then doesn't do anything to protect it - thus enabling middlemen (like most of us devs are, for most software we write!) to immediately strip those freedoms away.

Proclamations of rights aren't really useful if there are no means of ensuring those rights are not taken away.

stavros · on Feb 12, 2019

I wasn't using it as an example of being deprived of property, but as an example of how infringing on other people's freedoms leads a system to be less free than one that doesn't.

chriscappuccio · on Feb 12, 2019

You have the freedom to create your own proprietary derivative, a freedom you lose with the GPL version of rsync.

int_19h · on Feb 12, 2019

On a very high level, LLVM/Clang happened because Apple needed a clean-room implementation of GCC.

bsder · on Feb 12, 2019

And because the gcc code was an impenetrable mess--intentionally so in order to prevent people from making a non-GPL alternative.

yjftsjthsd-h · on Feb 12, 2019

Of gcc, or "a C compiler (with extensions as seen in the wild)"?

int_19h · on Feb 12, 2019

Well, Clang implemented gcc extensions long before it went for MSVC ones...

meruru · on Feb 12, 2019

The BSDs have a strong preference for copyfree licenses. They tolerate copyleft programs, but try to switch to copyfree when possible. See for instance GCC -> Clang/LLVM.

wmf · on Feb 11, 2019

Both the ASF and FSF have a variety of NIHed projects that appear to exist purely for license ideology reasons. The most famous that comes to mind is Apache Geronimo, a clone of JBoss that few people used but was bought by IBM for ~$120M IIRC.

mindslight · on Feb 12, 2019

https://en.wikipedia.org/wiki/Bionic_(software)

(You know, since we're tossing grenades)

meruru · on Feb 12, 2019

I hope this ends up being a lot simpler and easier to understand than the original rsync. The rsync manpage is way too long.

gmueckl · on Feb 12, 2019

Rsync solved a complex problem that comes in many nuanced variants. It may seem trivial at the outset, but it is actually not. So I don't think that rsync has many features that are somehow unnecessary or bloat.

meruru · on Feb 12, 2019

Well, the manpage for this is looking really good and it already has almost everything that I care about. The -a option isn't in yet, but it's in the TODO.

https://github.com/kristapsdz/openrsync/blob/master/openrsyn... https://github.com/kristapsdz/openrsync/blob/master/TODO.md

I hope the -c and maybe -X option make it.

m0nty · on Feb 12, 2019

> The rsync manpage is way too long

I see its thoroughness as a feature, not a bug. It's very well written and I can just ignore the bits I'm not interested in. I wish more man pages were "too long" like this one is.

joppy · on Feb 12, 2019

What does a "clean-room implementation" mean?

Tor3 · on Feb 12, 2019

The first (well-known) 'clean-room' implementation was when Phoenix implemented an IBM PC-compatible BIOS by having one team studying the IBM source (which was available), then writing up a specification for how it worked, handing that specification over to somebody else (they were Phoenix' legal team, IIRC), which then handed the specs over to another team that had never seen the IBM source. They sat down in their "clean room" (b/c it wasn't tainted by actual IBM source) and implemented a BIOS from specs only. In that way Phoenix was protected from any claims of copyright infringement: Nothing was copied, and the people writing the code had never seen the original source.

In that particular case the specs were reverse-engineered from actual source, but that's not a necessary part of the process. It's more common to have one team study the protocol, data going over the wire, disassembling, etc, then use the knowledge gained to write specs, and then another team implements the equivalent functionality from specifications only.

bentley · on Feb 12, 2019

Not derived from the existing code. The reason it’s mentioned is to assert that openrsync is not subject to the original rsync’s GPL.

gerdesj · on Feb 12, 2019

Is it any better than rsync?

yjftsjthsd-h · on Feb 12, 2019

It is better in some ways, and worse in some, both largely subjective. It has a different license, is smaller, less battle tested, from different developers, designed with different goals in mind.

It Depends™ on how you judge.

rstuart4133 · on Feb 13, 2019

All openrsync implements is the equivalent of a fast "cp -a" across the network, plus it can also remove files if they don't exist. rsync does much more and over the years I've used most of it, so there is no way I would use openrsync. The upside is the manpage of openrsync isn't that much more complex than cp, which is a definite bonus if that's all you are doing.

The only thing I would change about rsync is its default, which IMO should be to copy all metadata supported by both sides. I.e., the default should be to make the destination as similar to the source as possible. Its default is to only copy the data, and you must add options to say what else you want copied. To make matters worse, you can't just add every option, because if you say you want to copy something not supported by one side or the other it errors out. I may have missed it as I am reading the man page source, but openrsync didn't seem to change that.

kristapsdz · on Feb 13, 2019

No. openrsync implements the rsync protocol. It doesn't have all of its options, but the protocol is what it is. Do you have any idea what you're talking about?

theamk · on Feb 11, 2019

It is interesting that the "open" part of openrsync refers to the license: BSD, vs. the original rsync's GPL.

It's not often I see "open" to mean "non-GPL" in software :)

bentley · on Feb 12, 2019

Fun bit of history: the “Open” here comes from OpenBSD; but the “Open” in OpenBSD came from the development process, not the license.

Before Git and SVN, we had CVS, and to check out code from a CVS repository you needed to have an account on the CVS server. If you wanted to contribute but didn’t already have a developer account, you were limited to writing patches against release tarballs or whatever alternative method upstream supplied.

One of OpenBSD’s major projects in the mid 90s was creating anonymous CVS, where anyone could check out code without any account. This came from Theo’s experience after losing his NetBSD account, where he found himself unable to make meaningful contributions anymore without the ability to cvs checkout, cvs diff, etc. So when he started OpenBSD, he had in mind to open up the development process to everyone, account or not.

This is described in the commentary for the OpenBSD 6.1 release song: https://www.openbsd.org/lyrics.html#61

zdw · on Feb 11, 2019

rsync went GPLv3 a while ago, and many businesses don't trust some of the newer clauses that were added.

Similarly, the more strict BSD crowd has issues with the Apache2 license clauses regarding revocation - see here: https://www.openbsd.org/policy.html

enneff · on Feb 12, 2019

Stallman himself makes a big fuss about "Open Source" not being equivalent to "Free Software". See: https://www.gnu.org/philosophy/open-source-misses-the-point....

protomyth · on Feb 11, 2019

I get the feeling the "open" part was because they were hoping to get it included in OpenBSD like OpenSMTPD, etc.

meruru · on Feb 12, 2019

Yeah, that was my assumption too. It's coming from the OpenBSD community, so openrsync it is.