Improving Tailscale via Apple’s open source

dcow · on April 14, 2023

There are Apple-supported ways to solve both of these issues built into the NetworkExtension framework's `NEPacketTunnelProvider`. No need to drop into obscure `ioctl`s!

The correct way to solve the looping issue is by configuring the `excludedRoutes` property on the `NEIPv4|6Settings` object associated with your network extension. See: https://developer.apple.com/documentation/networkextension/n.... I imagine this is an important property for a dynamic mesh network to keep updated regularly as the topology changes.

The issue of traffic egressing via the wrong interface happens because the system handles swapping between WiFi and Cellular radios quite judiciously, and the BSD socket API isn't plugged into this management (AFAIK there's no way to get notified when the delegated interface changes, so you don't really know when to re-bind). Instead, you should use the provided `packetFlow` to write IP packets to the tunnel, which will always egress via the "correct" interface (whichever one the system decides is best at the moment).

Tailscale's Go library would need to implement hooks that call out to a platform-specific implementation for Apple versus other platforms, which is annoying. But it's the right way, and it avoids these weird edge cases and bugs.

pjmlp · on April 14, 2023

It is the usual issue that Linux folks keep looking at NeXTSTEP derivatives as yet another UNIX clone, instead of what is on top of it.

ayewo · on April 14, 2023

Would be interesting to hear the thoughts of Tailscale folks regarding your suggested fix.

ayewo · on April 18, 2023

In addition to bradfitz returning to comment, I managed to get a response out of apenwarr on Twitter https://twitter.com/apenwarr/status/1647058229304651782

bradfitz · on April 15, 2023

We know about that. Very early versions of Tailscale tried to use it. It's not enough, though. A number of things in Apple's APIs are 90% of what we need.

dcow · on April 16, 2023

> Very early versions of Tailscale tried to use it. It's not enough, though.

I know you’re not the author, but I think that type of background would have been an interesting/helpful addition to the blog post and would have given it more grit/mileage. Personally I found the post to be link-heavy and shallow. I read it wondering why you’re going through all this trouble, instead of understanding a clear problem and solution. A reader like me would be super interested in an explanation of the shortcomings of Apple’s framework and what product requirements or technical constraints make Tailscale’s bespoke approach necessary. I think it sets the stage better for a “we’re off the rails, good thing Apple publishes the ifconfig sources, +1 for open source” type of message.

> A number of things in Apple’s APIs are 90% of what we need.

I know the feel.

dcow · on April 14, 2023

Also note there's an Android analog: https://developer.android.com/reference/android/net/VpnServi.... You call `protect` on sockets that should not send data through your tunnel. It's a bit more flexible and straightforward IMHO (:

drewg123 · on April 13, 2023

Shared / Open source is super helpful when you're trying to figure out how to do something unusual that's not on the paved path covered by the docs and examples.

In the early 2000s, when doing Mac drivers for an OS-bypass HPC NIC (Myrinet, a sort of precursor to InfiniBand/RoCE), I spent 90% of my time reading the Apple source and about 10% of my time coding. In my case the first goal was to figure out how to use BSD ioctls rather than Mach based userclient stuff, as we had tons of shared kernel and userspace code for ioctls (across linux, dec unix, solaris, freebsd, windows, etc). And the second was figuring out exactly which iomemory descriptor variant would work well for registering memory.

ignoramous · on April 13, 2023

> Shared / Open source is super helpful when you're trying to figure out how to do something unusual that's not on the paved path covered by the docs and examples.

Not just the unusual, but the usual too: Just this past month or so, a user on Fly.io forums helped debug just why reading from stdin / writing to stdout didn't work for non-root users (https://community.fly.io/t/10375/8). It all started from the fact that Fly.io had open sourced a snapshot-in-time of their init process back in 2021.

Also, it isn't uncommon in the Android world for developers to routinely find the right APIs to use by reading the OS source code.

iudqnolq · on April 13, 2023

A very meaningful point in my learning as a junior developer was when I realized I should regularly click go to definition on standard library functions.

I needed to tweak the behavior of an Android standard library function, so I copied its source code (~20 lines) and made some edits. Blew my mind at the time that it was that simple.

jamesfmilne · on April 13, 2023

Interesting background. I've recently been toying with the idea of writing some open-source Mac DriverKit drivers for Mellanox cards, so I'm not forced to pay through the nose for ATTO cards (which are just rebranded Mellanox cards.)

Sadly all the DriverKit stuff appears to be completely closed source. At least it's user-space though, so perhaps a bit less painful than normal kernel driver development...

thayne · on April 14, 2023

On the flip side of that, I've recently had to deal with a few issues with some proprietary software, where it would have been really, really useful to be able to look through the source code, but I couldn't. Instead I had to rely on incomplete documentation, and customer support that didn't have enough technical knowledge to fully understand the problem.

EMIRELADERO · on April 13, 2023

Nice blog post! I like the way this problem is approached and solved. Peeking under the "table" to see how the internals work is actually creative when trying to fix things "upstream" on an upper abstraction layer.

Unfortunately, I still don't love Tailscale. I do like it, a lot even. But their refusal to open-source all their clients (and the server) is baffling, especially considering that they have an employee contributing to Headscale, the community-led FOSS tailscale server. At that point just open-source the damn thing!

Issues aside, it's still a great product. It actually felt like magic when I first used it in a way few technologies have.

bradfitz · on April 13, 2023

We'd planned to open source our control server but it wasn't in a good shape to release at the time we released the other stuff. Then Headscale came along and removed all need for us to do so. Headscale is _much_ easier for people to run & understand. The Tailscale closed source one is kinda a monster, built for a very different scale. We're busy enough without also helping people struggling to run our control plane. We'd rather focus the community (or the subset of the community that wants to run their own server) to use Headscale instead.

And if you don't trust our server, use https://tailscale.com/blog/tailnet-lock/ and then you don't need to trust us. Or run Headscale. :)

EMIRELADERO · on April 13, 2023

Thank you for your answer.

Honestly I (and I suspect most here) wouldn't mind if the server was not easy to set up. The goal here would be transparency. It's true that, with lock, a user can run Tailscale without having to trust it. But it is still a good show of good faith and goodwill to have everything in your infrastructure be as transparent as possible, barring actual user data and service credentials.

Same concept applies to the proprietary GUI clients. What's the rationale for not, at least, making their source code publicly available for reproducible builds (or, if those are too complicated to implement, the same goodwill and transparency I talked about)? You wouldn't even need to actually support the source releases.

bradfitz · on April 13, 2023

FWIW, you can run open source Tailscale on macOS and Windows without the GUI.

And you can already do reproducible builds of our Windows build: the `tailscaled.exe` service and `tailscale.exe` CLI are open source. Only the GUI systray client (tailscale-ipn.exe) is closed.

For macOS and iOS, our development environment is kinda hell. It's great when it's finally working, but hard to get it set up and keep it in a happy place ... you have to get users into the right Apple teams for the right Network Extension Entitlements/notarization/etc, disable SIP to work on certain types of builds, be sure to clean up Xcode temp folders in ~/Library/ so the system doesn't pick up the wrong builds, etc, etc. Then you think things are good and in a few months a random keychain cert expires and you have to repeat the dance. Which sometimes involves a few macOS reboots for some reason.

Yes, maybe we could say that the macOS/iOS/Windows GUI source releases are "not supported" but that will stop approximately nobody from asking questions anyway and consuming time.

Plus I always come back to the question: if you care about open source so much, why aren't you running Linux?

The common reply is: but trust! but security! but auditability! Know how many corporate/paying customers have asked for open source Windows/macOS/iOS GUIs? Zero that I've heard about. Their trust relationship is with other companies, not with codebases they don't have time to build or audit anyway. Or they trust us that all the interesting code (non-GUI wrappers) is open source anyway and read that.

So, yes, we _could_ open source our GUIs. But it's not worth the resulting pain. It'd save me writing comments like this one, but then I'd be answering Xcode/mac build questions instead, and I'd much prefer writing this.

freedomben · on April 13, 2023

> Plus I always come back to the question: if you care about open source so much, why aren't you running Linux?

Yes, exactly. I'm one of the biggest FOSS advocates around, but if someone is running Mac or Windows, I have a hard time accepting criticism from them about closed source software. They are obviously pretty comfortable with closed-source/proprietary blobs. If you're running closed systems, you're voting for closed software. It's pretty hypocritical to criticize others for not going open when you don't do so yourself.

With my FOSS projects I do try to support macs, and if it's relatively easy then windows, but I don't blame anyone for taking a "like for like" approach to their software. Open for people who vote open, closed for people who vote closed. Live by the sword, die by the sword, or something like that.

Seems quite fair to me. It's also arguably better for Linux/open source, because if more software vendors took this approach with their software, we'd see more people choosing Linux because it would give us another competitive advantage. That leads to more Linux users, and the more Linux users there are the better it is for the whole community. More software vendors will support Linux, for example.

slimsag · on April 13, 2023

You don't need to be vegan to care about animal rights.

I can consume closed-source software and care about FOSS at the same time, do not discard people who are on your side just because they aren't extreme enough for your tastes.

freedomben · on April 13, 2023

> You don't need to be vegan to care about animal rights.

Yes, but that's not really equivalent to this situation. This isn't just a "do you care about FOSS" question like the "do you care about animal rights" analogy that you raised.

In this situation, a person who pays for and uses closed source software (and not just one application, but the entire OS), is criticizing another company for making closed source software. It would be more equivalent to somebody who is a meat purchaser/eater criticizing a company for making meat.

Operyl · on April 13, 2023

As a paying customer, this is 100% true. Reputation helps a lot. The core of the product being open source, so we could give it a quick skim for sanity, helped. Support being there to answer questions about why X was done the way it was helped. We don't care to audit further, nor do we have the time. The company pays _others_ to help audit them, iirc Latacora.

bradfitz · on April 13, 2023

Yes, we work with Latacora. (https://tailscale.com/blog/latacora-and-tailscale/)

pdpi · on April 13, 2023

> Plus I always come back to the question: if you care about open source so much, why aren't you running Linux?

Because it's not black and white.

I can easily see myself in a position where I have influence over the VPN we use, while not having control over the finance people using Windows (because of the Windows-only accounting software) and the designers using Macs (for all the usual reasons). As a more general point, "I have stricter openness requirements for my infrastructure than I do for client devices" seems like a reasonable position to hold.

I do appreciate that the armchair CIO crowd keeps raising auditability/trust/security as being much bigger issues than they are in practice, but I have had engagements in the past where the client demanded that any software we wrote on their behalf had to be open-sourced. So such requirements are in practice unusual but by no means unheard of.

e28eta · on April 13, 2023

Have you considered the benefit of open source serving as an example of how to make a non-standard use case work? </sarcasm>

I think I just wanted to point out the similarity. You _have_ to maintain the dev environment, and you _have_ to have it documented in some form for new hires. If that was public, we could be reading a blog post 6 months from now from some company that ran into similar-shaped problems in their dev environment, and solved it using “the power of open source”

I don’t think that’s a compelling enough reason for someone trying to run a business, but I hope at least some people appreciate the irony here.

ignoramous · on April 13, 2023

> Their trust relationship is with other companies, not with codebases they don't have time to build or audit anyway.

Sums up the book, Selling the Invisible.

(a pretty good summary: https://penniesintofortunes.com/2016/09/20/selling-the-invis...)

oefnak · on April 14, 2023

While I totally agree with you, the fact that nobody is both paying you and asking for more open source could also be because the people who would ask aren't customers, precisely because it's not more open source.

jzelinskie · on April 13, 2023

I don't think that their choice to keep their server software proprietary is baffling. They're going to architect it so that it's easiest/best for their team to run, which is not the same as what's easiest/best for EVERYONE to run, as is the typical goal of open source software.

ceejayoz · on April 13, 2023

There's two concerns: "can I run it myself", and "what does this security-critical third-party code do?"

Open sourcing helps with the latter, even if the code's not very well optimized for general purpose use for the former case.

judge2020 · on April 13, 2023

The concern is that people will assume the OSS is for both scenarios, so even if you close issues and have a PR bot rejecting PRs, you'll still get both support tickets and (good|bad) PR from people trying to run the mainline system themselves.

Supporting development on a made-for-self-hosting OSS project is the most user friendly option and allows them to avoid needing to handle the woes of public code.

daveidol · on April 13, 2023

What about open sourcing the code with a README stating it is only for transparency, comes with no support, etc. and disabling GitHub issues?

Operyl · on April 13, 2023

Users will still just spam the ticketing system. (Their support email).

wg0 · on April 13, 2023

There is some money (rightly and legitimately) to be made as well. We all are Craftsmen. We all deserve some.

There's headscale[1] that fills the gap somewhat.

[1] https://github.com/juanfont/headscale

ehPReth · on April 13, 2023

woah, they have an employee contributing to head scale now?

skrtskrt · on April 13, 2023

parent comment was edited, originally said "i thought they [Tailscale] hated that thing [Headscale]"

they literally have on their website that they encourage people to use Headscale if they want an open source solution for the coordination server or just to learn how the internals work, they coordinate with Headscale maintainers to avoid breakages, and they don't discourage (but also don't require) employees contributing to Headscale

EMIRELADERO · on April 13, 2023

> parent comment was edited, originally said "i thought they [Tailscale] hated that thing [Headscale]"

I don't know where you got this from. I did not say that anywhere.

ehPReth · on April 13, 2023

that was me! then i realized in the apps (except iOS?) they allow you to specify your own coordinator server... so if they hated them then they'd never build that in! so i removed that part of my comment before it had any replies that I was aware of

edit: see comment below me; iOS now has that ability!

mihaip · on April 13, 2023

Also possible on iOS starting in 1.38.1: https://tailscale.com/changelog/#2023-03-14-client

ehPReth · on April 13, 2023

Awesome!

bradfitz · on April 13, 2023

iOS allows that now too as of two releases ago (in 1.38.1).

Go to the system Settings app, scroll down down down through your apps to Tailscale, and set the "Alternate Coordination Server URL".

skrtskrt · on April 13, 2023

you are not the parent comment to my comment - not sure if I used "parent comment" clearly

ehPReth · on April 13, 2023

ah! very cool. thanks for the context :). I must have missed the shoutout on the website. It's cool that they coordinate with headscale to avoid breakages as well

api · on April 13, 2023

Heh... we went spelunking into Apple's open source for a not too different reason a few years ago!

https://www.zerotier.com/blog/how-zerotier-eliminated-kernel...

https://apple.stackexchange.com/questions/337715/fake-ethern...

We did this to eliminate the need for kernel extensions to create L2 interfaces.

skrtskrt · on April 13, 2023

funny side note about Tailscale/iOS: one of the first things I tried with Tailscale was TailDrop-ing a random file to my iPad.

I chose a dotfile like .gitconfig, but then the iOS Files app wouldn't show it to me. It would show me that there was one item of nonzero size in the folder, but I could not see the actual file, and now (I assume) I can't delete it either unless I delete the folder.

spiantino · on April 13, 2023

I wonder if this will help with the iOS app's battery drain issues. VPNs of any kind come with a battery penalty, but some of these issues might have exacerbated that

neilalexander · on April 13, 2023

Using regular Wireguard with on-demand enabled and persistent keepalives disabled is about as good as VPN on iOS can possibly get. Without the keepalives, there is no idle protocol traffic and it is therefore barely consuming any battery power until traffic is sent over the tunnel.

bradfitz · on April 14, 2023

This is part of it. There are about five more parts.

heyflyguy · on April 13, 2023

I love tailscale, but an unusual thing happened when I upgraded my home pc to Windows 11. Now my tailscale still connects to all of my other computers and servers, but I can't browse any network shares. I guess authentication changed in Windows 11 and is not backward-compatible. Tailscale worked marvelously for me for ages, but Windows 11 for sure threw a wrench in the gears!

unraveller · on April 13, 2023

I had to manually enable the Workstation service on my win11 install to get my home network shares working, and before that, in win10, I had to force guest mode just to browse freely and use remote desktop.

  AllowInsecureGuestAuth 1
 
 HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\LanmanWorkstation\Parameters

heyflyguy · on April 13, 2023

You are quite a person and I thank you

skrtskrt · on April 13, 2023

their clients have to do a lot of platform specific configuration to make it appear magic- OS/distro-specific network configuration is always going to be a hairy, moving target - open an issue on their GitHub and they'll probably help you start poking at it

omneity · on April 13, 2023

Slightly related, I use Zerotier for some personal stuff as a managed VPN, including connecting several Apple devices. I never see it mentioned on HN (Zerolove for Zerotier lol). Does anyone who has used both know what I am missing out on by not using Tailscale?

stingraycharles · on April 13, 2023

We recently migrated away from our old OpenVPN deployment, and Zerotier was on the table. In the end we picked Tailscale because it’s based on open protocols and more transparent in how they work.

From a different perspective, it appears Zerotier is investing a lot of engineering effort in maintaining its own protocol, with arguable benefits. Tailscale seems to be shipping new features on an almost weekly basis and has huge velocity; to me their trade-offs just made more sense, and I have more confidence in their business doing well in the long term.

sconi · on April 13, 2023

yeah, different protocol. Similar concepts.

Tailscale has captured mindshare, reminds me some of Cloudflare. There are many solutions out there offering unique value props relative to what Tailscale is doing, and it all comes down to customer needs. I'm not sure how broadly Tailscale has been adopted outside of the dev / HN crowd, so that's always something to dig into.

stingraycharles · on April 14, 2023

Let’s just say that as the person who introduced Tailscale within our org, it has been one of the easiest solutions I have ever had to convince my org of. Given the relative intrusiveness of switching a VPN provider (it touches a lot of core infrastructure), I expected a lot more resistance, but everybody was on board immediately.

In all honesty, Tailscale being so good, we never seriously looked at Zerotier. We only evaluated it on paper and never actually tried to use it.

ElectricalUnion · on April 14, 2023

Zerotier doesn't support running without TUN devices unless you use their "Business Source License"d libzt.

Being able to run Tailscale under otherwise very restricted userland environments, without changing too much code is nice.

alex_dev · on April 14, 2023

I also use ZeroTier because I had several problems using TailScale on Android. They're known issues and don't seem like they'll ever be fixed.

i386 · on April 13, 2023

Product person here. I’ve been using and loving Tailscale for about a year now, and I don’t understand when I have to pay and why, and I really don’t understand why any of it is OSS. Where’s the revenue and where’s the moat?

stingraycharles · on April 13, 2023

We’re a customer and paying Tailscale a decent amount of $ a month and planning on expanding our usage more and more. They’re probably just using a generous free tier as a growth tool.

dgentry · on April 13, 2023

https://tailscale.com/blog/free-plan/

Being a mesh means most connections are direct and do not go through any infrastructure which costs Tailscale money, making a Free tier economically workable.

0xbadcafebee · on April 13, 2023

Isn't there usually vendor documentation of internal system calls? I know Microsoft used to provide it, though I think you had to pay for MSDN? I assume Apple has something similar.

In terms of finding a syscall, you don't usually need to dig through source code. On Linux, strace and gdb will show you a program's active system calls, and on macOS it'd be dtruss.

raggi · on April 13, 2023

You need both at depth. strace and dtruss translate binary information to a textual representation but they’re only as good as their coverage. When you get closer to the edges of common use they fall down. On macOS / iOS using dtruss for all programs comes with extra burdens, too.

We (tailscalar here!) have another post out today on more throughput optimizations and the story there is related in this way: when we started working on that code path the docs didn’t exist! (There are some docs now, in Linux). But either way the stuff is seldom used outside of specialist implementations - so when you need to understand what’s going on you’re headed for the source, or a debugger, or a symbolic tracer rather than an interpretive one.

macshome · on April 13, 2023

Things like this are documented in the Apple sources and header files. Even if there are stubs of documentation they pretty much just give you the names of files, or man pages, to go read.

Once you get up out of the POSIX layers the Apple documentation has improved a lot lately.

EdSchouten · on April 14, 2023

I see this was eventually fixed by manually declaring a structure that matches struct ifreq.

Instead of doing that, I would recommend submitting a patch against golang.org/x/sys to get that autogenerated. The Go folks tend to merge such PRs very quickly (<48h).