Improving Tailscale via Apple’s open source (tailscale.dev)
441 points by mfiguiere on April 13, 2023 | 62 comments



There are Apple-supported ways to solve both these issues built in to the `NEPacketTunnelProvider` framework. No need to drop into obscure `ioctls`!

The correct way to solve the looping issue is by configuring the `excludedRoutes` property on the `NEIPv4|6Settings` object associated with your network extension. See: https://developer.apple.com/documentation/networkextension/n.... I imagine this is an important property for a dynamic mesh network to keep updated regularly as the topology changes.
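To make the "keep updated as the topology changes" part concrete, here is a minimal Go sketch of the bookkeeping side only. It assumes the mesh client already knows its peers' physical endpoint addresses and wants one host route per endpoint to exclude from the tunnel. `hostRoutes` and the sample addresses are hypothetical, and passing the result to `excludedRoutes` would happen on the Swift/Objective-C side, which is not shown.

```go
package main

import (
	"fmt"
	"net/netip"
	"sort"
)

// hostRoutes turns peer endpoint addresses into single-host prefixes
// (/32 for IPv4, /128 for IPv6), deduplicated and sorted so that two
// successive results can be compared cheaply when the topology changes.
// The function name and its role are illustrative assumptions.
func hostRoutes(endpoints []string) ([]netip.Prefix, error) {
	seen := make(map[netip.Prefix]bool)
	var routes []netip.Prefix
	for _, e := range endpoints {
		addr, err := netip.ParseAddr(e)
		if err != nil {
			return nil, fmt.Errorf("bad endpoint %q: %w", e, err)
		}
		addr = addr.Unmap() // treat IPv4-mapped IPv6 as plain IPv4
		p := netip.PrefixFrom(addr, addr.BitLen())
		if !seen[p] {
			seen[p] = true
			routes = append(routes, p)
		}
	}
	sort.Slice(routes, func(i, j int) bool {
		return routes[i].Addr().Less(routes[j].Addr())
	})
	return routes, nil
}

func main() {
	// Documentation-range addresses standing in for real peer endpoints.
	routes, err := hostRoutes([]string{"203.0.113.7", "198.51.100.2", "203.0.113.7", "2001:db8::1"})
	if err != nil {
		panic(err)
	}
	for _, r := range routes {
		fmt.Println(r)
	}
}
```

Recomputing this list whenever a peer's endpoint changes, and pushing new tunnel settings only when the list differs, is one plausible way to keep the exclusions current.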

The traffic egressing the wrong interface issue happens because the system handles swapping between WiFi and Cellular radios quite judiciously, and the BSD socket API isn't plugged into this management (AFAIK there's no way to get notified when the delegated interface changes, so you don't really know when to re-bind). Instead, you should use the provided `packetFlow` to write IP packets to the tunnel, which will always egress via the "correct" (whichever one the system decides is best at the moment) interface.

Tailscale's Go library would need to implement hooks to call out to the platform implementation for Apple vs others, which is annoying. But, it's the right way and avoids these weird edge cases and bugs.
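As a rough illustration of what such a hook could look like, here is a minimal Go sketch. Every name in it (`PacketSink`, `RegisterPlatformSink`, `socketSink`, `recordingSink`) is hypothetical and not taken from Tailscale's code; it only shows the general pattern of a portable default with an optional platform override.

```go
package main

import (
	"errors"
	"fmt"
)

// PacketSink is a hypothetical interface for "somewhere to write an
// outbound packet". A platform build could implement it on top of the
// network extension's packet flow; other platforms use plain sockets.
type PacketSink interface {
	WritePacket(pkt []byte) error
}

// platformSink stays nil unless platform-specific code registers a hook.
var platformSink PacketSink

// RegisterPlatformSink installs the platform-specific implementation.
func RegisterPlatformSink(s PacketSink) { platformSink = s }

// socketSink stands in for the portable socket-based path.
type socketSink struct{ sent int }

func (s *socketSink) WritePacket(pkt []byte) error {
	if len(pkt) == 0 {
		return errors.New("empty packet")
	}
	s.sent++
	return nil
}

// recordingSink stands in for a platform implementation in this sketch.
type recordingSink struct{ packets [][]byte }

func (r *recordingSink) WritePacket(pkt []byte) error {
	r.packets = append(r.packets, append([]byte(nil), pkt...))
	return nil
}

// send prefers the registered platform hook and falls back otherwise.
func send(fallback PacketSink, pkt []byte) error {
	if platformSink != nil {
		return platformSink.WritePacket(pkt)
	}
	return fallback.WritePacket(pkt)
}

func main() {
	fallback := &socketSink{}
	_ = send(fallback, []byte{0x45}) // no hook yet: uses the fallback

	hook := &recordingSink{}
	RegisterPlatformSink(hook)
	_ = send(fallback, []byte{0x45}) // hook registered: bypasses the fallback

	fmt.Println(fallback.sent, len(hook.packets))
}
```

The annoying part the comment alludes to is that every call site that touches the network has to go through something like `send` rather than opening sockets directly.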


It is the usual issue that Linux folks keep looking at NeXTSTEP derivatives as yet another UNIX clone, instead of what is on top of it.


Would be interesting to hear the thoughts of Tailscale folks regarding your suggested fix.


In addition to bradfitz returning to comment, I managed to get a response out of apenwarr on Twitter https://twitter.com/apenwarr/status/1647058229304651782


We know about that. Very early versions of Tailscale tried to use it. It's not enough, though. A number of things in Apple's APIs are 90% of what we need.


> Very early versions of Tailscale tried to use it. It's not enough, though.

I know you’re not the author, but I think that type of background would have been an interesting/helpful addition to the blog post and have given it more grit/mileage. Personally I found the post to be link heavy and shallow. I read it wondering why you’re going through all this trouble, instead of understanding a clear problem and solution. A reader like me would be super interested in an explanation of the shortcomings of Apple’s framework and what product requirements or technical constraints make Tailscale’s bespoke approach necessary. I think it sets the stage better for a “we’re off the rails, good thing Apple publishes the ifconfig sources, +1 for open source” type of message.

> A number of things in Apple’s APIs are 90% of what we need.

I know the feel.


Also note there's an Android analog: https://developer.android.com/reference/android/net/VpnServi.... You call `protect` on sockets that should not send data through your tunnel. It's a bit more flexible and straightforward IMHO (:


Shared / Open source is super helpful when you're trying to figure out how to do something unusual that's not on the paved path covered by the docs and examples.

In the early 2000s, when doing Mac drivers for an OS-bypass HPC NIC (Myrinet, a sort of precursor to InfiniBand/RoCE), I spent 90% of my time reading the Apple source and about 10% of my time coding. In my case the first goal was to figure out how to use BSD ioctls rather than Mach-based userclient stuff, as we had tons of shared kernel and userspace code for ioctls (across Linux, DEC Unix, Solaris, FreeBSD, Windows, etc). And the second was figuring out exactly which iomemory descriptor variant would work well for registering memory.


> Shared / Open source is super helpful when you're trying to figure out how to do something unusual that's not on the paved path covered by the docs and examples.

Not just the unusual, but the usual too: Just this past month or so, a user on Fly.io forums helped debug just why reading from stdin / writing to stdout didn't work for non-root users (https://community.fly.io/t/10375/8). It all started from the fact that Fly.io had open sourced a snapshot-in-time of their init process back in 2021.

Also, it isn't uncommon in the Android world for developers to routinely find the right APIs to use by reading the OS source code.


A very meaningful point in my learning as a junior developer was when I realized I should regularly click go to definition on standard library functions.

I needed to tweak the behavior of an Android standard library function, so I copied its source code (~20 lines) and made some edits. Blew my mind at the time that it was that simple.


Interesting background. I've recently been toying with the idea of writing some open-source Mac DriverKit drivers for Mellanox cards, so I'm not forced to pay through the nose for ATTO cards (which are just rebranded Mellanox cards.)

Sadly all the DriverKit stuff appears to be completely closed source. At least it's user-space though, so perhaps a bit less painful than normal kernel driver development...


On the flip side of that, I've recently had to deal with a few issues with some proprietary software, where it would have been really, really useful to be able to look through the source code, but I couldn't. Instead I had to rely on incomplete documentation, and customer support that didn't have enough technical knowledge to fully understand the problem.


Nice blog post! I like the way this problem is approached and solved. Peeking under the "table" to see how the internals work is actually creative when trying to fix things "upstream" on an upper abstraction layer.

Unfortunately, I still don't love Tailscale. I do like it, a lot even. But their refusal to open-source all their clients (and the server) is baffling, especially considering that they have an employee contributing to Headscale, the community-led FOSS tailscale server. At that point just open-source the damn thing!

Issues aside, it's still a great product. It actually felt like magic when I first used it in a way few technologies have.


We'd planned to open source our control server but it wasn't in a good shape to release at the time we released the other stuff. Then Headscale came along and removed all need for us to do so. Headscale is _much_ easier for people to run & understand. The Tailscale closed source one is kinda a monster, built for a very different scale. We're busy enough without also helping people struggling to run our control plane. We'd rather focus the community (or the subset of the community that wants to run their own server) to use Headscale instead.

And if you don't trust our server, use https://tailscale.com/blog/tailnet-lock/ and then you don't need to trust us. Or run Headscale. :)


Thank you for your answer.

Honestly I (and I suspect most here) wouldn't mind if the server was not easy to set up. The goal here would be transparency. It's true that, with lock, a user can run Tailscale without having to trust it. But it is still a good show of good faith and goodwill to have everything in your infrastructure be as transparent as possible, barring actual user data and service credentials.

Same concept applies to the proprietary GUI clients. What's the rationale for not, at least, making their source code publicly available for reproducible builds (or, if those are too complicated to implement, the same goodwill and transparency I talked about)? You wouldn't even need to actually support the source releases.


FWIW, you can run open source Tailscale on macOS and Windows without the GUI.

And you can already do reproducible builds of our Windows build: the `tailscaled.exe` service and `tailscale.exe` CLI are open source. Only the GUI systray client (tailscale-ipn.exe) is closed.

For macOS and iOS, our development environment is kinda hell. It's great when it's finally working, but hard to get it set up and keep it in a happy place ... you have to get users into the right Apple teams for the right Network Extension Entitlements/notarization/etc, disable SIP to work on certain types of builds, be sure to clean up Xcode temp folders in ~/Library/ so the system doesn't pick up the wrong builds, etc, etc. Then you think things are good and in a few months a random keychain cert expires and you have to repeat the dance. Which sometimes involves a few macOS reboots for some reason.

Yes, maybe we could say that the macOS/iOS/Windows GUI source releases are "not supported" but that will stop approximately nobody from asking questions anyway and consuming time.

Plus I always come back to the question: if you care about open source so much, why aren't you running Linux?

The common reply is: but trust! but security! but auditability! Know how many corporate/paying customers have asked for open source Windows/macOS/iOS GUIs? Zero that I've heard about. Their trust relationship is with other companies, not with codebases they don't have time to build or audit anyway. Or they trust us that all the interesting code (non-GUI wrappers) is open source anyway and read that.

So, yes, we _could_ open source our GUIs. But it's not worth the resulting pain. It'd save me writing comments like this one, but then I'd be answering Xcode/mac build questions instead, and I'd much prefer writing this.


> Plus I always come back to the question: if you care about open source so much, why aren't you running Linux?

Yes, exactly. I'm one of the biggest FOSS advocates around, but if someone is running Mac or Windows, I have a hard time accepting criticism from them about closed source software. They are obviously pretty comfortable with closed-source/proprietary blobs. If you're running closed systems, you're voting for closed software. It's pretty hypocritical to push others to go open when you don't.

With my FOSS projects I do try to support macs, and if it's relatively easy then windows, but I don't blame anyone for taking a "like for like" approach to their software. Open for people who vote open, closed for people who vote closed. Live by the sword, die by the sword, or something like that.

Seems quite fair to me. It's also arguably better for Linux/open source, because if more software vendors took this approach with their software, we'd see more people choosing Linux because it would give us another competitive advantage. That leads to more Linux users, and the more Linux users there are the better it is for the whole community. More software vendors will support Linux, for example.


You don't need to be vegan to care about animal rights.

I can consume closed-source software and care about FOSS at the same time. Do not discard people who are on your side just because they aren't extreme enough for your tastes.


> You don't need to be vegan to care about animal rights.

Yes, but that's not really equivalent to this situation. This isn't just a "do you care about FOSS" question like the "do you care about animal rights" analogy that you raised.

In this situation, a person who pays for and uses closed source software (and not just one application, but the entire OS), is criticizing another company for making closed source software. It would be more equivalent to somebody who is a meat purchaser/eater criticizing a company for making meat.


As a paying customer, this is 100% true. Reputation helps a lot, and it helped that the core of the product is open source, so we could give it a quick skim as a sanity check. Support being there to help with questions about why X was done the way it was helped too. We don't care or have the time to audit further. The company pays _others_ to help audit them, iirc Latacora.



> Plus I always come back to the question: if you care about open source so much, why aren't you running Linux?

Because it's not black and white.

I can easily see myself in a position where I have influence over the VPN we use, while not having control over the finance people using Windows (because of the Windows-only accounting software) and the designers using Macs (for all the usual reasons). As a more general point, "I have stricter openness requirements for my infrastructure than I do for client devices" seems like a reasonable position to hold.

I do appreciate that the armchair CIO crowd keeps raising auditability/trust/security as being much bigger issues than they are in practice, but I have had engagements in the past where the client demanded that any software we wrote on their behalf had to be open-sourced. So such requirements are in practice unusual but by no means unheard of.


Have you considered the benefit of open source serving as an example of how to make a non-standard use case work? </sarcasm>

I think I just wanted to point out the similarity. You _have_ to maintain the dev environment, and you _have_ to have it documented in some form for new hires. If that was public, we could be reading a blog post 6 months from now from some company that ran into similar-shaped problems in their dev environment, and solved it using “the power of open source”

I don’t think that’s a compelling enough reason for someone trying to run a business, but I hope at least some people appreciate the irony here.


> Their trust relationship is with other companies, not with codebases they don't have time to build or audit anyway.

Sums up the book, Selling the Invisible.

(a pretty good summary: https://penniesintofortunes.com/2016/09/20/selling-the-invis...)


While I totally agree with you, the fact that nobody is both paying you and asking for more open source could also be because the people who want that never became customers, precisely because it's not more open source.


I don't think that their choice to keep their server software proprietary is baffling. They're going to architect it so that it's easiest/best for their team to run, which is not the same as what's easiest/best for EVERYONE to run, as is the typical goal of open source software.


There's two concerns: "can I run it myself", and "what does this security-critical third-party code do?"

Open sourcing helps with the latter, even if the code's not very well optimized for general purpose use for the former case.


The concern is that people will assume the OSS is for both scenarios, so even if you close issues and have a PR bot rejecting PRs, you'll still get both support tickets and (good|bad) PR from people trying to run the mainline system themselves.

Supporting development on a made-for-self-hosting OSS project is the most user friendly option and allows them to avoid needing to handle the woes of public code.


What about open sourcing the code with a README stating it is only for transparency, comes with no support, etc. and disabling GitHub issues?


Users will still just spam the ticketing system. (Their support email).


There is some money (rightly and legitimately) to be made as well. We all are Craftsmen. We all deserve some.

There's headscale[1] that fills the gap somewhat.

[1] https://github.com/juanfont/headscale


woah, they have an employee contributing to Headscale now?


parent comment was edited, originally said "i thought they [Tailscale] hated that thing [Headscale]"

they literally have on their website that they encourage people to use Headscale if they want an open source solution for the coordination server or just to learn how the internals work, they coordinate with Headscale maintainers to avoid breakages, and they don't discourage (but also don't require) employees contributing to Headscale


> parent comment was edited, originally said "i thought they [Tailscale] hated that thing [Headscale]"

I don't know where you got this from. I did not say that anywhere.


that was me! then i realized in the apps (except iOS?) they allow you to specify your own coordinator server... so if they hated them then they'd never build that in! so i removed that part of my comment before it had any replies that I was aware of

edit: see comment below me; iOS now has that ability!


Also possible on iOS starting in 1.38.1: https://tailscale.com/changelog/#2023-03-14-client


Awesome!


iOS allows that now too as of two releases ago (in 1.38.1).

Go to the system Settings app, scroll down down down through your apps to Tailscale, and set the "Alternate Coordination Server URL".


you are not the parent comment to my comment - not sure if I used "parent comment" clearly


ah! very cool. thanks for the context :). I must have missed the shoutout on the website. It's cool that they coordinate with headscale to avoid breakages as well


Heh... we went spelunking into Apple's open source for a not too different reason a few years ago!

https://www.zerotier.com/blog/how-zerotier-eliminated-kernel...

https://apple.stackexchange.com/questions/337715/fake-ethern...

We did this to eliminate the need for kernel extensions to create L2 interfaces.


funny side note about Tailscale/iOS: one of the first things I tried with Tailscale was TailDrop-ing a random file to my iPad.

I chose a dotfile like .gitconfig, but then the iOS Files app wouldn't show it to me. It would show me that there was one item of nonzero size in the folder, but I could not see the actual file, and now (I assume) I can't delete it either unless I delete the folder.


I wonder if this will help with the iOS app's battery drain issues. VPNs of any kind come with a battery penalty, but some of these issues might have exacerbated that


Using regular Wireguard with on-demand enabled and persistent keepalives disabled is about as good as VPN on iOS can possibly get. Without the keepalives, there is no idle protocol traffic and it is therefore barely consuming any battery power until traffic is sent over the tunnel.


This is part of it. There are about five more parts.


I love tailscale, but an unusual thing happened when I upgraded my home pc to Windows 11. Now my tailscale still connects to all of my other computers and servers, but I can't browse any network shares. I guess authentication changed in Windows 11 and is not backward-compatible. Tailscale worked marvelously for me for ages, but Windows 11 for sure threw a wrench in the gears!


I had to manually enable the Workstation service on my Win11 install to get my home network shares working, and before that, in Win10, I had to force guest mode just to browse freely and use remote desktop.

  HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\LanmanWorkstation\Parameters
  AllowInsecureGuestAuth 1


You are quite a person and I thank you


their clients have to do a lot of platform specific configuration to make it appear magic- OS/distro-specific network configuration is always going to be a hairy, moving target - open an issue on their GitHub and they'll probably help you start poking at it


Slightly related, I use Zerotier for some personal stuff as a managed VPN, including connecting several Apple devices. I never see it mentioned on HN (Zerolove for Zerotier lol). Does anyone who has used both know what I'm missing out on by not using Tailscale?


We recently migrated away from our old OpenVPN deployment, and Zerotier was on the table. In the end we picked Tailscale because it’s based on open protocols and more transparent in how they work.

From a different perspective, it appears Zerotier is investing a lot of engineering effort in maintaining its own protocol, with arguable benefits. Tailscale seems to be deploying new features on an almost weekly basis and has huge velocity. To me their trade-offs just made more sense, and I have more confidence in their business doing great in the long term.


yeah, different protocol. Similar concepts.

Tailscale has captured mindshare, reminds me some of Cloudflare. There are many solutions out there offering unique value props relative to what Tailscale is doing, and it all comes down to customer needs. I'm not sure how broadly Tailscale has been adopted outside of the dev / HN crowd, so that's always something to dig into.


Let’s just say that as the person who introduced Tailscale within our org, it has been one of the easiest solutions I ever had to convince my org of. Given the relative intrusiveness of switching a VPN provider (it touches a lot of core infrastructure), I expected a lot more resistance, but everybody was on board immediately.

In all honesty, with Tailscale being so good, we never seriously looked at Zerotier. We only evaluated it on paper and never actually tried to use it.


Zerotier doesn't have support for running without TUN devices without their "Business Source License"d libzt.

Being able to run Tailscale under otherwise very restricted userland environments, without changing too much code is nice.


I also use ZeroTier because I had several problems using Tailscale on Android. They're known issues and don't seem like they'll ever be fixed.


Product person here. I’ve been using and loving Tailscale for about a year now, and I don’t understand when I have to pay and why, and I really don’t understand why any of it is OSS. Where’s the revenue and where’s the moat?


We’re a customer and paying Tailscale a decent amount of $ a month and planning on expanding our usage more and more. They’re probably just using a generous free tier as a growth tool.


https://tailscale.com/blog/free-plan/

Being a mesh means most connections are direct and do not go through any infrastructure which costs Tailscale money, making a Free tier economically workable.


Isn't there usually vendor documentation of internal system calls? I know Microsoft used to provide it, though I think you had to pay for MSDN? I assume Apple has something similar.

In terms of finding a syscall, you don't usually need to dig through source code. On Linux, strace and gdb will show you a program's active system calls, and on macOS it'd be dtruss.


You need both at depth. strace and dtruss translate binary information to a textual representation but they’re only as good as their coverage. When you get closer to the edges of common use they fall down. On macOS / iOS using dtruss for all programs comes with extra burdens, too.

We (tailscalar here!) have another post out today on more throughput optimizations and the story there is related in this way: when we started working on that code path the docs didn’t exist! (There are some docs now, in Linux). But either way the stuff is seldom used outside of specialist implementations - so when you need to understand what’s going on you’re headed for the source, or a debugger, or a symbolic tracer rather than an interpretive one.


Things like this are documented in the Apple sources and header files. Even if there are stubs of documentation they pretty much just give you the names of files, or man pages, to go read.

Once you get up out of the POSIX layers the Apple documentation has improved a lot lately.


I see this was eventually fixed by manually declaring a structure that matches struct ifreq.

Instead of doing that, I would recommend submitting a patch against golang.org/x/sys to get that autogenerated. The Go folks tend to merge such PRs very quickly (<48h).
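For readers wondering what "manually declaring a structure that matches struct ifreq" looks like in Go, here is a minimal sketch. The layout is an assumption (a 16-byte interface name followed by a union padded to 16 bytes, with a 32-bit value as the member of interest); the union's real size is platform-specific, so it has to be checked against the C header. `ifreqU32`, `newIfreq`, and `ifreqSize` are hypothetical names, not what Tailscale or golang.org/x/sys use.

```go
package main

import (
	"fmt"
	"unsafe"
)

// ifNameSize mirrors IFNAMSIZ from the C headers.
const ifNameSize = 16

// ifreqU32 is a hand-written mirror of a C struct ifreq whose union
// member of interest is a 32-bit value. ASSUMPTION: the union is 16
// bytes wide here; verify against the target platform's header.
type ifreqU32 struct {
	Name  [ifNameSize]byte // NUL-terminated interface name
	Value uint32           // the union member the ioctl reads or writes
	_     [12]byte         // pads the union to the assumed 16 bytes
}

// newIfreq fills in the interface name, leaving room for the NUL byte.
func newIfreq(name string) (ifreqU32, error) {
	var r ifreqU32
	if len(name) >= ifNameSize {
		return r, fmt.Errorf("interface name %q too long", name)
	}
	copy(r.Name[:], name)
	return r, nil
}

// ifreqSize reports the Go-side size, to compare with sizeof(struct ifreq).
func ifreqSize() uintptr { return unsafe.Sizeof(ifreqU32{}) }

func main() {
	r, err := newIfreq("utun3")
	if err != nil {
		panic(err)
	}
	// A real caller would pass a pointer to r to an ioctl; omitted here.
	fmt.Println(string(r.Name[:5]), ifreqSize())
}
```

The fragility is the argument for the autogenerated route: a hand-written mirror silently breaks if the C layout differs, while generated definitions are derived from the headers themselves.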



