Our User-Mode WireGuard Year (fly.io)
482 points by xrd on Feb 9, 2022 | 165 comments



Back in the day (nearly 30 years ago) people would run a user-mode stack to obtain Internet connectivity via a (dial-up) Unix shell account. The program was "slirp" which was named after SLIP/CSLIP, but then upgraded to support PPP once that became a thing.

https://en.wikipedia.org/wiki/Slirp


I was using SLIRP a few years ago to tunnel traffic through a cheap OpenVZ VPS (for downloading Linux ISOs). I had to manually patch the TCP window size and recompile the binaries to get decent speed, as the original code doesn't support TCP window scaling. I tried to upstream the patch to Debian, but the maintainer wouldn't reply. It worked well enough, but now I use WireGuard.

Modified SLIRP code is also found in VirtualBox, Qemu, UML and other virtualization software, for sharing the host connection in NAT mode.


SLIP was a problem for us when I worked at an ISP in 1995. Cheapskates would buy our barebones "shell access only" package and then run SLIP on our shell server. We eventually wrote a cron job to kill their processes when we found them. Sneaky ones just renamed the binary.


Heh. I used Netcom and they did not outright ban slirp, but they did have a script running that would renice (lower the priority of) user processes once they reached a certain total CPU time (two minutes). I discovered this feature shortly after they implemented it because my slirp network performance went down. Note that I am referring to CPU time and not run time. Most processes spend most of their time waiting for something, so accumulating two minutes of CPU time under slirp usually took several hours.

My solution was to write a program (which I called checkcpu) that would spawn a process (slirp) and periodically check its total CPU usage. When it hit the threshold (110 seconds), it would spawn a child and suspend the parent (seamlessly passing the current run state to the child). It worked great, and they either never noticed what I was doing or did not care. Over time, the number of suspended parent processes would rise, but it never became a problem.
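On a modern Linux box, the monitoring half of a tool like checkcpu is easy to sketch. It reads the child's accumulated user+system CPU time from /proc/<pid>/stat and compares it to the threshold. This Go version is an illustration under stated assumptions, not the commenter's program. It assumes 100 clock ticks per second (the usual USER_HZ). It also does not attempt the suspend-the-parent, hand-off-to-a-child trick, which relied on fork semantics the Go runtime doesn't expose.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
	"strings"
)

// clockTicksPerSec is an assumption: USER_HZ is 100 on most Linux systems,
// but the authoritative value comes from sysconf(_SC_CLK_TCK).
const clockTicksPerSec = 100

// cpuSeconds parses the contents of /proc/<pid>/stat and returns user+system
// CPU time in seconds. The command name (field 2) may contain spaces or
// parentheses, so parsing starts after the last ')'.
func cpuSeconds(stat string) (float64, error) {
	end := strings.LastIndexByte(stat, ')')
	if end < 0 {
		return 0, fmt.Errorf("malformed stat line")
	}
	// fields[0] is field 3 (state), so utime (field 14) is fields[11]
	// and stime (field 15) is fields[12].
	fields := strings.Fields(stat[end+1:])
	if len(fields) < 13 {
		return 0, fmt.Errorf("stat line too short")
	}
	utime, err := strconv.ParseUint(fields[11], 10, 64)
	if err != nil {
		return 0, err
	}
	stime, err := strconv.ParseUint(fields[12], 10, 64)
	if err != nil {
		return 0, err
	}
	return float64(utime+stime) / clockTicksPerSec, nil
}

// processCPUSeconds reads /proc/<pid>/stat (Linux only).
func processCPUSeconds(pid int) (float64, error) {
	b, err := os.ReadFile(fmt.Sprintf("/proc/%d/stat", pid))
	if err != nil {
		return 0, err
	}
	return cpuSeconds(string(b))
}

func main() {
	const threshold = 110.0 // seconds, just under the two-minute renice limit
	if secs, err := processCPUSeconds(os.Getpid()); err != nil {
		fmt.Println("cannot read /proc:", err)
	} else {
		fmt.Printf("cpu=%.2fs, over threshold: %v\n", secs, secs >= threshold)
	}
}
```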


I was that guy. I didn't even know ISPs didn't like it.

I thought that's just how you got into the interwebs without running that obnoxious, buggy faux winsock client.


Man I miss the old internet when things were more playful and that was the extent of harm being done. Not that I condone cheapskates stealing your bandwidth.

Maybe it's just rosy nostalgia.


Oh, believe me, that wasn't the worst. The worst was an undergrad in a university computer lab who harnessed many dozens of workstations to DDoS us off the net when he got into an IRC argument with someone.


Haha, right, my nostalgia blocked off that part of it. That was definitely a thing. IRC BNCs became mandatory to avoid the risk.


Remember when security was pretty much just scanning floppies for viruses before you ran the program?



Heh, as a middle time between the dial-up days and the WireGuard days, I used to use PPP as a poor-man's VPN. SSH to a host inside the work network and run pppd on both sides. Tada, suddenly my home computer is on the work network, NATted from the jump box.


What a trip back in time. I used this and TIA [1] as a youth.

[1] https://en.wikipedia.org/wiki/The_Internet_Adapter


Usermode WireGuard would be a big deal. I maintain a list[0] of tunneling solutions, and one of the only limitations of systems built on WireGuard is the requirement for admin privileges. Even with the performance hit from running outside the kernel, UDP-based tunnels have a lot of advantages for multiplexing channels. Pretty much your only mainstream options today are QUIC and WireGuard, and only QUIC is intended to run in userspace.

I'd have to dig into the details more, but something like this might allow you to implement a simple tunneling system based on WireGuard that runs in the kernel if you have the privileges, otherwise falls back to usermode and is no worse than QUIC in terms of performance. That would be awesome.

[0]: https://github.com/anderspitman/awesome-tunneling
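The fallback idea above (kernel if privileged, usermode otherwise) comes down to a capability check at startup. Below is a minimal Go sketch with hypothetical `Mode` names. It simplifies "privileged" to "effective UID 0 and /dev/net/tun present"; a real client might also accept CAP_NET_ADMIN without root.

```go
package main

import (
	"fmt"
	"os"
)

// Mode is a hypothetical name for how a tunnel client carries traffic.
type Mode string

const (
	KernelMode    Mode = "kernel"    // kernel WireGuard or a tun device
	UserspaceMode Mode = "userspace" // WireGuard plus a user-mode TCP/IP stack
)

// chooseMode is a simplification: it treats "root with /dev/net/tun present"
// as privileged and falls back to userspace networking otherwise.
func chooseMode(euid int, tunExists bool) Mode {
	if euid == 0 && tunExists {
		return KernelMode
	}
	return UserspaceMode
}

func main() {
	_, err := os.Stat("/dev/net/tun")
	mode := chooseMode(os.Geteuid(), err == nil)
	fmt.Println("selected mode:", mode)
}
```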


Very cool!

Found a gap: the Linux Foundation's FD.io VPP (a high-performance virtual network switch) has native WireGuard support as well, all in userspace. Support here means you can do full kernel bypass from the app all the way down to the NIC (e.g. via DPDK).

https://docs.fd.io/vpp/20.09/d5/d54/wireguard_plugin_doc.htm...

I'll open a PR on this later.


> Pretty much your only mainstream options today are QUIC and WireGuard, and only QUIC is intended to run in userspace.

Not sure it fits your "mainstream" qualification, but many projects ago I used Airhook to help create a userspace, application-layer('ish), multi-channel virtual network: http://airhook.ofb.net/ (https://github.com/egnor/airhook)

Airhook is a relatively low-level library that handles framing and flow control; it's not a functional solution on its own. But that seems to be what you're getting at--something you can deeply integrate into your application, not a separate service. Though, I guess containers have sort of muddied that distinction.


It sounds like we could have a generic userspace tool that proxies any connection to a WireGuard server. Similar to ssh -L, it would listen on a TCP/UDP port locally (or talk the SOCKS protocol) and convert that to IP packets over the WireGuard connection (using a userspace TCP or UDP implementation for that side).

It looks like Fly.io has all the bits, they just need to be packaged as a stand-alone tool rather than built into flyctl and only talk SSH.
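The `ssh -L`-style tool described above is mostly a byte-pumping loop around a dialer. In this Go sketch, the dialer is a plain function value with the same shape as `(*net.Dialer).DialContext`. By assumption, a userspace WireGuard plus TCP/IP stack could supply its own function of that shape. The demo uses the kernel dialer and a loopback echo service as stand-ins for the tunnel and the remote app.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"log"
	"net"
	"sync"
)

// DialFunc matches (*net.Dialer).DialContext; a tunnel stack could provide one.
type DialFunc func(ctx context.Context, network, address string) (net.Conn, error)

// forward accepts local connections on l and pipes each one to target via dial,
// much like `ssh -L`. It returns when l is closed.
func forward(l net.Listener, dial DialFunc, target string) error {
	for {
		local, err := l.Accept()
		if err != nil {
			return err
		}
		go func(local net.Conn) {
			defer local.Close()
			remote, err := dial(context.Background(), "tcp", target)
			if err != nil {
				log.Printf("dial %s: %v", target, err)
				return
			}
			defer remote.Close()
			var wg sync.WaitGroup
			wg.Add(2)
			go func() { defer wg.Done(); io.Copy(remote, local); closeWrite(remote) }()
			go func() { defer wg.Done(); io.Copy(local, remote); closeWrite(local) }()
			wg.Wait()
		}(local)
	}
}

// closeWrite half-closes the connection (if supported) so the peer sees EOF.
func closeWrite(c net.Conn) {
	if cw, ok := c.(interface{ CloseWrite() error }); ok {
		cw.CloseWrite()
	}
}

// demo starts a one-shot echo service, forwards a local port to it,
// and sends msg through the forwarder.
func demo(msg string) (string, error) {
	echo, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer echo.Close()
	go func() {
		c, err := echo.Accept()
		if err != nil {
			return
		}
		defer c.Close()
		buf := make([]byte, len(msg))
		if _, err := io.ReadFull(c, buf); err == nil {
			c.Write(buf)
		}
	}()

	fwd, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer fwd.Close()
	var d net.Dialer // kernel networking; a tunnel dialer would slot in the same way
	go forward(fwd, d.DialContext, echo.Addr().String())

	conn, err := net.Dial("tcp", fwd.Addr().String())
	if err != nil {
		return "", err
	}
	defer conn.Close()
	if _, err := conn.Write([]byte(msg)); err != nil {
		return "", err
	}
	buf := make([]byte, len(msg))
	if _, err := io.ReadFull(conn, buf); err != nil {
		return "", err
	}
	return string(buf), nil
}

func main() {
	if got, err := demo("ping"); err != nil {
		fmt.Println("demo failed:", err)
	} else {
		fmt.Println(got) // prints "ping"
	}
}
```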


Tailscale will do this!

    tailscaled --tun=userspace-networking --socks5-server=localhost:1081
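With the SOCKS5 server enabled, any SOCKS-aware program can reach tailnet peers through the unprivileged tailscaled. The client side is a short RFC 1928 exchange. Below is a minimal Go sketch (no authentication, domain-name addressing), assuming the proxy from the command above is on localhost:1081 and using a hypothetical peer name `my-server`. In real code, `golang.org/x/net/proxy` already provides a SOCKS5 dialer.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net"
	"time"
)

// connectRequest builds an RFC 1928 CONNECT request with a domain-name address:
// VER=5, CMD=1 (CONNECT), RSV=0, ATYP=3 (domain), length, name, port (big-endian).
func connectRequest(host string, port uint16) ([]byte, error) {
	if len(host) == 0 || len(host) > 255 {
		return nil, errors.New("host must be 1-255 bytes")
	}
	req := []byte{0x05, 0x01, 0x00, 0x03, byte(len(host))}
	req = append(req, host...)
	return append(req, byte(port>>8), byte(port)), nil
}

// dialSOCKS5 opens a TCP connection to host:port through a SOCKS5 proxy
// that allows unauthenticated clients.
func dialSOCKS5(proxyAddr, host string, port uint16) (net.Conn, error) {
	req, err := connectRequest(host, port)
	if err != nil {
		return nil, err
	}
	conn, err := net.DialTimeout("tcp", proxyAddr, 5*time.Second)
	if err != nil {
		return nil, err
	}
	fail := func(err error) (net.Conn, error) {
		conn.Close()
		return nil, err
	}

	// Method negotiation: offer only 0x00 (no authentication).
	if _, err := conn.Write([]byte{0x05, 0x01, 0x00}); err != nil {
		return fail(err)
	}
	choice := make([]byte, 2)
	if _, err := io.ReadFull(conn, choice); err != nil {
		return fail(err)
	}
	if choice[0] != 0x05 || choice[1] != 0x00 {
		return fail(errors.New("proxy did not accept the no-auth method"))
	}

	if _, err := conn.Write(req); err != nil {
		return fail(err)
	}
	// Reply: VER, REP (0 = success), RSV, ATYP, then BND.ADDR and BND.PORT.
	hdr := make([]byte, 4)
	if _, err := io.ReadFull(conn, hdr); err != nil {
		return fail(err)
	}
	if hdr[0] != 0x05 || hdr[1] != 0x00 {
		return fail(fmt.Errorf("proxy replied with code %d", hdr[1]))
	}
	var skip int
	switch hdr[3] {
	case 0x01: // IPv4
		skip = 4 + 2
	case 0x04: // IPv6
		skip = 16 + 2
	case 0x03: // domain name, length-prefixed
		n := make([]byte, 1)
		if _, err := io.ReadFull(conn, n); err != nil {
			return fail(err)
		}
		skip = int(n[0]) + 2
	default:
		return fail(fmt.Errorf("unknown address type %d", hdr[3]))
	}
	if _, err := io.ReadFull(conn, make([]byte, skip)); err != nil {
		return fail(err)
	}
	return conn, nil // now a plain byte stream to host:port
}

func main() {
	// "my-server" is a hypothetical tailnet peer name.
	if conn, err := dialSOCKS5("localhost:1081", "my-server", 22); err != nil {
		fmt.Println("SOCKS5 dial failed:", err)
	} else {
		fmt.Println("connected via", conn.RemoteAddr())
		conn.Close()
	}
}
```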


Sadly, on the only machine that I would have wanted this on, where I didn't have root access, this has never worked for me.

I should try to recreate the logs and issue for the tailscale folks.


Our customers at Render run Tailscale in user mode every day. Here's the repo they use: https://github.com/render-examples/tailscale.

An example of using Tailscale to access VSCode in the cloud: https://render.com/blog/host-a-dev-environment-on-render-wit...


I had issues with it as well, was following the Tailscale guide for getting userspace running in Azure App Service and I could not get it to work.


With any WireGuard server or a custom tailscale server/service?


SSH can do this without any WireGuard:

https://rkeene.org/viewer/tmp/ssh-ip-tunnel.txt.htm


There's nothing userspace about this, you are using a tun device with the actual kernel IP stack.


One thing to watch out for with that setup is using SSH for TUN devices can suffer from TCP-over-TCP performance issues, aka "TCP meltdown", when there's packet loss. You can avoid this by using normal SSH tunnels, a la ssh -L or -R, which unpack the individual TCP streams and multiplex them over a single connection. Or if you need a more traditional VPN setup use WireGuard.


Is https://tailscale.com/ not "usermode WireGuard"? I've been playing with it for a while now (it has a fairly generous free tier) and am quite impressed. I can access any of my LAN machines (my servers, my NAS, etc.) from anywhere that is also connected to the same network, and the names work for DNS as well.


1. Tailscale is amazing. I hate them so much. (We use Tailscale and are very happy with it.)

2. Tailscale is user-mode WireGuard.

3. "User-mode WireGuard" in the sense this post uses the term is a misnomer and refers to the fact that we run TCP/IP itself in userland (Tailscale normally runs through a tunnel device and uses your native TCP/IP stack).

4. But Tailscale also has code to do user-mode TCP/IP (they've got it running in a browser with wasm).


Last I heard[0] they were experimenting but hadn't shipped it. AFAIK their client still requires root, no?

Running on wasm sounds awesome. This[1] looks like it. Do you know how they're doing the actual networking? WebRTC tunnel?

[0]: https://news.ycombinator.com/item?id=24483173

[1]: https://twitter.com/bradfitz/status/1451423386777751561?lang...


> Last I heard[0] they were experimenting but hadn't shipped it. AFAIK their client still requires root, no?

Tailscale's gvisor/netstack-based userspace networking mode has been supported and in wide use for quite some time. It's the default on Synology DSM7, for instance.

You don't need root when you run tailscaled with `--tun=userspace-networking`.

Peers can still connect inbound to the non-root tailscaled, but to connect _out_ to other peers, you need to use tailscaled's HTTP or SOCKS5 proxy. Both are enabled by tailscaled flags that also specify which port each one listens on.


Thanks for the update!

Do you have any links that talk more about how the wasm stuff works? I'd love to read more about that.


Yeah, their client is always going to require privileges, because it needs to enable every other program on the system to interact directly with remote hosts transparently. User-mode TCP/IP works for us because we own the client-side program that our users run to talk to stuff on Fly.io.


I think Tailscale uses user-mode TCP/IP (also gVisor netstack) for some client devices, like iOS? But could be wrong here.


We use it on all platforms _except_ iOS, for binary size/memory reasons.

(iOS 15 bumped the Network Extension memory limit to 50 MB, but we still need to be super trim for iOS 14's 15 MB limit)


LOL, I was precisely wrong.

Is there actually a preference for user-mode networking? I assume that’s primarily about control and flexibility?

Either way, I hope that the PacketBuffer changes can help reduce footprint after issues are shaken out.


Fascinating!


It depends on what sort of tunneling you're doing. If you just want a general-purpose private VPN, Tailscale is amazing. That list is more focused on the use case where you want to host a public server on a machine that isn't accessible to the internet (NAT, corporate firewall, etc). Think a shared Jellyfin server for your friends and family.

You can use Tailscale here but you'll need to separately run a reverse-proxy on a public machine. There are more moving pieces but if you're using Tailscale already then it's a good option.


I wish I could run two separate Tailscale networks on a single device, one for business and one for personal (for example). Would make it tremendously more useful.


There's an existing GitHub ask for this to be implemented. It's not terrible jumping between work and personal environments, but it would be nice if I didn't have to.


Is this different from sharing one device with other accounts?


Headscale seems even better! They've taken what tailscale has done and improved it even more by allowing it to be a completely self hosted and private solution.


TunSafe also runs on userspace: https://github.com/TunSafe/TunSafe


It looks like from the source code that TunSafe opens up a tunnel device, in which case it's doing TCP/IP in the kernel, not in userland.


It opens a tun/tap device just to get/push packets from/to the virtual interface. Using tun/tap is very common for VPNs like OpenVPN 2.

There isn't a kernel driver, unlike real (in-kernel) WireGuard.


I feel like the lede is buried here... If I'm reading correctly I can use flyctl to set up an SSH encrypted tunnel over user-space TCP/IP over wireguard encrypted IP packets over UDP over WebSockets over HTTP over TLS encrypted TCP/IP? And not even just for fun, but to solve real problems in production. I can see why you might have mixed feelings about that!


What’s even funnier is that, if I read it right, it’s IPv6-only, and WSL on Windows doesn’t support IPv6 routing over WireGuard. Its kernel was compiled without some CONFIG flags needed to route the packets properly.


It's running its own TCP stack in userspace though so the hosting kernel's routing abilities aren't an issue.

I am both impressed and horrified, and I mean that in the best possible way.


> I am both impressed and horrified, and I mean that in the best possible way.

This sentence style sounded familiar in my head. Looking at the username. Oh, of course, that's him!


mstcat approves of this :D

(for those who don't know, http://trout.me.uk/mstcat.jpg )


This post pairs nicely with Julia Evans’ post on why most people use the Linux kernel’s TCP/IP stack and why a few others would bother with a userland stack.

In her post she doesn’t mention fly.io’s motivation for doing userland TCP/IP: a nicer end-user client experience.

If I read this correctly, then fly.io did all this work to make their CLI user experience markedly better. That’s pretty cool. Twist yourself in interesting knots to make your users’ lives better, maybe even in ways they won’t notice!


I was using wireguard-go in a FreeBSD jail running on an APU2C2 board. Torrenting from my laptop caused wireguard-go to spike to high loads and 30-50% CPU usage. Loading wireguard-kmod on the host machine, plus some devfs rules, dropped the CPU load to near zero.

Not sure what happened there. The processor seems to score less than an RPi4 on Geekbench.


I use one of these as a firewall (running OPNSense) and they're very nice but the CPU is indeed _slow_. It's plenty good enough for everything the firewall does but booting it up takes minutes and that's saying something for FreeBSD.


Odd. I run openbsd on a similar one, and booting is reasonably fast. I even have a linux vm running on it in vmd, and haven’t noticed performance issues with that either.


Pure FreeBSD boots in less than a minute from the SDCard. Have you tried plugging in a serial cable to check what service is slowing down the boot?


> The processor seems to score less than an RPi4 on Geekbench.

The apu2 is an embedded AMD quad-core 1GHz SoC consuming 5W. It is not a powerful system by any means, and I'm not surprised it is rivaled by a >1GHz quad-core Arm.


The AMD Jaguar cores came out in 2013. On the other hand, the Cortex A53 came out in 2012 so it's still a bit embarrassing for AMD. That was before the AMD renaissance, though.


The Raspberry Pi 4 uses a quad core cluster of Cortex A72 cores, not A53. A72 was released in 2016, but the Pi 4 also has them clocked to run at 1.5GHz. Either way, I don't think it's embarrassing for AMD's 2013 Jaguar to be beaten by cores that are 3 years newer and running at a 50% higher clock speed. I thought Jaguar was pretty cool at the time that it came out, but technology has continued to move swiftly since then.


Fly.io's blog posts are incredible; they really seem to enjoy what they do and want to share what they've made with everyone else. I love them for that.

I wish that more companies could be like this and skip the corporate BS, it shows that they really have something outstanding to offer.


> I wish that more companies could be like this and skip the corporate BS, it shows that they really have something outstanding to offer.

The nature of a blog typically caters to its intended audience.

The CIO of Disney doesn't give a sh*t whether the protocol is called WireGuard or OpenVPN, or whether it uses AES-256 encryption; he/she wants someone to tell them that their developers are securely accessing their infrastructure. Full stop. If/when Fly gets to that level (let's say $500M in revenue), their blog tone will likely change; their audience is primarily developers and startup CTOs...for now.


For better or worse, I can guarantee you that we won't ever write articles for the Disney CIO. Unless I get fired.

Whitepapers. They want whitepapers and magic quadrants.


White papers are to CxOs what TED talks are to the uninformed: an easy way to get up to speed on a highly complicated subject. And more fool them if they think that makes them experts, which sadly a lot of them do.

I kind of miss the IBM ITSO Redbooks. No idea if they still maintain the same quality today, but in the 90s and pre-internet/google/wiki etc, they were fonts of deep knowledge.


> we won't ever write articles

> Whitepapers.

I probably should have said "content" instead of blog because I agree 100% with this. Point still stands.

What the OP was referring to will likely become part of an engineering blog, à la https://codeascraft.com/


Correction, magic kingdom quadrants.


So Eden sank to grief, so dawn goes down to day.


I think their super power here is employing a renowned security expert who is an incredibly good communicator!

(And happens to be a top HN contributor)


Nit: THE top HN contributor


my goodness, you're right... https://news.ycombinator.com/leaders (edit: by a factor of ~2, no less!)


I like how he decided he should only copyright his comments from years 2010 + Fn.


And they also employ the creator of the Phoenix framework.


I'm pretty sure their experience running an ISP helps too heh.


It's definitely all my talent that keeps this place running. I'm definitely not just a noisy message board guy who got hired after most of this infrastructure was built and deployed and then just proceeded to make a bunch of message board noise about it.


see, this is why the people love reading what you write. keep giving credit but also having fun!


A lot of (small–medium sized, tech) companies just don’t have a process to get things out on their blog like this. It might be that only a few senior people have the ability to write posts and they are not interested or busy with other things, or it might be that there is a slow review process for posts that makes writing them unpleasant, or it might be that they don’t want to reveal IP or have an opinion and so have little to talk about. Another company that does a good job of writing blog posts, often timely posts about current (and relevant) events, is cloudflare though their posts have a quite different energy to Fly.io’s.



I think it's the only corporate blog I know of that's on my much-read list (i.e. every post goes to the top of the pile).


They really are. Feels like working there would be really fun.


Because 'sho_hn brought this up, here's a stab at a pro/con list of building TCP/IP directly into our API the way `flyctl` does:

Pro:

+ Can just run "native" SSH directly over it (or, in our case, use x/crypto/ssh, without modification).

+ Lets `flyctl` offer a `flyctl proxy` command to users, so they can plug their own programs into whatever application they need to use, without asking us to change some proxy we run in our infrastructure.

+ Offers a single security and access control model (IPv6 private networking), rather than something we have to think about on a per-app basis.

+ In theory, we get all this right and never have to think about another network protocol in our infrastructure.

+ Allows existing network management tools and libraries to function directly with Fly.io infrastructure.

+ With the WebSockets gateway, we can do all of this stuff directly in browsers as well; that is, we can present TCP/IP as an API to browser Javascript to do UI stuff (and we're doing more and more UI stuff these days, in Elixir.)

+ Puts more IPv6 in the world.

+ Get to talk to Jason Donenfeld more.

+ Get to write blog posts like this.

Con:

- Spends one (maybe multiple) innovation tokens or whatever you want to call them.

- Way more things can go wrong; relies on state synchronization and on a clear network path between our users and our gateways. Right now, we have to care whether you can speak 51820/udp.

- User-mode TCP/IP via Netstack is probably significantly slower than a simple TCP proxy would be.

- Required `flyctl` to run a background agent process to manage multiple connections through WireGuard.

- The agent process adds to the list of things that can go wrong (hopefully we've ironed most of them out now).

I can probably come up with more cons.

...

It's buried in the middle of the post but I want to say it again because I think it's important: this sort of started out as a stunt; it's what I put together to allow people to SSH into their instances without having to install WireGuard locally, and that's all it was. I don't have to write a soul-searching pro/con list on stunts I use to give people SSH access, because lots of providers have super janky "pop a private terminal" setups. But all this stuff took on greater importance when we used it to run Docker for our remote builders.

I like the approach we're taking a lot! I don't... regret it? I don't think? I think I'm happy with it. But it's complicated.


We recently moved our entire app deployment over to Fly and are mostly loving it, but one of the mildly janky features is hallpass. For instance, (1) connections often fail if you have X forwarding enabled (even if you did no special config on the machine), and (2) port forwarding doesn't work. These aren't really a big deal, since (1) you can just disable X forwarding in ssh_config and (2) port forwarding is unnecessary if you can tunnel in via WireGuard. Still, it makes me wonder why a native SSH server isn't used with a small script to manage the required config changes.

While on the topic, we're also eagerly awaiting improved autoscaling (e.g., more responsive, using additional metrics, and scaling down properly). I'd be really curious whether you could leverage the more detailed access to instance-level metrics to implement some cool new queue-theoretic modeling. You know roughly how long it takes for an app to launch, you know the current request rate, and you know the time to service requests. You could apply a lightweight Markov model to predict the probability of a given queueing delay in each region within the average launch time. If that probability is too high, you could preemptively launch a new instance before queueing delays even occur. This could be configured to balance a client's tolerance for queueing delays against their over-provisioning budget.


As for autoscaling, our hands are tied as long as we're running on Nomad. Right now our autoscaler is nothing more than some Ruby that loops over data from Prometheus and changes counts in Nomad. It's slow and buggy, but worse, we don't have control over where Nomad places VMs or which ones it stops when scaling down.

We're working on a replacement for Nomad (called flyd) that gives us full control over VMs. Once apps are running on that we can do a lot of cool things. Better autoscaling is one, but I'm really excited about suspending idle VMs that our proxy wakes up on demand. That'll cover most use cases without forcing customers to worry about counts or blowing through a budget.


I’d love to hear more about this move away from Nomad.

We haven’t had too good a time with Nomad, but we're not sure if it’s just our limited understanding. It doesn’t help that there are very few people out there who know it.


We'll write about it when the time comes. To be fair, Nomad and Consul have served us well. Most of our troubles stem from abusing them in ways they weren't designed to handle.


That doesn’t surprise me. I think our biggest issue is not knowing it well enough and not being able to find people that know it.

At that point it doesn’t matter how good the tech is.


Hallpass is a truly trivial piece of code --- it might be less than 400 lines all in. All it really does is run certificate authentication off of a root cert we store in `_orgcert.internal` in DNS.

If you like, you can roll a Dockerfile that runs OpenSSH directly on your internal network address (bind it to `fly-local-6pn`), and then use native WireGuard to talk to it.

I've got a branch on hallpass that does port forwarding, but I never merged it, because you're right: using port forwarding on Fly.io is weird, because we already provide you direct access to any port you're exposing, and you can't talk to any of this stuff without WireGuard already. I think it would just confuse people more if I made port forwarding work.


Thanks for the pro-con list!

The experience of...

1) build a thing because it's immediately useful for a specific use case, 2) someone reuses it for another use case because it's already there and saves some work, 3) time passes, 4) oops, really important stuff now relies on this thing in ways that weren't originally intended

... seems like a common pattern (see: JWT succeeding as the format for interoperable tokens by dint of just being around).

In this case, it seems like the pros are basically user-centered pros (`flyctl proxy`, existing tool interop, etc.) and the cons are basically fly-centered cons (state synchronization, maintaining the agent and making it work right).

The cons that do affect users (slowness, maybe they can't speak 51820/udp) seem _annoying_ but not deal-breaking for a lot of use cases. If the slowness persists over a long time it will be interesting to see how users opt to route around it (architect applications / processes to not rely on this channel).


Question: ultimately all the packets are actually being sent via a Golang net.UDPConn right? ie you're simulating raw network packets by wrapping them in UDP packets, then running TCP in Golang over those wrapped packets?


That is effectively what we're doing, yeah.
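Each of those UDP datagrams carries a WireGuard transport data message. It starts with a 16-byte header: type 4, three reserved zero bytes, then a little-endian 32-bit receiver index and 64-bit counter. The encrypted IP packet follows. This Go sketch frames a packet that way and sends it over a `net.PacketConn` between two loopback sockets. It deliberately skips the Noise handshake and the ChaCha20-Poly1305 encryption, so it shows the framing only and is not secure. The receiver index and counter values are arbitrary.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"net"
	"time"
)

const (
	msgTypeTransportData = 4
	dataHeaderLen        = 16 // type(1) + reserved(3) + receiver index(4) + counter(8)
)

// frameData prepends a WireGuard-style transport data header to an IP packet.
// WARNING: real WireGuard encrypts the packet with ChaCha20-Poly1305 using
// session keys from the handshake; this sketch leaves it in plaintext.
func frameData(receiver uint32, counter uint64, ipPacket []byte) []byte {
	msg := make([]byte, dataHeaderLen, dataHeaderLen+len(ipPacket))
	msg[0] = msgTypeTransportData // bytes 1-3 stay zero (reserved)
	binary.LittleEndian.PutUint32(msg[4:8], receiver)
	binary.LittleEndian.PutUint64(msg[8:16], counter)
	return append(msg, ipPacket...)
}

// parseData reverses frameData.
func parseData(msg []byte) (receiver uint32, counter uint64, payload []byte, err error) {
	if len(msg) < dataHeaderLen || msg[0] != msgTypeTransportData {
		return 0, 0, nil, errors.New("not a transport data message")
	}
	return binary.LittleEndian.Uint32(msg[4:8]), binary.LittleEndian.Uint64(msg[8:16]), msg[dataHeaderLen:], nil
}

func main() {
	// Two UDP sockets on loopback stand in for the client and the gateway.
	peer, err := net.ListenPacket("udp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	defer peer.Close()
	client, err := net.ListenPacket("udp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	defer client.Close()

	fakeIPPacket := []byte{0x60, 0, 0, 0} // first bytes of an IPv6 header (version 6)
	if _, err := client.WriteTo(frameData(7, 1, fakeIPPacket), peer.LocalAddr()); err != nil {
		panic(err)
	}

	buf := make([]byte, 1500)
	peer.SetReadDeadline(time.Now().Add(2 * time.Second))
	n, from, err := peer.ReadFrom(buf)
	if err != nil {
		panic(err)
	}
	recv, ctr, payload, err := parseData(buf[:n])
	fmt.Println(from, recv, ctr, payload, err)
}
```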


Cool, thanks!


This is an answer to this question by sho_hn: https://news.ycombinator.com/item?id=30276877


How complete is the ssh implementation? I'm thinking I probably want to at least run git/hg push, and maybe even do port forwarding.


Not very. You can scp and rsync over it. You can run with or without a pty. That's pretty much it. It should work with git!

You probably shouldn't do port forwarding on Fly.io; if you're running into an actual need for that, we should talk about extending our network access control model.


Perhaps atypical, but about 50% of my ssh use is port forwarding to construct impoverished man's VPNs. Like I send mail by forwarding localhost:25 to localhost:25 on the mail server.

If I were running PoE (Postgres on Edge) I'd probably want to connect a local client for poking around, but without the bother of meshing my laptop into the cloud.


Most port forwarding you need to connect to Fly apps is baked in. Here's how to get at a remote postgres:

    $ flyctl proxy 15432:5432 -s -a fizz-db
    ? Select instance:  [Use arrows to move, type to filter]
    > gru.fizz-db.internal
      iad.fizz-db.internal
      lax.fizz-db.internal
      lhr.fizz-db.internal
      ord (fdaa:0:446b:a7b:20db:0:77a5:2)
      ord (fdaa:0:446b:a7b:20dc:0:784c:2)
      yyz.fizz-db.internal
That forwards whichever you select to local port 15432.


Awesome, this is a killer promo for Fly btw :D


Ah!


Thanks!


> being able to pop a shell on a running app was table-stakes for the platform.

Tangent: that's debatable IMO. In my company's current AWS infrastructure, there's no shell access to either the production containers or the host machines. I did write a script to create an ephemeral container that lets me (and future staff) run a shell inside the production network. And the thing I usually do in that shell is run psql; I suppose that's not ideal for auditability. But still, I can't poke around in the live containers or the host machines; in theory they could be distroless, with no shell at all. I'm trying to take immutable infrastructure to the max here; this seems like a good thing for security. It does mean that for debugging production problems, I can only go by what I find in logs and the database. But that has been acceptable so far.

Edit: Given tptacek's security background, I was surprised that he considered production shell access essential for a new app platform.


It's funny you bring this up, because I had the same thought when I was first implementing SSH (the initial implementation relied on native WireGuard, and just plugged a client certificate into your running SSH agent). I thought people might not want to enable SSH access, and so I made the provisioning of the root SSH certificate optional: you have to run `flyctl ssh establish` to tell us to set up a root cert for your organization.

It's turning out to have been a misfeature that confuses people more than it helps anyone, and we may get to a place soon where we just automatically provision a root cert for new organizations.


So do you think I'm being too extreme on this? Or did you just implement SSH because customers want it?


I think it's sensible to run an application fleet without SSH access, but it's tough for a hosting provider that has to support lots of different application fleets to not offer a way to get a shell. Our authorization systems are about to get sharply more interesting as we roll out Macaroon-style tokens this quarter, so I'm optimistic we'll get to a place where we make both styles of application owners happy.

I was more on your side when I wrote the feature, and I'm less on your side now. Also, I SSH into instances to debug things all the time. :)


Might be a noob question (ahead of time, even): why/when would one prefer authz/ACLs with Macaroons over Zanzibar? All I could gather from summaries is that, for cloud, Macaroons are better suited since they're 'decentralised', whereas Zanzibar appears to favour consistency with a centralised architecture capable of expressing relationship graphs and permission inheritance.
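On the "decentralised" point: what sets macaroons apart is that anyone holding a token can attenuate it offline. They append a caveat and re-key the signature with an HMAC chained off the previous one. Only the holder of the issuer's root key can verify the chain. Below is a stripped-down Go sketch of that chaining, with first-party caveats only, no serialization, and hypothetical caveat strings; it is not Fly.io's token format. A Zanzibar-style check, by contrast, asks a central service to evaluate the relationship graph at request time.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"fmt"
)

// Macaroon is a minimal macaroon: identifier, caveats, and chained signature.
// Third-party caveats and the binary wire format are omitted.
type Macaroon struct {
	ID      string
	Caveats []string
	Sig     []byte
}

func mac(key []byte, data string) []byte {
	h := hmac.New(sha256.New, key)
	h.Write([]byte(data))
	return h.Sum(nil)
}

// Mint creates a macaroon; only the holder of rootKey can do this.
func Mint(rootKey []byte, id string) Macaroon {
	return Macaroon{ID: id, Sig: mac(rootKey, id)}
}

// Attenuate adds a caveat without rootKey: the new signature is the HMAC of
// the caveat keyed by the previous signature, so caveats can't be stripped.
func (m Macaroon) Attenuate(caveat string) Macaroon {
	caveats := append(append([]string(nil), m.Caveats...), caveat)
	return Macaroon{ID: m.ID, Caveats: caveats, Sig: mac(m.Sig, caveat)}
}

// Verify recomputes the chain from rootKey and checks every caveat.
func Verify(rootKey []byte, m Macaroon, satisfied func(caveat string) bool) bool {
	sig := mac(rootKey, m.ID)
	for _, c := range m.Caveats {
		if !satisfied(c) {
			return false
		}
		sig = mac(sig, c)
	}
	return hmac.Equal(sig, m.Sig)
}

func main() {
	rootKey := []byte("server-side secret") // hypothetical; use random bytes in practice
	tok := Mint(rootKey, "org=acme")
	// A client narrows its own token before handing it to a CI job.
	ciTok := tok.Attenuate("app=web").Attenuate("action=deploy")

	ctx := map[string]bool{"app=web": true, "action=deploy": true}
	fmt.Println(Verify(rootKey, ciTok, func(c string) bool { return ctx[c] })) // prints "true"
}
```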


I think Linode’s approach to this is best: they offer a “virtual console” that basically consists of an SSH gateway that pipes to your VPS’s virtual serial port.


It's useful when developing. Not being able to shell into the production system is fine if you can shell into a staging/development system.

If you are running Docker containers and you can shell into local containers, that is usually "close enough" that you can do useful troubleshooting. But fly.io (and CloudFlare workers, etc) are different enough from off-the-shelf containers that it is very important to be able to poke at containers when they break, even if they are not the actual production containers.


> In my company's current AWS infrastructure, there's no shell access to either the production containers or the host machines. I did write a script to create an ephemeral container that lets me (and future staff) run a shell inside the production network.

I'm a fan of this approach; it is hard to improve upon the security of a server that doesn't exist.

I recently set up an AWS serverless (mainly ECS Fargate) stack for a project and took this approach of spinning up an ephemeral EC2 server as part of a "breakglass" runbook for the rare cases where such access is needed.

This, combined with Tailscale userspace networking [0] (so the "breakglass" servers can run on a private subnet and are never exposed to the internet), Pulumi [1] (for managing the lifecycle of the "breakglass" instances) and YubiKey-based MFA for short-lived credentials [2] (required to spin up the server via Pulumi), worked well for me.

This approach is also useful for ensuring that whenever a "breakglass" server is started it is using the latest AMI version (Pulumi's `aws.ec2.get_ami_output()` is useful for this) and runs the usual security updates on startup. The ssh keypair for the server can also be created (and later destroyed) on the fly so there is no need to manage any long term ssh credentials.

[0] https://tailscale.com/kb/1113/aws-lambda/

[1] https://www.pulumi.com/

[2] https://aws.amazon.com/blogs/security/enhance-programmatic-a...


> The Consul cluster would hold an Entmoot

Hah! That one got me. They know their audience :)


This one made me laugh out loud and I had to come to HN to post this. You beat me to it.


Help me out. This threw me for a loop and Google wasn't helpful.


Weird. For me, every Google result for [entmoot] is appropriate and basically gives the answer to your question I would have given. It's a Lord of the Rings reference.


Right - I understood that it was the walking tree from Lord of the Rings. But, what does it mean for Consul to hold a walking tree?

Is it something like: "Consul bears a heavy weight (has a lot of responsibility and so responds slowly)"?


A walking tree is an Ent. A meeting of Ents, with ponderous deliberation — ponderous on _Ent_ timescales, that is — is an Entmoot.


Ohhhh. Their clocks run slowly. A lot of time passes between events. Totally makes sense. Thanks!


I always love the style in which fly.io's blog is written, and I'm a big fan of their freemium product. I see a group of enthusiastic hackers behind the product who keep improving it in a reasonable way, or in a cool way that doesn't sound boring.


I like it too but I feel like they overdo it just a tad.


We have the same concern! That's the whole thrust of the post. :)


Keep it up. Even when you overdo it a little, it's still miles ahead of BS corp-speak.


First off, you guys rule for doing this. I've been dreaming about a general-purpose userspace/unprivileged WireGuard wrapper, and this gets us closer to that.

1. Could you include more info on which userspace tcp/ip stack you use and why? I presume doing userspace UDP is relatively trivial/fast compared to what slirp had to do with tcp.

2. How does flyctl hijack syscalls across Linux and Windows? Is there some abstraction to do that? Wasn't even aware this was a pattern on Windows.

I realize I could read the code, but would appreciate some direction.

Kudos for the websocket proxy too! It would be really cool if this and the unprivileged WireGuard became standard parts of the WireGuard toolset.


We use netstack. It's great. https://pkg.go.dev/inet.af/netstack

I could try to give you reasons we use it but the truth is that I mentioned something about wanting a user-mode TCP and Jason Donenfeld said netstack was there, and then got it to work.

We don't do any system call hijacking! Don't have to. Go abstracts Dialers and Listeners, and the WireGuard part itself is just a vanilla UDP protocol spoken over a PacketConn.


Not even 30 mins ago I set up WireGuard to connect to a PG instance on Fly.

I expected this to be a headache but it took less than 5 mins to download WG, generate the conf with the fly CLI and paste it into WG. Done.


You shouldn't have to do this unless you want to (for instance, to make a permanent WireGuard connection) --- you can just run `flyctl proxy` to set up a connection to 5432/tcp.


Really?

I followed the instructions in their docs and that's what was recommended.

https://fly.io/docs/reference/postgres/#connecting-to-postgr...

Also, wouldn't I need to install WireGuard anyway to use fly proxy?


Nope! That's what this whole post is about. :)


As I read this, the tone and some of the topics started to ring a bell. Fly? WireGuard? I scrolled back up to the top to see who the author was and sure enough it was Thomas from the Security, Cryptography, Whatever podcast that I've been listening to for the last few months. For anyone who enjoyed this kind of content, you should definitely check out the podcast he is a part of; the other hosts are great as well.


Thanks for the pointer to that podcast.

https://securitycryptographywhatever.buzzsprout.com/


How do they manage their mesh?

I've just been doing research on setting up my own wireguard mesh (currently using a spoke/hub setup with pi-hole/pivpn).

I found https://github.com/HarvsG/WireGuardMeshes today which is awesome, but I'm curious what fly.io / other readers here may be using.


I usually build my own solutions, but I've played with Netmaker and it seems solid.

https://github.com/gravitl/netmaker


Netmaker is licensed under the SSPL. Be careful using it with anything at all (or casually recommending it!). If your personal project connects to your proprietary server, then you're in a tough spot in terms of license compliance.


I thought wireguard-go requires the kernel tun device. See for example this Docker image for wireguard-go which maps the tun device and requires elevated permissions - https://github.com/masipcat/wireguard-go-docker


TFA is surprisingly unapologetic about the Rube Goldberg SSH solution.

On the other hand fly.io looks really interesting, I want to try it out. The infrastructure described (other than the SSH hack) feels like how a modern cloud platform should be built.


Incredible blog post. Thank you very much for sharing. Signing up now.

Also, this post makes me update my prior on top HN contributors being unproductive (i.e. that they spend their time on this board all the time instead of working).


Is it possible to force fly.io to use a particular region? India has strict data residency laws for financial/healthcare companies (and I know that Singapore has them too).

So we need assured guarantees that data is not traversing outside of the region. Most cloud vendors have specific data residency compliance for India - https://aws.amazon.com/compliance/india-data-protection/


You control which region your app and its volumes are in. Metrics, logs, and volume snapshots end up on servers in the USA. We haven't addressed data residency for those platform services yet, but we might someday if there's enough interest.


I really recommend tackling this.

Many EU companies are increasingly concerned about data location, with some going into full panic mode.

> Metrics, logs, and volume snapshots end up on servers in the USA

This will unfortunately prevent me from using/recommending Fly for EU customers at the moment. It should also be pretty prominently stated in the docs.


WireGuard is just a transport protocol, so of course you could use it in place of SSL/TLS if you wanted. Interesting, though, and I prefer it to SSL/TLS because X.509 certs suck.


WireGuard isn't really the interesting bit here, it's running TCP/IP over it in userland. You cannot straightforwardly do that with SSL/TLS, but it is in fact the API that WireGuard provides.


This is where the article lost me a little bit. I (think I) technically got the part of running a TCP/IP stack in an unprivileged user process, so you don't have to elevate privilege for adding a network interface and using the host OS TCP/IP stack. And maybe that's already very cool. But:

- What other benefits does it give you?

- This isn't a new problem and presumably has prior best practices for mitigation. What is this replacing, what was the landscape like before this? What was the most similar already?

- Have people been looking for a better solution like this for some time?

- What's the over/under on the maintenance cost? You added in another TCP/IP stack to look after. You maybe save on static configuration and can make your system more dynamic. Pros, cons ... let's list them.

The article talks a bit about the problems they were trying to address, but not in a way that clearly answered the above for me. It's of course valid to write a piece with a more informed audience in mind, but for something that aims to spread the virtues of an idea, I think it could do more.


This is a good question and part of the reason you didn't get a clear answer from the article is that I'm not sure if I have a clear answer.

What I think user-mode TCP/IP gives us is the ability to build arbitrary services --- Postgres, Redis, SSH, network management, whatever --- without having to make infrastructure changes. We don't have to have some weird API or application proxy that knows what's running and who's allowed to run what. Instead, that's simply baked into the network, and flyctl, by dint of netstack, can just use it. If somebody comes up with a cool network service to plug flyctl (or any other tool someone wants to write) into, it will just work.

But things like the maintenance cost, well, yeah, that's most of what the post is about. The maintenance cost was not especially low.†

There's a natural inclination to read any post like this as a kind of brag, but I'm really just experimenting with trying to show the good with the bad here. User-mode TCP/IP is a weird choice! Nobody else I know of does it! It might have been the wrong choice! Even though I love it!

† It's actually not low even right now; I'm spending the first half of the day deploying code that relays stats from Netlink on our gateways through our GraphQL API, so that flyctl can check WireGuard gateway health. That is not a thing we would be spending time on if we had just written an explicit proxy for Postgres or whatever, rather than providing a generic network transport.


That's cool, and I appreciate you sharing these experiments.

I'm in automotive/embedded at the moment, and our daily battle is deciding how static vs. dynamic we want our system to be. Static (e.g. resource allocations or baked-in scheduling decisions) makes it easier to reason about the system and provide guarantees, but generally lowers efficiency at runtime. Dynamic can make the system much better at serving a wide range of usage scenarios, but makes it harder to eliminate the risk of pathological cases. It's hard not to see things through that lens. To me, the way of constructing and running services you've described here is an interesting option along that kind of axis, in terms of where the costs and friction end up.


I wish I'd thought of writing a simple pro/con list when I wrote this post. I'll think about that!


> If somebody comes up with a cool network service to plug flyctl (or any other tool someone wants to write) into, it will just work.

Sure, but just how many distinct WireGuard configurations would the gateways be comfortable with per Fly app? 1K+? 100K+? 1M+?


The answer by tptacek with pros/cons is here: https://news.ycombinator.com/item?id=30277278


Thanks to both of you!


The problem isn't new, but the previous best practice involved giving the tool superuser privileges, usually through an installation process. See most other VPNs. User-mode TCP/IP stacks aren't very common in practice. This is why what fly.io did is interesting.


Or network namespaces? User-space TCP might also make it easier to do TCP checkpoint/restore and container/app migration. Interesting write-up.

I keep thinking of the nightmare of keeping up with the world of Internet middleboxens, broken net-layer implementations and ICMP hacks that the Linux kernel supports and makes 'just work'. The jump to user-mode TCP seems interesting if you're not worried about that (and I've been watching the formally proven IP/TCP stack space like a hawk for years), but I've been burned so many times by non-standard stacks and 'oh, you need to connect to that non-updated LynxOS system and huh' or 'hah, could you enable ECN or this obscure TCP option because... legacy?'... And sometimes I need tc/netem and netlink, and I don't know...


We embed LwIP in an Android app, and it doesn't even implement most TCP/IP features, let alone handle most of the protocol's quirks. But it mostly works as expected, because of Postel's law on the other side (e.g. https://apenwarr.ca/log/20090222).


It is not just that. WireGuard carries IP packets (including TCP) over UDP. A TCP connection tunneled through another layer of TCP, as with a TLS-based tunnel, will suck badly performance-wise, because the retransmission and sliding-window logic in the two layers will fight each other.


Replacing SSL/TLS with wireguard is cool but aren't you just going to run into the same headaches of rotating certificates/keys? No one is really going to rely on using the same wireguard key indefinitely, right?


It's pretty easy for us to rotate keys now, since new WireGuard peers are extremely cheap to bring up (part of the point of the post is that for most of the last year, that was the opposite of the case, and a new peer was a very painful thing to ask for). But rotating WireGuard keys with Fly.io makes about as much sense as rotating the OAuth2 API token `flyctl` uses (the token is strictly more powerful than the WireGuard key), and people generally don't do that.
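For a sense of why a new peer's keys are cheap: a WireGuard key pair is just an X25519 (Curve25519) key pair, base64-encoded in configuration files. Below is a minimal sketch using Go's `crypto/ecdh` package (Go 1.20+). The function names are hypothetical, and this is not a description of how flyctl actually provisions peers.

```go
package main

import (
	"crypto/ecdh"
	"crypto/rand"
	"encoding/base64"
	"fmt"
)

// newWireGuardKeypair (hypothetical helper) returns a base64-encoded X25519
// private/public key pair, the encoding used in WireGuard config files.
func newWireGuardKeypair() (priv, pub string, err error) {
	key, err := ecdh.X25519().GenerateKey(rand.Reader)
	if err != nil {
		return "", "", err
	}
	return base64.StdEncoding.EncodeToString(key.Bytes()),
		base64.StdEncoding.EncodeToString(key.PublicKey().Bytes()), nil
}

// publicFromPrivate (hypothetical helper) re-derives the public key from a
// base64 private key, similar in spirit to `wg pubkey`.
func publicFromPrivate(privB64 string) (string, error) {
	raw, err := base64.StdEncoding.DecodeString(privB64)
	if err != nil {
		return "", err
	}
	key, err := ecdh.X25519().NewPrivateKey(raw)
	if err != nil {
		return "", err
	}
	return base64.StdEncoding.EncodeToString(key.PublicKey().Bytes()), nil
}

func main() {
	priv, pub, err := newWireGuardKeypair()
	if err != nil {
		panic(err)
	}
	// Only the public key is printed; a real tool would store the private key securely.
	fmt.Println("PublicKey =", pub)
	again, err := publicFromPrivate(priv)
	if err != nil {
		panic(err)
	}
	fmt.Println("re-derived matches:", again == pub)
}
```

`wg genkey` additionally clamps the stored private-key bytes; X25519 clamps the scalar internally during multiplication anyway, so the derived public key is the same either way.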


Please write a bit about the secrets-infrastructure at fly.io! The cert store, the token store, the trade-offs, the protections around it (though, I'm sure we will judge you for it, especially if it isn't "secure enough" for any made up definition of "secure").


> It looks like all of Fly.io isn’t working, when in fact the only part of Fly.io that isn’t working is the part that allows you to use it.

What's wrong with that? ;)


I had something similar in mind, but using libp2p [0]: use it as a library for a universal mesh network, but without a central control server, with better P2P/NAT traversal, and with no need to mess with keys*.

[0]: https://github.com/libp2p/


Man, where are all those virtualization fanboys who said usermode linux was a dead end? :)


Well gVisor uses mostly the same method of system call emulation (PTRACE_SYSEMU).

It's also one of the three major projects that use it, along with User-mode Linux and rr.


Been using tunsafe for years. It's a solid WireGuard-compatible implementation, runs in user space, and has extra features like TCP handshake + UDP data, HTTPS obfuscation, etc.

Author is @strigeus of uTorrent/Spotify fame.


It's a little surreal to see all this cheering and adoration of fly.io, but they can't even keep their blog webserver up. It's not a great look for a cloud-related company.


Our blog webserver is fine. What kind of error did you see?


Times out, rarely gets the page out.


Does debug.fly.dev work? Can you run `curl -v https://fly.io/blog/ -sS -o /dev/null -D-` and post what you see?


debug times out as well. curl output:

   flake: curl -v https://fly.io/blog/ -sS -o /dev/null -D-
   *   Trying 2a09:8280:1::a:791...
   * TCP_NODELAY set
   * Connected to fly.io (2a09:8280:1::a:791) port 443 (#0)
   * ALPN, offering h2
   * ALPN, offering http/1.1
   * successfully set certificate verify locations:
   *   CAfile: /etc/ssl/certs/ca-certificates.crt
     CApath: /etc/ssl/certs
   } [5 bytes data]
   * TLSv1.3 (OUT), TLS handshake, Client hello (1):
   } [512 bytes data]
   * OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to fly.io:443 
   * stopped the pause stream!
   * Closing connection 0
   curl: (35) OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to fly.io:443

Seems to work when forced to IPv4 though.


Ok last thing, can you try `curl https://debug.fly.dev --ipv4` and tell me what the `Fly-Region` header says?

Thank you for helping with this!


Fly-Region: fra


Different poster here but just curious: Are you a Deutsche Telekom user, by chance?


Nope. Still only working very sporadically for me.


Can someone explain to me why WireGuard is implemented as a kernel module? Yes, I get it, more performance. But isn't it completely and absolutely insane to run a complicated piece of software that is open to outside connections with kernel privileges?


The WireGuard protocol is deliberately designed to be straightforward to run in the kernel. In steady state, it doesn't even require dynamic memory allocation. It uses timers in lieu of extra statekeeping. It has a simplified networking model ("cryptokey routing") that defers to the host TCP/IP stack a bunch of stuff that other VPN protocols take upon themselves to build. It has just one keying mechanism and an API to build more interesting authentication features (like SSO integration) on top of it, rather than having it invade the core design.

It helps that it was designed and implemented by a kernel exploit author.


Running complex software open to outside connections in the kernel is pretty standard - the TCP/IP stack is in the kernel too!


Performance.


It also helps with availability. If you've got a recent kernel, it's already there.


Isn't most networking in the kernel? It's pretty complicated already.



