Would we still create Nebula today? (defined.net)
121 points by carride on Oct 13, 2023 | 39 comments



In the self-hosted space, I've been really enjoying playing around with decentralized encrypted overlay mesh networks like Nebula. Here's the current list of my faves (all WireGuard-based).

Open-source projects not-quite-prod-ready:

- WebMesh: Golang, decentralized nodes https://github.com/webmeshproj

- InnerNet: Rust, with subnet ACLs https://github.com/tonarino/innernet

- Wesher: Golang, simple mesh with pre-shared key https://github.com/costela/wesher

- Wiresmith: Rust, auto-configs clients into a mesh https://github.com/svenstaro/wiresmith

Open source projects with company-backed SaaS offerings:

- Netbird: Golang, full-fledged solution (desktop clients, DNS, SSO, STUN/TURN, etc) https://github.com/netbirdio/netbird

- Netmaker: Golang, full-fledged solution https://github.com/gravitl/netmaker

Honorable mention:

- SuperHighway84 - more of a Usenet-inspired darknet, but I love the concept + the author's personal website: https://github.com/mrusme/superhighway84 https://xn--gckvb8fzb.com/superhighway84


No love for tinc[1]?

It's the granddaddy of mesh networking, predating WireGuard by a long way, and while it's not quite zeroconf, it's very simple to set up and maintain. It also runs on everything.

[1]: https://tinc-vpn.org/
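For the curious, a minimal two-node setup looks roughly like this (net name, hostnames, and subnets are made up; the public key gets appended to each host file):

  # /etc/tinc/mynet/tinc.conf on node1
  Name = node1
  ConnectTo = node2

  # /etc/tinc/mynet/hosts/node2 (exchanged out of band)
  Address = node2.example.com
  Subnet = 10.10.0.2/32

  # /etc/tinc/mynet/tinc-up
  ip addr add 10.10.0.1/24 dev $INTERFACE
  ip link set $INTERFACE up

Generate keys with `tincd -n mynet -K`, copy the host files around, and start with `tincd -n mynet`.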


> No love for tinc[1]?

I actually looked into tinc, and really wanted to love it because of its simplicity.

Unfortunately, it seems that development around it has stalled for several years now, and benchmark reports show it can't keep up with WireGuard in cloud speed tests:

- Wireguard: 390.3 Mbps

- Netmaker: 369.3 Mbps

- Tailscale: 62.5 Mbps

- ZeroTier: 56.8 Mbps

- Nebula: 38.4 Mbps

- Tinc: 34.7 Mbps

- OpenVPN: 22.3 Mbps

Source: [disclaimer, article written by Netmaker CEO, so implicit bias]

(https://medium.com/netmaker/battle-of-the-vpns-which-one-is-...)


Hmm, I don't remember it being that slow, but then again, I didn't need much throughput for my use cases. Frankly, I'm surprised that ZeroTier and Tailscale are that slow as well, so I'm not sure how trustworthy that article is.

I found this other benchmark[1] that places them much closer to native WG performance, with Netmaker still in the lead, and somehow faster than a direct connection. That's probably because Netmaker uses the in-kernel WireGuard instead of a userspace implementation. I'm kind of curious to give it a try, TBH. :)
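(For anyone wanting to reproduce these numbers, it's essentially just iperf3 run across the tunnel; a minimal sketch, with the overlay IP made up:

  # on the server
  iperf3 -s

  # on the client, pointed at the server's overlay address
  iperf3 -c 100.64.0.1 -t 30

Then repeat against the public IP to get the direct-connection baseline.)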

Hell, even OpenVPN is _much_ faster than that[2,3].

Also, yeah, tinc development has been slow for years now, but I didn't experience any issues with the prerelease versions. It's certainly behind the times now, but I also enjoyed how easy it was to configure and use.

[1]: https://techoverflow.net/2022/08/19/iperf-benchmark-of-zerot...

[2]: https://www.zerotier.com/blog/benchmarking-zerotier-vs-openv...

[3]: https://www.wireguard.com/performance/


Tailscale is using GSO/GRO now. They blogged about reaching the 10Gbit/s milestone.

https://tailscale.com/blog/more-throughput/


Replying to my own comment as I can no longer edit it:

The folks over at the Nebula team had an interesting discussion regarding the "battle of the VPNs" article published by Netmaker that I sourced in my parent comment:

https://github.com/slackhq/nebula/discussions/911


Yeah, it's still my go-to because it's fully self-hosted. Nebula can be too, but it uses certs that can expire, whereas tinc just uses keys.
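(To be fair, the expiry part is manageable by minting long-lived certs; a rough sketch, with names and IPs made up, run where the ca.crt/ca.key live:

  # CA valid for ~20 years
  nebula-cert ca -name "homelab" -duration 175200h

  # host cert, capped below the CA's lifetime
  nebula-cert sign -name "node1" -ip "192.168.100.1/24" -duration 175000h

But it's still one more thing to remember to rotate.)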

And really I've been using tinc for almost a decade and I didn't really see the benefit of changing. It's rock-solid.

With the exception of one thing: I use some central nodes on cloud VPSes, and they can access everything. As far as I know, a Nebula lighthouse can't access any of the clients. So I've been meaning to give Nebula another try.

But zerotier and tailscale aren't options for me because they rely on their cloud infrastructure. I only want stuff that's fully self-hosted.

There's a great tinc android client these days too.


> It's rock-solid.

Unfortunately, I cannot confirm. Sharing my experience:

I used tinc over multiple years on production servers and it would sometimes create netsplits that did not recover. I also suspect that there's a race or bug in re-keying, which also causes disconnects.

On the netsplit issue, it was me posting alone on the relevant issue [1] over multiple years without response. (I don't expect to get any from free-time maintainers, especially on hard-to-reproduce issues, but it's still important to know that such unsolvable hurdles exist.)

When I switched to Nebula, it improved this situation. But both Nebula and tinc max out at around 1 Gbit/s on my Hetzner servers, thus not using most of my 10 Gbit/s connectivity. This is because they cap out at 100% of 1 CPU. The Nebula issue about that was closed due to "inactivity" [2].

I also observed that when Nebula operates at 100% CPU usage, you get lots of packet loss. This causes software that expects reasonable timings on ~0.2ms links to fail (e.g. consensus software like Consul, or Ceph). This in turn led to flakiness / intermittent outages.

I had to resort to moving the big data-pushing software like Ceph outside of the VPN to get 10 Gbit/s speeds for it, and to avoid downtime due to the packet loss.

Software like Ceph has its own encryption, but I don't trust it, and that mistrust was recently proven right again [3].

So I'm currently looking to move Ceph into WireGuard.

Summary: For small-data use, tinc and Nebula are fine, but if you start to push real data, they break.

[1]: https://github.com/gsliepen/tinc/issues/218

[2]: https://github.com/slackhq/nebula/issues/637

[3]: https://github.com/google/security-research/security/advisor...


That's good to know, thanks.

My tinc usage was far less demanding. Just a handful of nodes and light traffic, so I didn't experience any issues.

I did migrate to WireGuard about a year ago, and instead of a mesh network, I ended up with a hub-and-spoke configuration[1] which worked fine for my humble needs.

[1]: https://www.procustodibus.com/blog/2020/11/wireguard-hub-and...
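For anyone curious, the hub-and-spoke shape is really just WireGuard peers with narrow AllowedIPs; a rough sketch with keys elided and addresses made up:

  # hub: /etc/wireguard/wg0.conf
  [Interface]
  Address = 10.0.0.1/24
  ListenPort = 51820
  PrivateKey = <hub-private-key>

  # spoke A
  [Peer]
  PublicKey = <spoke-a-public-key>
  AllowedIPs = 10.0.0.2/32

  # spoke A: /etc/wireguard/wg0.conf
  [Interface]
  Address = 10.0.0.2/24
  PrivateKey = <spoke-a-private-key>

  # the hub routes the whole overlay
  [Peer]
  PublicKey = <hub-public-key>
  Endpoint = hub.example.com:51820
  AllowedIPs = 10.0.0.0/24
  PersistentKeepalive = 25

Spokes only know the hub, so any spoke-to-spoke traffic hairpins through it, which was fine for my needs.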


Hey there - sorry for closing your issue! I saw Nate had recommended some commands to run and didn't see a response 8 months later, so I assumed you had moved on.

I've reopened the issue so you can continue to document and look for solutions. :)


I'll throw in INET256 https://github.com/inet256

It's a specification for identity-based networking. There are meshnet and centralized implementations. You can layer IPv6, IPv4, or application traffic on top of any compatible implementation.


There's also Yggdrasil [0], an IPv6-based, completely decentralised network. You can join the public network or restrict access to known keys for a private network.

[0] https://yggdrasil-network.github.io/


Filling a slightly different niche, there's also Yggdrasil, which can be used to create a private overlay network taking advantage of public relay nodes.


Nebula is such a great tool. If you haven't tried it yet, you should really give it a shot. It's easy to self-host and set up, and has been absolutely rock solid. I have it on all my devices, plus several Raspberry Pis at unattended remote sites I rarely have access to, serving as gateways to internal LANs, and they all just work, all the time.

Tailscale gets most of the attention on HN, and I'm sure that it's a wonderful product too, but Nebula is a nice, simple, "do one thing well" product.


I've tried Defined recently and it did the job just fine, but the thing about Tailscale (and others in the space like ZeroTier) that puts it ahead of something like Defined/Nebula for me is that I don't have to run my own servers/infrastructure for lighthouses and relays. I understand that everyone has their own preferences, and some might want to host this themselves for privacy reasons and whatnot, but for me as a single end user, I'm glad Tailscale just handles all the infra for me.


That's totally reasonable, and I agree that using something hosted entirely by a 3rd party makes sense for some use cases. Our reasoning goes a bit beyond security concerns in this case. We built Nebula for large-scale deployments, and because of that, we have made decisions that lean into that model for hosting.

Our decision to leave lighthouse hosting in the hands of users has one primary rationale: we want users to have complete control over their network availability. Any downtime of our service should not impact their network availability. You can even host some of your lighthouses inside of network boundaries to ensure that an internal network functions properly if its connection to the internet is interrupted. Other overlay options may continue to work for some time, but new connections are often not possible, and the network can degrade rapidly.
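Concretely, pointing hosts at self-hosted lighthouses is only a few lines of config.yml. A quick sketch (addresses are examples):

  static_host_map:
    "192.168.100.1": ["lighthouse1.example.com:4242"]

  lighthouse:
    am_lighthouse: false
    interval: 60
    hosts:
      - "192.168.100.1"

The lighthouse itself just sets `am_lighthouse: true` and leaves the hosts list empty.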

Relays are a similar story, but with an additional reason: we don't have to limit our customers' relay bandwidth due to cost. When hosting relays on behalf of others, we would be transiting a lot of traffic, which has an associated (sometimes unpredictable) cost. By letting our customers host relays, they can ensure relay traffic is just as fast as direct connections.
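Self-hosted relays are a similarly small bit of config. Roughly (the overlay IP is an example):

  # on the relay node
  relay:
    am_relay: true

  # on a host that peers may reach through that relay
  relay:
    use_relays: true
    relays:
      - 192.168.100.5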


This is indeed understandable. For me, the fact that I can run my own lighthouses and depend on / trust no third party is the big plus. Both cases (third-party and self-hosted) are in demand.


I don't understand why I should use something like Nebula or Tailscale. I've got a Wireguard setup where all of my users connect into a VPN. What does Nebula give me?


Probably not much? I tried to set up Wireguard to my home network before traveling for the holidays last year. I was too dumb to do it, or at least to do it in a way that didn't feel fragile. I had Tailscale up with subnet routing set up the way I wanted in 15 minutes.
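(For reference, the subnet routing part really was about one flag plus approving the route in the admin console; the subnet here is an example, and the box needs IP forwarding enabled:

  tailscale up --advertise-routes=192.168.1.0/24

Everything else was just installing the client.)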


I think mesh networks start to become more valuable when you want/need various users or sites to be able to talk directly to each other with significant bandwidth, or when latency might be constraining, or when you start to scale up.

To take my own example: I've got my main Site A, which has a fixed IPv4; then I have a Site B, which is the primary offsite but still "local" (<1hr drive) backup location, with a few hundred terabytes of redundant spinning rust and no fixed IP; and then small sites C through M that also have their own local NAS. All are within a few hundred miles of each other. Everything is managed from A, so WireGuard would be fine for handling all remote access, even if I'm in another country, since there isn't really a significant extra latency hit from bouncing through A.

But I want C-M to back up to B as well, and in a hub/spoke topology that would mean all that traffic must also go through A. Obviously, if I rented a fixed IP for B, I could then set B up as a new hub and add new VPNs and rules to C-M for B as well as A. But that's already a certain amount of extra work or setting up automation, and if down the road I had a B2 or B3, that'd multiply. Even automation won't solve the problem of latency if things spread out further: if A were a thousand miles away from some new site, bouncing through A starts to add a potentially really noticeable latency hit for no reason.

Having a mesh means only the lighthouses need fixed IPs, and then everything can talk directly with no extra work. That is significantly more scalable: adding a new client requires nothing from any of the other clients. And it cuts out unnecessary legs in the trip, along with the traffic and latency those involve.

WireGuard is awesome, and it sounds like you probably don't need to bother in your use case, but if you start providing distributed services, not funneling through hubs starts to be worth considering. Particularly if you start caring about one service site going down affecting others.


Whether you shape it as a mesh, hub-and-spoke, or whatever is orthogonal; Wireguard can do that too.


We have a section for overlay networks on the tunneling list[0] I maintain. This is a very interesting space with some excellent software.

I certainly have my gripes about the closed nature of Slack itself, in particular using a closed protocol when the model is clearly "federated" between multiple servers internally. That said, the contribution of something on the scale and quality of Nebula back to the open source community is hard to argue with.

[0]: https://github.com/anderspitman/awesome-tunneling#overlay-ne...


I am using Defined.net to manage my nebula deployment in my datacenter rack and it has made operationalizing an overlay network a breeze. It's like having my own basic private VPC with security groups (roles) without a cloud provider.

They added tag support [1] a few months ago, which I have yet to try out, but it looks very promising. The defined.net API [2] is very easy to use for host management, and I am able to auto-enroll new hosts and remove them after I deprovision them.

I also made a GitHub Action [3] which I use to allow for my Actions to communicate with resources on my overlay network.

[1] https://docs.defined.net/guides/creating-firewalls-using-rol...

[2] https://docs.defined.net/api/host-create/

[3] https://github.com/quickvm/action-dnclient


(Blog post author here)

Thanks for sharing this on HN! I'll keep an eye on the comments and try to answer questions that come up.


Love Nebula, keep up the good work!

Do you think the tech landscape today would have allowed for Nebula to be born? Lots of companies now have strict IP agreements they have team members sign.


That's a great question! One of the things I enjoyed during my time at Slack was their willingness to contribute to open source projects. We had similar IP clauses, but asking permission to open source things was straightforward.

The most important concern (IMO) was whether we could commit to properly maintaining a project. Before open sourcing anything, you need to discuss how you'll go about managing an issue and pull request backlog, so that people don't come across "dead" projects under your stewardship.

In a high-growth startup, I do think something like this could happen again, but as a company grows, there are certainly more layers that can make it difficult to share things openly.


Seeing as Slack was born as a tool inside Glitch, which existed only because of a side project called Flickr…

I don’t really think it’s the size or layers of a company that prevent this; it’s the culture. This culture of creation permeates everything I’ve seen Stewart Butterfield do, at least from the outside. Admirable and extremely profitable.


Just wanted to chime in along with the others that I also love Nebula, and I'm really grateful to have a mesh option that is modern, truly decentralized, and self-hostable. Nebula is also just plain elegant IMO, one of those pieces of software that clicks right away, top to bottom. Now I just hope it gains momentum to make it into a wider variety of tools so it becomes ever more accessible. So again, thank you and everyone else who had a hand in it!


Hey Ryan! Love Nebula, miss you at the day job


Thanks Harrison, hope you're well!


I really like a lot about Tailscale, but I just finished implementing it for my company using Headscale (I couldn't get the funding to buy from Tailscale). This is across ~200 machines.

I'll be honest: if I could do it again, I'd use Nebula. The primary issue I have is that Tailscale has a lot of magic, which I can see being nice in some cases, but it complicates some of the routing and firewalling I'm doing on machines. In particular, the way it sets up Tailscale routes at higher priority than local interfaces leads to problems in my environment.

The other thing is just Headscale itself: it works quite well but does have some rough edges. It's entirely too easy to kill your whole mesh by flubbing an ACL, and currently restarting Headscale to pick up ACL changes takes 3-5 minutes.

I do, however, really prefer the Tailscale ACLs over Nebula's.
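For context, Nebula's equivalent is a firewall section in each host's config, keyed off groups baked into the host certs. A rough sketch (group name made up):

  firewall:
    inbound:
      - port: 443
        proto: tcp
        groups:
          - web
    outbound:
      - port: any
        proto: any
        host: any

It works fine, but it's per-host config rather than one central policy document, which is part of why I prefer Tailscale's approach.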

One thing that led me to Tailscale was the ability for it to relay around network routing problems, and it looks like Nebula has added that since I started. Around the time I was evaluating Nebula vs. Tailscale we had a ~1 day network routing issue where some of my users were blackhole routed in Comcast, and Tailscale just worked around it.


Hey there - I'm an employee at defined.net and was wondering if you'd be open to answering some questions about your experience evaluating Nebula and related products? If so, please shoot me an email at john@defined.net. Thanks!


Big fan of Nebula, especially Defined, which makes it really easy to set up/maintain.


Very interesting soft sell. They don't name any competitors or specifically compare the alternative approaches those competitors take, which is IMHO not the greatest SEO, but what do I know. Maybe they do that elsewhere on the site.


It may be a surprise that not all posts are advermarketingpr, although it does feel like that some days.


Aside from defined.net, what are the best frontend/management tools for Nebula? Last I looked it was all manual config (which is fine for most of us, but limits adoption elsewhere).


I'm not aware of any other full-fledged solution. There is some opinionated tooling written in Python [0] and Go, but nothing coming close to defined.net. This is definitely a point where other overlay solutions look way better.

Having searched for (and implemented) this myself for work, the only practical solutions I found were 1) smallstep [1] or 2) Terraform (with the Nebula provider [2]) and a CM tool of your choice. The latter can be nicely combined with the Ansible provider if that's your CM of choice.

[0]: nebula-cert-py

[1]: https://smallstep.com/docs/step-ca/integrations/#nebula

[2]: https://registry.terraform.io/providers/TelkomIndonesia/nebu...


And some kind of cert renewal tool I guess.


Can Nebula work with VPN exit nodes (similar to Tailscale + Mullvad)?



