The Architecture of Open Source Applications: Nginx (2012) (aosabook.org)
119 points by kristianpaul on Nov 2, 2021 | hide | past | favorite | 28 comments



I used Nginx for a long while, but for the past several years I've been using Caddy instead, and it's been serving me well. Caddy, like Nginx, is open source.

What originally made me switch is how dead simple it was to set up Caddy to use Let's Encrypt with renewal via DNS records using the Caddy Cloudflare plugin. It's basically: download Caddy with the plugin (or build from source if you prefer), generate a token for the domain in Cloudflare, paste the token into the Caddy config file, and never worry about it again.
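As a rough sketch, the whole config can look something like this (assuming a Caddy build that includes the caddy-dns/cloudflare plugin; the domain, backend port, and token env var below are placeholders):

```
# Caddyfile - minimal sketch; CF_API_TOKEN holds the Cloudflare API token
example.com {
    tls {
        dns cloudflare {env.CF_API_TOKEN}
    }
    reverse_proxy localhost:8080
}
```

Caddy then handles issuance and renewal via the dns-01 challenge on its own.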

The reason that DNS-based Let's Encrypt renewal is superior is that you can use it even on servers that are not serving the site to the Internet. And beyond that, it's just how simple, reliable, and worry-free the setup is with Caddy and the Caddy Cloudflare plugin.


Frustratingly, the problem with automated DNS-based validation is that it doesn't work with all domain registrars, which is a showstopper when you are not allowed to change a customer's existing registrar or move its NS records.


One workaround is picking up a cheap domain and CNAMEing _acme-challenge.unsupported-provider.com to _acme-challenge.supported-provider-cheap-domain.com. The rest of the records can be left alone.
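In zone-file terms the delegation is a single record (reusing the placeholder names above); only the ACME challenge label moves, everything else stays with the original provider:

```
; at the unsupported provider - delegate only the challenge label
_acme-challenge.unsupported-provider.com.  IN  CNAME  _acme-challenge.supported-provider-cheap-domain.com.
```

The ACME client then creates the TXT record on the supported provider's domain, and Let's Encrypt follows the CNAME when validating.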

This is listed on LetsEncrypt as a "delegate" subdomain [1] and on an EFF article as a "throwaway" domain [2]. Some clients just call it "CNAME support" [3].

All the different names muddle search results. I've used a Reddit guide [4] for Cloudflare + goacme/lego.

[1] https://letsencrypt.org/docs/challenge-types/#dns-01-challen...

[2] https://www.eff.org/deeplinks/2018/02/technical-deep-dive-se...

[3] https://go-acme.github.io/lego/dns/#experimental-features

[4] https://old.reddit.com/r/selfhosted/comments/je2041/how_do_y...


> automated DNS-based is that it doesn't work with all domain registrars

https://acme.sh automates dns-01 challenges with support for over 100 registrars: https://github.com/acmesh-official/acme.sh/wiki/dnsapi.
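For example, a Cloudflare-backed issuance with acme.sh looks roughly like this (a sketch; the token and domain are placeholders, and dns_cf is acme.sh's Cloudflare hook):

```
# export the API token acme.sh's dns_cf hook expects, then issue
export CF_Token="your-cloudflare-api-token"
acme.sh --issue --dns dns_cf -d example.com
```

Other providers swap dns_cf for the hook listed on the dnsapi wiki page.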

Another good thing is that one can move their registrations and change their nameservers at will.


It's one of those problems blockchain is actually a good solution for, but there's no money you can tokenize and sell to others on this idea.


For my self-hosted setup I've shifted to Caddy and no longer bother with setting up cron jobs etc. for SSL. It's just so much nicer to have the reverse proxy handle that for you automagically.

And the config file is just two lines to reverse proxy, instead of the 20+ equivalent in Nginx.
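Those two lines look something like this (domain and backend port are placeholders); TLS issuance, redirects from HTTP, and sane proxy headers are all implied defaults:

```
example.com {
    reverse_proxy localhost:3000
}
```

The rough Nginx equivalent needs a server block with listen, server_name, ssl_certificate / ssl_certificate_key, proxy_pass, and usually several proxy_set_header lines, plus a separate certbot or cron setup for renewal.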


Nginx is making the same mistakes that dethroned Apache. No one wants to fiddle with a web server. Default to $currentIndustryStandard, make it work, forward to my application.


The Let's Encrypt Certbot does this automatically for nginx and apache as well, just with a bit of extra config: https://certbot-dns-cloudflare.readthedocs.io/en/stable/
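Following those docs, the invocation is roughly (a sketch; the credentials path and domain are placeholders, and the ini file holds the Cloudflare API token):

```
certbot certonly \
  --dns-cloudflare \
  --dns-cloudflare-credentials ~/.secrets/certbot/cloudflare.ini \
  -d example.com
```

Certbot's packaged timer then handles renewals, and a deploy hook can reload nginx or apache when a cert changes.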


> nginx (pronounced "engine x") is a free open source web server

As a relatively young dev, the idea of a "web server" as a standalone binary that serves your application (vs a library that you use to write your own "server") feels strange.

Of course nginx is more than that- in my limited experience I mostly see it used as a static file server and/or reverse-proxy and/or load balancer. But I believe (someone correct me if I'm wrong) that the classic Apache-style paradigm was to have "the server" be generic and explicitly support certain application languages, which it knows how to load and interpret for handling requests.


>the classic Apache-style paradigm was to have "the server" be generic and explicitly support certain application languages

The first wave was cgi-bin based: fork() and exec() a process per request.

Followed by what you're describing...things like mod_perl and mod_php, where the app code was running inside the apache process.

Then, later, a separate daemon that kept some pool of processes running for apache to open up connections to, like fastcgi.
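The cgi-bin model from that first wave can be sketched in a few lines: the server fork()/exec()s the script once per request, passes request metadata in environment variables (PATH_INFO, QUERY_STRING, ...), and relays whatever the script writes to stdout as the HTTP response. The handler below is a hypothetical illustration, not tied to any particular server:

```python
#!/usr/bin/env python3
# Minimal CGI-style script: spawned per request, reads the request from
# the environment, writes headers + blank line + body to stdout.
import os
import sys


def handle_request(environ):
    """Build a CGI response (headers, blank line, body) from the env."""
    path = environ.get("PATH_INFO", "/")
    body = f"You requested {path}\n"
    return "Content-Type: text/plain\r\n\r\n" + body


if __name__ == "__main__":
    sys.stdout.write(handle_request(os.environ))
```

The per-request process exit is what made this model simple but expensive, and is exactly what mod_perl/mod_php and later fastcgi pools were optimizing away.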


Timeline-wise fastcgi was a thing at the same time that in-process interpreters were popular. I remember implementing a dumb polling feature as fastcgi in 1998 or thereabouts.

(We were also running the Netscape web server, because we depended on a plugin for it to insert inventory from our NetGravity ad server. And boy now I feel old.)


They did overlap in existence, but at least in my world, nsapi/fastcgi connections to a pooled app server wasn't as popular until later. I think because in the beginning, both nsapi and fastcgi rolled out with commercial, non-free web servers. There wasn't a free fastcgi capability for Apache until 1999 or so, where mod_perl was around since 1996, I think.


I wonder- was this the de-facto pattern because at the very beginning a "web server" was just a static file server? So these hooks for handling requests dynamically were just an extension on top of that?


For everyone other than people on .gov, .mil and .edu campus networks, the internet was dog slow anyway, so fork() and exec() wasn't a problem. Given the low traffic, it was probably also better for overall memory utilization. Memory was more precious than CPU cycles, and fairly limited if you didn't want a lot of swapping.

At the time, most stuff worked like that from inetd. It would listen on lots of ports and fork off the right program for the incoming connection, then that process would exit when it was done. Long running daemons and lots of processes weren't as common.
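That inetd pattern was a one-line-per-service config; an illustrative /etc/inetd.conf fragment (fields are service, socket type, protocol, wait/nowait, user, server program, arguments):

```
# inetd listens on each port and fork()/exec()s the program per connection
ftp     stream  tcp  nowait  root  /usr/sbin/in.ftpd     in.ftpd
telnet  stream  tcp  nowait  root  /usr/sbin/in.telnetd  in.telnetd
```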

The common pattern prior to that was people logging in, reading already synced newsgroups, editing stuff, compiling, logging off, etc. Transient processes that went away.


> The first wave

Whoa. This is what made me feel old for the first time, ha!


Why is it strange? Would you rather be concerning yourself with SSL termination and handling files (efficiently thru the kernel) instead of application logic?


You misunderstand. The norm today is to have a library/framework which handles those things, instead of a separate binary running on the system which your code talks to. It's roughly the same level of abstraction, just a different programming model.

I guess in some cases like Node and Python(?) you're technically talking to an underlying server written in native code which ends up working kind of the same way (though I'm not sure if it lives in a separate process). But still, the programming model exposed to the application programmer is "Download Express/Flask, define some routes, then call a function to tell the server to start up and listen on some port". I'm just musing on the mindset shift that's happened.


I’d be wary as a “young dev” to claim he misunderstands.

It will become very obvious to you later why Apache and Nginx exist.

A very very small example is:

So you want multiple “apps” on the same server? How ya gonna bind to the same port, 443? How are you going to play nice with other apps? How are you going to keep up with security loopholes? Does your app and every other app need access to the private key?
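The shared-port question is exactly what name-based virtual hosting in a reverse proxy answers - one process owns 443 and routes by hostname. A hedged nginx sketch (these server blocks live inside the http context; hostnames, cert paths, and upstream ports are placeholders):

```
# Two apps behind one 443 listener; nginx picks the backend by Host/SNI
server {
    listen 443 ssl;
    server_name app-one.example.com;
    ssl_certificate     /etc/ssl/app-one.pem;
    ssl_certificate_key /etc/ssl/app-one.key;
    location / { proxy_pass http://127.0.0.1:3001; }
}
server {
    listen 443 ssl;
    server_name app-two.example.com;
    ssl_certificate     /etc/ssl/app-two.pem;
    ssl_certificate_key /etc/ssl/app-two.key;
    location / { proxy_pass http://127.0.0.1:3002; }
}
```

Note that only the proxy process touches the private keys; the apps themselves listen on localhost-only ports.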

There hasn’t been a mindset shift. You simply haven’t experienced the whole set.

Even super large apps such as Atlassian's suite rely largely on a reverse proxy being in place if you don't want to tell your users to use some random port.

These web servers are fast and powerful.

How do you think load balancing is accomplished?


> I’d be wary as a “young dev” to claim he misunderstands.

I only meant they misunderstand what I myself am saying. There's no need to get combative.

> Even super large apps such as Atlassian’s suite rely largely on a reverse proxy

> How do you think load balancing is accomplished?

If you read my original comment, I specifically carved out these cases as not being what I'm talking about:

> Of course nginx is more than that- in my limited experience I mostly see it used as a static file server and/or reverse-proxy and/or load balancer.


The conceptual difference is that Flask etc are task oriented and Nginx is technology oriented. Specifically it's a kind of expanded instantiation of the entire HTTP request cycle with hooks that allow some of the sub-events to be customised.

This sounds like a good thing, but is actually a problem in practice.

Nginx is one of the few pieces of software that has literally enraged me. The configuration process is so obtuse and full of random gotchas where it does what it wants for no obvious reason - with minimal docs - that it's very hard to get it working the way you want to.

One obvious example is that index can trigger an internal rewrite. So you set up a location and Nginx will find your path and index correctly - and then it throws it away and decides to use a different location. Which most likely gets you a 404.

You can stop this if you know the magic words, and no doubt there is a Very Good Reason for it.

But intuitive it isn't.
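For anyone who hasn't hit this: nginx's index directive performs an internal redirect, and the rewritten URI is re-matched against all locations, which is how a different location can "steal" the request. A sketch of the gotcha and one common workaround (paths are illustrative):

```
# A request for /app/ is internally redirected to /app/index.html and
# re-matched, so e.g. a regex location elsewhere can capture it:
location /app/ {
    root  /srv/www;
    index index.html;
}

# try_files serves the index file within this location, avoiding the
# internal redirect and the surprise re-match:
location /app/ {
    root      /srv/www;
    try_files $uri $uri/index.html =404;
}
```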


I think about it as, there's a bunch of code you want to run to serve a webpage: Business logic, SSL termination, caching, lets-encrypt, load balancing, etc etc. Separately from the question of what you want your web server to do is the question of how you cut up the cake of all that stuff. It makes the most sense to write some of this code (eg caching) in highly optimized native language. And it makes the most sense to write your business logic in whatever language your team is most productive. (Eg python or ruby or whatever).

The question of process architecture (how you cut the cake) has a ton of options! For example, a nodejs process can:

- Run other javascript code by including it via npm. (Directly or in a worker thread)

- Run C code by compiling it to asmjs and including it like any other javascript code

- Run C / Rust / Go code by compiling it to wasm and including it via the Webassembly constructor

- Run C code by compiling it to a napi module, and linking to it via nodejs's native modules API

- Run code in any language in a child process (via spawn / exec) and communicate to it via stdin / stdout pipes

- Run code in a sibling process and communicate via named pipes, IPC or via network requests

There are engineering tradeoffs with all of these options. For example, wasm works in the browser and isn't CPU-architecture specific, but wasm tooling is language specific, and it runs about 3x slower than native modules (which don't work in the browser). Child processes run at native speed and work with any language, but "calling" into the child process is much more complicated than calling a wasm / napi function.

Errors behave differently too - an uncaught exception in an npm module will often crash your whole nodejs process, but a segfault in a sibling process will just crash that process. (And it's not obvious which one is better!)

If you want to, you can include the logic for SSL termination, caching, etc. in the same programming language as everything else you're doing - which means writing your business logic in C, or your caching logic in javascript. This has runtime performance vs productivity implications. Rust is great for this - but link times get pretty bad.

In short, there isn't a "right" answer to any of this. Basically all of these options are viable ways to divide up your program. When I was getting started, apache + CGI was the common approach. (Your logic ran as a child process of the web server). Now, application servers have complete HTTP stacks and it's common to do interop between your app server and nginx via HTTP. It works pretty well, but there's nothing essential or obvious about that choice either.


It feels like there's a missing abstraction here: there are many ways to divide up a program, but it should be possible to separate the description of the data format used to communicate over the divide from the transport that ships the data over it.


> As a relatively young dev, the idea of a "web server" as a standalone binary that serves your application (vs a library that you use to write your own "server") feels strange.

In my eyes, the ideal setup is one that's layered: where you have an ingress that's basically a load balancer that also ensures that you have SSL/TLS certificates, enforces rate limits, perhaps is used for some very basic logging, or can optionally do any URL rewriting that you need. Personally, I think that Caddy (https://caddyserver.com/) is lovely for this, whereas some people prefer something like Traefik (https://traefik.io/), though the older software packages like Nginx (https://nginx.org/en/) or even Apache (https://www.apache.org/) are good too, as long as the pattern itself is in place.

Then, you may additionally have any sorts of middleware that you need, such as a service mesh for service discovery, or providing internal SSL/TLS - personally Docker Swarm (https://docs.docker.com/engine/swarm/) overlay networks have always been enough for me in this regard, though some people enjoy other solutions, such as Hashicorp Consul (https://www.consul.io/), or maybe something intended for Kubernetes or other platforms that you already may be using, like Linkerd (https://linkerd.io/).

Finally, you have your actual application with its server. Personally, I think that the web server should be embedded (for example, embedded Tomcat with Spring Boot) or indeed just be a library that's a part of the application executable, as long as you can update it easily enough by rebuilding the application - containers are good for this, but aren't strictly necessary, since sometimes other forms of automation and packaging are also enough.

The reason why I believe this is because I've seen plenty of deployments where that just isn't the case:

  - attempts to store certificates within the application, each application server having different requirements for the formats to be used, making management (and automation) of renewal a total nightmare
  - attempts to use external application servers, like a standalone install of Tomcat/TomEE/GlassFish, which are always a pain to manage and integrate with the application, often leading to inconsistent configurations, especially if something like Ansible isn't used
  - no ability to easily do URL rewriting, whenever you realize that you need to do something a bit differently, or figuring out how to do rate limits or anything of the sort becomes needlessly hard, and even then you can't really protect the app from too much traffic if it is the endpoint for all the requests, especially troublesome if it's also used by other internal services
  - no observability, since it becomes harder to tell whether there's a problem with the load overall (e.g. load balancer nodes are not keeping up), or the actual server implementation within the application
  - actually, not having a separate ingress also makes load balancing too hard in many setups, since service meshes simplify this bunches


Should have 2012 in the title (found in a meta header). nginx is the number one web server by market share (although it depends on your data source and their measuring approach, so that's a bit arguable, but NetCraft reports it has been since 2019 - https://news.ycombinator.com/item?id=29088538), and while the original component is open source, there are a lot of pro/plus/pay closed-source pieces now.


It would be awesome if some CS programs had a required course that covers this book, or something very similar, in an "official" manner.

Many CS programs (and CS in general) can be very "theoretical" - yes, there are many classic solutions to sorting or the dining-philosophers problem :) But I think a course focused on a few real-life, high-value applications (of CS) could be super useful!


Sometimes the universe just throws things your way, I just started learning about nginx and I really like it, this overview is super helpful and I can’t wait to finish reading it.

The only thing I dislike is that there are no good starter docs that I could find on their official website. There was an old cached article which now points to their new website.

I did a quick search on hacker news and the best article was also redirected to another website. But I found it using way back machine. (https://web.archive.org/web/20130728110847/http://carrot.is/...)


Previous Discussion https://news.ycombinator.com/item?id=10616989

But wow... I remember we had the same submission two to three years ago... turns out it's been six years.


Thanks! Expanded:

The Architecture of Open Source Applications: Nginx - https://news.ycombinator.com/item?id=10616989 - Nov 2015 (126 comments)

Nginx design details - https://news.ycombinator.com/item?id=4211480 - July 2012 (29 comments)



