I've spent the better part of the year messing around with every HTTP server I could get my hands on. I would not recommend OpenBSD httpd.
It's supposed to be simple, and it is simple compared to Apache, but it also has way fewer eyes on it. Ultimately, a big pile of string-handling C is likely to have some problems. There was a trivial-to-exploit server-crashing segfault in httpd's FastCGI implementation that was only fixed in the last month or so. There were also recent issues related to null bytes and line feeds in headers causing strange, exploitable misinterpretations of incoming requests in relayd. Much of this is now fixed, but I wouldn't be at all surprised if there were more low-hanging fruit remaining.
If you're looking for a lightweight HTTP server written in C, I recommend Lighttpd. It's older, more widely-used, and more standards-compliant. I'm not trying to dump on OpenBSD; I run it on my primary laptop. I just wouldn't use their HTTP tools for anything mission-critical yet.
No HTTP server software can be guaranteed to be completely secure, but OpenBSD httpd is at least privsep, always runs in a chroot, and each process is pledged quite tightly. That means the HTTP/TLS protocol speakers can't write to the filesystem, can't fork any processes, and can't execve(2).
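For anyone who hasn't seen pledge(2), here is a rough sketch of the idea in C. This is not httpd's actual code: the helper name `restrict_to_stdio` and the choice of the "stdio" promise are assumptions made for illustration, and on systems without pledge(2) the function does nothing.

```c
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: call once startup work (opening sockets, reading
 * config) is finished. On OpenBSD it restricts the process to the "stdio"
 * promise; elsewhere it is a no-op so the sketch still compiles.
 */
int restrict_to_stdio(void)
{
#ifdef __OpenBSD__
    /*
     * "stdio" permits I/O on already-open descriptors but not opening
     * files, forking, or exec. NULL leaves the exec promises unchanged.
     */
    if (pledge("stdio", NULL) == -1) {
        perror("pledge");
        return -1;
    }
#endif
    return 0;
}
```

After a successful pledge, a system call outside the promised set gets the process killed by the kernel, which is what limits the damage if a protocol speaker is compromised.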
Surely the most secure would have to be a server written in a memory safe language? While you can do stupid things in any language, I would jump to Caddy (Go) or a Rust based toolkit before anything written in C.
You can do stupid things with "memory safe" languages too, and nothing protects you from misconfiguration. The BSD approach of just locking it down seems at least as secure to me, if not more.
> I love how having a couple of new languages make older, hardened programs insecure in an instant.
As someone who has been dealing with Unix since the early 90s, most of those old C programs were always security nightmares. There were a few exceptions: djb's stuff, dovecot and (surprisingly) Apache. But most of the other popular C servers were absolutely riddled with buffer overflows and other security problems. The Morris worm, the entire rsh family, you name it. Sendmail was awful for a long time, too.
Memory corruption errors still make up around 70% of security holes in modern C and C++ code, and that's after decades of improvement. That entire set of vulnerabilities could be avoided in the 90s by using Perl 5 or Java, which were insanely popular. (PHP came with its own supply of security holes in the interpreter, the language design, and the awful database APIs.)
We've had popular memory-safe options for decades. All Rust brings to the table is the ability to be memory safe without needing GC (which is great). But with rare exceptions, the popular C servers were always full of holes.
There's no reason you couldn't also pledge an HTTP server written in a safe language too.
One key thing to keep in mind is that exploit mitigations are largely about limiting degrees of freedom. It's quite hard to contain an attacker who has gained code execution in a process (such as due to memory corruption) due to the massive amount of control they have. On the other hand, if you're able to prevent them from gaining such a strong foothold to begin with (such as by using safer languages which don't have such issues), you're in a much better spot because now they don't even gain any control over the process.
They have become insecure not because of new languages but because the web is no longer just serving a bunch of files. You are delivering executable text mixed with untrusted input that must not improperly interfere with other users, and for that you need impeccable string handling. Just reliably segfaulting every time there's a problem won't do.
Because they used to have Apache in base from 1998 to 2014 (the 5.6 release) and found it frustrating to maintain? Given the developers and what they do with the operating system, I can understand why they would consider having an HTTP server in base something within the scope of the project.
It is an open source project with both commits and discussion happening in the open (tech@ in particular), so the barrier to understanding is reasonably acceptable. At some point it was also debated to move to nginx, but in the end httpd(8) was championed as a natural offshoot from relayd(8) which was already in base.
You are indeed correct. I recalled that the nginx migration was in progress until around the time of 5.4, but did not remember that it was in base at the time. Thank you for correcting me, it was indeed in tree from 2011 to 2014.
OpenBSD maintained several local patches to nginx, such as chroot [0] by default and reallocarray fixes, but they were rejected by upstream and became too big to maintain locally.
OpenBSD httpd would probably have never existed if nginx upstream reacted differently.
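For anyone unfamiliar with reallocarray(3): the point is to refuse an allocation whose nmemb * size would overflow, instead of silently allocating too little. A portable sketch of that check follows. The name `checked_reallocarray` is hypothetical, and this illustrates the idea only; it is not the libc source or the nginx patch.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for reallocarray(3): resize `ptr` to hold
 * `nmemb` objects of `size` bytes each, failing with ENOMEM when the
 * multiplication would overflow size_t.
 */
void *checked_reallocarray(void *ptr, size_t nmemb, size_t size)
{
    if (size != 0 && nmemb > SIZE_MAX / size) {
        errno = ENOMEM;
        return NULL;
    }
    return realloc(ptr, nmemb * size);
}
```

With plain `realloc(ptr, nmemb * size)`, an attacker-influenced `nmemb` can wrap the product around to a small number, and later writes then run past the end of the buffer.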
For the same reason that CVS is included in base: it's used by the developers for the continued development of the project. OpenBSD.org runs on httpd(8).
CGI in C (well, not CGI itself, but web apps) is super annoying. I spent 3 months writing web apps in C, took forever. There's a lot of convenient web app stuff in other languages that C doesn't make easy. Do not recommend.
Seemingly a million years ago I tried making a CGI application in C. It was a complete pain in the ass. I'm no C wizard (and was even further from that 25 years ago) but it was just an awful experience. Not only did it take me an inordinate amount of time to write, it also had memory management issues and inexplicable crashes. Total shit show and a waste of time.
I redid the whole project in a weekend in Perl and it was not only a better development experience but worked more reliably and was all around better. I miss CGI for developing web apps. It's straightforward and can easily (or at least in a straightforward way) be tested the same as a CLI tool.
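To illustrate the "test it like a CLI tool" point, here is a minimal sketch in C, since that is the language of the stack under discussion. A CGI program receives the request through environment variables such as QUERY_STRING and writes headers, a blank line, and then the body to stdout. The function name `cgi_respond` is hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical handler: writes a complete CGI response to `out`.
 * `query` is what the web server would place in QUERY_STRING and may
 * be NULL. Returns 0 on success, -1 on a write error.
 */
int cgi_respond(FILE *out, const char *query)
{
    assert(out != NULL);
    if (query == NULL)
        query = "";

    /* CGI output: header lines, an empty line, then the body. */
    if (fputs("Status: 200 OK\r\n"
              "Content-Type: text/plain\r\n"
              "\r\n", out) == EOF)
        return -1;

    /* Report only the length rather than echoing untrusted input. */
    if (fprintf(out, "query length: %zu\n", strlen(query)) < 0)
        return -1;
    return 0;
}
```

A `main` that calls `cgi_respond(stdout, getenv("QUERY_STRING"))` is all the wiring this needs, so running the binary from a shell with QUERY_STRING set should print the same bytes the web server would relay.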
A lot of the problems were my deficiencies in C and just general inexperience. I'm sure today there are a lot of nice CGI helper libraries that would make the process a lot easier. I also have a lot more experience in C so I'm confident I'd just do a better job writing such an app. I have no interest in it though. I'd rather "waste" cycles in a higher level language and solve problems in the domain rather than the tooling. But more power to the crazy diamonds writing web apps in C.
this particular stack - bchs - sounds great in a world where the following never happened:
1) transistor density and memory cost never advanced past what they were in 2000
2) no one ever took the Linux kernel project seriously
Your second to last sentence validates this: Cycles are computing power, and power is cheap. It gets cheaper every day. The person who can write flawless C code, including the boilerplate, is expensive.
At the scale where the savings is meaningful, like millions of dollars meaningful, you should be able to afford a million dollars of efficiency refactors.
You need to make a product that is fast/efficient enough to do the job. It needs to be delivered on a schedule that is probably really stupid from an engineering point of view. Efficiency can be patched over by throwing extra resources at the problem (up to a point). Functionality and market timing can't be patched over in the same way.
I think it makes sense to throw resources at inefficient code to get yourself into a position where that efficiency has a meaningful consequence. Customers aren't buying your product/service based on the number of instructions or memory used to do a task.
I’m not taking issue with “prioritize time to market”. I’m taking issue with “processor cycles and memory are actually meaningless now compared to engineering hours”. That is simply not always true.
I agree. There is no situation in which I would use C for a web app. Just use a framework like Rails or Django, or use Spring Boot or Go if you really want performance.
Once, I somehow got roped into writing a CGI application in C that I was supposed to write and test on a Linux system but that was going to run on a SunOS 4 system that I had no access to. That was...fun.
Only after all the compile-time errors. The development process, such as it was, involved me writing and testing some part of the program, then emailing it to someone (who had access to the SunOS box) who would try to compile/run the program and who would email me back any errors, often several hours later.
I recall imagining that what I was experiencing must have been similar to what early programmers submitting batch jobs went through. 0/10. Do not recommend.
You should have made your first test program take a post request with the URL of the actual binary to run and have it fork-exec that. Now you have a "testing" service. :)
Simplicity is nice, but there are reasons why Perl and PHP were the popular choices for web stacks in the early 2000s: they are faster and easier to develop with than C, and likely safer than C too.
Looks really cool, some feedback from a noob to optionally improve the “trivial” example:
1. can you put a link to the running http file served by the example so we could see how the result looks?
2. Defining CGI would be helpful because I looked it up and found “computer generated imagery” which doesn’t seem like what you meant.
3. Might be good for this example to load the html from a file or SQLite to fully trivialize the whole stack, as this example doesn’t include the S in the acronym.
Sometimes I think the Next.js examples folder at https://github.com/vercel/next.js/tree/canary/examples is just an amazing example of how best to market a software product to developers. It's such a rich source of integrations that almost anyone can find a good starting point for a web app project in there. If BCHS had the 80:20 of examples ready to roll then maybe it could blow up, because BCHS is a great idea: use the most battle-tested solutions in existence! Keep it up! Bravo!
It makes sense, to me, that the recent posts would have the most comments. More people are online with each passing day, and some percentage of them join HN.
Did something similar. But instead of adding plain text to HTML output, I wrote a DOM parser in C that loads an HTML template and then adds the response into the proper place in the DOM tree.
What's the benefit of parsing an HTML template, over just putting HTML string together with the data from the backend like "<p>your name: " + name + "</p>"?
In a DOM you can add content gradually and asynchronously, which gets harder with an HTML template.
Further, you can read back previously added content and make additional changes based on it, like summing up table columns, adding links to the first column values, or removing or fixing links based on authorization, to name a few.
Syntax-aware template languages are a thing in the web space. They aren't the most popular but security people often like them because they lead to fewer mistakes. Some examples include soy, latte, and hack's XHP.
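To make the escaping concern concrete, here is a minimal C sketch of what the plain string-concatenation approach has to get right by hand on every single interpolation. The helper name `html_escape` is hypothetical; it covers the five characters that matter in HTML text and quoted attribute values, and it is not meant as a complete templating solution.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: copy `in` to `out`, replacing characters that are
 * special in HTML. Returns the number of bytes written (excluding the
 * terminating NUL), or (size_t)-1 if `out` is too small.
 */
size_t html_escape(const char *in, char *out, size_t outsz)
{
    size_t used = 0;

    assert(in != NULL && out != NULL);
    if (outsz == 0)
        return (size_t)-1;

    for (; *in != '\0'; in++) {
        const char *rep;
        size_t len;

        switch (*in) {
        case '&':  rep = "&amp;";  break;
        case '<':  rep = "&lt;";   break;
        case '>':  rep = "&gt;";   break;
        case '"':  rep = "&quot;"; break;
        case '\'': rep = "&#39;";  break;
        default:   rep = NULL;     break;
        }
        len = (rep != NULL) ? strlen(rep) : 1;

        /* Always keep one byte free for the terminating NUL. */
        if (used + len + 1 > outsz)
            return (size_t)-1;

        if (rep != NULL)
            memcpy(out + used, rep, len);
        else
            out[used] = *in;
        used += len;
    }
    out[used] = '\0';
    return used;
}
```

Forgetting this call once is enough for an injection bug, which is the argument for templates that know where in the markup a value is being placed.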
I am a huge fan of simplicity when it comes to software development (although I must admit I'm not always great at achieving it), but I wonder whether the BCHS stack really offers anything that couldn't be achieved with a more "modern" stack.
httpd would be one of the things I'd look at, it's absolutely great software, but I wonder whether it really achieves the most simplicity from a developer point of view. For instance, in simpler deployments I would generally reach for Caddy [0], which does things such as certificate renewal automatically for me.
However, the part of the stack that really irks me is C. I'm a huge fan of C in an ideal world (where developers are perfect), and I respect the language for its role in the history of software development, and in the context of UNIX, however I just don't understand why use it in 2023 for something such as a web service. A web service is going to handle untrusted user input, deal with network boundaries, and is security-critical. A memory-unsafe language, where undefined behavior is easy to create but hard to find, which doesn't provide a lot of (useful) abstraction primitives other languages would provide, seems like the wrong choice. That's even before we start talking about how cumbersome it is to handle "strings" in C.
I'd wager Golang or Rust are always going to be better alternatives to C when it comes to developing web services. Golang makes deployment especially easy, while Rust offers similar or better performance than C along with more safety (memory and UB) and better abstraction primitives.
I believe I understand the purpose of this stack, and roughly who is going to enjoy it, just wondering whether I'm overlooking something, as I must admit I have never actually built a production service using CGI/C/httpd. I see this stack as something that's more philosophical rather than pragmatic towards development, if that makes sense, which is something I respect but wouldn't use (other than if I'm doing it just for fun).
> I must admit I have never actually built a production service using CGI/C/httpd
Nor would you probably want to. In addition to the security nightmare of hooking an inexperienced C programmer's C program directly to the internet, CGI is not really known for scaling all that well. If you were really doing this on a real high-performance site you'd probably want to use FastCGI. But also you just wouldn't do this. If you want to be low level, at least use Rust.
Much prefer your stack. It operates at a higher level of abstraction, the one I would consider correct for web services (your website or REST API doesn't need to make syscalls or allocate memory manually), while not sacrificing too much performance or simplicity.
This has become my go-to stack for playing around with for the last few months. Go/sqlite(bun)/templ/htmx with a sprinkle of proto-actors. Feels pretty close to phoenix framework with a few helper functions honestly but with the benefit of incredibly fast compilation of Go.
Single binary for distribution with assets/migrations embedded. Still need to build something substantial so I am sure there are edge cases/rough edges but so far it feels like a breath of fresh air compared to nodejs ecosystem.
Very interested in this too.. not clear at what point something like Postgres will become necessary.
Just writing a simple hello world in node/express downloads a gazillion dependencies and code that'll all be points of failure or mystery for lack of understanding. To understand them all to be able to write non trivial stuff is likely no different from doing httpd in c on Linux.
I've done stuff in Go and it makes it a lot easier to code.
The former appears to retrieve headers via a standard GET request. Apparently, with the latter method, there's a chance you may get different results than you would see from a GET request. (I'm not an expert, so this is just what I discovered after digging a bit for curiosity's sake.)
I think this is the exact difference. `curl -I` makes a HEAD request while the other is making a GET request and showing the response header. Just as a f'instance against a machine running nginx on my local network: the GET response sends me a Transfer-Encoding in the response header while a HEAD request does not. I can see a lot of configurations where a HEAD request returns different headers than a GET.
> However, a server MAY omit header fields for which a value is determined only while generating the content.
I find omitting Transfer-Encoding quite understandable and reasonable; the whole purpose of HEAD is to say “don’t bother doing the work that GET would trigger, I don’t care about exactness, I’m just getting a general idea”. Though I do find cases where Content-Length is omitted, even on static resources, disappointing. Saw that happen for I think the first time a few weeks ago (that is, a Content-Length that was present in GET but absent in HEAD).
But certainly I’ve seen more than a few 405 Method Not Allowed responses to HEAD, which is definitely bad.
“BCHS is a stable, developer-oriented platform.
Get used to minimalism and security”
With all the memory-safety issues you can introduce by improperly using C, is this page meant to be taken sarcastically or are they really serious about this claim?
Why wouldn’t it be serious? C has been used to write safe, stable, & portable software for decades. Compiler warnings & static analysis tools have come a long way toward preventing the vast majority of safety issues in (new) C projects.
This[1] is one possible start. Bear in mind though his approach is academic so don’t expect a tidy list of what the working C programmer needs to know.
I don't really see a connection here. Rust doesn't magically solve all problems, it just makes a lot of them less likely, which means we can successfully build larger systems.