BCHS software stack: BSD, C, httpd, SQLite (learnbchs.org)
153 points by edward on Dec 28, 2023 | 91 comments



I've spent the better part of the year messing around with every HTTP server I could get my hands on. I would not recommend OpenBSD httpd.

It's supposed to be simple, and it is simple compared to Apache, but it also has way fewer eyes on it. Ultimately, a big pile of string-handling C is likely to have some problems. There was a trivial-to-exploit server-crashing segfault in httpd's FastCGI implementation that was only fixed in the last month or so. There were also recent issues related to null bytes and line feeds in headers causing strange, exploitable misinterpretations of incoming requests in relayd. Much of this is now fixed, but I wouldn't be at all surprised if there were more low-hanging fruit remaining.

If you're looking for a lightweight HTTP server written in C, I recommend Lighttpd. It's older, more widely-used, and more standards-compliant. I'm not trying to dump on OpenBSD; I run it on my primary laptop. I just wouldn't use their HTTP tools for anything mission-critical yet.


No HTTP server software can be guaranteed to be completely secure, but OpenBSD httpd is at least privsep, always runs in a chroot, and each process is pledged quite tightly. That means the http/tls protocol speakers can't write to the filesystem, can't fork any processes, and can't execve(2).

lighttpd doesn't even chroot by default.


Surely the most secure would have to be a server written in a memory safe language? While you can do stupid things in any language, I would jump to Caddy (Go) or a Rust based toolkit before anything written in C.


You can do stupid things with "memory safe" languages too, and nothing protects you from misconfiguration. The BSD approach of just locking it down seems to me to be at least as secure, if not more.


True, but 70% of vulnerabilities are memory safety problems. String handling is more of a pain in C, too. C is a terrible choice for this use case.


I love how having a couple of new languages make older, hardened programs insecure in an instant.

These security mechanisms are added and implemented for reasons and they're tested with the flame of time and non-hardened connections.

I'd take a battle-tested and mature implementation over a newer one any day, even if it's written in Malbolge.


> I love how having a couple of new languages make older, hardened programs insecure in an instant.

As someone who has been dealing with Unix since the early 90s, most of those old C programs were always security nightmares. There were a few exceptions: djb's stuff, dovecot and (surprisingly) Apache. But most of the other popular C servers were absolutely riddled with buffer overflows and other security problems. The Morris worm, the entire rsh family, you name it. Sendmail was awful for a long time, too.

Memory corruption errors still make up 70% of security holes in modern C and C++ code, and that's after decades of improvement. That entire set of vulnerabilities could be avoided in the 90s by using Perl 5 or Java, which were insanely popular. (PHP came with its own supply of security holes, both in the interpreter, the language design, and the awful database APIs.)

We've had popular memory-safe options for decades. All Rust brings to the table is the ability to be memory safe without needing GC (which is great). But with rare exceptions, the popular C servers were always full of holes.


There's no reason you couldn't also pledge an HTTP server written in a safe language too.

One key thing to keep in mind is that exploit mitigations are largely about limiting degrees of freedom. It's quite hard to contain an attacker who has gained code execution in a process (such as due to memory corruption) due to the massive amount of control they have. On the other hand, if you're able to prevent them from gaining such a strong foothold to begin with (such as by using safer languages which don't have such issues), you're in a much better spot because now they don't even gain any control over the process.


They have become insecure not because of new languages but because the web is no longer just serving a bunch of files. You are delivering executable text mixed with untrusted input that must not improperly interfere with other users, and for that you need impeccable string handling. Just reliably segfaulting every time there's a problem won't do.


> Surely the most secure would have to be a server written in a memory safe language?

Like rust which downloads unverified crates from the internet. /s


How does all of that prevent leaks and exploits of _user_ data that httpd has access to?


I never understood why httpd is included in the base of OpenBSD, an OS that strips all non-essential code out for the sake of security.


Because they used to have Apache in base from 1998 to 2014 (the 5.6 release) and found it frustrating to maintain? Given the developers and what they do with the operating system, I can understand why they would consider having an HTTP server in base something within the scope of the project.

It is an open source project with both commits and discussion happening in the open (tech@ in particular), so the barrier to understanding is reasonably acceptable. At some point it was also debated to move to nginx, but in the end httpd(8) was championed as a natural offshoot from relayd(8) which was already in base.


Small correction: wasn’t nginx in base from 5.1 to 5.4?

https://www.openbsdhandbook.com/services/webserver/nginx/#:~....


You are indeed correct. I recalled that the nginx migration was in progress until around the time of 5.4, but did not remember that it was in base at the time. Thank you for correcting me, it was indeed in tree from 2011 to 2014.


OpenBSD maintained several local patches to nginx, such as chroot [0] by default and reallocarray fixes, but they were rejected by upstream and were too big to maintain locally.

OpenBSD httpd would probably have never existed if nginx upstream reacted differently.

https://www.openbsd.org/papers/httpd-slides-asiabsdcon2015.p...

[0] The OpenBSD nginx port still includes the default chroot patch, ~8 years later, see https://raw.githubusercontent.com/sthen/nginx_chroot_patch/a... and https://github.com/openbsd/ports/blob/master/www/nginx/Makef...


For the same reason that CVS is included in base: it's used by the developers for the continued development of the project. OpenBSD.org runs on httpd(8).


CGI in C (well, not CGI itself, but web apps) is super annoying. I spent 3 months writing web apps in C, took forever. There's a lot of convenient web app stuff in other languages that C doesn't make easy. Do not recommend.


Seemingly a million years ago I tried making a CGI application in C. It was a complete pain in the ass. I'm no C wizard (and was even further from that 25 years ago) but it was just an awful experience. Not only did it take me an inordinate amount of time to write, it also had memory management issues and inexplicable crashes. Total shit show and a waste of time.

I redid the whole project in a weekend in Perl and it was not only a better development experience but worked more reliably and was all around better. I miss CGI for developing web apps. It's straightforward and can easily (or at least in a straightforward way) be tested the same as a CLI tool.

A lot of the problems were my deficiencies in C and just general inexperience. I'm sure today there's a lot of nice CGI helper libraries that would make the process a lot easier. I also have a lot more experience in C so I'm confident I'd just do a better job writing such an app. I have no interest in it though. I'd rather "waste" cycles in a higher level language and solve problems in the domain rather than the tooling. But more power to the crazy diamonds writing web apps in C.


this particular stack - bchs - sounds great in a world where the following never happened:

1) transistor density and memory cost never advanced past what they were in 2000

2) no one ever took the Linux kernel project seriously

Your second to last sentence validates this: Cycles are computing power, and power is cheap. It gets cheaper every day. The person who can write flawless C code, including the boilerplate, is expensive.


At scale a few % efficiency can make a big difference.


At the scale where the savings is meaningful, like millions of dollars meaningful, you should be able to afford a million dollars of efficiency refactors.

You need to make a product that is fast/efficient enough to do the job. It needs to be delivered on a schedule that is probably really stupid from an engineering point of view. Efficiency can be patched over by throwing extra resources at the problem (up to a point). Functionality and market timing can't be patched over in the same way.

I think it makes sense to throw resources at inefficient code to get yourself into a position where that efficiency has a meaningful consequence. Customers aren't buying your product/service based on the number of instructions or memory used to do a task.


I’m not taking an issue with “prioritize time to market”. I’m taking an issue with “processor cycles and memory are actually meaningless now compared to engineering hours”. That is simply not always true.


I agree. In no situation would I use C for a web app. Just use some framework like Rails or Django, or use Spring Boot or Go if you really want performance.

It's a nice practice I guess


Once, I somehow got roped into writing a CGI application in C that I was supposed to write and test on a Linux system but that was going to run on a SunOS 4 system that I had no access to. That was...fun.


I see many 500 errors in your past.


Only after all the compile-time errors. The development process, such as it was, involved me writing and testing some part of the program, then emailing it to someone (who had access to the SunOS box) who would try to compile/run the program and who would email me back any errors, often several hours later.

I recall imagining that what I was experiencing must have been similar to what early programmers submitting batch jobs went through. 0/10. Do not recommend.


You should have made your first test program take a post request with the URL of the actual binary to run and have it fork-exec that. Now you have a "testing" service. :)


Wasn’t Amazon.com a massive C web app for a surprisingly long time?


I think it was C++ and Perl in the early 2000s, afaik.


Didn't they run Mason and Perl at some point?


The majority of all of the code is likely still C.

The "web framework" might have changed - but it's not like that's the majority of Amazon.com


Related:

BCHS: OpenBSD, C, httpd and SQLite web stack - https://news.ycombinator.com/item?id=29988951 - Jan 2022 (149 comments)

BCHS stack – BSD, C, httpd, SQLite - https://news.ycombinator.com/item?id=28269399 - Aug 2021 (58 comments)

BCHS: The BSD, C, httpd, SQLite stack for the web - https://news.ycombinator.com/item?id=23148871 - May 2020 (1 comment)

OpenBSD, C, httpd and SQLite – Web App stack - https://news.ycombinator.com/item?id=17272225 - June 2018 (174 comments)

BCHS stack – BSD, C, httpd, SQLite - https://news.ycombinator.com/item?id=14580746 - June 2017 (75 comments)

Kwebapp: rapid BCHS web app development - https://news.ycombinator.com/item?id=14454381 - May 2017 (1 comment)

BCHS Stack - BSD, C, Httpd, SQLite - https://news.ycombinator.com/item?id=11763888 - May 2016 (68 comments)


Simplicity is nice, but there are reasons why Perl and PHP were the popular choices for web stacks in the early 2000s: they are faster and easier to develop with than C, and likely safer than C too.

mod_perl (https://perl.apache.org/) and mod_php (https://cwiki.apache.org/confluence/plugins/servlet/mobile?c...) helped to make Apache httpd (https://httpd.apache.org/) the number one web server in the early days of the web.


Looks really cool, some feedback from a noob to optionally improve the “trivial” example:

1. can you put a link to the running http file served by the example so we could see how the result looks?

2. Defining CGI would be helpful because I looked it up and found “computer generated imagery” which doesn’t seem like what you meant.

3. Might be good for this example to load the html from a file or SQLite to fully trivialize the whole stack, as this example doesn’t include the S in the acronym.

Sometimes I think the Next.js examples folder at https://github.com/vercel/next.js/tree/canary/examples is just an amazing example of how best to market a software product to developers, because it’s such a rich source of integrations that almost anyone can find a good starting point for a web app project in there. If BCHS had the 80:20 of examples ready to roll then maybe it could blow up, because BCHS is a great idea: use the most battle-tested solutions in existence! Keep it up! Bravo!


https://en.wikipedia.org/wiki/Common_Gateway_Interface

The most primitive version is just launching one process per request, piping the HTTP request into stdin, and piping the response out of stdout.

It works, but you can imagine the startup latency is rough and it takes a lot of resources.

There are faster variations that try to reduce the overhead. Ironically FaaS is sort of a rebirth of CGI
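To make the request/response flow above concrete, here is a minimal sketch, written in Python rather than C to keep it short. Under CGI (RFC 3875) the server passes request metadata such as REQUEST_METHOD and QUERY_STRING in environment variables, puts any request body on stdin, and reads header lines, a blank line, and then the body from stdout. The function name `handle_request` and the `name` query parameter are assumptions made up for this example.

```python
import os
import sys
from urllib.parse import parse_qs


def handle_request(environ, stdin, stdout):
    """Handle one CGI request.

    environ: mapping with the CGI meta-variables (normally os.environ)
    stdin:   binary stream carrying the request body, if any
    stdout:  text stream that receives the response
    """
    method = environ.get("REQUEST_METHOD", "GET")
    query = parse_qs(environ.get("QUERY_STRING", ""))
    # CONTENT_LENGTH is absent or empty when there is no body.
    length = int(environ.get("CONTENT_LENGTH") or 0)
    body = stdin.read(length) if length else b""
    # "name" is a hypothetical query parameter used only for this sketch.
    name = query.get("name", ["world"])[0]

    # A CGI response is header lines, one blank line, then the body.
    stdout.write("Content-Type: text/plain; charset=utf-8\r\n")
    stdout.write("\r\n")
    stdout.write(f"hello, {name} ({method}, {len(body)} body bytes)\n")


if __name__ == "__main__":
    # One process per request: the web server sets the environment,
    # runs this script, and relays whatever it prints.
    handle_request(os.environ, sys.stdin.buffer, sys.stdout)
```

Because the handler only touches the environment and two streams, it can be exercised from a shell or a unit test without any web server, which is the testability point raised earlier in the thread.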


> 2. Defining CGI would be helpful because I looked it up and found “computer generated imagery” which doesn’t seem like what you meant.

Common Gateway Interface. It’s how we generated dynamic web content back in 1995.

Doing it with C is extremely painful.


It's typically common gateway interface [1] in the context of web serving.

[1]: https://en.m.wikipedia.org/wiki/Common_Gateway_Interface


> 1. can you put a link to the running http file served by the example so we could see how the result looks?

How would it look any different than if it was served from a bash script or the world's most complicated .NET container in a Kubernetes instance?


> Defining CGI would be helpful

This made me feel extremely old lol


You know, back in my day, every cool url had the letters cgi in there somewhere. Maybe at the end, maybe somewhere towards the middle.


i used to think cgi-bin was a "bin" to put your cgi scripts in (i mean, they were perl, not binary)


Indeed. We are now at a stage where new generations of Web Developers have absolutely no idea what CGI is.


Posted 13 times to HN [0]

This is the post with the most comments (149) [1]

[0] https://news.ycombinator.com/from?site=learnbchs.org

[1] https://news.ycombinator.com/item?id=29988951


It makes sense, to me, that the recent posts would have the most comments. More people are online with each passing day, and some percentage of them join HN.


Did something similar. But instead of adding plain text to the HTML output, I wrote a DOM parser in C that loads an HTML template and then adds the response into the proper place in the DOM tree.


What's the benefit of parsing an HTML template, over just putting HTML string together with the data from the backend like "<p>your name: " + name + "</p>"?


In DOM you can add content gradually and asynchronously, which gets harder using an HTML template.

Further, you can read back previously added contents and make additional changes based on those. Like summing up table columns, or adding links to the first column values, or removing or fixing links based on authorization, to name a few.
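As a rough illustration of the read-back idea, here is a sketch using Python's standard `xml.etree.ElementTree` in place of the C parser described above. The template, the `orders` id, and the `build_page` name are assumptions for this example, and ElementTree requires well-formed XML, so it is only a stand-in for a real HTML parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical template; a real one would be loaded from a file.
TEMPLATE = "<html><body><table id='orders'></table></body></html>"


def build_page(rows):
    """Fill the table with (label, amount) rows, then append a total row."""
    root = ET.fromstring(TEMPLATE)
    table = root.find(".//table[@id='orders']")

    # Add content step by step; text nodes are escaped on serialization.
    for label, amount in rows:
        tr = ET.SubElement(table, "tr")
        ET.SubElement(tr, "td").text = label
        ET.SubElement(tr, "td").text = str(amount)

    # Read back what is already in the tree and derive new content from it.
    total = sum(int(tr[1].text) for tr in table.findall("tr"))
    tr = ET.SubElement(table, "tr")
    ET.SubElement(tr, "td").text = "total"
    ET.SubElement(tr, "td").text = str(total)

    return ET.tostring(root, encoding="unicode")
```

The total is computed from the tree itself rather than from the input list, which is the kind of second pass that is awkward when the page is built by appending strings.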


Well XSS prevention for one.

Syntax-aware template languages are a thing in the web space. They aren't the most popular but security people often like them because they lead to fewer mistakes. Some examples include Soy, Latte, and Hack's XHP.
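A small sketch of the failure mode, in Python for brevity. `render_unsafe` and `render_escaped` are hypothetical names; `html.escape` is the standard-library helper. Manual escaping like this is weaker than the syntax-aware templating mentioned above, since it depends on the developer remembering it at every insertion point.

```python
from html import escape


def render_unsafe(name):
    # Plain concatenation: any markup inside `name` reaches the browser as markup.
    return "<p>your name: " + name + "</p>"


def render_escaped(name):
    # Escaping turns <, > and & (and quotes) into entities before insertion.
    return "<p>your name: " + escape(name) + "</p>"
```

With `name = "<script>x</script>"`, the first function emits a live script element, while the second emits `&lt;script&gt;x&lt;/script&gt;`, which the browser displays as text.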


Did you use regex for the html part?


Wrote a code parser. Not regexp.

Found the old code on disk, snapshot here. The DOM parser, manipulator and free was 240 LOC.

https://codeshare.io/X8RQxl


Witchcraft! ;-) This is incredible.


That sounds hard!


I am a huge fan of simplicity when it comes to software development (although I must admit I'm not always great at achieving it), but I wonder whether the BCHS stack really offers anything that couldn't be achieved with a more "modern" stack.

httpd would be one of the things I'd look at, it's absolutely great software, but I wonder whether it really achieves the most simplicity from a developer point of view. For instance, in simpler deployments I would generally reach for Caddy [0], which does things such as certificate renewal automatically for me.

However, the part of the stack that really irks me is C. I'm a huge fan of C in an ideal world (where developers are perfect), and I respect the language for its role in the history of software development, and in the context of UNIX, however I just don't understand why use it in 2023 for something such as a web service. A web service is going to handle untrusted user input, deal with network boundaries, and is security-critical. A memory-unsafe language, where undefined behavior is easy to create but hard to find, which doesn't provide a lot of (useful) abstraction primitives other languages would provide, seems like the wrong choice. That's even before we start talking about how cumbersome it is to handle "strings" in C.

I'd wager Golang or Rust are always going to be better alternatives to C when it comes to developing web services. Golang makes deployment especially easy, while Rust provides similar or better performance than C, but provides more safety (memory and UB) and better abstraction primitives.

I believe I understand the purpose of this stack, and roughly who is going to enjoy it, just wondering whether I'm overlooking something, as I must admit I have never actually built a production service using CGI/C/httpd. I see this stack as something that's more philosophical rather than pragmatic towards development, if that makes sense, which is something I respect but wouldn't use (other than if I'm doing it just for fun).

[0]: https://caddyserver.com/


> I must admit I have never actually built a production service using CGI/C/httpd

Nor would you probably want to. In addition to the security nightmare of hooking an inexperienced C programmer's C program directly to the internet, CGI is not really known for scaling all that well. If you were really doing this on a real high-performance site you'd probably want to use FastCGI. But also you just wouldn't do this. If you want to be low level, at least use Rust.


The cost of launching a process that’s not a JIT interpreter is tiny. It’s also a fixed cost, so it does scale.

In 2023 people are using lambdas on AWS. Slow CGI is fast enough.
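One rough way to check the process-launch cost on your own machine is to time repeated spawns. This is a minimal sketch in Python; `mean_spawn_seconds` is a hypothetical helper, and the figure it returns includes Python's own subprocess overhead, so treat it as a rough upper bound rather than a benchmark.

```python
import subprocess
import time


def mean_spawn_seconds(argv, runs=100):
    """Return the mean wall-clock time to start argv and wait for it to exit."""
    start = time.perf_counter()
    for _ in range(runs):
        # Each iteration creates a new process, as classic CGI does per request.
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
    return (time.perf_counter() - start) / runs
```

On a POSIX system, calling `mean_spawn_seconds(["true"])` times a trivial native binary, and passing the command line of an interpreter-based script instead gives a point of comparison for the claim above.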


This is close to my stack which is a single Go binary, SQLite db and either OpenBSD or Linux with a side of htmx.


Much prefer your stack, yours operates at a higher level of abstraction, the one which I would consider correct for web services (your website or REST api doesn't need to do syscalls, or allocate memory manually), while not sacrificing too much performance or simplicity.


This has become my go-to stack for playing around with for the last few months. Go/sqlite(bun)/templ/htmx with a sprinkle of proto-actors. Feels pretty close to phoenix framework with a few helper functions honestly but with the benefit of incredibly fast compilation of Go.

Single binary for distribution with assets/migrations embedded. Still need to build something substantial so I am sure there are edge cases/rough edges but so far it feels like a breath of fresh air compared to nodejs ecosystem.


Do you have any demo repos of this on gh?


Very interested in this too.. not clear at what point something like Postgres will become necessary.

Just writing a simple hello world in node/express downloads a gazillion dependencies and code that'll all be points of failure or mystery for lack of understanding. To understand them all to be able to write non trivial stuff is likely no different from doing httpd in c on Linux.

I've done stuff in Go and it makes it a lot easier to code.


from the FAQ:

    > Is BCHS a joke?
    > Software development is full of jokes. This is not one of them.


Does anybody know the motivation behind using `curl -sD- -o/dev/null` instead of `curl -I` on the landing page?


The former appears to retrieve headers via a standard GET request. Apparently, with the latter method, there's a chance you may get different results than you would see from a GET request. (I'm not an expert, so this is just what I discovered after digging a bit for curiosity's sake.)


I think this is the exact difference. `curl -I` makes a HEAD request while the other is making a GET request and showing the response header. Just as a f'instance against a machine running nginx on my local network: the GET response sends me a Transfer-Encoding in the response header while a HEAD request does not. I can see a lot of configurations where a HEAD request returns different headers than a GET.


As a matter of standards compliance <https://www.rfc-editor.org/rfc/rfc9110#section-9.3.2-2>:

> However, a server MAY omit header fields for which a value is determined only while generating the content.

I find omitting Transfer-Encoding quite understandable and reasonable; the whole purpose of HEAD is to say “don’t bother doing the work that GET would trigger, I don’t care about exactness, I’m just getting a general idea”. Though I do find cases where Content-Length is omitted, even on static resources, disappointing. Saw that happen for I think the first time a few weeks ago (that is, a Content-Length that was present in GET but absent in HEAD).

But certainly I’ve seen more than a few 405 Method Not Allowed responses to HEAD, which is definitely bad.
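The HEAD/GET difference discussed above can be reproduced locally. The sketch below, in Python with only the standard library, starts a throwaway server on a loopback port whose HEAD handler deliberately omits Content-Length, then compares the header names returned for each method. The handler and function names are made up for this example; it does not model nginx or httpd, only a toy server showing that the two methods can return different header sets.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"hello\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_HEAD(self):
        # Deliberately skips Content-Length, as some real servers do.
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()

    def log_message(self, *args):
        pass  # keep the example quiet


def header_names(method, port):
    """Send one request and return the lowercased response header names."""
    conn = http.client.HTTPConnection("127.0.0.1", port)
    conn.request(method, "/")
    resp = conn.getresponse()
    resp.read()
    names = {key.lower() for key, _ in resp.getheaders()}
    conn.close()
    return names


def headers_only_in_get():
    """Return header names present in the GET response but not in HEAD."""
    server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        return header_names("GET", port) - header_names("HEAD", port)
    finally:
        server.shutdown()
        server.server_close()
```

Against this toy server the difference is the Content-Length header, which is why fetching headers through a real GET, as the landing page does, shows what a browser would actually receive.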


> C is a straightforward, non-mustachioed language.

Don’t all the curly braces count as mustaches?


I remember hooking up opencv with the embeddable server mongoose 15 years back.

Today I would have considered mongoose (10k+ stars), which is also a mature C/C++ web server [1], if not for the licence.

Also check out this thread (290+ pts) on tiny HTTP servers in C. [2]

[1] https://github.com/cesanta/mongoose/tree/master/examples

[2] https://news.ycombinator.com/item?id=26671851


> "[C] has full access to the kernel's system calls"

rofl, exactly what i'm looking for in a web framework.

I honestly can't tell if this post is meant as satire or serious.


Not satire. 100% serious. It's made for people like you.


“BCHS is a stable, developer-oriented platform. Get used to minimalism and security”

With all the memory-safety issues you can introduce by improperly using C, is this page meant to be taken sarcastically or are they really serious about this claim?


Kristaps is an OpenBSD developer. These guys are notorious for having five heads each - two remote holes in 30 years, OpenSSH, PF, etc.

That said, the actual secret is to write simpler code. (And maybe use pledge+unveil, if your OS has it.)


Zero network services with open ports by default (not even sshd) also helps for that number.


Yes, "simpler" often means "don't do things you don't have to". A laptop doesn't really need sshd, and OpenBSD makes for an OK laptop OS.


Why wouldn’t it be serious? C has been used to write safe, stable, & portable software for decades. Compiler warnings & static analysis tools have come a long way to preventing the vast majority of safety issues in (new) C projects.


They're serious. C doesn't have to be insecure.


It's just extraordinarily difficult to make it so, and to convince yourself that you've made it so.


Often easier to convince yourself you've made it secure than to actually do so. Which is part of the problem.


If Python is considered secure yet CPython is written in C, then, perhaps, it is not that difficult?


You might want to take a few dozen hours to dig through John Regehr’s work to understand the current state of the art of C programming.


Is there a particular entry point into his work that you would recommend?


This[1] is one possible start. Bear in mind though his approach is academic so don’t expect a tidy list of what the working C programmer needs to know.

[1] https://blog.regehr.org/archives/1520


Any of his articles covering undefined behavior.


His work on fuzzing and test case reduction is really interesting too.


[flagged]


A response with substance would be something we could learn from but a meme isn’t.


The illusion of security starts with believing your software is "safe" because it is written in a new "safe" language

https://nvd.nist.gov/vuln/detail/CVE-2023-22466


I don't really see a connection here. Rust doesn't magically solve all problems, it just makes a lot of them less likely, which means we can successfully build larger systems.


If that’s the case then “safety” isn’t binary, it’s a gradient, and debating C++ vs Rust is a matter of degree instead of a moral imperative.


You don't because you are not competent enough, nothing wrong with that

C has the same tools, they however do not run with the compiler, but accomplish the same goal

Both are still not immune to incompetence; the user is often the issue, hence the link


I feel like interacting with the internet using C is nowadays as productive as changing your JS framework every other year.


So, peak productivity!



