Nweb: a tiny, safe web server (static pages only) (ibm.com)
103 points by josstin on April 1, 2014 | 60 comments



A few months ago, I wrote httpdito, a tiny web server that serves static pages only. It's about the same amount of code as nweb, but less functionality, and I have more confidence in its security: http://canonical.org/~kragen/sw/dev3/server.s, with README at http://canonical.org/~kragen/sw/dev3/httpdito-readme. It's 296 instructions.

I'm not saying it's secure, but I certainly intended it to be, and it doesn't suffer from the particular problems tptacek, evmar, kedean, and nknighthb identify in nweb. I'd like to think I'm not naïve enough to have written problems like that, but that's probably not true.

(I'm pretty sure that "Try my new secure software!" is something that should not be followed with "I wrote it in C!" but usually assembly language is not going to be an improvement. In this case I think it happens to be.)

httpdito was discussed on HN a bit before it was finished; for example, it's no longer completely trivial to DoS it, although I could do more to protect it against that.


Adding to what everyone else has said, this is also an example of "how not" to write socket code; for instance, the assumption that you can read a whole HTTP request "in one go" with a single large read call is false.
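To illustrate the point about reads, here is a minimal sketch (not nweb's code) of reading until the blank line that ends the request headers has arrived. The helper name `read_request_head` and its return convention are assumptions made for this example.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: keep calling read() until the header terminator
 * "\r\n\r\n" has been seen. Returns the number of bytes stored in buf
 * (which is NUL-terminated), or -1 on error, early EOF, or if the headers
 * do not fit in cap - 1 bytes. */
ssize_t read_request_head(int fd, char *buf, size_t cap)
{
    size_t used = 0;

    assert(buf != NULL);
    if (cap == 0)
        return -1;
    buf[0] = '\0';

    while (used < cap - 1) {
        ssize_t n = read(fd, buf + used, cap - 1 - used);

        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            return -1;
        }
        if (n == 0)
            break;                 /* peer closed before finishing the headers */
        used += (size_t)n;
        buf[used] = '\0';
        /* strstr stops at an embedded NUL byte; good enough for a sketch. */
        if (strstr(buf, "\r\n\r\n") != NULL)
            return (ssize_t)used;
    }
    return -1;                     /* truncated or oversized request */
}
```

A caller would treat -1 as "send an error and close" rather than parsing a partial request.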

Also, casting function calls to (void) is nonsensical.

You can perhaps forgive the sprintf() call because, AIX. (Believe it or not, there was a time when snprintf was a portability problem). You can't forgive the log() function that doesn't explicitly bounds check its argument (though it's not exploitable in this code).


> Also, casting function calls to (void) is nonsensical.

This is an older convention to indicate that the programmer knows the function returns a value but has chosen to ignore it. It got around the false positives generated by lint.

This code uses SIGCLD which I don't think is supported by BSD, where it is called SIGCHLD and is slightly different (someone please correct me if I'm wrong). If that's the case, then the author's assertion that it "should run unchanged on AIX, Linux®, or any other UNIX version" is incorrect - it will only run on System V varieties.


Confirmed the SIGCLD problem.

I had to change it to SIGCHLD to get it to compile on my Mac.


There is also the matter of directly stuffing the sockaddr_in with calls like inet_addr(), which has long been considered inferior to alternatives using getaddrinfo().

This also has the side effect of this implementation not supporting IPv6, among others.
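A minimal sketch of the getaddrinfo() approach, assuming a helper named `listen_on` (hypothetical) that takes the port as a string. It is an illustration of the pattern, not a drop-in replacement for nweb's setup code.

```c
#include <assert.h>
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: open a listening TCP socket on the given port
 * (e.g. "8181") without hard-coding IPv4. Returns the socket, or -1. */
int listen_on(const char *port)
{
    struct addrinfo hints, *res, *ai;
    int fd = -1;

    assert(port != NULL);
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;      /* accept IPv4 or IPv6 results */
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_PASSIVE;      /* wildcard address, suitable for bind() */

    if (getaddrinfo(NULL, port, &hints, &res) != 0)
        return -1;

    /* Try each candidate address until one can be bound and listened on. */
    for (ai = res; ai != NULL; ai = ai->ai_next) {
        int on = 1;

        fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;
        setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof on);
        if (bind(fd, ai->ai_addr, ai->ai_addrlen) == 0 && listen(fd, 64) == 0)
            break;                    /* success: keep this socket */
        close(fd);
        fd = -1;
    }
    freeaddrinfo(res);
    return fd;
}
```

Whether a single socket then serves both IPv4 and IPv6 depends on the platform's dual-stack settings; a fuller server might keep one listening socket per returned address.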

So please, don't use this as anything other than an example of how not to do sockets.

If you want to learn how to do it right, I can recommend the excellent "Guide to Network Programming Using Internet Sockets" by Brian "Beej Jorgensen" Hall (You should definitely buy the printed version, just to support him :)).


> Also, casting function calls to (void) is nonsensical.

Does extremely pedantic C require the results of function calls to be used?

I know the correct way to mark a variable as unused is to cast it to void, but I'm not sure if you're supposed to do that for function return values as well.


Nothing about the cast is "required". However, it's good practice because the cast explicitly acknowledges the function has a return value that we're throwing out. Looking at the following function call:

    (void)create_widget(&widget);
It's clear that `create_widget` returns some value -- it's probably an error code that tells us whether the widget creation was successful. The source code here says "I know this function returns an important value, but I'm going to disregard it here". This could be useful for debugging if you find that, for example, an error condition is being handled improperly (e.g. "oh, we're obviously throwing out the return value of this function, which we shouldn't be doing"). This function call is much less informative:

    create_widget(&widget);


It's a terrible and (I think) amateurish practice that misapprehends the point of C's type checking while adding lots of visual noise, which is why you'll virtually never see it in well-liked C code.


The list of MIME types looks weird, too. There's stuff like "image/zip" in the source code.
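For comparison, a small sketch of an extension table using registered media types. The table contents and the name `mime_type_for` are chosen for this example and are not nweb's list.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>

struct mime_entry {
    const char *ext;
    const char *type;
};

/* Illustrative subset only; archives are application/..., not image/... */
static const struct mime_entry mime_table[] = {
    { ".html", "text/html" },
    { ".htm",  "text/html" },
    { ".gif",  "image/gif" },
    { ".jpg",  "image/jpeg" },
    { ".jpeg", "image/jpeg" },
    { ".png",  "image/png" },
    { ".zip",  "application/zip" },
    { ".gz",   "application/gzip" },
};

/* Hypothetical helper: map a path to a media type by its final extension
 * (case-insensitive). Returns NULL for unknown or missing extensions. */
const char *mime_type_for(const char *path)
{
    const char *dot;
    size_t i;

    assert(path != NULL);
    dot = strrchr(path, '.');
    if (dot == NULL)
        return NULL;
    for (i = 0; i < sizeof mime_table / sizeof mime_table[0]; i++)
        if (strcasecmp(dot, mime_table[i].ext) == 0)
            return mime_table[i].type;
    return NULL;
}
```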


This crowd is always tough to please. There's a description at the top which says, about the 200 loc http server:

You can see exactly what it can and can't do.

Thank you Mr. Griffiths. Your example will help extend my understanding of an http server, even if I don't intend on writing one. I would never read through the 90 klocs of httpd.


My C is a little rusty, but it seems like this web server is definitely not safe. The very first function in the code has a local stack variable and uses sprintf() to fill it. That's almost a textbook example of a buffer overflow vulnerability, if I'm not mistaken. Even if they try and compensate for that by checking the data length before it's passed to that function, it's still scary to see someone using sprintf() instead of snprintf() these days. It's like walking a tightrope without a net.
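As a sketch of the "net" being described, here is the same kind of formatting done with snprintf() and an explicit truncation check. The function name, format string, and parameters are assumptions for illustration, not nweb's logger.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: format a log line into out[cap].
 * Returns 0 if the whole line fit, -1 on error or truncation.
 * With cap > 0, out is NUL-terminated even when truncated. */
int format_log_line(char *out, size_t cap, const char *label,
                    const char *detail, int code)
{
    int n;

    assert(out != NULL && label != NULL && detail != NULL);
    n = snprintf(out, cap, "%s: %s (%d)", label, detail, code);
    if (n < 0 || (size_t)n >= cap)
        return -1;    /* encoding error, or output did not fit */
    return 0;
}
```

The caller no longer has to prove that every argument is short enough; an oversized input becomes a detectable error instead of a stack overwrite.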


It's scary style (and speaking of style, this code is really inconsistent in its formatting), but from a quick search of the usages of the logger function, I didn't see any way to overflow the buffer.

* BUFSIZE is 8096.

* logbuffer (the local variable) is BUFSIZE*2.

* s1 looks like it's always trusted and an order of magnitude smaller than BUFSIZE.

* The format strings and numbers are nowhere near big enough to make up the difference.

* Where s2 is untrusted data, I think it's always guaranteed to be <=BUFSIZE and zero-terminated.

But there are definitely other possible issues I haven't looked at closely, and I'm certainly troubled that this mess has shown up on an IBM site as an example of a "safe" web server.


I always enjoy articles that reiterate how the simple stuff really is pretty simple. I like thttpd [1] for that reason: it's a really simple webserver, a bit more featureful than this one, but not by a lot. It's easy to comprehend, and easy to keep all the moving pieces in your head at once.

Folks building embedded stuff have been using this stuff to create their UIs for like forever it seems, and this kind of web server works pretty well in that capacity.

[1] http://www.acme.com/software/thttpd/


This code is really not good, and certainly not worth learning from.

It appears that if you request a path like "//etc/foobar" with two slashes at the front it'll allow traversal outside the starting directory, though it's mitigated by checking file extensions.


That may be, but the docs are quite nice; people who can write better code might still gain from considering how they can make their documentation this instructive.


I am always on the lookout for a small, lightweight and secure web server for impromptu file sharing. Right now I use publicfile from djb.[^1] My only complaint is that there is no debian package for publicfile so I have to build my own package. I would love to find an equivalent (ftp not necessary) daemon that is included in debian. Is anyone aware of something in the debian repos that I am overlooking?

[^1]: http://cr.yp.to/publicfile.html


http://smarden.org/pape/Debian/unofficial/why.html

DJB appears to have changed the license to public domain, but at some point in the past there was a license that prohibited distribution of binaries from modified source. There's still the request that the paths not be changed to comply with the FHS.

DJB has strong opinions about the construction and distribution of software. So do quite a lot of debian people. These views are incompatible.


The page you linked to was last updated 11+ years ago. Notice the discussion of qmail|ucspi|djbdns-installer packages? They are not around anymore, you can install qmail/ucspi/daemontools etc. There is even djbdns and the debian tweaked + community patches version known as dbndns. Yes djb and debian devs are quite opinionated, but I do not think that is the reason that publicfile is not in debian despite the fact that so many other djb packages are.

I think the reason is low demand largely due to the fact that gnome-desktop depends on gnome-user-share (AKA: apache).


This is hardly in the category of "lightweight and secure", but for impromptu stuff, this Ruby+Rack one-liner serves directory listings and static files from the current directory:

    ruby -e 'require "rack"; include Rack; \
      Server.start :app => Directory.new(".", \
      Static.new(nil, :urls => ["/"], :root => "."))'


    python -m SimpleHTTPServer


the minimal Ruby equivalent would be

    ruby -run -ehttpd .
Here's a handy snippet for bashrc:

    function S {
      ruby -run -ehttpd . -p${1-8080}  # default to port 8080 unless given as parameter
    }


although that server is very horrible


Works fine for some tests, but it's single threaded. If you need concurrency:

    twistd -no web --path=.


Perhaps, but the use case it usually finds is "I need a webserver, here. Now."

Python is nearly always installed, and that depends on nothing but the standard library. You can stick an alias in your dotfiles (I have, it's called "serve-this") and not have to worry about having twisted¹ getting to wherever your dotfiles get put. (I distribute my dotfiles over git/github, so it's really easy to move them around. More work to get Twisted.)

¹Or Ruby… or Go…


Twisted is pre-installed on OS X and various Linux distributions...


or in go:

    package main

    import (
        "flag"
        "net/http"
    )

    var serveDir = flag.String("d", ".", "Directory to serve from")

    func main() {
        flag.Parse()
        panic(http.ListenAndServe("127.0.0.1:8000", http.FileServer(
            http.Dir(*serveDir))))
    }


What's wrong with it? Honestly, I don't know. I know Python probably isn't a great choice for performance reasons, but anything else? It's super easy to build off and configure. Seems fine for a beginner (which I admittedly am; hence my asking).


Main issue is it's single threaded. So if you use it to share big files with other machines, only 1 file can be downloaded at a time, and a download will block browsing of available files.


It's also very buggy. Its only excuse is being old.


This is mine:

    #!/bin/bash
    unlink /Library/WebServer/Documents
    ln -s "$(pwd)" /Library/WebServer/Documents
    echo "http://localhost/ mounted on $(pwd)"
Of course, it's only "lightweight" in as far as you already have Apache installed and running.


Doesn't DJB's stuff usually compile faster than you can install a Debian package? :)

You could always try to package it for Debian, although I understand that's not a small amount of work.

I don't know of anything else designed for security. Hard to beat DJB in that respect. If you left out the security requirement, I would say just use the Python SimpleHTTPServer. I think it's probably secure, but definitely not explicitly designed for it.


I use equivs to create a virtual package that does nothing but depend on/recommend/suggest other packages. I have two: dfc.deb and dfc-workstation.deb; dfc-workstation adds GUI/multimedia/end-user tools that I do not need everywhere. When I get a new machine I can install these packages and all of the things I depend on will be installed. If publicfile were in debian it would be automatically installed. As it is, I have to copy the package I created with checkinstall and install it manually.


I think my project srvdir might be exactly what you're looking for: https://srvdir.net



Webfsd is in the Debian repos but the stupid person who packaged it created an init script for it as if you want to run it as a service rather than ad hoc. So you may as well build from source...


Or you could contribute something back and file a bug report, or maybe even a patch allowing people to use it either way. Certainly beats insulting someone you don't even know who was trying to make the world a little bit better by doing some free work.


I want to learn a bit about web servers, and I think that studying the source code of a functional one may be worth more than trying to build something from scratch at first. Since I'm seeing so many comments on the security issues of this particular project, can you guys recommend something more reliable?

Thanks in advance.

Edit: I "know" C and C++ and would like to remain in one of these languages, if it's not asking too much.


If you care about security then C and C++ are out of the game, especially if there is a team with different skill sets involved.

Having said this, have a look at Wt and Poco

http://www.webtoolkit.eu/wt

http://pocoproject.org


Try also libonion: https://github.com/davidmoreno/onion

It is not an HTTP server, but a library for creating your own. It comes with many examples, such as a trivial one that only shares files, at https://github.com/davidmoreno/onion/blob/master/examples/ba...


http://www.ibm.com/developerworks/systems/library/es-nweb/si... Direct link to the (200 lines of) source code. I can't speak to the security, but it is a nice little read.


The only thing safer about this that I see is it is extremely small. No high assurance design, etc. What am I missing here that makes them advertise its safety?


Does this do SSL? If not, there is no reason to use this instead of the (excellent) thttpd.

thttpd is a very, very nice tool. It's very handy sometimes to just fire up thttpd -d /some/dir because you want to look at the contents of the dir in a web browser but don't want to spin up the whole environment and server, etc.

I put thttpd on a lot of informal servers just to have it around when I need something like that...


This is not intended for real world usage. Its purpose is to help others learn how web servers work.


Just because the code is tiny doesn't mean that it is safe. How do we know that the code is not vulnerable to e.g. buffer overflow exploit?


I find your comment particularly funny, because in the information security course I took for my masters degree, we had to perform remote buffer overflow exploits using an older version of this exact software.


The code is ugly; check out https://github.com/cesanta/mongoose (GPL & MIT) or https://github.com/sunsetbrew/civetweb (GPL)


If I had infinite time and patience, I'd tinker with this to show the differences in socket code, specifically with the approaches outlined in the C10k document.

Although it was largely unfinished, the approach outlined in C10m would be interesting to see implemented here (via the intel user-space driver).


For those of you who are interested in a tiny, safe, static file server that provides secure, public URLs from any machine (ngrok-style), I have a simple project called srvdir that will probably be useful to you: https://srvdir.net


Nice, but the writing suggests that it's secure as in end-to-end encryption, what people expect from HTTPS, while in fact you tunnel everything in plain through a central server. You should make this clear on the site.


> if LINUX sleep for one second to ensure the data arrives at the browser

can someone explain that please?


It's explained further down:

After the last byte of the file is sent, the nweb web server web() function stops for one second. This is to enable the file contents to be sent down the socket. If it immediately closes the socket, some operating systems do not wait for the socket to finish sending the data but drops the connection very abruptly. This would mean that some of the file content would not get to the browser, and this confuses the browser by waiting forever for the last bit of the file and often results in a blank web page being displayed.


This text has an error rate of about one factual error per sentence, as you'd expect from someone who capitalizes "Linux" as if it were an acronym.


We're definitely in "works on my machine" territory here.

Still it's good for building your confidence as a programmer, no?


ah, sorry - i just read a part and ctrl+f'd for this, but searched for the wrong terms apparently.

would this cause connection timeouts if copying data to the socket took longer than one second? i'm always a bit sceptical when i encounter such seemingly arbitrary timing assumptions.
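One commonly described alternative to a fixed sleep is a "lingering close": half-close the socket, wait for the peer to finish, then close. The sketch below assumes a helper named `close_after_drain` and a caller-chosen timeout; it is an illustration of the idea, not what nweb does.

```c
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: signal end-of-data with shutdown(), then discard
 * whatever the peer still sends until it closes (read() returns 0) or
 * timeout_ms passes with nothing to read, and only then close(). */
int close_after_drain(int fd, int timeout_ms)
{
    char scratch[512];
    struct pollfd pfd;

    assert(fd >= 0);
    if (shutdown(fd, SHUT_WR) != 0) {
        close(fd);
        return -1;
    }
    pfd.fd = fd;
    pfd.events = POLLIN;
    for (;;) {
        int ready = poll(&pfd, 1, timeout_ms);

        if (ready <= 0)
            break;                /* timeout or poll error: stop waiting */
        if (read(fd, scratch, sizeof scratch) <= 0)
            break;                /* EOF from the peer, or a read error */
    }
    return close(fd);
}
```

The wait here is bounded by the peer's behaviour rather than by a guess about how long one second is relative to the transfer.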


This is a great example of how to clearly document a project.

The technical criticisms are confirmation of that. How many HN submissions to open source projects are as well explained?


today i learned: there's still AIX. Huh, really?


Yup. Banks still have lots of these, and Solaris is being phased out more actively than AIX.


Not a prank!


This is actually the first thing I thought, after having a quick look at this code, and being aware of today's date ;-)


All you really need is

> python -m SimpleHTTPServer



