
But websocket connections are usually long-lasting, so the cost of the fork is less important.



It probably scales better than a forking HTTP server, but probably not by much; modern HTTP connections tend to be at least somewhat long-lived, and few people would dare serve a large site on a forking webserver (in part because most webservers have moved on to workers or event loops).

It certainly would hold up inordinately poorly to a DDoS attack.


A process per client with keepalive is definitely a losing proposition. But if you're willing to run old-school one-request-per-connection, it's not unreasonable (although adding TLS to that is).

Many years ago, while I was at Yahoo, someone made a clever hack: have a daemon that holds the keepalive sockets and passes one to the (y)Apache daemon when it has something to read; when Apache is done with the request, it hands the socket back to the daemon (the sockets are passed back and forth as file descriptors over a Unix socket). Many years before that, David Filo came up with the idea of accept filters: letting a program ask the kernel to accept connections but have accept() return only sockets that already hold a fully formed HTTP request, so Apache (or whatever crazy webserver Yahoo ran before switching to Apache) wouldn't have to wait on the client there either.
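For reference, a minimal sketch (not Yahoo's actual code; names are illustrative) of the two mechanisms described: handing a connected socket to another process as a file descriptor over a Unix-domain socket with SCM_RIGHTS, and, on FreeBSD, using an accept filter so that accept() only returns sockets carrying a complete HTTP request:

    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    /* Hand the connected socket `fd` to another process over the
     * Unix-domain socket `chan`; the kernel duplicates the descriptor
     * into the receiving process. */
    int send_fd(int chan, int fd)
    {
        char dummy = 'x';                 /* must carry at least one byte */
        struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
        union { struct cmsghdr hdr; char buf[CMSG_SPACE(sizeof(int))]; } u;
        struct msghdr msg = {0};

        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = u.buf;
        msg.msg_controllen = sizeof(u.buf);

        struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;
        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

        return sendmsg(chan, &msg, 0) < 0 ? -1 : 0;
    }

    #ifdef SO_ACCEPTFILTER
    /* FreeBSD accept filter: accept() on this listening socket returns
     * only once a full HTTP request has arrived (accf_http module). */
    int set_httpready_filter(int listen_fd)
    {
        struct accept_filter_arg afa;
        memset(&afa, 0, sizeof(afa));
        strcpy(afa.af_name, "httpready");
        return setsockopt(listen_fd, SOL_SOCKET, SO_ACCEPTFILTER,
                          &afa, sizeof(afa));
    }
    #endif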


> modern HTTP connections tend to be at least a little bit long lasting

But CGI doesn't fork per TCP connection; it forks per HTTP request/response.


True, but as far as I know none of the major HTTP servers use forking anymore either. I believe there was a time when forking per connection was fairly standard for servers.


I think it all boils down to how far it can scale: at what point does the number of processes tip over a server, versus how many threads it can handle, versus how many green threads a program runtime can manage.


It's not so much the fork but the memory cost. Each of those subprocesses has at least one call stack, i.e. around 2 megabytes of memory. 2 megabytes per connection is many, many orders of magnitude more than you would use in an asynchronous server.


1) that's virtual size, and most likely (depending on OS/cfg) COW (assuming no call to execve).

2) that's a default - most systems allow tuning

You can have pretty decent performance with forking models if you 1) have an upper bound on the number of concurrent processes, 2) have an input queue, and 3) cache results and serve from cache, even for very small time windows (see the sketch below). Not execve'ing is also a major benefit, if your system can do that (e.g. no mixing of threads with forks). In forking models, execve + runtime init is the largest overhead.

It will not beat other models, but forking processes offer other benefits such as memory protection, rlimits, namespace separation, capsicum/seccomp-bpf based sandboxing, ...

YMMV
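A minimal sketch of such a bounded forking accept loop, assuming a plain TCP server on port 8080 and no execve (everything here is illustrative, not production code; error handling is mostly omitted):

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <signal.h>
    #include <sys/socket.h>
    #include <sys/wait.h>
    #include <unistd.h>

    #define MAX_CHILDREN 64      /* upper bound on concurrent processes */

    static volatile sig_atomic_t live_children = 0;

    static void reap(int sig)    /* reap exited children, keep a rough count */
    {
        (void)sig;
        while (waitpid(-1, NULL, WNOHANG) > 0)
            live_children--;
    }

    static void handle_client(int fd)
    {
        /* ... read one request, write one response ... */
        close(fd);
        _exit(0);                /* never execve: child reuses the COW image */
    }

    int main(void)
    {
        int lfd = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr = { .sin_family = AF_INET,
                                    .sin_port = htons(8080),
                                    .sin_addr.s_addr = htonl(INADDR_ANY) };

        signal(SIGCHLD, reap);
        bind(lfd, (struct sockaddr *)&addr, sizeof(addr));
        listen(lfd, 128);        /* the kernel backlog is the input queue */

        for (;;) {
            int cfd = accept(lfd, NULL, NULL);
            if (cfd < 0)
                continue;        /* e.g. EINTR from SIGCHLD */
            if (live_children >= MAX_CHILDREN) {
                close(cfd);      /* shed load rather than fork unbounded */
                continue;
            }
            pid_t pid = fork();
            if (pid == 0) {
                close(lfd);
                handle_client(cfd);   /* does not return */
            }
            if (pid > 0)
                live_children++;
            close(cfd);          /* parent's copy; the child keeps its own */
        }
    }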


I think you guys are both right. Back in the day when I measured UNIX performance, it was fork that was expensive due to memory allocation - but not the memory itself: it takes time to allocate all the page tables associated with the memory when setting up for the context switch. But I should admit that it was a long time ago that I traced that code path.


prior thread with some ad-hoc measurements: https://news.ycombinator.com/item?id=16714403


A socket connection alone is too expensive.

IRC shows us that maintaining a reliable socket is next to impossible for most folks.


On the contrary. IRC shows that maintaining a consistent network where every part can always reach every other part is tricky, but the most common problem with IRC networks is not clients getting booted off; it's net-splits, which usually resolve automatically pretty quickly.

That they're visible to clients is an issue with how channels span servers, how operator status and channel membership are tied to who happens to be in a channel on a specific partition at a certain time, how messages are propagated when splits resolve, and how inter-server communication happens.

So it has plenty of lessons if you want to build a chat network, but nothing about it suggests that maintaining a connection is otherwise a big problem.

In general, what it boils down to is the word "reliable": you would want to write your app so that a client that disconnects and reconnects gets sensible behavior on reconnecting, e.g. by queuing messages when it makes sense and discarding them when it does not, to paper over temporary connection failures (see the sketch below).

But you would need to do that if you were to use stateless request/response pairs anyway.
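As a hypothetical illustration of that queuing idea (the names and limits here are made up, not from the thread): buffer outbound messages while the connection is down, replay the fresh ones on reconnect, and silently drop the stale ones:

    #include <string.h>
    #include <time.h>

    #define OUTBOX_SLOTS 128
    #define MAX_MSG      512
    #define MAX_AGE_SEC  30      /* discard anything older than this */

    struct pending { time_t queued_at; char body[MAX_MSG]; };

    static struct pending outbox[OUTBOX_SLOTS];
    static size_t head, tail;    /* ring buffer: empty when head == tail */

    /* Called whenever the app wants to send; queues if sending fails. */
    void send_or_queue(int connected, const char *msg,
                       int (*send_fn)(const char *))
    {
        if (connected && send_fn(msg) == 0)
            return;                            /* delivered immediately */
        if ((tail + 1) % OUTBOX_SLOTS == head)
            head = (head + 1) % OUTBOX_SLOTS;  /* full: drop the oldest */
        outbox[tail].queued_at = time(NULL);
        strncpy(outbox[tail].body, msg, MAX_MSG - 1);
        outbox[tail].body[MAX_MSG - 1] = '\0';
        tail = (tail + 1) % OUTBOX_SLOTS;
    }

    /* Called once the connection has been re-established. */
    void flush_outbox(int (*send_fn)(const char *))
    {
        time_t now = time(NULL);
        while (head != tail) {
            if (now - outbox[head].queued_at <= MAX_AGE_SEC)
                send_fn(outbox[head].body);    /* still fresh: replay */
            /* stale messages are discarded, papering over the outage */
            head = (head + 1) % OUTBOX_SLOTS;
        }
    }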


I disagree. I've not seen a netsplit in months, but clients constantly timing out and rejoining is a fact of life.


That is a different problem; clients can always have bad connections, and that is not solved by UDP/HTTP.


Well, HTTP is a short-lived request/response.

Sockets need keepalives, allocated resources, etc.




