Hacker News new | past | comments | ask | show | jobs | submit | more acidx's comments login

Files are not decompressed in the server: it sends the unmodified deflate stream back to the user.


See my comment about this upstream, here: https://news.ycombinator.com/item?id=37410473


Lwan: https://lwan.ws

Been working on this for exactly 11 years today! It's just a webserver on the outside but it has become an umbrella for experimentation.


Lwan (https://lwan.ws) can also be used like this, although I wouldn't call it a "one liner" like many of the examples listed there:

    lwan -r /path/to/files/to/serve


Nice article and analysis! I'm actually considering scrapping the trie used in my project to something based off of this one, with some modifications:

For instance, find_node(c, Node48) could avoid the branch if a non-existing index points to an additional entry in child_ptrs that's always NULL. Lookup would be comparable to the Node256 version.

Another thing that could be done, is to scrap the Node48 entirely, and implement two new structs to replace it: Node32 and Node64, and use respectively AVX2 and AVX512. These can be based off of the Node16 version. It remains to be seen if these will yield better performance than the branchless Node48 above, especially if power management kicks in when mixing AVX512 with older SIMD generations.

The trie in Lwan (https://lwan.ws) does an interesting trick to reduce the amount of memory used in the equivalent of a Node256: instead of 256 pointers to a node, it has only 8 pointers. Characters are hashed (MOD 8). The leaf node contains a linked list of key/value pair, and an actual string comparison is performed at the end. (Lwan cheats here by avoiding a string comparison if the linked list contains only 1 element.) Works pretty well, as it's part of the URL routing mechanism.

One other experiment I've been making with tries, is to use the idea of key compression and use it in a different way: slice it every 4 or 8 bytes, consider those bytes as an arbitrary integer, and add every chunk of it to a hashmap<int, some_struct>, building a chain for the next lookup in some_struct. The prototype I wrote works pretty well.


Another standard trick to reduce memory is to store a bitfield telling you which pointers are not null, and then store an array of only the pointers that are non null. For example for a Node64 you store a 64 bits bitfield plus only the pointers that are not null, so if that node has only 3 children you store 64 bits plus 3 pointers instead of 64 pointers. You can index that structure by doing a shift + popcount: if you want to find the pointer at index n you first check the n-th bit in the bitfield to see if the pointer is null, and if it isn't you count the number of set bits in bitfield[0..n] to find the index into the pointer array.


The AVX512 power issues and startup latency likely mean that using it it will likely be a loss except in microbenchmarks or unusual workloads.


I like this trick that I use in C. It's limited, as it only matches string prefixes, but it works pretty well in the end: https://tia.mat.br/posts/2018/02/01/more_on_string_switch_in...

It allows you to write something like this: https://github.com/lpereira/lwan/blob/master/src/lib/lwan-ti... -- which is generated as a binary search by GCC.


Something we’ve used before is preprocessor string hashing similar to [1]. Basically we hash strings at compile time to ints and use those for our case statements. Word of caution though, there’s the possibility of collisions. It works great on a limited set of strings like XML tags and such.

1. http://lolengine.net/blog/2011/12/20/cpp-constant-string-has...


You know, I don't quite understand the point the author is making there. They concede that the compiler is smart enough to elide memcpys, but not smart enough to vectorize their strlen function? What?


I'm not aware of it. I'm learning Rust (just bought the "The Rust Programming Language" book, FWIW), so I will check it out.


Yes, Lwan's thread scheduler is pretty simple, and can be borderline useless if a thread blocks. This 2014 blog post has more details; not much in that regard has changed since then: https://tia.mat.br/posts/2014/10/06/life_of_a_http_request.h...


There is SSE, which, although unidirectional, can be used in lieu of websockets for a chat application (receive messages/events via SSE, send via another endpoint).

However, as I said before, this is an open source project; if you have an itch, you can scratch it. PRs are open. :)


What is SSE in this context?



The authors of Seastar used to use Lwan to benchmark their HTTP server implementation. When Seastar isn't using DPDK/usermode TCP stack, and drops to the POSIX API, Lwan is still faster.


Yes. And in other news a bicycle is faster than a Ferrari with no engine

EDIT: touché! i agree!


Curious that you compare Seastar with a Ferrari: those cars are known for an internal finish that's not compatible with their price tag... and Seastar is nothing like that. It's a seriously well engineered piece of software, and it shows: its readability and code quality is superb.


It's a known issue with TWFB. I don't know if they're planning on fixing it, but most frameworks that are not in newer rounds are not appearing in older rounds any longer.


Oh, that's a shame. Why is lwan no longer in the newer rounds?


Because I failed to move the benchmark harness from the Lwan repository to the TWFB repository. Also, I'm not really spending a lot of time on Lwan any longer; there are so many things I'd like to do, and so little time. :)


Consider applying for YC's W25 batch! Applications are open till Nov 12.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: