KestrelHttpServer SuperCharged MemoryPoolIterator

jsingleton · on Nov 22, 2016

Ben has been one of the main external contributors to Kestrel and is a big reason why it's so fast. Great work! I have no idea how he manages to do all of this and write a game at the same time.

https://www.techempower.com/blog/2016/11/16/framework-benchm...

https://www.techempower.com/benchmarks/#section=data-r13&hw=...

NKCSS · on Nov 22, 2016

quote:

"benaadams commented on 3 Oct • edited I couldn't believe there wasn't a number in the 2^64 space that did this. However, after exhaustive internet searching I couldn't find one. So I went to sleep, woke up in the middle of the night with a headache and idea of a method to try and actually derived it.

Having derived it, I did some searching with the search engines and couldn't find any hits either in its decimal or hexadecimal form, so called it Ben's Magic Number

I will write a blog post on the method I used."

Then:

"I tried to come up with a similar magic number that would allow to obtain the index after shifting its bits, and found a different one. Wouldn't this number just do the same and be a bit easier to understand? 0x01020304050607 We know that after doing x=(v & (-v)) we get a power of 2, which when multiplied with the magic number will shift its bits by a multiple of 8 positions: So after being multiplied by x, 0x0001020304050607 is turned into

    0x0304050607000000 when shifted by a 3-byte offset
    0x0405060700000000 when shifted by a 4-byte offset
    0x0506070000000000 when shifted by a 5-byte offset

and so on... We then simply have to read the 1st byte of the resulting number to find the offset, ie: offset = (((v & -v) * Magic) >> 56) & 0x7"
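The quoted trick can be tried out with a minimal Python sketch. It assumes each flagged byte is 0xFF (as a vector compare would produce), so that `v & -v`, which isolates the lowest set bit, always lands on a byte boundary. The names `MAGIC`, `MASK64`, and `first_flagged_byte` are made up for illustration, and the mask only emulates 64-bit wraparound because Python integers are unbounded.

```python
MASK64 = (1 << 64) - 1          # emulate 64-bit unsigned wraparound
MAGIC = 0x0001020304050607      # the constant from the quoted comment


def first_flagged_byte(v: int) -> int:
    """Return the index (0 = least significant) of the lowest 0xFF byte in v.

    Assumes v is a non-zero 64-bit value whose bytes are each 0x00 or 0xFF.
    """
    if v == 0:
        raise ValueError("no byte is flagged")
    lowest = v & -v                         # keep only the lowest set bit
    product = (lowest * MAGIC) & MASK64     # same as MAGIC << (8 * index)
    return (product >> 56) & 0x7            # read the top byte
```

For example, `first_flagged_byte(0xFF000000)` isolates bit 24, the product becomes 0x0304050607000000, and the top byte gives 3.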

Followed by:

"benaadams commented on 4 Oct • edited @nicodeslandes Congrats! That is a lot more obvious as to what is happening; and drops a shift. Already superseded; that's the power of OSS collaboration!"

The comments here are very nice and worth a read.

Arnavion · on Nov 22, 2016

If anyone wonders like I did what markdown magic could've made the collapsible "Details" sections in the second comment, it's actually standard HTML5 - https://developer.mozilla.org/en-US/docs/Web/HTML/Element/de...

bhrgunatha · on Nov 22, 2016

How do you use github flavoured markdown to embed <details> tags - surely you can't just use the html tags? Isn't that the whole point of markdown - to limit users html input so there aren't any security escalations?

jfroma · on Nov 22, 2016

It is not the point of markdown, in fact any html tag is valid in markdown if you read the original description [1].

Github does strip some tags for security reasons like <script>. Some of the parsers I have seen offer this feature.

1: https://daringfireball.net/projects/markdown/syntax#html

SeriousM · on Nov 22, 2016

Thank you, I was really wondering if github is collapsing too long code snippets on mobile.

hitr · on Nov 22, 2016

It seems the Kestrel part of .NET Core has received some amazing performance contributions from the open-source community. I see that Kestrel's performance is much better than anything IIS + ISAPI or IIS7 + ASP.NET modules/handlers ever produced. Maybe this is partly down to how simple the middleware is (it is just a function/method). But the request-parsing logic has also become really good, and Kestrel could hit 5 million RPS, as discussed in this talk [1] (compared to 50K for old ASP.NET). Some crazy optimizations and benchmarks are discussed in that video, such as static byte arrays, memory pools, custom awaiters, bit manipulation for string comparisons, etc.

Kestrel will be one of the best when it comes to benchmarks.

[1] https://vimeo.com/172009499

[edit] added video url

philliphaydon · on Nov 22, 2016

Some of the contributions have been amazing. There are some exceptionally smart people in the community who are contributing for the greater good of the community.

I think it's one of the best decisions MS has made, to go open source.

Sacho · on Nov 22, 2016

Here are the latest benchmarks that the ASP.NET Core team is running to test their stack: https://github.com/aspnet/benchmarks

pkhuong · on Nov 22, 2016

TIL: .NET doesn't expose bitscan instructions.

jongalloway2 · on Nov 22, 2016

From how I read this comment, it's not exposed on x86 because it's extremely slow on Atom:

https://github.com/dotnet/coreclr/blob/cb9cbef5cd036bb761720...

omgtehlion · on Nov 22, 2016

When I needed these instructions, I created this https://github.com/omgtehlion/netintrinsics

Guys in OP seem to be more clever, though

benaadams · on Nov 23, 2016

bitscan would replace 3 instructions (xor, mul, -1) with one; it wouldn't be a huge difference in this particular scenario.

eveningcoffee · on Nov 22, 2016

Can somebody ELI5 what is the objective and how it has been accomplished?

benaadams · on Nov 22, 2016

Parse a full HTTP request faster. Method, Path, Version, Headers, start of data.

How it's been done is more tricky. Lots of iterative development.

Vectorization, bit twiddling, fast-path inlines, analysing the output asm from the compiler, benchmarking, repeat...

eveningcoffee · on Nov 22, 2016

Thanks! This gives a general picture.

Can you also explain the bit twiddling bit in more detail? What operation does it make faster, and how does it fit into the larger picture?

to3m · on Nov 22, 2016

Roughly speaking, find index of most significant set bit. (So 255 gives you 7, 254 gives you 7, 256 gives you 8, etc.) At a guess, they are comparing things 8/16 bytes at a time, and getting a packed set of byte flags where bytes are set to 0xff to indicate inequality (so, imagining it 4 bytes at a time, comparing 0x12345678 and 0x12125656 would give you 0x00ff00ff) and they want to see where the first different byte is. In this example, the index is 23. (Then: (31-23)/8=1. So the first different byte is at offset 1.)
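A minimal Python sketch of that guess, working 4 bytes at a time as in the example. The helper names `byte_diff_flags` and `first_diff_offset` are hypothetical, and the loop stands in for what a vector compare instruction would do in one step; it is not Kestrel's actual code.

```python
def byte_diff_flags(a: int, b: int, width: int = 4) -> int:
    """Return a mask with 0xFF in every byte position where a and b differ."""
    flags = 0
    for i in range(width):
        shift = 8 * i
        if (a >> shift) & 0xFF != (b >> shift) & 0xFF:
            flags |= 0xFF << shift
    return flags


def first_diff_offset(a: int, b: int, width: int = 4):
    """Offset of the first differing byte, counted from the most significant byte.

    Returns None when the two values are identical.
    """
    flags = byte_diff_flags(a, b, width)
    if flags == 0:
        return None
    top_bit = flags.bit_length() - 1          # index of most significant set bit
    return (8 * width - 1 - top_bit) // 8
```

With the values from the example, `byte_diff_flags(0x12345678, 0x12125656)` yields 0x00ff00ff, whose most significant set bit is bit 23, so `first_diff_offset` returns 1.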

As to how it works: you use the v&-v trick to clear all but the top bit of your comparison result. Because you've only got 1 bit set, this value is a power of two; if you multiply another value by this one, you're getting the equivalent of a left shift by that power (which is relevant - you need to be thinking in shifts rather than multiplies).

Because of the way the predicate results are formed, you only have 5 possible values you're going to get here: 2^31, 2^23, 2^15, 2^7, or 0. The outputs these correspond to:

    31 -> 0
    23 -> 1
    15 -> 2
    7  -> 3
    0  -> (special case - all identical)

So you need a value that when shifted left 31 bits can be easily turned into 0, when shifted left 23 can be turned into 1, and so on.

A suitable value for this is something like 3<<22|2<<14|1<<6. (If I've made a mistake there, which I probably have, hopefully the working will make it obvious how to fix it.) Recalling that multiplying by 2^N gives you a left shift by N:

When multiplied by 2^31, you get 0.

When multiplied by 2^23, you get 1<<29.

When multiplied by 2^15, you get 2<<29|1<<21.

When multiplied by 2^7, you get 3<<29|2<<21|1<<13.

In all cases, you can shift the result right 29 bits (unsigned shift, or signed shift and mask with 3 afterwards) and get the value you're looking for: 0, 1, 2, or 3.

Zero you can check for before doing the multiply; or you could pop a 4 in at the bottom or something so that the result would be 4 in that case. (This is what I figured you'd want originally and that's why the result starts at bit 29. But actually I guess you'd probably check first.)
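The arithmetic above can be checked with a short Python sketch. It assumes the comparison result has already been reduced to zero or to a single set bit at position 31, 23, 15, or 7, and it checks for zero before multiplying. `MAGIC32`, `MASK32`, and `offset_from_single_bit` are illustrative names, and the mask emulates 32-bit wraparound since Python integers do not overflow.

```python
MASK32 = 0xFFFFFFFF                          # emulate 32-bit unsigned wraparound
MAGIC32 = (3 << 22) | (2 << 14) | (1 << 6)   # the constant suggested above


def offset_from_single_bit(p: int):
    """Map 2**31, 2**23, 2**15, 2**7 to byte offsets 0, 1, 2, 3.

    Returns None for p == 0 (all bytes identical).
    """
    if p == 0:
        return None
    product = (p * MAGIC32) & MASK32   # multiplying by 2**N is a left shift by N
    return product >> 29               # unsigned shift leaves the offset in the low bits
```

Calling it with 2**31, 2**23, 2**15, and 2**7 gives 0, 1, 2, and 3 respectively, matching the table.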

This extends naturally to wider comparisons. I think they're doing it 64 bits at a time in this case.

Exercise for the reader: (maybe these are too easy?? - however they weren't instantly obvious to me)

- Why is x * 2^N equivalent to x<<N?

- What if you have more than 1 bit set? Like x * (2^N+2^M)?

- Figure out some rules for generating useful constants like these
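For the first two exercises, a quick numeric check (not a proof) might look like the following Python sketch. The function names are hypothetical. Python integers are unbounded, so a fixed-width version would also need the result masked to the word size.

```python
def shift_equals_multiply(x: int, n: int) -> bool:
    """Check that multiplying by 2**n matches a left shift by n."""
    return x * (1 << n) == x << n


def multiply_by_two_bits(x: int, n: int, m: int) -> int:
    """x * (2**n + 2**m) written as the sum of two shifted copies of x."""
    return (x << n) + (x << m)
```

A multiplier with several set bits therefore adds several shifted copies of the other operand, which is why a magic constant can pack one small answer per possible shift.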

al2o3cr · on Nov 22, 2016

Highly recommend the book "Hacker's Delight" by Henry S. Warren, Jr for more bit hax0ring like this.

naasking · on Nov 22, 2016

Or check this reference:

https://graphics.stanford.edu/~seander/bithacks.html