Hacker News new | past | comments | ask | show | jobs | submit login

The trick: very briefly:

Use two 16-byte values in SIMD registers as 16-entry lookup tables with "8 buckets" (really just bits) each.

Lookup those two separate tables with the high nibble (bits 4..7) and low nibble (bits 0..3) of your input (you can do this in parallel either 16/32/64 ways depending on whether you are using SSSE3/AVX2/AVX512).

AND together the table results (or OR them, depending on the sense of the table).

PMOVMSKB (or VPMOVMSKB) to extract the results back into GPR-land.

So if you want to match, say 0x12 and 0xab at once, assign 0x12 to 'bucket #0' and 0xab to 'bucket #1'.

High nibble table gives you 0x1 -> 0x1 (that is, 0x10 in the high nibble gives you bucket #0) and 0xa -> 0x2

Low nibble table gives you 0x2 -> 0x1 and 0xb -> 0x2.

Tables are otherwise zero.

You can now look up lots of different useful character classes (not all! That takes three shuffle tables) in about 6 instructions. Sadly the instructions are pretty much a long linear chain of dependencies so that latency is a bit worse than throughput, but we've built some fun stuff around them.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: