Wow, that is pretty fast. I played with CSV parsing using SIMD string lookahead a while back ( https://gist.github.com/netshade/aa9e836e843c8e84b97a ) and found it to be quite fast as well; I had assumed (perhaps wrongly) that the cost of shuttling data back and forth between the CPU and the GPU would erase any performance gains, so I suspect (it's been a while!) that the SIMD approach would beat the GPU. That said, after working on it for a bit and comparing it with mawk's (http://invisible-island.net/mawk/mawk.html) performance, mawk still beat my approach handily, and with far more functionality. All of which is to say that mawk is pretty amazing and worth checking out if you're in the market for fast CSV parsing.
This reminds me a lot of Parabix (http://parabix.costar.sfu.ca/wiki/ParabixTransform) -- does anyone know what the status of Parabix is today? Is it still potentially useful, and is it actually used in anything approaching production?
Fitting that the project is named "NvParse": a lot of Nvidia-originated projects and functions start with "nv", like "nvenc" or "nvlddmkm".
HTTP headers: these are small, so it's unlikely you'd overcome the overhead of sending them to the GPU. I could be wrong in some cases though, like servers that have to parse hundreds of HTTP headers at once (I imagine this kind of thing must happen somewhere...).
JSON: seems unlikely. Nested structure like that often performs badly on the GPU. You might be able to organize things so you parse many JSON files with the same shape in parallel, though.
A decent way to think about a GPU is as a couple thousand (for good GPUs) processors, each roughly as powerful as a first-gen Pentium (there are many quirks, like that they hate branching, but this is an OK way of thinking about the compute power).
Unless you can actually leverage that parallelism -- and there's a lot there -- it's not going to be a net win. It's also fairly expensive to send data to the GPU; nothing that bad, but enough that for anything fairly small you're way better off staying on the CPU.