iirc, it's complicated. Some instructions don't reduce the frequency; some reduce it a little; some reduce it a lot.
I'm not sure AVX2 is as ubiquitous as the README says: "We assume AVX2 support which is available in all recent mainstream x86 processors produced by AMD and Intel."
I guess "mainstream" is somewhat subjective, but some recent Chromebooks have Celeron processors with no AVX2:
It doesn't seem that laughable to me to want faster JSON parsing on a Chromebook, given how heavily JSON is used to communicate between webservers and client-side Javascript.
"Faster" meaning faster than Chromebooks do now; 2.2 GB/s may simply be unachievable hardware-wise with these cheap processors. They're kinda slow, so any speed increase would be welcome.
AVX2 also incurs some pretty large penalties for switching between SSE and AVX2. Depending on the amount of time taken in the library between calls, it could be problematic.
This looks mostly applicable to server scenarios where the runtime environment is highly controlled.
There is no real penalty for switching between SSE and AVX2, unless you do it wrong. What are you referring to specifically?
Are you talking about state transition penalties that can occur if you forget a vzeroupper? That's the only thing I'm aware of which kind of matches that.