If you have special performance requirements, you should build your own queue in userland, where you can not only control how big it is but also completely avoid context switches and system calls.
Also see Google's fast userspace context switching, which would basically let you implement pipes the way they were originally designed and implemented in Unix, as a kind of coroutine that implicitly passed control flow to the other process when you filled or drained a buffer.
In low latency engineering it's exceedingly common just to pin a core to an IPC task (send or receive) and let it spin on your queue at 100% CPU utilisation.
The kernel CPU scheduler typically adds too much latency (5-10us) to make things like condition variable synchronisation useful.
Some applications with softer low-latency requirements tend to spin for a certain amount of time before going to sleep to await data arrival. And if such an application is hit with a constant stream of data to process, it never even gets the chance to block.
At the other extreme end of RT computing, you usually don't have an OS that blocks using privileged instructions in the first place. You just...schedule things.
There are lots of different mechanisms to use for this depending on your need.
You can use a futex, but that still requires the producer to issue system calls to wake a sleeping consumer.
You could also just have the consumer poll at whatever frequency you please (either continuously spin, or go to sleep for a time).