Would it be that expensive to put ~4K of RAM in the keyboard’s microcontroller to act as an event queue? Write keydowns/keyups into the buffer; then push (or respond to a poll with) the complete contents of the buffer as a single USB packet; and flush the buffer on ACK. (I presume USB does ACKs?) 4K would be more than enough to buffer all possible key events a pair of human hands could create in a reasonable polling interval (~16ms, i.e. 1 frame for a 60Hz game.)
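To make the idea concrete, here is a rough sketch of the keyboard-side queue. It is only an illustration: the class and method names are made up, the 2-byte event size is an assumption, and real firmware would be written in C against the actual USB stack.

```python
from collections import deque

EVENT_SIZE = 2        # assumed: 1 byte keycode + 1 byte flags
BUFFER_BYTES = 4096   # the ~4K suggested above


class KeyEventQueue:
    """Hypothetical keyboard-side event queue (illustrative names only)."""

    def __init__(self, capacity_bytes=BUFFER_BYTES):
        self.capacity = capacity_bytes // EVENT_SIZE
        self.events = deque()
        self.in_flight = 0  # events sent to the host but not yet ACKed

    def push(self, keycode, is_down):
        """Record a keydown/keyup; returns False if the buffer is full."""
        if len(self.events) >= self.capacity:
            return False
        self.events.append((keycode, is_down))
        return True

    def poll(self):
        """Return everything buffered so far, without discarding it."""
        packet = list(self.events)
        self.in_flight = len(packet)
        return packet

    def ack(self):
        """Flush only the events that were in the acknowledged packet."""
        for _ in range(self.in_flight):
            self.events.popleft()
        self.in_flight = 0
```

Events that arrive between `poll()` and `ack()` stay queued for the next packet, and a packet that is never ACKed is simply sent again on the next poll.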
Add a clock chip to the HID device as well, and you could even get finer event-reporting granularity than the OS cares to poll for (i.e. the buffer delivered at (Polling Interval x N)ms would contain events timestamped for (Interval x (N-1))ms, (Interval x (N-1) + 1)ms, etc.). The timestamps could just be relative to the beginning of the interval, so they wouldn’t need to take up many bits at all.
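As a back-of-the-envelope check on "not that many bits": assuming a 16ms interval and 1ms resolution (both assumptions, as is the bit layout), the relative timestamp needs 4 bits, so keycode, up/down flag and timestamp fit in one 16-bit word. A sketch:

```python
POLL_INTERVAL_MS = 16   # the ~16ms interval mentioned above
RESOLUTION_MS = 1       # assumed timestamp resolution

STEPS = POLL_INTERVAL_MS // RESOLUTION_MS
TS_BITS = (STEPS - 1).bit_length()   # bits needed for offsets 0..STEPS-1


def pack_event(keycode, is_down, offset_ms):
    """Pack into one word: [8-bit keycode][1-bit down flag][TS_BITS offset]."""
    if not 0 <= offset_ms < STEPS:
        raise ValueError("offset must fall inside the polling interval")
    if not 0 <= keycode <= 0xFF:
        raise ValueError("keycode must fit in one byte")
    return (keycode << (TS_BITS + 1)) | (int(is_down) << TS_BITS) | offset_ms


def unpack_event(word):
    """Inverse of pack_event: returns (keycode, is_down, offset_ms)."""
    offset_ms = word & ((1 << TS_BITS) - 1)
    is_down = bool((word >> TS_BITS) & 1)
    keycode = word >> (TS_BITS + 1)
    return keycode, is_down, offset_ms
```

The host would add the offset to the start time of the interval the packet covers to recover an absolute timestamp.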
> then push (or respond to a poll with) the complete contents of the buffer as a single USB packet; and flush the buffer on ACK
Two issues. First, unless you use USB 2.0 high speed, USB "packets" (transactions) are small[1]. For low speed they can carry at most 8 bytes of data, for full speed 64 bytes. USB 2.0 HS hardware costs a lot more, of course.
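To put numbers on that, here is a quick calculation using the 8-byte and 64-byte limits above. The 2-byte event size is an assumption for illustration, not something from a spec.

```python
EVENT_SIZE = 2  # assumed bytes per buffered key event

# Maximum data bytes per transaction, as quoted above.
MAX_PAYLOAD = {"low speed": 8, "full speed": 64}


def events_per_transaction(payload_bytes, event_size=EVENT_SIZE):
    """How many whole events fit in a single transaction."""
    return payload_bytes // event_size


def transactions_to_drain(buffer_bytes, payload_bytes):
    """Transactions needed to send a full buffer (ceiling division)."""
    return -(-buffer_bytes // payload_bytes)


for speed, payload in MAX_PAYLOAD.items():
    print(speed, events_per_transaction(payload), transactions_to_drain(4096, payload))
```

So a full 4K buffer could not go out as "a single USB packet" at either speed; it would have to be split across many transactions.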
Secondly, keycodes that are sent in the same report have no defined order[2]. Thus, if two keys were not present in the previous report and are present in the current report, the order in which those two keys were pressed is indeterminate to the OS.
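A small sketch of why the order is lost, assuming the usual 8-byte boot-keyboard layout (modifier byte, reserved byte, six keycode slots). The function names are made up. All the host can do is diff consecutive reports, and the diff is a set:

```python
def pressed_keys(report):
    """Nonzero keycodes in an 8-byte report (assumed boot-keyboard layout)."""
    return {code for code in report[2:8] if code != 0}


def newly_pressed(prev_report, curr_report):
    """Keys present now that were absent before; a set, so no ordering."""
    return pressed_keys(curr_report) - pressed_keys(prev_report)


prev = bytes(8)                                  # nothing held down
a_then_b = bytes([0, 0, 0x04, 0x05, 0, 0, 0, 0])
b_then_a = bytes([0, 0, 0x05, 0x04, 0, 0, 0, 0])

# Slot order carries no meaning, so both reports look the same to the host.
assert newly_pressed(prev, a_then_b) == newly_pressed(prev, b_then_a)
```

Any finer ordering would have to come from extra data in the report, such as the per-event timestamps proposed upthread.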