It's hard to give a short answer to that question :)
- Yes, if you know your input is short, you can use a smaller state. The limit is roughly a BLAKE2s state plus (32 bytes times the log_2 of the number of KiB you need to hash). Section 5.4 of https://github.com/BLAKE3-team/BLAKE3-specs/blob/master/blak... goes into this.
- But it's hard to take advantage of this space optimization, because no libraries implement it in practice.
- But the reason libraries don't implement it is that almost no one needs it. The max state size is just under 2 KiB, which is small enough even for https://github.com/oconnor663/blake3-6502.
- But it would be super easy to implement if we just put the "CV stack" on the heap instead of allocating the whole thing as an array up front.
- But the platforms that care about this don't have a heap.
@caesarb mentioned really tiny microcontrollers, maybe even tinier than the 6502. The other place I'd expect to see this optimization is in a full hardware implementation, but those are rare. Most hardware accelerators for hash functions provide only the block operation and leave this sort of bookkeeping to software.
I assume most microcontrollers are unlikely to be hashing things much bigger than RAM.
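To put rough numbers on the "32 bytes times log_2 of the KiB" rule from the first bullet, here's a back-of-the-envelope sketch. It is not taken from any BLAKE3 library: the function names are made up for illustration, and `base_state_bytes` (the "BLAKE2s state" part) is something you'd have to measure for your own implementation, so I leave it as a parameter rather than guess.

```python
CHUNK_BYTES = 1024  # BLAKE3 chunk size: 1 KiB
CV_BYTES = 32       # size of one chaining value on the CV stack


def cv_stack_entries(max_input_bytes: int) -> int:
    """Approximate CV stack depth: ceil(log2(number of 1 KiB chunks))."""
    chunks = max(1, -(-max_input_bytes // CHUNK_BYTES))  # ceiling division
    # For chunks >= 1, (chunks - 1).bit_length() equals ceil(log2(chunks)).
    return (chunks - 1).bit_length()


def approx_state_bytes(max_input_bytes: int, base_state_bytes: int) -> int:
    """Rough hasher state size: base state plus 32 bytes per stack entry.

    base_state_bytes is an assumption supplied by the caller, not a
    value defined by the spec.
    """
    return base_state_bytes + CV_BYTES * cv_stack_entries(max_input_bytes)


if __name__ == "__main__":
    for size in (1024, 64 * 1024, 1024 * 1024, 2**64):
        print(size, cv_stack_entries(size), approx_state_bytes(size, 0))
```

By this estimate, a device that never hashes more than 64 KiB needs about 6 stack entries (192 bytes of CVs), while the full 2^64-byte limit needs 54 entries (1728 bytes), which is where the "just under 2 KiB" total comes from once you add the base state.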