1. The queue is 4 bytes long, which fits neatly in a two-bit counter. You would have to switch to a three-bit counter to store the 5th state, which increases the counter's area and power usage by ~50% (3 bits instead of 2).
2. The MT signal is explicitly needed to stall the execution unit when the prefetch queue is empty. If you replaced the flag register with a 5th state of the queue counter, then you would still need combinational logic to decode the MT signal from the counter bits (queue[0] != 1 && queue[1] != 1 && queue[2] != 1), i.e. a three-input NOR.
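To make the alternative in point 2 concrete, here is a minimal Python sketch, assuming a hypothetical design where a single 3-bit counter holds the queue length (0 to 4) instead of two-bit pointers plus an MT flag. The names QUEUE_SIZE and mt_from_length_bits are illustrative only; the point is that 5 states need 3 bits, and MT has to be decoded as "all bits zero".

```python
# Hypothetical alternative: one 3-bit counter holding the queue length (0..4),
# with MT (queue empty) decoded from its bits instead of stored in a flag.

QUEUE_SIZE = 4

# Bits needed to encode QUEUE_SIZE + 1 states (lengths 0..4): 3, versus 2 for 4 states.
BITS_NEEDED = QUEUE_SIZE.bit_length()


def mt_from_length_bits(length: int) -> bool:
    """Decode MT from a 3-bit length counter: true only when every bit is 0."""
    assert 0 <= length <= QUEUE_SIZE
    b0 = (length >> 0) & 1
    b1 = (length >> 1) & 1
    b2 = (length >> 2) & 1
    # Same function as a 3-input NOR gate: not (b0 or b1 or b2)
    return b0 != 1 and b1 != 1 and b2 != 1


for n in range(QUEUE_SIZE + 1):
    print(n, format(n, "03b"), mt_from_length_bits(n))
```

Only length 0 yields MT, and that decode costs an extra gate that the stored flag avoids.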
I'm guessing this two-bit counter + MT flag scheme is actually optimal from a transistor-count perspective.
The queue length isn’t maintained by a single queue counter. For a start, there are two queue counters - the write counter and the read counter - each of which is a two-bit counter pointing to one of the four positions in the queue.
The queue itself, though, can be in one of five states: its length can be 0, 1, 2, 3, or 4.
The difference between the positions of the read and write counters (which is always available through the hardcoded XOR subtraction circuit detailed in the article) wraps around modulo 4, so it can only be 0, 1, 2, or 3.
The flag lets you tell whether a 0 result from that subtraction means an empty queue or a full one.
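The pointer-plus-flag scheme described above can be modelled in a few lines. This is a toy Python sketch, not the actual 8088 circuit: the class name, the slot list, and the exact rule for updating the flag (set when a read makes the pointers meet, cleared on any write) are assumptions chosen to illustrate how the flag disambiguates a zero difference.

```python
class PrefetchQueueModel:
    """Toy model (hypothetical): two 2-bit pointers plus an MT flag for a 4-byte queue."""

    SIZE = 4
    MASK = SIZE - 1  # 0b11: keeps each pointer within two bits

    def __init__(self) -> None:
        self.read_ptr = 0
        self.write_ptr = 0
        self.mt = True  # assumed: queue starts empty
        self.slots = [None] * self.SIZE

    def length(self) -> int:
        # 2-bit subtraction wraps modulo 4, so diff is only ever 0..3
        diff = (self.write_ptr - self.read_ptr) & self.MASK
        if diff == 0:
            # The flag resolves the ambiguity: empty (0) or full (4)
            return 0 if self.mt else self.SIZE
        return diff

    def push(self, byte: int) -> None:
        assert self.length() < self.SIZE, "queue full"
        self.slots[self.write_ptr] = byte
        self.write_ptr = (self.write_ptr + 1) & self.MASK
        self.mt = False  # after a write the queue cannot be empty

    def pop(self) -> int:
        assert not self.mt, "queue empty: execution unit would stall"
        byte = self.slots[self.read_ptr]
        self.read_ptr = (self.read_ptr + 1) & self.MASK
        # Pointers meeting after a read means the queue just became empty
        self.mt = self.read_ptr == self.write_ptr
        return byte
```

After four pushes both pointers are back at 0 and the difference is 0, yet length() reports 4 because MT is clear; draining the queue returns the pointers to equality with MT set, and length() reports 0.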
Let’s say it is possible; would the resulting savings in real estate and power be worth the effort?