In my experience you gain a lot of density by moving to a variable-width font, and the rendering code is quite easy to write for a Z80 system. For example, I've designed a couple of fonts that store the horizontal size (the glyph's width) in the first byte of each bitmap:
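To make the width-byte layout concrete, here is a minimal Python sketch of a renderer for such a font. It assumes, hypothetically, that each glyph is a width byte followed by one byte per row, with pixels left-aligned from bit 7. `FONT`, `render_line` and `to_text` are illustrative names and the glyph shapes are made up. On a real Z80 the shifting would go into a screen or line buffer rather than Python ints.

```python
# Hypothetical variable-width font: glyph = [width, row0, row1, ...], each row
# byte holding its pixels left-aligned from bit 7 (assumed layout).
HEIGHT = 5

FONT = {
    "I": bytes([1, 0x80, 0x80, 0x80, 0x80, 0x80]),
    "H": bytes([3, 0xA0, 0xA0, 0xE0, 0xA0, 0xA0]),
    "i": bytes([1, 0x80, 0x00, 0x80, 0x80, 0x80]),
    " ": bytes([2, 0x00, 0x00, 0x00, 0x00, 0x00]),
}

def render_line(text, spacing=1):
    """Blit glyphs left to right; return (rows, width_px).

    Each row is an int used as a bit buffer, leftmost pixel = highest bit.
    """
    rows = [0] * HEIGHT
    width_px = 0
    for ch in text:
        glyph = FONT[ch]
        width = glyph[0]  # first byte: horizontal size of this glyph
        for y in range(HEIGHT):
            bits = glyph[1 + y] >> (8 - width)  # keep only the `width` leftmost pixels
            # Make room for the glyph plus its spacing column(s), then OR it in.
            rows[y] = (rows[y] << (width + spacing)) | (bits << spacing)
        width_px += width + spacing
    return rows, width_px

def to_text(rows, width_px):
    """Turn the bit buffers into '#'/'.' strings for inspection."""
    return ["".join("#" if (r >> (width_px - 1 - i)) & 1 else "."
                    for i in range(width_px)) for r in rows]

rows, w = render_line("Hi I")
print("\n".join(to_text(rows, w)))
```

Narrow glyphs like `I` and `i` take a single column here, which is where the density gain over a fixed-width cell comes from.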
Looking back at it, my Z80 code for this isn't that good, but it was still fast enough to redraw a whole line of text in one or two frames. I'm sure others can do better.
Which leads to the question: what is the smallest Z80 assembly function that takes an ASCII code as input and returns the matching glyph in some form? 3x4 is 12 bits, so with a little waste each glyph fits into a 16-bit register pair. You could therefore store the set in a 96 × 2 = 192-byte lookup table, but isn't there some procedural generation that could shrink that?
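The plain-table baseline can be sketched as follows, in Python for readability. It assumes the 96 printable codes 32..127, one 16-bit entry each with the top 4 bits unused, stored little-endian as the Z80 stores words. The bit order (top-left pixel in bit 11) and the names `pack_glyph`, `unpack_glyph`, `store` and `lookup` are hypothetical choices, and the `A` shape is made up.

```python
# One 16-bit entry per printable ASCII code 32..127. Only the low 12 bits are
# used (4 rows x 3 pixels); the top 4 bits are the "little waste".
FIRST, COUNT = 32, 96

def pack_glyph(rows):
    """Four 3-char rows of '#'/'.', top row first -> 12-bit int (bit 11 = top-left)."""
    value = 0
    for row in rows:
        for px in row:
            value = (value << 1) | (px == "#")
    return value

def unpack_glyph(value):
    """Inverse of pack_glyph: 12-bit int -> four 3-char rows."""
    return ["".join("#" if (value >> (11 - 3 * r - c)) & 1 else "." for c in range(3))
            for r in range(4)]

# 96 entries x 2 bytes = 192 bytes, little-endian like a Z80 word.
TABLE = bytearray(COUNT * 2)

def store(ch, rows):
    off = (ord(ch) - FIRST) * 2
    v = pack_glyph(rows)
    TABLE[off], TABLE[off + 1] = v & 0xFF, v >> 8

def lookup(code):
    """ASCII code -> 16-bit glyph word, as a register pair would hold it."""
    off = (code - FIRST) * 2
    return TABLE[off] | (TABLE[off + 1] << 8)

store("A", [".#.", "#.#", "###", "#.#"])  # made-up 3x4 shape
print(unpack_glyph(lookup(ord("A"))))
```

On the Z80 itself the lookup is roughly: put the code in HL, double it, add the table base minus 64 (because the first entry belongs to code 32), then load the two bytes into a register pair.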
I believe a table plus a lookup function would be smaller than a function that generates the bitmaps: for just a single pixel, I got this expression [1].
[1]: not(c0) and not(c2) and c3 and not(c4) and c5 and not(c6) or not(c0) and c2 and c3 and not(c4) and c5 and c6 or c0 and c2 and not(c3) and c4 and not(c5) and c6 or c0 and not(c1) and c2 and c3 and c4 and c6 or c0 and c1 and not(c2) and c3 and c4 and c6 or not(c1) and not(c3) and not(c4) and c5 and c6 or not(c2) and c3 and c4 and not(c5) and not(c6) or not(c0) and c1 and not(c3) and c5 and not(c6) or not(c0) and c1 and not(c2) and not(c3) and c4 or not(c0) and c1 and c3 and not(c4) and c5 or c1 and c3 and not(c4) and c5 and c6 or c1 and c2 and c4 and not(c5) and not(c6) or c0 and not(c1) and not(c4) and c5 and not(c6) or c0 and not(c2) and not(c4) and c5 and c6 or c0 and c1 and not(c3) and c5 and c6 or not(c0) and not(c1) and c4 and not(c5)
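One way to put a byte count on [1]: it is a sum of products over c0..c6. Assuming c_k is bit k of the 7-bit ASCII code (c0 the least significant bit; the bit order is an assumption), each product term can be stored as a (mask, value) byte pair that is true when `code & mask == value`. The following sketch transcribes the 16 terms of [1] that way. That is 32 bytes for one pixel; if the other 11 pixels needed a similar number of terms (not checked here), the total would be around 384 bytes plus the evaluator, roughly double the 192-byte table. `PIXEL_TERMS` and `pixel_on` are illustrative names.

```python
# The 16 product terms of [1] as (mask, value): a term is true when
# (code & mask) == value. Bit k of mask/value stands for c_k, assumed to be
# bit k of the 7-bit ASCII code.
PIXEL_TERMS = [
    (0x7D, 0x28), (0x7D, 0x6C), (0x7D, 0x55), (0x5F, 0x5D),
    (0x5F, 0x5B), (0x7A, 0x60), (0x7C, 0x18), (0x6B, 0x22),
    (0x1F, 0x12), (0x3B, 0x2A), (0x7A, 0x6A), (0x76, 0x16),
    (0x73, 0x21), (0x75, 0x61), (0x6B, 0x63), (0x33, 0x10),
]

def pixel_on(code, terms=PIXEL_TERMS):
    """OR of the product terms: the pixel is lit if any term matches."""
    return any((code & mask) == value for mask, value in terms)

def term_table_bytes(terms=PIXEL_TERMS):
    """Storage for one pixel's terms at one byte each for mask and value."""
    return 2 * len(terms)

print(term_table_bytes())  # bytes for this single pixel
```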
Nah, I'm thinking more like half a table and half some generation tricks... Back in the day people used crazy, crafty tricks to squeeze the most out of the rather limited memory of those machines.
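As one possible "half table, half tricks" combination, here is a Python sketch built on assumptions that do not come from the thread. Codes 96..127 are folded onto 64..95 by clearing bit 5, so lowercase letters reuse the uppercase glyphs (and `{|}~` share glyphs with `[\]^`, an obvious trade-off). The remaining 64 glyphs are packed at 12 bits each, two glyphs per three bytes, which gives 64 × 1.5 = 96 bytes instead of 192. `fold`, `pack_table` and `lookup` are hypothetical names, and the font data is a placeholder.

```python
# Hypothetical hybrid: a 64-glyph table (codes 32..95) packed two 12-bit
# glyphs per 3 bytes, plus a fold that maps codes 96..127 onto 64..95.
FIRST, COUNT = 32, 64

def fold(code):
    """Clear bit 5 for codes 96..127, so 'a' -> 'A', '{' -> '[' and so on."""
    return code & ~0x20 if code >= 96 else code

def pack_table(glyphs):
    """List of COUNT 12-bit ints -> bytes; each pair (a, b) becomes 3 bytes."""
    out = bytearray()
    for i in range(0, len(glyphs), 2):
        a, b = glyphs[i], glyphs[i + 1]
        out += bytes([a >> 4, ((a & 0x0F) << 4) | (b >> 8), b & 0xFF])
    return bytes(out)

def lookup(table, code):
    """ASCII code 32..127 -> 12-bit glyph."""
    i = fold(code) - FIRST
    b0, b1, b2 = table[3 * (i // 2): 3 * (i // 2) + 3]
    if i % 2 == 0:                      # even glyph: high 12 bits of the triple
        return (b0 << 4) | (b1 >> 4)
    return ((b1 & 0x0F) << 8) | b2      # odd glyph: low 12 bits

# Placeholder font data: only 'A' is filled in (.#. / #.# / ### / #.#).
GLYPHS = [0] * COUNT
GLYPHS[ord("A") - FIRST] = 0x57D
TABLE = pack_table(GLYPHS)
print(len(TABLE), hex(lookup(TABLE, ord("a"))))
```

On the Z80 the multiply-by-3 and the nibble extraction cost some code, so whether this actually beats the plain 192-byte table is something to measure rather than assume.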
(starts thinking in Z80...)