Actually, the test randomizes which passage you receive in which color. So if you got the "harder" one in Beeline, you might have the opposite experience. (I've tried it a couple of times to see how it works...)
That's probably the point: Beeline makes it easier to read more text on a small screen. Increasing font size and line spacing doesn't work well on a smartphone, much less a smartwatch.
I think line spacing and line length matter more than font size. On these high-resolution displays, fonts can be pretty small and still be readable, but then you need to make the column narrower and add a couple of pixels to the line height.