The parent comment is talking about two separate prompts: one with only "he" and one with only "she". Your comment sounds like you're only talking about one prompt (but maybe I misunderstood).
Yes, I get that. I tested GPT-4 with both 'she' and 'he', and in both cases it consistently said that the student was late, across several trials for each gender. Once it said the statement was ambiguous, but it never seemed to be sexist the way older or smaller LLMs are.