SDXL (and possibly 2.1) switched to a different CLIP implementation that
is geared toward sentence-level understanding,
whereas SD1.5 uses the old CLIP, which works with tag-cloud-style prompts.
SDXL actually takes conditioning from either the old CLIP (ViT-L, the one SD1.5 uses) or the new one (OpenCLIP ViT-bigG), or both. The malleability of SDXL is not just down to the choice of the new CLIP; the UNet itself is more opinionated.
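
To make the "either, or both" point concrete, here is a rough shape-level sketch of how the two encoders' per-token outputs end up in one conditioning tensor. It uses no real models: `fake_encode` and `sdxl_context` are made-up names for illustration, and zero-filling a missing encoder's slot is an assumption of this sketch (a given frontend may encode an empty prompt instead). The widths, 768 for CLIP ViT-L and 1280 for OpenCLIP ViT-bigG, are the published ones.

```python
# Toy stand-ins only; these are not real encoder outputs.
SEQ_LEN = 77        # CLIP token context length
DIM_CLIP_L = 768    # hidden width of CLIP ViT-L/14 (the SD1.5 encoder)
DIM_CLIP_G = 1280   # hidden width of OpenCLIP ViT-bigG/14


def fake_encode(dim, fill):
    """Hypothetical stand-in for a text encoder: a SEQ_LEN x dim matrix."""
    return [[fill] * dim for _ in range(SEQ_LEN)]


def sdxl_context(emb_l=None, emb_g=None):
    """Join per-token features from both encoders along the channel axis.

    A missing encoder is zero-filled here. That is an assumption of this
    sketch, not necessarily what any particular UI does.
    """
    if emb_l is None and emb_g is None:
        raise ValueError("need at least one embedding")
    if emb_l is None:
        emb_l = fake_encode(DIM_CLIP_L, 0.0)
    if emb_g is None:
        emb_g = fake_encode(DIM_CLIP_G, 0.0)
    # Each token row becomes [old CLIP features | new CLIP features].
    return [row_l + row_g for row_l, row_g in zip(emb_l, emb_g)]


both = sdxl_context(fake_encode(DIM_CLIP_L, 0.1), fake_encode(DIM_CLIP_G, 0.2))
only_old = sdxl_context(emb_l=fake_encode(DIM_CLIP_L, 0.1))
print(len(both), len(both[0]))  # 77 2048
```

Either way the UNet sees a 77 x 2048 context, so what changes between "old only", "new only", and "both" is which half of each token's features carries prompt information.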