I just tested it, and it seems I can manage fusion and see the effect both when I switch them and when I don't. Now I'm confused. Is there something special about these images in particular, or has my perception been all wrong?
When I tested it out, I got a proper sense of depth when I looked at it correctly (i.e., looking through the plane of the monitor, with the left image for the left eye) and got no sense of depth when looking cross-eyed (i.e., with the left image for the right eye). It's tempting to think you're seeing depth as soon as the two images "snap together" when you're cross-eyed, but when I consciously checked, I noticed that the images in fact looked flat (except for the nose).
As for why it looked flat rather than inside out, I can only guess that the stereo depth info conflicted with the other depth cues you get from vision (parallax, known relative object sizes, perspective lines) and my brain ignored it.
EDIT: Actually, after trying the second view (the one with hands, not the roller coaster) cross-eyed, I saw the inside-out depth. It's most noticeable looking out the window.