Well, we've been writing stories for thousands of years, so I'm a bit skeptical that "unlikely enough to exist" is really a thing. As for the specific example: maybe there isn't a story about this particular character fighting a pterodactyl, but surely there are tons of stories about people fighting all kinds of animals, and maybe even a few about someone fighting a pterodactyl.
Sure, but the evaluation explicitly addresses (among other points) how well that specific character is characterized. If an LLM took a pre-existing story about (say) Superman fighting a pterodactyl, and changed Superman to Ignatius J. Reilly, it wouldn't get a high rating.