I don't think you can do this; it will just make things up. Language models don't have this kind of self-reflection. Google would need to indicate the model out of band, like on the page itself, for you to be confident about which model you're using.
I'm pretty sure they're trying to suggest that LLMs in general are not useful because they can't do this type of thing. It's just the next iteration of goalpost moving and should effectively be ignored.
Many artists and others I've spoken to about AI work make similar comments about these systems, out of disdain for their existence.
It gets quite exhausting how often I hear arguments like "well, they can never taste the tartness of a kiwi or feel the heat of the sun at the beach." For some reason, many people have this weird notion that this is what AGI means - doing exactly what humans do, and specifically within the same data domains as humans - and they don't consider working solely outside those domains as a possibility for AGI.
Just wait until the multimodal LLMs roll out! People will be shoving kiwis into their computers and taking their laptops out for a few rays before you know it.
I tried asking it "what is the difference between the palm language model and the bard language model?" and its reply started off "The main difference between the Palm language model and the Bard language model is the size of the dataset they are trained on. Palm is trained on a dataset of 400 billion parameters, while Bard is trained on a dataset of 540 billion parameters." Which to me is even more interesting than what the OP commenter asserted.
It makes up those numbers. I asked about the difference between the small and large PaLM 2 dataset sizes, and it asserted the small model was trained on 540 billion and the large model on 540 trillion. A different draft instead specified 1.4 trillion for the large.
I didn't mean to argue that everything generated is incorrect. But in my experience, the numbers it generates seem closer to random guesses. If you ask it enough times, it sometimes converges on a number, but I don't think that means it's an accurate value. I was able to make it generate a similar table for the different PaLM 2 sizes and LaMDA, and it listed: PaLM 2 Gecko, 137 billion; PaLM 2 Otter, 540 billion; PaLM 2 Bison, 1.8 trillion; PaLM 2 Unicorn, 5.4 trillion; LaMDA, 137 billion. For Unicorn, it also listed "Still under development."
Edit: Playing around with it more, it listed WuDao 2.0 at 1.75 trillion, Chinchilla at 175B, Codex at 175B, DALL-E 2 at 1.3B, GPT-4 at 1.75T, GPT-3.5 at 540B, GPT-3 at 175B, GPT-2 at 1.37B, and GPT at 1.3B.
But in the previous question it had listed GPT-4 at 540 billion and Codex at 5.4 trillion, among other contradictions.
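If you want to measure the "asking it enough times" effect instead of eyeballing drafts, here's a minimal sketch. The ask_model() helper is a hypothetical stand-in for whatever chat interface you're pasting prompts into (not a real Bard API); the point is only that a tally shows consistency, which says nothing about whether the majority answer is actually correct.

    import re
    from collections import Counter

    def ask_model(prompt: str) -> str:
        """Hypothetical stand-in for a chat-model call; replace with
        whatever client or copy/paste workflow you actually use."""
        raise NotImplementedError

    def tally_parameter_answers(prompt: str, n: int = 10) -> Counter:
        """Ask the same question n times and count the figures that come back.
        A strong majority only shows the model answers consistently, not
        that the value is accurate."""
        tally = Counter()
        for _ in range(n):
            reply = ask_model(prompt)
            # Grab the first "<number> billion/trillion" mention, e.g. "540 billion".
            match = re.search(r"(\d+(?:\.\d+)?)\s*(billion|trillion)", reply, re.IGNORECASE)
            if match:
                tally[f"{match.group(1)} {match.group(2).lower()}"] += 1
        return tally

    # Example:
    # counts = tally_parameter_answers("How many parameters does PaLM 2 have?")
    # print(counts.most_common())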