> without being able to look up the original texts myself
Rule of thumb: if you can't look up the original texts, you can assume they weren't actually in the training data. The training data is, however, likely to include a lot of people quoting those texts, meaning that the model predicts "SOURCE says OPEN QUOTATION MARK" and then tries to autocomplete it. If you can verify it yourself, you might not need to ask the model at all; but if you can't verify it, it's certainly wrong.
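For anyone who wants to see that autocomplete-the-quote behaviour directly, here's a minimal sketch using GPT-2 through the Hugging Face transformers pipeline. The model and the prompt are just illustrative stand-ins, not anything specific from this thread:

```python
# Minimal sketch: a small open model will happily "autocomplete" a quotation,
# whether or not the quoted source was ever in its training data.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

prompt = 'Marcus Aurelius wrote, "'
result = generator(prompt, max_new_tokens=40, do_sample=True, temperature=0.8)
print(result[0]["generated_text"])
# The continuation reads like a plausible quotation, but nothing guarantees
# it matches anything Marcus Aurelius actually wrote.
```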
"Rule of thumb: if you can't look up the original texts, you can assume they weren't actually in the training data. "
That's not reliable. I've found them on the Internet in various forms (eg studybible.info). Google Books also has scanned copies of many ancient writings. There are probably obscure sites people would miss. When searching for them, the search algorithms might bury them in favor of newer, click-bait content.
Determining for sure what wasn't in the training data should be considered impossible right now. If it matters, we need to use models with open, legal-to-share training data. If that's impossible, one might at least use a model whose training data is accessible to them (eg free + licensed).
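As a rough illustration of what "training data accessible to them" buys you in practice, here's a sketch that streams an open corpus and checks whether an exact phrase actually occurs in it. allenai/c4 and the sample phrase are just placeholders for whatever corpus and quote you care about:

```python
# Sketch: stream an open corpus and grep for an exact phrase.
# Slow and incomplete without a proper index, but it's a direct check
# rather than a guess about what a model may have seen.
from datasets import load_dataset

phrase = "In the beginning was the Word"  # the quote you want to verify

ds = load_dataset("allenai/c4", "en", split="train", streaming=True)

for i, record in enumerate(ds):
    if phrase in record["text"]:
        print(f"found in document {i}: {record['url']}")
        break
    if i >= 100_000:
        print("not found in the first 100k documents")
        break
```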