I think what the parent means is that a motivated experiment designer can (even accidentally) create an experiment that has a high false-positive rate, thus providing very little Bayesian evidence given a positive result. Ideally, you'd have the experiment designed by someone who actually wanted to falsify the hypothesis (or at least a neutral party), such that the non-null conclusion, if arrived at, would be really strong Bayesian evidence.
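To make that concrete, here is a rough numeric sketch (the rates are invented purely for illustration) of why a high false-positive rate makes a positive result weak Bayesian evidence:

    # Invented rates for illustration only.
    true_positive_rate = 0.8    # P(positive result | hypothesis true)
    false_positive_rate = 0.5   # P(positive result | hypothesis false), a leniently designed experiment

    # The Bayes factor for a positive result is the likelihood ratio.
    print(true_positive_rate / false_positive_rate)   # 1.6 -- barely moves the prior odds

    # A stringent experiment, designed so it would likely fail if the hypothesis were false:
    strict_false_positive_rate = 0.05
    print(true_positive_rate / strict_false_positive_rate)   # 16 -- much stronger evidence

Same positive result, very different evidential weight, depending entirely on how hard the experiment could have failed.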
This is a subtle but important distinction. It is absolutely possible to run a confirming experiment that gives misleading results. There is a nice explanation in the Wikipedia article on confirmation bias.
A striking example is Wason's (2,4,6) task. From Wikipedia:
"Wason's research on hypothesis-testing
The term "confirmation bias" was coined by English psychologist Peter Wason.[66] For an experiment published in 1960, he challenged participants to identify a rule applying to triples of numbers. At the outset, they were told that (2,4,6) fits the rule. Participants could generate their own triples and the experimenter told them whether or not each triple conformed to the rule.[67][68]
While the actual rule was simply "any ascending sequence", the participants had a great deal of difficulty in finding it, often announcing rules that were far more specific, such as "the middle number is the average of the first and last".[67] The participants seemed to test only positive examples—triples that obeyed their hypothesized rule. For example, if they thought the rule was, "Each number is two greater than its predecessor", they would offer a triple that fit this rule, such as (11,13,15) rather than a triple that violates it, such as (11,12,19).[69]
Wason accepted falsificationism, according to which a scientific test of a hypothesis is a serious attempt to falsify it. He interpreted his results as showing a preference for confirmation over falsification, hence the term "confirmation bias".[Note 4][70] Wason also used confirmation bias to explain the results of his selection task experiment.[71] In this task, participants are given partial information about a set of objects, and have to specify what further information they would need to tell whether or not a conditional rule ("If A, then B") applies. It has been found repeatedly that people perform badly on various forms of this test, in most cases ignoring information that could potentially refute the rule."
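The trap is easy to see in a minimal sketch (the rules and triples below follow the quoted description; the function names are mine):

    def hidden_rule(t):
        # The experimenter's actual rule: any ascending sequence.
        a, b, c = t
        return a < b < c

    def hypothesized_rule(t):
        # A typical participant's narrower guess: each number is two greater than its predecessor.
        a, b, c = t
        return b == a + 2 and c == b + 2

    # Positive tests: triples chosen because they already fit the hypothesis.
    for t in [(2, 4, 6), (11, 13, 15), (20, 22, 24)]:
        print(t, hidden_rule(t))          # "yes" every time -- tells you nothing new

    # A falsifying test: a triple that violates the hypothesized rule.
    print((11, 12, 19), hidden_rule((11, 12, 19)))   # still "yes" -- the hypothesis is too narrow

Every positive test of the narrow rule is also a positive instance of the true rule, so no amount of confirming data can tell the two apart; only a triple that breaks the hypothesized rule can.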
Yes, their rules might be more specific than the general rule, but that is not a problem in itself: their rules were a correct subset of the more general rule (if what you are describing is accurate). Now, if a researcher claims a broad hypothesis while only providing data that supports a narrower subset of it, that is a problem; they are being misleading one way or another. And if a researcher presents a hypothesis and misses relevant data (for whatever reason), then ideally somebody else will point this out. Nonetheless, arguing that because this kind of misrepresentation can happen, some particular study shouldn't be trusted is little more than baseless criticism.
They tested the hypothesis that heroin is addictive by itself, the one supported by the self-administration experiment, and got pretty unexpected results. As far as the scientific method is concerned, their experiment looks ok to me.
It's interesting to note that the criticism of the Rat Park experiment uses exactly the same reasoning the Rat Park designers used against the self-administration experiment, namely that a seemingly innocuous part of the experimental setup (isolation and genetic variance, respectively) was causing a major bias in the results.
I guess that's one way to interpret the phrase "designed to confirm," but it's also a fairly natural way to describe a legitimate test. I might, for example, say that I "designed an interview process to confirm that candidates are qualified," and of course it's clear that the process will either confirm or deny.