>others really expect data and full transparency to call research "reproducible"
I truly don't understand why that is. Creating the dataset is part of the experiment. It is reproducible because if you follow all the steps in the experiment, you will get the same outcome (or, pedantically, a similar outcome, which is true of any experiment unless literally every aspect of the experiment is the same). If you create a dataset of 36M video clips with high-quality manually labelled captions, 170M (video, ASR transcript) pairs from 44.6M YouTube videos, and 71.5M (clip, machine-generated caption) pairs from 36.7M YouTube videos, you will get a similar outcome. If you don't, the experiment is not reproducible.
In fact, fully replicating the dataset from scratch is more conducive to proving the effectiveness of a given method at producing a specific outcome, as noted in your first link:
>The standard of reproducibility calls for the data and the computer code used to analyze the data be made available to others. This standard falls short of full replication because the same data are analyzed again, rather than analyzing independently collected data.
Ultimately, the question the authors are trying to answer is not "does x code, when applied to y dataset, create z outcome?" but rather "how is z outcome created?" To prove that following the sequence of steps outlined in the article is sufficient to produce the claimed outcome, you have to recreate the dataset in any event. Releasing the dataset would help in that regard only inasmuch as it would serve other people as an example for creating their own datasets.
If you think about non-coded experiments, it is clear that this is the case. Nobody would ask for someone else's reactants and the products they got at the end of an experiment; they would just ask for the procedure to generate the product. And in fact you will find that these procedures are often not as exact as you might expect. For example, a procedure may say something like "this reactant was heated to x degrees" without mentioning how fast to heat it or exactly how that was done. And that isn't necessary, because the goal is not "reproducible" as in a reproducible build artifact, where every bit is the same, but "reproducible" as in "the experiment, once replicated, will have the same outcome."
I will try to come back to this paper in a year and see how many citations it has and to what extent it was useful to others. I.e., will other papers say something like "we follow the architecture outlined in [1]", or will it only be mentioned as prior work? If it is the former, it is hard to argue that this doesn't count as published research, given that others are using it for their own research.
I don't really care what the definition of "reproducible" is. In the context of experiments, I have always understood it to mean "the extent to which consistent results are obtained when the experiment is repeated." When people talk about an experiment not being reproducible, or about the "reproducibility crisis", they are using my definition.
> Ultimately, the question the authors are trying to answer is not "does x code, when applied to y dataset, create z outcome?" but rather "how is z outcome created?"
Maybe, but the question asked by people in the field, and even the general public, is different: we're asking, "is this even real, or are they exaggerating or plain bullshitting us?" It's a reasonable question to ask when the results seem groundbreaking and there's a lot of money or reputation at stake.
Consider if they came out and said: we've invented a machine converting electricity to mechanical motion with near-zero energy loss at room temperature. The design is complex and uses exotic materials that no one outside a few megacorps could reasonably afford, and we just plan to use it to sell mechanical work as a service, so you're never gonna see how the thing is built (unless you're really important and sign an NDA) - but trust us, it works. Wouldn't some questions about reproducibility be warranted in such a case?